1
Marques H, Freitas J, Medeiros R, Longatto-Filho A. Methodology for single nucleotide polymorphism selection in promoter regions for clinical use. An example of its applicability. International Journal of Molecular Epidemiology and Genetics 2016; 7:126-136. [PMID: 27766139 PMCID: PMC5069276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2016] [Accepted: 09/01/2016] [Indexed: 06/06/2023]
Abstract
Genetic variability in humans can explain many differences in disease risk factors. Polymorphism-related studies focus mainly on the single nucleotide polymorphisms (SNPs) of the coding regions of genes. SNPs in DNA binding motifs of the promoter region have been less explored. In a recent study of SNPs in patients with non-Hodgkin lymphomas, we faced the problem of SNP selection from promoter regions and developed a practical methodology for clinical studies. The process consists of identifying SNPs in the coding and promoter regions of the antigen-processing system using the 'dbSNP' database. With the 'HapMap' program, we selected SNPs with frequencies >20% in Caucasian populations. For coding regions, we sought biologically and clinically relevant SNPs described in the literature. For the promoter regions, we determined their chromosomal location in the 'QiagenSABioscience' site database. The nucleotide sequences of the ancestral and variant alleles are available in 'dbSNP'. These sequences were used in 'Promoter TESS' to determine binding differences of transcription factors (TFs). Each sequence may have affinity for different TFs. Thus, SNP selection in the promoter regions was based on the differences in TF binding pattern between the ancestral and the variant allele. The potential clinical relevance of the new TFs was also evaluated before the final selection. With this approach, we found that almost half of the relevant SNPs fall within the promoter region. In conclusion, we were able to develop a methodology for the oriented selection of SNPs in promoter regions of human genes, comparing the TFs with affinity for the ancestral allele with the TFs with affinity for the variant allele. We selected those SNPs that change the TF affinity to a pattern with functional significance.
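The selection logic described in this abstract (a frequency cut-off followed by a comparison of the TFs predicted for each allele) can be sketched as below. This is a minimal illustration, not the authors' tooling: the `PromoterSnp` record, the `select_promoter_snps` function, the rs identifiers, and the TF sets are all hypothetical stand-ins for data that would come from dbSNP, HapMap, and Promoter TESS. The final clinical-relevance review mentioned in the abstract is a manual step and is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterSnp:
    """Hypothetical record for one promoter SNP (field names are assumptions)."""
    rs_id: str
    minor_allele_freq: float   # allele frequency in the reference population
    ancestral_tfs: frozenset   # TFs predicted to bind the ancestral allele
    variant_tfs: frozenset     # TFs predicted to bind the variant allele


def select_promoter_snps(snps, min_freq=0.20):
    """Keep SNPs above the frequency cut-off whose predicted TF set changes."""
    selected = []
    for snp in snps:
        if snp.minor_allele_freq <= min_freq:
            continue  # the abstract states a >20% frequency threshold
        gained = snp.variant_tfs - snp.ancestral_tfs
        lost = snp.ancestral_tfs - snp.variant_tfs
        if gained or lost:
            selected.append((snp.rs_id, sorted(gained), sorted(lost)))
    return selected


# Placeholder data for illustration only; these are not real SNPs.
example = [
    PromoterSnp("rs_hypothetical_1", 0.35, frozenset({"SP1"}), frozenset({"SP1", "NF-kB"})),
    PromoterSnp("rs_hypothetical_2", 0.10, frozenset({"AP-1"}), frozenset()),
    PromoterSnp("rs_hypothetical_3", 0.40, frozenset({"GATA-1"}), frozenset({"GATA-1"})),
]
print(select_promoter_snps(example))
```

In this toy input only the first SNP passes: the second is too rare and the third shows no change in predicted TF binding.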
Affiliation(s)
- Herlander Marques
  - Life and Health Sciences Research Institute (ICVS), School of Health Sciences, University of Minho, Braga, Portugal; ICVS/3B’s-PT Government Associate Laboratory, Braga/Guimarães, Portugal
  - Department of Oncology, Hospital de Braga, Braga, Portugal
- José Freitas
  - Nova Medical School, New University of Lisbon, Lisbon, Portugal
- Rui Medeiros
  - Molecular Oncology Group & Virology LB-CI, Portuguese Institute of Oncology, Porto, Portugal; ICBAS, Abel Salazar Institute for the Biomedical Sciences, University of Porto, Porto, Portugal; CEBIMED, Faculty of Health Sciences of Fernando Pessoa University, Porto, Portugal; PCC, Research Department-Portuguese League Against Cancer (NRNorte), Porto, Portugal
- Adhemar Longatto-Filho
  - Life and Health Sciences Research Institute (ICVS), School of Health Sciences, University of Minho, Braga, Portugal; ICVS/3B’s-PT Government Associate Laboratory, Braga/Guimarães, Portugal
  - Laboratory of Medical Investigation (LIM) 14, Faculty of Medicine, University of São Paulo, São Paulo, Brazil
2
Peterson TA, Mort M, Cooper DN, Radivojac P, Kann MG, Mooney SD. Regulatory Single-Nucleotide Variant Predictor Increases Predictive Performance of Functional Regulatory Variants. Hum Mutat 2016; 37:1137-1143. [PMID: 27406314 DOI: 10.1002/humu.23049] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/04/2016] [Accepted: 06/28/2016] [Indexed: 12/20/2022]
Abstract
In silico methods for detecting functionally relevant genetic variants are important for identifying genetic markers of human inherited disease. Much research has focused on protein-coding variants since coding regions have well-defined physicochemical and functional properties. However, many bioinformatics tools are not applicable to variants outside coding regions. Here, we increase the classification performance of our regulatory single-nucleotide variant predictor (RSVP) for variants that cause regulatory abnormalities from an AUC of 0.90 to 0.97 by incorporating genomic regions identified by the ENCODE project into RSVP. RSVP is comparable to a recently published tool, Genome-Wide Annotation of Variants (GWAVA); both RSVP and GWAVA perform better on regulatory variants than a traditional variant predictor, combined annotation-dependent depletion (CADD). However, our method outperforms GWAVA on variants located at similar distances to the transcription start site as the positive set (AUC: 0.96) as compared with GWAVA (AUC: 0.71). Much of this disparity is due to RSVP's incorporation of features pertaining to the nearest gene (expression, GO terms, etc.), which are not included in GWAVA. Our findings hold out the promise of a framework for the assessment of all functional regulatory variants, providing a means to predict which rare or de novo variants are of pathogenic significance.
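The AUC values quoted in this abstract measure how well a predictor ranks functional variants above neutral ones. As a hedged illustration of the metric itself (not of RSVP, GWAVA, or CADD), the sketch below computes AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative. The function name `roc_auc` and the score lists are invented for the example.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in
    the number of scores and is meant for illustration only.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up predictor scores for functional (pos) and neutral (neg) variants.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]
print(round(roc_auc(pos, neg), 3))
```

Here 11 of the 12 positive/negative pairs are ordered correctly, so the AUC is 11/12.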
Affiliation(s)
- Thomas A Peterson
  - Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland
- Matthew Mort
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, United Kingdom
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, United Kingdom
- Predrag Radivojac
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana
- Maricel G Kann
  - Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington
3
Molineris I, Schiavone D, Rosa F, Matullo G, Poli V, Provero P. Identification of functional cis-regulatory polymorphisms in the human genome. Hum Mutat 2013; 34:735-42. [PMID: 23420607 DOI: 10.1002/humu.22299] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/04/2012] [Accepted: 01/31/2013] [Indexed: 12/29/2022]
Abstract
Polymorphisms in regulatory DNA regions are believed to play an important role in determining phenotype, including disease, and in providing raw material for evolution. We devised a new pipeline for the systematic identification of functional variation in human regulatory sequences. The algorithm is based on the identification of SNPs leading to significant changes in both the affinity of a regulatory region for transcription factors (TFs) and the expression in vivo of the regulated gene. We tested the algorithm by identifying SNPs leading to altered regulation by STAT3 in human promoters and introns, and experimentally validated the top-scoring ones, showing that most of the SNPs identified by the algorithm indeed correspond to differential binding of STAT3 and differential induction of the target gene upon stimulation with IL6. Using the same computational approach, we compiled a database of thousands of predicted functional regulatory SNPs for hundreds of human TFs, which we provide as online Supporting Information. We discuss possible applications to the interpretation of noncoding SNPs associated with human diseases. The method we propose and the database of predicted functional cis-regulatory polymorphisms will be useful in future studies of regulatory variation and in particular to interpret the results of past and future genome-wide association studies.
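The affinity half of the criterion described here (a SNP that changes how strongly a regulatory region matches a TF) can be illustrated with a position weight matrix (PWM) log-odds score. The abstract does not state the scoring model, so everything below is an assumption for illustration: the three-column toy PWM, the uniform background, and the function names. The expression half of the criterion is not modelled.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds_score(site, pwm):
    """Sum of per-position log2(probability / background) for one site."""
    return sum(math.log2(pwm[i][base] / BACKGROUND) for i, base in enumerate(site))


def best_score(sequence, pwm):
    """Best-scoring window of PWM width anywhere in the sequence."""
    width = len(pwm)
    return max(
        log_odds_score(sequence[i:i + width], pwm)
        for i in range(len(sequence) - width + 1)
    )


def affinity_change(ref_seq, alt_seq, pwm):
    """Predicted change in best match score caused by the variant allele."""
    return best_score(alt_seq, pwm) - best_score(ref_seq, pwm)


# Toy PWM favouring the motif "AGC"; not a real TF matrix.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
change = affinity_change("TTAGCTT", "TTATCTT", toy_pwm)
print(round(change, 3))
```

A negative value means the variant allele weakens the best predicted match; in a pipeline of the kind the abstract describes, such a change would then be checked against expression data before the SNP is called functional.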
Affiliation(s)
- Ivan Molineris
  - Department of Molecular Biotechnology and Life Sciences, University of Turin, Italy
4
Functional Implications of Local DNA Structures in Regulatory Motifs. ScientificWorldJournal 2013; 2013:965752. [PMID: 23766731 PMCID: PMC3666281 DOI: 10.1155/2013/965752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2013] [Accepted: 04/23/2013] [Indexed: 11/19/2022]
Abstract
The three-dimensional structure of DNA has been proposed to be a major determinant of functional interaction between transcription factors (TFs) and DNA. Here, we use the hydroxyl radical cleavage pattern as a measure of local DNA structure. We compared the conservation between DNA sequence and structure in terms of information content and attempted to assess the functional implications of DNA structures in regulatory motifs. We used statistical methods to evaluate the structural divergence of substituting a single position within a binding site and applied them to a collection of putative regulatory motifs. The following are our major observations: (i) we observed more information in the structural alignment than in the corresponding sequence alignment for most of the transcription factors; (ii) for each TF, the majority of positions have more information in the structural alignment than in the sequence alignment; (iii) we further defined a DNA structural divergence score (SD score) for each wild-type and mutant pair distinguished by a single-base mutation. The SD score for benign mutations is significantly lower than that of switch mutations. This indicates that structural conservation is also important for a TF binding site (TFBS) to be functional and that DNA structures provide previously unappreciated information for TFs to achieve binding specificity.
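"Information content" on the sequence side of this comparison is commonly computed per alignment column as the maximum entropy minus the observed Shannon entropy. The sketch below shows that standard calculation for a small set of aligned sites; it is an assumption that the authors used this form, and the structural (hydroxyl radical cleavage) counterpart is not reproduced because the abstract does not say how it was computed. The aligned sites are invented.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(sites):
    """Per-position information content for equal-length aligned sites."""
    return [column_information(col) for col in zip(*sites)]


# Invented binding sites for illustration.
sites = ["ACGT", "ACGA", "ACTT", "AGGT"]
info = alignment_information(sites)
print([round(x, 3) for x in info])
```

A fully conserved column carries 2 bits; columns with mixed bases carry less, which is the scale on which sequence and structural alignments would be compared.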
5
CDX1 confers intestinal phenotype on gastric epithelial cells via induction of stemness-associated reprogramming factors SALL4 and KLF5. Proc Natl Acad Sci U S A 2012; 109:20584-9. [PMID: 23112162 DOI: 10.1073/pnas.1208651109] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.0] [Indexed: 02/07/2023]
Abstract
Intestinal metaplasia of the stomach, a mucosal change characterized by the conversion of gastric epithelium into an intestinal phenotype, is a precancerous lesion from which intestinal-type gastric adenocarcinoma arises. Chronic infection with Helicobacter pylori is a major cause of gastric intestinal metaplasia, and aberrant induction by H. pylori of the intestine-specific caudal-related homeobox (CDX) transcription factors, CDX1 and CDX2, plays a key role in this metaplastic change. As such, a critical issue arises as to how these factors govern the cell- and tissue-type switching. In this study, we explored genes directly activated by CDX1 in gastric epithelial cells and identified stemness-associated reprogramming factors SALL4 and KLF5. Indeed, SALL4 and KLF5 were aberrantly expressed in the CDX1(+) intestinal metaplasia of the stomach in both humans and mice. In cultured gastric epithelial cells, sustained expression of CDX1 gave rise to the induction of early intestinal-stemness markers, followed by the expression of intestinal-differentiation markers. Furthermore, the induction of these markers was suppressed by inhibiting either SALL4 or KLF5 expression, indicating that CDX1-induced SALL4 and KLF5 converted gastric epithelial cells into tissue stem-like progenitor cells, which then transdifferentiated into intestinal epithelial cells. Our study places the stemness-related reprogramming factors as critical components of CDX1-directed transcriptional circuitries that promote intestinal metaplasia. Requirement of a transit through dedifferentiated stem/progenitor-like cells, which share properties in common with cancer stem cells, may underlie predisposition of intestinal metaplasia to neoplastic transformation.
6
Leuze MR, Karpinets TV, Syed MH, Beliaev AS, Uberbacher EC. Binding Motifs in Bacterial Gene Promoters Modulate Transcriptional Effects of Global Regulators CRP and ArcA. Gene Regulation and Systems Biology 2012; 6:93-107. [PMID: 22701314 PMCID: PMC3370831 DOI: 10.4137/grsb.s9357] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/09/2023]
Abstract
Bacterial gene regulation involves transcription factors (TF) that bind to DNA recognition sequences in operon promoters. These recognition sequences, many of which are palindromic, are known as regulatory elements or transcription factor binding sites (TFBS). Some TFs are global regulators that can modulate the expression of hundreds of genes. In this study we examine global regulator half-sites, where a half-site, which we shall call a binding motif (BM), is one half of a palindromic TFBS. We explore the hypothesis that the number of BMs plays an important role in transcriptional regulation, examining empirical data from transcriptional profiling of the CRP and ArcA regulons. We compare the power of BM counts and of full TFBS characteristics to predict induced transcriptional activity. We find that CRP BM counts have a nonlinear effect on CRP-dependent transcriptional activity and predict this activity better than full TFBS quality or location.
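Counting binding motifs (half-sites) in a promoter, as this abstract describes, amounts to scanning both strands for one half of a palindromic site. The sketch below is an illustrative exact-match counter, not the authors' procedure: the function names and the example promoter are made up, and `TGTGA` is used as the commonly cited CRP consensus half-site on the assumption that exact matching is adequate for a demonstration.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_binding_motifs(promoter, half_site):
    """Count exact half-site matches on either strand of the promoter."""
    promoter = promoter.upper()
    half_site = half_site.upper()
    targets = {half_site, reverse_complement(half_site)}
    width = len(half_site)
    return sum(
        promoter[i:i + width] in targets
        for i in range(len(promoter) - width + 1)
    )


# Invented promoter fragment containing one full palindromic site.
promoter = "AATGTGATCTAGATCACATT"
print(count_binding_motifs(promoter, "TGTGA"))
```

A full palindromic site contributes two counts (one per half), so a count series of this kind separates promoters with isolated half-sites from those with complete sites.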
Affiliation(s)
- Michael R. Leuze
  - Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Tatiana V. Karpinets
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - Department of Plant Sciences, University of Tennessee, Knoxville, TN, USA
- Mustafa H. Syed
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Alexander S. Beliaev
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
7
Contribution of transcription factor binding site motif variants to condition-specific gene expression patterns in budding yeast. PLoS One 2012; 7:e32274. [PMID: 22384202 PMCID: PMC3285675 DOI: 10.1371/journal.pone.0032274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2011] [Accepted: 01/24/2012] [Indexed: 11/19/2022]
Abstract
It is now well established experimentally that variant sequences of a cis transcription factor binding site motif can contribute to differential regulation of genes. We characterize the relationship between motif variants and gene expression by analyzing expression microarray data and binding site predictions. To accomplish this, we statistically detect motif variants with effects that differ among environments. Such environmental specificity may be due to either affinity differences between variants or, more likely, differential interactions of TFs bound to these variants with cofactors, and with differential presence of cofactors across environments. We examine conservation of functional variants across four Saccharomyces species, and find that about a third of transcription factors have target genes that are differentially expressed in a condition-specific manner that is correlated with the nucleotide at variant motif positions. We find good correspondence between our results and some cases in the experimental literature (Reb1, Sum1, Mcm1, and Rap1). These results and the growing consensus in the literature indicate that motif variants may often be functionally distinct, that this may be observed in genomic data, and that variants play an important role in condition-specific gene regulation.
8
Chiang S, Swamy KB, Hsu TW, Tsai ZTY, Lu HHS, Wang D, Tsai HK. Analysis of the association between transcription factor binding site variants and distinct accompanying regulatory motifs in yeast. Gene 2012; 491:237-45. [DOI: 10.1016/j.gene.2011.08.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2011] [Accepted: 08/25/2011] [Indexed: 11/25/2022]
9
Zhao Y, Clark WT, Mort M, Cooper DN, Radivojac P, Mooney SD. Prediction of functional regulatory SNPs in monogenic and complex disease. Hum Mutat 2011; 32:1183-90. [PMID: 21796725 DOI: 10.1002/humu.21559] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 01/18/2011] [Accepted: 06/15/2011] [Indexed: 11/12/2022]
Abstract
Next-generation sequencing (NGS) technologies are yielding ever higher volumes of human genome sequence data. Given this large amount of data, it has become both a possibility and a priority to determine how disease-causing single nucleotide polymorphisms (SNPs) detected within gene regulatory regions (rSNPs) exert their effects on gene expression. Recently, several studies have explored whether disease-causing polymorphisms have attributes that can distinguish them from those that are neutral, attaining moderate success at discriminating between functional and putatively neutral regulatory SNPs. Here, we have extended this work by assessing the utility of both SNP-based features (those associated only with the polymorphism site and the surrounding DNA) and gene-based features (those derived from the associated gene in whose regulatory region the SNP lies) in the identification of functional regulatory polymorphisms involved in either monogenic or complex disease. Gene-based features were found to be capable of both augmenting and enhancing the utility of SNP-based features in the prediction of known regulatory mutations. Adopting this approach, we achieved an AUC of 0.903 for predicting regulatory SNPs. Finally, our tool predicted 225 new regulatory SNPs with a high degree of confidence, with 105 of the 225 falling into linkage disequilibrium blocks of reported disease-associated genome-wide association studies SNPs.
Affiliation(s)
- Yiqiang Zhao
  - Buck Institute for Research on Aging, Novato, California 94945, USA
10
Jones BL, Swallow DM. The impact of cis-acting polymorphisms on the human phenotype. The HUGO Journal 2011. [PMID: 23205161 DOI: 10.1007/s11568-011-9155-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Cis-acting polymorphisms that affect gene expression are now known to be frequent, although the extent and mechanisms by which such variation affects the human phenotype are, as yet, only poorly understood. Key signatures of cis-acting variation are differences in gene expression that are tightly associated with regulatory SNPs or expression Quantitative Trait Loci (eQTL) and an imbalance of allelic expression (AEI) in heterozygous samples. Such cis-acting sequence differences appear often to have been under selection within and between populations and are also thought to be important in speciation. Here we describe the example of lactase persistence. In medical research, variants that affect regulation in cis have been implicated in both monogenic and polygenic disorders, and in the metabolism of drugs. In this review we suggest that by further understanding common regulatory variations and how they interact with other genetic and environmental variables it will be possible to gain insight into important mechanisms behind complex disease, with the potential to lead to new methods of diagnosis and treatments.
Affiliation(s)
- Bryony L Jones
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT UK
11
Ferreira Z, Hurle B, Rocha J, Seixas S. Differing evolutionary histories of WFDC8 (short-term balancing) in Europeans and SPINT4 (incomplete selective sweep) in Africans. Mol Biol Evol 2011; 28:2811-22. [PMID: 21536719 DOI: 10.1093/molbev/msr106] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/16/2022]
Abstract
The whey acidic protein four-disulfide core (WFDC) gene cluster on human chromosome 20q13 harbors 15 small serine protease inhibitor genes with roles in innate immunity, reproduction, and regulation of the endogenous kallikrein proteases. The WFDC cluster has emerged as a prime example of rapid diversification and adaptive evolution in primates. This study sought a better understanding of the evolutionary history of WFDC genes in humans and focused on exploring the adaptive selection signatures found in populations of European (Utah residents with ancestry from northern and western Europe [CEU]) and African (Yoruba from Ibadan, in Nigeria [YRI]) ancestry in a genome-wide scan for putative targets of recent adaptive selection. Our approach included resequencing coding and noncoding regions of WFDC6, EPPIN, and WFDC8 in 20 CEU and of SPINT4 in 20 YRI individuals. We generated 302 kb and 60 kb of high-quality sequence data from the CEU and YRI populations, respectively, enabling the identification of 72 single nucleotide polymorphisms. Using classic neutrality tests and empirical and haplotype-based analyses, we pinpointed WFDC8 and SPINT4 as the likely targets of short-term balancing selection in the CEU population and recent positive selection (incomplete selective sweep) in the YRI population, respectively. Putative candidate variants targeted by selection include 44A (rs7273669A) for WFDC8, which may downregulate gene expression by abolishing the binding site of two transcription factors; and a haplotype configuration [Ser73+98A] (rs6017667A-rs6032474A) for SPINT4, which may simultaneously affect protein function and gene regulation. We propose that the evolution of WFDC8 and SPINT4 has been shaped by complex selective scenarios due to the interdependence of variant fitness and ecological variables.
Affiliation(s)
- Zélia Ferreira
  - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
12
Davies JS, Klein DC, Carter DA. Selective genomic targeting by FRA-2/FOSL2 transcription factor: regulation of the Rgs4 gene is mediated by a variant activator protein 1 (AP-1) promoter sequence/CREB-binding protein (CBP) mechanism. J Biol Chem 2011; 286:15227-39. [PMID: 21367864 PMCID: PMC3083148 DOI: 10.1074/jbc.m110.201996] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 11/10/2010] [Revised: 01/12/2011] [Indexed: 01/21/2023]
Abstract
FRA-2/FOSL2 is a basic region-leucine zipper motif transcription factor that is widely expressed in mammalian tissues. The functional repertoire of this factor is unclear, partly due to a lack of knowledge of genomic sequences that are targeted. Here, we identified novel, functional FRA-2 targets across the genome through expression profile analysis in a knockdown transgenic rat. In this model, a nocturnal rhythm of pineal gland FRA-2 is suppressed by a genetically encoded, dominant negative mutant protein. Bioinformatic analysis of validated sets of FRA-2-regulated and -nonregulated genes revealed that the FRA-2 regulon is limited by genomic target selection rules that, in general, transcend core cis-sequence identity. However, one variant AP-1-related (AP-1R) sequence was common to a subset of regulated genes. The functional activity and protein binding partners of a candidate AP-1R sequence were determined for a novel FRA-2-repressed gene, Rgs4. FRA-2 protein preferentially associated with a proximal Rgs4 AP-1R sequence as demonstrated by ex vivo ChIP and in vitro EMSA analysis; moreover, transcriptional repression was blocked by mutation of the AP-1R sequence, whereas mutation of an upstream consensus AP-1 family sequence did not affect Rgs4 expression. Nocturnal changes in protein complexes at the Rgs4 AP-1R sequence are associated with FRA-2-dependent dismissal of the co-activator, CBP; this provides a mechanistic basis for Rgs4 gene repression. These studies have also provided functional insight into selective genomic targeting by FRA-2, highlighting discordance between predicted and actual targets. Future studies should address FRA-2-Rgs4 interactions in other systems, including the brain, where FRA-2 function is poorly understood.
Affiliation(s)
- Jeff S. Davies
  - School of Biosciences, Cardiff University, Cardiff CF10 3AX, Wales, United Kingdom
- David C. Klein
  - Section on Neuroendocrinology, Program on Developmental Endocrinology and Genetics, NICHD, National Institutes of Health, Bethesda, Maryland 20892
- David A. Carter
  - School of Biosciences, Cardiff University, Cardiff CF10 3AX, Wales, United Kingdom
13
Chen K, van Nimwegen E, Rajewsky N, Siegal ML. Correlating gene expression variation with cis-regulatory polymorphism in Saccharomyces cerevisiae. Genome Biol Evol 2010; 2:697-707. [PMID: 20829281 PMCID: PMC2953268 DOI: 10.1093/gbe/evq054] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 12/12/2022]
Abstract
Identifying the nucleotides that cause gene expression variation is a critical step in dissecting the genetic basis of complex traits. Here, we focus on polymorphisms that are predicted to alter transcription factor binding sites (TFBSs) in the yeast, Saccharomyces cerevisiae. We assembled a confident set of transcription factor motifs using recent protein binding microarray and ChIP-chip data and used our collection of motifs to predict a comprehensive set of TFBSs across the S. cerevisiae genome. We used a population genomics analysis to show that our predictions are accurate and significantly improve on our previous annotation. Although predicting gene expression from sequence is thought to be difficult in general, we identified a subset of genes for which changes in predicted TFBSs correlate well with expression divergence between yeast strains. Our analysis thus demonstrates both the accuracy of our new TFBS predictions and the feasibility of using simple models of gene regulation to causally link differences in gene expression to variation at individual nucleotides.
Affiliation(s)
- Kevin Chen (corresponding author)
  - Center for Genomics and Systems Biology, Department of Biology, New York University
  - Max-Delbrück-Centrum für Molekulare Medizin, Berlin-Buch, Germany
  - Department of Genetics and BioMaPS Institute, Rutgers University
- Erik van Nimwegen
  - Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
- Mark L. Siegal (corresponding author)
  - Center for Genomics and Systems Biology, Department of Biology, New York University
14
Functional dissection of IME1 transcription using quantitative promoter-reporter screening. Genetics 2010; 186:829-41. [PMID: 20739709 DOI: 10.1534/genetics.110.122200] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 01/08/2023]
Abstract
Transcriptional regulation is a key mechanism that controls the fate and response of cells to diverse signals. Therefore, the identification of the DNA-binding proteins, which mediate these signals, is a crucial step in elucidating how cell fate is regulated. In this report, we applied both bioinformatics and functional genomic approaches to scrutinize the unusually large promoter of the IME1 gene in budding yeast. Using a recently described fluorescent protein-based reporter screen, reporter-synthetic genetic array (R-SGA), we assessed the effect of viable deletion mutants on transcription of various IME1 promoter-reporter genes. We discovered potential transcription factors, many of which have no perfect consensus site within the IME1 promoter. Moreover, most of the cis-regulatory sequences with perfect homology to known transcription factor (TF) consensus were found to be nonfunctional in the R-SGA analysis. In addition, our results suggest that lack of conservation may not discriminate against a TF regulatory role at a specific promoter. We demonstrate that Sum1 and Sok2, which regulate IME1, bind to nonperfect consensuses within nonconserved regions in the sensu stricto Saccharomyces strains. Our analysis supports the view that although comparative analysis can provide a useful guide, functional assays are required for accurate identification of TF-binding site interactions in complex promoters.
15
Swamy KBS, Cho CY, Chiang S, Tsai ZTY, Tsai HK. Impact of DNA-binding position variants on yeast gene expression. Nucleic Acids Res 2010; 37:6991-7001. [PMID: 19767613 PMCID: PMC2790881 DOI: 10.1093/nar/gkp743] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/11/2023]
Abstract
Transcription factors (TFs) regulate gene expression by binding to specific binding sites (TFBSs) in gene promoters. TFBS motifs may contain one or more variable positions. Although the prevailing assumption is that nucleotide variants at such positions are functionally equivalent, there is increasing evidence that such variants play a role in regulation of gene expression. In this article, we propose a method for studying the relationship between the expression of target genes and nucleotide variants in TFBS motifs at a genome-wide scale in Saccharomyces cerevisiae, especially the combinatorial effects of variants at two positions. Our analysis shows that nucleotide variations in more than one-third of variable positions and in 20% of dependent position pairs are highly correlated to gene expression. We define such positions as 'functional'. However, some positions are only functional as dependent pairs, but not individually. In addition, a significant proportion of the functional positions have been well conserved across all yeast-related species studied. We also find that some positions require the presence of co-occurring TFs, while others maintain their functionality in the absence of a co-occurring TF. Our analysis supports the importance of nucleotide variants at variable positions of TFBSs in gene regulation.
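The core comparison in this abstract, relating the nucleotide at a variable TFBS position to the expression of the target genes, can be pictured as grouping genes by the base they carry at that position. The sketch below only computes group means; the abstract does not name the statistical test used to call a position "functional", so none is implemented. Gene names, sites, and expression values are invented.

```python
from collections import defaultdict
from statistics import mean


def expression_by_variant(sites, expression, position):
    """Mean expression of target genes grouped by the base at one motif position.

    sites:      mapping of gene name to its binding-site sequence
    expression: mapping of gene name to an expression value
    position:   0-based index of the variable position within the site
    """
    groups = defaultdict(list)
    for gene, site in sites.items():
        groups[site[position]].append(expression[gene])
    return {base: mean(values) for base, values in groups.items()}


# Invented example: the last position of the site varies between C and A.
sites = {"geneA": "TGACTC", "geneB": "TGACTA", "geneC": "TGACTC", "geneD": "TGACTA"}
expression = {"geneA": 2.0, "geneB": 0.5, "geneC": 3.0, "geneD": 1.5}
print(expression_by_variant(sites, expression, position=5))
```

A real analysis would follow the grouping with a significance test and, for the dependent position pairs the abstract mentions, would group on two positions at once.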
Affiliation(s)
- Krishna B S Swamy
  - Institute of Information Science, National Yang-Ming University, Taiwan
16
Loots GG, Ovcharenko I. Human variation in short regions predisposed to deep evolutionary conservation. Mol Biol Evol 2010; 27:1279-88. [PMID: 20093432 PMCID: PMC2872621 DOI: 10.1093/molbev/msq011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/24/2023]
Abstract
The landscape of the human genome consists of millions of short islands of conservation that are 100% conserved across multiple vertebrate genomes (termed “bricks”), the majority of which are located in noncoding regions. Several hundred thousand bricks are deeply conserved, reaching the genomes of amphibians and fish. Deep phylogenetic conservation of noncoding DNA has been reported to be strongly associated with the presence of gene regulatory elements, introducing bricks as a proxy to the functional noncoding landscape of the human genome. Here, we report a significant overrepresentation of bricks in the promoters of transcription factors and developmental genes, where the high level of phylogenetic conservation correlates with an increase in brick overrepresentation. We also found that the presence of a brick dictates a predisposition to evolutionary constraint, with only 0.7% of the amniota brick central nucleotides being diverged within the primate lineage, an 11-fold reduction in the divergence rate compared with random expectation. Human single-nucleotide polymorphism (SNP) data explain only 3% of primate-specific variation in amniota bricks, thus arguing for a widespread fixation of brick mutations within the primate lineage and prior to human radiation. This variation, in turn, might have been utilized as a driving force for primate- and hominoid-specific adaptation. We also discovered a pronounced deviation from the evolutionary predisposition in the human lineage, with an over 20-fold increase in the substitution rate at brick SNP sites over expected values. In addition, contrary to typical brick mutations, brick variation commonly encountered in the human population displays limited, if any, signatures of negative selection as measured by the minor allele frequency and population differentiation (F-statistic) measures. These observations argue for the plasticity of gene regulatory mechanisms in vertebrates, with evidence of strong purifying selection acting on the gene regulatory landscape of the human genome, where widespread advantageous mutations in putative regulatory elements are likely utilized in functional diversification and adaptation of species.
Affiliation(s)
- Gabriela G Loots
- Biology and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA
|
17
|
Ryu T, Lee S, Hur CG, Lee D. CONVIRT: a web-based tool for transcriptional regulatory site identification using a conserved virtual chromosome. BMB Rep 2009; 42:823-8. [PMID: 20044955 DOI: 10.5483/bmbrep.2009.42.12.823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Techniques for analyzing protein-DNA interactions on a genome-wide scale have recently established regulatory roles for distal enhancers. However, the large sizes of higher eukaryotic genomes have made identification of these elements difficult. Information regarding sequence conservation, exon annotation and repetitive regions can be used to reduce the size of the search region. However, previously developed resources are inadequate for consolidating such information. CONVIRT is a web resource for the identification of transcription factor binding sites and also features comparative genomics. Genomic information on ortholog-independent conserved regions, exons, repeats and sequences is integrated into the virtual chromosome, and statistically over-represented single or combinations of transcription factor binding sites are sought. CONVIRT provides regulatory network analysis for several organisms with long promoter regions and permits inter-species genome alignments. CONVIRT is freely available at http://biosoft.kaist.ac.kr/convirt.
Affiliation(s)
- Taewoo Ryu
- Bioinformatics Research Center, KRIBB, Daejeon 305-806, Korea
|
18
|
van Hijum SAFT, Medema MH, Kuipers OP. Mechanisms and evolution of control logic in prokaryotic transcriptional regulation. Microbiol Mol Biol Rev 2009; 73:481-509, Table of Contents. [PMID: 19721087 PMCID: PMC2738135 DOI: 10.1128/mmbr.00037-08] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Indexed: 11/20/2022]
Abstract
A major part of organismal complexity and versatility of prokaryotes resides in their ability to fine-tune gene expression to adequately respond to internal and external stimuli. Evolution has been very innovative in creating intricate mechanisms by which different regulatory signals operate and interact at promoters to drive gene expression. The regulation of target gene expression by transcription factors (TFs) is governed by control logic brought about by the interaction of regulators with TF binding sites (TFBSs) in cis-regulatory regions. A factor that in large part determines the strength of the response of a target to a given TF is motif stringency, the extent to which the TFBS fits the optimal TFBS sequence for a given TF. Advances in high-throughput technologies and computational genomics allow reconstruction of transcriptional regulatory networks in silico. To optimize the prediction of transcriptional regulatory networks, i.e., to separate direct regulation from indirect regulation, a thorough understanding of the control logic underlying the regulation of gene expression is required. This review summarizes the state of the art of the elements that determine the functionality of TFBSs by focusing on the molecular biological mechanisms and evolutionary origins of cis-regulatory regions.
Affiliation(s)
- Sacha A F T van Hijum
- Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands.
|
19
|
Abstract
MOTIVATION: Limited availability of data has hindered the development of algorithms that can identify functionally meaningful regulatory single nucleotide polymorphisms (rSNPs). Given the large number of common polymorphisms known to reside in the human genome, the identification of functional rSNPs via laboratory assays will be costly and time-consuming. Therefore appropriate bioinformatics strategies for predicting functional rSNPs are necessary. Recent data from the Encyclopedia of DNA Elements (ENCODE) Project has significantly expanded the amount of available functional information relevant to non-coding regions of the genome, and, importantly, led to the conclusion that many functional elements in the human genome are not conserved. RESULTS: In this article we describe how ENCODE data can be leveraged to probabilistically determine the functional and phenotypic significance of non-coding SNPs (ncSNPs). The method achieves excellent sensitivity (approximately 80%) and specificity (approximately 99%) based on a set of known phenotypically relevant and non-functional SNPs. In addition, we show that our method is not overtrained through the use of cross-validation analyses. AVAILABILITY: The software platforms used in our analyses are freely available (http://www.cs.waikato.ac.nz/ml/weka/). In addition, we provide the training dataset (Supplementary Table 3), and our predictions (Supplementary Table 6), in the Supplementary Material. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ali Torkamani
- Department of Molecular and Experimental Medicine, Scripps Genomic Medicine and the Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA
|