1. Liu S, Gomez-Alcala P, Leemans C, Glassford WJ, Mann RS, Bussemaker HJ. Predicting the DNA binding specificity of mutated transcription factors using family-level biophysically interpretable machine learning. bioRxiv 2024:2024.01.24.577115. [PMID: 38352411 PMCID: PMC10862739 DOI: 10.1101/2024.01.24.577115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Sequence-specific interactions of transcription factors (TFs) with genomic DNA underlie many cellular processes. High-throughput in vitro binding assays coupled with computational analysis have made it possible to accurately define such sequence recognition in a biophysically interpretable yet mechanism-agnostic way for individual TFs. The fact that such sequence-to-affinity models are now available for hundreds of TFs provides new avenues for predicting how the DNA binding specificity of a TF changes when its protein sequence is mutated. To this end, we developed an analytical framework based on a tetrahedron embedding that can be applied at the level of a given structural TF family. Using bHLH as a test case, we demonstrate that we can systematically map dependencies between the protein sequence of a TF and base preference within the DNA binding site. We also develop a regression approach to predict the quantitative energetic impact of mutations in the DNA binding domain of a TF on its DNA binding specificity, and perform SELEX-seq assays on mutated TFs to experimentally validate our results. Our results point to the feasibility of predicting the functional impact of disease mutations and allelic variation in the cell-wide TF repertoire by leveraging high-quality functional information across sets of homologous wild-type proteins.
Affiliation(s)
- Shaoxun Liu
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Pilar Gomez-Alcala
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Christ Leemans
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- William J Glassford
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Richard S Mann
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA

2. Li X, Melo LAN, Bussemaker HJ. Benchmarking DNA binding affinity models using allele-specific transcription factor binding data. bioRxiv 2023:2023.12.15.571887. [PMID: 38168434 PMCID: PMC10760129 DOI: 10.1101/2023.12.15.571887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity can manifest itself in vivo at heterozygous loci as a difference in TF occupancy between the two alleles. When applied on a genomic scale, functional genomic assays such as ChIP-seq typically lack the statistical power to detect allele-specific binding (ASB) at the level of individual variants. To address this, we propose a framework for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We show that a likelihood function based on an over-dispersed binomial distribution can aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. We introduce PyProBound, an easily extensible reimplementation of the ProBound biophysically interpretable machine learning framework. Configuring PyProBound to explicitly account for a confounding sequence-specific bias in DNA fragmentation rate yields improved TF binding models when training on ChIP-seq data. We also show how our likelihood function can be leveraged to perform de novo motif discovery on the raw allele-aware ChIP-seq counts.
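The aggregation idea described in this abstract can be illustrated with a small sketch. It is not the authors' code and does not use the PyProBound API: the beta-binomial is taken here as one common choice of over-dispersed binomial, and the function names, the `rho` parameterization, the logistic link from predicted affinity ratio to expected read fraction, and the toy counts are all assumptions made for illustration.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, p, rho):
    """Log-probability of k reference-allele reads out of n total reads.

    p is the expected reference fraction; rho in (0, 1) is an
    over-dispersion parameter (assumed parameterization), with
    rho -> 0 approaching a plain binomial.
    """
    s = (1.0 - rho) / rho          # concentration a + b
    a, b = p * s, (1.0 - p) * s
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def predicted_ref_fraction(delta_log_affinity):
    """Expected reference read fraction from a predicted log affinity
    ratio (reference vs. alternate allele).

    Assumption: occupancy is proportional to affinity, so the fraction
    is K_ref / (K_ref + K_alt), i.e. a logistic function of the log ratio.
    """
    return 1.0 / (1.0 + math.exp(-delta_log_affinity))


def total_log_likelihood(variants, rho):
    """Sum log-likelihoods over variants; no per-variant significance
    test is needed. Each variant is (ref_reads, total_reads, delta)."""
    return sum(
        beta_binomial_logpmf(k, n, predicted_ref_fraction(d), rho)
        for k, n, d in variants
    )


# Toy data (hypothetical): (ref_reads, total_reads, predicted log ratio)
toy_variants = [(30, 40, 1.0), (5, 30, -1.5)]
score = total_log_likelihood(toy_variants, rho=0.05)
```

Under these assumptions, two binding models for the same TF could be compared by evaluating `total_log_likelihood` with each model's predicted log affinity ratios on the same set of heterozygous variants; the model with the higher value explains the allelic read counts better.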
Affiliation(s)
- Xiaoting Li
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Lucas A. N. Melo
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Harmen J. Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA

3. Li X, Lappalainen T, Bussemaker HJ. Identifying genetic regulatory variants that affect transcription factor activity. Cell Genom 2023; 3:100382. [PMID: 37719147 PMCID: PMC10504674 DOI: 10.1016/j.xgen.2023.100382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 05/19/2023] [Accepted: 07/21/2023] [Indexed: 09/19/2023]
Abstract
Genetic variants affecting gene expression levels in humans have been mapped in the Genotype-Tissue Expression (GTEx) project. Trans-acting variants impacting many genes simultaneously through a shared transcription factor (TF) are of particular interest. Here, we developed a generalized linear model (GLM) to estimate protein-level TF activity levels in an individual-specific manner from GTEx RNA sequencing (RNA-seq) profiles. It uses observed differential gene expression after TF perturbation as a predictor and, by analyzing differential expression within pairs of neighboring genes, controls for the confounding effect of variation in chromatin state along the genome. We inferred genotype-specific activities for 55 TFs across 49 tissues. Subsequently performing genome-wide association analysis on this virtual trait revealed TF activity quantitative trait loci (aQTLs) that, as a set, are enriched for functional features. Altogether, the set of tools we introduce here highlights the potential of genetic association studies for cellular endophenotypes based on a network-based multi-omics approach. The transparent peer review record is available.
Affiliation(s)
- Xiaoting Li
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Tuuli Lappalainen
  - New York Genome Center, New York, NY 10013, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Harmen J. Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA

4. Nora EP, Aerts S, Wittkopp PJ, Bussemaker HJ, Bulyk M, Sinha S, Zeitlinger J, Crocker J, Fuxman Bass JI. Emerging questions in transcriptional regulation. Cell Syst 2023; 14:247-251. [PMID: 37080160 DOI: 10.1016/j.cels.2023.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 03/21/2023] [Accepted: 03/21/2023] [Indexed: 04/22/2023]
Abstract
What new questions can we ask about transcriptional regulation given recent developments in large-scale approaches?

5. Rube HT, Rastogi C, Feng S, Kribelbauer JF, Li A, Becerra B, Melo LAN, Do BV, Li X, Adam HH, Shah NH, Mann RS, Bussemaker HJ. Prediction of protein-ligand binding affinity from sequencing data with interpretable machine learning. Nat Biotechnol 2022; 40:1520-1527. [PMID: 35606422 PMCID: PMC9546773 DOI: 10.1038/s41587-022-01307-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/06/2021] [Accepted: 04/04/2022] [Indexed: 01/02/2023]
Abstract
Protein-ligand interactions are increasingly profiled at high throughput using affinity selection and massively parallel sequencing. However, these assays do not provide the biophysical parameters that most rigorously quantify molecular interactions. Here we describe a flexible machine learning method, called ProBound, that accurately defines sequence recognition in terms of equilibrium binding constants or kinetic rates. This is achieved using a multi-layered maximum-likelihood framework that models both the molecular interactions and the data generation process. We show that ProBound quantifies transcription factor (TF) behavior with models that predict binding affinity over a range exceeding that of previous resources; captures the impact of DNA modifications and conformational flexibility of multi-TF complexes; and infers specificity directly from in vivo data such as ChIP-seq without peak calling. When coupled with an assay called KD-seq, it determines the absolute affinity of protein-ligand interactions. We also apply ProBound to profile the kinetics of kinase-substrate interactions. ProBound opens new avenues for decoding biological networks and rationally engineering protein-ligand interactions.
Affiliation(s)
- H Tomas Rube
  - Department of Bioengineering, University of California, Merced, Merced, CA, USA
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Chaitanya Rastogi
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Siqian Feng
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Allyson Li
  - Department of Chemistry, Columbia University, New York, NY, USA
- Basheer Becerra
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Lucas A N Melo
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Bach Viet Do
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Xiaoting Li
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Hammaad H Adam
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Neel H Shah
  - Department of Chemistry, Columbia University, New York, NY, USA
- Richard S Mann
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA

6. Feng S, Rastogi C, Loker R, Glassford WJ, Tomas Rube H, Bussemaker HJ, Mann RS. Transcription factor paralogs orchestrate alternative gene regulatory networks by context-dependent cooperation with multiple cofactors. Nat Commun 2022; 13:3808. [PMID: 35778382 PMCID: PMC9249852 DOI: 10.1038/s41467-022-31501-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 08/03/2021] [Accepted: 06/20/2022] [Indexed: 12/23/2022]
Abstract
In eukaryotes, members of transcription factor families often exhibit similar DNA binding properties in vitro, yet orchestrate paralog-specific gene regulatory networks in vivo. The serially homologous first (T1) and third (T3) thoracic legs of Drosophila, which are specified by the Hox proteins Scr and Ubx, respectively, offer a unique opportunity to address this paradox in vivo. Genome-wide analyses using epitope-tagged alleles of both Hox loci in the T1 and T3 leg imaginal discs, the precursors to the adult legs and ventral body regions, show that ~8% of Hox binding is paralog-specific. Binding specificity is mediated by interactions with distinct cofactors in different domains: the Hox cofactor Exd acts in the proximal domain and is necessary for Scr to bind many of its paralog-specific targets, while in the distal leg domain, the homeodomain protein Distal-less (Dll) enhances Scr binding to a different subset of loci. These findings reveal how Hox paralogs, and perhaps paralogs of other transcription factor families, orchestrate alternative downstream gene regulatory networks with the help of multiple, context-specific cofactors.
Affiliation(s)
- Siqian Feng
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Chaitanya Rastogi
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Ryan Loker
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
  - Department of Genetics and Development, Columbia University, New York, NY, USA
  - Department of Biology, New York University, New York, NY, USA
- William J Glassford
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- H Tomas Rube
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Bioengineering, University of California, Merced, CA, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, 10027, USA
- Richard S Mann
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, 10027, USA

7. Flynn ED, Tsu AL, Kasela S, Kim-Hellmuth S, Aguet F, Ardlie KG, Bussemaker HJ, Mohammadi P, Lappalainen T. Transcription factor regulation of eQTL activity across individuals and tissues. PLoS Genet 2022; 18:e1009719. [PMID: 35100260 PMCID: PMC8830792 DOI: 10.1371/journal.pgen.1009719] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/12/2021] [Revised: 02/10/2022] [Accepted: 01/06/2022] [Indexed: 11/18/2022]
Abstract
Tens of thousands of genetic variants associated with gene expression (cis-eQTLs) have been discovered in the human population. These eQTLs are active in various tissues and contexts, but the molecular mechanisms of eQTL variability are poorly understood, hindering our understanding of genetic regulation across biological contexts. Since many eQTLs are believed to act by altering transcription factor (TF) binding affinity, we hypothesized that analyzing eQTL effect size as a function of TF level may allow discovery of mechanisms of eQTL variability. Using GTEx Consortium eQTL data from 49 tissues, we analyzed the interaction between eQTL effect size and TF level across tissues and across individuals within specific tissues and generated a list of 10,098 TF-eQTL interactions across 2,136 genes that are supported by at least two lines of evidence. These TF-eQTLs were enriched for various TF binding measures, supporting with orthogonal evidence that these eQTLs are regulated by the implicated TFs. We also found that our TF-eQTLs tend to overlap genes with gene-by-environment regulatory effects and to colocalize with GWAS loci, implying that our approach can help to elucidate mechanisms of context-specificity and trait associations. Finally, we highlight an interesting example of IKZF1 TF regulation of an APBB1IP gene eQTL that colocalizes with a GWAS signal for blood cell traits. Together, our findings provide candidate TF mechanisms for a large number of eQTLs and offer a generalizable approach for researchers to discover TF regulators of genetic variant effects in additional QTL datasets.
Affiliation(s)
- Elise D. Flynn
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - New York Genome Center, New York, New York, United States of America
- Athena L. Tsu
  - New York Genome Center, New York, New York, United States of America
  - Department of Biomedical Engineering, Columbia University, New York, New York, United States of America
- Silva Kasela
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - New York Genome Center, New York, New York, United States of America
- Sarah Kim-Hellmuth
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - New York Genome Center, New York, New York, United States of America
  - Department of Pediatrics, Dr. von Hauner Children’s Hospital, University Hospital, LMU Munich, Munich, Germany
- Francois Aguet
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Kristin G. Ardlie
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Harmen J. Bussemaker
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Pejman Mohammadi
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, United States of America
  - Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, California, United States of America
- Tuuli Lappalainen
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - New York Genome Center, New York, New York, United States of America
  - KTH Royal Institute of Technology, Stockholm, Sweden

8. Zhang L, Rube HT, Vakulskas CA, Behlke MA, Bussemaker HJ, Pufall MA. Systematic in vitro profiling of off-target affinity, cleavage and efficiency for CRISPR enzymes. Nucleic Acids Res 2020; 48:5037-5053. [PMID: 32315032 PMCID: PMC7229833 DOI: 10.1093/nar/gkaa231] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/27/2019] [Revised: 03/06/2020] [Accepted: 03/27/2020] [Indexed: 12/14/2022]
Abstract
CRISPR RNA-guided endonucleases (RGEs) cut or direct activities to specific genomic loci, yet each has off-target activities that are often unpredictable. We developed a pair of simple in vitro assays to systematically measure the DNA-binding specificity (Spec-seq), catalytic activity specificity (SEAM-seq) and cleavage efficiency of RGEs. By separately quantifying binding and cleavage specificity, Spec/SEAM-seq provides detailed mechanistic insight into off-target activity. Feature-based models generated from Spec/SEAM-seq data for SpCas9 were consistent with previous reports of its in vitro and in vivo specificity, validating the approach. Spec/SEAM-seq is also useful for profiling less-well characterized RGEs. Application to an engineered SpCas9, HiFi-SpCas9, indicated that its enhanced target discrimination can be attributed to cleavage rather than binding specificity. The ortholog ScCas9, on the other hand, derives specificity from binding to an extended PAM. The decreased off-target activity of AsCas12a (Cpf1) appears to be primarily driven by DNA-binding specificity. Finally, we performed the first characterization of CasX specificity, revealing an all-or-nothing mechanism where mismatches can be bound, but not cleaved. Together, these applications establish Spec/SEAM-seq as an accessible method to rapidly and reliably evaluate the specificity of RGEs, Cas::gRNA pairs, and gain insight into the mechanism and thermodynamics of target discrimination.
Affiliation(s)
- Liyang Zhang
  - Department of Biochemistry, Carver College of Medicine, University of Iowa, Coralville, IA 52241, USA
  - Integrated DNA Technologies, Inc., 1710 Commercial Park, Coralville, IA 52241, USA
- H Tomas Rube
  - Department of Bioengineering, University of California, Merced, CA, USA
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Mark A Behlke
  - Integrated DNA Technologies, Inc., 1710 Commercial Park, Coralville, IA 52241, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Miles A Pufall
  - Department of Biochemistry, Carver College of Medicine, University of Iowa, Coralville, IA 52241, USA

9. Kribelbauer JF, Loker RE, Feng S, Rastogi C, Abe N, Rube HT, Bussemaker HJ, Mann RS. Context-Dependent Gene Regulation by Homeodomain Transcription Factor Complexes Revealed by Shape-Readout Deficient Proteins. Mol Cell 2020; 78:152-167.e11. [PMID: 32053778 DOI: 10.1016/j.molcel.2020.01.027] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/12/2019] [Revised: 12/01/2019] [Accepted: 01/27/2020] [Indexed: 01/09/2023]
Abstract
Eukaryotic transcription factors (TFs) form complexes with various partner proteins to recognize their genomic target sites. Yet, how the DNA sequence determines which TF complex forms at any given site is poorly understood. Here, we demonstrate that high-throughput in vitro DNA binding assays coupled with unbiased computational analysis provide unprecedented insight into how different DNA sequences select distinct compositions and configurations of homeodomain TF complexes. Using inferred knowledge about minor groove width readout, we design targeted protein mutations that destabilize homeodomain binding both in vitro and in vivo in a complex-specific manner. By performing parallel systematic evolution of ligands by exponential enrichment sequencing (SELEX-seq), chromatin immunoprecipitation sequencing (ChIP-seq), RNA sequencing (RNA-seq), and Hi-C assays, we not only classify the majority of in vivo binding events in terms of complex composition but also infer complex-specific functions by perturbing the gene regulatory network controlled by a single complex.
Affiliation(s)
- Judith F Kribelbauer
  - Department of Biological Sciences, Columbia University, New York, NY 10025, USA
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Ryan E Loker
  - Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Siqian Feng
  - Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Chaitanya Rastogi
  - Department of Biological Sciences, Columbia University, New York, NY 10025, USA
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Namiko Abe
  - Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- H Tomas Rube
  - Department of Biological Sciences, Columbia University, New York, NY 10025, USA
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10025, USA
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Richard S Mann
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
  - Department of Neuroscience, Columbia University, New York, NY 10027, USA

10. Kribelbauer JF, Lu XJ, Rohs R, Mann RS, Bussemaker HJ. Toward a Mechanistic Understanding of DNA Methylation Readout by Transcription Factors. J Mol Biol 2019:S0022-2836(19)30617-5. [PMID: 31689433 DOI: 10.1016/j.jmb.2019.10.021] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 08/21/2019] [Revised: 10/23/2019] [Accepted: 10/24/2019] [Indexed: 01/09/2023]
Abstract
Epigenetic DNA modification impacts gene expression, but the underlying molecular mechanisms are only partly understood. Adding a methyl group to a cytosine base locally modifies the structural features of DNA in multiple ways, which may change the interaction with DNA-binding transcription factors (TFs) and trigger a cascade of downstream molecular events. Cells can be probed using various functional genomics assays, but it is difficult to disentangle the confounded effects of DNA modification on TF binding, chromatin accessibility, intranuclear variation in local TF concentration, and rate of transcription. Here we discuss how high-throughput in vitro profiling of protein-DNA interactions has enabled comprehensive characterization and quantification of the methylation sensitivity of TFs. Despite the limited structural data for DNA containing methylated cytosine, automated analysis of structural information in the Protein Data Bank (PDB) shows how 5-methylcytosine (5mC) can be recognized in various ways by amino acid side chains. We discuss how a context-dependent effect of methylation on DNA groove geometry can affect DNA binding by homeodomain proteins and how principled modeling of ChIP-seq data can overcome the confounding that makes the interpretation of in vivo data challenging. The emerging picture is that epigenetic modifications affect TF binding in a highly context-specific manner, with a direction and effect size that depend critically on their position within the TF binding site and the amino acid sequence of the TF. With this improved mechanistic knowledge, we have come closer to understanding how cells use DNA modification to acquire, retain, and change their identity.
Affiliation(s)
- Judith F Kribelbauer
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Xiang-Jun Lu
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Remo Rohs
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Richard S Mann
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
  - Department of Neuroscience, Columbia University, New York, NY 10027, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA

11.
Abstract
Eukaryotic transcription factors (TFs) from the same structural family tend to bind similar DNA sequences, despite the ability of these TFs to execute distinct functions in vivo. The cell partly resolves this specificity paradox through combinatorial strategies and the use of low-affinity binding sites, which are better able to distinguish between similar TFs. However, because these sites have low affinity, it is challenging to understand how TFs recognize them in vivo. Here, we summarize recent findings and technological advancements that allow for the quantification and mechanistic interpretation of TF recognition across a wide range of affinities. We propose a model that integrates insights from the fields of genetics and cell biology to provide further conceptual understanding of TF binding specificity. We argue that in eukaryotes, target specificity is driven by an inhomogeneous 3D nuclear distribution of TFs and by variation in DNA binding affinity such that locally elevated TF concentration allows low-affinity binding sites to be functional.
Affiliation(s)
- Judith F Kribelbauer
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10031, USA
- Chaitanya Rastogi
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10031, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10031, USA
- Richard S Mann
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10031, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10031, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA

12. van Arensbergen J, Pagie L, FitzPatrick VD, de Haas M, Baltissen MP, Comoglio F, van der Weide RH, Teunissen H, Võsa U, Franke L, de Wit E, Vermeulen M, Bussemaker HJ, van Steensel B. High-throughput identification of human SNPs affecting regulatory element activity. Nat Genet 2019; 51:1160-1169. [PMID: 31253979 PMCID: PMC6609452 DOI: 10.1038/s41588-019-0455-2] [Citation(s) in RCA: 117] [Impact Index Per Article: 23.4] [Received: 10/30/2018] [Accepted: 05/24/2019] [Indexed: 01/08/2023]
Abstract
Most of the millions of SNPs in the human genome are non-coding, and many overlap with putative regulatory elements. Genome-wide association studies (GWAS) have linked many of these SNPs to human traits or to gene expression levels, but rarely with sufficient resolution to identify the causal SNPs. Functional screens based on reporter assays have previously been of insufficient throughput to test the vast space of SNPs for possible effects on regulatory element activity. Here we leveraged the throughput and resolution of the survey of regulatory elements (SuRE) reporter technology to survey the effect of 5.9 million SNPs, including 57% of the known common SNPs, on enhancer and promoter activity. We identified more than 30,000 SNPs that alter the activity of putative regulatory elements, partially in a cell-type-specific manner. Integration of this dataset with GWAS results may help to pinpoint SNPs that underlie human traits.
Affiliation(s)
- Joris van Arensbergen
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, the Netherlands.
| | - Ludo Pagie
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Vincent D FitzPatrick
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | - Marcel de Haas
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Marijke P Baltissen
- Department of Molecular Biology, Oncode Institute, Radboud Institute for Molecular Life Sciences, Radboud University Nijmegen, Nijmegen, the Netherlands
| | - Federico Comoglio
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Department of Haematology, University of Cambridge, Cambridge, UK
| | - Robin H van der Weide
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Hans Teunissen
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Urmo Võsa
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Lude Franke
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Elzo de Wit
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Michiel Vermeulen
- Department of Molecular Biology, Oncode Institute, Radboud Institute for Molecular Life Sciences, Radboud University Nijmegen, Nijmegen, the Netherlands
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | - Bas van Steensel
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, the Netherlands.
|
13
|
Kribelbauer JF, Laptenko O, Chen S, Martini GD, Freed-Pastor WA, Prives C, Mann RS, Bussemaker HJ. Quantitative Analysis of the DNA Methylation Sensitivity of Transcription Factor Complexes. Cell Rep 2017; 19:2383-2395. [PMID: 28614722 DOI: 10.1016/j.celrep.2017.05.069] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 08/31/2016] [Revised: 04/07/2017] [Accepted: 05/22/2017] [Indexed: 01/25/2023]
Abstract
Although DNA modifications play an important role in gene regulation, the underlying mechanisms remain elusive. We developed EpiSELEX-seq to probe the sensitivity of transcription factor binding to DNA modification in vitro using massively parallel sequencing. Feature-based modeling quantifies the effect of cytosine methylation (5mC) on binding free energy in a position-specific manner. Application to the human bZIP proteins ATF4 and C/EBPβ and three different Pbx-Hox complexes shows that 5mCpG can both increase and decrease affinity, depending on where the modification occurs within the protein-DNA interface. The TF paralogs tested vary in their methylation sensitivity, for which we provide a structural rationale. We show that 5mCpG can also enhance in vitro p53 binding and provide evidence for increased in vivo p53 occupancy at methylated binding sites, correlating with primed enhancer histone marks. Our results establish a powerful strategy for dissecting the epigenomic modulation of protein-DNA interactions and their role in gene regulation.
Affiliation(s)
- Judith F Kribelbauer
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, NY 10032, USA
| | - Oleg Laptenko
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Siying Chen
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, NY 10032, USA
| | - Gabriella D Martini
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - William A Freed-Pastor
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA; David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Carol Prives
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Richard S Mann
- Department of Systems Biology, Columbia University Medical Center, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, New York, NY 10032, USA.
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, NY 10032, USA.
|
14
|
Rube HT, Rastogi C, Kribelbauer JF, Bussemaker HJ. A unified approach for quantifying and interpreting DNA shape readout by transcription factors. Mol Syst Biol 2018; 14:e7902. [PMID: 29472273 PMCID: PMC5822049 DOI: 10.15252/msb.20177902] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 07/28/2017] [Revised: 01/26/2018] [Accepted: 01/31/2018] [Indexed: 01/07/2023]
Abstract
Transcription factors (TFs) interpret DNA sequence by probing the chemical and structural properties of the nucleotide polymer. DNA shape is thought to enable a parsimonious representation of dependencies between nucleotide positions. Here, we propose a unified mathematical representation of the DNA sequence dependence of shape and TF binding, respectively, which simplifies and enhances analysis of shape readout. First, we demonstrate that linear models based on mononucleotide features alone account for 60-70% of the variance in minor groove width, roll, helix twist, and propeller twist. This explains why simple scoring matrices that ignore all dependencies between nucleotide positions can partially account for DNA shape readout by a TF. Adding dinucleotide features as sequence-to-shape predictors to our model, we can almost perfectly explain the shape parameters. Building on this observation, we developed a post hoc analysis method that can be used to analyze any mechanism-agnostic protein-DNA binding model in terms of shape readout. Our insights provide an alternative strategy for using DNA shape information to enhance our understanding of how cis-regulatory codes are interpreted by the cellular machinery.
Affiliation(s)
- H Tomas Rube
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Chaitanya Rastogi
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Program in Applied Physics and Applied Mathematics, Columbia University, New York, NY, USA
| | - Judith F Kribelbauer
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
|
15
|
Rao S, Chiu TP, Kribelbauer JF, Mann RS, Bussemaker HJ, Rohs R. Systematic prediction of DNA shape changes due to CpG methylation explains epigenetic effects on protein-DNA binding. Epigenetics Chromatin 2018; 11:6. [PMID: 29409522 PMCID: PMC5800008 DOI: 10.1186/s13072-018-0174-4] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 07/29/2017] [Accepted: 01/15/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND: DNA shape analysis has demonstrated the potential to reveal structure-based mechanisms of protein-DNA binding. However, information about the influence of chemical modification of DNA is limited. Cytosine methylation, the most frequent modification, represents the addition of a methyl group at the major groove edge of the cytosine base. In mammalian genomes, cytosine methylation most frequently occurs at CpG dinucleotides. In addition to changing the chemical signature of C/G base pairs, cytosine methylation can affect DNA structure. Since the original discovery of DNA methylation, major efforts have been made to understand its effect from a sequence perspective. Compared to unmethylated DNA, however, little structural information is available for methylated DNA, due to the limited number of experimentally determined structures. To achieve a better mechanistic understanding of the effect of CpG methylation on local DNA structure, we developed a high-throughput method, methyl-DNAshape, for predicting the effect of cytosine methylation on DNA shape. RESULTS: Using our new method, we found that CpG methylation significantly altered local DNA shape. Four DNA shape features (helix twist, minor groove width, propeller twist, and roll) were considered in this analysis. Distinct distributions of effect size were observed for different features. Roll and propeller twist were the DNA shape features most strongly affected by CpG methylation with an effect size depending on the local sequence context. Methylation-induced changes in DNA shape were predictive of the measured rate of cleavage by DNase I and suggest a possible mechanism for some of the methylation sensitivities that were recently observed for human Pbx-Hox complexes. CONCLUSIONS: CpG methylation is an important epigenetic mark in the mammalian genome. Understanding its role in protein-DNA recognition can further our knowledge of gene regulation. Our high-throughput methyl-DNAshape method can be used to predict the effect of cytosine methylation on DNA shape and its subsequent influence on protein-DNA interactions. This approach overcomes the limited availability of experimental DNA structures that contain 5-methylcytosine.
Affiliation(s)
- Satyanarayan Rao
- Computational Biology and Bioinformatics Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089, USA
| | - Tsu-Pei Chiu
- Computational Biology and Bioinformatics Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089, USA
| | - Judith F Kribelbauer
- Department of Biological Sciences, Columbia University, New York, NY, 10027, USA; Department of Systems Biology, Columbia University, New York, NY, 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
| | - Richard S Mann
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10027, USA; Department of Neuroscience, Columbia University, New York, NY, 10027, USA
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, 10027, USA; Department of Systems Biology, Columbia University, New York, NY, 10032, USA
| | - Remo Rohs
- Computational Biology and Bioinformatics Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089, USA; Department of Chemistry, University of Southern California, Los Angeles, CA, 90089, USA; Department of Physics & Astronomy, University of Southern California, Los Angeles, CA, 90089, USA; Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA
|
16
|
Zhang L, Martini GD, Rube HT, Kribelbauer JF, Rastogi C, FitzPatrick VD, Houtman JC, Bussemaker HJ, Pufall MA. SelexGLM differentiates androgen and glucocorticoid receptor DNA-binding preference over an extended binding site. Genome Res 2017; 28:111-121. [PMID: 29196557 PMCID: PMC5749176 DOI: 10.1101/gr.222844.117] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/14/2017] [Accepted: 11/22/2017] [Indexed: 11/28/2022]
Abstract
The DNA-binding interfaces of the androgen (AR) and glucocorticoid (GR) receptors are virtually identical, yet these transcription factors share only about a third of their genomic binding sites and regulate similarly distinct sets of target genes. To address this paradox, we determined the intrinsic specificities of the AR and GR DNA-binding domains using a refined version of SELEX-seq. We developed an algorithm, SelexGLM, that quantifies binding specificity over a large (31-bp) binding site by iteratively fitting a feature-based generalized linear model to SELEX probe counts. This analysis revealed that the DNA-binding preferences of AR and GR homodimers differ significantly, both within and outside the 15-bp core binding site. The relative preference between the two factors can be tuned over a wide range by changing the DNA sequence, with AR more sensitive to sequence changes than GR. The specificity of AR extends to the regions flanking the core 15-bp site, where isothermal calorimetry measurements reveal that affinity is augmented by enthalpy-driven readout of poly(A) sequences associated with narrowed minor groove width. We conclude that the increased specificity of AR is correlated with more enthalpy-driven binding than GR. The binding models help explain differences in AR and GR genomic binding and provide a biophysical rationale for how promiscuous binding by GR allows functional substitution for AR in some castration-resistant prostate cancers.
Affiliation(s)
- Liyang Zhang
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City, Iowa 52242, USA
| | - Gabriella D Martini
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
| | - H Tomas Rube
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
| | - Judith F Kribelbauer
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
| | - Chaitanya Rastogi
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
| | - Vincent D FitzPatrick
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
| | - Jon C Houtman
- Department of Immunology, Carver College of Medicine, University of Iowa, Iowa City, Iowa 52242, USA
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York 10032, USA
| | - Miles A Pufall
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City, Iowa 52242, USA
|
17
|
Bussemaker HJ, Causton HC, Fazlollahi M, Lee E, Muroff I. Network-based approaches that exploit inferred transcription factor activity to analyze the impact of genetic variation on gene expression. Curr Opin Syst Biol 2017; 2:98-102. [PMID: 28691107 DOI: 10.1016/j.coisb.2017.04.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/26/2022]
Abstract
Over the past decade, a number of methods have emerged for inferring protein-level transcription factor activities in individual samples based on prior information about the structure of the gene regulatory network. We discuss how this has enabled new methods for dissecting trans-acting mechanisms that underpin genetic variation in gene expression.
Affiliation(s)
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10027; Department of Systems Biology, Columbia University, New York, NY 10032
| | - Helen C Causton
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032
| | - Mina Fazlollahi
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY 10029
| | - Eunjee Lee
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY 10029
| | - Ivor Muroff
- Department of Biological Sciences, Columbia University, New York, NY 10027
|
18
|
van Arensbergen J, FitzPatrick VD, de Haas M, Pagie L, Sluimer J, Bussemaker HJ, van Steensel B. Genome-wide mapping of autonomous promoter activity in human cells. Nat Biotechnol 2016; 35:145-153. [PMID: 28024146 DOI: 10.1038/nbt.3754] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 05/04/2016] [Accepted: 12/01/2016] [Indexed: 12/20/2022]
Abstract
Previous methods to systematically characterize sequence-intrinsic activity of promoters have been limited by relatively low throughput and the length of the sequences that could be tested. Here we present 'survey of regulatory elements' (SuRE), a method that assays more than 10^8 DNA fragments, each 0.2-2 kb in size, for their ability to drive transcription autonomously. In SuRE, a plasmid library of random genomic fragments upstream of a 20-bp barcode is constructed, and decoded by paired-end sequencing. This library is used to transfect cells, and barcodes in transcribed RNA are quantified by high-throughput sequencing. When applied to the human genome, we achieve 55-fold genome coverage, allowing us to map autonomous promoter activity genome-wide in K562 cells. By computational modeling we delineate subregions within promoters that are relevant for their activity. We show that antisense promoter transcription is generally dependent on the sense core promoter sequences, and that most enhancers and several families of repetitive elements act as autonomous transcription initiation sites.
Affiliation(s)
- Joris van Arensbergen
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Vincent D FitzPatrick
- Department of Biological Sciences, Columbia University, New York, New York, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York, USA
| | - Marcel de Haas
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Ludo Pagie
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Jasper Sluimer
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, New York, USA; Department of Systems Biology, Columbia University Medical Center, New York, New York, USA
| | - Bas van Steensel
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
|
19
|
Riley TR, Lazarovici A, Mann RS, Bussemaker HJ. Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE. eLife 2015; 4:e06397. [PMID: 26701911 PMCID: PMC4758951 DOI: 10.7554/elife.06397] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 01/08/2015] [Accepted: 12/20/2015] [Indexed: 01/26/2023]
Abstract
Transcription factors are crucial regulators of gene expression. Accurate quantitative definition of their intrinsic DNA binding preferences is critical to understanding their biological function. High-throughput in vitro technology has recently been used to deeply probe the DNA binding specificity of hundreds of eukaryotic transcription factors, yet algorithms for analyzing such data have not yet fully matured. Here, we present a general framework (FeatureREDUCE) for building sequence-to-affinity models based on a biophysically interpretable and extensible model of protein-DNA interaction that can account for dependencies between nucleotides within the binding interface or multiple modes of binding. When training on protein binding microarray (PBM) data, we use robust regression and modeling of technology-specific biases to infer specificity models of unprecedented accuracy and precision. We provide quantitative validation of our results by comparing to gold-standard data when available.
Affiliation(s)
- Todd R Riley
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Systems Biology, Columbia University, New York, United States
- Department of Biology, University of Massachusetts Boston, Boston, United States
| | - Allan Lazarovici
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Electrical Engineering, Columbia University, New York, United States
| | - Richard S Mann
- Department of Systems Biology, Columbia University, New York, United States
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, United States
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Systems Biology, Columbia University, New York, United States
|
20
|
Lu XJ, Bussemaker HJ, Olson WK. DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res 2015; 43:e142. [PMID: 26184874 PMCID: PMC4666379 DOI: 10.1093/nar/gkv716] [Citation(s) in RCA: 137] [Impact Index Per Article: 15.2] [Received: 04/27/2015] [Accepted: 07/02/2015] [Indexed: 12/16/2022]
Abstract
Insight into the three-dimensional architecture of RNA is essential for understanding its cellular functions. However, even the classic transfer RNA structure contains features that are overlooked by existing bioinformatics tools. Here we present DSSR (Dissecting the Spatial Structure of RNA), an integrated and automated tool for analyzing and annotating RNA tertiary structures. The software identifies canonical and noncanonical base pairs, including those with modified nucleotides, in any tautomeric or protonation state. DSSR detects higher-order coplanar base associations, termed multiplets. It finds arrays of stacked pairs, classifies them by base-pair identity and backbone connectivity, and distinguishes a stem of covalently connected canonical pairs from a helix of stacked pairs of arbitrary type/linkage. DSSR identifies coaxial stacking of multiple stems within a single helix and lists isolated canonical pairs that lie outside of a stem. The program characterizes 'closed' loops of various types (hairpin, bulge, internal, and junction loops) and pseudoknots of arbitrary complexity. Notably, DSSR employs isolated pairs and the ends of stems, whether pseudoknotted or not, to define junction loops. This new, inclusive definition provides a novel perspective on the spatial organization of RNA. Tests on all nucleic acid structures in the Protein Data Bank confirm the efficiency and robustness of the software, and applications to representative RNA molecules illustrate its unique features. DSSR and related materials are freely available at http://x3dna.org/.
Affiliation(s)
- Xiang-Jun Lu
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA; Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | - Wilma K Olson
- Department of Chemistry and Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ 08854, USA
|
21
|
Zhou T, Shen N, Yang L, Abe N, Horton J, Mann RS, Bussemaker HJ, Gordân R, Rohs R. 14 Quantitative modeling of transcription factor binding specificities using DNA shape. J Biomol Struct Dyn 2015. [DOI: 10.1080/07391102.2015.1032554] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
|
22
|
Rao S, Carolina Dantas Machado A, Zhou T, Rastogi C, Bussemaker HJ, Rohs R. 22 Evolving insights on how cytosine methylation affects protein-DNA binding. J Biomol Struct Dyn 2015. [DOI: 10.1080/07391102.2015.1032562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
23
|
Abe N, Dror I, Yang L, Slattery M, Zhou T, Bussemaker HJ, Rohs R, Mann RS. Deconvolving the recognition of DNA shape from sequence. Cell 2015; 161:307-18. [PMID: 25843630 DOI: 10.1016/j.cell.2015.02.008] [Citation(s) in RCA: 135] [Impact Index Per Article: 15.0] [Received: 08/05/2014] [Revised: 12/08/2014] [Accepted: 01/26/2015] [Indexed: 01/25/2023]
Abstract
Protein-DNA binding is mediated by the recognition of the chemical signatures of the DNA bases and the 3D shape of the DNA molecule. Because DNA shape is a consequence of sequence, it is difficult to dissociate these modes of recognition. Here, we tease them apart in the context of Hox-DNA binding by mutating residues that, in a co-crystal structure, only recognize DNA shape. Complexes made with these mutants lose the preference to bind sequences with specific DNA shape features. Introducing shape-recognizing residues from one Hox protein to another swapped binding specificities in vitro and gene regulation in vivo. Statistical machine learning revealed that the accuracy of binding specificity predictions improves by adding shape features to a model that only depends on sequence, and feature selection identified shape features important for recognition. Thus, shape readout is a direct and independent component of binding site selection by Hox proteins.
Affiliation(s)
- Namiko Abe
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA; Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | - Iris Dror
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Department of Biology, Technion - Israel Institute of Technology, Haifa 32000, Israel
| | - Lin Yang
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Matthew Slattery
- Department of Biomedical Sciences, University of Minnesota Medical School, Duluth, MN 55812, USA
| | - Tianyin Zhou
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10032, USA
| | - Remo Rohs
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA; Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA; Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA.
| | - Richard S Mann
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA; Department of Systems Biology, Columbia University, New York, NY 10032, USA.
|
24
|
Dantas Machado AC, Zhou T, Rao S, Goel P, Rastogi C, Lazarovici A, Bussemaker HJ, Rohs R. Evolving insights on how cytosine methylation affects protein-DNA binding. Brief Funct Genomics 2015; 14:61-73. [PMID: 25319759 PMCID: PMC4303714 DOI: 10.1093/bfgp/elu040] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Indexed: 01/07/2023]
Abstract
Many anecdotal observations exist of a regulatory effect of DNA methylation on gene expression. However, in general, the underlying mechanisms of this effect are poorly understood. In this review, we summarize what is currently known about how this important, but mysterious, epigenetic mark impacts cellular functions. Cytosine methylation can abrogate or enhance interactions with DNA-binding proteins, or it may have no effect, depending on the context. Despite being only a small chemical change, the addition of a methyl group to cytosine can affect base readout via hydrophobic contacts in the major groove and shape readout via electrostatic contacts in the minor groove. We discuss the recent discovery that CpG methylation increases DNase I cleavage at adjacent positions by an order of magnitude through altering the local 3D DNA shape and the possible implications of this structural insight for understanding the methylation sensitivity of transcription factors (TFs). Additionally, 5-methylcytosines change the stability of nucleosomes and, thus, affect the local chromatin structure and access of TFs to genomic DNA. Given these complexities, it seems unlikely that the influence of DNA methylation on protein-DNA binding can be captured in a small set of general rules. Hence, data-driven approaches may be essential to gain a better understanding of these mechanisms.
25
Affiliation(s)
- Ronald G Tepper
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
26
van Arensbergen J, van Steensel B, Bussemaker HJ. In search of the determinants of enhancer-promoter interaction specificity. Trends Cell Biol 2014; 24:695-702. [PMID: 25160912 DOI: 10.1016/j.tcb.2014.07.004] [Citation(s) in RCA: 120] [Impact Index Per Article: 12.0] [Received: 05/20/2014] [Revised: 07/15/2014] [Accepted: 07/21/2014] [Indexed: 02/07/2023]
Abstract
Although it was originally believed that enhancers activate only the nearest promoter, recent global analyses enabled by high-throughput technology suggest that the network of enhancer-promoter interactions is far more complex. The mechanisms that determine the specificity of enhancer-promoter interactions are still poorly understood, but they are thought to include biochemical compatibility, constraints imposed by the three-dimensional architecture of chromosomes, insulator elements, and possibly the effects of local chromatin composition. In this review, we assess the current insights into these determinants, and highlight the functional genomic approaches that will lead the way towards better mechanistic understanding.
Affiliation(s)
- Joris van Arensbergen
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Bas van Steensel
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA; Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
27
|
Ghosh HS, Ceribelli M, Matos I, Lazarovici A, Bussemaker HJ, Lasorella A, Hiebert SW, Liu K, Staudt LM, Reizis B. ETO family protein Mtg16 regulates the balance of dendritic cell subsets by repressing Id2. J Exp Med 2014; 211:1623-35. [PMID: 24980046 PMCID: PMC4113936 DOI: 10.1084/jem.20132121] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 12/16/2022]
Abstract
Dendritic cells (DCs) comprise two major subsets, the interferon (IFN)-producing plasmacytoid DCs (pDCs) and antigen-presenting classical DCs (cDCs). The development of pDCs is promoted by E protein transcription factor E2-2, whereas E protein antagonist Id2 is specifically absent from pDCs. Conversely, Id2 is prominently expressed in cDCs and promotes CD8(+) cDC development. The mechanisms that control the balance between E and Id proteins during DC subset specification remain unknown. We found that the loss of Mtg16, a transcriptional cofactor of the ETO protein family, profoundly impaired pDC development and pDC-dependent IFN response. The residual Mtg16-deficient pDCs showed aberrant phenotype, including the expression of myeloid marker CD11b. Conversely, the development of cDC progenitors (pre-DCs) and of CD8(+) cDCs was enhanced. Genome-wide expression and DNA-binding analysis identified Id2 as a direct target of Mtg16. Mtg16-deficient cDC progenitors and pDCs showed aberrant induction of Id2, and the deletion of Id2 facilitated the impaired development of Mtg16-deficient pDCs. Thus, Mtg16 promotes pDC differentiation and restricts cDC development in part by repressing Id2, revealing a cell-intrinsic mechanism that controls subset balance during DC development.
Affiliation(s)
- Hiyaa S Ghosh
- Department of Microbiology and Immunology, Center for Computational Biology and Bioinformatics, Institute for Cancer Genetics, Department of Pathology, and Department of Pediatrics, Columbia University Medical Center and Department of Biological Sciences and Department of Electrical Engineering, Columbia University, New York, NY 10032
- Michele Ceribelli
- Lymphoid Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892
- Ines Matos
- Department of Microbiology and Immunology, Center for Computational Biology and Bioinformatics, Institute for Cancer Genetics, Department of Pathology, and Department of Pediatrics, Columbia University Medical Center and Department of Biological Sciences and Department of Electrical Engineering, Columbia University, New York, NY 10032
- Allan Lazarovici
- Department of Microbiology and Immunology, Center for Computational Biology and Bioinformatics, Institute for Cancer Genetics, Department of Pathology, and Department of Pediatrics, Columbia University Medical Center and Department of Biological Sciences and Department of Electrical Engineering, Columbia University, New York, NY 10032
- Harmen J Bussemaker
- Department of Microbiology and Immunology, Center for Computational Biology and Bioinformatics, Institute for Cancer Genetics, Department of Pathology, and Department of Pediatrics, Columbia University Medical Center and Department of Biological Sciences and Department of Electrical Engineering, Columbia University, New York, NY 10032
- Anna Lasorella
- Department of Microbiology and Immunology, Center for Computational Biology and Bioinformatics, Institute for Cancer Genetics, Department of Pathology, and Department of Pediatrics, Columbia University Medical Center and Department of Biological Sciences and Department of Electrical Engineering, Columbia University, New York, NY 10032
- Scott W Hiebert
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN 37232
- Kang Liu
- Department of Microbiology and Immunology, Center for Computational Biology and Bioinformatics, Institute for Cancer Genetics, Department of Pathology, and Department of Pediatrics, Columbia University Medical Center and Department of Biological Sciences and Department of Electrical Engineering, Columbia University, New York, NY 10032
- Louis M Staudt
- Lymphoid Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892
- Boris Reizis
- Department of Microbiology and Immunology, Center for Computational Biology and Bioinformatics, Institute for Cancer Genetics, Department of Pathology, and Department of Pediatrics, Columbia University Medical Center and Department of Biological Sciences and Department of Electrical Engineering, Columbia University, New York, NY 10032
28
|
Ward LD, Wang J, Bussemaker HJ. Characterizing a collective and dynamic component of chromatin immunoprecipitation enrichment profiles in yeast. BMC Genomics 2014; 15:494. [PMID: 24947676 PMCID: PMC4124144 DOI: 10.1186/1471-2164-15-494] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/30/2013] [Accepted: 05/27/2014] [Indexed: 12/24/2022]
Abstract
BACKGROUND: Recent chromatin immunoprecipitation (ChIP) experiments in fly, mouse, and human have revealed the existence of high-occupancy target (HOT) regions or "hotspots" that show enrichment across many assayed DNA-binding proteins. Similar co-enrichment observed in yeast so far has been treated as artifactual, and has not been fully characterized. RESULTS: Here we reanalyze ChIP data from both array-based and sequencing-based experiments to show that in the yeast S. cerevisiae, the collective enrichment phenomenon is strongly associated with proximity to noncoding RNA genes and with nucleosome depletion. DNA sequence motifs that confer binding affinity for the proteins are largely absent from these hotspots, suggesting that protein-protein interactions play a prominent role. The hotspots are condition-specific, suggesting that they reflect a chromatin state or protein state, and are not a static feature of underlying sequence. Additionally, only a subset of all assayed factors is associated with these loci, suggesting that the co-enrichment cannot be simply explained by a chromatin state that is universally more prone to immunoprecipitation. CONCLUSIONS: Together our results suggest that the co-enrichment patterns observed in yeast represent transcription factor co-occupancy. More generally, they make clear that great caution must be used when interpreting ChIP enrichment profiles for individual factors in isolation, as they will include factor-specific as well as collective contributions.
Affiliation(s)
- Lucas D Ward
- Department of Biological Sciences, Columbia University, 1212 Amsterdam Ave, New York, NY 10027 USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Junbai Wang
- Department of Biological Sciences, Columbia University, 1212 Amsterdam Ave, New York, NY 10027 USA
- Department of Pathology, Oslo University Hospital - The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, 1212 Amsterdam Ave, New York, NY 10027 USA
- Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Ave, New York, NY 10032 USA
29
Lazarovici A, Zhou T, Shafer A, Machado ACD, Sandstrom R, Sabo PJ, Lu Y, Rohs R, Stamatoyannopoulos JA, Bussemaker HJ. 103 Probing DNA shape and methylation state on a genomic scale with DNase I. J Biomol Struct Dyn 2013. [DOI: 10.1080/07391102.2013.786345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
30
Singer S, Zhao R, Barsotti AM, Ouwehand A, Fazollahi M, Coutavas E, Breuhahn K, Neumann O, Longerich T, Pusterla T, Powers MA, Giles KM, Leedman PJ, Hess J, Grunwald D, Bussemaker HJ, Singer RH, Schirmacher P, Prives C. Nuclear pore component Nup98 is a potential tumor suppressor and regulates posttranscriptional expression of select p53 target genes. Mol Cell 2012; 48:799-810. [PMID: 23102701 PMCID: PMC3525737 DOI: 10.1016/j.molcel.2012.09.020] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 12/29/2011] [Revised: 06/06/2012] [Accepted: 09/17/2012] [Indexed: 12/21/2022]
Abstract
The p53 tumor suppressor utilizes multiple mechanisms to selectively regulate its myriad target genes, which in turn mediate diverse cellular processes. Here, using conventional and single-molecule mRNA analyses, we demonstrate that the nucleoporin Nup98 is required for full expression of p21, a key effector of the p53 pathway, but not several other p53 target genes. Nup98 regulates p21 mRNA levels by a posttranscriptional mechanism in which a complex containing Nup98 and the p21 mRNA 3'UTR protects p21 mRNA from degradation by the exosome. An in silico approach revealed another p53 target (14-3-3σ) to be similarly regulated by Nup98. The expression of Nup98 is reduced in murine and human hepatocellular carcinomas (HCCs) and correlates with p21 expression in HCC patients. Our study elucidates a previously unrecognized function of wild-type Nup98 in regulating select p53 target genes that is distinct from the well-characterized oncogenic properties of Nup98 fusion proteins.
MESH Headings
- 14-3-3 Proteins/genetics
- 14-3-3 Proteins/metabolism
- 3' Untranslated Regions
- ATP Binding Cassette Transporter, Subfamily B/genetics
- ATP Binding Cassette Transporter, Subfamily B/metabolism
- Animals
- Antineoplastic Agents, Phytogenic/pharmacology
- Apoptosis/drug effects
- Binding Sites
- Camptothecin/pharmacology
- Carcinoma, Hepatocellular/genetics
- Carcinoma, Hepatocellular/metabolism
- Carcinoma, Hepatocellular/pathology
- Cellular Senescence
- Cyclin-Dependent Kinase Inhibitor p21/genetics
- Cyclin-Dependent Kinase Inhibitor p21/metabolism
- Exosomes/metabolism
- Gene Expression Regulation, Neoplastic
- Hep G2 Cells
- Humans
- Liver Neoplasms/genetics
- Liver Neoplasms/metabolism
- Liver Neoplasms/pathology
- Male
- Mice
- Mice, Knockout
- Nuclear Pore Complex Proteins/genetics
- Nuclear Pore Complex Proteins/metabolism
- RNA Interference
- RNA Processing, Post-Transcriptional
- RNA Stability
- RNA, Messenger/metabolism
- Time Factors
- Transfection
- Tumor Suppressor Protein p53/genetics
- Tumor Suppressor Protein p53/metabolism
- ATP-Binding Cassette Sub-Family B Member 4
Affiliation(s)
- Stephan Singer
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Institute of Pathology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Ruiying Zhao
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Anthony M. Barsotti
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Anette Ouwehand
- Department of Bionanoscience, Kavli Institute of NanoScience, Delft University of Technology, 2628 CJ Delft, Netherlands
- Mina Fazollahi
- Department of Physics, Columbia University, New York, NY 10027, USA
- Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 100
- Elias Coutavas
- Laboratory of Cell Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10021, USA
- Kai Breuhahn
- Institute of Pathology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Olaf Neumann
- Institute of Pathology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Thomas Longerich
- Institute of Pathology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Tobias Pusterla
- Division of Signal Transduction and Growth Control, DKFZ-ZMBH Alliance, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
- Maureen A. Powers
- Department of Cell Biology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Keith M. Giles
- Western Australian Institute for Medical Research and Centre for Medical Research, The University of Western Australia, Perth 6000, Australia
- Peter J. Leedman
- Western Australian Institute for Medical Research and Centre for Medical Research, The University of Western Australia, Perth 6000, Australia
- Jochen Hess
- Junior Research Group Molecular Mechanisms of Head and Neck Tumors, DKFZ-ZMBH Alliance, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Department of Otolaryngology, Head and Neck Surgery, University Hospital Heidelberg, 69120 Heidelberg, Germany
- David Grunwald
- Department of Bionanoscience, Kavli Institute of NanoScience, Delft University of Technology, 2628 CJ Delft, Netherlands
- Harmen J. Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 100
- Robert H. Singer
- Department of Anatomy and Structural Biology, Albert Einstein College of Medicine, New York, NY 10461, USA
- Peter Schirmacher
- Institute of Pathology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Carol Prives
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
31
Petti AA, McIsaac RS, Ho-Shing O, Bussemaker HJ, Botstein D. Combinatorial control of diverse metabolic and physiological functions by transcriptional regulators of the yeast sulfur assimilation pathway. Mol Biol Cell 2012; 23:3008-24. [PMID: 22696679 PMCID: PMC3408426 DOI: 10.1091/mbc.e12-03-0233] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 03/23/2012] [Revised: 06/04/2012] [Accepted: 06/06/2012] [Indexed: 01/03/2023]
Abstract
Methionine abundance affects diverse cellular functions, including cell division, redox homeostasis, survival under starvation, and oxidative stress response. Regulation of the methionine biosynthetic pathway involves three DNA-binding proteins: Met31p, Met32p, and Cbf1p. We hypothesized that there exists a "division of labor" among these proteins that facilitates coordination of methionine biosynthesis with diverse biological processes. To explore combinatorial control in this regulatory circuit, we deleted CBF1, MET31, and MET32 individually and in combination in a strain lacking methionine synthase. We followed genome-wide gene expression as these strains were starved for methionine. Using a combination of bioinformatic methods, we found that these regulators control genes involved in biological processes downstream of sulfur assimilation; many of these processes had not previously been documented as methionine dependent. We also found that the different factors have overlapping but distinct functions. In particular, Met31p and Met32p are important in regulating methionine metabolism, whereas Cbf1p functions as a "generalist" transcription factor that is not specific to methionine metabolism. In addition, Met31p and Met32p appear to regulate iron-sulfur cluster biogenesis through direct and indirect mechanisms and have distinguishable target specificities. Finally, CBF1 deletion sometimes has the opposite effect on gene expression from MET31 and MET32 deletion.
Affiliation(s)
- Allegra A. Petti
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- R. Scott McIsaac
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, NJ 08544
- Olivia Ho-Shing
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- David Botstein
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544
32
McIsaac RS, Petti AA, Bussemaker HJ, Botstein D. Perturbation-based analysis and modeling of combinatorial regulation in the yeast sulfur assimilation pathway. Mol Biol Cell 2012; 23:2993-3007. [PMID: 22696683 PMCID: PMC3408425 DOI: 10.1091/mbc.e12-03-0232] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Indexed: 01/08/2023]
Abstract
Here we establish the utility of a recently described perturbative method to study complex regulatory circuits in vivo. By combining rapid modulation of single TFs under physiological conditions with genome-wide expression analysis, we elucidate several novel regulatory features within the pathways of sulfur assimilation and beyond. In yeast, the pathways of sulfur assimilation are combinatorially controlled by five transcriptional regulators (three DNA-binding proteins [Met31p, Met32p, and Cbf1p], an activator [Met4p], and a cofactor [Met28p]) and a ubiquitin ligase subunit (Met30p). This regulatory system exerts combinatorial control not only over sulfur assimilation and methionine biosynthesis, but also on many other physiological functions in the cell. Recently we characterized a gene induction system that, upon the addition of an inducer, results in near-immediate transcription of a gene of interest under physiological conditions. We used this to perturb levels of single transcription factors during steady-state growth in chemostats, which facilitated distinction of direct from indirect effects of individual factors dynamically through quantification of the subsequent changes in genome-wide patterns of gene expression. We were able to show directly that Cbf1p acts sometimes as a repressor and sometimes as an activator. We also found circumstances in which Met31p/Met32p function as repressors, as well as those in which they function as activators. We elucidated and numerically modeled feedback relationships among the regulators, notably feedforward regulation of Met32p (but not Met31p) by Met4p that generates dynamic differences in abundance that can account for the differences in function of these two proteins despite their identical binding sites.
Affiliation(s)
- R Scott McIsaac
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
33
Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ, Mann RS. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 2011; 147:1270-82. [PMID: 22153072 DOI: 10.1016/j.cell.2011.10.053] [Citation(s) in RCA: 366] [Impact Index Per Article: 30.5] [Received: 06/09/2011] [Revised: 08/19/2011] [Accepted: 10/06/2011] [Indexed: 11/30/2022]
Abstract
Members of transcription factor families typically have similar DNA binding specificities yet execute unique functions in vivo. Transcription factors often bind DNA as multiprotein complexes, raising the possibility that complex formation might modify their DNA binding specificities. To test this hypothesis, we developed an experimental and computational platform, SELEX-seq, that can be used to determine the relative affinities to any DNA sequence for any transcription factor complex. Applying this method to all eight Drosophila Hox proteins, we show that they obtain novel recognition properties when they bind DNA with the dimeric cofactor Extradenticle-Homothorax (Exd). Exd-Hox specificities group into three main classes that obey Hox gene collinearity rules, and DNA structure predictions suggest that anterior and posterior Hox proteins prefer DNA sequences with distinct minor groove topographies. Together, these data suggest that emergent DNA recognition properties revealed by interactions with cofactors contribute to transcription factor specificities in vivo.
Affiliation(s)
- Matthew Slattery
- Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West 168th Street, HHSC 1104, New York, NY 10032, USA
34
Shechtman CF, Henneberry AL, Seimon TA, Tinkelenberg AH, Wilcox LJ, Lee E, Fazlollahi M, Munkacsi AB, Bussemaker HJ, Tabas I, Sturley SL. Loss of subcellular lipid transport due to ARV1 deficiency disrupts organelle homeostasis and activates the unfolded protein response. J Biol Chem 2011; 286:11951-9. [PMID: 21266578 DOI: 10.1074/jbc.m110.215038] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Indexed: 11/06/2022]
Abstract
The ARV1-encoded protein mediates sterol transport from the endoplasmic reticulum (ER) to the plasma membrane. Yeast ARV1 mutants accumulate multiple lipids in the ER and are sensitive to pharmacological modulators of both sterol and sphingolipid metabolism. Using fluorescent and electron microscopy, we demonstrate sterol accumulation, subcellular membrane expansion, elevated lipid droplet formation, and vacuolar fragmentation in ARV1 mutants. Motif-based regression analysis of ARV1 deletion transcription profiles indicates activation of Hac1p, an integral component of the unfolded protein response (UPR). Accordingly, we show constitutive splicing of HAC1 transcripts, induction of a UPR reporter, and elevated expression of UPR targets in ARV1 mutants. IRE1, encoding the unfolded protein sensor in the ER lumen, exhibits a lethal genetic interaction with ARV1, indicating a viability requirement for the UPR in cells lacking ARV1. Surprisingly, ARV1 mutants expressing a variant of Ire1p defective in sensing unfolded proteins are viable. Moreover, these strains also exhibit constitutive HAC1 splicing that interacts with DTT-mediated perturbation of protein folding. These data suggest that a component of UPR induction in arv1Δ strains is distinct from protein misfolding. Decreased ARV1 expression in murine macrophages also results in UPR induction, particularly up-regulation of activating transcription factor-4, CHOP (C/EBP homologous protein), and apoptosis. Cholesterol loading or inhibition of cholesterol esterification further elevated CHOP expression in ARV1 knockdown cells. Thus, loss or down-regulation of ARV1 disturbs membrane and lipid homeostasis, resulting in a disruption of ER integrity, one consequence of which is induction of the UPR.
Affiliation(s)
- Caryn F Shechtman
- Institute of Human Nutrition, Department of Pediatrics, Columbia University Medical Center, New York, NY 10032, USA
35
Lu XJ, Olson WK, Bussemaker HJ. The RNA backbone plays a crucial role in mediating the intrinsic stability of the GpU dinucleotide platform and the GpUpA/GpA miniduplex. Nucleic Acids Res 2010; 38:4868-76. [PMID: 20223772 PMCID: PMC2919703 DOI: 10.1093/nar/gkq155] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 11/15/2022]
Abstract
The side-by-side interactions of nucleobases contribute to the organization of RNA, forming the planar building blocks of helices and mediating chain folding. Dinucleotide platforms, formed by side-by-side pairing of adjacent bases, frequently anchor helices against loops. Surprisingly, GpU steps account for over half of the dinucleotide platforms observed in RNA-containing structures. Why GpU should stand out from other dinucleotides in this respect is not clear from the single well-characterized H-bond found between the guanine N2 and the uracil O4 groups. Here, we describe how an RNA-specific H-bond between O2′(G) and O2P(U) adds to the stability of the GpU platform. Moreover, we show how this pair of oxygen atoms forms an out-of-plane backbone ‘edge’ that is specifically recognized by a non-adjacent guanine in over 90% of the cases, leading to the formation of an asymmetric miniduplex consisting of ‘complementary’ GpUpA and GpA subunits. Together, these five nucleotides constitute the conserved core of the well-known loop-E motif. The backbone-mediated intrinsic stabilities of the GpU dinucleotide platform and the GpUpA/GpA miniduplex plausibly underlie observed evolutionary constraints on base identity. We propose that they may also provide a reason for the extreme conservation of GpU observed at most 5′-splice sites.
Affiliation(s)
- Xiang-Jun Lu
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA.
36
Boorsma A, Lu XJ, Zakrzewska A, Klis FM, Bussemaker HJ. Inferring condition-specific modulation of transcription factor activity in yeast through regulon-based analysis of genomewide expression. PLoS One 2008; 3:e3112. [PMID: 18769540 PMCID: PMC2518834 DOI: 10.1371/journal.pone.0003112] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 04/29/2008] [Accepted: 08/07/2008] [Indexed: 11/18/2022]
Abstract
BACKGROUND: A key goal of systems biology is to understand how genomewide mRNA expression levels are controlled by transcription factors (TFs) in a condition-specific fashion. TF activity is frequently modulated at the post-translational level through ligand binding, covalent modification, or changes in sub-cellular localization. In this paper, we demonstrate how prior information about regulatory network connectivity can be exploited to infer condition-specific TF activity as a hidden variable from the genomewide mRNA expression pattern in the yeast Saccharomyces cerevisiae. METHODOLOGY/PRINCIPAL FINDINGS: We first validate experimentally that by scoring differential expression at the level of gene sets or "regulons" comprised of the putative targets of a TF, we can accurately predict modulation of TF activity at the post-translational level. Next, we create an interactive database of inferred activities for a large number of TFs across a large number of experimental conditions in S. cerevisiae. This allows us to perform TF-centric analysis of the yeast regulatory network. CONCLUSIONS/SIGNIFICANCE: We analyze the degree to which the mRNA expression level of each TF is predictive of its regulatory activity. We also organize TFs into "co-modulation networks" based on their inferred activity profile across conditions, and find that this reveals functional and mechanistic relationships. Finally, we present evidence that the PAC and rRPE motifs antagonize TBP-dependent regulation, and function as core promoter elements governed by the transcription regulator NC2. Regulon-based monitoring of TF activity modulation is a powerful tool for analyzing regulatory network function that should be applicable in other organisms. Tools and results are available online at http://bussemakerlab.org/RegulonProfiler/.
Affiliation(s)
- André Boorsma
- Swammerdam Institute for Life Sciences, University of Amsterdam, BioCentrum Amsterdam, Amsterdam, The Netherlands
- Xiang-Jun Lu
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Anna Zakrzewska
- Swammerdam Institute for Life Sciences, University of Amsterdam, BioCentrum Amsterdam, Amsterdam, The Netherlands
- Frans M. Klis
- Swammerdam Institute for Life Sciences, University of Amsterdam, BioCentrum Amsterdam, Amsterdam, The Netherlands
- Harmen J. Bussemaker
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
37
Ward LD, Bussemaker HJ. Predicting functional transcription factor binding through alignment-free and affinity-based analysis of orthologous promoter sequences. Bioinformatics 2008; 24:i165-71. [PMID: 18586710 PMCID: PMC2718632 DOI: 10.1093/bioinformatics/btn154] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/25/2022]
Abstract
Motivation: The identification of transcription factor (TF) binding sites and the regulatory circuitry that they define is currently an area of intense research. Data from whole-genome chromatin immunoprecipitation (ChIP–chip), whole-genome expression microarrays, and sequencing of multiple closely related genomes have all proven useful. By and large, existing methods treat the interpretation of functional data as a classification problem (between bound and unbound DNA), and the analysis of comparative data as a problem of local alignment (to recover phylogenetic footprints of presumably functional elements). Both of these approaches suffer from the inability to model and detect low-affinity binding sites, which have recently been shown to be abundant and functional. Results: We have developed a method that discovers functional regulatory targets of TFs by predicting the total affinity of each promoter for those factors and then comparing that affinity across orthologous promoters in closely related species. At each promoter, we consider the minimum affinity among orthologs to be the fraction of the affinity that is functional. Because we calculate the affinity of the entire promoter, our method is independent of local alignment. By comparing with functional annotation information and gene expression data in Saccharomyces cerevisiae, we have validated that this biophysically motivated use of evolutionary conservation gives rise to dramatic improvement in prediction of regulatory connectivity and factor–factor interactions compared to the use of a single genome. We propose novel biological functions for several yeast TFs, including the factors Snt2 and Stb4, for which no function has been reported. Our affinity-based approach towards comparative genomics may allow a more quantitative analysis of the principles governing the evolution of non-coding DNA. 
Availability: The MatrixREDUCE software package is available from http://www.bussemakerlab.org/software/MatrixREDUCE
Contact: Harmen.Bussemaker@columbia.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
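The conserved-affinity idea in the Results above can be illustrated with a minimal Python sketch. It assumes a toy position-specific affinity matrix (`PSAM`, with made-up weights), scores the forward strand only, and uses hypothetical function names (`total_affinity`, `conserved_affinity`); it is not the MatrixREDUCE implementation.

```python
# Hypothetical PSAM: one dict per motif position, giving the relative
# affinity of each base (1.0 = optimal base). Values are illustrative only.
PSAM = [
    {"A": 0.1, "C": 1.0, "G": 0.1, "T": 0.1},
    {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 1.0, "T": 0.1},
]


def total_affinity(seq, psam):
    """Sum the relative affinity of every window in a promoter sequence.

    Assumption: forward strand only, independent contributions per position.
    """
    width = len(psam)
    total = 0.0
    for start in range(len(seq) - width + 1):
        window_affinity = 1.0
        for offset, weights in enumerate(psam):
            window_affinity *= weights[seq[start + offset]]
        total += window_affinity
    return total


def conserved_affinity(ortholog_promoters, psam):
    """Take the minimum total affinity among orthologous promoters."""
    return min(total_affinity(seq, psam) for seq in ortholog_promoters)


# Toy orthologous promoters; the third has lost the CAG site.
orthologs = ["TTCAGTT", "CAGTTTT", "TTTTTTT"]
score = conserved_affinity(orthologs, PSAM)
```

Because the score is computed over the whole promoter, no alignment of the orthologous sequences is needed; a site may move within the promoter without lowering the minimum.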
Affiliation(s)
- Lucas D Ward
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
|
38
|
de Wit E, Braunschweig U, Greil F, Bussemaker HJ, van Steensel B. Global chromatin domain organization of the Drosophila genome. PLoS Genet 2008; 4:e1000045. [PMID: 18369463 PMCID: PMC2274884 DOI: 10.1371/journal.pgen.1000045] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Received: 10/24/2007] [Accepted: 02/29/2008] [Indexed: 01/30/2023]
Abstract
In eukaryotes, neighboring genes can be packaged together in specific chromatin structures that ensure their coordinated expression. Examples of such multi-gene chromatin domains are well-documented, but a global view of the chromatin organization of eukaryotic genomes is lacking. To systematically identify multi-gene chromatin domains, we constructed a compendium of genome-scale binding maps for a broad panel of chromatin-associated proteins in Drosophila melanogaster. Next, we computationally analyzed this compendium for evidence of multi-gene chromatin domains using a novel statistical segmentation algorithm. We find that at least 50% of all fly genes are organized into chromatin domains, which often consist of dozens of genes. The domains are characterized by various known and novel combinations of chromatin proteins. The genes in many of the domains are coregulated during development and tend to have similar biological functions. Furthermore, during evolution fewer chromosomal rearrangements occur inside chromatin domains than outside domains. Our results indicate that a substantial portion of the Drosophila genome is packaged into functionally coherent, multi-gene chromatin domains. This has broad mechanistic implications for gene regulation and genome evolution.

Genes are packaged into chromatin by a variety of specialized proteins. Many different types of chromatin exist, and each may regulate gene expression in different ways. It was previously observed that neighboring genes are sometimes packaged together into a single type of chromatin, which can facilitate their coordinated regulation. However, it has been unclear whether such multi-gene chromatin domains are exceptional, or may occur more frequently. Here, we report a systematic analysis of genome-wide binding patterns of a large set of chromatin components in the fruit fly Drosophila melanogaster. Strikingly, we find that at least 50% of all genes in this organism are packaged together with several of their neighboring genes into a single type of chromatin. Each chromatin domain can include dozens of genes and can be made up of different combinations of chromatin proteins. We show that genes in each domain often have similar functions and are coordinately expressed during development. Moreover, we find that many of these multi-gene domains have been kept intact during evolution, indicating that they are important functional units. In summary, multi-gene chromatin domains are much more common than previously thought, and they are likely to play important roles in the orchestration of gene expression.
Affiliation(s)
- Elzo de Wit
  - Department of Molecular Biology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Ulrich Braunschweig
  - Department of Molecular Biology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Frauke Greil
  - Department of Molecular Biology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Harmen J. Bussemaker
  - Department of Biological Sciences, Columbia University, New York, New York, United States of America
  - * E-mail: (HJB); (BvS)
- Bas van Steensel
  - Department of Molecular Biology, Netherlands Cancer Institute, Amsterdam, The Netherlands
  - * E-mail: (HJB); (BvS)
|
39
|
Foat BC, Tepper RG, Bussemaker HJ. TransfactomeDB: a resource for exploring the nucleotide sequence specificity and condition-specific regulatory activity of trans-acting factors. Nucleic Acids Res 2007; 36:D125-31. [PMID: 17947326 PMCID: PMC2238954 DOI: 10.1093/nar/gkm828] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
Accurate and comprehensive information about the nucleotide sequence specificity of trans-acting factors (TFs) is essential for computational and experimental analyses of gene regulatory networks. We present the Yeast Transfactome Database, a repository of sequence specificity models and condition-specific regulatory activities for a large number of DNA- and RNA-binding proteins in Saccharomyces cerevisiae. The sequence specificities in TransfactomeDB, represented as position-specific affinity matrices (PSAMs), are directly estimated from genomewide measurements of TF-binding using our previously published MatrixREDUCE algorithm, which is based on a biophysical model. For each mRNA expression profile in the NCBI Gene Expression Omnibus, we used sequence-based regression analysis to estimate the post-translational regulatory activity of each TF for which a PSAM is available. The trans-factor activity profiles across multiple experiments available in TransfactomeDB allow the user to explore potential regulatory roles of hundreds of TFs in any of thousands of microarray experiments. Our resource is freely available at http://bussemakerlab.org/TransfactomeDB/
Affiliation(s)
- Barrett C Foat
  - Department of Biological Sciences, Columbia University, New York, New York 10027, USA
|
40
|
Abstract
Background: The genomewide pattern of changes in mRNA expression measured using DNA microarrays is typically a complex superposition of the response of multiple regulatory pathways to changes in the environment of the cells. The use of prior information, either about the function of the protein encoded by each gene, or about the physical interactions between regulatory factors and the sequences controlling its expression, has emerged as a powerful approach for dissecting complex transcriptional responses.
Results: We review two different approaches for combining the noisy expression levels of multiple individual genes into robust pathway-level differential expression scores. The first is based on a comparison between the distribution of expression levels of genes within a predefined gene set and those of all other genes in the genome. The second starts from an estimate of the strength of genomewide regulatory network connectivities based on sequence information or direct measurements of protein-DNA interactions, and uses regression analysis to estimate the activity of gene regulatory pathways. The statistical methods used are explained in detail.
Conclusion: By avoiding the thresholding of individual genes, pathway-level analysis of differential expression based on prior information can be considerably more sensitive to subtle changes in gene expression than gene-level analysis. The methods are technically straightforward and yield results that are easily interpretable, both biologically and statistically.
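The second, regression-based approach can be sketched in a few lines of Python. This is a minimal single-factor illustration, assuming a per-gene connectivity strength and a per-gene log-expression change; the function name `fit_activity` and the toy numbers are hypothetical and not taken from the review.

```python
from statistics import mean


def fit_activity(connectivity, expression):
    """Ordinary least-squares fit of expression change on connectivity.

    Returns (slope, intercept). The slope plays the role of the inferred
    pathway or factor activity; every gene contributes, with no thresholding.
    """
    genes = sorted(set(connectivity) & set(expression))
    x = [connectivity[g] for g in genes]
    y = [expression[g] for g in genes]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Toy data: connectivity strength and log-ratio expression per gene.
connectivity = {"g1": 0.0, "g2": 1.0, "g3": 2.0}
expression = {"g1": 0.5, "g2": 1.5, "g3": 2.5}
activity, baseline = fit_activity(connectivity, expression)
```

With several factors the same idea extends to multiple regression, where each coefficient is interpreted as the activity of one factor.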
Affiliation(s)
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, MC 2441, New York, NY 10027, USA
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, NY, USA
- Lucas D Ward
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, MC 2441, New York, NY 10027, USA
- Andre Boorsma
  - Swammerdam Institute for Life Sciences, University of Amsterdam, BioCentrum Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands
|
41
|
Abstract
Various algorithms are available for predicting mRNA expression and modeling gene regulatory processes. They differ in whether they rely on the existence of modules of coregulated genes or build a model that applies to all genes, whether they represent regulatory activities as hidden variables or as mRNA levels, and whether they implicitly or explicitly model the complex cis-regulatory logic of multiple interacting transcription factors binding the same DNA. The fact that functional genomics data of different types reflect the same molecular processes provides a natural strategy for integrative computational analysis. One promising avenue toward an accurate and comprehensive model of gene regulation combines biophysical modeling of the interactions among proteins, DNA, and RNA with the use of large-scale functional genomics data to estimate regulatory network connectivity and activity parameters. As the ability of these models to represent complex cis-regulatory logic increases, the need for approaches based on cross-species conservation may diminish.
Affiliation(s)
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, New York 10027, USA.
|
42
|
Halasz G, van Batenburg MF, Perusse J, Hua S, Lu XJ, White KP, Bussemaker HJ. Detecting transcriptionally active regions using genomic tiling arrays. Genome Biol 2007; 7:R59. [PMID: 16859498 PMCID: PMC1779562 DOI: 10.1186/gb-2006-7-7-r59] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 09/26/2005] [Revised: 01/05/2006] [Accepted: 07/05/2006] [Indexed: 11/10/2022]
Abstract
We have developed a method for interpreting genomic tiling array data, implemented as the program TranscriptionDetector. Probed loci expressed above background are identified by combining replicates in a way that makes minimal assumptions about the data. We performed medium-resolution Anopheles gambiae tiling array experiments and found extensive transcription of both coding and non-coding regions. Our method also showed improved detection of transcriptional units when applied to high-density tiling array data for ten human chromosomes.
Affiliation(s)
- Gabor Halasz
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY 10027, USA
  - Integrated Program in Cellular, Molecular and Biophysical Studies, Columbia University, 630 W. 168th Street, New York, NY 10032, USA
- Marinus F van Batenburg
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY 10027, USA
  - Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, Meibergdreef 15, 1105 AZ Amsterdam, The Netherlands
- Joelle Perusse
  - Department of Genetics, Yale University School of Medicine, 333 Cedar Street, PO Box 208005, New Haven, CT 06520-8005, USA
- Sujun Hua
  - Department of Genetics, Yale University School of Medicine, 333 Cedar Street, PO Box 208005, New Haven, CT 06520-8005, USA
- Xiang-Jun Lu
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY 10027, USA
- Kevin P White
  - Department of Genetics, Yale University School of Medicine, 333 Cedar Street, PO Box 208005, New Haven, CT 06520-8005, USA
  - Department of Ecology and Evolutionary Biology, Yale University, 165 Prospect Street, PO Box 208106, New Haven, CT 06520-8106, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY 10027, USA
  - Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY, USA
|
43
|
Greil F, de Wit E, Bussemaker HJ, van Steensel B. HP1 controls genomic targeting of four novel heterochromatin proteins in Drosophila. EMBO J 2007; 26:741-51. [PMID: 17255947 PMCID: PMC1794385 DOI: 10.1038/sj.emboj.7601527] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Received: 09/12/2006] [Accepted: 11/23/2006] [Indexed: 01/08/2023]
Abstract
Heterochromatin is important for the maintenance of genome stability and regulation of gene expression; yet our knowledge of heterochromatin structure and function is incomplete. We identified four novel Drosophila heterochromatin proteins (HPs). Three of these proteins (HP3, HP4 and HP5) interact directly with HP1, whereas HP6 in turn binds to each of these three proteins. Immunofluorescence microscopy and genome-wide mapping of in vivo binding sites show that all four proteins are components of heterochromatin. Depletion of HP1 causes redistribution of all four proteins, indicating that HP1 is essential for their heterochromatic targeting. Finally, mutants of HP4 and HP5 are dominant suppressors of position effect variegation, demonstrating their importance in heterochromatic gene silencing. These results indicate that HP1 acts as a docking platform for several mediator proteins that contribute to heterochromatin function.
Affiliation(s)
- Frauke Greil
  - Department of Molecular Biology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Elzo de Wit
  - Department of Molecular Biology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Harmen J Bussemaker
  - Department of Biological Sciences and Center for Computational Biology and Bioinformatics, Columbia University, New York, NY, USA
- Bas van Steensel
  - Department of Molecular Biology, Netherlands Cancer Institute, Amsterdam, The Netherlands
  - Department of Molecular Biology, Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX Amsterdam, The Netherlands. Tel.: +31 20 512 2040; Fax: +31 20 669 1383; E-mail:
|
44
|
Abstract
Motivation: Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, we developed the MatrixREDUCE algorithm, which uses genome-wide occupancy data for a transcription factor (e.g. ChIP-chip) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. Advantages of our approach are that the information for all probes on the microarray is efficiently utilized because there is no need to delineate "bound" and "unbound" sequences, and that, unlike information content-based methods, it does not require a background sequence model.
Results: We validated the performance of MatrixREDUCE by inferring the sequence-specific binding affinities for several transcription factors in S. cerevisiae and comparing the results with three other independent sources of transcription factor sequence-specific affinity information: (i) experimental measurement of transcription factor binding affinities for specific oligonucleotides, (ii) reporter gene assays for promoters with systematically mutated binding sites, and (iii) relative binding affinities obtained by modeling transcription factor-DNA interactions based on co-crystal structures of transcription factors bound to DNA substrates. We show that transcription factor binding affinities inferred by MatrixREDUCE are in good agreement with all three validating methods.
Availability: MatrixREDUCE source code is freely available for non-commercial use at http://www.bussemakerlab.org/. The software runs on Linux, Unix, and Mac OS X.
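The kind of statistical-mechanical model described above can be written compactly. The following is a sketch in notation introduced here for illustration (the symbols $w_{jb}$, $F$, $C$, $L_w$ and $L_U$ are assumptions, as is the low-occupancy approximation in which occupancy is proportional to affinity); it is not a transcription of the paper's equations.

```latex
% Relative affinity of the window starting at position i of probe sequence U,
% for a position-specific affinity matrix of width L_w with weights w_{jb}
% (w_{jb} = 1 for the optimal base b at position j):
K_i(U) = \prod_{j=1}^{L_w} w_{j,\,U_{i+j-1}},
\qquad w_{jb} = e^{-\Delta\Delta G_{jb}/RT}

% Predicted occupancy of probe U: sum over all windows, scaled by F, offset C
N_{\mathrm{pred}}(U) = C + F \sum_{i=1}^{L_U - L_w + 1} K_i(U)

% Fit: choose the weights, F and C to minimize the squared error over all probes
\min_{w,\,F,\,C} \; \sum_{U} \bigl( N_{\mathrm{obs}}(U) - N_{\mathrm{pred}}(U) \bigr)^2
```

Because every probe enters the sum of squared residuals, no cutoff between "bound" and "unbound" sequences is required, which matches the first advantage stated in the abstract.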
Affiliation(s)
- Barrett C Foat
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
|
45
|
Moorman C, Sun LV, Wang J, de Wit E, Talhout W, Ward LD, Greil F, Lu XJ, White KP, Bussemaker HJ, van Steensel B. Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster. Proc Natl Acad Sci U S A 2006; 103:12027-32. [PMID: 16880385 PMCID: PMC1567692 DOI: 10.1073/pnas.0605003103] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.2] [Received: 05/31/2006] [Indexed: 11/18/2022]
Abstract
Regulation of gene expression is a highly complex process that requires the concerted action of many proteins, including sequence-specific transcription factors, cofactors, and chromatin proteins. In higher eukaryotes, the interplay between these proteins and their interactions with the genome still is poorly understood. We systematically mapped the in vivo binding sites of seven transcription factors with diverse physiological functions, five cofactors, and two heterochromatin proteins at approximately 1-kb resolution in a 2.9 Mb region of the Drosophila melanogaster genome. Surprisingly, all tested transcription factors and cofactors show strongly overlapping localization patterns, and the genome contains many "hotspots" that are targeted by all of these proteins. Several control experiments show that the strong overlap is not an artifact of the techniques used. Colocalization hotspots are 1-5 kb in size, spaced on average by approximately 50 kb, and preferentially located in regions of active transcription. We provide evidence that protein-protein interactions play a role in the hotspot association of some transcription factors. Colocalization hotspots constitute a previously uncharacterized type of feature in the genome of Drosophila, and our results provide insights into the general targeting mechanisms of transcription regulators in a higher eukaryote.
Affiliation(s)
- Celine Moorman
  - Department of Molecular Biology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Ling V. Sun
  - Department of Genetics, Yale University School of Medicine, New Haven, CT 06520
- Junbai Wang
  - Department of Biological Sciences and Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027
- Elzo de Wit
  - Department of Molecular Biology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Wendy Talhout
  - Department of Molecular Biology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Lucas D. Ward
  - Department of Biological Sciences and Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027
- Frauke Greil
  - Department of Molecular Biology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Xiang-Jun Lu
  - Department of Biological Sciences and Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027
- Kevin P. White
  - Department of Genetics, Yale University School of Medicine, New Haven, CT 06520
- Harmen J. Bussemaker
  - Department of Biological Sciences and Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027
- Bas van Steensel
  - Department of Molecular Biology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
|
46
|
|
47
|
Foat BC, Houshmandi SS, Olivas WM, Bussemaker HJ. Profiling condition-specific, genome-wide regulation of mRNA stability in yeast. Proc Natl Acad Sci U S A 2005; 102:17675-80. [PMID: 16317069 PMCID: PMC1295595 DOI: 10.1073/pnas.0503803102] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.9] [Indexed: 11/18/2022]
Abstract
The steady-state abundance of an mRNA is determined by the balance between transcription and decay. Although regulation of transcription has been well studied both experimentally and computationally, regulation of transcript stability has received little attention. We developed an algorithm, MatrixREDUCE, that discovers the position-specific affinity matrices for unknown RNA-binding factors and infers their condition-specific activities, using only genomic sequence data and steady-state mRNA expression data as input. We identified and computationally characterized the binding sites for six mRNA stability regulators in Saccharomyces cerevisiae, which include two members of the Pumilio-homology domain (Puf) family of RNA-binding proteins, Puf3p and Puf4p. We provide computational and experimental evidence that regulation of mRNA stability by these factors is modulated in response to a variety of environmental stimuli.
Affiliation(s)
- Barrett C Foat
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
|
48
|
Boorsma A, Foat BC, Vis D, Klis F, Bussemaker HJ. T-profiler: scoring the activity of predefined groups of genes using gene expression data. Nucleic Acids Res 2005; 33:W592-5. [PMID: 15980543 PMCID: PMC1160244 DOI: 10.1093/nar/gki484] [Citation(s) in RCA: 164] [Impact Index Per Article: 8.6] [Indexed: 11/25/2022]
Abstract
One of the key challenges in the analysis of gene expression data is how to relate the expression level of individual genes to the underlying transcriptional programs and cellular state. Here we describe T-profiler, a tool that uses the t-test to score changes in the average activity of predefined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif or location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters. Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. Users can upload their microarray data for analysis on the web at .
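The core scoring step described above can be sketched in Python. This is a minimal illustration, assuming a pooled-variance two-sample t-statistic that compares the genes in a group with all remaining genes; the function name `group_t_score` and the toy data are hypothetical, and the sketch omits the iterative selection among overlapping gene groups.

```python
import math
from statistics import mean, variance


def group_t_score(expression, group):
    """Pooled-variance t-statistic: genes in `group` versus all other genes.

    `expression` maps gene name to log-ratio; `group` is a set of gene names.
    Both the group and its complement need at least two genes.
    """
    inside = [v for g, v in expression.items() if g in group]
    outside = [v for g, v in expression.items() if g not in group]
    n_in, n_out = len(inside), len(outside)
    pooled_var = (
        (n_in - 1) * variance(inside) + (n_out - 1) * variance(outside)
    ) / (n_in + n_out - 2)
    std_err = math.sqrt(pooled_var * (1 / n_in + 1 / n_out))
    return (mean(inside) - mean(outside)) / std_err


# Toy data: two genes in the group are up-regulated relative to the rest.
expression = {"g1": 2.0, "g2": 3.0, "g3": 0.0, "g4": 1.0}
t_value = group_t_score(expression, {"g1", "g2"})
```

A large positive or negative value indicates that the group as a whole is shifted relative to the rest of the genome, even when no single gene passes a threshold on its own.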
Affiliation(s)
- Barrett C. Foat
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Harmen J. Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
  - To whom correspondence should be addressed. Tel: +1 212 854 9932; Fax: +1 212 865 8246;
|
49
|
Richards S, Liu Y, Bettencourt BR, Hradecky P, Letovsky S, Nielsen R, Thornton K, Hubisz MJ, Chen R, Meisel RP, Couronne O, Hua S, Smith MA, Zhang P, Liu J, Bussemaker HJ, van Batenburg MF, Howells SL, Scherer SE, Sodergren E, Matthews BB, Crosby MA, Schroeder AJ, Ortiz-Barrientos D, Rives CM, Metzker ML, Muzny DM, Scott G, Steffen D, Wheeler DA, Worley KC, Havlak P, Durbin KJ, Egan A, Gill R, Hume J, Morgan MB, Miner G, Hamilton C, Huang Y, Waldron L, Verduzco D, Clerc-Blankenburg KP, Dubchak I, Noor MAF, Anderson W, White KP, Clark AG, Schaeffer SW, Gelbart W, Weinstock GM, Gibbs RA. Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res 2005; 15:1-18. [PMID: 15632085 PMCID: PMC540289 DOI: 10.1101/gr.3059305] [Citation(s) in RCA: 396] [Impact Index Per Article: 20.8] [Indexed: 12/21/2022]
Abstract
We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25-55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species--but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.
Affiliation(s)
- Stephen Richards
  - Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston Texas 77030, USA.
|
50
|
Stolc V, Gauhar Z, Mason C, Halasz G, van Batenburg MF, Rifkin SA, Hua S, Herreman T, Tongprasit W, Barbano PE, Bussemaker HJ, White KP. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science 2004; 306:655-60. [PMID: 15499012 DOI: 10.1126/science.1101312] [Citation(s) in RCA: 236] [Impact Index Per Article: 11.8] [Indexed: 12/20/2022]
Abstract
We used a maskless photolithography method to produce DNA oligonucleotide microarrays with unique probe sequences tiled throughout the genome of Drosophila melanogaster and across predicted splice junctions. RNA expression of protein coding and nonprotein coding sequences was determined for each major stage of the life cycle, including adult males and females. We detected transcriptional activity for 93% of annotated genes and RNA expression for 41% of the probes in intronic and intergenic sequences. Comparison to genome-wide RNA interference data and to gene annotations revealed distinguishable levels of expression for different classes of genes and higher levels of expression for genes with essential cellular functions. Differential splicing was observed in about 40% of predicted genes, and 5440 previously unknown splice forms were detected. Genes within conserved regions of synteny with D. pseudoobscura had highly correlated expression; these regions ranged in length from 10 to 900 kilobase pairs. The expressed intergenic and intronic sequences are more likely to be evolutionarily conserved than nonexpressed ones, and about 15% of them appear to be developmentally regulated. Our results provide a draft expression map for the entire nonrepetitive genome, which reveals a much more extensive and diverse set of expressed sequences than was previously predicted.
Affiliation(s)
- Viktor Stolc
  - Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
|