1
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024]
Abstract
Massively parallel reporter assays measure the transcriptional activities of various cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational model framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms of cis-regulatory mutations of a synthetic enhancer that cause abnormal gene expression. We found that, in a human cell line, competitive binding between family transcription factors (TFs) with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single or multiple mutations. We also discovered that even if various harmful mutations occurred in an activator binding site, a CRM could stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance understanding of the effects of SNPs and indels on CRMs and could help build robust custom-designed CRMs for biologics production and gene therapy.
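As a rough illustration of the competitive-binding idea in this abstract, the sketch below computes equilibrium occupancies when two family TFs compete for a single binding site. It is a textbook statistical-weight calculation, not the authors' model; the function name, parameter names, and numerical values are all hypothetical.

```python
def competitive_occupancy(conc_a, k_a, conc_b, k_b):
    """Equilibrium occupancy of one site by two mutually exclusive TFs.

    conc_* are free TF concentrations and k_* are association constants
    (hypothetical illustrative values, in reciprocal units).
    """
    q_a = conc_a * k_a  # statistical weight of the "A bound" state
    q_b = conc_b * k_b  # statistical weight of the "B bound" state
    z = 1.0 + q_a + q_b  # partition function: empty + A bound + B bound
    return q_a / z, q_b / z


# Assumed scenario: a mutation weakens TF A's binding tenfold,
# which shifts part of the site's occupancy toward TF B.
wild_type = competitive_occupancy(1.0, 10.0, 1.0, 1.0)
mutant = competitive_occupancy(1.0, 1.0, 1.0, 1.0)
```

In this toy setting the site stays partly occupied after the mutation because TF B takes over some of the binding, which is one simple way competition can buffer the loss of an activator's affinity.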
Affiliation(s)
- Chan-Koo Kang
  - School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Ah-Ram Kim
  - School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
2
Martin V, Zhuang F, Zhang Y, Pinheiro K, Gordân R. High-throughput data and modeling reveal insights into the mechanisms of cooperative DNA-binding by transcription factor proteins. Nucleic Acids Res 2023; 51:11600-11612. [PMID: 37889068 PMCID: PMC10681739 DOI: 10.1093/nar/gkad872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 09/21/2023] [Accepted: 10/05/2023] [Indexed: 10/28/2023]
Abstract
Cooperative DNA-binding by transcription factor (TF) proteins is critical for eukaryotic gene regulation. In the human genome, many regulatory regions contain TF-binding sites in close proximity to each other, which can facilitate cooperative interactions. However, binding site proximity does not necessarily imply cooperative binding, as TFs can also bind independently to each of their neighboring target sites. Currently, the rules that drive cooperative TF binding are not well understood. In addition, it is oftentimes difficult to infer direct TF-TF cooperativity from existing DNA-binding data. Here, we show that in vitro binding assays using DNA libraries of a few thousand genomic sequences with putative cooperative TF-binding events can be used to develop accurate models of cooperativity and to gain insights into cooperative binding mechanisms. Using factors ETS1 and RUNX1 as our case study, we show that the distance and orientation between ETS1 sites are critical determinants of cooperative ETS1-ETS1 binding, while cooperative ETS1-RUNX1 interactions show more flexibility in distance and orientation and can be accurately predicted based on the affinity and sequence/shape features of the binding sites. The approach described here, combining custom experimental design with machine-learning modeling, can be easily applied to study the cooperative DNA-binding patterns of any TFs.
Affiliation(s)
- Vincentius Martin
  - Department of Computer Science, Durham, NC 27708, USA
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Farica Zhuang
  - Department of Computer Science, Durham, NC 27708, USA
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Yuning Zhang
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
  - Program in Computational Biology & Bioinformatics, Durham, NC 27708, USA
- Kyle Pinheiro
  - Department of Computer Science, Durham, NC 27708, USA
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Raluca Gordân
  - Department of Computer Science, Durham, NC 27708, USA
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
  - Department of Biostatistics & Bioinformatics, Department of Molecular Genetics and Microbiology, Department of Cell Biology, Duke University, Durham, NC 27708, USA
3
Luo K, Zhong J, Safi A, Hong LK, Tewari AK, Song L, Reddy TE, Ma L, Crawford GE, Hartemink AJ. Profiling the quantitative occupancy of myriad transcription factors across conditions by modeling chromatin accessibility data. Genome Res 2022; 32:1183-1198. [PMID: 35609992 PMCID: PMC9248881 DOI: 10.1101/gr.272203.120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2020] [Accepted: 05/06/2022] [Indexed: 11/24/2022]
Abstract
Over a thousand different transcription factors (TFs) bind with varying occupancy across the human genome. Chromatin immunoprecipitation (ChIP) can assay occupancy genome-wide, but only one TF at a time, limiting our ability to comprehensively observe the TF occupancy landscape, let alone quantify how it changes across conditions. We developed TF occupancy profiler (TOP), a Bayesian hierarchical regression framework, to profile genome-wide quantitative occupancy of numerous TFs using data from a single chromatin accessibility experiment (DNase- or ATAC-seq). TOP is supervised, and its hierarchical structure allows it to predict the occupancy of any sequence-specific TF, even those never assayed with ChIP. We used TOP to profile the quantitative occupancy of hundreds of sequence-specific TFs at sites throughout the genome and examined how their occupancies changed in multiple contexts: in approximately 200 human cell types, through 12 h of exposure to different hormones, and across the genetic backgrounds of 70 individuals. TOP enables cost-effective exploration of quantitative changes in the landscape of TF binding.
Affiliation(s)
- Kaixuan Luo
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
  - Department of Human Genetics, The University of Chicago, Chicago, Illinois 60637, USA
- Jianling Zhong
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
- Alexias Safi
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
- Linda K Hong
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
- Alok K Tewari
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Lingyun Song
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
- Timothy E Reddy
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Biostatistics and Bioinformatics, Durham, North Carolina 27710, USA
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
  - Department of Biomedical Engineering, Duke University, Durham, North Carolina 27708, USA
- Li Ma
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Department of Statistical Science, Duke University, Durham, North Carolina 27708, USA
- Gregory E Crawford
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
- Alexander J Hartemink
  - Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
  - Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
  - Department of Biology, Duke University, Durham, North Carolina 27708, USA
4
Villaluenga JP, Cao-García FJ. Cooperative kinetics of ligand binding to linear polymers. Comput Struct Biotechnol J 2022; 20:521-533. [PMID: 35495112 PMCID: PMC9019704 DOI: 10.1016/j.csbj.2021.12.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/27/2021] [Accepted: 12/30/2021] [Indexed: 11/29/2022]
Abstract
Cooperative kinetic equation for large ligands binding to long polymers. Cooperativity in general affects binding and release rates. Appropriate counting of the available binding sites for a ligand to a linear polymer. Positive cooperativity increases polymer coverage by the ligand. Large ligand size reduces cooperativity effects.
Ligands change the chemical and mechanical properties of polymers. In particular, single-strand binding protein (SSB) binds non-specifically to single-stranded DNA (ssDNA), modifying the ssDNA stiffness and the DNA replication rate, as recently measured with single-molecule techniques. SSB is a large ligand presenting cooperativity in some of its binding modes. We aim to develop an accurate kinetic model for the cooperative binding kinetics of large ligands. Cooperativity accounts for the changes in the affinity of a ligand to the polymer due to the presence of another bound ligand. Large ligands, attaching to several binding sites, require a detailed counting of the available binding possibilities. This counting has been done by McGhee and von Hippel to obtain the equilibrium state of the ligand-polymer complex. The same procedure allows one to obtain the kinetic equations for the cooperative binding of ligands to long polymers, for all ligand sizes. Here, we also derive approximate cooperative kinetic equations in the large ligand limit, at the leading and next-to-leading orders. We found that cooperativity is negligible at leading order and appears at next-to-leading order. Positive cooperativity (increased affinity) can originate from increased binding affinity or from decreased release affinity, implying different kinetics. Nevertheless, the equilibrium state is independent of the origin of cooperativity and only depends on the overall increase in affinity. The next-to-leading approximation is found to be accurate, particularly for small cooperativity. These results allow one to understand and characterize relevant ligand binding processes, such as the binding kinetics of SSB to ssDNA, which has been reported to affect the DNA replication rate for several SSB-polymerase pairs.
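The equilibrium counting by McGhee and von Hippel mentioned in this abstract has a well-known closed form in the non-cooperative case, and the sketch below evaluates it. It is only that standard non-cooperative isotherm, not the cooperative kinetic equations derived in the paper; the function and parameter names are hypothetical.

```python
def free_ligand_concentration(nu, k_assoc, n):
    """Free ligand concentration L at binding density nu.

    Uses the non-cooperative McGhee-von Hippel isotherm
        nu / L = K * (1 - n*nu) * ((1 - n*nu) / (1 - (n - 1)*nu)) ** (n - 1)
    where nu is bound ligands per lattice site, K is the association
    constant, and n is the number of sites covered by one ligand.
    """
    if not 0.0 < nu < 1.0 / n:
        raise ValueError("nu must lie strictly between 0 and 1/n")
    free_fraction = 1.0 - n * nu  # fraction of lattice sites left uncovered
    # Probability-like factor accounting for gaps too short to hold a ligand.
    overlap_term = (free_fraction / (1.0 - (n - 1) * nu)) ** (n - 1)
    nu_over_l = k_assoc * free_fraction * overlap_term
    return nu / nu_over_l


# Assumed illustrative values: K = 1 (arbitrary units), ligands covering 1 or 2 sites.
langmuir_case = free_ligand_concentration(0.5, 1.0, 1)
dimer_footprint = free_ligand_concentration(0.25, 1.0, 2)
```

With n = 1 the expression reduces to simple Langmuir binding, which gives a quick sanity check on the implementation.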
Affiliation(s)
- Juan P.G. Villaluenga
  - Departamento de Estructura de la Materia, Física Térmica y Electrónica, Universidad Complutense de Madrid, Plaza de Ciencias, 1, 28040 Madrid, Spain
  - Corresponding author.
- Francisco Javier Cao-García
  - Departamento de Estructura de la Materia, Física Térmica y Electrónica, Universidad Complutense de Madrid, Plaza de Ciencias, 1, 28040 Madrid, Spain
  - Instituto Madrileño de Estudios Avanzados en Nanociencia, IMDEA Nanociencia, Calle Faraday, 9, 28049 Madrid, Spain
5
Waters CT, Gisselbrecht SS, Sytnikova YA, Cafarelli TM, Hill DE, Bulyk ML. Quantitative-enhancer-FACS-seq (QeFS) reveals epistatic interactions among motifs within transcriptional enhancers in developing Drosophila tissue. Genome Biol 2021; 22:348. [PMID: 34930411 PMCID: PMC8686523 DOI: 10.1186/s13059-021-02574-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/30/2020] [Accepted: 12/10/2021] [Indexed: 11/16/2022]
Abstract
Understanding the contributions of transcription factor DNA binding sites to transcriptional enhancers is a significant challenge. We developed Quantitative enhancer-FACS-Seq for highly parallel quantification of enhancer activities from a genomically integrated reporter in Drosophila melanogaster embryos. We investigate the contributions of the DNA binding motifs of four poorly characterized TFs to the activities of twelve embryonic mesodermal enhancers. We measure quantitative changes in enhancer activity and discover a range of epistatic interactions among the motifs, both synergistic and alleviating. We find that understanding the regulatory consequences of TF binding motifs requires that they be investigated in combination across enhancer contexts.
Affiliation(s)
- Colin T Waters
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, 02138, USA
- Stephen S Gisselbrecht
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Yuliya A Sytnikova
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Tiziana M Cafarelli
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- David E Hill
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Martha L Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, 02138, USA
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
6
Mitra S, Zhong J, Tran TQ, MacAlpine DM, Hartemink AJ. RoboCOP: jointly computing chromatin occupancy profiles for numerous factors from chromatin accessibility data. Nucleic Acids Res 2021; 49:7925-7938. [PMID: 34255854 PMCID: PMC8373080 DOI: 10.1093/nar/gkab553] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/23/2020] [Revised: 05/28/2021] [Accepted: 07/08/2021] [Indexed: 01/25/2023]
Abstract
Chromatin is a tightly packaged structure of DNA and protein within the nucleus of a cell. The arrangement of different protein complexes along the DNA modulates and is modulated by gene expression. Measuring the binding locations and occupancy levels of different transcription factors (TFs) and nucleosomes is therefore crucial to understanding gene regulation. Antibody-based methods for assaying chromatin occupancy are capable of identifying the binding sites of specific DNA binding factors, but only one factor at a time. In contrast, epigenomic accessibility data like MNase-seq, DNase-seq, and ATAC-seq provide insight into the chromatin landscape of all factors bound along the genome, but with little insight into the identities of those factors. Here, we present RoboCOP, a multivariate state space model that integrates chromatin accessibility data with nucleotide sequence to jointly compute genome-wide probabilistic scores of nucleosome and TF occupancy, for hundreds of different factors. We apply RoboCOP to MNase-seq and ATAC-seq data to elucidate the protein-binding landscape of nucleosomes and 150 TFs across the yeast genome, and show that our model makes better predictions than existing methods. We also compute a chromatin occupancy profile of the yeast genome under cadmium stress, revealing chromatin dynamics associated with transcriptional regulation.
Affiliation(s)
- Sneha Mitra
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
- Jianling Zhong
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
- Trung Q Tran
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
- David M MacAlpine
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
  - Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Alexander J Hartemink
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
7
Vanzan L, Soldati H, Ythier V, Anand S, Braun SMG, Francis N, Murr R. High throughput screening identifies SOX2 as a super pioneer factor that inhibits DNA methylation maintenance at its binding sites. Nat Commun 2021; 12:3337. [PMID: 34099689 PMCID: PMC8184831 DOI: 10.1038/s41467-021-23630-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/08/2020] [Accepted: 04/27/2021] [Indexed: 12/13/2022]
Abstract
Binding of mammalian transcription factors (TFs) to regulatory regions is hindered by chromatin compaction and DNA methylation of their binding sites. Nevertheless, pioneer transcription factors (PFs), a distinct class of TFs, have the ability to access nucleosomal DNA, leading to nucleosome remodelling and enhanced chromatin accessibility. Whether PFs can bind to methylated sites and induce DNA demethylation is largely unknown. Using a highly parallelized approach to investigate PF ability to bind methylated DNA and induce DNA demethylation, we show that the interdependence between DNA methylation and TF binding is more complex than previously thought, even within a select group of TFs displaying pioneering activity; while some PFs do not affect the methylation status of their binding sites, we identified PFs that can protect DNA from methylation and others that can induce DNA demethylation at methylated binding sites. We call the latter super pioneer transcription factors (SPFs), as they are seemingly able to overcome several types of repressive epigenetic marks. Finally, while most SPFs induce TET-dependent active DNA demethylation, SOX2 binding leads to passive demethylation, an activity enhanced by the co-binding of OCT4. This finding suggests that SPFs could interfere with epigenetic memory during DNA replication.
Affiliation(s)
- Ludovica Vanzan
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
  - Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Hadrien Soldati
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Victor Ythier
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
  - Diagnostic Department, Clinical Pathology Division, University Hospital of Geneva, Geneva, Switzerland
- Santosh Anand
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
  - Department of Informatics, Systems and Communications (DISCo), University of Milano-Bicocca, Milan, Italy
- Simon M G Braun
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Nicole Francis
  - Institut de Recherches Cliniques de Montréal (IRCM) and Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montréal, Canada
- Rabih Murr
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
  - Institute for Genetics and Genomics of Geneva (iGE3), University of Geneva, Geneva, Switzerland
8
Kharerin H, Bai L. Thermodynamic modeling of genome-wide nucleosome depleted regions in yeast. PLoS Comput Biol 2021; 17:e1008560. [PMID: 33428627 PMCID: PMC7822557 DOI: 10.1371/journal.pcbi.1008560] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/16/2020] [Revised: 01/22/2021] [Accepted: 11/25/2020] [Indexed: 01/09/2023]
Abstract
Nucleosome positioning in the genome is essential for the regulation of many nuclear processes. We currently have limited capability to predict nucleosome positioning in vivo, especially the locations and sizes of nucleosome depleted regions (NDRs). Here, we present a thermodynamic model that incorporates the intrinsic affinity of histones, competitive binding of sequence-specific factors, and nucleosome remodeling to predict nucleosome positioning in budding yeast. The model shows that the intrinsic affinity of histones, at near-saturating histone concentration, is not sufficient to generate NDRs in the genome. However, the binding of a few factors, especially RSC towards GC-rich and poly(A/T) sequences, allows us to predict ~66% of genome-wide NDRs. The model also shows that nucleosome remodeling activity is required to predict the correct NDR sizes. The validity of the model was further supported by the agreement between the predicted and the measured nucleosome positioning upon factor deletion or on exogenous sequences introduced into yeast. Overall, our model quantitatively evaluated the impact of different genetic components on NDR formation and illustrated the vital roles of sequence-specific factors and nucleosome remodeling in this process.
The nucleosome is the basic unit of chromatin, containing 147 base pairs of DNA wrapped around a histone core. The positioning of nucleosomes, i.e., which parts of DNA are inside nucleosomes and which parts are nucleosome-free, is highly regulated. In particular, regulatory sequences tend to be exposed in nucleosome-depleted regions (NDRs), and such exposure is crucial for a variety of processes including DNA replication, repair, and gene expression. Here, we used a thermodynamic model to predict nucleosome positioning on the yeast genome. The model shows that the intrinsic sequence preference of histones is not sufficient to generate NDRs. In contrast, binding of a few transcription factors, especially RSC, is largely responsible for NDR formation. Nucleosome remodeling activity is also required in the model to recapitulate the NDR sizes. This model contributes to our understanding of the mechanisms that regulate nucleosome positioning. It can also be used to predict nucleosome positioning in mutant yeast or on novel DNA sequences.
Affiliation(s)
- Hungyo Kharerin
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Lu Bai
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Physics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
9
Sönmezer C, Kleinendorst R, Imanci D, Barzaghi G, Villacorta L, Schübeler D, Benes V, Molina N, Krebs AR. Molecular Co-occupancy Identifies Transcription Factor Binding Cooperativity In Vivo. Mol Cell 2020; 81:255-267.e6. [PMID: 33290745 DOI: 10.1016/j.molcel.2020.11.015] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 06/10/2020] [Revised: 11/04/2020] [Accepted: 11/09/2020] [Indexed: 01/18/2023]
Abstract
Gene activation requires the cooperative activity of multiple transcription factors at cis-regulatory elements (CREs). Yet, most transcription factors have short residence time, questioning the requirement of their physical co-occupancy on DNA to achieve cooperativity. Here, we present a DNA footprinting method that detects individual molecular interactions of transcription factors and nucleosomes with DNA in vivo. We apply this strategy to quantify the simultaneous binding of multiple transcription factors on single DNA molecules at mouse CREs. Analysis of the binary occupancy patterns at thousands of motif combinations reveals that high DNA co-occupancy occurs for most types of transcription factors, in the absence of direct physical interaction, at sites of competition with nucleosomes. Perturbation of pairwise interactions demonstrates the function of molecular co-occupancy in binding cooperativity. Our results reveal the interactions regulating CREs at molecular resolution and identify DNA co-occupancy as a widespread cooperativity mechanism used by transcription factors to remodel chromatin.
Affiliation(s)
- Can Sönmezer
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
  - Faculty of Biosciences, Collaboration for Joint PhD Degree between EMBL and Heidelberg University, Heidelberg, Germany
- Rozemarijn Kleinendorst
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Dilek Imanci
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Guido Barzaghi
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
  - Faculty of Biosciences, Collaboration for Joint PhD Degree between EMBL and Heidelberg University, Heidelberg, Germany
- Laura Villacorta
  - European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Dirk Schübeler
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
  - University of Basel, Faculty of Sciences, Petersplatz 1, 4001 Basel, Switzerland
- Vladimir Benes
  - European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Nacho Molina
  - Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Université de Strasbourg-CNRS-INSERM, 1 rue Laurent Fries, 67404 Illkirch, France
- Arnaud Regis Krebs
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
10
Chen H, Liang H. A High-Resolution Map of Human Enhancer RNA Loci Characterizes Super-enhancer Activities in Cancer. Cancer Cell 2020; 38:701-715.e5. [PMID: 33007258 PMCID: PMC7658066 DOI: 10.1016/j.ccell.2020.08.020] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 07/09/2019] [Revised: 07/21/2020] [Accepted: 08/28/2020] [Indexed: 12/20/2022]
Abstract
Although enhancers play critical roles in cancer, quantifying enhancer activities in clinical samples remains challenging, especially for super-enhancers. Enhancer activities can be inferred from enhancer RNA (eRNA) signals, which requires enhancer transcription loci definition. Only a small proportion of human eRNA loci has been precisely identified, limiting investigations of enhancer-mediated oncogenic mechanisms. Here, we characterize super-enhancer regions using aggregated RNA sequencing (RNA-seq) data from large cohorts. Super-enhancers usually contain discrete loci featuring sharp eRNA expression peaks. We identify >300,000 eRNA loci in ∼377 Mb super-enhancer regions that are regulated by evolutionarily conserved, well-positioned nucleosomes and are frequently dysregulated in cancer. The eRNAs provide explanatory power for cancer phenotypes beyond that provided by mRNA expression through resolving intratumoral heterogeneity with enhancer cell-type specificity. Our study provides a high-resolution map of eRNA loci through which super-enhancer activities can be quantified by RNA-seq and a user-friendly data portal, enabling a broad range of biomedical investigations.
Affiliation(s)
- Han Chen
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Han Liang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
11
Saxton MJ. Diffusion of DNA-Binding Species in the Nucleus: A Transient Anomalous Subdiffusion Model. Biophys J 2020; 118:2151-2167. [PMID: 32294478 PMCID: PMC7203007 DOI: 10.1016/j.bpj.2020.03.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/20/2019] [Revised: 02/28/2020] [Accepted: 03/16/2020] [Indexed: 12/21/2022]
Abstract
Single-particle tracking experiments have measured escape times of DNA-binding species diffusing in living cells: CRISPR-Cas9, TetR, and LacI. The observed distribution is a truncated power law. Working backward from the experimental results, the observed distribution appears inconsistent with a Gaussian distribution of binding energies. Working forward, the observed distribution leads to transient anomalous subdiffusion, in which diffusion is anomalous at short times and normal at long times, here only mildly anomalous. Monte Carlo simulations are used to characterize the time-dependent diffusion coefficient D(t) in terms of the anomalous exponent α, the crossover time tcross, and the limits D(0) and D(∞) and to relate these quantities to the escape time distribution. The simplest interpretations identify the escape time as the actual binding time to DNA or the period of one-dimensional diffusion on DNA in the standard model combining one-dimensional and three-dimensional search, but a more complicated interpretation may be required. The model has several implications for cell biophysics. 1) The initial anomalous regime represents the search of the DNA-binding species for its target DNA sequence. 2) Non-target DNA sites have a significant effect on search kinetics. False positives in bioinformatic searches of the genome are potentially rate-determining in vivo. For simple binding, the search would be speeded if false-positive sequences were eliminated from the genome. 3) Both binding and obstruction affect diffusion. Obstruction ought to be measured directly, using as the primary probe the DNA-binding species with the binding site inactivated and eGFP as a calibration standard among laboratories and cell types. 4) Overexpression of the DNA-binding species reduces anomalous subdiffusion because the deepest binding sites are occupied and unavailable. 5) The model provides a coarse-grained phenomenological description of diffusion of a DNA-binding species, useful in larger-scale modeling of kinetics, FCS, and FRAP.
Affiliation(s)
- Michael J Saxton
- Department of Biochemistry and Molecular Medicine, University of California, Davis, California.
|
12
|
Mitra S, Zhong J, MacAlpine DM, Hartemink AJ. RoboCOP: Multivariate State Space Model Integrating Epigenomic Accessibility Data to Elucidate Genome-Wide Chromatin Occupancy. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY : ... ANNUAL INTERNATIONAL CONFERENCE, RECOMB ... : PROCEEDINGS. RECOMB (CONFERENCE : 2005- ) 2020; 12074:136-151. [PMID: 34386808 PMCID: PMC8356533 DOI: 10.1007/978-3-030-45257-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Chromatin is the tightly packaged structure of DNA and protein within the nucleus of a cell. The arrangement of different protein complexes along the DNA modulates and is modulated by gene expression. Measuring the binding locations and level of occupancy of different transcription factors (TFs) and nucleosomes is therefore crucial to understanding gene regulation. Antibody-based methods for assaying chromatin occupancy are capable of identifying the binding sites of specific DNA binding factors, but only one factor at a time. On the other hand, epigenomic accessibility data like ATAC-seq, DNase-seq, and MNase-seq provide insight into the chromatin landscape of all factors bound along the genome, but with minimal insight into the identities of those factors. Here, we present RoboCOP, a multivariate state space model that integrates chromatin information from epigenomic accessibility data with nucleotide sequence to compute genome-wide probabilistic scores of nucleosome and TF occupancy, for hundreds of different factors at once. RoboCOP can be applied to any epigenomic dataset that provides quantitative insight into chromatin accessibility in any organism, but here we apply it to MNase-seq data to elucidate the protein-binding landscape of nucleosomes and 150 TFs across the yeast genome. Using available protein-binding datasets from the literature, we show that our model more accurately predicts the binding of these factors genome-wide.
Affiliation(s)
- Sneha Mitra
- Department of Computer Science, Duke University, Durham, NC 27708, USA
| | - Jianling Zhong
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
| | - David M MacAlpine
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
- Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
| | - Alexander J Hartemink
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
|
13
|
Peng PC, Khoueiry P, Girardot C, Reddington JP, Garfield DA, Furlong EEM, Sinha S. The Role of Chromatin Accessibility in cis-Regulatory Evolution. Genome Biol Evol 2020; 11:1813-1828. [PMID: 31114856 PMCID: PMC6601868 DOI: 10.1093/gbe/evz103] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 05/13/2019] [Indexed: 02/07/2023]
Abstract
Transcription factor (TF) binding is determined by sequence as well as chromatin accessibility. Although the role of accessibility in shaping TF-binding landscapes is well recorded, its role in evolutionary divergence of TF binding, which in turn can alter cis-regulatory activities, is not well understood. In this work, we studied the evolution of genome-wide binding landscapes of five major TFs in the core network of mesoderm specification, between Drosophila melanogaster and Drosophila virilis, and examined its relationship to accessibility and sequence-level changes. We generated chromatin accessibility data from three important stages of embryogenesis in both Drosophila melanogaster and Drosophila virilis and recorded conservation and divergence patterns. We then used multivariable models to correlate accessibility and sequence changes to TF-binding divergence. We found that accessibility changes can in some cases, for example, for the master regulator Twist and for earlier developmental stages, more accurately predict binding change than is possible using TF-binding motif changes between orthologous enhancers. Accessibility changes also explain a significant portion of the codivergence of TF pairs. We noted that accessibility and motif changes offer complementary views of the evolution of TF binding and developed a combined model that captures the evolutionary data much more accurately than either view alone. Finally, we trained machine learning models to predict enhancer activity from TF binding and used these functional models to argue that motif and accessibility-based predictors of TF-binding change can substitute for experimentally measured binding change, for the purpose of predicting evolutionary changes in enhancer activity.
Affiliation(s)
- Pei-Chen Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign.,Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA
| | - Pierre Khoueiry
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.,American University of Beirut (AUB), Department of Biochemistry and Molecular Genetics, Beirut, Lebanon
| | - Charles Girardot
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - James P Reddington
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - David A Garfield
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.,IRI-Life Sciences, Humboldt Universität zu Berlin, Berlin, Germany
| | - Eileen E M Furlong
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign.,Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign
|
14
|
Nagarajan N, Yapp EKY, Le NQK, Kamaraj B, Al-Subaie AM, Yeh HY. Application of Computational Biology and Artificial Intelligence Technologies in Cancer Precision Drug Discovery. BIOMED RESEARCH INTERNATIONAL 2019; 2019:8427042. [PMID: 31886259 PMCID: PMC6925679 DOI: 10.1155/2019/8427042] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 08/30/2019] [Accepted: 10/14/2019] [Indexed: 02/08/2023]
Abstract
Artificial intelligence (AI) has enormous potential in many areas of healthcare, including research and chemical discoveries. Using large amounts of aggregated data, AI can discover and learn, further transforming these data into "usable" knowledge. Being well aware of this, the world's leading pharmaceutical companies have already begun to use artificial intelligence to improve their research regarding new drugs. The goal is to exploit modern computational biology and machine learning systems to predict the molecular behaviour and the likelihood of getting a useful drug, thus saving time and money on unnecessary tests. Clinical studies, electronic medical records, high-resolution medical images, and genomic profiles can be used as resources to aid drug development. Pharmaceutical and medical researchers have extensive data sets that can be analyzed by strong AI systems. This review focuses on how computational biology and artificial intelligence technologies can be implemented by integrating the knowledge of cancer drugs, drug resistance, next-generation sequencing, genetic variants, and structural biology in cancer precision drug discovery.
Affiliation(s)
| | - Edward K. Y. Yapp
- Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, Singapore 138634
| | - Nguyen Quoc Khanh Le
- School of Humanities, Nanyang Technological University, 14 Nanyang Dr, Singapore 637332
| | - Balu Kamaraj
- Department of Neuroscience Technology, College of Applied Medical Sciences, Imam Abdulrahman Bin Faisal University, Jubail 35816, Saudi Arabia
| | - Abeer Mohammed Al-Subaie
- Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Hui-Yuan Yeh
- School of Humanities, Nanyang Technological University, 14 Nanyang Dr, Singapore 637332
|
15
|
Rudnizky S, Khamis H, Malik O, Squires AH, Meller A, Melamed P, Kaplan A. Single-molecule DNA unzipping reveals asymmetric modulation of a transcription factor by its binding site sequence and context. Nucleic Acids Res 2019; 46:1513-1524. [PMID: 29253225 PMCID: PMC5815098 DOI: 10.1093/nar/gkx1252] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/23/2017] [Accepted: 12/11/2017] [Indexed: 12/31/2022]
Abstract
Most functional transcription factor (TF) binding sites deviate from their ‘consensus’ recognition motif, although their sites and flanking sequences are often conserved across species. Here, we used single-molecule DNA unzipping with optical tweezers to study how Egr-1, a TF harboring three zinc fingers (ZF1, ZF2 and ZF3), is modulated by the sequence and context of its functional sites in the Lhb gene promoter. We find that both the core 9 bp bound to Egr-1 in each of the sites, and the base pairs flanking them, modulate the affinity and structure of the protein–DNA complex. The effect of the flanking sequences is asymmetric, with a stronger effect for the sequence flanking ZF3. Characterization of the dissociation time of Egr-1 revealed that a local, mechanical perturbation of the interactions of ZF3 destabilizes the complex more effectively than a perturbation of the ZF1 interactions. Our results reveal a novel role for ZF3 in the interaction of Egr-1 with other proteins and the DNA, providing insight on the regulation of Lhb and other genes by Egr-1. Moreover, our findings reveal the potential of small changes in DNA sequence to alter transcriptional regulation, and may shed light on the organization of regulatory elements at promoters.
Affiliation(s)
- Sergei Rudnizky
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
| | - Hadeel Khamis
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel.,Faculty of Physics, Technion-Israel Institute of Technology, Haifa 32000, Israel
| | - Omri Malik
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel.,Russell Berrie Nanotechnology Institute, Technion-Israel Institute of Technology, Haifa 32000, Israel
| | - Allison H Squires
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
| | - Amit Meller
- Russell Berrie Nanotechnology Institute, Technion-Israel Institute of Technology, Haifa 32000, Israel.,Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA.,Faculty of Biomedical Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel
| | - Philippa Melamed
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel.,Russell Berrie Nanotechnology Institute, Technion-Israel Institute of Technology, Haifa 32000, Israel
| | - Ariel Kaplan
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel.,Russell Berrie Nanotechnology Institute, Technion-Israel Institute of Technology, Haifa 32000, Israel
|
16
|
Gutiérrez MP, MacAlpine HK, MacAlpine DM. Nascent chromatin occupancy profiling reveals locus- and factor-specific chromatin maturation dynamics behind the DNA replication fork. Genome Res 2019; 29:1123-1133. [PMID: 31217252 PMCID: PMC6633257 DOI: 10.1101/gr.243386.118] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 08/24/2018] [Accepted: 05/28/2019] [Indexed: 01/06/2023]
Abstract
Proper regulation and maintenance of the epigenome is necessary to preserve genome function. However, in every cell division, the epigenetic state is disassembled and then reassembled in the wake of the DNA replication fork. Chromatin restoration on nascent DNA is a complex and regulated process that includes nucleosome assembly and remodeling, deposition of histone variants, and the re-establishment of transcription factor binding. To study the genome-wide dynamics of chromatin restoration behind the DNA replication fork, we developed nascent chromatin occupancy profiles (NCOPs) to comprehensively profile nascent and mature chromatin at nucleotide resolution. Although nascent chromatin is inherently less organized than mature chromatin, we identified locus-specific differences in the kinetics of chromatin maturation that were predicted by the epigenetic landscape, including the histone variant H2AZ, which marked loci with rapid maturation kinetics. The chromatin maturation at origins of DNA replication was dependent on whether the origin underwent initiation or was passively replicated from distal-originating replication forks, suggesting distinct chromatin assembly mechanisms surrounding activated and disassembled prereplicative complexes. Finally, we identified sites that were only occupied transiently by DNA-binding factors following passage of the replication fork, which may provide a mechanism for perturbations of the DNA replication program to shape the regulatory landscape of the genome.
Affiliation(s)
- Mónica P Gutiérrez
- University Program in Genetics and Genomics, Duke University Medical Center, Durham, North Carolina 27710, USA
| | - Heather K MacAlpine
- Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, North Carolina 27710, USA
| | - David M MacAlpine
- University Program in Genetics and Genomics, Duke University Medical Center, Durham, North Carolina 27710, USA
- Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, North Carolina 27710, USA
|
17
|
Digital Mapping of Soil Classes Using Ensemble of Models in Isfahan Region, Iran. SOIL SYSTEMS 2019. [DOI: 10.3390/soilsystems3020037] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]
Abstract
Digital soil maps can be used to depict the ability of soil to fulfill certain functions. Digital maps offer reliable information that can be used in spatial planning programs. Several broad types of data mining approaches through Digital Soil Mapping (DSM) have been tested. The usual approach is to select a model that produces the best validation statistics. However, instead of choosing the best model, it is possible to combine all models realizing their strengths and weaknesses. We applied seven different techniques for the prediction of soil classes based on 194 sites located in Isfahan region. The mapping exercise aims to produce a soil class map that can be used for better understanding and management of soil resources. The models used in this study include Multinomial Logistic Regression (MnLR), Artificial Neural Networks (ANN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Bayesian Networks (BN), and Sparse Multinomial Logistic Regression (SMnLR). Two ensemble models based on majority votes (Ensemble.1) and MnLR (Ensemble.2) were implemented for integrating the optimal aspects of the individual techniques. The overall accuracy (OA), Cohen's kappa coefficient index (κ) and the area under the curve (AUC) were calculated based on 10-fold-cross validation with 100 repeats at four soil taxonomic levels. The Ensemble.2 model was able to achieve larger OA, κ coefficient and AUC compared to the best performing individual model (i.e., RF). Results of the ensemble model showed a decreasing trend in OA from Order (0.90) to Subgroup (0.53). This was also the case for the κ statistic, which was the largest for the Order (0.66) and smallest for the Subgroup (0.43). Same decrease was observed for AUC from Order (0.81) to Subgroup (0.67). The improvement in κ was substantial (43 to 60%) at all soil taxonomic levels, except the Order level. 
We conclude that the application of the ensemble model using the MnLR was optimal, as it provided a highly accurate prediction for all soil taxonomic levels over and above the individual models. It also used information from all models, and thus this method can be recommended for improved soil class modelling. Soil maps created by this DSM approach showed soils that are prone to degradation and need to be carefully managed and conserved to avoid further land degradation.
|
18
|
Ma X, Ezer D, Adryan B, Stevens TJ. Canonical and single-cell Hi-C reveal distinct chromatin interaction sub-networks of mammalian transcription factors. Genome Biol 2018; 19:174. [PMID: 30359306 PMCID: PMC6203279 DOI: 10.1186/s13059-018-1558-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/10/2018] [Accepted: 10/04/2018] [Indexed: 12/20/2022]
Abstract
Background: Transcription factor (TF) binding to regulatory DNA sites is a key determinant of cell identity within multi-cellular organisms and has been studied extensively in relation to site affinity and chromatin modifications. There has been a strong focus on the inference of TF-gene regulatory networks and TF-TF physical interaction networks. Here, we present a third type of TF network, the spatial network of co-localized TF binding sites within the three-dimensional genome. Results: Using published canonical Hi-C data and single-cell genome structures, we assess the spatial proximity of a genome-wide array of potential TF-TF co-localizations in human and mouse cell lines. For individual TFs, the abundance of occupied binding sites shows a positive correspondence with their clustering in three dimensions, and this is especially apparent for weak TF binding sites and at enhancer regions. An analysis between different TF proteins identifies significantly proximal pairs, which are enriched in reported physical interactions. Furthermore, clustering of different TFs based on proximity enrichment identifies two partially segregated co-localization sub-networks, involving different TFs in different cell types. Using data from both human lymphoblastoid cells and mouse embryonic stem cells, we find that these sub-networks are enriched within, but not exclusive to, different chromosome sub-compartments that have been identified previously in Hi-C data. Conclusions: This suggests that the association of TFs within spatial networks is closely coupled to gene regulatory networks. This applies to both differentiated and undifferentiated cells and is a potential causal link between lineage-specific TF binding and chromosome sub-compartment segregation.
Affiliation(s)
- Xiaoyan Ma
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
| | - Daphne Ezer
- The Alan Turing Institute for Data Science, British Library, 96 Euston Rd, Kings Cross, London, NW1 2DB, UK.,Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK
| | - Boris Adryan
- Merck KGaA, Chief Digital Office, 64293, Darmstadt, Germany
| | - Tim J Stevens
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge, CB2 0QH, UK.
|
19
|
Abstract
Next Generation Sequencing (NGS) or deep sequencing technology enables parallel reading of multiple individual DNA fragments, thereby enabling the identification of millions of base pairs in several hours. Recent research has clearly shown that machine learning technologies can efficiently analyse large sets of genomic data and help to identify novel gene functions and regulation regions. A deep artificial neural network consists of a group of artificial neurons that mimic the properties of living neurons. These mathematical models, termed Artificial Neural Networks (ANN), can be used to solve artificial intelligence engineering problems in several different technological fields (e.g., biology, genomics, proteomics, and metabolomics). In practical terms, neural networks are non-linear statistical structures that are organized as modelling tools and are used to simulate complex genomic relationships between inputs and outputs. To date, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNN) have been demonstrated to be the best tools for improving performance in problem solving tasks within the genomic field.
|
20
|
Tan H, Liu T, Zhang J, Zhou T. Random positioning of nucleosomes enhances heritable bistability. MOLECULAR BIOSYSTEMS 2017; 13:132-141. [PMID: 27833942 DOI: 10.1039/c6mb00729e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
Chromosomal regions are often dynamically modified by histones, leading to the uncertainty of nucleosome positions. Experiments have provided evidence for this randomness, but it is unclear how it impacts epigenetic heritability. Here, by analyzing a mechanic model at the molecular level, which considers three representative types of nucleosomes (unmodified, methylated, and acetylated) and dynamic nucleosome modifications, we find that in contrast to the equidistance partition of nucleosomes, random partition can significantly enhance heritable bistability. Moreover, the more "chaotic" the nucleosome positions are, the better the heritable bistability is, in contrast to the previous view. In both cases of nucleosome positioning, heritable bistability occurs only when the total nucleosome number is beyond a threshold, and it depends strongly on the allocation rate that enzymes regulate transitions between different nucleosome types. Thus, we conclude that random positioning of nucleosomes is an unneglectable factor impacting heritable bistability. A point worth mentioning is that our model established on a master equation can easily be extended to include other more complex processes underlying dynamic nucleosome modifications.
Affiliation(s)
- Heli Tan
- School of Mathematics, Sun Yat-Sen University, Guangzhou 510275, P. R. China. and School of Mathematics and Computational Science, Xiangtan University, XiangTan 411105, P. R. China
| | - Tuoqi Liu
- School of Mathematics, Sun Yat-Sen University, Guangzhou 510275, P. R. China.
| | - Jiajun Zhang
- School of Mathematics, Sun Yat-Sen University, Guangzhou 510275, P. R. China.
| | - Tianshou Zhou
- School of Mathematics, Sun Yat-Sen University, Guangzhou 510275, P. R. China.
|
21
|
Inherent limitations of probabilistic models for protein-DNA binding specificity. PLoS Comput Biol 2017; 13:e1005638. [PMID: 28686588 PMCID: PMC5521849 DOI: 10.1371/journal.pcbi.1005638] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 04/12/2017] [Revised: 07/21/2017] [Accepted: 06/21/2017] [Indexed: 01/10/2023]
Abstract
The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions but rather are caused by the non-linear relationship between binding affinity and binding probability and the fact that independent normalization at each position skews the site probabilities. Generally probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy that should be used whenever possible. Transcription factors (TFs), a class of DNA-binding proteins, play a central role in the regulation of gene expression. TFs control the rate of transcription by binding to the genome in a sequence-specific manner. Thus, one important aspect in the study of gene regulation mechanism is to model the binding specificities of TFs, namely the features of the DNA sequences that a TF prefers to bind. Multiple models have been proposed to characterize the binding specificities of TFs, among which the class of probabilistic models is the most popular. In this study, we point out several major limitations of the well-established probabilistic model by comparing it with the biophysical model. Through simulations we demonstrate that the probabilistic model is only an approximation of the biophysical model. The latter has most of the advantages of the former, and is a more accurate representation of binding specificities. 
We propose a shift from the probabilistic model to the biophysical model in future studies of protein-DNA interactions.
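The non-linear relationship between binding affinity and binding probability mentioned in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-position energy table `ENERGY`, the function names, and the use of a single chemical-potential parameter `mu` (in units of kT) to stand in for protein concentration. The probabilistic score gives a fixed ratio between two sites, while the occupancy ratio shrinks toward 1 as `mu` rises and both sites saturate.

```python
import math

# Hypothetical per-position binding energies in units of kT (illustrative only).
# An energy of 0 marks the preferred base at that position.
ENERGY = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0},
]


def site_energy(seq):
    """Total energy of a site, assuming positions contribute additively."""
    return sum(column[base] for column, base in zip(ENERGY, seq))


def pwm_probability(seq):
    """Probabilistic score: product of independently normalized column probabilities."""
    prob = 1.0
    for column, base in zip(ENERGY, seq):
        norm = sum(math.exp(-e) for e in column.values())
        prob *= math.exp(-column[base]) / norm
    return prob


def occupancy(seq, mu):
    """Biophysical score: equilibrium occupancy 1 / (1 + exp(E - mu))."""
    return 1.0 / (1.0 + math.exp(site_energy(seq) - mu))


if __name__ == "__main__":
    consensus, mismatch = "ACG", "CCG"
    print("PWM ratio:", pwm_probability(consensus) / pwm_probability(mismatch))
    for mu in (-5.0, 0.0, 5.0):
        ratio = occupancy(consensus, mu) / occupancy(mismatch, mu)
        print(f"mu={mu:+.0f}  occupancy ratio: {ratio:.3f}")
```

In this toy setting the two scores agree only at low `mu`, which is one way to read the abstract's remark that the probabilistic model is least accurate for the highest-affinity sites at high protein concentration.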
|
22
|
Peng PC, Sinha S. Quantitative modeling of gene expression using DNA shape features of binding sites. Nucleic Acids Res 2016; 44:e120. [PMID: 27257066 PMCID: PMC5291265 DOI: 10.1093/nar/gkw446] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/02/2015] [Revised: 05/06/2016] [Accepted: 05/09/2016] [Indexed: 12/11/2022]
Abstract
Prediction of gene expression levels driven by regulatory sequences is pivotal in genomic biology. A major focus in transcriptional regulation is sequence-to-expression modeling, which interprets the enhancer sequence based on transcription factor concentrations and DNA binding specificities and predicts precise gene expression levels in varying cellular contexts. Such models largely rely on the position weight matrix (PWM) model for DNA binding, and the effect of alternative models based on DNA shape remains unexplored. Here, we propose a statistical thermodynamics model of gene expression using DNA shape features of binding sites. We used rigorous methods to evaluate the fits of expression readouts of 37 enhancers regulating spatial gene expression patterns in Drosophila embryo, and show that DNA shape-based models perform arguably better than PWM-based models. We also observed that DNA shape captures information complementary to the PWM, in a way that is useful for expression modeling. Furthermore, we tested if combining shape and PWM-based features provides better predictions than using either binding model alone. Our work demonstrates that the increasingly popular DNA-binding models based on local DNA shape can be useful in sequence-to-expression modeling. It also provides a framework for future studies to predict gene expression better than with PWM models alone.
Affiliation(s)
- Pei-Chen Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
23
|
Knox DA, Dowell RD. A Modeling Framework for Generation of Positional and Temporal Simulations of Transcriptional Regulation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:459-471. [PMID: 27295631 DOI: 10.1109/tcbb.2015.2459708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
We present a modeling framework aimed at capturing both the positional and temporal behavior of transcriptional regulatory proteins in eukaryotic cells. There is growing evidence that transcriptional regulation is the complex behavior that emerges not solely from the individual components, but rather from their collective behavior, including competition and cooperation. Our framework describes individual regulatory components using generic action oriented descriptions of their biochemical interactions with a DNA sequence. All the possible actions are based on the current state of factors bound to the DNA. We developed a rule builder to automatically generate the complete set of biochemical interaction rules for any given DNA sequence. Off-the-shelf stochastic simulation engines can model the behavior of a system of rules and the resulting changes in the configuration of bound factors can be visualized. We compared our model to experimental data at well-studied loci in yeast, confirming that our model captures both the positional and temporal behavior of transcriptional regulation.
Affiliation(s)
- David A Knox
- Computational Bioscience Program, University of Colorado, School of Medicine, Anschutz Medical Campus, Aurora, CO
| | - Robin D Dowell
- Molecular, Cellular, Developmental Biology Department, BioFrontiers Institute, University of Colorado, Boulder, CO
|
24
|
Bottani S, Veitia RA. Hill function-based models of transcriptional switches: impact of specific, nonspecific, functional and nonfunctional binding. Biol Rev Camb Philos Soc 2016; 92:953-963. [PMID: 27061969 DOI: 10.1111/brv.12262] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/23/2015] [Revised: 02/12/2016] [Accepted: 02/16/2016] [Indexed: 12/25/2022]
Abstract
We explore minimalist models of transcription in which we take into account that a cis-regulatory sequence is embedded in, and interacts with, a complex genome. The classical Hill equation is the simplest way to represent a transcriptional response. However, it may overlook the fact that a transcription factor (TF) establishes specific and nonspecific nonfunctional interactions with chromatin. Classical papers have shown that nonfunctional binding (not leading to transcription) may influence gene expression. We examine how the presence of additional binding sites for a TF, besides those on the gene(s) of interest, affect the shape and parameters of the transcriptional response. We consider two conditions: at equilibrium and at steady-state. In many cases the TF level is determined by the position of the cell within a spatial or temporal gradient. We show that such gradients can be adjusted by evolutionary selection to compensate for the alteration of the gene transcription response by the presence of nonfunctional binding sites. Finally, we analyse how the transcriptional response is affected by a decrease in TF concentration, as in cases of haploinsufficiency. We show that the nonlinearity of the transcriptional response as a function of [TF] exacerbates the effect of a decrease in the latter, at least for weakly expressed TFs. Although decades of work on TFs have led to the impression that almost everything is known about the control of gene expression, we show that even the simplest models of transcription control have not delivered all their secrets yet.
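As a rough illustration of how additional nonfunctional binding sites can alter a Hill-type response, the sketch below titrates a transcription factor against a pool of decoy sites before applying the classical Hill equation. It is a simplified equilibrium picture under stated assumptions, not the model from the paper: the function names, the single decoy dissociation constant `kd_decoy`, and the treatment of `n_decoys` as a concentration in the same arbitrary units as the TF are all hypothetical.

```python
import math


def hill(tf_free, k, n):
    """Classical Hill response to the free TF concentration."""
    return tf_free**n / (k**n + tf_free**n)


def free_tf(tf_total, n_decoys, kd_decoy):
    """Free TF left after equilibrium binding to decoy sites.

    Solves tf_total = f + n_decoys * f / (kd_decoy + f) for f, which
    rearranges to f^2 + (kd_decoy + n_decoys - tf_total) * f - kd_decoy * tf_total = 0.
    The positive root of this quadratic is returned.
    """
    b = kd_decoy + n_decoys - tf_total
    return (-b + math.sqrt(b * b + 4.0 * kd_decoy * tf_total)) / 2.0


def response(tf_total, k, n, n_decoys=0.0, kd_decoy=1.0):
    """Hill response as a function of total TF, with optional decoy sites."""
    return hill(free_tf(tf_total, n_decoys, kd_decoy), k, n)


if __name__ == "__main__":
    for total in (2.0, 5.0, 10.0, 20.0):
        without = response(total, k=10.0, n=2)
        with_decoys = response(total, k=10.0, n=2, n_decoys=5.0)
        print(f"TF={total:5.1f}  no decoys: {without:.3f}  with decoys: {with_decoys:.3f}")
```

With no decoys the free and total concentrations coincide and the curve is the plain Hill function. Adding decoys lowers the free concentration at every total TF level, which shifts the apparent half-maximal point to higher total TF.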
Affiliation(s)
- Samuel Bottani
- Matière et Systèmes Complexes CNRS UMR 7057, 75013 Paris, France; Université Paris Diderot, Sorbonne Paris Cité, 75013 Paris, France
- Reiner A Veitia
- Université Paris Diderot, Sorbonne Paris Cité, 75013 Paris, France; Institut Jacques Monod, CNRS UMR 7592, 75013 Paris, France
|
25
|
Boeva V. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells. Front Genet 2016; 7:24. [PMID: 26941778 PMCID: PMC4763482 DOI: 10.3389/fgene.2016.00024] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Received: 10/30/2015] [Accepted: 02/05/2016] [Indexed: 12/27/2022] Open
Abstract
Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.
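The position weight matrix model mentioned in this abstract can be sketched in a few lines of Python. The example below is a generic log-odds PWM scan, not a method taken from the review: the count matrix for a hypothetical "TGAC" motif, the uniform background of 0.25, the pseudocount, and the function names are all assumptions, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores.

    counts: list of dicts mapping base -> count, one dict per motif position.
    A uniform background frequency is assumed (hypothetical simplification).
    """
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
             for b in BASES}
        )
    return pwm


def scan(seq, pwm):
    """Score every forward-strand window of the sequence against the PWM."""
    w = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(w)))
        for i in range(len(seq) - w + 1)
    ]


# Hypothetical count matrix built from eight aligned copies of "TGAC".
example_counts = [{"T": 8}, {"G": 8}, {"A": 8}, {"C": 8}]
example_pwm = pwm_from_counts(example_counts)

if __name__ == "__main__":
    hits = scan("AATGACGG", example_pwm)
    print(max(hits, key=lambda hit: hit[1]))  # best-scoring window
```

In practice a score threshold (or a p-value derived from the background model) decides which windows count as putative binding sites; that choice is left out of this sketch.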
Affiliation(s)
- Valentina Boeva
- Centre de Recherche, Institut Curie, Paris, France; INSERM, U900, Paris, France; Mines ParisTech, Fontainebleau, France; PSL Research University, Paris, France; Department of Development, Reproduction and Cancer, Institut Cochin, Paris, France; INSERM, U1016, Paris, France; Centre National de la Recherche Scientifique UMR 8104, Paris, France; Université Paris Descartes UMR-S1016, Paris, France
|
26
|
Abstract
Although deoxyribonuclease I (DNase I) was used to probe the structure of the nucleosome in the 1960s and 1970s, in the current high-throughput sequencing era, DNase I has mainly been used to study genomic regions devoid of nucleosomes. Here, we reveal for the first time that DNase I can be used to precisely map the (translational) positions of in vivo nucleosomes genome-wide. Specifically, exploiting a distinctive DNase I cleavage profile within nucleosome-associated DNA—including a signature 10.3 base pair oscillation that corresponds to accessibility of the minor groove as DNA winds around the nucleosome—we develop a Bayes-factor–based method that can be used to map nucleosome positions along the genome. Compared to methods that require genetically modified histones, our DNase-based approach is easily applied in any organism, which we demonstrate by producing maps in yeast and human. Compared to micrococcal nuclease (MNase)-based methods that map nucleosomes based on cuts in linker regions, we utilize DNase I cuts both outside and within nucleosomal DNA; the oscillatory nature of the DNase I cleavage profile within nucleosomal DNA enables us to identify translational positioning details not apparent in MNase digestion of linker DNA. Because the oscillatory pattern corresponds to nucleosome rotational positioning, it also reveals the rotational context of transcription factor (TF) binding sites. We show that potential binding sites within nucleosome-associated DNA are often centered preferentially on an exposed major or minor groove. This preferential localization may modulate TF interaction with nucleosome-associated DNA as TFs search for binding sites.
|
27
|
|
28
|
Maeso I, Tena JJ. Favorable genomic environments for cis-regulatory evolution: A novel theoretical framework. Semin Cell Dev Biol 2015; 57:2-10. [PMID: 26673387 DOI: 10.1016/j.semcdb.2015.12.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 09/21/2015] [Revised: 12/02/2015] [Accepted: 12/05/2015] [Indexed: 12/22/2022]
Abstract
Cis-regulatory changes are arguably the primary evolutionary source of animal morphological diversity. With the recent explosion of genome-wide comparisons of the cis-regulatory content in different animal species, it is now possible to infer general principles underlying enhancer evolution. However, these studies have also revealed numerous discrepancies and paradoxes, suggesting that the mechanistic causes and modes of cis-regulatory evolution are still not well understood and are probably much more complex than generally appreciated. Here, we argue that the mutational mechanisms and genomic regions generating new regulatory activities must comply with the constraints imposed by the molecular properties of cis-regulatory elements (CREs) and the organizational features of long-range chromatin interactions. Accordingly, we propose a new integrative evolutionary framework for cis-regulatory evolution based on two major premises for the origin of novel enhancer activity: (i) an accessible chromatin environment and (ii) compatibility with the 3D structure and interactions of pre-existing CREs. Mechanisms and DNA sequences not fulfilling these premises will be less likely to have a measurable impact on gene expression and, as such, will make only a minor contribution to the evolution of gene regulation. Finally, we discuss current comparative cis-regulatory data in the light of this new evolutionary model, and propose that the two most prominent mechanisms for the evolution of cis-regulatory changes are the overprinting of ancestral CREs and the exaptation of transposable elements.
Affiliation(s)
- Ignacio Maeso
- Centro Andaluz de Biología del Desarrollo (CSIC/UPO/JA), Universidad Pablo de Olavide, 41013 Seville, Spain.
- Juan J Tena
- Centro Andaluz de Biología del Desarrollo (CSIC/UPO/JA), Universidad Pablo de Olavide, 41013 Seville, Spain.
|
29
|
Jolma A, Yin Y, Nitta KR, Dave K, Popov A, Taipale M, Enge M, Kivioja T, Morgunova E, Taipale J. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 2015; 527:384-8. [DOI: 10.1038/nature15518] [Citation(s) in RCA: 369] [Impact Index Per Article: 41.0] [Received: 01/23/2015] [Accepted: 08/24/2015] [Indexed: 12/28/2022]
|
30
|
Abstract
Recent advances in experimental and computational methodologies are enabling ultra-high resolution genome-wide profiles of protein-DNA binding events. For example, the ChIP-exo protocol precisely characterizes protein-DNA cross-linking patterns by combining chromatin immunoprecipitation (ChIP) with 5' → 3' exonuclease digestion. Similarly, deeply sequenced chromatin accessibility assays (e.g. DNase-seq and ATAC-seq) enable the detection of protected footprints at protein-DNA binding sites. With these techniques and others, we have the potential to characterize the individual nucleotides that interact with transcription factors, nucleosomes, RNA polymerases and other regulatory proteins in a particular cellular context. In this review, we explain the experimental assays and computational analysis methods that enable high-resolution profiling of protein-DNA binding events. We discuss the challenges and opportunities associated with such approaches.
Affiliation(s)
- Shaun Mahony
- Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
- B Franklin Pugh
- Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
|
31
|
Abstract
The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.
Affiliation(s)
- Maxwell W Libbrecht
- Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195-2350, USA
- William Stafford Noble
- Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195-2350, USA; Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, Washington 98195-5065, USA
|
32
|
Drillon G, Audit B, Argoul F, Arneodo A. Ubiquitous human 'master' origins of replication are encoded in the DNA sequence via a local enrichment in nucleosome excluding energy barriers. J Phys Condens Matter 2015; 27:064102. [PMID: 25563930 DOI: 10.1088/0953-8984/27/6/064102] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 06/04/2023]
Abstract
As the elementary building block of eukaryotic chromatin, the nucleosome is at the heart of the compromise between the necessity of compacting DNA in the cell nucleus and the required accessibility to regulatory proteins. The recent availability of genome-wide experimental maps of nucleosome positions for many different organisms and cell types has provided an unprecedented opportunity to elucidate to what extent the DNA sequence conditions the primary structure of chromatin and in turn participates in the chromatin-mediated regulation of nuclear functions, such as gene expression and DNA replication. In this study, we use in vivo and in vitro genome-wide nucleosome occupancy data together with the set of nucleosome-free regions (NFRs) predicted by a physical model of nucleosome formation based on sequence-dependent bending properties of the DNA double-helix, to investigate the role of intrinsic nucleosome occupancy in the regulation of the replication spatio-temporal programme in human. We focus our analysis on the so-called replication U/N-domains that were shown to cover about half of the human genome in the germline (skew-N domains) as well as in embryonic stem cells, somatic and HeLa cells (mean replication timing U-domains). The 'master' origins of replication (MaOris) that border these megabase-sized U/N-domains were found to be specified by a few hundred kb wide regions that are hyper-sensitive to DNase I cleavage, hypomethylated, and enriched in epigenetic marks involved in transcription regulation, the hallmarks of localized open chromatin structures. Here we show that replication U/N-domain borders that are conserved in all considered cell lines have an environment highly enriched in nucleosome-excluding-energy barriers, suggesting that these ubiquitous MaOris have been selected during evolution. 
In contrast, MaOris that are cell-type-specific are mainly regulated epigenetically and are no longer favoured by a local abundance of intrinsic NFRs encoded in the DNA sequence. At the smaller few hundred bp scale of gene promoters, CpG-rich promoters of housekeeping genes found nearby ubiquitous MaOris as well as CpG-poor promoters of tissue-specific genes found nearby cell-type-specific MaOris, both correspond to in vivo NFRs that are not coded as nucleosome-excluding-energy barriers. Whereas the former promoters are likely to correspond to high occupancy transcription factor binding regions, the latter are an illustration that gene regulation in human is typically cell-type-specific.
Affiliation(s)
- Guénola Drillon
- Université de Lyon, F-69000 Lyon, France. Laboratoire de Physique, CNRS UMR 5672, École Normale Supérieure de Lyon, F-69007 Lyon, France
|
33
|
Zabet NR, Adryan B. Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res 2015; 43:84-94. [PMID: 25432957 PMCID: PMC4288167 DOI: 10.1093/nar/gku1269] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 04/22/2014] [Revised: 10/22/2014] [Accepted: 11/19/2014] [Indexed: 12/20/2022] Open
Abstract
The binding of transcription factors (TFs) is essential for gene expression. One important characteristic is the actual occupancy of a putative binding site in the genome. In this study, we propose an analytical model to predict genomic occupancy that incorporates the preferred target sequence of a TF in the form of a position weight matrix (PWM), DNA accessibility data (in the case of eukaryotes), the number of TF molecules expected to be bound specifically to the DNA and a parameter that modulates the specificity of the TF. Given actual occupancy data in the form of ChIP-seq profiles, we backwards inferred copy number and specificity for five Drosophila TFs during early embryonic development: Bicoid, Caudal, Giant, Hunchback and Kruppel. Our results suggest that these TFs display thousands of molecules that are specifically bound to the DNA and that whilst Bicoid and Caudal display a higher specificity, the other three TFs (Giant, Hunchback and Kruppel) display lower specificity in their binding (despite having PWMs with higher information content). This study gives further weight to earlier investigations into TF copy numbers that suggest a significant proportion of molecules are not bound specifically to the DNA.
Affiliation(s)
- Nicolae Radu Zabet
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK; Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Boris Adryan
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK; Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
|
34
|
Slattery M, Zhou T, Yang L, Dantas Machado AC, Gordân R, Rohs R. Absence of a simple code: how transcription factors read the genome. Trends Biochem Sci 2014; 39:381-99. [PMID: 25129887 DOI: 10.1016/j.tibs.2014.07.002] [Citation(s) in RCA: 352] [Impact Index Per Article: 35.2] [Received: 06/03/2014] [Revised: 07/11/2014] [Accepted: 07/15/2014] [Indexed: 12/21/2022]
Abstract
Transcription factors (TFs) influence cell fate by interpreting the regulatory DNA within a genome. TFs recognize DNA in a specific manner; the mechanisms underlying this specificity have been identified for many TFs based on 3D structures of protein-DNA complexes. More recently, structural views have been complemented with data from high-throughput in vitro and in vivo explorations of the DNA-binding preferences of many TFs. Together, these approaches have greatly expanded our understanding of TF-DNA interactions. However, the mechanisms by which TFs select in vivo binding sites and alter gene expression remain unclear. Recent work has highlighted the many variables that influence TF-DNA binding, while demonstrating that a biophysical understanding of these many factors will be central to understanding TF function.
Affiliation(s)
- Matthew Slattery
- Department of Biomedical Sciences, University of Minnesota Medical School, Duluth, MN 55812, USA; Developmental Biology Center, University of Minnesota, Minneapolis, MN 55455, USA.
- Tianyin Zhou
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Lin Yang
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Ana Carolina Dantas Machado
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Raluca Gordân
- Center for Genomic and Computational Biology, Departments of Biostatistics and Bioinformatics, Computer Science, and Molecular Genetics and Microbiology, Duke University, Durham, NC 27708, USA.
- Remo Rohs
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA.
|
35
|
Ezer D, Zabet NR, Adryan B. Homotypic clusters of transcription factor binding sites: A model system for understanding the physical mechanics of gene expression. Comput Struct Biotechnol J 2014; 10:63-9. [PMID: 25349675 PMCID: PMC4204428 DOI: 10.1016/j.csbj.2014.07.005] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Indexed: 12/24/2022] Open
Abstract
The organization of binding sites in cis-regulatory elements (CREs) can influence gene expression through a combination of physical mechanisms, ranging from direct interactions between TF molecules to DNA looping and transient chromatin interactions. The study of simple and common building blocks in promoters and other CREs allows us to dissect how all of these mechanisms work together. Many adjacent TF binding sites for the same TF species form homotypic clusters, and these CRE architecture building blocks serve as a prime candidate for understanding interacting transcriptional mechanisms. Homotypic clusters are prevalent in both bacterial and eukaryotic genomes, and are present in both promoters as well as more distal enhancer/silencer elements. Here, we review previous theoretical and experimental studies that show how the complexity (number of binding sites) and spatial organization (distance between sites and overall distance from transcription start sites) of homotypic clusters influence gene expression. In particular, we describe how homotypic clusters modulate the temporal dynamics of TF binding, a mechanism that can affect gene expression, but which has not yet been sufficiently characterized. We propose further experiments on homotypic clusters that would be useful in developing mechanistic models of gene expression.
Affiliation(s)
- Daphne Ezer
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
- Nicolae Radu Zabet
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
- Boris Adryan
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
|
36
|
Zhong J, Wasson T, Hartemink AJ. Learning protein-DNA interaction landscapes by integrating experimental data through computational models. Bioinformatics 2014; 30:2868-74. [PMID: 24974204 DOI: 10.1093/bioinformatics/btu408] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/31/2022]
Abstract
MOTIVATION: Transcriptional regulation is directly enacted by the interactions between DNA and many proteins, including transcription factors (TFs), nucleosomes and polymerases. A critical step in deciphering transcriptional regulation is to infer, and eventually predict, the precise locations of these interactions, along with their strength and frequency. While recent datasets yield great insight into these interactions, individual data sources often provide only partial information regarding one aspect of the complete interaction landscape. For example, chromatin immunoprecipitation (ChIP) reveals the binding positions of a protein, but only for one protein at a time. In contrast, nucleases like MNase and DNase can be used to reveal binding positions for many different proteins at once, but cannot easily determine the identities of those proteins. Currently, few statistical frameworks jointly model these different data sources to reveal an accurate, holistic view of the in vivo protein-DNA interaction landscape. RESULTS: Here, we develop a novel statistical framework that integrates different sources of experimental information within a thermodynamic model of competitive binding to jointly learn a holistic view of the in vivo protein-DNA interaction landscape. We show that our framework learns an interaction landscape with increased accuracy, explaining multiple sets of data in accordance with thermodynamic principles of competitive DNA binding. The resulting model of genomic occupancy provides a precise mechanistic vantage point from which to explore the role of protein-DNA interactions in transcriptional regulation. AVAILABILITY AND IMPLEMENTATION: The C source code for compete and Python source code for MCMC-based inference are available at http://www.cs.duke.edu/~amink. CONTACT: amink@cs.duke.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
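The thermodynamic principle of competitive binding referred to in this abstract can be illustrated, in its simplest form, for a single DNA site that several factors compete for. The Python sketch below is not the framework described above or its `compete` implementation: it is a textbook single-site partition function, and the factor names, concentrations and association constants are hypothetical.

```python
def competitive_occupancy(conc, affinity):
    """Equilibrium occupancy of one site shared by mutually exclusive binders.

    conc:     dict factor -> free concentration (arbitrary units)
    affinity: dict factor -> association constant (inverse of those units)
    Each factor contributes a statistical weight conc * affinity; the empty
    site has weight 1. Occupancy is weight divided by the partition sum.
    """
    weights = {name: conc[name] * affinity[name] for name in conc}
    z = 1.0 + sum(weights.values())
    occupancy = {name: w / z for name, w in weights.items()}
    occupancy["empty"] = 1.0 / z
    return occupancy


if __name__ == "__main__":
    # Hypothetical values: a strong binder at low level vs. a weak binder
    # at high level competing for the same site.
    occ = competitive_occupancy(
        conc={"TF_A": 1.0, "TF_B": 10.0},
        affinity={"TF_A": 5.0, "TF_B": 0.2},
    )
    for name, value in occ.items():
        print(name, round(value, 3))
```

Because every weight appears in the shared partition sum, raising the level of one factor lowers the occupancy of all the others, which is the basic competition effect; genome-wide models extend this idea to overlapping sites along a sequence.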
Affiliation(s)
- Jianling Zhong
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA
- Todd Wasson
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA
- Alexander J Hartemink
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA
|
37
|
Abstract
Instructions for when, where and to what level each gene should be expressed are encoded within regulatory sequences. The importance of motifs recognized by DNA-binding regulators has long been known, but their extensive characterization afforded by recent technologies only partly accounts for how regulatory instructions are encoded in the genome. Here, we review recent advances in our understanding of regulatory sequences that influence transcription and go beyond the description of motifs. We discuss how understanding different aspects of the sequence-encoded regulation can help to unravel the genotype-phenotype relationship, which would lead to a more accurate and mechanistic interpretation of personal genome sequences.
Affiliation(s)
- Michal Levo
- Department of Molecular Cell Biology, and Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
- Eran Segal
- Department of Molecular Cell Biology, and Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
|
38
|
Single-cell nucleosome mapping reveals the molecular basis of gene expression heterogeneity. Proc Natl Acad Sci U S A 2014; 111:E2462-71. [PMID: 24889621 DOI: 10.1073/pnas.1400517111] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Indexed: 11/18/2022] Open
Abstract
Nucleosomes, the basic unit of chromatin, have a critical role in the control of gene expression. Nucleosome positions have generally been determined by examining bulk populations of cells and then correlated with overall gene expression. Here, we describe a technique to determine nucleosome positioning in single cells by virtue of the ability of the nucleosome to protect DNA from GpC methylation. In the acid phosphatase inducible PHO5 gene, we find that there is significant cell-to-cell variation in nucleosome positions and shifts in nucleosome positioning correlate with changes in gene expression. However, nucleosome positioning is not absolute, and even with major shifts in gene expression, some cells fail to change nucleosome configuration. Mutations of the PHO5 promoter that introduce a poly(dA:dT) tract stimulated gene expression under nonpermissive conditions and led to shifts of positioned nucleosomes similar to induction of PHO5. By contrast, mutations that altered AA/TT/AT periodicity reduced gene expression upon PHO5 induction and stabilized nucleosomes in most cells, suggesting that enhanced nucleosome affinity for DNA antagonizes chromatin remodelers. Finally, we determined nucleosome positioning in two regions described as "fuzzy" or nucleosome-free when examined in a bulk assay. These regions consisted of distinct nucleosomes with a larger footprint for potential location and an increased population of cells lacking a nucleosome altogether. These data indicate an underlying complexity of nucleosome positioning that may contribute to the flexibility and heterogeneity of gene expression.
|
39
|
Teif VB, Beshnova DA, Vainshtein Y, Marth C, Mallm JP, Höfer T, Rippe K. Nucleosome repositioning links DNA (de)methylation and differential CTCF binding during stem cell development. Genome Res 2014; 24:1285-95. [PMID: 24812327 PMCID: PMC4120082 DOI: 10.1101/gr.164418.113] [Citation(s) in RCA: 106] [Impact Index Per Article: 10.6] [Indexed: 01/22/2023]
Abstract
During differentiation of embryonic stem cells, chromatin reorganizes to establish cell type-specific expression programs. Here, we have dissected the linkages between DNA methylation (5mC), hydroxymethylation (5hmC), nucleosome repositioning, and binding of the transcription factor CTCF during this process. By integrating MNase-seq and ChIP-seq experiments in mouse embryonic stem cells (ESC) and their differentiated counterparts with biophysical modeling, we found that the interplay between these factors depends on their genomic context. The mostly unmethylated CpG islands have reduced nucleosome occupancy and are enriched in cell type-independent binding sites for CTCF. The few remaining methylated CpG dinucleotides are preferentially associated with nucleosomes. In contrast, outside of CpG islands most CpGs are methylated, and the average methylation density oscillates so that it is highest in the linker region between nucleosomes. Outside CpG islands, binding of TET1, an enzyme that converts 5mC to 5hmC, is associated with labile, MNase-sensitive nucleosomes. Such nucleosomes are poised for eviction in ESCs and become stably bound in differentiated cells where the TET1 and 5hmC levels go down. This process regulates a class of CTCF binding sites outside CpG islands that are occupied by CTCF in ESCs but lose the protein during differentiation. We rationalize this cell type-dependent targeting of CTCF with a quantitative biophysical model of competitive binding with the histone octamer, depending on the TET1, 5hmC, and 5mC state.
Affiliation(s)
- Vladimir B Teif
- Research Group Genome Organization and Function, Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, 69120 Heidelberg, Germany
- Daria A Beshnova
- Research Group Genome Organization and Function, Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, 69120 Heidelberg, Germany
- Yevhen Vainshtein
- Division Theoretical Systems Biology, Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, 69120 Heidelberg, Germany
- Caroline Marth
- Research Group Genome Organization and Function, Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, 69120 Heidelberg, Germany
- Jan-Philipp Mallm
- Research Group Genome Organization and Function, Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, 69120 Heidelberg, Germany
- Thomas Höfer
- Division Theoretical Systems Biology, Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, 69120 Heidelberg, Germany
- Karsten Rippe
- Research Group Genome Organization and Function, Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, 69120 Heidelberg, Germany
|
40
|
Abstract
The term “transcriptional network” refers to the mechanism(s) that underlies coordinated expression of genes, typically involving transcription factors (TFs) binding to the promoters of multiple genes, and individual genes controlled by multiple TFs. A multitude of studies in the last two decades have aimed to map and characterize transcriptional networks in the yeast Saccharomyces cerevisiae. We review the methodologies and accomplishments of these studies, as well as challenges we now face. For most yeast TFs, data have been collected on their sequence preferences, in vivo promoter occupancy, and gene expression profiles in deletion mutants. These systematic studies have led to the identification of new regulators of numerous cellular functions and shed light on the overall organization of yeast gene regulation. However, many yeast TFs appear to be inactive under standard laboratory growth conditions, and many of the available data were collected using techniques that have since been improved. Perhaps as a consequence, comprehensive and accurate mapping among TF sequence preferences, promoter binding, and gene expression remains an open challenge. We propose that the time is ripe for renewed systematic efforts toward a complete mapping of yeast transcriptional regulatory mechanisms.
|
41
|
Levinson M, Zhou Q. A penalized Bayesian approach to predicting sparse protein-DNA binding landscapes. Bioinformatics 2014; 30:636-43. [PMID: 24115169 DOI: 10.1093/bioinformatics/btt585] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Cellular processes are controlled, directly or indirectly, by the binding of hundreds of different DNA binding factors (DBFs) to the genome. One key to deeper understanding of the cell is discovering where, when and how strongly these DBFs bind to the DNA sequence. Direct measurement of DBF binding sites (BSs; e.g. through ChIP-Chip or ChIP-Seq experiments) is expensive, noisy and not available for every DBF in every cell type. Naive and most existing computational approaches to detecting which DBFs bind in a set of genomic regions of interest often perform poorly, due to the high false discovery rates and restrictive requirements for prior knowledge. RESULTS: We develop SparScape, a penalized Bayesian method for identifying DBFs active in the considered regions and predicting a joint probabilistic binding landscape. Using a sparsity-inducing penalization, SparScape is able to select a small subset of DBFs with enriched BSs in a set of DNA sequences from a much larger candidate set. This substantially reduces the false positives in prediction of BSs. Analysis of ChIP-Seq data in mouse embryonic stem cells and simulated data show that SparScape dramatically outperforms the naive motif scanning method and the comparable computational approaches in terms of DBF identification and BS prediction. AVAILABILITY AND IMPLEMENTATION: SparScape is implemented in C++ with OpenMP (optional at compilation) and is freely available at 'www.stat.ucla.edu/~zhou/Software.html' for academic use.
Affiliation(s)
- Matthew Levinson
  - Department of Statistics, University of California, Los Angeles, CA 90095, USA
42
Ezer D, Zabet NR, Adryan B. Physical constraints determine the logic of bacterial promoter architectures. Nucleic Acids Res 2014; 42:4196-207. [PMID: 24476912 PMCID: PMC3985651 DOI: 10.1093/nar/gku078] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 12/19/2022] Open
Abstract
Site-specific transcription factors (TFs) bind to their target sites on the DNA, where they regulate the rate at which genes are transcribed. Bacterial TFs undergo facilitated diffusion (a combination of 3D diffusion around and 1D random walk on the DNA) when searching for their target sites. Using computer simulations of this search process, we show that the organization of the binding sites, in conjunction with TF copy number and binding site affinity, plays an important role in determining not only the steady state of promoter occupancy, but also the order in which TFs bind. These effects can be captured by facilitated diffusion-based models, but not by standard thermodynamics. We show that the spacing of binding sites encodes complex logic, which can be derived from combinations of three basic building blocks: switches, barriers and clusters, whose response alone and in higher orders of organization we characterize in detail. Effective promoter organizations are commonly found in the E. coli genome and are highly conserved between strains. This will allow studies of gene regulation at an unprecedented level of detail, where our framework can create testable hypotheses of promoter logic.
Affiliation(s)
- Daphne Ezer
  - Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK and Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
43
Zeigler RD, Cohen BA. Discrimination between thermodynamic models of cis-regulation using transcription factor occupancy data. Nucleic Acids Res 2013; 42:2224-34. [PMID: 24288374 PMCID: PMC3936720 DOI: 10.1093/nar/gkt1230] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
Many studies have identified binding preferences for transcription factors (TFs), but few have yielded predictive models of how combinations of transcription factor binding sites generate specific levels of gene expression. Synthetic promoters have emerged as powerful tools for generating quantitative data to parameterize models of combinatorial cis-regulation. We sought to improve the accuracy of such models by quantifying the occupancy of TFs on synthetic promoters in vivo and incorporating these data into statistical thermodynamic models of cis-regulation. Using chromatin immunoprecipitation-seq, we measured the occupancy of Gcn4 and Cbf1 in synthetic promoter libraries composed of binding sites for Gcn4, Cbf1, Met31/Met32 and Nrg1. We measured the occupancy of these two TFs and the expression levels of all promoters in two growth conditions. Models parameterized using only expression data predicted expression but failed to identify several interactions between TFs. In contrast, models parameterized with occupancy and expression data predicted expression data, and also revealed Gcn4 self-cooperativity and a negative interaction between Gcn4 and Nrg1. Occupancy data also allowed us to distinguish between competing regulatory mechanisms for the factor Gcn4. Our framework for combining occupancy and expression data produces predictive models that better reflect the mechanisms underlying combinatorial cis-regulation of gene expression.
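As a rough illustration of what "self-cooperativity" means in a statistical thermodynamic model, the sketch below computes equilibrium occupancies for two binding sites with a pairwise interaction factor. The function name, the weights `q1` and `q2` (concentration times association constant), and the factor `omega` are hypothetical names chosen for this example; this is a textbook two-site model, not the parameterization used in the study.

```python
def two_site_occupancy(q1, q2, omega=1.0):
    """Equilibrium occupancy of two binding sites with a pairwise interaction.

    q1, q2: hypothetical statistical weights ([TF] * K) of each site bound alone.
    omega:  interaction factor (>1 cooperative, <1 antagonistic, 1 independent).
    """
    both = omega * q1 * q2          # weight of the doubly bound state
    z = 1.0 + q1 + q2 + both        # partition function over the four states
    return (q1 + both) / z, (q2 + both) / z


# Same sites, with and without cooperativity (illustrative numbers only).
independent = two_site_occupancy(1.0, 1.0)
cooperative = two_site_occupancy(1.0, 1.0, omega=10.0)
```

With `omega = 1` the two sites behave independently; raising `omega` increases the occupancy of both sites even though the single-site weights are unchanged, which is the kind of effect that expression data alone may not resolve.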
Affiliation(s)
- Robert D Zeigler
  - Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, MO 63108, USA
44
Zabet NR, Adryan B. The effects of transcription factor competition on gene regulation. Front Genet 2013; 4:197. [PMID: 24109486 PMCID: PMC3791378 DOI: 10.3389/fgene.2013.00197] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 08/12/2013] [Accepted: 09/17/2013] [Indexed: 01/03/2023] Open
Abstract
Transcription factor (TF) molecules translocate by facilitated diffusion (a combination of 3D diffusion around and 1D random walk on the DNA). Despite the attention this mechanism has received over the last 40 years, only a few studies have investigated the influence of the cellular environment on the facilitated diffusion mechanism and, in particular, the influence of "other" DNA binding proteins competing with the TF molecules for DNA space. Molecular crowding on the DNA is likely to influence the association rate of TFs to their target site and the steady state occupancy of those sites, but it is still not clear how it influences the search in a genome-wide context, when the model includes biologically relevant parameters (such as TF abundance, TF affinity for DNA and TF dynamics on the DNA). We performed stochastic simulations of TFs performing the facilitated diffusion mechanism, and considered various abundances of cognate and non-cognate TFs. We show that, for both obstacles that move on the DNA and obstacles that are fixed on the DNA, changes in search time are not statistically significant at biologically relevant crowding levels on the DNA. In the case of non-cognate proteins that slide on the DNA, molecular crowding on the DNA always leads to a statistically significant reduction in occupancy, which may provide a general mechanism to control gene activity levels globally. When the "other" molecules are immobile on the DNA, we found a completely different behavior: the occupancy of the target site is always increased by higher molecular crowding on the DNA. Finally, we show that crowding on the DNA may increase transcriptional noise through increased variability of the occupancy time of the target sites.
Affiliation(s)
- Nicolae Radu Zabet
  - Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Boris Adryan
  - Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
  - Department of Genetics, University of Cambridge, Cambridge, UK
45
The influence of transcription factor competition on the relationship between occupancy and affinity. PLoS One 2013; 8:e73714. [PMID: 24086290 PMCID: PMC3785477 DOI: 10.1371/journal.pone.0073714] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/16/2013] [Accepted: 07/31/2013] [Indexed: 01/17/2023] Open
Abstract
Transcription factors (TFs) are proteins that bind to specific sites on the DNA and regulate gene activity. Identifying where TF molecules bind and how much time they spend on their target sites is key to understanding transcriptional regulation. It is usually assumed that the free energy of binding of a TF to the DNA (the affinity of the site) is highly correlated with the amount of time the TF remains bound (the occupancy of the site). However, knowing the binding energy is not sufficient to infer actual binding site occupancy. This mismatch between the occupancy predicted by the affinity and the observed occupancy may be caused by various factors, such as TF abundance, competition between TFs or the arrangement of the sites on the DNA. We investigated the relationship between the affinity of a TF for a set of binding sites and their occupancy. In particular, we considered the case of the transcription factor lac repressor (lacI) in E. coli, and performed stochastic simulations of the TF dynamics on the DNA for various combinations of lacI abundance and competing TFs that contribute to macromolecular crowding. We also investigated the relationship between site occupancy and the information content of position weight matrices (PWMs) used to represent binding sites. Our results showed that for medium and high affinity sites, TF competition does not play a significant role for genomic occupancy except in cases when the abundance of the TF is significantly increased, or when the PWM displays relatively low information content. Nevertheless, for medium and low affinity sites, an increase in TF abundance (for both cognate and non-cognate molecules) leads to an increase in occupancy at several sites.
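To see why binding energy alone does not fix occupancy, it can help to look at a deliberately simple equilibrium picture. The sketch below is not the stochastic facilitated-diffusion simulation described in the abstract; it is a single-site Boltzmann model in which the function name, the dimensionless concentrations, and the energy units (`kT = 1`) are all assumptions made for illustration.

```python
import math


def site_occupancy(conc, energy, competitors=(), kT=1.0):
    """Equilibrium probability that one site is bound by the TF of interest.

    conc, energy: hypothetical abundance and binding free energy of the TF.
    competitors:  iterable of (conc, energy) pairs for other proteins that
                  can occupy the same site (mutually exclusive binding).
    """
    weight = conc * math.exp(-energy / kT)
    z = 1.0 + weight + sum(c * math.exp(-e / kT) for c, e in competitors)
    return weight / z


# Same affinity, three different occupancies (illustrative numbers only).
alone = site_occupancy(1.0, -2.0)
abundant = site_occupancy(10.0, -2.0)
crowded = site_occupancy(1.0, -2.0, competitors=[(5.0, -1.0)])
```

Even in this toy model, the same binding energy yields different occupancies once abundance or competition changes, which is the qualitative mismatch the abstract investigates with far more detailed dynamics.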
46
Mathelier A, Wasserman WW. The next generation of transcription factor binding site prediction. PLoS Comput Biol 2013; 9:e1003214. [PMID: 24039567 PMCID: PMC3764009 DOI: 10.1371/journal.pcbi.1003214] [Citation(s) in RCA: 124] [Impact Index Per Article: 11.3] [Received: 02/01/2013] [Accepted: 07/22/2013] [Indexed: 12/29/2022] Open
Abstract
Finding where transcription factors (TFs) bind to the DNA is of key importance for deciphering gene regulation at the transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs), which quantitatively score binding motifs based on the observed nucleotide patterns in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the corresponding DNA-protein interaction and do not account for flexible length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible, and can model both position interdependence within TFBSs and variable length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that conveys properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets coming from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence. These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
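The classical PWM baseline that the abstract contrasts with TFFMs can be sketched in a few lines. The example below builds a log-odds matrix from aligned sites and scans a sequence for the best-scoring window; the function names, the pseudocount of 1, the uniform background of 0.25, and the toy sites are assumptions for illustration only. Note that each column is scored independently, which is exactly the assumption TFFMs are designed to relax.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in "ACGT"})
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


# Toy aligned sites (hypothetical data, loosely CRE-like).
toy_sites = ["TGACGTCA", "TGACGTCA", "TGACGCAA", "TGACGTAA"]
toy_pwm = build_pwm(toy_sites)
top_score, top_start = best_hit(toy_pwm, "AAATGACGTCAGGG")
```

A scan like this reports only the best window; summing exponentiated scores over all windows is one common way to move from a single hit toward an integrated occupancy-style score.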
Affiliation(s)
- Anthony Mathelier
  - Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Wyeth W. Wasserman
  - Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
47
Nucleosome free regions in yeast promoters result from competitive binding of transcription factors that interact with chromatin modifiers. PLoS Comput Biol 2013; 9:e1003181. [PMID: 23990766 PMCID: PMC3749953 DOI: 10.1371/journal.pcbi.1003181] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 12/21/2012] [Accepted: 07/04/2013] [Indexed: 11/19/2022] Open
Abstract
Because DNA packaging in nucleosomes modulates its accessibility to transcription factors (TFs), unraveling the causal determinants of nucleosome positioning is of great importance to understanding gene regulation. Although there is evidence that intrinsic sequence specificity contributes to nucleosome positioning, the extent to which other factors contribute to nucleosome positioning is currently highly debated. Here we obtained both in vivo and in vitro reference maps of positions that are either consistently covered or free of nucleosomes across multiple experimental data-sets in Saccharomyces cerevisiae. We then systematically quantified the contribution of TF binding to nucleosome positioning using a rigorous statistical mechanics model in which TFs compete with nucleosomes for binding DNA. Our results reconcile previous seemingly conflicting results on the determinants of nucleosome positioning and provide a quantitative explanation for the difference between in vivo and in vitro positioning. On a genome-wide scale, nucleosome positioning is dominated by the phasing of nucleosome arrays over gene bodies, and their positioning is mainly determined by the intrinsic sequence preferences of nucleosomes. In contrast, larger nucleosome free regions in promoters, which likely have a much more significant impact on gene expression, are determined mainly by TF binding. Interestingly, of the 158 yeast TFs included in our modeling, we find that only 10–20 significantly contribute to inducing nucleosome-free regions, and these TFs are highly enriched for having direct interactions with chromatin remodelers. Together our results imply that nucleosome free regions in yeast promoters result from the binding of a specific class of TFs that recruit chromatin remodelers.
The DNA of all eukaryotic organisms is packaged into nucleosomes, which cover roughly of the genome. As nucleosome positioning profoundly affects DNA accessibility to other DNA binding proteins such as transcription factors (TFs), it plays an important role in transcription regulation. However, to what extent nucleosome positioning is guided by intrinsic DNA sequence preferences of nucleosomes, and to what extent other DNA binding factors play a role, is currently highly debated. Here we use a rigorous biophysical model to systematically study the relative contributions of intrinsic sequence preferences and competitive binding of TFs to nucleosome positioning in yeast. We find that, on the one hand, the phasing of the many small spacers within dense nucleosome arrays that cover gene bodies is mainly determined by intrinsic sequence preferences. On the other hand, larger nucleosome free regions (NFRs) in promoters are explained predominantly by TF binding. Strikingly, we find that only 10–20 TFs make a significant contribution to explaining NFRs, and these TFs are highly enriched for directly interacting with chromatin modifiers. Thus, the picture that emerges is that binding by a specific class of TFs recruits chromatin modifiers which mediate local nucleosome expulsion.
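Competition between TFs and nucleosomes for the same stretch of DNA is commonly handled with a one-dimensional partition function over non-overlapping binders. The sketch below is a generic forward/backward dynamic program of that kind, offered only as an illustration; the function names, the `(footprint, weights)` representation of each binder, and the toy numbers are assumptions, and the study's actual model and parameters are not reproduced here.

```python
def forward_sums(n, binders):
    """F[i] = total weight of all non-overlapping configurations on positions [0, i)."""
    F = [1.0] + [0.0] * n
    for i in range(1, n + 1):
        F[i] = F[i - 1]                      # position i-1 left unbound
        for length, weights in binders:
            start = i - length
            if start >= 0:                   # a binder ending exactly at i
                F[i] += weights[start] * F[start]
    return F


def backward_sums(n, binders):
    """B[i] = total weight of all non-overlapping configurations on positions [i, n)."""
    B = [0.0] * n + [1.0]
    for i in range(n - 1, -1, -1):
        B[i] = B[i + 1]                      # position i left unbound
        for length, weights in binders:
            if i + length <= n:              # a binder starting exactly at i
                B[i] += weights[i] * B[i + length]
    return B


def binding_probability(n, binders, which, start):
    """Probability that binder `which` occupies the footprint beginning at `start`."""
    F = forward_sums(n, binders)
    B = backward_sums(n, binders)
    length, weights = binders[which]
    return F[start] * weights[start] * B[start + length] / F[n]


# Toy 3-bp lattice: a TF and a "nucleosome" with the same 3-bp footprint
# (hypothetical Boltzmann weights, one per allowed start position).
tf_only = [(3, [1.0])]
tf_and_nucleosome = [(3, [1.0]), (3, [3.0])]
p_alone = binding_probability(3, tf_only, 0, 0)
p_competing = binding_probability(3, tf_and_nucleosome, 0, 0)
```

Each binder carries one weight per allowed start position, so sequence-dependent affinities for both TFs and nucleosomes fit the same interface; the cost is linear in sequence length times the number of binder types.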
48
Cheng Q, Kazemian M, Pham H, Blatti C, Celniker SE, Wolfe SA, Brodsky MH, Sinha S. Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet 2013; 9:e1003571. [PMID: 23935523 PMCID: PMC3731213 DOI: 10.1371/journal.pgen.1003571] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 11/26/2012] [Accepted: 05/02/2013] [Indexed: 12/13/2022] Open
Abstract
ChIP-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high-throughput method to understand transcriptional regulation, especially on a global scale. This has led to great interest in the underlying biochemical mechanisms that direct TF-DNA binding, with the ultimate goal of computationally predicting a TF's occupancy profile in any cellular condition. In this study, we examined the influence of various potential determinants of TF-DNA binding on a much larger scale than previously undertaken. We used a thermodynamics-based model of TF-DNA binding, called “STAP,” to analyze 45 TF-ChIP data sets from Drosophila embryonic development. We built a cross-validation framework that compares a baseline model, based on the ChIP'ed (“primary”) TF's motif, to more complex models where binding by secondary TFs is hypothesized to influence the primary TF's occupancy. Candidate interacting TFs were chosen based on RNA-SEQ expression data from the time point of the ChIP experiment. We found widespread evidence of both cooperative and antagonistic effects by secondary TFs, and explicitly quantified these effects. We were able to identify multiple classes of interactions, including (1) long-range interactions between primary and secondary motifs (separated by ≤150 bp), suggestive of indirect effects such as chromatin remodeling, (2) short-range interactions with specific inter-site spacing biases, suggestive of direct physical interactions, and (3) overlapping binding sites suggesting competitive binding. Furthermore, by factoring out the previously reported strong correlation between TF occupancy and DNA accessibility, we were able to categorize the effects into those that are likely to be mediated by the secondary TF's effect on local accessibility and those that utilize accessibility-independent mechanisms. Finally, we conducted in vitro pull-down assays to test model-based predictions of short-range cooperative interactions, and found that seven of the eight TF pairs tested physically interact and that some of these interactions mediate cooperative binding to DNA.
Chromatin Immunoprecipitation (ChIP)-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high throughput method to understand transcriptional regulation, especially on a global scale. Here, we utilize 45 ChIP-chip and ChIP-SEQ data sets from Drosophila to explore the underlying mechanisms of TF-DNA binding. For this, we employ a biophysically motivated computational model, in conjunction with over 300 TF motifs (binding specificities) as well as gene expression and DNA accessibility data from different developmental stages in Drosophila embryos. Our findings provide robust statistical evidence of the role played by TF-TF interactions in shaping genome-wide TF-DNA binding profiles, and thus in directing gene regulation. Our method allows us to go beyond simply recognizing the existence of such interactions, to quantifying their effects on TF occupancy. We are able to categorize the probable mechanisms of these effects as involving direct physical interactions versus accessibility-mediated indirect interactions, long-range versus short-range interactions, and cooperative versus antagonistic interactions. Our analysis reveals widespread evidence of combinatorial regulation present in recently generated ChIP data sets, and sets the stage for rich integrative models of the future that will predict cell type-specific TF occupancy values from sequence and expression data.
Affiliation(s)
- Qiong Cheng
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Majid Kazemian
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Hannah Pham
  - Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Charles Blatti
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Susan E. Celniker
  - Department of Genome Dynamics, Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Scot A. Wolfe
  - Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Michael H. Brodsky
  - Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
  - Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
  - * E-mail: (MHB); (SS)
- Saurabh Sinha
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Institute of Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - * E-mail: (MHB); (SS)
49
Teif VB, Erdel F, Beshnova DA, Vainshtein Y, Mallm JP, Rippe K. Taking into account nucleosomes for predicting gene expression. Methods 2013; 62:26-38. [PMID: 23523656 DOI: 10.1016/j.ymeth.2013.03.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 05/14/2012] [Accepted: 03/10/2013] [Indexed: 01/10/2023] Open
Abstract
The eukaryotic genome is organized in a chain of nucleosomes that consist of 145-147 bp of DNA wrapped around a histone octamer protein core. Binding of transcription factors (TF) to nucleosomal DNA is frequently impeded, which makes it a challenging task to calculate TF occupancy at a given regulatory genomic site for predicting gene expression. Here, we review methods to calculate TF binding to DNA in the presence of nucleosomes. The main theoretical problems are (i) the computation speed that is becoming a bottleneck when partial unwrapping of DNA from the nucleosome is considered, (ii) the perturbation of the binding equilibrium by the activity of ATP-dependent chromatin remodelers, which translocate nucleosomes along the DNA, and (iii) the model parameterization from high-throughput sequencing data and fluorescence microscopy experiments in living cells. We discuss strategies that address these issues to efficiently compute transcription factor binding in chromatin.
Affiliation(s)
- Vladimir B Teif
  - Research Group Genome Organization & Function, Deutsches Krebsforschungszentrum-DKFZ & BioQuant, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.
50
Gaffney DJ, McVicker G, Pai AA, Fondufe-Mittendorf YN, Lewellen N, Michelini K, Widom J, Gilad Y, Pritchard JK. Controls of nucleosome positioning in the human genome. PLoS Genet 2012; 8:e1003036. [PMID: 23166509 PMCID: PMC3499251 DOI: 10.1371/journal.pgen.1003036] [Citation(s) in RCA: 207] [Impact Index Per Article: 17.3] [Received: 04/04/2012] [Accepted: 09/02/2012] [Indexed: 11/19/2022] Open
Abstract
Nucleosomes are important for gene regulation because their arrangement on the genome can control which proteins bind to DNA. Currently, few human nucleosomes are thought to be consistently positioned across cells; however, this has been difficult to assess due to the limited resolution of existing data. We performed paired-end sequencing of micrococcal nuclease-digested chromatin (MNase-seq) from seven lymphoblastoid cell lines and mapped over 3.6 billion MNase-seq fragments to the human genome to create the highest-resolution map of nucleosome occupancy to date in a human cell type. In contrast to previous results, we find that most nucleosomes have more consistent positioning than expected by chance and a substantial fraction (8.7%) of nucleosomes have moderate to strong positioning. In aggregate, nucleosome sequences have 10 bp periodic patterns in dinucleotide frequency and DNase I sensitivity; and, across cells, nucleosomes frequently have translational offsets that are multiples of 10 bp. We estimate that almost half of the genome contains regularly spaced arrays of nucleosomes, which are enriched in active chromatin domains. Single nucleotide polymorphisms that reduce DNase I sensitivity can disrupt the phasing of nucleosome arrays, which indicates that they often result from positioning against a barrier formed by other proteins. However, nucleosome arrays can also be created by DNA sequence alone. The most striking example is an array of over 400 nucleosomes on chromosome 12 that is created by tandem repetition of sequences with strong positioning properties. In summary, a large fraction of nucleosomes are consistently positioned: in some regions because they adopt favored sequence positions, and in other regions because they are forced into specific arrangements by chromatin remodeling or DNA binding proteins.
Affiliation(s)
- Daniel J. Gaffney
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
  - Wellcome Trust Sanger Institute, Hinxton, United Kingdom
  - * E-mail: (DJG); (GM); (YG); (JKP)
- Graham McVicker
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
  - * E-mail: (DJG); (GM); (YG); (JKP)
- Athma A. Pai
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Yvonne N. Fondufe-Mittendorf
  - Department of Molecular Biosciences and Department of Chemistry, Northwestern University, Chicago, Illinois, United States of America
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, United States of America
- Noah Lewellen
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
- Katelyn Michelini
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
- Jonathan Widom
  - Department of Molecular Biosciences and Department of Chemistry, Northwestern University, Chicago, Illinois, United States of America
- Yoav Gilad
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - * E-mail: (DJG); (GM); (YG); (JKP)
- Jonathan K. Pritchard
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
  - * E-mail: (DJG); (GM); (YG); (JKP)