1
|
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024] Open
Abstract
Massively parallel reporter assay measures transcriptional activities of various cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational model framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms of cis-regulatory mutations of a synthetic enhancer that cause abnormal gene expression. We found that, in a human cell line, competitive binding between family transcription factors (TFs) with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single- or multi-mutations. We also discovered that even if various harmful mutations occurred in an activator binding site, CRM could stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance understanding the effect of SNPs and indels on CRMs and would help building robust custom-designed CRMs for biologics production and gene therapy.
Collapse
Affiliation(s)
- Chan-Koo Kang
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
| | - Ah-Ram Kim
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
| |
Collapse
|
2
|
Loell KJ, Friedman RZ, Myers CA, Corbo JC, Cohen BA, White MA. Transcription factor interactions explain the context-dependent activity of CRX binding sites. PLoS Comput Biol 2024; 20:e1011802. [PMID: 38227575 PMCID: PMC10817189 DOI: 10.1371/journal.pcbi.1011802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2023] [Revised: 01/26/2024] [Accepted: 01/06/2024] [Indexed: 01/18/2024] Open
Abstract
The effects of transcription factor binding sites (TFBSs) on the activity of a cis-regulatory element (CRE) depend on the local sequence context. In rod photoreceptors, binding sites for the transcription factor (TF) Cone-rod homeobox (CRX) occur in both enhancers and silencers, but the sequence context that determines whether CRX binding sites contribute to activation or repression of transcription is not understood. To investigate the context-dependent activity of CRX sites, we fit neural network-based models to the activities of synthetic CREs composed of photoreceptor TFBSs. The models revealed that CRX binding sites consistently make positive, independent contributions to CRE activity, while negative homotypic interactions between sites cause CREs composed of multiple CRX sites to function as silencers. The effects of negative homotypic interactions can be overcome by the presence of other TFBSs that either interact cooperatively with CRX sites or make independent positive contributions to activity. The context-dependent activity of CRX sites is thus determined by the balance between positive heterotypic interactions, independent contributions of TFBSs, and negative homotypic interactions. Our findings explain observed patterns of activity among genomic CRX-bound enhancers and silencers, and suggest that enhancers may require diverse TFBSs to overcome negative homotypic interactions between TFBSs.
Collapse
Affiliation(s)
- Kaiser J. Loell
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
| | - Ryan Z. Friedman
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
| | - Connie A. Myers
- Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
| | - Joseph C. Corbo
- Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
| | - Barak A. Cohen
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
| | - Michael A. White
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
| |
Collapse
|
3
|
Horton CA, Alexandari AM, Hayes MGB, Marklund E, Schaepe JM, Aditham AK, Shah N, Suzuki PH, Shrikumar A, Afek A, Greenleaf WJ, Gordân R, Zeitlinger J, Kundaje A, Fordyce PM. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 2023; 381:eadd1250. [PMID: 37733848 DOI: 10.1126/science.add1250] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Accepted: 07/26/2023] [Indexed: 09/23/2023]
Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
Collapse
Affiliation(s)
- Connor A Horton
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Michael G B Hayes
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Emil Marklund
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Julia M Schaepe
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
| | - Arjun K Aditham
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
| | - Nilay Shah
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Peter H Suzuki
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Ariel Afek
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | | | - Raluca Gordân
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
| | - Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- The University of Kansas Medical Center, Kansas City, KS 66103, USA
| | - Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Polly M Fordyce
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94110, USA
| |
Collapse
|
4
|
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Collapse
Affiliation(s)
- Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA.
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA.
- Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA.
| |
Collapse
|
5
|
Reiter F, de Almeida BP, Stark A. Enhancers display constrained sequence flexibility and context-specific modulation of motif function. Genome Res 2023; 33:346-358. [PMID: 36941077 PMCID: PMC10078294 DOI: 10.1101/gr.277246.122] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2022] [Accepted: 02/14/2023] [Indexed: 03/23/2023]
Abstract
The information about when and where each gene is to be expressed is mainly encoded in the DNA sequence of enhancers, sequence elements that comprise binding sites (motifs) for different transcription factors (TFs). Most of the research on enhancer sequences has been focused on TF motif presence, whereas the enhancer syntax, that is, the flexibility of important motif positions and how the sequence context modulates the activity of TF motifs, remains poorly understood. Here, we explore the rules of enhancer syntax by a two-pronged approach in Drosophila melanogaster S2 cells: we (1) replace important TF motifs by all possible 65,536 eight-nucleotide-long sequences and (2) paste eight important TF motif types into 763 positions within 496 enhancers. These complementary strategies reveal that enhancers display constrained sequence flexibility and the context-specific modulation of motif function. Important motifs can be functionally replaced by hundreds of sequences constituting several distinct motif types, but these are only a fraction of all possible sequences and motif types. Moreover, TF motifs contribute with different intrinsic strengths that are strongly modulated by the enhancer sequence context (the flanking sequence, the presence and diversity of other motif types, and the distance between motifs), such that not all motif types can work in all positions. The context-specific modulation of motif function is also a hallmark of human enhancers, as we demonstrate experimentally. Overall, these two general principles of enhancer sequences are important to understand and predict enhancer function during development, evolution, and in disease.
Collapse
Affiliation(s)
- Franziska Reiter
- Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, 1030 Vienna, Austria
| | - Bernardo P de Almeida
- Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, 1030 Vienna, Austria
| | - Alexander Stark
- Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria;
- Medical University of Vienna, Vienna BioCenter, 1030 Vienna, Austria
| |
Collapse
|
6
|
Nuclear Factor Kappa B Promotes Ferritin Heavy Chain Expression in Bombyx mori in Response to B. mori Nucleopolyhedrovirus Infection. Int J Mol Sci 2022; 23:ijms231810380. [PMID: 36142290 PMCID: PMC9499628 DOI: 10.3390/ijms231810380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2022] [Revised: 09/01/2022] [Accepted: 09/06/2022] [Indexed: 11/16/2022] Open
Abstract
Ferritin heavy chain (FerHCH) is a major component of ferritin and plays an important role in maintaining iron homeostasis and redox equilibrium. Our previous studies have demonstrated that the Bombyx mori ferritin heavy chain homolog (BmFerHCH) could respond to B. mori nucleopolyhedrovirus (BmNPV) infection. However, the mechanism by which BmNPV regulates the expression of BmFerHCH remains unclear. In this study, BmFerHCH increased after BmNPV infection and BmNPV infection enhanced nuclear factor kappa B (NF-κB) activity in BmN cells. An NF-κB inhibitor (PDTC) reduced the expression of the virus-induced BmFerHCH in BmN cells, and overexpression of BmRelish (NF-κB) increased the expression of virus-induced BmFerHCH in BmN cells. Furthermore, BmNPV infection enhanced BmFerHCH promoter activity. The potential NF-κB cis-regulatory elements (CREs) in the BmFerHCH promoter were screened by using the JASPAR CORE database, and two effective NF-κB CREs were identified using a dual luciferase reporting system and electrophoretic mobility shift assay (EMSA). BmRelish (NF-κB) bound to NF-κB CREs and promoted the transcription of BmFerHCH. Taken together, BmNPV promotes activation of BmRelish (NF-κB), and activated BmRelish (NF-κB) binds to NF-κB CREs of BmFerHCH promoter to enhance BmFerHCH expression. Our study provides a foundation for future research on the function of BmFerHCH in BmNPV infection.
Collapse
|
7
|
Mellul M, Lahav S, Imashimizu M, Tokunaga Y, Lukatsky DB, Ram O. Repetitive DNA symmetry elements negatively regulate gene expression in embryonic stem cells. Biophys J 2022; 121:3126-3135. [PMID: 35810331 PMCID: PMC9463640 DOI: 10.1016/j.bpj.2022.07.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2021] [Revised: 06/13/2022] [Accepted: 07/07/2022] [Indexed: 11/30/2022] Open
Abstract
Transcription factor (TF) binding to genomic DNA elements constitutes one of the key mechanisms that regulates gene expression program in cells. Both consensus and nonconsensus DNA sequence elements influence the recognition specificity of TFs. Based on the analysis of experimentally determined c-Myc binding preferences to genomic DNA, here we statistically predict that certain repetitive, nonconsensus DNA symmetry elements can relatively reduce TF-DNA binding preferences. This is in contrast to a different set of repetitive, nonconsensus symmetry elements that can increase the strength of TF-DNA binding. Using c-Myc enhancer reporter system containing consensus motif flanked by nonconsensus sequences in embryonic stem cells, we directly demonstrate that the enrichment in such negatively regulating repetitive symmetry elements is sufficient to reduce the gene expression level compared with native genomic sequences. Negatively regulating repetitive symmetry elements around consensus c-Myc motif and DNA sequences containing consensus c-Myc motif flanked by entirely randomized sequences show similar expression baseline. A possible explanation for this observation is that rather than complete repression, negatively regulating repetitive symmetry elements play a regulatory role in fine-tuning the reduction of gene expression, most probably by binding TFs other than c-Myc.
Collapse
Affiliation(s)
- Meir Mellul
- Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Jerusalem, Israel
| | - Shlomtzion Lahav
- Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Jerusalem, Israel
| | - Masahiko Imashimizu
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
| | - Yuji Tokunaga
- Graduate School of Pharmaceutical Sciences, the University of Tokyo, Tokyo, Japan
| | - David B Lukatsky
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
| | - Oren Ram
- Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Jerusalem, Israel.
| |
Collapse
|
8
|
DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 2022; 54:613-624. [PMID: 35551305 DOI: 10.1038/s41588-022-01048-5] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2021] [Accepted: 03/08/2022] [Indexed: 02/06/2023]
Abstract
Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood, and de novo enhancer design has been challenging. Here, we built a deep-learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally nonequivalent instances of the same TF motif that are determined by motif-flanking sequence and intermotif distances. We validated these rules experimentally and demonstrated that they can be generalized to humans by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo.
Collapse
|
9
|
Mauduit D, Taskiran II, Minnoye L, de Waegeneer M, Christiaens V, Hulselmans G, Demeulemeester J, Wouters J, Aerts S. Analysis of long and short enhancers in melanoma cell states. eLife 2021; 10:e71735. [PMID: 34874265 PMCID: PMC8691835 DOI: 10.7554/elife.71735] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Accepted: 12/06/2021] [Indexed: 12/14/2022] Open
Abstract
Understanding how enhancers drive cell-type specificity and efficiently identifying them is essential for the development of innovative therapeutic strategies. In melanoma, the melanocytic (MEL) and the mesenchymal-like (MES) states present themselves with different responses to therapy, making the identification of specific enhancers highly relevant. Using massively parallel reporter assays (MPRAs) in a panel of patient-derived melanoma lines (MM lines), we set to identify and decipher melanoma enhancers by first focusing on regions with state-specific H3K27 acetylation close to differentially expressed genes. An in-depth evaluation of those regions was then pursued by investigating the activity of overlapping ATAC-seq peaks along with a full tiling of the acetylated regions with 190 bp sequences. Activity was observed in more than 60% of the selected regions, and we were able to precisely locate the active enhancers within ATAC-seq peaks. Comparison of sequence content with activity, using the deep learning model DeepMEL2, revealed that AP-1 alone is responsible for the MES enhancer activity. In contrast, SOX10 and MITF both influence MEL enhancer function with SOX10 being required to achieve high levels of activity. Overall, our MPRAs shed light on the relationship between long and short sequences in terms of their sequence content, enhancer activity, and specificity across melanoma cell states.
Collapse
Affiliation(s)
- David Mauduit
- VIB-KU Leuven Center for Brain & Disease ResearchLeuvenBelgium
- KU Leuven, Department of Human Genetics KU LeuvenLeuvenBelgium
| | - Ibrahim Ihsan Taskiran
- VIB-KU Leuven Center for Brain & Disease ResearchLeuvenBelgium
- KU Leuven, Department of Human Genetics KU LeuvenLeuvenBelgium
| | - Liesbeth Minnoye
- VIB-KU Leuven Center for Brain & Disease ResearchLeuvenBelgium
- KU Leuven, Department of Human Genetics KU LeuvenLeuvenBelgium
| | - Maxime de Waegeneer
- VIB-KU Leuven Center for Brain & Disease ResearchLeuvenBelgium
- KU Leuven, Department of Human Genetics KU LeuvenLeuvenBelgium
| | - Valerie Christiaens
- VIB-KU Leuven Center for Brain & Disease ResearchLeuvenBelgium
- KU Leuven, Department of Human Genetics KU LeuvenLeuvenBelgium
| | - Gert Hulselmans
- VIB-KU Leuven Center for Brain & Disease ResearchLeuvenBelgium
- KU Leuven, Department of Human Genetics KU LeuvenLeuvenBelgium
| | - Jonas Demeulemeester
- VIB-KU Leuven Center for Brain & Disease ResearchLeuvenBelgium
- KU Leuven, Department of Human Genetics KU LeuvenLeuvenBelgium
- Cancer Genomics Laboratory, The Francis Crick InstituteLondonUnited Kingdom
| | - Jasper Wouters
- VIB-KU Leuven Center for Brain & Disease ResearchLeuvenBelgium
- KU Leuven, Department of Human Genetics KU LeuvenLeuvenBelgium
| | - Stein Aerts
- VIB-KU Leuven Center for Brain & Disease ResearchLeuvenBelgium
- KU Leuven, Department of Human Genetics KU LeuvenLeuvenBelgium
| |
Collapse
|
10
|
Lin J, Huang L, Chen X, Zhang S, Wong KC. DeepMotifSyn: a deep learning approach to synthesize heterodimeric DNA motifs. Brief Bioinform 2021; 23:6370301. [PMID: 34524404 DOI: 10.1093/bib/bbab334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2021] [Revised: 07/21/2021] [Accepted: 07/28/2021] [Indexed: 11/12/2022] Open
Abstract
The cooperativity of transcription factors (TFs) is a widespread phenomenon in the gene regulation system. However, the interaction patterns between TF binding motifs remain elusive. The recent high-throughput assays, CAP-SELEX, have identified over 600 composite DNA sites (i.e. heterodimeric motifs) bound by cooperative TF pairs. However, there are over 25 000 inferentially effective heterodimeric TFs in the human cells. It is not practically feasible to validate all heterodimeric motifs due to cost and labor. We introduce DeepMotifSyn, a deep learning-based tool for synthesizing heterodimeric motifs from monomeric motif pairs. Specifically, DeepMotifSyn is composed of heterodimeric motif generator and evaluator. The generator is a U-Net-based neural network that can synthesize heterodimeric motifs from aligned motif pairs. The evaluator is a machine learning-based model that can score the generated heterodimeric motif candidates based on the motif sequence features. Systematic evaluations on CAP-SELEX data illustrate that DeepMotifSyn significantly outperforms the current state-of-the-art predictors. In addition, DeepMotifSyn can synthesize multiple heterodimeric motifs with different orientation and spacing settings. Such a feature can address the shortcomings of previous models. We believe DeepMotifSyn is a more practical and reliable model than current predictors on heterodimeric motif synthesis. Contact: kc.w@cityu.edu.hk.
Collapse
Affiliation(s)
- Jiecong Lin
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
| | - Lei Huang
- Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon, Hong Kong SAR
| | - Xingjian Chen
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Shixiong Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
| |
Collapse
|
11
|
Singh G, Mullany S, Moorthy SD, Zhang R, Mehdi T, Tian R, Duncan AG, Moses AM, Mitchell JA. A flexible repertoire of transcription factor binding sites and a diversity threshold determines enhancer activity in embryonic stem cells. Genome Res 2021; 31:564-575. [PMID: 33712417 PMCID: PMC8015845 DOI: 10.1101/gr.272468.120] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2020] [Accepted: 02/19/2021] [Indexed: 12/28/2022]
Abstract
Transcriptional enhancers are critical for development and phenotype evolution and are often mutated in disease contexts; however, even in well-studied cell types, the sequence code conferring enhancer activity remains unknown. To examine the enhancer regulatory code for pluripotent stem cells, we identified genomic regions with conserved binding of multiple transcription factors in mouse and human embryonic stem cells (ESCs). Examination of these regions revealed that they contain on average 12.6 conserved transcription factor binding site (TFBS) sequences. Enriched TFBSs are a diverse repertoire of 70 different sequences representing the binding sequences of both known and novel ESC regulators. Using a diverse set of TFBSs from this repertoire was sufficient to construct short synthetic enhancers with activity comparable to native enhancers. Site-directed mutagenesis of conserved TFBSs in endogenous enhancers or TFBS deletion from synthetic sequences revealed a requirement for 10 or more different TFBSs. Furthermore, specific TFBSs, including the POU5F1:SOX2 comotif, are dispensable, despite cobinding the POU5F1 (also known as OCT4), SOX2, and NANOG master regulators of pluripotency. These findings reveal that a TFBS sequence diversity threshold overrides the need for optimized regulatory grammar and individual TFBSs that recruit specific master regulators.
Collapse
Affiliation(s)
- Gurdeep Singh
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
| | - Shanelle Mullany
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
| | - Sakthi D Moorthy
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
| | - Richard Zhang
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
| | - Tahmid Mehdi
- Department of Computer Science, University of Toronto, Toronto, M5S 2E4, Canada
| | - Ruxiao Tian
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
| | - Andrew G Duncan
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
| | - Alan M Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada.,Department of Computer Science, University of Toronto, Toronto, M5S 2E4, Canada.,Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 3B3, Canada
| | - Jennifer A Mitchell
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
| |
Collapse
|
12
|
Jindal GA, Farley EK. Enhancer grammar in development, evolution, and disease: dependencies and interplay. Dev Cell 2021; 56:575-587. [PMID: 33689769 PMCID: PMC8462829 DOI: 10.1016/j.devcel.2021.02.016] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2020] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 12/19/2022]
Abstract
Each language has standard books describing that language's grammatical rules. Biologists have searched for similar, albeit more complex, principles relating enhancer sequence to gene expression. Here, we review the literature on enhancer grammar. We introduce dependency grammar, a model where enhancers encode information based on dependencies between enhancer features shaped by mechanistic, evolutionary, and biological constraints. Classifying enhancers based on the types of dependencies may identify unifying principles relating enhancer sequence to gene expression. Such rules would allow us to read the instructions for development within genomes and pinpoint causal enhancer variants underlying disease and evolutionary changes.
Collapse
Affiliation(s)
- Granton A Jindal
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA
| | - Emma K Farley
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA.
| |
Collapse
|
13
|
Avsec Ž, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, Fropf R, McAnany C, Gagneur J, Kundaje A, Zeitlinger J. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 2021; 53:354-366. [PMID: 33603233 PMCID: PMC8812996 DOI: 10.1038/s41588-021-00782-6] [Citation(s) in RCA: 251] [Impact Index Per Article: 83.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2020] [Accepted: 01/07/2021] [Indexed: 01/30/2023]
Abstract
The arrangement (syntax) of transcription factor (TF) binding motifs is an important part of the cis-regulatory code, yet remains elusive. We introduce a deep learning model, BPNet, that uses DNA sequence to predict base-resolution chromatin immunoprecipitation (ChIP)-nexus binding profiles of pluripotency TFs. We develop interpretation tools to learn predictive motif representations and identify soft syntax rules for cooperative TF binding interactions. Strikingly, Nanog preferentially binds with helical periodicity, and TFs often cooperate in a directional manner, which we validate using clustered regularly interspaced short palindromic repeat (CRISPR)-induced point mutations. Our model represents a powerful general approach to uncover the motifs and syntax of cis-regulatory sequences in genomics data.
Collapse
Affiliation(s)
- Žiga Avsec
- Department of Informatics, Technical University of Munich, Garching, Germany,Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Munich, Germany,Currently at DeepMind, London, UK
| | - Melanie Weilert
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Sabrina Krueger
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Amr Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Khyati Dalal
- Stowers Institute for Medical Research, Kansas City, MO, USA,The University of Kansas Medical Center, Kansas City, KS, USA
| | - Robin Fropf
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Charles McAnany
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Julien Gagneur
- Department of Informatics, Technical University of Munich, Garching, Germany
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA,Department of Genetics, Stanford University, Stanford, CA, USA,correspondence: ,
| | - Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO, USA,The University of Kansas Medical Center, Kansas City, KS, USA,correspondence: ,
| |
Collapse
|
14
|
Yu TC, Liu WL, Brinck MS, Davis JE, Shek J, Bower G, Einav T, Insigne KD, Phillips R, Kosuri S, Urtecho G. Multiplexed characterization of rationally designed promoter architectures deconstructs combinatorial logic for IPTG-inducible systems. Nat Commun 2021; 12:325. [PMID: 33436562 PMCID: PMC7804116 DOI: 10.1038/s41467-020-20094-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2020] [Accepted: 11/04/2020] [Indexed: 12/21/2022] Open
Abstract
A crucial step towards engineering biological systems is the ability to precisely tune the genetic response to environmental stimuli. In the case of Escherichia coli inducible promoters, our incomplete understanding of the relationship between sequence composition and gene expression hinders our ability to predictably control transcriptional responses. Here, we profile the expression dynamics of 8269 rationally designed, IPTG-inducible promoters that collectively explore the individual and combinatorial effects of RNA polymerase and LacI repressor binding site strengths. We then fit a statistical mechanics model to measured expression that accurately models gene expression and reveals properties of theoretically optimal inducible promoters. Furthermore, we characterize three alternative promoter architectures and show that repositioning binding sites within promoters influences the types of combinatorial effects observed between promoter elements. In total, this approach enables us to deconstruct relationships between inducible promoter elements and discover practical insights for engineering inducible promoters with desirable characteristics.
Collapse
Affiliation(s)
- Timothy C Yu
- Department of Bioengineering, University of California, Los Angeles, CA, 90095, USA
| | - Winnie L Liu
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095, USA
| | - Marcia S Brinck
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA, 90095, USA
| | - Jessica E Davis
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Jeremy Shek
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Grace Bower
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095, USA
| | - Tal Einav
- Department of Physics, California Institute of Technology, Pasadena, CA, 91125, USA
| | - Kimberly D Insigne
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, CA, 90095, USA
| | - Rob Phillips
- Department of Physics, California Institute of Technology, Pasadena, CA, 91125, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Department of Applied Physics, California Institute of Technology, Pasadena, CA, 91125, USA
| | - Sriram Kosuri
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA.
- UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA, 90095, USA.
- Institute for Quantitative and Computational Biosciences (QCB), University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, USA.
- Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, CA, 90095, USA.
| | - Guillaume Urtecho
- Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, CA, 90095, USA.
| |
Collapse
|
15
|
Mulvey B, Lagunas T, Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 2021; 89:76-89. [PMID: 32843144 PMCID: PMC7938388 DOI: 10.1016/j.biopsych.2020.06.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/11/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Neuropsychiatric phenotypes have long been known to be influenced by heritable risk factors, directly confirmed by the past decade of genetic studies that have revealed specific genetic variants enriched in disease cohorts. However, the initial hope that a small set of genes would be responsible for a given disorder proved false. The more complex reality is that a given disorder may be influenced by myriad small-effect noncoding variants and/or by rare but severe coding variants, many de novo. Noncoding genomic sequences-for which molecular functions cannot usually be inferred-harbor a large portion of these variants, creating a substantial barrier to understanding higher-order molecular and biological systems of disease. Fortunately, novel genetic technologies-scalable oligonucleotide synthesis, RNA sequencing, and CRISPR (clustered regularly interspaced short palindromic repeats)-have opened novel avenues to experimentally identify biologically significant variants en masse. Massively parallel reporter assays (MPRAs) are an especially versatile technique resulting from such innovations. MPRAs are powerful molecular genetics tools that can be used to screen thousands of untranscribed or untranslated sequences and their variants for functional effects in a single experiment. This approach, though underutilized in psychiatric genetics, has several useful features for the field. We review methods for assaying putatively functional genetic variants and regions, emphasizing MPRAs and the opportunities they hold for dissection of psychiatric polygenicity. We discuss literature applying functional assays in neurogenetics, highlighting strengths, caveats, and design considerations-especially regarding disease-relevant variables (cell type, neurodevelopment, and sex), and we ultimately propose applications of MPRA to both computational and experimental neurogenetics of polygenic disease risk.
Collapse
Affiliation(s)
- Bernard Mulvey
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Tomás Lagunas
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri.
| |
Collapse
|
16
|
Sanford EM, Emert BL, Coté A, Raj A. Gene regulation gravitates toward either addition or multiplication when combining the effects of two signals. eLife 2020; 9:e59388. [PMID: 33284110 PMCID: PMC7771960 DOI: 10.7554/elife.59388] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2020] [Accepted: 12/04/2020] [Indexed: 01/07/2023] Open
Abstract
Two different cell signals often affect transcription of the same gene. In such cases, it is natural to ask how the combined transcriptional response compares to the individual responses. The most commonly used mechanistic models predict additive or multiplicative combined responses, but a systematic genome-wide evaluation of these predictions is not available. Here, we analyzed the transcriptional response of human MCF-7 cells to retinoic acid and TGF-β, applied individually and in combination. The combined transcriptional responses of induced genes exhibited a range of behaviors, but clearly favored both additive and multiplicative outcomes. We performed paired chromatin accessibility measurements and found that increases in accessibility were largely additive. There was some association between super-additivity of accessibility and multiplicative or super-multiplicative combined transcriptional responses, while sub-additivity of accessibility associated with additive transcriptional responses. Our findings suggest that mechanistic models of combined transcriptional regulation must be able to reproduce a range of behaviors.
Collapse
Affiliation(s)
- Eric M Sanford
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of PennsylvaniaPhiladelphiaUnited States
| | - Benjamin L Emert
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of PennsylvaniaPhiladelphiaUnited States
| | - Allison Coté
- Department of Bioengineering, School of Engineering and Applied Sciences, University of PennsylvaniaPhiladelphiaUnited States
- Department of Genetics, Perelman School of Medicine, University of PennsylvaniaPhiladelphiaUnited States
| | - Arjun Raj
- Department of Bioengineering, School of Engineering and Applied Sciences, University of PennsylvaniaPhiladelphiaUnited States
- Department of Genetics, Perelman School of Medicine, University of PennsylvaniaPhiladelphiaUnited States
| |
Collapse
|
17
|
Hammelman J, Krismer K, Banerjee B, Gifford DK, Sherwood RI. Identification of determinants of differential chromatin accessibility through a massively parallel genome-integrated reporter assay. Genome Res 2020; 30:1468-1480. [PMID: 32973041 PMCID: PMC7605270 DOI: 10.1101/gr.263228.120] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2020] [Accepted: 08/26/2020] [Indexed: 12/20/2022]
Abstract
A key mechanism in cellular regulation is the ability of the transcriptional machinery to physically access DNA. Transcription factors interact with DNA to alter the accessibility of chromatin, which enables changes to gene expression during development or disease or as a response to environmental stimuli. However, the regulation of DNA accessibility via the recruitment of transcription factors is difficult to study in the context of the native genome because every genomic site is distinct in multiple ways. Here we introduce the multiplexed integrated accessibility assay (MIAA), an assay that measures chromatin accessibility of synthetic oligonucleotide sequence libraries integrated into a controlled genomic context with low native accessibility. We apply MIAA to measure the effects of sequence motifs on cell type-specific accessibility between mouse embryonic stem cells and embryonic stem cell-derived definitive endoderm cells, screening 7905 distinct DNA sequences. MIAA recapitulates differential accessibility patterns of 100-nt sequences derived from natively differential genomic regions, identifying E-box motifs common to epithelial-mesenchymal transition driver transcription factors in stem cell-specific accessible regions that become repressed in endoderm. We show that a single binding motif for a key regulatory transcription factor is sufficient to open chromatin, and classify sets of stem cell-specific, endoderm-specific, and shared accessibility-modifying transcription factor motifs. We also show that overexpression of two definitive endoderm transcription factors, T and Foxa2, results in changes to accessibility in DNA sequences containing their respective DNA-binding motifs and identify preferential motif arrangements that influence accessibility.
Collapse
Affiliation(s)
- Jennifer Hammelman
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Konstantin Krismer
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Budhaditya Banerjee
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
| | - David K Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Hubrecht Institute, 3584 CT Utrecht, Netherlands
| |
Collapse
|
18
|
Kreimer A, Yosef N. Evaluation of Davis et al.: Exploring Sequence of Determinants of Transcriptional Regulation-The Case of c-AMP Response Element. Cell Syst 2020; 11:2-4. [PMID: 32702318 DOI: 10.1016/j.cels.2020.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
One snapshot of the peer review process for "Dissection of c-AMP Response Element Architecture by Using Genomic and Episomal Massively Parallel Reporter Assays" (Davis et al., 2020).
Collapse
Affiliation(s)
- Anat Kreimer
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA; Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard University, Boston, MA, USA.
| |
Collapse
|
19
|
Davis JE, Insigne KD, Jones EM, Hastings QA, Boldridge WC, Kosuri S. Dissection of c-AMP Response Element Architecture by Using Genomic and Episomal Massively Parallel Reporter Assays. Cell Syst 2020; 11:75-85.e7. [PMID: 32603702 DOI: 10.1016/j.cels.2020.05.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2019] [Revised: 02/16/2020] [Accepted: 05/26/2020] [Indexed: 11/15/2022]
Abstract
In eukaryotes, transcription factors (TFs) orchestrate gene expression by binding to TF-binding sites (TFBSs) and localizing transcriptional co-regulators and RNA polymerase II to cis-regulatory elements. However, we lack a basic understanding of the relationship between TFBS composition and their quantitative transcriptional responses. Here, we measured expression driven by 17,406 synthetic cis-regulatory elements with varied compositions of a model TFBS, the c-AMP response element (CRE) by using massively parallel reporter assays (MPRAs). We find CRE number, affinity, and promoter proximity largely determines expression. In addition, we observe expression modulation based on the spacing between CREs and CRE distance to the promoter, where expression follows a helical periodicity. Finally, we compare library expression between an episomal MPRA and a genomically integrated MPRA, where a single cis-regulatory element is assayed per cell at a defined locus. These assays largely recapitulate each other, although weaker, non-canonical CREs exhibit greater activity in a genomic context.
Collapse
Affiliation(s)
- Jessica E Davis
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Kimberly D Insigne
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA; Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Eric M Jones
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Quinn A Hastings
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - W Clifford Boldridge
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Sriram Kosuri
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA.
| |
Collapse
|
20
|
Goldshtein M, Mellul M, Deutch G, Imashimizu M, Takeuchi K, Meshorer E, Ram O, Lukatsky DB. Transcription Factor Binding in Embryonic Stem Cells Is Constrained by DNA Sequence Repeat Symmetry. Biophys J 2020; 118:2015-2026. [PMID: 32101712 DOI: 10.1016/j.bpj.2020.02.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2019] [Revised: 02/05/2020] [Accepted: 02/10/2020] [Indexed: 01/21/2023] Open
Abstract
Transcription factor (TF) recognition is dictated by the underlying DNA motif sequence specific for each TF. Here, we reveal that DNA sequence repeat symmetry plays a central role in defining TF-DNA-binding preferences. In particular, we find that different TFs bind similar symmetry patterns in the context of different developmental layers. Most TFs possess dominant preferences for similar DNA repeat symmetry types. However, in some cases, preferences of specific TFs are changed during differentiation, suggesting the importance of information encoded outside of known motif regions. Histone modifications also exhibit strong preferences for similar DNA repeat symmetry patterns unique to each type of modification. Next, using an in vivo reporter assay, we show that gene expression in embryonic stem cells can be positively modulated by the presence of genomic and computationally designed DNA oligonucleotides containing identified nonconsensus-repetitive sequence elements. This supports the hypothesis that certain nonconsensus-repetitive patterns possess a functional ability to regulate gene expression. We also performed a solution NMR experiment to probe the stability of double-stranded DNA via imino proton resonances for several double-stranded DNA sequences characterized by different repetitive patterns. We suggest that such local stability might play a key role in determining TF-DNA binding preferences. Overall, our findings show that despite the enormous sequence complexity of the TF-DNA binding landscape in differentiating embryonic stem cells, this landscape can be quantitatively characterized in simple terms using the notion of DNA sequence repeat symmetry.
Collapse
Affiliation(s)
- Matan Goldshtein
- Avram and Stella Goldstein-Goren Department of Biotechnology Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Meir Mellul
- Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Gai Deutch
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Masahiko Imashimizu
- Molecular Profiling Research Center for Drug Discovery, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
| | - Koh Takeuchi
- Molecular Profiling Research Center for Drug Discovery, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
| | - Eran Meshorer
- Department of Genetics, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel; The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Oren Ram
- Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
| | - David B Lukatsky
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
| |
Collapse
|
21
|
King DM, Hong CKY, Shepherdson JL, Granas DM, Maricque BB, Cohen BA. Synthetic and genomic regulatory elements reveal aspects of cis-regulatory grammar in mouse embryonic stem cells. eLife 2020; 9:41279. [PMID: 32043966 PMCID: PMC7077988 DOI: 10.7554/elife.41279] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2018] [Accepted: 02/07/2020] [Indexed: 01/08/2023] Open
Abstract
In embryonic stem cells (ESCs), a core transcription factor (TF) network establishes the gene expression program necessary for pluripotency. To address how interactions between four key TFs contribute to cis-regulation in mouse ESCs, we assayed two massively parallel reporter assay (MPRA) libraries composed of binding sites for SOX2, POU5F1 (OCT4), KLF4, and ESRRB. Comparisons between synthetic cis-regulatory elements and genomic sequences with comparable binding site configurations revealed some aspects of a regulatory grammar. The expression of synthetic elements is influenced by both the number and arrangement of binding sites. This grammar plays only a small role for genomic sequences, as the relative activities of genomic sequences are best explained by the predicted occupancy of binding sites, regardless of binding site identity and positioning. Our results suggest that the effects of transcription factor binding sites (TFBS) are influenced by the order and orientation of sites, but that in the genome the overall occupancy of TFs is the primary determinant of activity. Transcription factors are proteins that flip genetic switches; their role is to control when and where genes are active. They do this by binding to short stretches of DNA called cis-regulatory sequences. Each sequence can have several binding sites for different transcription factors, but it is largely unclear whether the transcription factors binding to the same regulatory sequence actually work together. It is possible that each transcription factor may work independently and there only needs to be critical mass of transcription factors bound to throw the genetic switch. If this is the case, the most important features of a cis-regulatory sequence should be the number of binding sites it contains, and how tightly the transcription factors bind to those sites. The more transcription factors and the more strongly they bind, the more active the gene should be. An alternative option is that certain transcription factors may work better together, enhancing each other's effects such that the total effect is more than the sum of its parts. If this is true, the order, orientation and spacing of the binding sites within a sequence should matter more than the number. One way to investigate to distinguish between these possibilities is to study mouse embryonic stem cells, which have a core set of four transcription factors. Looking directly at a real genome, however, can be confusing and it is difficult to measure the effects of different cis-regulatory sequences because genes differ in so many other ways. To tackle this problem, King et al. created a synthetic set of cis-regulatory sequences based on the four core transcription factors found in mouse stem cells. The synthetic set had every combination of two, three or four of the binding sites, with each site either facing forwards or backwards along the DNA strand. King et al. attached each of the synthetic cis-regulatory sequences to a reporter gene to find out how well each sequence performed. This revealed that the cis-regulatory sequences with the most binding sites and the tightest binding affinities work best, suggesting that transcription factors mainly work independently. There was evidence of some interaction between some transcription factors, because, of the synthetic sequences with four binding sites, some worked better than others, and there were patterns in the most effective binding site combinations. However, these effects were small and when King et al. went on to test sequences from the real mouse genome, the most important factor by far was the number of binding sites. Synthetic libraries of DNA sequences allow researchers to examine gene regulation more clearly than is possible in real genomes. Yet this approach does have its limitations and it is impossible to capture every type of cis-regulatory sequence in one library. The next step to extend this work is to combine the two approaches, taking sequences from the real genome and manipulating them one by one. This could help to unravel the rules that govern how cis-regulatory sequences work in real cells.
Collapse
Affiliation(s)
- Dana M King
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States.,Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - Clarice Kit Yee Hong
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States.,Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - James L Shepherdson
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States.,Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - David M Granas
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States.,Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - Brett B Maricque
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States.,Department of Genetics, Washington University in St. Louis, St. Louis, United States
| | - Barak A Cohen
- Edison Center for Genome Sciences and Systems Biology, Washington University in St. Louis, St. Louis, United States.,Department of Genetics, Washington University in St. Louis, St. Louis, United States
| |
Collapse
|
22
|
Comoglio F, Park HJ, Schoenfelder S, Barozzi I, Bode D, Fraser P, Green AR. Thrombopoietin signaling to chromatin elicits rapid and pervasive epigenome remodeling within poised chromatin architectures. Genome Res 2018; 28:295-309. [PMID: 29429976 PMCID: PMC5848609 DOI: 10.1101/gr.227272.117] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2017] [Accepted: 01/26/2018] [Indexed: 12/13/2022]
Abstract
Thrombopoietin (TPO) is a critical cytokine regulating hematopoietic stem cell maintenance and differentiation into the megakaryocytic lineage. However, the transcriptional and chromatin dynamics elicited by TPO signaling are poorly understood. Here, we study the immediate early transcriptional and cis-regulatory responses to TPO in hematopoietic stem/progenitor cells (HSPCs) and use this paradigm of cytokine signaling to chromatin to dissect the relationship between cis-regulatory activity and chromatin architecture. We show that TPO profoundly alters the transcriptome of HSPCs, with key hematopoietic regulators being transcriptionally repressed within 30 min of TPO. By examining cis-regulatory dynamics and chromatin architectures, we demonstrate that these changes are accompanied by rapid and extensive epigenome remodeling of cis-regulatory landscapes that is spatially coordinated within topologically associating domains (TADs). Moreover, TPO-responsive enhancers are spatially clustered and engage in preferential homotypic intra- and inter-TAD interactions that are largely refractory to TPO signaling. By further examining the link between cis-regulatory dynamics and chromatin looping, we show that rapid modulation of cis-regulatory activity is largely independent of chromatin looping dynamics. Finally, we show that, although activated and repressed cis-regulatory elements share remarkably similar DNA sequence compositions, transcription factor binding patterns accurately predict rapid cis-regulatory responses to TPO.
Collapse
Affiliation(s)
- Federico Comoglio
- Cambridge Institute for Medical Research, Medical Research Council/Wellcome Trust Stem Cell Institute, and Department of Haematology, University of Cambridge, Cambridge CB2 0XY, United Kingdom
| | - Hyun Jung Park
- Cambridge Institute for Medical Research, Medical Research Council/Wellcome Trust Stem Cell Institute, and Department of Haematology, University of Cambridge, Cambridge CB2 0XY, United Kingdom
| | - Stefan Schoenfelder
- Nuclear Dynamics Programme, The Babraham Institute, Cambridge CB22 3AT, United Kingdom
| | - Iros Barozzi
- Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
| | - Daniel Bode
- Cambridge Institute for Medical Research, Medical Research Council/Wellcome Trust Stem Cell Institute, and Department of Haematology, University of Cambridge, Cambridge CB2 0XY, United Kingdom
| | - Peter Fraser
- Nuclear Dynamics Programme, The Babraham Institute, Cambridge CB22 3AT, United Kingdom
- Department of Biological Science, Florida State University, Tallahassee, Florida 32301, USA
| | - Anthony R Green
- Cambridge Institute for Medical Research, Medical Research Council/Wellcome Trust Stem Cell Institute, and Department of Haematology, University of Cambridge, Cambridge CB2 0XY, United Kingdom
- Department of Haematology, Addenbrooke's Hospital, Cambridge CB2 0XY, United Kingdom
| |
Collapse
|
23
|
Catarino RR, Stark A. Assessing sufficiency and necessity of enhancer activities for gene expression and the mechanisms of transcription activation. Genes Dev 2018; 32:202-223. [PMID: 29491135 PMCID: PMC5859963 DOI: 10.1101/gad.310367.117] [Citation(s) in RCA: 124] [Impact Index Per Article: 20.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
Enhancers are important genomic regulatory elements directing cell type-specific transcription. They assume a key role during development and disease, and their identification and functional characterization have long been the focus of scientific interest. The advent of next-generation sequencing and clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9-based genome editing has revolutionized the means by which we study enhancer biology. In this review, we cover recent developments in the prediction of enhancers based on chromatin characteristics and their identification by functional reporter assays and endogenous DNA perturbations. We discuss that the two latter approaches provide different and complementary insights, especially in assessing enhancer sufficiency and necessity for transcription activation. Furthermore, we discuss recent insights into mechanistic aspects of enhancer function, including findings about cofactor requirements and the role of post-translational histone modifications such as monomethylation of histone H3 Lys4 (H3K4me1). Finally, we survey how these approaches advance our understanding of transcription regulation with respect to promoter specificity and transcriptional bursting and provide an outlook covering open questions and promising developments.
Collapse
Affiliation(s)
- Rui R Catarino
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), 1030 Vienna, Austria
| | - Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), 1030 Vienna, Austria
| |
Collapse
|
24
|
PTRE-seq reveals mechanism and interactions of RNA binding proteins and miRNAs. Nat Commun 2018; 9:301. [PMID: 29352242 PMCID: PMC5775260 DOI: 10.1038/s41467-017-02745-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2017] [Accepted: 12/22/2017] [Indexed: 11/08/2022] Open
Abstract
RNA binding proteins (RBP) and microRNAs (miRNAs) often bind sequences in 3' untranslated regions (UTRs) of mRNAs, and regulate stability and translation efficiency. With the identification of numerous RBPs and miRNAs, there is an urgent need for new technologies to dissect the function of the cis-acting elements of RBPs and miRNAs. We describe post-transcriptional regulatory element sequencing (PTRE-seq), a massively parallel method for assaying the target sequences of miRNAs and RBPs. We use PTRE-seq to dissect sequence preferences and interactions between miRNAs and RBPs. The binding sites for these effector molecules influenced different aspects of the RNA lifecycle: RNA stability, translation efficiency, and translation initiation. In some cases, post-transcriptional control is modular, with different factors acting independently of each other, while in other cases factors show specific epistatic interactions. The throughput, flexibility, and reproducibility of PTRE-seq make it a valuable tool to study post-transcriptional regulation by 3'UTR elements.
Collapse
|
25
|
Sundaram V, Wang T. Transposable Element Mediated Innovation in Gene Regulatory Landscapes of Cells: Re-Visiting the "Gene-Battery" Model. Bioessays 2018; 40:10.1002/bies.201700155. [PMID: 29206283 PMCID: PMC5912915 DOI: 10.1002/bies.201700155] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2017] [Revised: 10/25/2017] [Indexed: 01/31/2023]
Abstract
Transposable elements (TEs) are no longer considered to be "junk" DNA. Here, we review how TEs can impact gene regulation systematically. TEs encode various regulatory elements that enables them to regulate gene expression. RJ Britten and EH Davidson hypothesized that TEs can integrate the function of various transcriptional regulators into gene regulatory networks. Uniquely TEs can deposit regulatory sites across the genome when they transpose, and thereby bring multiple genes under control of the same regulatory logic. Several studies together have robustly established that TEs participate in embryonic development and oncogenesis. We discuss the regulatory characteristics of TEs in context of evolution to understand the extent of their impact on gene networks. Understanding these features of TEs is central to future investigations of TEs in cellular processes and phenotypic presentations, which are applicable to development and disease studies. We re-visit the Britten-Davidson "gene-battery" model and understand the genetic and transcriptional impact of TEs in innovating gene regulatory networks.
Collapse
Affiliation(s)
- Vasavi Sundaram
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
| | - Ting Wang
- Department of Genetics, Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St Louis, Missouri 63110, United States of America
| |
Collapse
|
26
|
A Simple Grammar Defines Activating and Repressing cis-Regulatory Elements in Photoreceptors. Cell Rep 2017; 17:1247-1254. [PMID: 27783940 DOI: 10.1016/j.celrep.2016.09.066] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2016] [Revised: 08/06/2016] [Accepted: 09/20/2016] [Indexed: 12/22/2022] Open
Abstract
Transcription factors often activate and repress different target genes in the same cell. How activation and repression are encoded by different arrangements of transcription factor binding sites in cis-regulatory elements is poorly understood. We investigated how sites for the transcription factor CRX encode both activation and repression in photoreceptors by assaying thousands of genomic and synthetic cis-regulatory elements in wild-type and Crx-/- retinas. We found that sequences with high affinity for CRX repress transcription, whereas sequences with lower affinity activate. This rule is modified by a cooperative interaction between CRX sites and sites for the transcription factor NRL, which overrides the repressive effect of high affinity for CRX. Our results show how simple rearrangements of transcription factor binding sites encode qualitatively different responses to a single transcription factor and explain how CRX plays multiple cis-regulatory roles in the same cell.
Collapse
|
27
|
Levo M, Avnit-Sagi T, Lotan-Pompan M, Kalma Y, Weinberger A, Yakhini Z, Segal E. Systematic Investigation of Transcription Factor Activity in the Context of Chromatin Using Massively Parallel Binding and Expression Assays. Mol Cell 2017; 65:604-617.e6. [DOI: 10.1016/j.molcel.2017.01.007] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2016] [Revised: 11/28/2016] [Accepted: 01/10/2017] [Indexed: 10/20/2022]
|
28
|
Reiter F, Wienerroither S, Stark A. Combinatorial function of transcription factors and cofactors. Curr Opin Genet Dev 2017; 43:73-81. [PMID: 28110180 DOI: 10.1016/j.gde.2016.12.007] [Citation(s) in RCA: 190] [Impact Index Per Article: 27.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2016] [Revised: 12/14/2016] [Accepted: 12/21/2016] [Indexed: 12/31/2022]
Abstract
Differential gene expression gives rise to the many cell types of complex organisms. Enhancers regulate transcription by binding transcription factors (TFs), which in turn recruit cofactors to activate RNA Polymerase II at core promoters. Transcriptional regulation is typically mediated by distinct combinations of TFs, enabling a relatively small number of TFs to generate a large diversity of cell types. However, how TFs achieve combinatorial enhancer control and how enhancers, enhancer-bound TFs, and the cofactors they recruit regulate RNA Polymerase II activity is not entirely clear. Here, we review how TF synergy is mediated at the level of DNA binding and after binding, the role of cofactors and the post-translational modifications they catalyze, and discuss different models of enhancer-core-promoter communication.
Collapse
Affiliation(s)
- Franziska Reiter
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
| | - Sebastian Wienerroither
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
| | - Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria.
| |
Collapse
|
29
|
Fish AE, Capra JA, Bush WS. Are Interactions between cis-Regulatory Variants Evidence for Biological Epistasis or Statistical Artifacts? Am J Hum Genet 2016; 99:817-830. [PMID: 27640306 DOI: 10.1016/j.ajhg.2016.07.022] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2016] [Accepted: 07/29/2016] [Indexed: 01/26/2023] Open
Abstract
The importance of epistasis-or statistical interactions between genetic variants-to the development of complex disease in humans has been controversial. Genome-wide association studies of statistical interactions influencing human traits have recently become computationally feasible and have identified many putative interactions. However, statistical models used to detect interactions can be confounded, which makes it difficult to be certain that observed statistical interactions are evidence for true molecular epistasis. In this study, we investigate whether there is evidence for epistatic interactions between genetic variants within the cis-regulatory region that influence gene expression after accounting for technical, statistical, and biological confounding factors. We identified 1,119 (FDR = 5%) interactions that appear to regulate gene expression in human lymphoblastoid cell lines, a tightly controlled, largely genetically determined phenotype. Many of these interactions replicated in an independent dataset (90 of 803 tested, Bonferroni threshold). We then performed an exhaustive analysis of both known and novel confounders, including ceiling/floor effects, missing genotype combinations, haplotype effects, single variants tagged through linkage disequilibrium, and population stratification. Every interaction could be explained by at least one of these confounders, and replication in independent datasets did not protect against some confounders. Assuming that the confounding factors provide a more parsimonious explanation for each interaction, we find it unlikely that cis-regulatory interactions contribute strongly to human gene expression, which calls into question the relevance of cis-regulatory interactions for other human phenotypes. We additionally propose several best practices for epistasis testing to protect future studies from confounding.
Collapse
|