1
Wang S, Wang W. Interpretable prediction of mRNA abundance from promoter sequence using contextual regression models. NAR Genom Bioinform 2024; 6:lqae055. [PMID: 38807713 PMCID: PMC11131020 DOI: 10.1093/nargab/lqae055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 04/08/2024] [Accepted: 05/12/2024] [Indexed: 05/30/2024] Open
Abstract
While machine learning models have been successfully applied to predicting gene expression from promoter sequences, it remains a great challenge to derive intuitive interpretation of the model and reveal DNA motif grammar such as motif cooperation and distance constraint between motif sites. Previous interpretation approaches are often time-consuming or have difficulty to learn the combinatory rules. In this work, we designed interpretable neural network models to predict the mRNA expression levels from DNA sequences. By applying the Contextual Regression framework we developed, we extracted weighted features to cluster samples into different groups, which have different gene expression levels. We performed motif analysis in each cluster and found motifs with active or repressive regulation on gene expression. By comparing the co-occurrence locations of discovered motifs, we also uncovered multiple grammars of motif combination including communities of cooperative motifs and distance constraints between motif pairs. These results revealed new insights of the regulatory architecture of promoter sequences.
Affiliation(s)
- Song Wang
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0359, USA
- Wei Wang
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093-0359, USA
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093-0359, USA
2
He J, Huo X, Pei G, Jia Z, Yan Y, Yu J, Qu H, Xie Y, Yuan J, Zheng Y, Hu Y, Shi M, You K, Li T, Ma T, Zhang MQ, Ding S, Li P, Li Y. Dual-role transcription factors stabilize intermediate expression levels. Cell 2024; 187:2746-2766.e25. [PMID: 38631355 DOI: 10.1016/j.cell.2024.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/08/2023] [Accepted: 03/18/2024] [Indexed: 04/19/2024]
Abstract
Precise control of gene expression levels is essential for normal cell functions, yet how they are defined and tightly maintained, particularly at intermediate levels, remains elusive. Here, using a series of newly developed sequencing, imaging, and functional assays, we uncover a class of transcription factors with dual roles as activators and repressors, referred to as condensate-forming level-regulating dual-action transcription factors (TFs). They reduce high expression but increase low expression to achieve stable intermediate levels. Dual-action TFs directly exert activating and repressing functions via condensate-forming domains that compartmentalize core transcriptional unit selectively. Clinically relevant mutations in these domains, which are linked to a range of developmental disorders, impair condensate selectivity and dual-action TF activity. These results collectively address a fundamental question in expression regulation and demonstrate the potential of level-regulating dual-action TFs as powerful effectors for engineering controlled expression levels.
Affiliation(s)
- Jinnan He
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Xiangru Huo
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Gaofeng Pei
- State Key Laboratory of Membrane Biology, Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China
- Zeran Jia
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yiming Yan
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Jiawei Yu
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Haozhi Qu
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yunxin Xie
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Junsong Yuan
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yuan Zheng
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Yanyan Hu
- School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China
- Minglei Shi
- Bioinformatics Division, National Research Center for Information Science and Technology, School of Medicine, Tsinghua University, Beijing 100084, China
- Kaiqiang You
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Tingting Li
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Tianhua Ma
- School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China
- Michael Q Zhang
- Bioinformatics Division, National Research Center for Information Science and Technology, School of Medicine, Tsinghua University, Beijing 100084, China; Department of Biological Sciences, Center for Systems Biology, The University of Texas, Dallas, TX 75080-3021, USA
- Sheng Ding
- School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China
- Pilong Li
- State Key Laboratory of Membrane Biology, Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing 100084, China; Tsinghua University-Peking University Joint Center for Life Sciences, Beijing 100084, China.
- Yinqing Li
- The IDG/McGovern Institute for Brain Research, MOE Key Laboratory of Bioinformatics, State Key Lab of Molecular Oncology, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China; School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China.
3
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024] Open
Abstract
Massively parallel reporter assay measures transcriptional activities of various cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational model framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms of cis-regulatory mutations of a synthetic enhancer that cause abnormal gene expression. We found that, in a human cell line, competitive binding between family transcription factors (TFs) with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single- or multi-mutations. We also discovered that even if various harmful mutations occurred in an activator binding site, CRM could stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance understanding the effect of SNPs and indels on CRMs and would help building robust custom-designed CRMs for biologics production and gene therapy.
Affiliation(s)
- Chan-Koo Kang
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Ah-Ram Kim
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
4
de Boer CG, Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 2024; 625:41-50. [PMID: 38093018 DOI: 10.1038/s41586-023-06661-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 09/20/2023] [Indexed: 01/05/2024]
Abstract
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
Affiliation(s)
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada.
- Jussi Taipale
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden.
- Department of Biochemistry, University of Cambridge, Cambridge, UK.
5
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA.
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA.
- Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA.
6
Chen H, Yan C, Dhasarathy A, Kladde M, Bai L. Investigating pioneer factor activity and its coordination with chromatin remodelers using integrated synthetic oligo assay. STAR Protoc 2023; 4:102279. [PMID: 37289591 PMCID: PMC10323128 DOI: 10.1016/j.xpro.2023.102279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 03/24/2023] [Accepted: 04/07/2023] [Indexed: 06/10/2023] Open
Abstract
Chromatin accessibility is regulated by pioneer factors (PFs) and chromatin remodelers (CRs). Here, we present a protocol, based on integrated synthetic oligonucleotide libraries in yeast, to systematically interrogate the nucleosome-displacing activities of PFs and their coordination with CRs. We describe steps for designing oligonucleotide sequences, constructing yeast libraries, measuring nucleosome configurations, and data analyses. This approach potentially can be adapted for use in higher eukaryotes to investigate the activities of many types of chromatin-associated factors. For complete details on the use and execution of this protocol, please refer to Yan et al. (1) and Chen et al. (2).
Affiliation(s)
- Hengye Chen
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA.
- Chao Yan
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Archana Dhasarathy
- Department of Biomedical Sciences, University of North Dakota School of Medicine and Health Sciences, Grand Forks, ND 58201, USA
- Michael Kladde
- Department of Biochemistry and Molecular Biology, College of Medicine, University of Florida, Gainesville, FL 32610, USA; UF Health Cancer Center, University of Florida, Gainesville, FL 32610, USA
- Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA; Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA.
7
Isbel L, Grand RS, Schübeler D. Generating specificity in genome regulation through transcription factor sensitivity to chromatin. Nat Rev Genet 2022; 23:728-740. [PMID: 35831531 DOI: 10.1038/s41576-022-00512-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Accepted: 05/30/2022] [Indexed: 12/11/2022]
Abstract
Cell type-specific gene expression relies on transcription factors (TFs) binding DNA sequence motifs embedded in chromatin. Understanding how motifs are accessed in chromatin is crucial to comprehend differential transcriptional responses and the phenotypic impact of sequence variation. Chromatin obstacles to TF binding range from DNA methylation to restriction of DNA access by nucleosomes depending on their position, composition and modification. In vivo and in vitro approaches now enable the study of TF binding in chromatin at unprecedented resolution. Emerging insights suggest that TFs vary in their ability to navigate chromatin states. However, it remains challenging to link binding and transcriptional outcomes to molecular characteristics of TFs or the local chromatin substrate. Here, we discuss our current understanding of how TFs access DNA in chromatin and novel techniques and directions towards a better understanding of this critical step in genome regulation.
Affiliation(s)
- Luke Isbel
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland; School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Ralph S Grand
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland; Zentrum für Molekulare Biologie der Universität Heidelberg, Heidelberg, Germany
- Dirk Schübeler
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland; Faculty of Sciences, University of Basel, Basel, Switzerland.
8
Distinct functions of three chromatin remodelers in activator binding and preinitiation complex assembly. PLoS Genet 2022; 18:e1010277. [PMID: 35793348 PMCID: PMC9292117 DOI: 10.1371/journal.pgen.1010277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Revised: 07/18/2022] [Accepted: 05/28/2022] [Indexed: 12/01/2022] Open
Abstract
The nucleosome remodeling complexes (CRs) SWI/SNF, RSC, and Ino80C cooperate in evicting or repositioning nucleosomes to produce nucleosome depleted regions (NDRs) at the promoters of many yeast genes induced by amino acid starvation. We analyzed mutants depleted of the catalytic subunits of these CRs for binding of transcriptional activator Gcn4 and recruitment of TATA-binding protein (TBP) during preinitiation complex (PIC) assembly. RSC and Ino80 were found to enhance Gcn4 binding to both UAS elements in NDRs upstream of promoters and to unconventional binding sites within nucleosome-occupied coding sequences; and SWI/SNF contributes to UAS binding when RSC is depleted. All three CRs are actively recruited by Gcn4 to most UAS elements and appear to enhance Gcn4 binding by reducing nucleosome occupancies at the binding motifs, indicating a positive regulatory loop. SWI/SNF acts unexpectedly in WT cells to prevent excessive Gcn4 binding at many UAS elements, indicating a dual mode of action that is modulated by the presence of RSC. RSC and SWI/SNF collaborate to enhance TBP recruitment at Gcn4 target genes, together with Ino80C, in a manner associated with nucleosome eviction at the TBP binding sites. Cooperation among the CRs in TBP recruitment is also evident at the highly transcribed ribosomal protein genes, while RSC and Ino80C act more broadly than SWI/SNF at the majority of other constitutively expressed genes to stimulate this step in PIC assembly. Our findings indicate a complex interplay among the CRs in evicting promoter nucleosomes to regulate activator binding and stimulate PIC assembly.

ATP-dependent chromatin remodelers (CRs), including SWI/SNF and RSC in budding yeast, are thought to stimulate transcription by repositioning or evicting promoter nucleosomes, and we recently implicated the CR Ino80C in this process as well. The relative importance of these CRs in stimulating activator binding and recruitment of TATA-binding protein (TBP) to promoters is incompletely understood. Examining mutants depleted of the catalytic subunits of these CRs, we determined that RSC and Ino80C stimulate binding of transcription factor Gcn4 to nucleosome-depleted regions, or linkers between genic nucleosomes, at multiple target genes activated by Gcn4 in amino acid-starved cells, frequently via evicting nucleosomes from the Gcn4 binding motifs. At some genes, SWI/SNF functionally complements RSC, while opposing RSC at others to limit Gcn4 binding. The CRs in turn are recruited by Gcn4, consistent with a positive feedback loop that enhances Gcn4 binding. The three CRs also cooperate to enhance TBP recruitment, again involving nucleosome depletion, at both Gcn4 target and highly expressed ribosomal protein genes, whereas only RSC and Ino80C act broadly throughout the genome to enhance this key step in preinitiation complex assembly. Our findings illuminate functional cooperation among multiple CRs in regulating activator binding and promoter activation.
9
Boldyreva LV, Andreyeva EN, Pindyurin AV. Position Effect Variegation: Role of the Local Chromatin Context in Gene Expression Regulation. Mol Biol 2022. [DOI: 10.1134/s0026893322030049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
10
Vaknin I, Amit R. Molecular and experimental tools to design synthetic enhancers. Curr Opin Biotechnol 2022; 76:102728. [PMID: 35525178 DOI: 10.1016/j.copbio.2022.102728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/16/2021] [Revised: 03/16/2022] [Accepted: 04/03/2022] [Indexed: 11/03/2022]
Abstract
Understanding the grammar of enhancers and how they regulate gene expression is key for both basic research and for the pharma and biotech industries. The design and characterization of synthetic enhancers can expand the known regulatory space. This is achieved by the utilization of DNA Oligo Libraries (OLs), which facilitates screening of as many as millions of synthetic enhancer variants simultaneously. This review includes the latest commercial DNA OL synthesis technology and its capabilities, and a general 'know-how' guide for the design, construction, and analysis of OL-based synthetic enhancer characterization experiments. Specifically, we focus on synthetic-enhancer-based massively parallel reporter assay, Sort-seq methodologies (e.g. flow cytometry, deep sequencing), and a brief description of machine learning-based attempts for OL-analysis and follow-up validation experiments.
Affiliation(s)
- Inbal Vaknin
- Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 3200000, Israel
- Roee Amit
- Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 3200000, Israel; The Russell Berrie Nanotechnology Institute, Technion - Israel Institute of Technology, Haifa 3200000, Israel.
11
Genome-wide quantification of transcription factor binding at single-DNA-molecule resolution using methyl-transferase footprinting. Nat Protoc 2021; 16:5673-5706. [PMID: 34773120 DOI: 10.1038/s41596-021-00630-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/18/2021] [Accepted: 09/06/2021] [Indexed: 01/16/2023]
Abstract
Precise control of gene expression requires the coordinated action of multiple factors at cis-regulatory elements. We recently developed single-molecule footprinting to simultaneously resolve the occupancy of multiple proteins including transcription factors, RNA polymerase II and nucleosomes on single DNA molecules genome-wide. The technique combines the use of cytosine methyltransferases to footprint the genome with bisulfite sequencing to resolve transcription factor binding patterns at cis-regulatory elements. DNA footprinting is performed by incubating permeabilized nuclei with recombinant methyltransferases. Upon DNA extraction, whole-genome or targeted bisulfite libraries are prepared and loaded on Illumina sequencers. The protocol can be completed in 4-5 d in any laboratory with access to high-throughput sequencing. Analysis can be performed in 2 d using a dedicated R package and requires access to a high-performance computing system. Our method can be used to analyze how transcription factors cooperate and antagonize to regulate transcription.
12
Krebs AR. Studying transcription factor function in the genome at molecular resolution. Trends Genet 2021; 37:798-806. [DOI: 10.1016/j.tig.2021.03.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/30/2021] [Revised: 03/22/2021] [Accepted: 03/23/2021] [Indexed: 12/11/2022]
13
John E, Singh KB, Oliver RP, Tan K. Transcription factor control of virulence in phytopathogenic fungi. Mol Plant Pathol 2021; 22:858-881. [PMID: 33973705 PMCID: PMC8232033 DOI: 10.1111/mpp.13056] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 09/22/2020] [Revised: 03/02/2021] [Accepted: 03/04/2021] [Indexed: 05/12/2023]
Abstract
Plant-pathogenic fungi are a significant threat to economic and food security worldwide. Novel protection strategies are required and therefore it is critical we understand the mechanisms by which these pathogens cause disease. Virulence factors and pathogenicity genes have been identified, but in many cases their roles remain elusive. It is becoming increasingly clear that gene regulation is vital to enable plant infection and transcription factors play an essential role. Efforts to determine their regulatory functions in plant-pathogenic fungi have expanded since the annotation of fungal genomes revealed the ubiquity of transcription factors from a broad range of families. This review establishes the significance of transcription factors as regulatory elements in plant-pathogenic fungi and provides a systematic overview of those that have been functionally characterized. Detailed analysis is provided on regulators from well-characterized families controlling various aspects of fungal metabolism, development, stress tolerance, and the production of virulence factors such as effectors and secondary metabolites. This covers conserved transcription factors with either specialized or nonspecialized roles, as well as recently identified regulators targeting key virulence pathways. Fundamental knowledge of transcription factor regulation in plant-pathogenic fungi provides avenues to identify novel virulence factors and improve our understanding of the regulatory networks linked to pathogen evolution, while transcription factors can themselves be specifically targeted for disease control. Areas requiring further insight regarding the molecular mechanisms and/or specific classes of transcription factors are identified, and direction for future investigation is presented.
Affiliation(s)
- Evan John
- Centre for Crop and Disease Management, Curtin University, Bentley, Western Australia, Australia
- School of Molecular and Life Sciences, Curtin University, Bentley, Western Australia, Australia
- Karam B. Singh
- Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, Floreat, Western Australia, Australia
- Richard P. Oliver
- School of Molecular and Life Sciences, Curtin University, Bentley, Western Australia, Australia
- Kar-Chun Tan
- Centre for Crop and Disease Management, Curtin University, Bentley, Western Australia, Australia
- School of Molecular and Life Sciences, Curtin University, Bentley, Western Australia, Australia
14
Jindal GA, Farley EK. Enhancer grammar in development, evolution, and disease: dependencies and interplay. Dev Cell 2021; 56:575-587. [PMID: 33689769 PMCID: PMC8462829 DOI: 10.1016/j.devcel.2021.02.016] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 10/13/2020] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 12/19/2022]
Abstract
Each language has standard books describing that language's grammatical rules. Biologists have searched for similar, albeit more complex, principles relating enhancer sequence to gene expression. Here, we review the literature on enhancer grammar. We introduce dependency grammar, a model where enhancers encode information based on dependencies between enhancer features shaped by mechanistic, evolutionary, and biological constraints. Classifying enhancers based on the types of dependencies may identify unifying principles relating enhancer sequence to gene expression. Such rules would allow us to read the instructions for development within genomes and pinpoint causal enhancer variants underlying disease and evolutionary changes.
Affiliation(s)
- Granton A Jindal
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA
| | - Emma K Farley
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA.
| |
|
15
|
Yu TC, Liu WL, Brinck MS, Davis JE, Shek J, Bower G, Einav T, Insigne KD, Phillips R, Kosuri S, Urtecho G. Multiplexed characterization of rationally designed promoter architectures deconstructs combinatorial logic for IPTG-inducible systems. Nat Commun 2021; 12:325. [PMID: 33436562 PMCID: PMC7804116 DOI: 10.1038/s41467-020-20094-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/14/2020] [Accepted: 11/04/2020] [Indexed: 12/21/2022] Open
Abstract
A crucial step towards engineering biological systems is the ability to precisely tune the genetic response to environmental stimuli. In the case of Escherichia coli inducible promoters, our incomplete understanding of the relationship between sequence composition and gene expression hinders our ability to predictably control transcriptional responses. Here, we profile the expression dynamics of 8269 rationally designed, IPTG-inducible promoters that collectively explore the individual and combinatorial effects of RNA polymerase and LacI repressor binding site strengths. We then fit a statistical mechanics model to measured expression that accurately models gene expression and reveals properties of theoretically optimal inducible promoters. Furthermore, we characterize three alternative promoter architectures and show that repositioning binding sites within promoters influences the types of combinatorial effects observed between promoter elements. In total, this approach enables us to deconstruct relationships between inducible promoter elements and discover practical insights for engineering inducible promoters with desirable characteristics.
Affiliation(s)
- Timothy C Yu
- Department of Bioengineering, University of California, Los Angeles, CA, 90095, USA
| | - Winnie L Liu
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095, USA
| | - Marcia S Brinck
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA, 90095, USA
| | - Jessica E Davis
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Jeremy Shek
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
| | - Grace Bower
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095, USA
| | - Tal Einav
- Department of Physics, California Institute of Technology, Pasadena, CA, 91125, USA
| | - Kimberly D Insigne
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, CA, 90095, USA
| | - Rob Phillips
- Department of Physics, California Institute of Technology, Pasadena, CA, 91125, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Department of Applied Physics, California Institute of Technology, Pasadena, CA, 91125, USA
| | - Sriram Kosuri
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA.
- UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA, 90095, USA.
- Institute for Quantitative and Computational Biosciences (QCB), University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, USA.
- Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, CA, 90095, USA.
| | - Guillaume Urtecho
- Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, CA, 90095, USA.
| |
|
16
|
Mulvey B, Lagunas T, Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 2021; 89:76-89. [PMID: 32843144 PMCID: PMC7938388 DOI: 10.1016/j.biopsych.2020.06.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/11/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Neuropsychiatric phenotypes have long been known to be influenced by heritable risk factors, directly confirmed by the past decade of genetic studies that have revealed specific genetic variants enriched in disease cohorts. However, the initial hope that a small set of genes would be responsible for a given disorder proved false. The more complex reality is that a given disorder may be influenced by myriad small-effect noncoding variants and/or by rare but severe coding variants, many de novo. Noncoding genomic sequences, for which molecular functions cannot usually be inferred, harbor a large portion of these variants, creating a substantial barrier to understanding higher-order molecular and biological systems of disease. Fortunately, novel genetic technologies, namely scalable oligonucleotide synthesis, RNA sequencing, and CRISPR (clustered regularly interspaced short palindromic repeats), have opened novel avenues to experimentally identify biologically significant variants en masse. Massively parallel reporter assays (MPRAs) are an especially versatile technique resulting from such innovations. MPRAs are powerful molecular genetics tools that can be used to screen thousands of untranscribed or untranslated sequences and their variants for functional effects in a single experiment. This approach, though underutilized in psychiatric genetics, has several useful features for the field. We review methods for assaying putatively functional genetic variants and regions, emphasizing MPRAs and the opportunities they hold for dissection of psychiatric polygenicity. We discuss literature applying functional assays in neurogenetics, highlighting strengths, caveats, and design considerations, especially regarding disease-relevant variables (cell type, neurodevelopment, and sex), and we ultimately propose applications of MPRA to both computational and experimental neurogenetics of polygenic disease risk.
Affiliation(s)
- Bernard Mulvey
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Tomás Lagunas
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri.
| |
|
17
|
Sönmezer C, Kleinendorst R, Imanci D, Barzaghi G, Villacorta L, Schübeler D, Benes V, Molina N, Krebs AR. Molecular Co-occupancy Identifies Transcription Factor Binding Cooperativity In Vivo. Mol Cell 2020; 81:255-267.e6. [PMID: 33290745 DOI: 10.1016/j.molcel.2020.11.015] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 06/10/2020] [Revised: 11/04/2020] [Accepted: 11/09/2020] [Indexed: 01/18/2023]
Abstract
Gene activation requires the cooperative activity of multiple transcription factors at cis-regulatory elements (CREs). Yet, most transcription factors have short residence time, questioning the requirement of their physical co-occupancy on DNA to achieve cooperativity. Here, we present a DNA footprinting method that detects individual molecular interactions of transcription factors and nucleosomes with DNA in vivo. We apply this strategy to quantify the simultaneous binding of multiple transcription factors on single DNA molecules at mouse CREs. Analysis of the binary occupancy patterns at thousands of motif combinations reveals that high DNA co-occupancy occurs for most types of transcription factors, in the absence of direct physical interaction, at sites of competition with nucleosomes. Perturbation of pairwise interactions demonstrates the function of molecular co-occupancy in binding cooperativity. Our results reveal the interactions regulating CREs at molecular resolution and identify DNA co-occupancy as a widespread cooperativity mechanism used by transcription factors to remodel chromatin.
Affiliation(s)
- Can Sönmezer
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany; Faculty of Biosciences, Collaboration for Joint PhD Degree between EMBL and Heidelberg University, Heidelberg, Germany
| | - Rozemarijn Kleinendorst
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Dilek Imanci
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
| | - Guido Barzaghi
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany; Faculty of Biosciences, Collaboration for Joint PhD Degree between EMBL and Heidelberg University, Heidelberg, Germany
| | - Laura Villacorta
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Dirk Schübeler
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland; University of Basel, Faculty of Sciences, Petersplatz 1, 4001 Basel, Switzerland
| | - Vladimir Benes
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Nacho Molina
- Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Université de Strasbourg-CNRS-INSERM, 1 rue Laurent Fries, 67404 Illkirch, France
| | - Arnaud Regis Krebs
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany.
| |
|
18
|
Nieuwkoop T, Finger-Bou M, van der Oost J, Claassens NJ. The Ongoing Quest to Crack the Genetic Code for Protein Production. Mol Cell 2020; 80:193-209. [PMID: 33010203 DOI: 10.1016/j.molcel.2020.09.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/02/2020] [Revised: 08/10/2020] [Accepted: 09/10/2020] [Indexed: 01/05/2023]
Abstract
Understanding the genetic design principles that determine protein production remains a major challenge. Although the key principles of gene expression were discovered 50 years ago, additional factors are still being uncovered. Both protein-coding and non-coding sequences harbor elements that collectively influence the efficiency of protein production by modulating transcription, mRNA decay, and translation. The influences of many contributing elements are intertwined, which complicates a full understanding of the individual factors. In natural genes, a functional balance between these factors has been obtained in the course of evolution, whereas for genetic-engineering projects, our incomplete understanding still limits optimal design of synthetic genes. However, notable advances have recently been made, supported by high-throughput analysis of synthetic gene libraries as well as by state-of-the-art biomolecular techniques. We discuss here how these advances further strengthen understanding of the gene expression process and how they can be harnessed to optimize protein production.
Affiliation(s)
- Thijs Nieuwkoop
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, the Netherlands
| | - Max Finger-Bou
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, the Netherlands
| | - John van der Oost
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, the Netherlands
| | - Nico J Claassens
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, the Netherlands.
| |
|
19
|
Hammelman J, Krismer K, Banerjee B, Gifford DK, Sherwood RI. Identification of determinants of differential chromatin accessibility through a massively parallel genome-integrated reporter assay. Genome Res 2020; 30:1468-1480. [PMID: 32973041 PMCID: PMC7605270 DOI: 10.1101/gr.263228.120] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/05/2020] [Accepted: 08/26/2020] [Indexed: 12/20/2022]
Abstract
A key mechanism in cellular regulation is the ability of the transcriptional machinery to physically access DNA. Transcription factors interact with DNA to alter the accessibility of chromatin, which enables changes to gene expression during development or disease or as a response to environmental stimuli. However, the regulation of DNA accessibility via the recruitment of transcription factors is difficult to study in the context of the native genome because every genomic site is distinct in multiple ways. Here we introduce the multiplexed integrated accessibility assay (MIAA), an assay that measures chromatin accessibility of synthetic oligonucleotide sequence libraries integrated into a controlled genomic context with low native accessibility. We apply MIAA to measure the effects of sequence motifs on cell type-specific accessibility between mouse embryonic stem cells and embryonic stem cell-derived definitive endoderm cells, screening 7905 distinct DNA sequences. MIAA recapitulates differential accessibility patterns of 100-nt sequences derived from natively differential genomic regions, identifying E-box motifs common to epithelial-mesenchymal transition driver transcription factors in stem cell-specific accessible regions that become repressed in endoderm. We show that a single binding motif for a key regulatory transcription factor is sufficient to open chromatin, and classify sets of stem cell-specific, endoderm-specific, and shared accessibility-modifying transcription factor motifs. We also show that overexpression of two definitive endoderm transcription factors, T and Foxa2, results in changes to accessibility in DNA sequences containing their respective DNA-binding motifs and identify preferential motif arrangements that influence accessibility.
Affiliation(s)
- Jennifer Hammelman
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Konstantin Krismer
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Budhaditya Banerjee
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
| | - David K Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Hubrecht Institute, 3584 CT Utrecht, Netherlands
| |
|
20
|
de Jonge WJ, Brok M, Lijnzaad P, Kemmeren P, Holstege FCP. Genome-wide off-rates reveal how DNA binding dynamics shape transcription factor function. Mol Syst Biol 2020; 16:e9885. [PMID: 33280256 PMCID: PMC7586999 DOI: 10.15252/msb.20209885] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/23/2020] [Revised: 09/06/2020] [Accepted: 09/10/2020] [Indexed: 11/25/2022] Open
Abstract
Protein-DNA interactions are dynamic, and these dynamics are an important aspect of chromatin-associated processes such as transcription or replication. Due to a lack of methods to study on- and off-rates across entire genomes, protein-DNA interaction dynamics have not been studied extensively. Here, we determine in vivo off-rates for the Saccharomyces cerevisiae chromatin organizing factor Abf1, at 191 sites simultaneously across the yeast genome. Average Abf1 residence times span a wide range, varying between 4.2 and 33 min. Sites with different off-rates are associated with different functional characteristics. This includes their transcriptional dependency on Abf1, nucleosome positioning and the size of the nucleosome-free region, as well as the ability to roadblock RNA polymerase II for termination. The results show how off-rates contribute to transcription factor function and that DIVORSEQ (Determining In Vivo Off-Rates by SEQuencing) is a meaningful way of investigating protein-DNA binding dynamics genome-wide.
Affiliation(s)
- Wim J de Jonge
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Mariël Brok
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Philip Lijnzaad
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Patrick Kemmeren
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | | |
|
21
|
Davis JE, Insigne KD, Jones EM, Hastings QA, Boldridge WC, Kosuri S. Dissection of c-AMP Response Element Architecture by Using Genomic and Episomal Massively Parallel Reporter Assays. Cell Syst 2020; 11:75-85.e7. [PMID: 32603702 DOI: 10.1016/j.cels.2020.05.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 05/21/2019] [Revised: 02/16/2020] [Accepted: 05/26/2020] [Indexed: 11/15/2022]
Abstract
In eukaryotes, transcription factors (TFs) orchestrate gene expression by binding to TF-binding sites (TFBSs) and localizing transcriptional co-regulators and RNA polymerase II to cis-regulatory elements. However, we lack a basic understanding of the relationship between TFBS composition and their quantitative transcriptional responses. Here, we measured expression driven by 17,406 synthetic cis-regulatory elements with varied compositions of a model TFBS, the c-AMP response element (CRE), by using massively parallel reporter assays (MPRAs). We find that CRE number, affinity, and promoter proximity largely determine expression. In addition, we observe expression modulation based on the spacing between CREs and CRE distance to the promoter, where expression follows a helical periodicity. Finally, we compare library expression between an episomal MPRA and a genomically integrated MPRA, where a single cis-regulatory element is assayed per cell at a defined locus. These assays largely recapitulate each other, although weaker, non-canonical CREs exhibit greater activity in a genomic context.
Affiliation(s)
- Jessica E Davis
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Kimberly D Insigne
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA; Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Eric M Jones
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Quinn A Hastings
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - W Clifford Boldridge
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Sriram Kosuri
- Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA.
| |
|
22
|
Dubois V, Gheeraert C, Vankrunkelsven W, Dubois‐Chevalier J, Dehondt H, Bobowski‐Gerard M, Vinod M, Zummo FP, Güiza F, Ploton M, Dorchies E, Pineau L, Boulinguiez A, Vallez E, Woitrain E, Baugé E, Lalloyer F, Duhem C, Rabhi N, van Kesteren RE, Chiang C, Lancel S, Duez H, Annicotte J, Paumelle R, Vanhorebeek I, Van den Berghe G, Staels B, Lefebvre P, Eeckhoute J. Endoplasmic reticulum stress actively suppresses hepatic molecular identity in damaged liver. Mol Syst Biol 2020; 16:e9156. [PMID: 32407006 PMCID: PMC7224309 DOI: 10.15252/msb.20199156] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/05/2019] [Revised: 04/09/2020] [Accepted: 04/14/2020] [Indexed: 02/06/2023] Open
Abstract
Liver injury triggers adaptive remodeling of the hepatic transcriptome for repair/regeneration. We demonstrate that this involves particularly profound transcriptomic alterations where acute induction of genes involved in handling of endoplasmic reticulum stress (ERS) is accompanied by partial hepatic dedifferentiation. Importantly, widespread hepatic gene downregulation could not simply be ascribed to cofactor squelching secondary to ERS gene induction, but rather involves a combination of active repressive mechanisms. ERS acts through inhibition of the liver-identity (LIVER-ID) transcription factor (TF) network, initiated by rapid LIVER-ID TF protein loss. In addition, induction of the transcriptional repressor NFIL3 further contributes to LIVER-ID gene repression. Alteration to the liver TF repertoire translates into compromised activity of regulatory regions characterized by the densest co-recruitment of LIVER-ID TFs and decommissioning of BRD4 super-enhancers driving hepatic identity. While transient repression of the hepatic molecular identity is an intrinsic part of liver repair, sustained disequilibrium between the ERS and LIVER-ID transcriptional programs is linked to liver dysfunction as shown using mouse models of acute liver injury and livers from deceased human septic patients.
Affiliation(s)
- Vanessa Dubois
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
- Present address: Clinical and Experimental Endocrinology, Department of Chronic Diseases, Metabolism and Ageing (CHROMETA), KU Leuven, Leuven, Belgium
| | - Céline Gheeraert
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Wouter Vankrunkelsven
- Clinical Division and Laboratory of Intensive Care Medicine, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | | | - Hélène Dehondt
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | | | - Manjula Vinod
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | | | - Fabian Güiza
- Clinical Division and Laboratory of Intensive Care Medicine, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - Maheul Ploton
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Emilie Dorchies
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Laurent Pineau
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Alexis Boulinguiez
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Emmanuelle Vallez
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Eloise Woitrain
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Eric Baugé
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Fanny Lalloyer
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Christian Duhem
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Nabil Rabhi
- UMR 8199 ‐ EGID, CNRS, Institut Pasteur de Lille, University of Lille, Lille, France
| | - Ronald E van Kesteren
- Center for Neurogenomics and Cognitive Research, Neuroscience Campus Amsterdam, VU University, Amsterdam, The Netherlands
| | - Cheng‐Ming Chiang
- Simmons Comprehensive Cancer Center, Departments of Biochemistry and Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Steve Lancel
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Hélène Duez
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | | | - Réjane Paumelle
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Ilse Vanhorebeek
- Clinical Division and Laboratory of Intensive Care Medicine, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - Greet Van den Berghe
- Clinical Division and Laboratory of Intensive Care Medicine, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - Bart Staels
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Philippe Lefebvre
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| | - Jérôme Eeckhoute
- Inserm, CHU Lille, Institut Pasteur de Lille, U1011‐EGID, University of Lille, Lille, France
| |
|
23
|
de Jongh RP, van Dijk AD, Julsing MK, Schaap PJ, de Ridder D. Designing Eukaryotic Gene Expression Regulation Using Machine Learning. Trends Biotechnol 2020; 38:191-201. [DOI: 10.1016/j.tibtech.2019.07.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/08/2019] [Revised: 07/12/2019] [Accepted: 07/19/2019] [Indexed: 12/11/2022]
|
24
|
Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol 2019; 38:56-65. [PMID: 31792407 PMCID: PMC6954276 DOI: 10.1038/s41587-019-0315-8] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 01/24/2019] [Accepted: 10/16/2019] [Indexed: 11/26/2022]
Abstract
How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity, and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation. Gene expression levels in yeast are predicted using a massive dataset on promoters with random sequences.
|
25
|
Oberbeckmann E, Wolff M, Krietenstein N, Heron M, Ellins JL, Schmid A, Krebs S, Blum H, Gerland U, Korber P. Absolute nucleosome occupancy map for the Saccharomyces cerevisiae genome. Genome Res 2019; 29:1996-2009. [PMID: 31694866 PMCID: PMC6886505 DOI: 10.1101/gr.253419.119] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 06/07/2019] [Accepted: 10/31/2019] [Indexed: 12/23/2022]
Abstract
Mapping of nucleosomes, the basic DNA packaging unit in eukaryotes, is fundamental for understanding genome regulation because nucleosomes modulate DNA access by their positioning along the genome. A cell-population nucleosome map requires two observables: nucleosome positions along the DNA ("Where?") and nucleosome occupancies across the population ("In how many cells?"). All available genome-wide nucleosome mapping techniques are yield methods because they score either nucleosomal (e.g., MNase-seq, chemical cleavage-seq) or nonnucleosomal (e.g., ATAC-seq) DNA but lose track of the total DNA population for each genomic region. Therefore, they only provide nucleosome positions and maybe compare relative occupancies between positions, but cannot measure absolute nucleosome occupancy, which is the fraction of all DNA molecules occupied at a given position and time by a nucleosome. Here, we established two orthogonal and thereby cross-validating approaches to measure absolute nucleosome occupancy across the Saccharomyces cerevisiae genome via restriction enzymes and DNA methyltransferases. The resulting high-resolution (9-bp) map shows uniform absolute occupancies. Most nucleosome positions are occupied in most cells: 97% of all nucleosomes called by chemical cleavage-seq have a mean absolute occupancy of 90 ± 6% (±SD). Depending on nucleosome position calling procedures, there are 57,000 to 60,000 nucleosomes per yeast cell. The few low absolute occupancy nucleosomes do not correlate with highly transcribed gene bodies, but correlate with increased presence of the nucleosome-evicting chromatin structure remodeling (RSC) complex, and are enriched upstream of highly transcribed or regulated genes. Our work provides a quantitative method and reference frame in absolute terms for future chromatin studies.
Affiliation(s)
- Elisa Oberbeckmann
- Molecular Biology Division, Biomedical Center, Faculty of Medicine, Ludwig-Maximilians-Universität München, 82152 Planegg-Martinsried, Germany
| | - Michael Wolff
- Physik Department, Technische Universität München, 85748 Garching, Germany
| | - Nils Krietenstein
- Molecular Biology Division, Biomedical Center, Faculty of Medicine, Ludwig-Maximilians-Universität München, 82152 Planegg-Martinsried, Germany
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Mark Heron
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
- Gene Center, Faculty of Chemistry and Pharmacy, Ludwig-Maximilians-Universität München, 81377 Munich, Germany
| | - Jessica L Ellins
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, United Kingdom
| | - Andrea Schmid
- Molecular Biology Division, Biomedical Center, Faculty of Medicine, Ludwig-Maximilians-Universität München, 82152 Planegg-Martinsried, Germany
| | - Stefan Krebs
- Laboratory of Functional Genome Analysis (LAFUGA), Gene Center, Faculty of Chemistry and Pharmacy, Ludwig-Maximilians-Universität München, 81377 Munich, Germany
| | - Helmut Blum
- Laboratory of Functional Genome Analysis (LAFUGA), Gene Center, Faculty of Chemistry and Pharmacy, Ludwig-Maximilians-Universität München, 81377 Munich, Germany
| | - Ulrich Gerland
- Physik Department, Technische Universität München, 85748 Garching, Germany
| | - Philipp Korber
- Molecular Biology Division, Biomedical Center, Faculty of Medicine, Ludwig-Maximilians-Universität München, 82152 Planegg-Martinsried, Germany
|
26
|
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 106] [Impact Index Per Article: 21.2] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB ( https://www.mavedb.org ), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
| | - Jochen Weile
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
| | - Frederick P Roth
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada.
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
| | - Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia.
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.
|
27
|
Beytebiere JR, Greenwell BJ, Sahasrabudhe A, Menet JS. Clock-controlled rhythmic transcription: is the clock enough and how does it work? Transcription 2019; 10:212-221. [PMID: 31595813 DOI: 10.1080/21541264.2019.1673636] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 02/06/2023] Open
Abstract
Circadian clocks regulate the rhythmic expression of thousands of genes underlying the daily oscillations of biological functions. Here, we discuss recent findings showing that circadian clock rhythmic transcriptional outputs rely on mechanisms beyond clock gene DNA binding, which may ultimately contribute to the plasticity of circadian transcriptional programs.
Affiliation(s)
- Joshua R Beytebiere
- Department of Biology, Center for Biological Clock Research, Texas A&M University, TX, USA
| | - Ben J Greenwell
- Department of Biology, Center for Biological Clock Research, Texas A&M University, TX, USA
- Program of Genetics, Texas A&M University, College Station, TX, USA
| | - Aishwarya Sahasrabudhe
- Department of Biology, Center for Biological Clock Research, Texas A&M University, TX, USA
| | - Jerome S Menet
- Department of Biology, Center for Biological Clock Research, Texas A&M University, TX, USA
- Program of Genetics, Texas A&M University, College Station, TX, USA
|
28
|
Qiu C, Kaplan CD. Functional assays for transcription mechanisms in high-throughput. Methods 2019; 159-160:115-123. [PMID: 30797033 PMCID: PMC6589137 DOI: 10.1016/j.ymeth.2019.02.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/29/2019] [Accepted: 02/18/2019] [Indexed: 01/12/2023] Open
Abstract
Dramatic increases in the scale of programmed synthesis of nucleic acid libraries coupled with deep sequencing have powered advances in understanding nucleic acid and protein biology. Biological systems centering on nucleic acids or encoded proteins greatly benefit from such high-throughput studies, given that large DNA variant pools can be synthesized and DNA, or RNA products of transcription, can be easily analyzed by deep sequencing. Here we review the scope of various high-throughput functional assays for studies of nucleic acids and proteins in general, followed by discussion of how these types of study have yielded insights into the RNA Polymerase II (Pol II) active site as an example. We discuss methodological considerations in the design and execution of these experiments that should be valuable to studies in any system.
Affiliation(s)
- Chenxi Qiu
- Department of Medicine, Division of Translational Therapeutics, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| | - Craig D Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA.
|
29
|
Barnes SL, Belliveau NM, Ireland WT, Kinney JB, Phillips R. Mapping DNA sequence to transcription factor binding energy in vivo. PLoS Comput Biol 2019; 15:e1006226. [PMID: 30716072 PMCID: PMC6375646 DOI: 10.1371/journal.pcbi.1006226] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/22/2018] [Revised: 02/14/2019] [Accepted: 11/06/2018] [Indexed: 11/18/2022] Open
Abstract
Despite the central importance of transcriptional regulation in biology, it has proven difficult to determine the regulatory mechanisms of individual genes, let alone entire gene networks. It is particularly difficult to decipher the biophysical mechanisms of transcriptional regulation in living cells and determine the energetic properties of binding sites for transcription factors and RNA polymerase. In this work, we present a strategy for dissecting transcriptional regulatory sequences using in vivo methods (massively parallel reporter assays) to formulate quantitative models that map a transcription factor binding site’s DNA sequence to transcription factor-DNA binding energy. We use these models to predict the binding energies of transcription factor binding sites to within 1 kBT of their measured values. We further explore how such a sequence-energy mapping relates to the mechanisms of transcriptional regulation in various promoter contexts. Specifically, we show that our models can be used to design specific induction responses, analyze the effects of amino acid mutations on DNA sequence preference, and determine how regulatory context affects a transcription factor’s sequence specificity.
It has been said that we live in the “genomic era,” a time where we can readily sequence full genomes at will. However, it remains difficult to interpret much of the information within a genome. This is especially true of non-coding sequences such as promoters, which contain a number of features such as transcription factor binding sites that determine how genes are regulated. There is no straightforward regulatory “code” that tells us how transcription factor binding sites are organized within a promoter. In this work we examine how DNA sequence determines one of the most important features of a promoter, the strength with which a transcription factor binds to its DNA binding site. We discuss an approach to modeling DNA sequence-specific transcription factor binding energies in vivo using a massively parallel reporter assay. We develop models that allow us to predict the binding energy between a transcription factor and a mutated version of its binding site. We then show that this modeling technique can be used to address a number of scientific and design questions, such as engineering the behavior of genetic circuit elements or examining how transcription factors and their binding sites co-evolve.
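As a rough illustration of what a sequence-to-energy model can look like, the sketch below assumes an additive energy matrix, in which each position of the binding site contributes independently to the total binding energy. The additive form, the function name `binding_energy`, and all numerical values (in units of kBT) are assumptions made for illustration and are not taken from the abstract above.

```python
# Hypothetical additive energy-matrix model (illustrative values only).
# Each matrix row gives the energy contribution, in kBT, of observing
# a given base at that position of the binding site.

def binding_energy(seq, matrix, offset=0.0):
    """Return offset + sum of per-position energy contributions for seq."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the number of matrix rows")
    return offset + sum(matrix[i][base] for i, base in enumerate(seq))

# A made-up two-position matrix; the reference site "AC" contributes 0 kBT.
example_matrix = [
    {"A": 0.0, "C": 1.0, "G": 0.5, "T": 2.0},
    {"A": 1.5, "C": 0.0, "G": 0.5, "T": 1.0},
]

reference = binding_energy("AC", example_matrix, offset=-15.0)
mutant = binding_energy("TG", example_matrix, offset=-15.0)
```

Under this assumed form, predicting the energy of a mutated site reduces to one table lookup per position; the two-position matrix here is far shorter than any real binding site.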
Affiliation(s)
- Stephanie L. Barnes
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
| | - Nathan M. Belliveau
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
| | - William T. Ireland
- Department of Physics, California Institute of Technology, Pasadena, California, United States of America
| | - Justin B. Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - Rob Phillips
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Department of Physics, California Institute of Technology, Pasadena, California, United States of America
|
30
|
Weingarten-Gabbay S, Nir R, Lubliner S, Sharon E, Kalma Y, Weinberger A, Segal E. Systematic interrogation of human promoters. Genome Res 2019; 29:171-183. [PMID: 30622120 PMCID: PMC6360817 DOI: 10.1101/gr.236075.118] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 02/14/2018] [Accepted: 12/05/2018] [Indexed: 12/19/2022]
Abstract
Despite much research, our understanding of the architecture and cis-regulatory elements of human promoters is still lacking. Here, we devised a high-throughput assay to quantify the activity of approximately 15,000 fully designed sequences that we integrated and expressed from a fixed location within the human genome. We used this method to investigate thousands of native promoters and preinitiation complex (PIC) binding regions followed by in-depth characterization of the sequence motifs underlying promoter activity, including core promoter elements and TF binding sites. We find that core promoters drive transcription mostly unidirectionally and that sequences originating from promoters exhibit stronger activity than those originating from enhancers. By testing multiple synthetic configurations of core promoter elements, we dissect the motifs that positively and negatively regulate transcription as well as the effect of their combinations and distances, including a 10-bp periodicity in the optimal distance between the TATA and the initiator. By comprehensively screening 133 TF binding sites, we find that in contrast to core promoters, TF binding sites maintain similar activity levels in both orientations, supporting a model by which divergent transcription is driven by two distinct unidirectional core promoters sharing bidirectional TF binding sites. Finally, we find a striking agreement between the effect of binding site multiplicity of individual TFs in our assay and their tendency to appear in homotypic clusters throughout the genome. Overall, our study systematically assays the elements that drive expression in core and proximal promoter regions and sheds light on organization principles of regulatory regions in the human genome.
Affiliation(s)
- Shira Weingarten-Gabbay
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Ronit Nir
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Shai Lubliner
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Eilon Sharon
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Yael Kalma
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Adina Weinberger
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
|
31
|
Single-cell and single-molecule epigenomics to uncover genome regulation at unprecedented resolution. Nat Genet 2018; 51:19-25. [DOI: 10.1038/s41588-018-0290-x] [Citation(s) in RCA: 115] [Impact Index Per Article: 19.2] [Received: 10/29/2017] [Accepted: 08/30/2018] [Indexed: 12/19/2022]
|
32
|
Systematic Study of Nucleosome-Displacing Factors in Budding Yeast. Mol Cell 2018; 71:294-305.e4. [PMID: 30017582 DOI: 10.1016/j.molcel.2018.06.017] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 12/29/2017] [Revised: 05/04/2018] [Accepted: 06/07/2018] [Indexed: 12/11/2022]
Abstract
Nucleosomes present a barrier for the binding of most transcription factors (TFs). However, special TFs known as nucleosome-displacing factors (NDFs) can access embedded sites and cause the depletion of the local nucleosomes as well as repositioning of the neighboring nucleosomes. Here, we developed a novel high-throughput method in yeast to identify NDFs among 104 TFs and systematically characterized the impact of orientation, affinity, location, and copy number of their binding motifs on the nucleosome occupancy. Using this assay, we identified 29 NDF motifs and divided the nuclear TFs into three groups with strong, weak, and no nucleosome-displacing activities. Further studies revealed that tight DNA binding is the key property that underlies NDF activity, and the NDFs may partially rely on DNA replication to compete with nucleosomes. Overall, our study presents a framework to functionally characterize NDFs and elucidate the mechanism of nucleosome invasion.
|
33
|
Aguilar‐Rodríguez J, Peel L, Stella M, Wagner A, Payne JL. The architecture of an empirical genotype-phenotype map. Evolution 2018; 72:1242-1260. [PMID: 29676774 PMCID: PMC6055911 DOI: 10.1111/evo.13487] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 05/16/2017] [Accepted: 04/03/2018] [Indexed: 12/15/2022]
Abstract
Recent advances in high-throughput technologies are bringing the study of empirical genotype-phenotype (GP) maps to the fore. Here, we use data from protein-binding microarrays to study an empirical GP map of transcription factor (TF)-binding preferences. In this map, each genotype is a DNA sequence. The phenotype of this DNA sequence is its ability to bind one or more TFs. We study this GP map using genotype networks, in which nodes represent genotypes with the same phenotype, and edges connect nodes if their genotypes differ by a single small mutation. We describe the structure and arrangement of genotype networks within the space of all possible binding sites for 525 TFs from three eukaryotic species encompassing three kingdoms of life (animal, plant, and fungi). We thus provide a high-resolution depiction of the architecture of an empirical GP map. Among a number of findings, we show that these genotype networks are "small-world" and assortative, and that they ubiquitously overlap and interface with one another. We also use polymorphism data from Arabidopsis thaliana to show how genotype network structure influences the evolution of TF-binding sites in vivo. We discuss our findings in the context of regulatory evolution.
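The genotype-network construction described in this abstract can be sketched in a few lines. The example below assumes that a "single small mutation" means one point substitution between equal-length sequences (the abstract does not spell out the mutation types), and the sequences and function names are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def genotype_network(genotypes):
    """Adjacency map linking sequences that differ by one substitution.

    All input sequences are assumed to share the same phenotype,
    for example binding the same TF.
    """
    nodes = sorted(set(genotypes))
    adjacency = {g: set() for g in nodes}
    for a, b in combinations(nodes, 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def connected_components(adjacency):
    """Split the network into connected components via depth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components

# Made-up binding sites for a single hypothetical TF.
bound_sites = ["ACGT", "ACGA", "ACCA", "TTTT"]
network = genotype_network(bound_sites)
components = connected_components(network)
```

The pairwise comparison is quadratic in the number of genotypes, which is acceptable for a toy example but would need a neighbor-enumeration strategy for large binding-site sets.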
Affiliation(s)
- José Aguilar‐Rodríguez
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Current Address: Department of Biology, Stanford University, Stanford, CA, USA; Department of Chemical and Systems Biology, Stanford University, Stanford, CA, USA
| | - Leto Peel
- Institute of Information and Communication Technologies, Electronics and Applied Mathematics, Université Catholique de Louvain, Louvain‐la‐Neuve, Belgium
- Namur Center for Complex Systems, University of Namur, Namur, Belgium
| | - Massimo Stella
- Institute for Complex Systems Simulation, Department of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom
| | - Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, New Mexico, USA
| | - Joshua L. Payne
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute for Integrative Biology, ETH, Zurich, Switzerland
|
34
|
Liu Y, Ding D, Liu H, Sun X. The accessible chromatin landscape during conversion of human embryonic stem cells to trophoblast by bone morphogenetic protein 4. Biol Reprod 2018; 96:1267-1278. [PMID: 28430877 DOI: 10.1093/biolre/iox028] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/28/2016] [Accepted: 04/14/2017] [Indexed: 12/12/2022] Open
Abstract
Human embryonic stem cells (hESCs) exposed to the growth factor bone morphogenetic protein 4 (BMP4) in the absence of FGF2 have been used as a model to study placental development. However, little is known about the cis-regulatory mechanisms underlying this important process. In this study, we used publicly available chromatin accessibility data of hESC H1 cells and BMP4-induced trophoblast (TB) cell lines to identify DNase I hypersensitive sites (DHSs) in the two cell lines, as well as the transcription factor (TF) binding sites within the DHSs. By comparing read profiles in H1 and TB, we identified 17 472 TB-specific DHSs. The TB-specific DHSs are enriched in terms of "blood vessel" and "trophectoderm," and contain TF motifs from the following families: Leucine Zipper, Helix-Loop-Helix, GATA, and ETS. To validate differential expression of the TFs binding to these motifs, we analyzed publicly available RNA-seq and microarray data in the same context. Finally, by integrating protein-protein interaction data, we constructed a TF network for placenta development and identified the top 20 key TFs through centrality analysis of the network. Our results indicate that the BMP4-induced TB system provides an invaluable model for the study of TB development and highlight novel candidate genes in human placenta development.
Affiliation(s)
- Yajun Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, P.R. China
| | - Dewu Ding
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, P.R. China
- Department of Mathematics and Computer Science, Chizhou College, Chizhou, P.R. China
| | - Hongde Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, P.R. China
| | - Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, P.R. China
|
35
|
Rawal Y, Chereji RV, Valabhoju V, Qiu H, Ocampo J, Clark DJ, Hinnebusch AG. Gcn4 Binding in Coding Regions Can Activate Internal and Canonical 5' Promoters in Yeast. Mol Cell 2018; 70:297-311.e4. [PMID: 29628310 PMCID: PMC6133248 DOI: 10.1016/j.molcel.2018.03.007] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 12/07/2017] [Revised: 02/16/2018] [Accepted: 03/02/2018] [Indexed: 01/07/2023]
Abstract
Gcn4 is a yeast transcriptional activator induced by amino acid starvation. ChIP-seq analysis revealed 546 genomic sites occupied by Gcn4 in starved cells, representing ∼30% of Gcn4-binding motifs. Surprisingly, only ∼40% of the bound sites are in promoters, of which only ∼60% activate transcription, indicating extensive negative control over Gcn4 function. Most of the remaining ∼300 Gcn4-bound sites are within coding sequences (CDSs), with ∼75 representing the only bound sites near Gcn4-induced genes. Many such unconventional sites map between divergent antisense and sub-genic sense transcripts induced within CDSs adjacent to induced TBP peaks, consistent with Gcn4 activation of cryptic bidirectional internal promoters. Mutational analysis confirms that Gcn4 sites within CDSs can activate sub-genic and full-length transcripts from the same or adjacent genes, showing that functional Gcn4 binding is not confined to promoters. Our results show that internal promoters can be regulated by an activator that functions at conventional 5'-positioned promoters.
Affiliation(s)
- Yashpal Rawal
- Laboratory of Gene Regulation and Development, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD 20892, USA
| | - Răzvan V Chereji
- Division of Developmental Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD 20892, USA
| | - Vishalini Valabhoju
- Laboratory of Gene Regulation and Development, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD 20892, USA
| | - Hongfang Qiu
- Laboratory of Gene Regulation and Development, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD 20892, USA
| | - Josefina Ocampo
- Division of Developmental Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD 20892, USA
| | - David J Clark
- Division of Developmental Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD 20892, USA.
| | - Alan G Hinnebusch
- Laboratory of Gene Regulation and Development, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD 20892, USA.
|
36
|
Rossi MJ, Lai WKM, Pugh BF. Genome-wide determinants of sequence-specific DNA binding of general regulatory factors. Genome Res 2018; 28:497-508. [PMID: 29563167 PMCID: PMC5880240 DOI: 10.1101/gr.229518.117] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 08/24/2017] [Accepted: 03/05/2018] [Indexed: 01/01/2023]
Abstract
General regulatory factors (GRFs), such as Reb1, Abf1, Rap1, Mcm1, and Cbf1, positionally organize yeast chromatin through interactions with a core consensus DNA sequence. It is assumed that sequence recognition via direct base readout suffices for specificity and that spurious nonfunctional sites are rendered inaccessible by chromatin. We tested these assumptions through genome-wide mapping of GRFs in vivo and in purified biochemical systems at near–base pair (bp) resolution using several ChIP-exo–based assays. We find that computationally predicted DNA shape features (e.g., minor groove width, helix twist, base roll, and propeller twist) that are not defined by a unique consensus sequence are embedded in the nonunique portions of GRF motifs and contribute critically to sequence-specific binding. This dual source specificity occurs at GRF sites in promoter regions where chromatin organization starts. Outside of promoter regions, strong consensus sites lack the shape component and consequently lack an intrinsic ability to bind cognate GRFs, without regard to influences from chromatin. However, sites having a weak consensus and low intrinsic affinity do exist in these regions but are rendered inaccessible in a chromatin environment. Thus, GRF site-specificity is achieved through integration of favorable DNA sequence and shape readouts in promoter regions and by chromatin-based exclusion from fortuitous weak sites within gene bodies. This study further revealed a severe G/C nucleotide cross-linking selectivity inherent in all formaldehyde-based ChIP assays, which includes ChIP-seq. However, for most tested proteins, G/C selectivity did not appreciably affect binding site detection, although it does place limits on the quantitativeness of occupancy levels.
Affiliation(s)
- Matthew J Rossi
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - William K M Lai
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - B Franklin Pugh
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
37
|
Xin B, Rohs R. Relationship between histone modifications and transcription factor binding is protein family specific. Genome Res 2018; 28:gr.220079.116. [PMID: 29326300 PMCID: PMC5848611 DOI: 10.1101/gr.220079.116] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 12/23/2016] [Accepted: 01/10/2018] [Indexed: 12/20/2022]
Abstract
The very small fraction of putative binding sites (BSs) that are occupied by transcription factors (TFs) in vivo can be highly variable across different cell types. This observation has been partly attributed to changes in chromatin accessibility and histone modification (HM) patterns surrounding BSs. Previous studies focusing on BSs within DNA regulatory regions found correlations between HM patterns and TF binding specificities. However, a mechanistic understanding of TF-DNA binding specificity determinants is still not available. The ability to predict in vivo TF binding on a genome-wide scale requires the identification of features that determine TF binding based on evolutionary relationships of DNA binding proteins. To reveal protein family-dependent mechanisms of TF binding, we conducted comprehensive comparisons of HM patterns surrounding BSs and non-BSs with exactly matched core motifs for TFs in three cell lines: 33 TFs in GM12878, 37 TFs in K562, and 18 TFs in H1-hESC. These TFs displayed protein family-specific preferences for HM patterns surrounding BSs, with high agreement among cell lines. Moreover, compared to models based on DNA sequence and shape at flanking regions of BSs, HM-augmented quantitative machine-learning methods resulted in increased performance in a TF family-specific manner. Analysis of the relative importance of features in these models indicated that TFs, displaying larger HM pattern differences between BSs and non-BSs, bound DNA in an HM-specific manner on a protein family-specific basis. We propose that TF family-specific HM preferences reveal distinct mechanisms that assist in guiding TFs to their cognate BSs by altering chromatin structure and accessibility.
Affiliation(s)
- Beibei Xin
- Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, California 90089, USA
- Remo Rohs
- Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, California 90089, USA
|
38
|
Krebs AR, Imanci D, Hoerner L, Gaidatzis D, Burger L, Schübeler D. Genome-wide Single-Molecule Footprinting Reveals High RNA Polymerase II Turnover at Paused Promoters. Mol Cell 2017; 67:411-422.e4. [PMID: 28735898 PMCID: PMC5548954 DOI: 10.1016/j.molcel.2017.06.027] [Citation(s) in RCA: 124] [Impact Index Per Article: 17.7] [Received: 11/18/2016] [Revised: 05/22/2017] [Accepted: 06/22/2017] [Indexed: 11/19/2022]
Abstract
Transcription initiation entails chromatin opening followed by pre-initiation complex formation and RNA polymerase II recruitment. Subsequent polymerase elongation requires additional signals, resulting in increased residence time downstream of the start site, a phenomenon referred to as pausing. Here, we harnessed single-molecule footprinting to quantify distinct steps of initiation in vivo throughout the Drosophila genome. This identifies the impact of promoter structure on initiation dynamics in relation to nucleosomal occupancy. Additionally, perturbation of transcriptional initiation reveals an unexpectedly high turnover of polymerases at paused promoters, an observation confirmed at the level of nascent RNAs. These observations argue that absence of elongation is largely caused by premature termination rather than by stable polymerase stalling. In support of this non-processive model, we observe that induction of the paused heat shock promoter depends on continuous initiation. Our study provides a framework to quantify protein binding at single-molecule resolution and refines concepts of transcriptional pausing.
Affiliation(s)
- Arnaud R Krebs
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Dilek Imanci
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Leslie Hoerner
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Dimos Gaidatzis
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Lukas Burger
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Dirk Schübeler
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland; University of Basel, Faculty of Sciences, Petersplatz 1, 4001 Basel, Switzerland
|