1
|
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024] Open
Abstract
A massively parallel reporter assay (MPRA) measures the transcriptional activities of many cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational modeling framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms by which cis-regulatory mutations of a synthetic enhancer cause abnormal gene expression. We found that, in a human cell line, accounting for competitive binding between transcription factors (TFs) of the same family with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single or multiple mutations. We also discovered that even when various harmful mutations occur in an activator binding site, a CRM can stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance understanding of the effects of SNPs and indels on CRMs and would help in building robust custom-designed CRMs for biologics production and gene therapy.
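To make the idea of competitive binding in a thermodynamic model concrete, the following is a minimal sketch, not the authors' framework. It assumes a single binding site that two family TFs (hypothetically named A and B) cannot occupy at the same time, so each TF's occupancy is its statistical weight divided by the sum over all site states. The function names, the parameters (`conc_*`, `k_*`, `act_*`), and the linear mapping from occupancy to expression are illustrative assumptions.

```python
def competitive_occupancy(conc_a, k_a, conc_b, k_b):
    """Equilibrium occupancy of one site by two mutually exclusive TFs.

    conc_* are TF concentrations and k_* are association constants
    (both hypothetical, in reciprocal units so that conc * k is unitless).
    """
    w_a = conc_a * k_a          # statistical weight of the A-bound state
    w_b = conc_b * k_b          # statistical weight of the B-bound state
    z = 1.0 + w_a + w_b         # partition function: empty + A-bound + B-bound
    return w_a / z, w_b / z


def expression(conc_a, k_a, conc_b, k_b, act_a=1.0, act_b=0.3):
    """Toy expression level: occupancy-weighted sum of activation strengths.

    act_a and act_b are assumed per-TF activation strengths, not fitted values.
    """
    p_a, p_b = competitive_occupancy(conc_a, k_a, conc_b, k_b)
    return act_a * p_a + act_b * p_b


if __name__ == "__main__":
    # Hypothetical scenario: a mutation weakens the site for A tenfold
    # while slightly favoring the related factor B.
    wild_type = expression(conc_a=1.0, k_a=10.0, conc_b=1.0, k_b=1.0)
    mutant = expression(conc_a=1.0, k_a=1.0, conc_b=1.0, k_b=2.0)
    print(f"wild type: {wild_type:.3f}, mutant: {mutant:.3f}")
```

In this toy setting, whether a mutation lowers, preserves, or raises expression depends on how it shifts the two affinities relative to each other and on the assumed activation strengths, which is the kind of effect the abstract attributes to competition between family TFs.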
Affiliation(s)
- Chan-Koo Kang
  - School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Ah-Ram Kim
  - School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
|
2
|
Gonçalves TM, Stewart CL, Baxley SD, Xu J, Li D, Gabel HW, Wang T, Avraham O, Zhao G. Towards a comprehensive regulatory map of Mammalian Genomes. RESEARCH SQUARE 2023:rs.3.rs-3294408. [PMID: 37841836 PMCID: PMC10571623 DOI: 10.21203/rs.3.rs-3294408/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2023]
Abstract
Genome mapping studies have generated a nearly complete collection of genes for the human genome, but we still lack an equivalently vetted inventory of human regulatory sequences. Cis-regulatory modules (CRMs) play important roles in controlling when, where, and how much a gene is expressed. We developed a training data-free CRM-prediction algorithm, the Mammalian Regulatory MOdule Detector (MrMOD), for accurate CRM prediction in mammalian genomes. MrMOD provides genome position-fixed CRM models, similar to the fixed gene models for the mouse and human genomes, using only genomic sequences as inputs with one adjustable parameter: the significance p-value. Importantly, MrMOD predicts a comprehensive set of high-resolution CRMs in the mouse and human genomes, including all types of regulatory modules and not limited to any tissue, cell type, developmental stage, or condition. We computationally validated MrMOD predictions using a compendium of 21 orthogonal experimental data sets, including thousands of experimentally defined CRMs and millions of putative regulatory elements derived from hundreds of different tissues, cell types, and stimulus conditions obtained from multiple databases. An in ovo transgenic reporter assay demonstrates the power of our predictions in guiding experimental design. We analyzed CRMs located on chromosome 17 using unsupervised machine learning and identified groups of CRMs with multiple lines of evidence supporting their functionality, linking CRMs with upstream binding transcription factors and downstream target genes. Our work provides a comprehensive base-pair-resolution annotation of the functional regulatory elements and non-functional regions in mammalian genomes.
Affiliation(s)
- Jason Xu
  - Missouri University of Science & Technology
- Daofeng Li
  - Washington University School of Medicine
- Ting Wang
  - Washington University School of Medicine
|
3
|
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Cheng Xu
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Lu Bai
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
  - Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA
|
4
|
Snyman M, Xu S. The effects of mutations on gene expression and alternative splicing. Proc Biol Sci 2023; 290:20230565. [PMID: 37403507 PMCID: PMC10320348 DOI: 10.1098/rspb.2023.0565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 06/12/2023] [Indexed: 07/06/2023] Open
Abstract
Understanding the relationship between mutations and their genomic and phenotypic consequences has been a longstanding goal of evolutionary biology. However, few studies have investigated the impact of mutations on gene expression and alternative splicing on a genome-wide scale. In this study, we aim to bridge this knowledge gap by utilizing whole-genome sequencing data and RNA sequencing data from 16 obligately parthenogenetic Daphnia mutant lines to investigate the effects of ethyl methanesulfonate-induced mutations on gene expression and alternative splicing. Using rigorous analyses of mutations, expression changes and alternative splicing, we show that trans-effects are the major contributor to the variance in gene expression and alternative splicing between the wild-type and mutant lines, whereas cis mutations affect only a limited number of genes and do not always alter gene expression. Moreover, we show that there is a significant association between differentially expressed genes and exonic mutations, indicating that exonic mutations are an important driver of altered gene expression.
Affiliation(s)
- Marelize Snyman
  - Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
- Sen Xu
  - Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
|
5
|
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023] Open
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
  - Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Wan Hern Ching
  - Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- Paola Cornejo-Páramo
  - Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Emily S Wong
  - Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
|
6
|
Application of massive parallel reporter analysis in biotechnology and medicine. КЛИНИЧЕСКАЯ ПРАКТИКА 2023. [DOI: 10.17816/clinpract115063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023] Open
Abstract
The development and functioning of an organism relies on tissue-specific gene programs. Genome regulatory elements play a key role in the regulation of such programs, and disruptions in their function can lead to the development of various pathologies, including cancers, malformations and autoimmune diseases. The emergence of high-throughput genomic studies has led to massively parallel reporter analysis (MPRA) methods, which allow the functional verification and identification of regulatory elements on a genome-wide scale. Initially MPRA was used as a tool to investigate fundamental aspects of epigenetics, but the approach also has great potential for clinical and practical biotechnology. Currently, MPRA is used for validation of clinically significant mutations, identification of tissue-specific regulatory elements, search for the most promising loci for transgene integration, and is an indispensable tool for creating highly efficient expression systems, the range of application of which extends from approaches for protein development and design of next-generation therapeutic antibody superproducers to gene therapy. In this review, the main principles and areas of practical application of high-throughput reporter assays will be discussed.
|
7
|
Kircher M, Ludwig KU. Systematic assays and resources for the functional annotation of non-coding variants. MED GENET-BERLIN 2022; 34:275-286. [PMID: 37034418 PMCID: PMC10081529 DOI: 10.1515/medgen-2022-2161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Identification of genetic variation in individual genomes is now a routine procedure in human genetic research and diagnostics. For many variants, however, insufficient evidence is available to establish a pathogenic effect, particularly for variants in non-coding regions. Furthermore, the sheer number of candidate variants renders testing in individual assays virtually impossible. While scalable approaches are being developed, the selection of methods and resources and the application of a given framework to a particular disease or trait remain major challenges. This limits the translation of results from both genome-wide association studies and genome sequencing. Here, we discuss computational and experimental approaches available for functional annotation of non-coding variation.
Affiliation(s)
- Martin Kircher
  - Institute of Human Genetics, University of Lübeck, Lübeck, Germany
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Kerstin U. Ludwig
  - Institute of Human Genetics, University Hospital Bonn, University of Bonn, Venusberg-Campus 1, Building 76, Bonn, Germany
|
8
|
Cooper YA, Guo Q, Geschwind DH. Multiplexed functional genomic assays to decipher the noncoding genome. Hum Mol Genet 2022; 31:R84-R96. [PMID: 36057282 PMCID: PMC9585676 DOI: 10.1093/hmg/ddac194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/14/2022] Open
Abstract
Linkage disequilibrium and the incomplete regulatory annotation of the noncoding genome complicate the identification of functional noncoding genetic variants and their causal association with disease. Current computational methods for variant prioritization have limited predictive value, necessitating the application of highly parallelized experimental assays to efficiently identify functional noncoding variation. Here, we summarize two distinct approaches, massively parallel reporter assays and CRISPR-based pooled screens, and describe their flexible implementation to characterize human noncoding genetic variation at unprecedented scale. Each approach has unique advantages and limitations, highlighting the importance of multimodal methodological integration. These multiplexed assays of variant effects are undoubtedly poised to play a key role in the experimental characterization of noncoding genetic risk, informing our understanding of the underlying mechanisms of disease-associated loci and the development of more robust predictive classification algorithms.
Affiliation(s)
- Yonatan A Cooper
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Medical Scientist Training Program, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Qiuyu Guo
  - Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Daniel H Geschwind
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Program in Neurogenetics, Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
  - Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, CA, USA
  - Institute of Precision Health, University of California Los Angeles, Los Angeles, CA, USA
|
9
|
McAfee JC, Bell JL, Krupa O, Matoba N, Stein JL, Won H. Focus on your locus with a massively parallel reporter assay. J Neurodev Disord 2022; 14:50. [PMID: 36085003 PMCID: PMC9463819 DOI: 10.1186/s11689-022-09461-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Accepted: 09/01/2022] [Indexed: 01/01/2023] Open
Abstract
A growing number of variants associated with risk for neurodevelopmental disorders have been identified by genome-wide association and whole genome sequencing studies. As common risk variants often fall within large haplotype blocks covering long stretches of the noncoding genome, the causal variants within an associated locus are often unknown. Similarly, the effect of rare noncoding risk variants identified by whole genome sequencing on molecular traits is seldom known without functional assays. A massively parallel reporter assay (MPRA) can functionally validate thousands of regulatory elements simultaneously using high-throughput sequencing and barcode technology. MPRA has been adapted to various experimental designs that measure gene regulatory effects of genetic variants within cis- and trans-regulatory elements as well as posttranscriptional processes. This review discusses different MPRA designs that have been or could be used in the future to experimentally validate genetic variants associated with neurodevelopmental disorders. Although MPRA has limitations, such as not modeling genomic context, this assay can help narrow down the underlying genetic causes of neurodevelopmental disorders by screening thousands of sequences in one experiment. We conclude by describing future directions of this technique, such as applications of MPRA for gene-by-environment interactions and pharmacogenetics.
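As a rough illustration of how barcode sequencing yields a per-element activity in an MPRA, the sketch below sums RNA and plasmid DNA barcode counts per tested element and reports a log2 RNA/DNA ratio. It is a simplified assumption-laden example rather than any published pipeline: the function name, the `pseudocount` parameter, and the toy barcodes are hypothetical, and library-size normalization and replicate handling are deliberately omitted.

```python
import math
from collections import defaultdict


def mpra_activity(barcode_to_element, dna_counts, rna_counts, pseudocount=1.0):
    """Return log2(RNA/DNA) per element, pooling counts over its barcodes.

    barcode_to_element maps each barcode to the element it tags;
    dna_counts and rna_counts map barcodes to raw read counts.
    A pseudocount (assumed value) avoids division by zero.
    """
    dna_total = defaultdict(float)
    rna_total = defaultdict(float)
    for barcode, element in barcode_to_element.items():
        dna_total[element] += dna_counts.get(barcode, 0)
        rna_total[element] += rna_counts.get(barcode, 0)
    return {
        element: math.log2(
            (rna_total[element] + pseudocount) / (dna_total[element] + pseudocount)
        )
        for element in dna_total
    }


if __name__ == "__main__":
    # Toy data: two barcodes tag a reference allele, one tags the alternate.
    mapping = {"bc1": "ref", "bc2": "ref", "bc3": "alt"}
    dna = {"bc1": 10, "bc2": 5, "bc3": 15}
    rna = {"bc1": 40, "bc2": 23, "bc3": 15}
    activity = mpra_activity(mapping, dna, rna)
    # The allelic effect is the difference in log2 activity between alleles.
    print(activity, activity["alt"] - activity["ref"])
```

Pooling counts across barcodes before taking the ratio is one common way to reduce barcode-specific noise; per-barcode ratios followed by a median are an alternative design.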
Affiliation(s)
- Jessica C. McAfee
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jessica L. Bell
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Oleh Krupa
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Nana Matoba
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jason L. Stein
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hyejung Won
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
10
|
Hansen TJ, Hodges E. ATAC-STARR-seq reveals transcription factor-bound activators and silencers within chromatin-accessible regions of the human genome. Genome Res 2022; 32:1529-1541. [PMID: 35858748 PMCID: PMC9435738 DOI: 10.1101/gr.276766.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/25/2022] [Accepted: 07/11/2022] [Indexed: 11/26/2022]
Abstract
Massively parallel reporter assays (MPRAs) test the capacity of putative gene regulatory elements to drive transcription on a genome-wide scale. Most gene regulatory activity occurs within accessible chromatin, and recently described methods have combined assays that capture these regions, such as the assay for transposase-accessible chromatin using sequencing (ATAC-seq), with self-transcribing active regulatory region sequencing (STARR-seq) to selectively assay the regulatory potential of accessible DNA (ATAC-STARR-seq). Here, we report an integrated approach that quantifies activating and silencing regulatory activity, chromatin accessibility, and transcription factor (TF) occupancy with one assay using ATAC-STARR-seq. Our strategy, including important updates to the ATAC-STARR-seq assay and workflow, enabled high-resolution testing of ∼50 million unique DNA fragments tiling ∼101,000 accessible chromatin regions in human lymphoblastoid cells. We discovered that 30% of all accessible regions contain an activator, a silencer, or both. Although few MPRA studies have explored silencing activity, we demonstrate that silencers occur at similar frequencies to activators, and they represent a distinct functional group enriched for unique TF motifs and repressive histone modifications. We further show that Tn5 cut-site frequencies are retained in the ATAC-STARR plasmid library compared to standard ATAC-seq, enabling TF occupancy to be ascertained from ATAC-STARR data. With this approach, we found that activators and silencers cluster by distinct TF footprint combinations, and these groups of activity represent different gene regulatory networks of immune cell function. Altogether, these data highlight the multilayered capabilities of ATAC-STARR-seq to comprehensively investigate the regulatory landscape of the human genome all from a single DNA fragment source.
Affiliation(s)
- Tyler J Hansen
  - Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
- Emily Hodges
  - Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
  - Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, USA
|
11
|
Spielmann M, Kircher M. Computational and experimental methods for classifying variants of unknown clinical significance. Cold Spring Harb Mol Case Stud 2022; 8:mcs.a006196. [PMID: 35483875 PMCID: PMC9059783 DOI: 10.1101/mcs.a006196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022] Open
Abstract
The increase in sequencing capacity, reduction in costs, and national and international coordinated efforts have led to the widespread introduction of next-generation sequencing (NGS) technologies in patient care. More generally, human genetics and genomic medicine are gaining importance for more and more patients. Some communities are already discussing the prospect of sequencing each individual's genome at time of birth. Together with digital health records, this shall enable individualized treatments and preventive measures, so-called precision medicine. A central step in this process is the identification of disease-causing mutations or variant combinations that make us more susceptible to disease. Although various technological advances have improved the identification of genetic alterations, the interpretation and ranking of the identified variants remains a major challenge. Based on our knowledge of molecular processes or previously identified disease variants, we can identify potentially functional genetic variants and, using different lines of evidence, we are sometimes able to demonstrate their pathogenicity directly. However, the vast majority of variants are classified as variants of uncertain clinical significance (VUSs), without enough experimental evidence to determine their pathogenicity. In these cases, computational methods may be used to improve the prioritization, and an increasing toolbox of experimental methods is emerging that can be used to assay the molecular effects of VUSs. Here, we discuss how computational and experimental methods can be used to create catalogs of variant effects for a variety of molecular and cellular phenotypes. We discuss the prospects of integrating large-scale functional data with machine learning and clinical knowledge for the development of accurate pathogenicity predictions for clinical applications.
Affiliation(s)
- Malte Spielmann
  - Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany
  - Institute of Human Genetics, Christian-Albrechts-Universität, 24105 Kiel, Germany
  - Human Molecular Genomics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
  - DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
- Martin Kircher
  - Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, 10117 Berlin, Germany
  - DZHK (German Centre for Cardiovascular Research), partner site Berlin, 10115 Berlin, Germany
|
12
|
Romanov SE, Kalashnikova DA, Laktionov PP. Methods of massive parallel reporter assays for investigation of enhancers. Vavilovskii Zhurnal Genet Selektsii 2021; 25:344-355. [PMID: 34901731 PMCID: PMC8627875 DOI: 10.18699/vj21.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2020] [Revised: 03/28/2021] [Accepted: 03/28/2021] [Indexed: 11/19/2022] Open
Abstract
The correct deployment of genetic programs for development and differentiation relies on finely coordinated regulation of specific gene sets. Genomic regulatory elements play an exceptional role in this process. There are few types of gene regulatory elements, including promoters, enhancers, insulators and silencers. Alterations of gene regulatory elements may cause various pathologies, including cancer, congenital disorders and autoimmune diseases. The development of high-throughput genomic assays has made it possible to significantly accelerate the accumulation of information about the characteristic epigenetic properties of regulatory elements. In combination with high-throughput studies focused on the genome-wide distribution of epigenetic marks, regulatory proteins and the spatial structure of chromatin, this significantly expands the understanding of the principles of epigenetic regulation of genes and allows potential regulatory elements to be searched for in silico. However, common experimental approaches used to study the local characteristics of chromatin have a number of technical limitations that may reduce the reliability of computational identification of genomic regulatory sequences. Taking into account the variability of the functions of epigenetic determinants and complex multicomponent regulation of genomic elements activity, their functional verification is often required. A plethora of methods have been developed to study the functional role of regulatory elements on the genome scale. Common experimental approaches for in silico identification of regulatory elements and their inherent technical limitations will be described. The present review is focused on original high-throughput methods of enhancer activity reporter analysis that are currently used to validate predicted regulatory elements and to perform de novo searches. 
The methods described make it possible to assess the functional role of the nucleotide sequence of a regulatory element, to determine its exact boundaries, and to assess the influence of the local chromatin state on enhancer activity and gene expression. These approaches have contributed substantially to the understanding of the fundamental principles of gene regulation.
Affiliation(s)
- S E Romanov
  - Novosibirsk State University, Epigenetics Laboratory, Department of Natural Sciences, Novosibirsk, Russia
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Genomics Laboratory, Novosibirsk, Russia
- D A Kalashnikova
  - Novosibirsk State University, Epigenetics Laboratory, Department of Natural Sciences, Novosibirsk, Russia
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Genomics Laboratory, Novosibirsk, Russia
- P P Laktionov
  - Novosibirsk State University, Epigenetics Laboratory, Department of Natural Sciences, Novosibirsk, Russia
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Genomics Laboratory, Novosibirsk, Russia
|
13
|
Dibaeinia P, Sinha S. Deciphering enhancer sequence using thermodynamics-based models and convolutional neural networks. Nucleic Acids Res 2021; 49:10309-10327. [PMID: 34508359 PMCID: PMC8501998 DOI: 10.1093/nar/gkab765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/16/2021] [Revised: 08/18/2021] [Accepted: 08/25/2021] [Indexed: 11/18/2022] Open
Abstract
Deciphering the sequence-function relationship encoded in enhancers holds the key to interpreting non-coding variants and understanding mechanisms of transcriptomic variation. Several quantitative models exist for predicting enhancer function and underlying mechanisms; however, there has been no systematic comparison of these models characterizing their relative strengths and shortcomings. Here, we interrogated a rich data set of neuroectodermal enhancers in Drosophila, representing cis- and trans- sources of expression variation, with a suite of biophysical and machine learning models. We performed rigorous comparisons of thermodynamics-based models implementing different mechanisms of activation, repression and cooperativity. Moreover, we developed a convolutional neural network (CNN) model, called CoNSEPT, that learns enhancer 'grammar' in an unbiased manner. CoNSEPT is the first general-purpose CNN tool for predicting enhancer function in varying conditions, such as different cell types and experimental conditions, and we show that such complex models can suggest interpretable mechanisms. We found model-based evidence for mechanisms previously established for the studied system, including cooperative activation and short-range repression. The data also favored one hypothesized activation mechanism over another and suggested an intriguing role for a direct, distance-independent repression mechanism. Our modeling shows that while fundamentally different models can yield similar fits to data, they vary in their utility for mechanistic inference. CoNSEPT is freely available at: https://github.com/PayamDiba/CoNSEPT.
Collapse
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
14
|
Duveau F, Vande Zande P, Metzger BP, Diaz CJ, Walker EA, Tryban S, Siddiq MA, Yang B, Wittkopp PJ. Mutational sources of trans-regulatory variation affecting gene expression in Saccharomyces cerevisiae. eLife 2021; 10:67806. [PMID: 34463616 PMCID: PMC8456550 DOI: 10.7554/elife.67806] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/23/2021] [Accepted: 08/03/2021] [Indexed: 12/15/2022] Open
Abstract
Heritable variation in a gene’s expression arises from mutations impacting cis- and trans-acting components of its regulatory network. Here, we investigate how trans-regulatory mutations are distributed within the genome and within a gene regulatory network by identifying and characterizing 69 mutations with trans-regulatory effects on expression of the same focal gene in Saccharomyces cerevisiae. Relative to 1766 mutations without effects on expression of this focal gene, we found that these trans-regulatory mutations were enriched in coding sequences of transcription factors previously predicted to regulate expression of the focal gene. However, over 90% of the trans-regulatory mutations identified mapped to other types of genes involved in diverse biological processes including chromatin state, metabolism, and signal transduction. These data show how genetic changes in diverse types of genes can impact a gene’s expression in trans, revealing properties of trans-regulatory mutations that provide the raw material for trans-regulatory variation segregating within natural populations.
Affiliation(s)
- Fabien Duveau
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States; Laboratory of Biology and Modeling of the Cell, Ecole Normale Supérieure de Lyon, CNRS, Université Claude Bernard Lyon, Université de Lyon, Lyon, France
| | - Petra Vande Zande
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, United States
| | - Brian Ph Metzger
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
| | - Crisandra J Diaz
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, United States
| | - Elizabeth A Walker
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
| | - Stephen Tryban
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
| | - Mohammad A Siddiq
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
| | - Bing Yang
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, United States
| | - Patricia J Wittkopp
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States; Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, United States
|
15
|
Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022] Open
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
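The normalization step described in this abstract (BC expression divided by BC abundance in the plasmid sample, then averaged per ROI) can be illustrated with a minimal Python sketch. This is not the MPRAdecoder code: the function name `normalized_expression`, the `min_plasmid` cutoff, and the scaling of counts to library fractions are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def normalized_expression(cdna_counts, plasmid_counts, bc_to_roi, min_plasmid=1):
    """Return (per-barcode, per-ROI) normalized expression.

    cdna_counts / plasmid_counts map barcode -> read count in the
    expression (cDNA) and plasmid DNA samples. bc_to_roi maps each
    genuine barcode to its ROI. All names here are hypothetical.
    """
    cdna_total = sum(cdna_counts.values())
    plasmid_total = sum(plasmid_counts.values())
    if cdna_total == 0 or plasmid_total == 0:
        raise ValueError("both samples need at least one read")

    per_bc = {}
    for bc in bc_to_roi:
        dna = plasmid_counts.get(bc, 0)
        if dna < min_plasmid:
            # Barcode too rare in the plasmid sample to normalize reliably.
            continue
        rna_fraction = cdna_counts.get(bc, 0) / cdna_total
        dna_fraction = dna / plasmid_total
        per_bc[bc] = rna_fraction / dna_fraction

    # Average the barcode-level values that belong to the same ROI.
    grouped = defaultdict(list)
    for bc, value in per_bc.items():
        grouped[bc_to_roi[bc]].append(value)
    per_roi = {roi: mean(values) for roi, values in grouped.items()}
    return per_bc, per_roi


if __name__ == "__main__":
    per_bc, per_roi = normalized_expression(
        {"AAA": 30, "CCC": 10, "GGG": 60},
        {"AAA": 10, "CCC": 10, "GGG": 20},
        {"AAA": "roi1", "CCC": "roi1", "GGG": "roi2"},
    )
    print(per_bc, per_roi)
```

Dividing by library fractions rather than raw counts is one common way to make samples of different sequencing depth comparable; the published pipeline may scale its values differently.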
Affiliation(s)
- Anna E Letiagina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
| | - Evgeniya S Omelina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Anton V Ivankin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Alexey V Pindyurin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
|
16
|
Mulvey B, Lagunas T, Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 2021; 89:76-89. [PMID: 32843144 PMCID: PMC7938388 DOI: 10.1016/j.biopsych.2020.06.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 02/11/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Neuropsychiatric phenotypes have long been known to be influenced by heritable risk factors, directly confirmed by the past decade of genetic studies that have revealed specific genetic variants enriched in disease cohorts. However, the initial hope that a small set of genes would be responsible for a given disorder proved false. The more complex reality is that a given disorder may be influenced by myriad small-effect noncoding variants and/or by rare but severe coding variants, many de novo. Noncoding genomic sequences, for which molecular functions cannot usually be inferred, harbor a large portion of these variants, creating a substantial barrier to understanding higher-order molecular and biological systems of disease. Fortunately, novel genetic technologies (scalable oligonucleotide synthesis, RNA sequencing, and CRISPR [clustered regularly interspaced short palindromic repeats]) have opened novel avenues to experimentally identify biologically significant variants en masse. Massively parallel reporter assays (MPRAs) are an especially versatile technique resulting from such innovations. MPRAs are powerful molecular genetics tools that can be used to screen thousands of untranscribed or untranslated sequences and their variants for functional effects in a single experiment. This approach, though underutilized in psychiatric genetics, has several useful features for the field. We review methods for assaying putatively functional genetic variants and regions, emphasizing MPRAs and the opportunities they hold for dissection of psychiatric polygenicity. We discuss literature applying functional assays in neurogenetics, highlighting strengths, caveats, and design considerations, especially regarding disease-relevant variables (cell type, neurodevelopment, and sex), and we ultimately propose applications of MPRA to both computational and experimental neurogenetics of polygenic disease risk.
Affiliation(s)
- Bernard Mulvey
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Tomás Lagunas
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri.
|
17
|
Hammelman J, Krismer K, Banerjee B, Gifford DK, Sherwood RI. Identification of determinants of differential chromatin accessibility through a massively parallel genome-integrated reporter assay. Genome Res 2020; 30:1468-1480. [PMID: 32973041 PMCID: PMC7605270 DOI: 10.1101/gr.263228.120] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/05/2020] [Accepted: 08/26/2020] [Indexed: 12/20/2022]
Abstract
A key mechanism in cellular regulation is the ability of the transcriptional machinery to physically access DNA. Transcription factors interact with DNA to alter the accessibility of chromatin, which enables changes to gene expression during development or disease or as a response to environmental stimuli. However, the regulation of DNA accessibility via the recruitment of transcription factors is difficult to study in the context of the native genome because every genomic site is distinct in multiple ways. Here we introduce the multiplexed integrated accessibility assay (MIAA), an assay that measures chromatin accessibility of synthetic oligonucleotide sequence libraries integrated into a controlled genomic context with low native accessibility. We apply MIAA to measure the effects of sequence motifs on cell type-specific accessibility between mouse embryonic stem cells and embryonic stem cell-derived definitive endoderm cells, screening 7905 distinct DNA sequences. MIAA recapitulates differential accessibility patterns of 100-nt sequences derived from natively differential genomic regions, identifying E-box motifs common to epithelial-mesenchymal transition driver transcription factors in stem cell-specific accessible regions that become repressed in endoderm. We show that a single binding motif for a key regulatory transcription factor is sufficient to open chromatin, and classify sets of stem cell-specific, endoderm-specific, and shared accessibility-modifying transcription factor motifs. We also show that overexpression of two definitive endoderm transcription factors, T and Foxa2, results in changes to accessibility in DNA sequences containing their respective DNA-binding motifs and identify preferential motif arrangements that influence accessibility.
Affiliation(s)
- Jennifer Hammelman
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Konstantin Krismer
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Budhaditya Banerjee
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
| | - David K Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Hubrecht Institute, 3584 CT Utrecht, Netherlands
|
18
|
Liu J, Shively CA, Mitra RD. Quantitative analysis of transcription factor binding and expression using calling cards reporter arrays. Nucleic Acids Res 2020; 48:e50. [PMID: 32133534 PMCID: PMC7229839 DOI: 10.1093/nar/gkaa141] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/18/2019] [Revised: 01/31/2020] [Accepted: 02/25/2020] [Indexed: 12/13/2022] Open
Abstract
We report a tool, Calling Cards Reporter Arrays (CCRA), that measures transcription factor (TF) binding and the consequences on gene expression for hundreds of synthetic promoters in yeast. Using Cbf1p and MAX, we demonstrate that the CCRA method is able to detect small changes in binding free energy with a sensitivity comparable to in vitro methods, enabling the measurement of energy landscapes in vivo. We then demonstrate the quantitative analysis of cooperative interactions by measuring Cbf1p binding at synthetic promoters with multiple sites. We find that the cooperativity between Cbf1p dimers varies sinusoidally with a period of 10.65 bp and energetic cost of 1.37 kBT for sites that are positioned ‘out of phase’. Finally, we characterize the binding and expression of a group of TFs, Tye7p, Gcr1p and Gcr2p, that act together as a ‘TF collective’, an important but poorly characterized model of TF cooperativity. We demonstrate that Tye7p often binds promoters without its recognition site because it is recruited by other collective members, whereas these other members require their recognition sites, suggesting a hierarchy where these factors recruit Tye7p but not vice versa. Our experiments establish CCRA as a useful tool for quantitative investigations into TF binding and function.
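To make the reported spacing dependence concrete, the Python sketch below evaluates a sinusoidal free-energy penalty using the two numbers given in the abstract (period 10.65 bp, out-of-phase cost 1.37 kBT). The cosine functional form, the `in_phase_spacing_bp` offset, and the function names are assumptions for illustration; the abstract does not state the fitted equation or the phase.

```python
import math

PERIOD_BP = 10.65             # period reported in the abstract
OUT_OF_PHASE_COST_KBT = 1.37  # energetic cost reported in the abstract


def phase_penalty_kbt(spacing_bp, in_phase_spacing_bp=0.0):
    """Free-energy penalty (in units of kBT) at a given site spacing.

    Assumed form: zero when the sites are in phase, rising to the full
    out-of-phase cost half a period away. The phase offset is hypothetical.
    """
    phase = 2.0 * math.pi * (spacing_bp - in_phase_spacing_bp) / PERIOD_BP
    return 0.5 * OUT_OF_PHASE_COST_KBT * (1.0 - math.cos(phase))


def relative_boltzmann_weight(spacing_bp, in_phase_spacing_bp=0.0):
    """Boltzmann weight of the cooperative state relative to in-phase spacing."""
    return math.exp(-phase_penalty_kbt(spacing_bp, in_phase_spacing_bp))


if __name__ == "__main__":
    for spacing in (0.0, PERIOD_BP / 4, PERIOD_BP / 2, PERIOD_BP):
        print(spacing, phase_penalty_kbt(spacing), relative_boltzmann_weight(spacing))
```

Under this assumed form, a penalty of 1.37 kBT corresponds to roughly a fourfold reduction in the weight of the cooperative state (exp(-1.37) is about 0.25) for fully out-of-phase sites.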
Affiliation(s)
- Jiayue Liu
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA
| | - Christian A Shively
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA
| | - Robi D Mitra
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA; The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA; McDonnell Genome Institute, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA
|
19
|
Morgan RA, Ma F, Unti MJ, Brown D, Ayoub PG, Tam C, Lathrop L, Aleshe B, Kurita R, Nakamura Y, Senadheera S, Wong RL, Hollis RP, Pellegrini M, Kohn DB. Creating New β-Globin-Expressing Lentiviral Vectors by High-Resolution Mapping of Locus Control Region Enhancer Sequences. Mol Ther Methods Clin Dev 2020; 17:999-1013. [PMID: 32426415 PMCID: PMC7225380 DOI: 10.1016/j.omtm.2020.04.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/08/2020] [Accepted: 04/13/2020] [Indexed: 12/18/2022]
Abstract
Hematopoietic stem cell gene therapy is a promising approach for treating disorders of the hematopoietic system. Identifying combinations of cis-regulatory elements that do not impede packaging or transduction efficiency when included in lentiviral vectors has proven challenging. In this study, we deploy LV-MPRA (lentiviral vector-based, massively parallel reporter assay), an approach that simultaneously analyzes thousands of synthetic DNA fragments in parallel to identify sequence-intrinsic and lineage-specific enhancer function at near-base-pair resolution. We demonstrate the power of LV-MPRA in elucidating the boundaries of previously unknown intrinsic enhancer sequences of the human β-globin locus control region. Our approach facilitated the rapid assembly of novel therapeutic βAS3-globin lentiviral vectors harboring strong lineage-specific recombinant control elements capable of correcting a mouse model of sickle cell disease. LV-MPRA can be used to map any genomic locus for enhancer activity and facilitates the rapid development of therapeutic vectors for treating disorders of the hematopoietic system or other specific tissues and cell types.
Affiliation(s)
- Richard A. Morgan
- Charles R. Drew University of Medicine and Science, Los Angeles, CA 90059, USA
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Feiyang Ma
- Molecular Biology Institute Interdepartmental Doctoral Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Mildred J. Unti
- Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Devin Brown
- Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Paul George Ayoub
- Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Curtis Tam
- Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Lindsay Lathrop
- Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Bamidele Aleshe
- Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Ryo Kurita
- Cell Engineering Division, RIKEN BioResource Center, Tsukuba, Ibaraki, Japan
| | - Yukio Nakamura
- Cell Engineering Division, RIKEN BioResource Center, Tsukuba, Ibaraki, Japan
| | - Shantha Senadheera
- Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Ryan L. Wong
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Roger P. Hollis
- Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Matteo Pellegrini
- Molecular Biology Institute Interdepartmental Doctoral Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Donald B. Kohn
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- The Eli & Edythe Broad Center of Regenerative Medicine & Stem Cell Research, University of California, Los Angeles, Los Angeles, CA, USA
|
20
|
Singh R, Sophiarani Y. A report on DNA sequence determinants in gene expression. Bioinformation 2020; 16:422-431. [PMID: 32831525 PMCID: PMC7434957 DOI: 10.6026/97320630016422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/23/2020] [Accepted: 04/24/2020] [Indexed: 11/26/2022] Open
Abstract
The biased usage of nucleotides in coding sequences and its correlation with gene expression has been observed in several studies. A complex set of interactions between genes and other components of the expression system determines the amount of protein produced from coding sequences. It is known that the elongation rate of the polypeptide chain is affected by both codon usage bias and specific amino acid compositional constraints. Therefore, it is of interest to review local DNA-sequence elements and other positional as well as combinatorial constraints that play a significant role in gene expression.
Affiliation(s)
- Ravail Singh
- Indian Institute of Integrative Medicine, CSIR, Canal Road, Jammu-180001
|
21
|
Charting the cis-regulome of activated B cells by coupling structural and functional genomics. Nat Immunol 2019; 21:210-220. [PMID: 31873292 DOI: 10.1038/s41590-019-0565-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 06/12/2019] [Accepted: 11/18/2019] [Indexed: 12/25/2022]
Abstract
Cis-regulomes underlying immune-cell-specific genomic states have been extensively analyzed by structure-based chromatin profiling. By coupling such approaches with a high-throughput enhancer screen (self-transcribing active regulatory region sequencing (STARR-seq)), we assembled a functional cis-regulome for lipopolysaccharide-activated B cells. Functional enhancers, in contrast with accessible chromatin regions that lack enhancer activity, were enriched for enhancer RNAs (eRNAs) and preferentially interacted in vivo with B cell lineage-determining transcription factors. Interestingly, preferential combinatorial binding by these transcription factors was not associated with differential enrichment of their sites. Instead, active enhancers were resolved by principal component analysis (PCA) from all accessible regions by co-varying transcription factor motif scores involving a distinct set of signaling-induced transcription factors. High-resolution chromosome conformation capture (Hi-C) analysis revealed multiplex, activated enhancer-promoter configurations encompassing numerous multi-enhancer genes and multi-genic enhancers engaged in the control of divergent molecular pathways. Motif analysis of pathway-specific enhancers provides a catalog of diverse transcription factor codes for biological processes encompassing B cell activation, cycling and differentiation.
|
22
|
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB (https://www.mavedb.org), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
| | - Jochen Weile
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
| | - Frederick P Roth
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada.
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
| | - Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Canadian Institute for Advanced Research, Toronto, ON, Canada.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
| | - Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia.
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.
|
23
|
Kreimer A, Yan Z, Ahituv N, Yosef N. Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types. Hum Mutat 2019; 40:1299-1313. [PMID: 31131957 PMCID: PMC6771677 DOI: 10.1002/humu.23820] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/16/2019] [Revised: 05/18/2019] [Accepted: 05/24/2019] [Indexed: 01/01/2023]
Abstract
Deciphering the potential of noncoding loci to influence gene regulation has been the subject of intense research, with important implications in understanding genetic underpinnings of human diseases. Massively parallel reporter assays (MPRAs) can measure regulatory activity of thousands of DNA sequences and their variants in a single experiment. With an increasing number of publicly available MPRA data sets, one can now develop data-driven models which, given a DNA sequence, predict its regulatory activity. Here, we performed a comprehensive meta-analysis of several MPRA data sets in a variety of cellular contexts. We first applied an ensemble of methods to predict MPRA output in each context and observed that the most predictive features are consistent across data sets. We then demonstrate that predictive models trained in one cellular context can be used to predict MPRA output in another, with loss of accuracy attributed to cell-type-specific features. Finally, we show that our approach achieves top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" Challenge for predicting effects of single-nucleotide variants. Overall, our analysis provides insights into how MPRA data can be leveraged to highlight functional regulatory regions throughout the genome and can guide effective design of future experiments by better prioritizing regions of interest.
Affiliation(s)
- Anat Kreimer
- Department of Electrical Engineering and Computer Sciences, Center for Computational Biology, University of California, Berkeley, California
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California
| | - Zhongxia Yan
- Department of Electrical Engineering and Computer Sciences, Center for Computational Biology, University of California, Berkeley, California
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences, Center for Computational Biology, University of California, Berkeley, California
- Ragon Institute of MGH, MIT and Harvard, Cambridge, Massachusetts
- Chan Zuckerberg Biohub, San Francisco, California
|
24
|
Movva R, Greenside P, Marinov GK, Nair S, Shrikumar A, Kundaje A. Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays. PLoS One 2019; 14:e0218073. [PMID: 31206543 PMCID: PMC6576758 DOI: 10.1371/journal.pone.0218073] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 08/21/2018] [Accepted: 05/24/2019] [Indexed: 11/19/2022] Open
Abstract
The relationship between noncoding DNA sequence and gene expression is not well-understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ∼500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.
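The abstract summarizes model accuracy as a Spearman rank correlation between predicted and measured activity. As a reminder of what that statistic computes, here is a small standard-library Python sketch; it is a generic implementation (Pearson correlation of tie-averaged ranks), not code from MPRA-DragoNN, and the function names are chosen for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(predicted, measured):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical predicted vs. measured activities for five constructs.
    print(spearman_rho([0.1, 0.4, 0.35, 0.8, 0.2], [1.0, 2.5, 2.0, 2.2, 0.5]))
```

Because only ranks enter the calculation, the statistic is insensitive to any monotone rescaling of the assay readout, which is one reason it is often used for comparing predictions against noisy reporter measurements.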
Affiliation(s)
- Rajiv Movva
- The Harker School, San Jose, CA, United States of America
- Department of Genetics, Stanford University, Stanford, CA, United States of America
| | - Peyton Greenside
- Biomedical Informatics Training Program, Stanford University, Stanford, CA, United States of America
| | - Georgi K. Marinov
- Department of Genetics, Stanford University, Stanford, CA, United States of America
| | - Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA, United States of America
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, United States of America
| | - Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA, United States of America
- Department of Computer Science, Stanford University, Stanford, CA, United States of America
25
|
Qiu C, Kaplan CD. Functional assays for transcription mechanisms in high-throughput. Methods 2019; 159-160:115-123. [PMID: 30797033 PMCID: PMC6589137 DOI: 10.1016/j.ymeth.2019.02.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/29/2019] [Accepted: 02/18/2019] [Indexed: 01/12/2023] Open
Abstract
Dramatic increases in the scale of programmed synthesis of nucleic acid libraries coupled with deep sequencing have powered advances in understanding nucleic acid and protein biology. Biological systems centering on nucleic acids or encoded proteins greatly benefit from such high-throughput studies, given that large DNA variant pools can be synthesized and DNA, or RNA products of transcription, can be easily analyzed by deep sequencing. Here we review the scope of various high-throughput functional assays for studies of nucleic acids and proteins in general, followed by discussion of how these types of study have yielded insights into the RNA Polymerase II (Pol II) active site as an example. We discuss methodological considerations in the design and execution of these experiments that should be valuable to studies in any system.
Affiliation(s)
- Chenxi Qiu
- Department of Medicine, Division of Translational Therapeutics, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| | - Craig D Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA.
26
|
Lentiviral Vectors as Tools for the Study and Treatment of Glioblastoma. Cancers (Basel) 2019; 11:417. [PMID: 30909628 PMCID: PMC6468594 DOI: 10.3390/cancers11030417] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/04/2019] [Revised: 03/06/2019] [Accepted: 03/19/2019] [Indexed: 12/17/2022] Open
Abstract
Glioblastoma (GBM) has the worst prognosis among brain tumors; hence, basic biology, preclinical, and clinical studies are necessary to design effective strategies to defeat this disease. Gene transfer vectors derived from the most-studied lentivirus, the Human Immunodeficiency Virus type 1, have wide application in dissecting GBM-specific features to identify potential therapeutic targets. Last-generation lentiviruses (LV), highly improved in safety profile and gene transfer capacity, are also widely employed as delivery systems for therapeutic molecules in gene therapy (GT) approaches. LV were initially used in GT protocols aimed at the expression of suicide factors to induce GBM cell death. Subsequently, LV were adopted either to express small noncoding RNAs that affect different aspects of GBM biology or to overcome the resistance to both chemo- and radiotherapy that easily develops in this tumor after initial therapy. Newer frontiers include the adoption of LV for engineering T cells to express chimeric antigen receptors recognizing specific GBM antigens, or for transducing specific cell types that, due to their biological properties, can function as carriers of therapeutic molecules to the cancer mass. Finally, LV allow the setting up of improved animal models crucial for the validation of GBM-specific therapies.
27
|
Myint L, Avramopoulos DG, Goff LA, Hansen KD. Linear models enable powerful differential activity analysis in massively parallel reporter assays. BMC Genomics 2019; 20:209. [PMID: 30866806 PMCID: PMC6417258 DOI: 10.1186/s12864-019-5556-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 11/30/2018] [Accepted: 02/22/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Massively parallel reporter assays (MPRAs) have emerged as a popular means for understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically uses ad hoc methods. There has been little attention to comparing the performance of methods across datasets. RESULTS: We present the mpralm method, which we show is calibrated and powerful, by analyzing its performance on multiple MPRA datasets. We show that it outperforms existing statistical methods for analysis of this data type, in the first comprehensive evaluation of statistical methods on several datasets. We investigate theoretical and real-data properties of barcode summarization methods and show an unappreciated impact of the summarization method for some datasets. Finally, we use our model to conduct a power analysis for this assay and show substantial improvements in power by performing up to 6 replicates per condition, whereas sequencing depth has a smaller impact; we recommend always using at least 4 replicates. An R package is available from the Bioconductor project. CONCLUSIONS: Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis and will help improve the design and analysis of MPRA experiments.
Affiliation(s)
- Leslie Myint
- Department of Mathematics, Statistics, and Computer Science, Macalester College, 1600 Grand Ave, Saint Paul, MN 55105 USA
| | | | - Loyal A. Goff
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, USA
| | - Kasper D. Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe St, E3527, Baltimore, MD 21212 USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
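The barcode summarization step mentioned in the abstract of this entry can be illustrated with a minimal Python sketch. It assumes two common ways of collapsing per-barcode counts into one activity score per element: summing counts before taking a single log-ratio, or averaging per-barcode log-ratios. The function names, the pseudocount of 1, and the toy counts are illustrative assumptions and are not the API of the mpralm package.

```python
import math


def aggregate_estimator(rna_counts, dna_counts, pseudocount=1.0):
    """Sum counts over all barcodes, then take one log2 RNA/DNA ratio.

    The pseudocount is an assumption of this sketch to avoid log(0).
    """
    rna_total = sum(rna_counts) + pseudocount
    dna_total = sum(dna_counts) + pseudocount
    return math.log2(rna_total / dna_total)


def average_estimator(rna_counts, dna_counts, pseudocount=1.0):
    """Take a log2 RNA/DNA ratio for each barcode, then average the ratios."""
    ratios = [
        math.log2((rna + pseudocount) / (dna + pseudocount))
        for rna, dna in zip(rna_counts, dna_counts)
    ]
    return sum(ratios) / len(ratios)


# Toy counts for one element tagged with three barcodes (hypothetical data).
# The first barcode has an RNA dropout, which the two summaries treat differently.
rna = [0, 15, 31]
dna = [3, 7, 15]

agg = aggregate_estimator(rna, dna)
avg = average_estimator(rna, dna)
print(f"aggregate: {agg:.3f}, average: {avg:.3f}")
```

In this toy case the single dropout barcode pulls the averaged score down to zero while the summed score stays positive, which is one way the choice of summary could matter for a dataset.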
28
|
Weingarten-Gabbay S, Nir R, Lubliner S, Sharon E, Kalma Y, Weinberger A, Segal E. Systematic interrogation of human promoters. Genome Res 2019; 29:171-183. [PMID: 30622120 PMCID: PMC6360817 DOI: 10.1101/gr.236075.118] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 02/14/2018] [Accepted: 12/05/2018] [Indexed: 12/19/2022]
Abstract
Despite much research, our understanding of the architecture and cis-regulatory elements of human promoters is still lacking. Here, we devised a high-throughput assay to quantify the activity of approximately 15,000 fully designed sequences that we integrated and expressed from a fixed location within the human genome. We used this method to investigate thousands of native promoters and preinitiation complex (PIC) binding regions followed by in-depth characterization of the sequence motifs underlying promoter activity, including core promoter elements and TF binding sites. We find that core promoters drive transcription mostly unidirectionally and that sequences originating from promoters exhibit stronger activity than those originating from enhancers. By testing multiple synthetic configurations of core promoter elements, we dissect the motifs that positively and negatively regulate transcription as well as the effect of their combinations and distances, including a 10-bp periodicity in the optimal distance between the TATA and the initiator. By comprehensively screening 133 TF binding sites, we find that in contrast to core promoters, TF binding sites maintain similar activity levels in both orientations, supporting a model by which divergent transcription is driven by two distinct unidirectional core promoters sharing bidirectional TF binding sites. Finally, we find a striking agreement between the effect of binding site multiplicity of individual TFs in our assay and their tendency to appear in homotypic clusters throughout the genome. Overall, our study systematically assays the elements that drive expression in core and proximal promoter regions and sheds light on organization principles of regulatory regions in the human genome.
Affiliation(s)
- Shira Weingarten-Gabbay
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.,Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Ronit Nir
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.,Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Shai Lubliner
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.,Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Eilon Sharon
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.,Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Yael Kalma
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.,Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Adina Weinberger
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.,Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.,Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
29
|
Multiplexed assays of variant effects contribute to a growing genotype-phenotype atlas. Hum Genet 2018; 137:665-678. [PMID: 30073413 PMCID: PMC6153521 DOI: 10.1007/s00439-018-1916-x] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 06/21/2018] [Accepted: 07/21/2018] [Indexed: 12/12/2022]
Abstract
Given the constantly improving cost and speed of genome sequencing, it is reasonable to expect that personal genomes will soon be known for many millions of humans. This stands in stark contrast with our limited ability to interpret the sequence variants that we find. Although it is, perhaps, easiest to interpret variants in coding regions, the functional impact is unknown for the vast majority of missense variants. While many computational approaches can predict the impact of coding variants, they are given little weight in the current guidelines for interpreting clinical variants. Laboratory assays produce comparatively more trustworthy results, but until recently did not scale to the space of all possible mutations. The development of deep mutational scanning and other multiplexed assays of variant effect has now brought the feasibility of this endeavour within view. Here, we review progress in this field over the last decade, break down the different approaches into their components, and compare methodological differences.
30
|
Hegde M, Strand C, Hanna RE, Doench JG. Uncoupling of sgRNAs from their associated barcodes during PCR amplification of combinatorial CRISPR screens. PLoS One 2018; 13:e0197547. [PMID: 29799876 PMCID: PMC5969736 DOI: 10.1371/journal.pone.0197547] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 02/19/2018] [Accepted: 05/03/2018] [Indexed: 12/26/2022] Open
Abstract
Many implementations of pooled screens in mammalian cells rely on linking an element of interest to a barcode, with the latter subsequently quantitated by next generation sequencing. However, substantial uncoupling between these paired elements during lentiviral production has been reported, especially as the distance between elements increases. We detail that PCR amplification is another major source of uncoupling, and becomes more pronounced with increased amounts of DNA template molecules and PCR cycles. To lessen uncoupling in systems that use paired elements for detection, we recommend minimizing the distance between elements, using low and equal template DNA inputs for plasmid and genomic DNA during PCR, and minimizing the number of PCR cycles. We also present a vector design for conducting combinatorial CRISPR screens that enables accurate barcode-based detection with a single short sequencing read and minimal uncoupling.
Affiliation(s)
- Mudra Hegde
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Christine Strand
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Ruth E. Hanna
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - John G. Doench
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
31
|
Systematic approach for dissecting the molecular mechanisms of transcriptional regulation in bacteria. Proc Natl Acad Sci U S A 2018; 115:E4796-E4805. [PMID: 29728462 PMCID: PMC6003448 DOI: 10.1073/pnas.1722055115] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Indexed: 12/21/2022] Open
Abstract
Organisms must constantly make regulatory decisions in response to a change in cellular state or environment. However, while the catalog of genomes expands rapidly, we remain ignorant about how the genes in these genomes are regulated. Here, we show how a massively parallel reporter assay, Sort-Seq, and information-theoretic modeling can be used to identify regulatory sequences. We then use chromatography and mass spectrometry to identify the regulatory proteins that bind these sequences. The approach results in quantitative base pair-resolution models of promoter mechanism and was shown in both well-characterized and unannotated promoters in Escherichia coli. Given the generality of the approach, it opens up the possibility of quantitatively dissecting the mechanisms of promoter function in a wide range of bacteria.
Gene regulation is one of the most ubiquitous processes in biology. However, while the catalog of bacterial genomes continues to expand rapidly, we remain ignorant about how almost all of the genes in these genomes are regulated. At present, characterizing the molecular mechanisms by which individual regulatory sequences operate requires focused efforts using low-throughput methods. Here, we take a first step toward multipromoter dissection and show how a combination of massively parallel reporter assays, mass spectrometry, and information-theoretic modeling can be used to dissect multiple bacterial promoters in a systematic way. We show this approach on both well-studied and previously uncharacterized promoters in the enteric bacterium Escherichia coli. In all cases, we recover nucleotide-resolution models of promoter mechanism. For some promoters, including previously unannotated ones, the approach allowed us to further extract quantitative biophysical models describing input–output relationships. Given the generality of the approach presented here, it opens up the possibility of quantitatively dissecting the mechanisms of promoter function in E. coli and a wide range of other bacteria.
32
|
Muerdter F, Boryń ŁM, Woodfin AR, Neumayr C, Rath M, Zabidi MA, Pagani M, Haberle V, Kazmar T, Catarino RR, Schernhuber K, Arnold CD, Stark A. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat Methods 2018; 15:141-149. [PMID: 29256496 PMCID: PMC5793997 DOI: 10.1038/nmeth.4534] [Citation(s) in RCA: 104] [Impact Index Per Article: 17.3] [Received: 06/29/2017] [Accepted: 11/08/2017] [Indexed: 12/19/2022]
Abstract
The identification of transcriptional enhancers in the human genome is a prime goal in biology. Enhancers are typically predicted via chromatin marks, yet their function is primarily assessed with plasmid-based reporter assays. Here, we show that such assays are rendered unreliable by two previously reported phenomena relating to plasmid transfection into human cells: (i) the bacterial plasmid origin of replication (ORI) functions as a conflicting core promoter and (ii) a type I interferon (IFN-I) response is activated. These cause confounding false positives and negatives in luciferase assays and STARR-seq screens. We overcome both problems by employing the ORI as core promoter and by inhibiting two IFN-I-inducing kinases, enabling genome-wide STARR-seq screens in human cells. In HeLa-S3 cells, we uncover strong enhancers, IFN-I-induced enhancers, and enhancers endogenously silenced at the chromatin level. Our findings apply to all episomal enhancer activity assays in mammalian cells and are key to the characterization of human enhancers.
Affiliation(s)
- Felix Muerdter
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Łukasz M Boryń
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Ashley R Woodfin
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Christoph Neumayr
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Martina Rath
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Muhammad A Zabidi
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Michaela Pagani
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Vanja Haberle
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Tomáš Kazmar
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Rui R Catarino
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Katharina Schernhuber
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Cosmas D Arnold
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
- Medical University of Vienna, Vienna Biocenter (VBC), Vienna, Austria
33
|
Chaudhari HG, Cohen BA. Local sequence features that influence AP-1 cis-regulatory activity. Genome Res 2018; 28:171-181. [PMID: 29305491 PMCID: PMC5793781 DOI: 10.1101/gr.226530.117] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 06/19/2017] [Accepted: 12/22/2017] [Indexed: 01/05/2023]
Abstract
In the genome, most occurrences of transcription factor binding sites (TFBS) have no cis-regulatory activity, which suggests that flanking sequences contain information that distinguishes functional from nonfunctional TFBS. We interrogated the role of flanking sequences near Activator Protein 1 (AP-1) binding sites that reside in DNase I Hypersensitive Sites (DHS) and regions annotated as Enhancers. In these regions, we found that sequence features directly adjacent to the core motif distinguish high from low activity AP-1 sites. Some nearby features are motifs for other TFs that genetically interact with the AP-1 site. Other features are extensions of the AP-1 core motif, which cause the extended sites to match motifs of multiple AP-1 binding proteins. Computational models trained on these data distinguish between sequences with high and low activity AP-1 sites and also predict changes in cis-regulatory activity due to mutations in AP-1 core sites and their flanking sequences. Our results suggest that extended AP-1 binding sites, together with adjacent binding sites for additional TFs, encode part of the information that governs TFBS activity in the genome.
Affiliation(s)
- Hemangi G Chaudhari
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, Missouri 63110, USA.,Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63110, USA
| | - Barak A Cohen
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, Missouri 63110, USA.,Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63110, USA
34
|
Tycko J, Van MV, Elowitz MB, Bintu L. Advancing towards a global mammalian gene regulation model through single-cell analysis and synthetic biology. Current Opinion in Biomedical Engineering 2017. [DOI: 10.1016/j.cobme.2017.10.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
35
|
Barr KA, Martinez C, Moran JR, Kim AR, Ramos AF, Reinitz J. Synthetic enhancer design by in silico compensatory evolution reveals flexibility and constraint in cis-regulation. BMC Systems Biology 2017; 11:116. [PMID: 29187214 PMCID: PMC5708098 DOI: 10.1186/s12918-017-0485-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/05/2017] [Accepted: 11/09/2017] [Indexed: 11/12/2022]
Abstract
BACKGROUND: Models that incorporate specific chemical mechanisms have been successful in describing the activity of Drosophila developmental enhancers as a function of underlying transcription factor binding motifs. Despite this, the minimum set of mechanisms required to reconstruct an enhancer from its constituent parts is not known. Synthetic biology offers the potential to test the sufficiency of known mechanisms to describe the activity of enhancers, as well as to uncover constraints on the number, order, and spacing of motifs. RESULTS: Using a functional model and in silico compensatory evolution, we generated putative synthetic even-skipped stripe 2 enhancers with varying degrees of similarity to the natural enhancer. These elements represent the evolutionary trajectories of the natural stripe 2 enhancer towards two synthetic enhancers designed ab initio. In the first trajectory, spatially regulated expression was maintained, even after more than a third of binding sites were lost. In the second, sequences with high similarity to the natural element did not drive expression, but a highly diverged sequence about half the length of the minimal stripe 2 enhancer drove ten times greater expression. Additionally, homotypic clusters of Zelda or Stat92E motifs, but not Bicoid, drove expression in developing embryos. CONCLUSIONS: Here, we present a functional model of gene regulation to test the degree to which the known transcription factors and their interactions explain the activity of the Drosophila even-skipped stripe 2 enhancer. Initial success in the first trajectory showed that the gene regulation model explains much of the function of the stripe 2 enhancer. Cases where expression deviated from prediction indicate that undescribed factors likely act to modulate expression. We also showed that activation driven by Bicoid and Hunchback is highly sensitive to the spatial organization of binding motifs. In contrast, Zelda and Stat92E drive expression from simple homotypic clusters, suggesting that activation driven by these factors is less constrained. Collectively, the 40 sequences generated in this work provide a powerful training set for building future models of gene regulation.
Affiliation(s)
- Kenneth A Barr
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Zoology 111, 1101 E 57th St, Chicago, 60637, Illinois, USA.
- Department of Ecology and Evolution, The University of Chicago, Chicago, 60637, Illinois, USA.
| | - Carlos Martinez
- Department Biochemistry and Molecular Genetics, Northwestern University, Chicago, 60611, Illinois, USA
| | - Jennifer R Moran
- Department Human Genetics, The University of Chicago, Chicago, 60637, Illinois, USA
- Institute for Genomics & Systems Biology, The University of Chicago, Chicago, 60637, Illinois, USA
| | - Ah-Ram Kim
- School of Life Science, Handong Global University, Pohang, 37554, Gyeongbuk, South Korea
| | - Alexandre F Ramos
- Departamento de Radiologia - Faculdade de Medicina, Universidade de São Paulo & Instituto do Câncer do Estado de São Paulo, São Paulo, SP CEP, 05403-911, Brazil
- Escola de Artes, Ciências e Humanidades & Núcleo de Estudos Interdisciplinares em Sistemas Complexos, Universidade de São Paulo, Av. Arlindo Béttio, São Paulo, 1000 CEP 03828-000, SP, Brazil
| | - John Reinitz
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Zoology 111, 1101 E 57th St, Chicago, 60637, Illinois, USA
- Department of Ecology and Evolution, The University of Chicago, Chicago, 60637, Illinois, USA
- Institute for Genomics & Systems Biology, The University of Chicago, Chicago, 60637, Illinois, USA
- Department Statistics, The University of Chicago, 5747 S. Ellis Avenue Jones 312, Chicago, 60637, IL, USA
36
|
Dougherty JD, Yang C, Lake AM. Systems biology in the central nervous system: a brief perspective on essential recent advancements. Current Opinion in Systems Biology 2017; 3:67-76. [PMID: 29057378 PMCID: PMC5648337 DOI: 10.1016/j.coisb.2017.04.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/11/2023]
Abstract
As recent advances in human genetics have begun to more rapidly identify the individual genes contributing to risk of psychiatric disease, the spotlight now turns to understanding how disruption of these genes alters the brain, and thus behavior. Compared to other tissues, cellular complexity in the brain provides both a substantial challenge and a significant opportunity for systems biology approaches. Methods are now maturing that will finally allow the 'parts list' for the functioning mouse and human brains to be defined, enabling new approaches to defining how the system goes awry in disorders of the CNS. However, the availability of tissue is certainly a challenge for systems biology of neuroscience, compared to systems biology of other tissues, where biopsy is feasible. This challenge is particularly notable for disorders caused by extremely rare genetic variants. Thus, computational and systems biology approaches, as well as precise experimental models by way of genome editing, will play key roles in defining mechanisms for disorders, and their individual symptoms, across varied genetic etiologies. Here, we highlight recent progress in neurogenetics, postmortem genomics, cell-type-specific profiling, and precision modeling toward defining mechanisms in disease.
Affiliation(s)
- Joseph D. Dougherty
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Chengran Yang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Allison M. Lake
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA
37
|
Wouters J, Kalender Atak Z, Aerts S. Decoding transcriptional states in cancer. Curr Opin Genet Dev 2017; 43:82-92. [PMID: 28129557 DOI: 10.1016/j.gde.2017.01.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/09/2016] [Revised: 01/05/2017] [Accepted: 01/09/2017] [Indexed: 12/27/2022]
Abstract
Gene regulatory networks determine cellular identity. In cancer, aberrations of gene networks are caused by driver mutations that often affect transcription factors and chromatin modifiers. Nevertheless, gene transcription in cancer follows the same cis-regulatory rules as normal cells, and cancer cells have served as convenient model systems to study transcriptional regulation. Tumours often show regulatory heterogeneity, with subpopulations of cells in different transcriptional states, which has important therapeutic implications. Here, we review recent experimental and computational techniques to reverse engineer cancer gene networks using transcriptome and epigenome data. New algorithms, data integration strategies, and increasing amounts of single cell genomics data provide exciting opportunities to model dynamic regulatory states at unprecedented resolution.
Affiliation(s)
- Jasper Wouters
- Laboratory of Computational Biology, VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven (University of Leuven), Leuven, Belgium
| | - Zeynep Kalender Atak
- Laboratory of Computational Biology, VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven (University of Leuven), Leuven, Belgium
| | - Stein Aerts
- Laboratory of Computational Biology, VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven (University of Leuven), Leuven, Belgium.