1
Benner P, Vingron M. Quantifying the tissue-specific regulatory information within enhancer DNA sequences. NAR Genom Bioinform 2021; 3:lqab095. [PMID: 34729474 PMCID: PMC8557370 DOI: 10.1093/nargab/lqab095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 09/23/2021] [Accepted: 09/28/2021] [Indexed: 12/04/2022]
Abstract
Recent efforts to measure epigenetic marks across a wide variety of different cell types and tissues provide insights into the cell type-specific regulatory landscape. We use these data to study whether there exists a correlate of epigenetic signals in the DNA sequence of enhancers and explore with computational methods to what degree such sequence patterns can be used to predict cell type-specific regulatory activity. By constructing classifiers that predict in which tissues enhancers are active, we are able to identify sequence features that might be recognized by the cell in order to regulate gene expression. While classification performances vary greatly between tissues, we show examples where our classifiers correctly predict tissue-specific regulation from sequence alone. We also show that many of the informative patterns indeed harbor transcription factor footprints.
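The abstract does not state which sequence representation the classifiers use. The same author's companion paper on k-mer logistic regression (entry 2) suggests k-mer counts as one natural choice. Under that assumption, here is a minimal sketch of turning an enhancer sequence into a k-mer count vector. The function name `kmer_features` is hypothetical. Real pipelines often merge reverse-complement k-mers, which this sketch does not do.

```python
from collections import Counter
from itertools import product


def kmer_features(seq: str, k: int = 3) -> list[int]:
    """Count every DNA k-mer in seq; return counts in lexicographic k-mer order (AA..A to TT..T)."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip windows containing N or other symbols
    )
    # fixed-length vector of 4**k entries; Counter returns 0 for unseen k-mers
    return [counts["".join(kmer)] for kmer in product("ACGT", repeat=k)]


print(kmer_features("ACGTN", k=2))
```

Because every sequence maps to a vector of the same length (4^k), vectors like these could be fed to a per-tissue sparse classifier. Whether the study does exactly this is not stated in the abstract.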
Affiliation(s)
- Philipp Benner
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 73, 14195 Berlin, Germany
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 73, 14195 Berlin, Germany
2
Benner P. Computing Leapfrog Regularization Paths with Applications to Large-Scale K-mer Logistic Regression. J Comput Biol 2021; 28:560-569. [PMID: 33739865 PMCID: PMC8219187 DOI: 10.1089/cmb.2020.0284] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Abstract
High-dimensional statistics deals with statistical inference when the number of parameters or features p exceeds the number of observations n (i.e., p ≫ n). In this case, the parameter space must be constrained either by regularization or by selecting a small subset of m ≤ n features. Feature selection through l1-regularization combines the benefits of both approaches and has proven to yield good results in practice. However, the functional relation between the regularization strength λ and the number of selected features m is difficult to determine. Hence, parameters are typically estimated for all possible regularization strengths λ. These so-called regularization paths can be expensive to compute and most solutions may not even be of interest to the problem at hand. As an alternative, an algorithm is proposed that determines the l1-regularization strength λ iteratively for a fixed m. The algorithm can be used to compute leapfrog regularization paths by subsequently increasing m.
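To illustrate the idea of targeting a fixed number of selected features m rather than a whole regularization path, here is a minimal sketch. It is not the algorithm from this paper. It swaps in squared-loss lasso (instead of logistic regression), fitted by coordinate descent, plus plain bisection on λ. It assumes the number of selected features shrinks as λ grows, which is typical but not guaranteed. All function names are hypothetical.

```python
def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (squared loss, no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature can never be selected
            # correlation of feature j with the partial residual (feature j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def count_selected(beta, tol=1e-10):
    """Number of non-zero coefficients, i.e. selected features."""
    return sum(1 for b in beta if abs(b) > tol)


def lambda_for_m(X, y, m, max_steps=60):
    """Bisection on lam until exactly m features are selected; None if no such lam is found."""
    n, p = len(X), len(X[0])
    # at this lam_max every coefficient is zero
    hi = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
    lo = 0.0
    for _ in range(max_steps):
        mid = (lo + hi) / 2
        k = count_selected(lasso_cd(X, y, mid))
        if k == m:
            return mid
        if k > m:
            lo = mid  # too many features: regularize more
        else:
            hi = mid  # too few features: regularize less
    return None


X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]  # toy design with orthogonal columns
y = [6, 0, -2, -4]  # y = 3*x1 + 2*x2 + 1*x3
path = [lambda_for_m(X, y, m) for m in (1, 2, 3)]  # "leapfrog" over increasing m
print(path)  # [2.25, 1.5, 0.75]
```

Only one λ per requested m is computed, instead of a dense grid. The toy data are chosen so that the answer can be checked by hand.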
Affiliation(s)
- Philipp Benner
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
3
Hammelman J, Krismer K, Banerjee B, Gifford DK, Sherwood RI. Identification of determinants of differential chromatin accessibility through a massively parallel genome-integrated reporter assay. Genome Res 2020; 30:1468-1480. [PMID: 32973041 PMCID: PMC7605270 DOI: 10.1101/gr.263228.120] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/05/2020] [Accepted: 08/26/2020] [Indexed: 12/20/2022]
Abstract
A key mechanism in cellular regulation is the ability of the transcriptional machinery to physically access DNA. Transcription factors interact with DNA to alter the accessibility of chromatin, which enables changes to gene expression during development or disease or as a response to environmental stimuli. However, the regulation of DNA accessibility via the recruitment of transcription factors is difficult to study in the context of the native genome because every genomic site is distinct in multiple ways. Here we introduce the multiplexed integrated accessibility assay (MIAA), an assay that measures chromatin accessibility of synthetic oligonucleotide sequence libraries integrated into a controlled genomic context with low native accessibility. We apply MIAA to measure the effects of sequence motifs on cell type-specific accessibility between mouse embryonic stem cells and embryonic stem cell-derived definitive endoderm cells, screening 7905 distinct DNA sequences. MIAA recapitulates differential accessibility patterns of 100-nt sequences derived from natively differential genomic regions, identifying E-box motifs common to epithelial-mesenchymal transition driver transcription factors in stem cell-specific accessible regions that become repressed in endoderm. We show that a single binding motif for a key regulatory transcription factor is sufficient to open chromatin, and classify sets of stem cell-specific, endoderm-specific, and shared accessibility-modifying transcription factor motifs. We also show that overexpression of two definitive endoderm transcription factors, T and Foxa2, results in changes to accessibility in DNA sequences containing their respective DNA-binding motifs and identify preferential motif arrangements that influence accessibility.
Affiliation(s)
- Jennifer Hammelman
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Konstantin Krismer
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Budhaditya Banerjee
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- David K Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Hubrecht Institute, 3584 CT Utrecht, Netherlands
4
Szczesnik T, Chu L, Ho JWK, Sherwood RI. A High-Throughput Genome-Integrated Assay Reveals Spatial Dependencies Governing Tcf7l2 Binding. Cell Syst 2020; 11:315-327.e5. [PMID: 32910904 PMCID: PMC7530048 DOI: 10.1016/j.cels.2020.08.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2020] [Revised: 06/03/2020] [Accepted: 08/04/2020] [Indexed: 12/17/2022]
Abstract
Predicting where transcription factors bind in the genome from their in vitro DNA-binding affinity is confounded by the large number of possible interactions with nearby transcription factors. To characterize the in vivo binding logic for the Wnt effector Tcf7l2, we developed a high-throughput screening platform in which thousands of synthesized DNA phrases are inserted into a specific genomic locus, followed by measurement of Tcf7l2 binding by DamID. Using this platform at two genomic loci in mouse embryonic stem cells, we show that while the binding of Tcf7l2 closely follows the in vitro motif-binding strength and is influenced by local chromatin accessibility, it is also strongly affected by the surrounding 99 bp of sequence. Through controlled sequence perturbation, we show that Oct4 and Klf4 motifs promote Tcf7l2 binding, particularly in the adjacent ∼50 bp and oscillating with a 10.8-bp phasing relative to these cofactor motifs, which matches the turn of a DNA helix.
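The 10.8-bp phasing is easier to see with a little geometry. A site d bp away from a cofactor motif is rotated by 360·d/10.8 degrees around the helix axis. Spacings near multiples of 10.8 bp therefore put both sites on the same face of the DNA. The toy sketch below computes that rotation and a cosine "same-face" score. It illustrates the helical geometry only and is not the model fitted in the study. The function names are hypothetical.

```python
import math

HELICAL_PERIOD_BP = 10.8  # bp per helical turn, the phasing reported in the abstract


def helical_rotation_deg(spacing_bp: float, period: float = HELICAL_PERIOD_BP) -> float:
    """Rotation around the helix axis (degrees, in [0, 360)) after spacing_bp base pairs."""
    return (360.0 * spacing_bp / period) % 360.0


def same_face_score(spacing_bp: float, period: float = HELICAL_PERIOD_BP) -> float:
    """Toy score: +1 when two sites share a helical face, -1 when they face opposite sides."""
    return math.cos(2.0 * math.pi * spacing_bp / period)


for d in (5.4, 10.8, 21.6):
    print(d, round(same_face_score(d), 3))
```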
Affiliation(s)
- Tomasz Szczesnik
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW 2010, Australia; Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse 26, Basel 4058, Switzerland
- Lendy Chu
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Joshua W K Ho
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW 2010, Australia; School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Hubrecht Institute, 3584 CT Utrecht, the Netherlands.
5
Hujoel MLA, Gazal S, Hormozdiari F, van de Geijn B, Price AL. Disease Heritability Enrichment of Regulatory Elements Is Concentrated in Elements with Ancient Sequence Age and Conserved Function across Species. Am J Hum Genet 2019; 104:611-624. [PMID: 30905396 PMCID: PMC6451699 DOI: 10.1016/j.ajhg.2019.02.008] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 09/17/2018] [Accepted: 02/05/2019] [Indexed: 02/06/2023]
Abstract
Regulatory elements, e.g., enhancers and promoters, have been widely reported to be enriched for disease and complex trait heritability. We investigated how this enrichment varies with the age of the underlying genome sequence, the conservation of regulatory function across species, and the target gene of the regulatory element. We estimated heritability enrichment by applying stratified LD score regression to summary statistics from 41 independent diseases and complex traits (average N = 320K) and meta-analyzing results across traits. Enrichment of human putative enhancers and promoters was larger in elements with older sequence age, assessed via alignment with other species irrespective of conserved functionality: putative enhancer elements with ancient sequence age (older than the split between marsupial and placental mammals) were 8.8× enriched (versus 2.5× for all putative enhancers; p = 3e-14), and promoter elements with ancient sequence age were 13.5× enriched (versus 5.1× for all promoters; p = 5e-16). Enrichment of human putative enhancers and promoters was also larger in elements whose regulatory function was conserved across species, e.g., human putative enhancers that were enhancers in ≥5 of 9 other mammals were 4.6× enriched (p = 5e-12 versus all putative enhancers). Enrichment of human promoters was larger in promoters of loss-of-function intolerant genes: 12.0× enrichment (p = 8e-15 versus all promoters). The mean value of several measures of negative selection within these genomic annotations mirrored all of these findings. Notably, the annotations with these excess heritability enrichments were jointly significant conditional on each other and on our baseline-LD model, which includes a broad set of coding, conserved, regulatory, and LD-related annotations.
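In stratified LD score regression, an annotation's heritability enrichment is commonly defined as a ratio. The numerator is the proportion of SNP-heritability the annotation explains. The denominator is the proportion of SNPs it contains. A minimal sketch of that ratio follows, with made-up input values chosen only for illustration. The regression that estimates the heritability proportion is not shown.

```python
def heritability_enrichment(prop_h2: float, prop_snps: float) -> float:
    """Proportion of SNP-heritability explained divided by proportion of SNPs in the annotation."""
    if not 0.0 < prop_snps <= 1.0:
        raise ValueError("prop_snps must lie in (0, 1]")
    return prop_h2 / prop_snps


# hypothetical annotation: 2% of SNPs explaining 17.6% of heritability
print(round(heritability_enrichment(0.176, 0.02), 2))  # 8.8
```

An enrichment of 1 means the annotation explains exactly its "fair share" of heritability. The 8.8× and 13.5× values in the abstract are far above that baseline.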
Affiliation(s)
- Margaux L A Hujoel
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Division of Biostatistics, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
- Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Farhad Hormozdiari
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Bryce van de Geijn
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alkes L Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
6
van Bömmel A, Love MI, Chung HR, Vingron M. coTRaCTE predicts co-occurring transcription factors within cell-type specific enhancers. PLoS Comput Biol 2018; 14:e1006372. [PMID: 30142147 PMCID: PMC6126874 DOI: 10.1371/journal.pcbi.1006372] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/12/2018] [Revised: 09/06/2018] [Accepted: 07/17/2018] [Indexed: 02/06/2023]
Abstract
Cell-type specific gene expression is regulated by the combinatorial action of transcription factors (TFs). In this study, we predict transcription factor (TF) combinations that cooperatively bind in a cell-type specific manner. We first divide DNase hypersensitive sites into cell-type specifically open vs. ubiquitously open sites in 64 cell types to describe possible cell-type specific enhancers. Based on the pattern contrast between these two groups of sequences we develop "co-occurring TF predictor on Cell-Type specific Enhancers" (coTRaCTE) - a novel statistical method to determine regulatory TF co-occurrences. Contrasting the co-binding of TF pairs between cell-type specific and ubiquitously open chromatin guarantees the high cell-type specificity of the predictions. coTRaCTE predicts more than 2000 co-occurring TF pairs in 64 cell types. The large majority (70%) of these TF pairs is highly cell-type specific and overlaps in TF pair co-occurrence are highly consistent among related cell types. Furthermore, independently validated co-occurring and directly interacting TFs are significantly enriched in our predictions. Focusing on the regulatory network derived from the predicted co-occurring TF pairs in embryonic stem cells (ESCs) we find that it consists of three subnetworks with distinct functions: maintenance of pluripotency governed by OCT4, SOX2 and NANOG, regulation of early development governed by KLF4, STAT3, ZIC3 and ZNF148 and general functions governed by MYC, TCF3 and YY1. In summary, coTRaCTE predicts highly cell-type specific co-occurring TFs which reveal new insights into transcriptional regulatory mechanisms.
Affiliation(s)
- Alena van Bömmel
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Michael I. Love
- Department of Biostatistics, Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Ho-Ryun Chung
- Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Philipps-Universität Marburg, Fachbereich Medizin, Institut für Medizinische Bioinformatik und Biostatistik, Marburg, Germany
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
7
Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res 2018; 28:739-750. [PMID: 29588361 PMCID: PMC5932613 DOI: 10.1101/gr.227819.117] [Citation(s) in RCA: 231] [Impact Index Per Article: 38.5] [Received: 07/17/2017] [Accepted: 03/23/2018] [Indexed: 01/10/2023]
Abstract
Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.
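Sequence-to-activity convolutional networks of this kind typically take DNA as a one-hot matrix and scan it with learned filters. The minimal sketch below uses pure Python rather than a deep-learning framework, and its helper names are assumptions. It shows the encoding and a single valid-mode convolution filter. It illustrates the input representation only, not the architecture of the published model.

```python
BASES = "ACGT"


def one_hot(seq: str) -> list[list[float]]:
    """Encode DNA as an L x 4 matrix (columns A, C, G, T); N and other symbols give all-zero rows."""
    rows = []
    for base in seq.upper():
        row = [0.0] * 4
        if base in BASES:
            row[BASES.index(base)] = 1.0
        rows.append(row)
    return rows


def conv1d(x: list[list[float]], filt: list[list[float]]) -> list[float]:
    """Slide a w x 4 filter along an L x 4 matrix (valid mode, cross-correlation as in CNN layers)."""
    w = len(filt)
    return [
        sum(x[i + a][b] * filt[a][b] for a in range(w) for b in range(4))
        for i in range(len(x) - w + 1)
    ]


# a filter equal to the one-hot "CG" pattern scores 2.0 where CG occurs
print(conv1d(one_hot("ACGT"), one_hot("CG")))  # [0.0, 2.0, 0.0]
```

Encoding ambiguous bases as all-zero rows is one design choice. Some pipelines use 0.25 in every column instead.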
Affiliation(s)
- Yakir A Reshef
- Department of Computer Science, Harvard University, Cambridge, Massachusetts 02138, USA
- Jasper Snoek
- Google Brain, Cambridge, Massachusetts 02142, USA
8
Banovich NE, Li YI, Raj A, Ward MC, Greenside P, Calderon D, Tung PY, Burnett JE, Myrthil M, Thomas SM, Burrows CK, Romero IG, Pavlovic BJ, Kundaje A, Pritchard JK, Gilad Y. Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res 2017; 28:122-131. [PMID: 29208628 PMCID: PMC5749177 DOI: 10.1101/gr.224436.117] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 04/28/2017] [Accepted: 11/20/2017] [Indexed: 12/17/2022]
Abstract
Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.
Affiliation(s)
- Nicholas E Banovich
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Yang I Li
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Anil Raj
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Michelle C Ward
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA; Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
- Peyton Greenside
- Department of Biomedical Informatics, Stanford University, Stanford, California 94305, USA
- Diego Calderon
- Department of Biomedical Informatics, Stanford University, Stanford, California 94305, USA
- Po Yuan Tung
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA; Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
- Jonathan E Burnett
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Marsha Myrthil
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Samantha M Thomas
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Courtney K Burrows
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Irene Gallego Romero
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Bryan J Pavlovic
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Jonathan K Pritchard
- Department of Genetics, Stanford University, Stanford, California 94305, USA; Department of Biology, Stanford University, Stanford, California 94305, USA; Howard Hughes Medical Institute, Stanford University, Stanford, California 94305, USA
- Yoav Gilad
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA; Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
9
Canver MC, Bauer DE, Orkin SH. Functional interrogation of non-coding DNA through CRISPR genome editing. Methods 2017; 121-122:118-129. [PMID: 28288828 PMCID: PMC5483188 DOI: 10.1016/j.ymeth.2017.03.008] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 01/04/2017] [Revised: 02/18/2017] [Accepted: 03/03/2017] [Indexed: 12/26/2022]
Abstract
Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA.
Affiliation(s)
- Daniel E Bauer
- Harvard Medical School, Boston, MA 02115, United States; Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, United States; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02115, United States.
- Stuart H Orkin
- Harvard Medical School, Boston, MA 02115, United States; Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, United States; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02115, United States; Howard Hughes Medical Institute, Boston, MA 02115, United States.
10
Chasman D, Roy S. Inference of cell type specific regulatory networks on mammalian lineages. Curr Opin Syst Biol 2017; 2:130-139. [PMID: 29082337 DOI: 10.1016/j.coisb.2017.04.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/28/2022]
Abstract
Transcriptional regulatory networks are at the core of establishing cell type specific gene expression programs. In mammalian systems, such regulatory networks are determined by multiple levels of regulation, including by transcription factors, chromatin environment, and three-dimensional organization of the genome. Recent efforts to measure diverse regulatory genomic datasets across multiple cell types and tissues offer unprecedented opportunities to examine the context-specificity and dynamics of regulatory networks at a greater resolution and scale than before. In parallel, numerous computational approaches to analyze these data have emerged that serve as important tools for understanding mammalian cell type specific regulation. In this article, we review recent computational approaches to predict the expression and sequence-based regulators of a gene's expression level and examine long-range gene regulation. We highlight promising approaches, insights gained, and open challenges that need to be overcome to build a comprehensive picture of cell type specific transcriptional regulatory networks.
Affiliation(s)
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792
11
Abstract
The vast majority of somatic variants in cancer genomes occur in non-coding regions. However, progress in cancer genomics in the past decade has been mostly focused on coding regions, largely due to the prohibitive cost of whole genome sequencing (WGS). Recent technological advances have decreased sequencing costs leading to the current acquisition of thousands of tumor whole genome sequences which has led to a hunt for non-coding drivers. The most well characterized regulatory drivers are in the TERT promoter and have been identified in many cancer types. Despite the larger fraction of somatic variants occurring in non-coding regions, the number of non-coding drivers identified so far is much less than the number of coding region drivers. Here we discuss reasons that may hinder the detection of non-coding drivers. We also examine the relationship between non-coding genetic variation and epigenetic state in tumor cells and assert the need for additional epigenetic data sets as a prerequisite for understanding the rewiring of regulatory networks in cancer.
12
Abstract
Due to plummeting costs, whole genome sequencing of patients and cancers will soon become routine medical practice; however, we cannot currently predict how non-coding genotype affects cellular gene expression. Gene regulation research has recently been dominated by observational approaches that correlate chromatin state with regulatory function. These approaches are limited to the available genotypes and cannot scratch the surface of possible sequence combinations, and thus there is a need for perturbation-based approaches to better understand how DNA encodes gene regulatory functions. CRISPR/Cas9 genome editing has revolutionized our ability to alter genome sequence, and CRISPR/Cas9-based assays have already begun to contribute to new paradigms of gene regulation. We discuss the variety of arenas in which current and future CRISPR-based technologies will aid in developing predictive understanding of how genome sequence leads to gene regulatory function.
Affiliation(s)
- Budhaditya Banerjee
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
- Richard I Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
- Hubrecht Institute and UMC Utrecht, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands