1
Ordoñez R, Zhang W, Ellis G, Zhu Y, Ashe HJ, Ribeiro-Dos-Santos AM, Brosh R, Huang E, Hogan MS, Boeke JD, Maurano MT. Genomic context sensitizes regulatory elements to genetic disruption. Mol Cell 2024; 84:1842-1854.e7. [PMID: 38759624 PMCID: PMC11104518 DOI: 10.1016/j.molcel.2024.04.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 03/11/2024] [Accepted: 04/18/2024] [Indexed: 05/19/2024]
Abstract
Genomic context critically modulates regulatory function but is difficult to manipulate systematically. The murine insulin-like growth factor 2 (Igf2)/H19 locus is a paradigmatic model of enhancer selectivity, whereby CTCF occupancy at an imprinting control region directs downstream enhancers to activate either H19 or Igf2. We used synthetic regulatory genomics to repeatedly replace the native locus with 157-kb payloads, and we systematically dissected its architecture. Enhancer deletion and ectopic delivery revealed previously uncharacterized long-range regulatory dependencies at the native locus. Exchanging the H19 enhancer cluster with the Sox2 locus control region (LCR) showed that the H19 enhancers relied on their native surroundings while the Sox2 LCR functioned autonomously. Analysis of regulatory DNA actuation across cell types revealed that these enhancer clusters typify broader classes of context sensitivity genome wide. These results show that unexpected dependencies influence even well-studied loci, and our approach permits large-scale manipulation of complete loci to investigate the relationship between regulatory architecture and function.
Affiliation(s)
- Raquel Ordoñez
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Weimin Zhang
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Gwen Ellis
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Yinan Zhu
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Hannah J Ashe
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Ran Brosh
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Emily Huang
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Megan S Hogan
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Jef D Boeke
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA; Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY 11201, USA
- Matthew T Maurano
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA; Department of Pathology, NYU School of Medicine, New York, NY 10016, USA
2
Ordoñez R, Zhang W, Ellis G, Zhu Y, Ashe HJ, Ribeiro-dos-Santos AM, Brosh R, Huang E, Hogan MS, Boeke JD, Maurano MT. Genomic context sensitizes regulatory elements to genetic disruption. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.02.547201. [PMID: 37781588 PMCID: PMC10541140 DOI: 10.1101/2023.07.02.547201] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/03/2023]
Abstract
Enhancer function is frequently investigated piecemeal using truncated reporter assays or single deletion analysis. Thus it remains unclear to what extent enhancer function at native loci relies on surrounding genomic context. Using the Big-IN technology for targeted integration of large DNAs, we analyzed the regulatory architecture of the murine Igf2/H19 locus, a paradigmatic model of enhancer selectivity. We assembled payloads containing a 157-kb functional Igf2/H19 locus and engineered mutations to genetically direct CTCF occupancy at the imprinting control region (ICR) that switches the target gene of the H19 enhancer cluster. Contrasting activity of payloads delivered at the endogenous Igf2/H19 locus or ectopically at Hprt revealed that the Igf2/H19 locus includes additional, previously unknown long-range regulatory elements. Exchanging components of the Igf2/H19 locus with the well-studied Sox2 locus showed that the H19 enhancer cluster functioned poorly out of context, and required its native surroundings to activate Sox2 expression. Conversely, the Sox2 locus control region (LCR) could activate both Igf2 and H19 outside its native context, but its activity was only partially modulated by CTCF occupancy at the ICR. Analysis of regulatory DNA actuation across different cell types revealed that, while the H19 enhancers are tightly coordinated within their native locus, the Sox2 LCR acts more independently. We show that these enhancer clusters typify broader classes of loci genome-wide. Our results show that unexpected dependencies may influence even the most studied functional elements, and our synthetic regulatory genomics approach permits large-scale manipulation of complete loci to investigate the relationship between locus architecture and function.
Affiliation(s)
- Raquel Ordoñez
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - These authors contributed equally
- Weimin Zhang
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - These authors contributed equally
- Gwen Ellis
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - Present address: Department of Biology, University of Vermont, Burlington, VT 05405, USA
- Yinan Zhu
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Hannah J. Ashe
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - Present address: School of Medicine, University of Maryland, Baltimore, MD 21201, USA
- Ran Brosh
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Emily Huang
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - Present address: Highmark Health, Pittsburgh, PA 15222, USA
- Megan S. Hogan
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - Present address: Neochromosome Inc., Long Island City, NY 11101, USA
- Jef D. Boeke
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
  - Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY 11201, USA
- Matthew T. Maurano
  - Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
  - Department of Pathology, NYU School of Medicine, New York, NY 10016, USA
  - Lead contact
3
Pazos Obregón F, Silvera D, Soto P, Yankilevich P, Guerberoff G, Cantera R. Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning. Sci Rep 2022; 12:11655. [PMID: 35803984 PMCID: PMC9270439 DOI: 10.1038/s41598-022-15329-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/23/2021] [Accepted: 06/22/2022] [Indexed: 12/13/2022] Open
Abstract
The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subjected to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models exclusively trained with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best performing method to automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from Biological Process and Cellular Component Ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence to infer gene function.
Affiliation(s)
- Flavio Pazos Obregón
  - Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay
  - Unidad de Bioquímica y Proteómica Analíticas, Instituto Pasteur de Montevideo, Montevideo, Uruguay
- Diego Silvera
  - Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay
- Pablo Soto
  - Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay
- Patricio Yankilevich
  - Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA), CONICET-Partner Institute of the Max Planck Society, Buenos Aires, Argentina
- Gustavo Guerberoff
  - Instituto de Matemática y Estadística "Prof. Ing. Rafael Laguardia", Facultad de Ingeniería, UDELAR, Montevideo, Uruguay
- Rafael Cantera
  - Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay
4
Song M, Zhong H. Efficient weighted univariate clustering maps outstanding dysregulated genomic zones in human cancers. Bioinformatics 2021; 36:5027-5036. [PMID: 32619008 PMCID: PMC7755420 DOI: 10.1093/bioinformatics/btaa613] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/01/2019] [Revised: 05/24/2020] [Accepted: 06/26/2020] [Indexed: 12/14/2022] Open
Abstract
Motivation: Chromosomal patterning of gene expression in cancer can arise from aneuploidy, genome disorganization or abnormal DNA methylation. To map such patterns, we introduce a weighted univariate clustering algorithm to guarantee linear runtime, optimality and reproducibility.
Results: We present the chromosome clustering method, establish its optimality and runtime and evaluate its performance. It uses dynamic programming enhanced with an algorithm to reduce search-space in-place to decrease runtime overhead. Using the method, we delineated outstanding genomic zones in 17 human cancer types. We identified strong continuity in dysregulation polarity (dominance by either up- or downregulated genes in a zone) along chromosomes in all cancer types. Significantly polarized dysregulation zones specific to cancer types are found, offering potential diagnostic biomarkers. Unreported previously, a total of 109 loci with conserved dysregulation polarity across cancer types give insights into pan-cancer mechanisms. Efficient chromosomal clustering opens a window to characterize molecular patterns in cancer genome and beyond.
Availability and implementation: Weighted univariate clustering algorithms are implemented within the R package ‘Ckmeans.1d.dp’ (4.0.0 or above), freely available at https://cran.r-project.org/package=Ckmeans.1d.dp.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mingzhou Song
  - Department of Computer Science; Molecular Biology Graduate Program, New Mexico State University, Las Cruces, NM 88003, USA
5
Quintero-Cadena P, Sternberg PW. Enhancer Sharing Promotes Neighborhoods of Transcriptional Regulation Across Eukaryotes. G3 (BETHESDA, MD.) 2016; 6:4167-4174. [PMID: 27799341 PMCID: PMC5144984 DOI: 10.1534/g3.116.036228] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 06/28/2016] [Accepted: 10/15/2016] [Indexed: 01/08/2023]
Abstract
Enhancers physically interact with transcriptional promoters, looping over distances that can span multiple regulatory elements. Given that enhancer-promoter (EP) interactions generally occur via common protein complexes, it is unclear whether EP pairing is predominantly deterministic or proximity guided. Here, we present cross-organismic evidence suggesting that most EP pairs are compatible, largely determined by physical proximity rather than specific interactions. By reanalyzing transcriptome datasets, we find that the transcription of gene neighbors is correlated over distances that scale with genome size. We experimentally show that nonspecific EP interactions can explain such correlation, and that EP distance acts as a scaling factor for the transcriptional influence of an enhancer. We propose that enhancer sharing is commonplace among eukaryotes, and that EP distance is an important layer of information in gene regulation.
Affiliation(s)
- Porfirio Quintero-Cadena
  - Division of Biology and Biological Engineering, California Institute of Technology, Howard Hughes Medical Institute, Pasadena, California 91125
- Paul W Sternberg
  - Division of Biology and Biological Engineering, California Institute of Technology, Howard Hughes Medical Institute, Pasadena, California 91125
6
Mostovoy Y, Thiemicke A, Hsu TY, Brem RB. The Role of Transcription Factors at Antisense-Expressing Gene Pairs in Yeast. Genome Biol Evol 2016; 8:1748-61. [PMID: 27190003 PMCID: PMC4943177 DOI: 10.1093/gbe/evw104] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022] Open
Abstract
Genes encoded close to one another on the chromosome are often coexpressed, by a mechanism and regulatory logic that remain poorly understood. We surveyed the yeast genome for tandem gene pairs oriented tail-to-head at which expression antisense to the upstream gene was conserved across species. The intergenic region at most such tandem pairs is a bidirectional promoter, shared by the downstream gene mRNA and the upstream antisense transcript. Genomic analyses of these intergenic loci revealed distinctive patterns of transcription factor regulation. Mutation of a given transcription factor verified its role as a regulator in trans of tandem gene pair loci, including the proximally initiating upstream antisense transcript and downstream mRNA and the distally initiating upstream mRNA. To investigate cis-regulatory activity at such a locus, we focused on the stress-induced NAD(P)H dehydratase YKL151C and its downstream neighbor, the metabolic enzyme GPM1. Previous work has implicated the region between these genes in regulation of GPM1 expression; our mutation experiments established its function in rich medium as a repressor in cis of the distally initiating YKL151C sense RNA, and an activator of the proximally initiating YKL151C antisense RNA. Wild-type expression of all three transcripts required the transcription factor Gcr2. Thus, at this locus, the intergenic region serves as a focal point of regulatory input, driving antisense expression and mediating the coordinated regulation of YKL151C and GPM1. Together, our findings implicate transcription factors in the joint control of neighboring genes specialized to opposing conditions and the antisense transcripts expressed between them.
Affiliation(s)
- Yulia Mostovoy
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
  - Present address: Cardiovascular Research Institute, University of California, San Francisco, CA
- Alexander Thiemicke
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
  - Program in Molecular Medicine, Friedrich-Schiller-Universität, Jena, Germany
  - Present address: Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN
- Tiffany Y Hsu
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
  - Present address: Graduate Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA
- Rachel B Brem
  - Department of Molecular and Cell Biology, University of California, Berkeley, California
  - Present address: Buck Institute for Research on Aging, Novato, CA