1. Wang SK, Li J, Nair S, Korasaju R, Chen Y, Zhang Y, Kundaje A, Liu Y, Wang N, Chang HY. Single-cell multiome and enhancer connectome of human retinal pigment epithelium and choroid nominate pathogenic variants in age-related macular degeneration. bioRxiv 2025:2025.03.21.644670. [PMID: 40196652 PMCID: PMC11974679 DOI: 10.1101/2025.03.21.644670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2025]
Abstract
Age-related macular degeneration (AMD) is a leading cause of vision loss worldwide. Genome-wide association studies (GWAS) of AMD have identified dozens of risk loci that may house disease targets. However, variants at these loci are largely noncoding, making it difficult to assess their function and whether they are causal. Here, we present a single-cell gene expression and chromatin accessibility atlas of human retinal pigment epithelium (RPE) and choroid to systematically analyze both coding and noncoding variants implicated in AMD. We employ HiChIP and Activity-by-Contact modeling to map enhancers in these tissues and predict cell and gene targets of risk variants. We further perform allele-specific self-transcribing active regulatory region sequencing (STARR-seq) to functionally test variant activity in RPE cells, including in the context of complement activation. Our work nominates new pathogenic variants and mechanisms in AMD and offers a rich and accessible resource for studying diseases of the RPE and choroid.
2. Lee D, Gunamalai L, Kannan J, Vickery K, Yaacov O, Onuchic-Whitford AC, Chakravarti A, Kapoor A. Massively parallel reporter assays identify functional enhancer variants at QT interval GWAS loci. bioRxiv 2025:2025.03.11.642686. [PMID: 40161821 PMCID: PMC11952420 DOI: 10.1101/2025.03.11.642686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Genome-wide association studies (GWAS) have identified >30 loci with multiple common noncoding variants explaining interindividual electrocardiographic QT interval (QTi) variation. Of the many types of noncoding functional elements, here we sought to identify transcriptional enhancers with sequence variation and their cognate transcription factors (TFs) that alter the expression of proximal cardiac genes to affect QTi variation. We used massively parallel reporter assays (MPRA) in mouse cardiomyocyte HL-1 cells to screen for functional enhancer variants among 1,018 QTi-associated GWAS variants that overlap candidate cardiac enhancers across 31 loci. We identified 445 GWAS variant-containing enhancers of which 79 showed significant allelic difference in enhancer activity across 21 GWAS loci, with multiple enhancer variants per locus. Of these, we predicted differential binding by cardiac TFs, including AP-1, ATF-1, GATA2, MEF2, NKX2.5, SRF and TBX5 which are known to play key roles in development and homeostasis, at 49 enhancer variants. Finally, we used expression quantitative trait locus mapping and predicted promoter-enhancer contacts to identify 14 candidate target genes through analyses of 36 enhancer variants at 16 loci. This study provides strong evidence for 14 cardiac genes, 10 of them novel, impacting on QTi variation, beyond explaining observed genetic associations.
Affiliation(s)
- Dongwon Lee
  - Department of Pediatrics, Division of Nephrology, Boston Children’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
- Lavanya Gunamalai
  - Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Jeerthi Kannan
  - Department of Pediatrics, Division of Nephrology, Boston Children’s Hospital, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
- Kyla Vickery
  - Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Or Yaacov
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA
- Ana C. Onuchic-Whitford
  - Department of Pediatrics, Division of Nephrology, Boston Children’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
  - Renal division, Brigham and Women’s Hospital, Boston, MA, USA
- Aravinda Chakravarti
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA
- Ashish Kapoor
  - Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
3. Ahituv N. 2024 ASHG Scientific Achievement Award. Am J Hum Genet 2025; 112:473-477. [PMID: 40054438 PMCID: PMC11993864 DOI: 10.1016/j.ajhg.2025.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2025] [Accepted: 01/03/2025] [Indexed: 04/16/2025]
Abstract
This article is based on the address given by the author at the 2024 meeting of The American Society of Human Genetics (ASHG) in Denver, CO. The video of the original address can be found at the ASHG website.
Affiliation(s)
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
4. Agarwal V, Inoue F, Schubach M, Penzar D, Martin BK, Dash PM, Keukeleire P, Zhang Z, Sohota A, Zhao J, Georgakopoulos-Soares I, Noble WS, Yardımcı GG, Kulakovskiy IV, Kircher M, Shendure J, Ahituv N. Massively parallel characterization of transcriptional regulatory elements. Nature 2025; 639:411-420. [PMID: 39814889 PMCID: PMC11903340 DOI: 10.1038/s41586-024-08430-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Accepted: 11/20/2024] [Indexed: 01/18/2025]
Abstract
The human genome contains millions of candidate cis-regulatory elements (cCREs) with cell-type-specific activities that shape both health and many disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these cCREs. Here we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of more than 680,000 sequences, representing an extensive set of annotated cCREs among three cell types (HepG2, K562 and WTC11), and found that 41.7% of these sequences were active. By testing sequences in both orientations, we find promoters to have strand-orientation biases and their 200-nucleotide cores to function as non-cell-type-specific 'on switches' that provide similar expression levels to their associated gene. By contrast, enhancers have weaker orientation biases, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict cCRE function and variant effects with high accuracy, delineate regulatory motifs and model their combinatorial effects. Testing a lentiMPRA library encompassing 60,000 cCREs in all three cell types further identified factors that determine cell-type specificity. Collectively, our work provides an extensive catalogue of functional CREs in three widely used cell lines and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
Affiliation(s)
- Vikram Agarwal
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - mRNA Center of Excellence, Sanofi, Waltham, MA, USA
- Fumitaka Inoue
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Max Schubach
  - Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Dmitry Penzar
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
  - Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow, Russia
- Beth K Martin
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Pyaree Mohan Dash
  - Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Pia Keukeleire
  - Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Zicong Zhang
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Ajuni Sohota
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Jingjing Zhao
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- William S Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Galip Gürkan Yardımcı
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
  - Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, USA
- Ivan V Kulakovskiy
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
  - Life Improvement by Future Technologies (LIFT) Center, Moscow, Russia
- Martin Kircher
  - Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
  - Seattle Hub for Synthetic Biology, Seattle, Washington, USA
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
5. Kumari P, Friedman RZ, Pi L, Curtis SW, Paraiso K, Visel A, Rhea L, Dunnwald M, Patni AP, Mar D, Bomsztyk K, Mathieu J, Ruohola-Baker H, Leslie EJ, White MA, Cohen BA, Cornell RA. Identification of functional non-coding variants associated with orofacial cleft. bioRxiv 2025:2024.06.01.596914. [PMID: 40027800 PMCID: PMC11870446 DOI: 10.1101/2024.06.01.596914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Oral facial cleft (OFC) is a multifactorial disorder that can present as a cleft lip with or without cleft palate (CL/P) or a cleft palate only. Genome wide association studies (GWAS) of isolated OFC have identified common single nucleotide polymorphisms (SNPs) at the 1q32/IRF6 locus and many other loci where, like IRF6, the presumed OFC-relevant gene is expressed in embryonic oral epithelium. To identify the functional subset of SNPs at eight such loci we conducted a massively parallel reporter assay in a cell line derived from fetal oral epithelium, revealing SNPs with allele-specific effects on enhancer activity. We filtered these against chromatin-mark evidence of enhancers in relevant cell types or tissues, and then tested a subset in traditional reporter assays, yielding six candidates for functional SNPs in five loci (1q32/IRF6, 3q28/TP63, 6p24.3/TFAP2A, 20q12/MAFB, and 9q22.33/FOXE1). We further tested two SNPs near IRF6 and one near FOXE1 by engineering the genome of induced pluripotent stem cells, differentiating the cells into embryonic oral epithelium, and measuring expression of IRF6 or FOXE1 and binding of transcription factors; the results strongly supported their candidacy. Conditional analyses of a meta-analysis of GWAS suggest that the two functional SNPs near IRF6 account for the majority of risk for CL/P associated with variation at this locus. This study connects genetic variation associated with orofacial cleft to mechanisms of pathogenesis.
6. Degner KN, Bell JL, Jones SD, Won H. Just a SNP away: The future of in vivo massively parallel reporter assay. Cell Insight 2025; 4:100214. [PMID: 39618480 PMCID: PMC11607654 DOI: 10.1016/j.cellin.2024.100214] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/21/2024] [Revised: 10/03/2024] [Accepted: 10/06/2024] [Indexed: 04/03/2025]
Abstract
The human genome is largely noncoding, yet the field is still grasping to understand how noncoding variants impact transcription and contribute to disease etiology. The massively parallel reporter assay (MPRA) has been employed to characterize the function of noncoding variants at unprecedented scales, but its application has been largely limited by the in vitro context. The field will benefit from establishing a systemic platform to study noncoding variant function across multiple tissue types under physiologically relevant conditions. However, to date, MPRA has been applied to only a handful of in vivo conditions. Given the complexity of the central nervous system and its widespread interactions with all other organ systems, our understanding of neuropsychiatric disorder-associated noncoding variants would be greatly advanced by studying their functional impact in the intact brain. In this review, we discuss the importance, technical considerations, and future applications of implementing MPRA in the in vivo space with the focus on neuropsychiatric disorders.
Affiliation(s)
- Katherine N. Degner
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jessica L. Bell
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Sean D. Jones
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Hyejung Won
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
7. Chignon A, Lettre G. Using omics data and genome editing methods to decipher GWAS loci associated with coronary artery disease. Atherosclerosis 2025; 401:118621. [PMID: 39909615 DOI: 10.1016/j.atherosclerosis.2024.118621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 09/18/2024] [Accepted: 10/03/2024] [Indexed: 02/07/2025]
Abstract
Coronary artery disease (CAD) is due to atherosclerosis, a pathophysiological process that involves several cell-types and results in the accumulation of lipid-rich plaque that disrupts the normal blood flow through the coronary arteries to the heart. Genome-wide association studies have identified thousands of genetic variants robustly associated with CAD or its traditional risk factors (e.g. blood pressure, blood lipids, type 2 diabetes, smoking). However, gaining biological insights from these genetic discoveries remains challenging because of linkage disequilibrium and the difficulty of interpreting the functions of non-coding regulatory elements in the human genome. In this review, we present different statistical methods (e.g. Mendelian randomization) and molecular datasets (e.g. expression or protein quantitative trait loci) that have helped connect CAD-associated variants with genes, biological pathways, and cell-types or tissues. We emphasize that these various strategies make predictions, which need to be validated in orthologous systems. We discuss specific examples where the integration of omics data with GWAS results has prioritized causal CAD variants and genes. Finally, we review how targeted and genome-wide genome editing experiments using the CRISPR/Cas9 toolbox have been used to characterize new CAD genes in human cells. Researchers now have the statistical and bioinformatic methods, the molecular datasets, and the experimental tools to dissect comprehensively the loci that contribute to CAD risk in humans.
Affiliation(s)
- Arnaud Chignon
  - Montreal Heart Institute, Montreal, Quebec, Canada
  - Faculté de Médecine, Université de Montréal, Montreal, Quebec, Canada
- Guillaume Lettre
  - Montreal Heart Institute, Montreal, Quebec, Canada
  - Faculté de Médecine, Université de Montréal, Montreal, Quebec, Canada
8. Nishino K, Kitzman JO, Parker SCJ, Tovar A. Functional dissection of metabolic trait-associated gene regulation in steady state and stimulated human skeletal muscle cells. bioRxiv 2025:2024.11.28.625886. [PMID: 39677760 PMCID: PMC11642805 DOI: 10.1101/2024.11.28.625886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
Type 2 diabetes (T2D) is a common metabolic disorder characterized by dysregulation of glucose metabolism. Genome-wide association studies have defined hundreds of signals associated with T2D and related metabolic traits, predominantly in noncoding regions. While pancreatic islets have been a focal point given their central role in insulin production and glucose homeostasis, other metabolic tissues, including liver, adipose, and skeletal muscle, also contribute to T2D pathogenesis and risk. Here, we examined context-specific genetic regulation under basal and stimulated states. Using LHCN-M2 human skeletal muscle cells, we generated transcriptomic profiles and characterized regulatory activity of 327 metabolic trait-associated variants via a massively parallel reporter assay (MPRA). To identify condition-specific effects, we compared four different conditions: (1) undifferentiated, or (2) differentiated with basal media, (3) media supplemented with the AMP analog AICAR (to simulate exercise) or (4) media containing sodium palmitate (to induce insulin resistance). RNA-seq revealed these treatments extensively perturbed transcriptional regulation, with 498-3,686 genes showing significant differential expression between pairs of conditions. Among differentially expressed genes, we observed enrichment of relevant biological pathways including muscle differentiation (undifferentiated vs. differentiated), oxidoreductase activity (differentiated vs. AICAR), and glycogen binding (differentiated vs. palmitate). The results of our MPRA found broadly different levels of activity between all conditions. Our MPRA screen revealed a shared set of 7 variants with significant allelic activity across all conditions, along with a proportional number of variants showing condition-specific allelic bias and the total number of active oligos per condition. 
We found that a lead variant for serum triglyceride levels, rs490972, overlaps SP transcription factor motifs and has differential regulatory activity between conditions. Comparison of MPRA activity with paired gene expression data allowed us to predict that regulatory activity at this locus is mediated by SP1 transcription factor binding. While several of the MPRA variants have been previously characterized in other metabolic tissues, none have been studied in these stimulated states. Together, this work uncovers context-dependent transcriptomic and regulatory dynamics of T2D- and metabolic trait-associated variants in skeletal muscle cells, offering new insights into their functional roles in metabolic processes.
Affiliation(s)
- Kirsten Nishino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Jacob O Kitzman
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Stephen C J Parker
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
- Adelaide Tovar
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
9. Miller D, Dziulko A, Levy S. Pooled PPIseq: Screening the SARS-CoV-2 and human interface with a scalable multiplexed protein-protein interaction assay platform. PLoS One 2025; 20:e0299440. [PMID: 39823405 PMCID: PMC11741623 DOI: 10.1371/journal.pone.0299440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Accepted: 08/25/2024] [Indexed: 01/19/2025]
Abstract
Protein-Protein Interactions (PPIs) are a key interface between virus and host, and these interactions are important to both viral reprogramming of the host and to host restriction of viral infection. In particular, viral-host PPI networks can be used to further our understanding of the molecular mechanisms of tissue specificity, host range, and virulence. At higher scales, viral-host PPI screening could also be used to screen for small-molecule antivirals that interfere with essential viral-host interactions, or to explore how the PPI networks between interacting viral and host genomes co-evolve. Current high-throughput PPI assays have screened entire viral-host PPI networks. However, these studies are time consuming, often require specialized equipment, and are difficult to further scale. Here, we develop methods that make larger-scale viral-host PPI screening more accessible. This approach combines the mDHFR split-tag reporter with the iSeq2 interaction-barcoding system to permit massively-multiplexed PPI quantification by simple pooled engineering of barcoded constructs, integration of these constructs into budding yeast, and fitness measurements by pooled cell competitions and barcode-sequencing. We applied this method to screen for PPIs between SARS-CoV-2 proteins and human proteins, screening in triplicate >180,000 ORF-ORF combinations represented by >1,000,000 barcoded lineages. Our results complement previous screens by identifying 74 putative PPIs, including interactions between ORF7A with the taste receptors TAS2R41 and TAS2R7, and between NSP4 with the transmembrane KDELR2 and KDELR3. We show that this PPI screening method is highly scalable, enabling larger studies aimed at generating a broad understanding of how viral effector proteins converge on cellular targets to effect replication.
Affiliation(s)
- Darach Miller
  - SLAC National Accelerator Laboratory, Stanford University, Stanford, California, United States of America
- Adam Dziulko
  - SLAC National Accelerator Laboratory, Stanford University, Stanford, California, United States of America
  - Department of Molecular, Cellular, and Developmental Biology, University of Colorado Boulder, Boulder, Colorado, United States of America
  - BioFrontiers Institute, University of Colorado Boulder, Boulder, Colorado, United States of America
- Sasha Levy
  - SLAC National Accelerator Laboratory, Stanford University, Stanford, California, United States of America
10. Chang TY, Waxman DJ. HDI-STARR-seq: Condition-specific enhancer discovery in mouse liver in vivo. BMC Genomics 2024; 25:1240. [PMID: 39716078 DOI: 10.1186/s12864-024-11162-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Accepted: 12/16/2024] [Indexed: 12/25/2024]
Abstract
BACKGROUND: STARR-seq and other massively-parallel reporter assays are widely used to discover functional enhancers in transfected cell models, which can be confounded by plasmid vector-induced type-I interferon immune responses and lack the multicellular environment and endogenous chromatin state of complex mammalian tissues. RESULTS: We describe HDI-STARR-seq, which combines STARR-seq plasmid library delivery to the liver, by hydrodynamic tail vein injection (HDI), with reporter RNA transcriptional initiation driven by a minimal Albumin promoter, which we show is essential for mouse liver STARR-seq enhancer activity assayed 7 days after HDI. Importantly, little or no vector-induced innate type-I interferon responses were observed. Comparisons of HDI-STARR-seq activity between male and female mouse livers and in livers from males treated with an activating ligand of the transcription factor (TF) CAR (Nr1i3) identified many condition-dependent enhancers linked to condition-specific gene expression. Further, thousands of active liver enhancers were identified using a high complexity STARR-seq library comprised of ~50,000 genomic regions released by DNase-I digestion of mouse liver nuclei. When compared to stringently inactive library sequences, the active enhancer sequences identified were highly enriched for liver open chromatin regions with activating histone marks (H3K27ac, H3K4me1, H3K4me3), were significantly closer to gene transcriptional start sites, and were significantly depleted of repressive (H3K27me3, H3K9me3) and transcribed region histone marks (H3K36me3). CONCLUSION: HDI-STARR-seq offers substantial improvements over current methodologies for large scale, functional profiling of enhancers, including condition-dependent enhancers, in liver tissue in vivo, and can be adapted to characterize enhancer activities in a variety of species and tissues by selecting suitable tissue- and species-specific promoter sequences.
Affiliation(s)
- Ting-Ya Chang
  - Departments of Biology and Biomedical Engineering, and Bioinformatics Program, Boston University, 5 Cummington Mall, Boston, MA, 02215, USA
- David J Waxman
  - Departments of Biology and Biomedical Engineering, and Bioinformatics Program, Boston University, 5 Cummington Mall, Boston, MA, 02215, USA
11. Woo HJ, Kim J, Kim SM, Kim D, Moon JY, Park D, Lee JS. Context-dependent genomic locus effects on antibody production in recombinant Chinese hamster ovary cells generated through random integration. Comput Struct Biotechnol J 2024; 23:1654-1665. [PMID: 38680870 PMCID: PMC11046053 DOI: 10.1016/j.csbj.2024.04.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 03/30/2024] [Accepted: 04/07/2024] [Indexed: 05/01/2024]
Abstract
High-yield production of therapeutic protein using Chinese hamster ovary (CHO) cells requires stable cell line development (CLD). CLD typically uses random integration of transgenes; however, this results in clonal variation and subsequent laborious clone screening. Therefore, site-specific integration of a protein expression cassette into a desired chromosomal locus showing high transcriptional activity and stability, referred to as a hot spot, is emerging. Although positional effects are important for therapeutic protein expression, the sequence-specific mechanisms by which hotspots work are not well understood. In this study, we performed whole-genome sequencing (WGS) to locate randomly inserted vectors in the genome of recombinant CHO cells expressing high levels of monoclonal antibodies (mAbs) and experimentally validated these locations and vector compositions. The integration site was characterized by active histone marks and potential enhancer activities, and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) mediated indel mutations in the region upstream of the integration site led to a significant reduction in specific antibody productivity by up to 30%. Notably, the integration site and its core region did not function equivalently outside the native genomic context, showing a minimal effect on the increase in exogenous protein expression in the host cell line. We also observed a superior production capacity of the mAb expressing cell line compared to that of the host cell line. Collectively, this study demonstrates that developing recombinant CHO cell lines to produce therapeutic proteins at high levels requires a balance of factors including transgene configuration, genomic locus landscape, and host cell properties.
Affiliation(s)
- Hyun Jee Woo
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
- Jaehoon Kim
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
  - Molecular Science and Technology Research Center, Ajou University, Suwon 16499, Republic of Korea
- Seul Mi Kim
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
- Dongwoo Kim
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
- Jae Yun Moon
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
- Daechan Park
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
  - Department of Biological Sciences, Ajou University, Suwon 16499, Republic of Korea
- Jae Seong Lee
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
  - Department of Applied Chemistry and Biological Engineering, Ajou University, Suwon 16499, Republic of Korea
12. La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
  - Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
|
13
|
Ren X, Zheng L, Maliskova L, Tam TW, Sun Y, Liu H, Lee J, Takagi MA, Li B, Ren B, Wang W, Shen Y. CRISPR tiling deletion screens reveal functional enhancers of neuropsychiatric risk genes and allelic compensation effects (ACE) on transcription. bioRxiv 2024:2024.10.08.616922. [PMID: 39416108 PMCID: PMC11483005 DOI: 10.1101/2024.10.08.616922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]
Abstract
Precise transcriptional regulation is critical for cellular function and development, yet the mechanism of this process remains poorly understood for many genes. To gain a deeper understanding of the regulation of neuropsychiatric disease risk genes, we identified a total of 39 functional enhancers for four dosage-sensitive genes, APP, FMR1, MECP2, and SIN3A, using CRISPR tiling deletion screening in human induced pluripotent stem cell (iPSC)-induced excitatory neurons. We found that enhancer annotation provides potential pathological insights into disease-associated copy number variants. More importantly, we discovered that allelic enhancer deletions at SIN3A could be compensated by increased transcriptional activities from the other intact allele. Such allelic compensation effects (ACE) on transcription are stably maintained during differentiation and, once established, cannot be reversed by ectopic SIN3A expression. Further, ACE at SIN3A occurs through dosage sensing by the promoter. Together, our findings unravel a regulatory compensation mechanism that ensures stable and precise transcriptional output for SIN3A, and potentially other dosage-sensitive genes.
Affiliation(s)
- Xingjie Ren
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Lina Zheng
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Lenka Maliskova
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Tsz Wai Tam
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Yifan Sun
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Hongjiang Liu
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Jerry Lee
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Maya Asami Takagi
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Bin Li
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Bing Ren
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
  - Center for Epigenomics, University of California, San Diego, La Jolla, CA, USA
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Wei Wang
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
- Yin Shen
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
|
14
|
Rong S, Root E, Reilly SK. Massively parallel approaches for characterizing noncoding functional variation in human evolution. Curr Opin Genet Dev 2024; 88:102256. [PMID: 39217658 PMCID: PMC11648527 DOI: 10.1016/j.gde.2024.102256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 08/02/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024]
Abstract
The genetic differences underlying unique phenotypes in humans compared to our closest primate relatives have long remained a mystery. Similarly, the genetic basis of adaptations between human groups during our expansion across the globe is poorly characterized. Uncovering the downstream phenotypic consequences of these genetic variants has been difficult, as a substantial portion lies in noncoding regions, such as cis-regulatory elements (CREs). Here, we review recent high-throughput approaches to measure the functions of CREs and the impact of variation within them. CRISPR screens can directly perturb CREs in the genome to understand downstream impacts on gene expression and phenotypes, while massively parallel reporter assays can decipher the regulatory impact of sequence variants. Machine learning has begun to be able to predict regulatory function from sequence alone, further scaling our ability to characterize genome function. Applying these tools across diverse phenotypes, model systems, and ancestries is beginning to revolutionize our understanding of noncoding variation underlying human evolution.
Affiliation(s)
- Stephen Rong
  - Department of Genetics, Yale University, New Haven, CT, USA
- Elise Root
  - Department of Genetics, Yale University, New Haven, CT, USA
- Steven K Reilly
  - Department of Genetics, Yale University, New Haven, CT, USA; Wu Tsai Institute, Yale University, New Haven, CT, USA
|
15
|
Bond ML, Quiroga-Barber IY, D’Costa S, Wu Y, Bell JL, McAfee JC, Kramer NE, Lee S, Patrucco M, Phanstiel DH, Won H. Deciphering the functional impact of Alzheimer's Disease-associated variants in resting and proinflammatory immune cells. medRxiv 2024:2024.09.13.24313654. [PMID: 39371155 PMCID: PMC11451667 DOI: 10.1101/2024.09.13.24313654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Genome-wide association studies have identified loci associated with Alzheimer's Disease (AD), but identifying the exact causal variants and genes at each locus is challenging due to linkage disequilibrium and their largely non-coding nature. To address this, we performed a massively parallel reporter assay of 3,576 AD-associated variants in THP-1 macrophages in both resting and proinflammatory states and identified 47 expression-modulating variants (emVars). To understand the endogenous chromatin context of emVars, we built an activity-by-contact model using epigenomic maps of macrophage inflammation and inferred condition-specific enhancer-promoter pairs. Intersection of emVars with enhancer-promoter pairs and microglia expression quantitative trait loci allowed us to connect 39 emVars to 76 putative AD risk genes enriched for AD-associated molecular signatures. Overall, systematic characterization of AD-associated variants enhances our understanding of the regulatory mechanisms underlying AD pathogenesis.
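The activity-by-contact idea mentioned in this abstract can be illustrated with a minimal sketch. The abstract gives no formula, so the code assumes the commonly used formulation in which an element's score for a gene is its activity multiplied by its contact frequency with the gene's promoter, divided by the same product summed over all candidate elements near that gene. The `Element` class, its field names, and the `abc_scores` helper are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A candidate regulatory element for one gene (hypothetical structure)."""
    name: str
    activity: float  # assumed: an enhancer activity signal, e.g. from accessibility and H3K27ac
    contact: float   # assumed: contact frequency between the element and the gene's promoter


def abc_scores(elements):
    """Return each element's share of the total activity-times-contact for a gene."""
    weights = {e.name: e.activity * e.contact for e in elements}
    total = sum(weights.values())
    if total == 0:
        # No element has any signal, so every score is zero.
        return {name: 0.0 for name in weights}
    return {name: w / total for name, w in weights.items()}
```

Because the scores for one gene are normalized, they sum to 1 across the candidate elements, so a strong but distant element and a weak but nearby element can end up with similar scores.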
Affiliation(s)
- Marielle L. Bond
  - Curriculum in Genetics & Molecular Biology, University of North Carolina at Chapel Hill
  - Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
  - Department of Genetics, University of North Carolina at Chapel Hill
  - Neuroscience Center, University of North Carolina at Chapel Hill
- Susan D’Costa
  - Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Yijia Wu
  - Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
  - Department of Genetics, University of North Carolina at Chapel Hill
  - Neuroscience Center, University of North Carolina at Chapel Hill
- Jessica L. Bell
  - Department of Genetics, University of North Carolina at Chapel Hill
  - Neuroscience Center, University of North Carolina at Chapel Hill
- Jessica C. McAfee
  - Curriculum in Genetics & Molecular Biology, University of North Carolina at Chapel Hill
  - Department of Genetics, University of North Carolina at Chapel Hill
  - Neuroscience Center, University of North Carolina at Chapel Hill
- Nicole E. Kramer
  - Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
- Sool Lee
  - Department of Genetics, University of North Carolina at Chapel Hill
  - Neuroscience Center, University of North Carolina at Chapel Hill
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
- Mary Patrucco
  - Department of Genetics, University of North Carolina at Chapel Hill
  - Neuroscience Center, University of North Carolina at Chapel Hill
- Douglas H. Phanstiel
  - Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
  - Department of Cell Biology & Physiology, University of North Carolina at Chapel Hill
- Hyejung Won
  - Department of Genetics, University of North Carolina at Chapel Hill
  - Neuroscience Center, University of North Carolina at Chapel Hill
|
16
|
Deciphering the impact of genomic variation on function. Nature 2024; 633:47-57. [PMID: 39232149 PMCID: PMC11973978 DOI: 10.1038/s41586-024-07510-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 05/02/2024] [Indexed: 09/06/2024]
Abstract
Our genomes influence nearly every aspect of human biology, from molecular and cellular functions to phenotypes in health and disease. Studying the differences in DNA sequence between individuals (genomic variation) could reveal previously unknown mechanisms of human biology, uncover the basis of genetic predispositions to diseases, and guide the development of new diagnostic tools and therapeutic agents. Yet, understanding how genomic variation alters genome function to influence phenotype has proved challenging. To unlock these insights, we need a systematic and comprehensive catalogue of genome function and the molecular and cellular effects of genomic variants. Towards this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations and predictive modelling to investigate the relationships among genomic variation, genome function and phenotypes. IGVF will create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how such effects connect through gene-regulatory and protein-interaction networks. These experimental data, computational predictions and accompanying standards and pipelines will be integrated into an open resource that will catalyse community efforts to explore how our genomes influence biology and disease across populations.
|
17
|
Jiang K, Liu T, Kales S, Tewhey R, Kim D, Park Y, Jarvis JN. A systematic strategy for identifying causal single nucleotide polymorphisms and their target genes on Juvenile arthritis risk haplotypes. BMC Med Genomics 2024; 17:185. [PMID: 38997781 PMCID: PMC11241977 DOI: 10.1186/s12920-024-01954-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Accepted: 06/27/2024] [Indexed: 07/14/2024]
Abstract
BACKGROUND: Although genome-wide association studies (GWAS) have identified multiple regions conferring genetic risk for juvenile idiopathic arthritis (JIA), we are still faced with the task of identifying the single nucleotide polymorphisms (SNPs) on the disease haplotypes that exert the biological effects that confer risk. Until we identify the risk-driving variants, identifying the genes influenced by these variants, and therefore translating genetic information to improved clinical care, will remain an insurmountable task. We used a function-based approach for identifying causal variant candidates and the target genes on JIA risk haplotypes.
METHODS: We used a massively parallel reporter assay (MPRA) in myeloid K562 cells to query the effects of 5,226 SNPs in non-coding regions on JIA risk haplotypes for their ability to alter gene expression when compared to the common allele. The assay relies on 180 bp oligonucleotide reporters ("oligos") in which the allele of interest is flanked by its cognate genomic sequence. Barcodes were added randomly by PCR to each oligo to achieve >20 barcodes per oligo to provide a quantitative read-out of gene expression for each allele. Assays were performed in both unstimulated K562 cells and cells stimulated overnight with interferon gamma (IFNg). As proof of concept, we then used CRISPRi to demonstrate the feasibility of identifying the genes regulated by enhancers harboring expression-altering SNPs.
RESULTS: We identified 553 expression-altering SNPs in unstimulated K562 cells and an additional 490 in cells stimulated with IFNg. We further filtered the SNPs to identify those plausibly situated within functional chromatin, using open chromatin and H3K27ac ChIPseq peaks in unstimulated cells and open chromatin plus H3K4me1 in stimulated cells. These procedures yielded 42 unique SNPs (total = 84) for each set. Using CRISPRi, we demonstrated that enhancers harboring MPRA-screened variants in the TRAF1 and LNPEP/ERAP2 loci regulated multiple genes, suggesting complex influences of disease-driving variants.
CONCLUSION: Using MPRA and CRISPRi, JIA risk haplotypes can be queried to identify plausible candidates for disease-driving variants. Once these candidate variants are identified, target genes can be identified using CRISPRi informed by the 3D chromatin structures that encompass the risk haplotypes.
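The barcode-based read-out described in the methods can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' analysis pipeline: it assumes each barcode has an RNA count and a plasmid DNA count, summarizes an allele by the median log2(RNA/DNA) across its barcodes, and reports the difference between alleles. The function names, the pseudocount, and the `min_barcodes` filter (set to 20 only to echo the barcode depth mentioned in the abstract) are hypothetical.

```python
import math
from statistics import median


def barcode_log_ratios(rna_counts, dna_counts, pseudocount=1.0):
    """Per-barcode log2(RNA/DNA); the pseudocount (an assumption) avoids division by zero."""
    if len(rna_counts) != len(dna_counts):
        raise ValueError("RNA and DNA count lists must have the same length")
    return [
        math.log2((rna + pseudocount) / (dna + pseudocount))
        for rna, dna in zip(rna_counts, dna_counts)
    ]


def allele_activity(rna_counts, dna_counts, min_barcodes=20):
    """Median log2 ratio across barcodes, or None if too few barcodes were observed."""
    ratios = barcode_log_ratios(rna_counts, dna_counts)
    if len(ratios) < min_barcodes:
        return None
    return median(ratios)


def allelic_skew(ref, alt, min_barcodes=20):
    """Difference in activity (alt minus ref); ref and alt are (rna_counts, dna_counts) pairs."""
    ref_activity = allele_activity(ref[0], ref[1], min_barcodes)
    alt_activity = allele_activity(alt[0], alt[1], min_barcodes)
    if ref_activity is None or alt_activity is None:
        return None
    return alt_activity - ref_activity
```

Using the median across barcodes is one simple way to limit the influence of a single outlier barcode; a real analysis would also need replicates and a statistical test before calling a SNP expression-altering.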
Affiliation(s)
- Kaiyu Jiang
  - Department of Pediatrics, Clinical and Translational Research Center, University at Buffalo Jacobs School of Medicine & Biomedical Sciences, 701 Ellicott St, Buffalo, NY, 14203, USA
- Tao Liu
  - Roswell Park Cancer Institute, 665 Elm St, Buffalo, NY, 14203, USA
- Susan Kales
  - Jackson Laboratories, 600 Main St, Bar Harbor, ME, 04609, USA
- Ryan Tewhey
  - Jackson Laboratories, 600 Main St, Bar Harbor, ME, 04609, USA
- Dongkyeong Kim
  - Department of Biochemistry, University at Buffalo Jacobs School of Medicine & Biomedical Sciences, 955 Main St, Buffalo, NY, 14203, USA
- Yungki Park
  - Department of Biochemistry, University at Buffalo Jacobs School of Medicine & Biomedical Sciences, 955 Main St, Buffalo, NY, 14203, USA
  - Genetics, Genomics, & Bioinformatics Program, University at Buffalo Jacobs School of Medicine & Biomedical Sciences, 955 Main St, Buffalo, NY, 14203, USA
- James N Jarvis
  - Department of Pediatrics, Clinical and Translational Research Center, University at Buffalo Jacobs School of Medicine & Biomedical Sciences, 701 Ellicott St, Buffalo, NY, 14203, USA
  - Genetics, Genomics, & Bioinformatics Program, University at Buffalo Jacobs School of Medicine & Biomedical Sciences, 955 Main St, Buffalo, NY, 14203, USA
  - University of Washington Rheumatology Research, 750 Republican St., E520, Seattle, WA, 98109, USA
|
18
|
Retallick-Townsley KG, Lee S, Cartwright S, Cohen S, Sen A, Jia M, Young H, Dobbyn L, Deans M, Fernandez-Garcia M, Huckins LM, Brennand KJ. Dynamic stress- and inflammatory-based regulation of psychiatric risk loci in human neurons. bioRxiv 2024:2024.07.09.602755. [PMID: 39026810 PMCID: PMC11257632 DOI: 10.1101/2024.07.09.602755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
The prenatal environment can alter neurodevelopmental and clinical trajectories, markedly increasing risk for psychiatric disorders in childhood and adolescence. To understand if and how fetal exposures to stress and inflammation exacerbate manifestation of genetic risk for complex brain disorders, we report a large-scale context-dependent massively parallel reporter assay (MPRA) in human neurons designed to catalogue genotype x environment (GxE) interactions. Across 240 genome-wide association study (GWAS) loci linked to ten brain traits/disorders, the impact of hydrocortisone, interleukin 6, and interferon alpha on transcriptional activity is empirically evaluated in human induced pluripotent stem cell (hiPSC)-derived glutamatergic neurons. Of ~3,500 candidate regulatory risk elements (CREs), 11% of variants are active at baseline, whereas cue-specific CRE regulatory activity ranges from a high of 23% (hydrocortisone) to a low of 6% (IL-6). Cue-specific regulatory activity is driven, at least in part, by differences in transcription factor binding activity, the gene targets of which show unique enrichments for brain disorders as well as co-morbid metabolic and immune syndromes. The dynamic nature of genetic regulation informs the influence of environmental factors, reveals a mechanism underlying pleiotropy and variable penetrance, and identifies specific risk variants that confer greater disorder susceptibility after exposure to stress or inflammation. Understanding neurodevelopmental GxE interactions will inform mental health trajectories and uncover novel targets for therapeutic intervention.
Affiliation(s)
- Kayla G. Retallick-Townsley
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Seoyeon Lee
  - Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
  - Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Sam Cartwright
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Sophie Cohen
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Annabel Sen
  - Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
  - Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Meng Jia
  - Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
  - Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Hannah Young
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lee Dobbyn
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Michael Deans
  - Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
  - Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Meilin Fernandez-Garcia
  - Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
  - Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Laura M. Huckins
  - Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
- Kristen J. Brennand
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029
  - Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
  - Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
|
19
|
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 PMCID: PMC11444527 DOI: 10.1002/bies.202300210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
|
20
|
Chang TY, Waxman DJ. HDI-STARR-seq: Condition-specific enhancer discovery in mouse liver in vivo. Research Square 2024:rs.3.rs-4559581. [PMID: 38978599 PMCID: PMC11230509 DOI: 10.21203/rs.3.rs-4559581/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background: STARR-seq and other massively-parallel reporter assays are widely used to discover functional enhancers in transfected cell models, which can be confounded by plasmid vector-induced type-I interferon immune responses and lack the multicellular environment and endogenous chromatin state of complex mammalian tissues.
Results: Here, we describe HDI-STARR-seq, which combines STARR-seq plasmid library delivery to the liver, by hydrodynamic tail vein injection (HDI), with reporter RNA transcriptional initiation driven by a minimal Albumin promoter, which we show is essential for mouse liver STARR-seq enhancer activity assayed 7 days after HDI. Importantly, little or no vector-induced innate type-I interferon responses were observed. Comparisons of HDI-STARR-seq activity between male and female mouse livers and in livers from males treated with an activating ligand of the transcription factor CAR (Nr1i3) identified many condition-dependent enhancers linked to condition-specific gene expression. Further, thousands of active liver enhancers were identified using a high complexity STARR-seq library comprised of ~50,000 genomic regions released by DNase-I digestion of mouse liver nuclei. When compared to stringently inactive library sequences, the active enhancer sequences identified were highly enriched for liver open chromatin regions with activating histone marks (H3K27ac, H3K4me1, H3K4me3), were significantly closer to gene transcriptional start sites, and were significantly depleted of repressive (H3K27me3, H3K9me3) and transcribed region histone marks (H3K36me3).
Conclusions: HDI-STARR-seq offers substantial improvements over current methodologies for large scale, functional profiling of enhancers, including condition-dependent enhancers, in liver tissue in vivo, and can be adapted to characterize enhancer activities in a variety of species and tissues by selecting suitable tissue- and species-specific promoter sequences.
|
21
|
Chang TY, Waxman DJ. HDI-STARR-seq: Condition-specific enhancer discovery in mouse liver in vivo. bioRxiv 2024:2024.06.10.598329. [PMID: 38915578 PMCID: PMC11195054 DOI: 10.1101/2024.06.10.598329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
STARR-seq and other massively-parallel reporter assays are widely used to discover functional enhancers in transfected cell models, which can be confounded by plasmid vector-induced type-I interferon immune responses and lack the multicellular environment and endogenous chromatin state of complex mammalian tissues. Here, we describe HDI-STARR-seq, which combines STARR-seq plasmid library delivery to the liver, by hydrodynamic tail vein injection (HDI), with reporter RNA transcriptional initiation driven by a minimal Albumin promoter, which we show is essential for mouse liver STARR-seq enhancer activity assayed 7 days after HDI. Importantly, little or no vector-induced innate type-I interferon responses were observed. Comparisons of HDI-STARR-seq activity between male and female mouse livers and in livers from males treated with an activating ligand of the transcription factor CAR (Nr1i3) identified many condition-dependent enhancers linked to condition-specific gene expression. Further, thousands of active liver enhancers were identified using a high complexity STARR-seq library comprised of ~50,000 genomic regions released by DNase-I digestion of mouse liver nuclei. When compared to stringently inactive library sequences, the active enhancer sequences identified were highly enriched for liver open chromatin regions with activating histone marks (H3K27ac, H3K4me1, H3K4me3), were significantly closer to gene transcriptional start sites, and were significantly depleted of repressive (H3K27me3, H3K9me3) and transcribed region histone marks (H3K36me3). HDI-STARR-seq offers substantial improvements over current methodologies for large scale, functional profiling of enhancers, including condition-dependent enhancers, in liver tissue in vivo, and can be adapted to characterize enhancer activities in a variety of species and tissues by selecting suitable tissue- and species-specific promoter sequences.
Affiliation(s)
- Ting-Ya Chang
  - Departments of Biology and Biomedical Engineering, and Bioinformatics program, Boston University, Boston, MA 02215
- David J Waxman
  - Departments of Biology and Biomedical Engineering, and Bioinformatics program, Boston University, Boston, MA 02215
|
22
|
Lalanne JB, Regalado SG, Domcke S, Calderon D, Martin BK, Li X, Li T, Suiter CC, Lee C, Trapnell C, Shendure J. Multiplex profiling of developmental cis-regulatory elements with quantitative single-cell expression reporters. Nat Methods 2024; 21:983-993. [PMID: 38724692 PMCID: PMC11166576 DOI: 10.1038/s41592-024-02260-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 03/22/2024] [Indexed: 06/13/2024]
Abstract
The inability to scalably and precisely measure the activity of developmental cis-regulatory elements (CREs) in multicellular systems is a bottleneck in genomics. Here we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays. The resulting measurement of reporter expression is accurate over multiple orders of magnitude, with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode stabilization via circularization, these scalable single-cell quantitative expression reporters provide high-contrast readouts, analogous to classic in situ assays but entirely from sequencing. Screening >200 regions of accessible chromatin in a multicellular in vitro model of early mammalian development, we identify 13 (8 previously uncharacterized) autonomous and cell-type-specific developmental CREs. We further demonstrate that chimeric CRE pairs generate cognate two-cell-type activity profiles and assess gain- and loss-of-function multicellular expression phenotypes from CRE variants with perturbed transcription factor binding sites. Single-cell quantitative expression reporters can be applied in developmental and multicellular systems to quantitatively characterize native, perturbed and synthetic CREs at scale, with high sensitivity and at single-cell resolution.
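The "limit set by Poisson counting noise" mentioned above can be made concrete with a small worked sketch. For a Poisson-distributed count the variance equals the mean, so the best achievable coefficient of variation for a reporter measured with an average of N counted molecules is 1/sqrt(N). The function names below are hypothetical, and the code only illustrates this general statistical property; it does not reproduce any calculation from the paper.

```python
import math


def poisson_cv_floor(mean_count):
    """Lowest coefficient of variation achievable when counts are Poisson with this mean."""
    if mean_count <= 0:
        raise ValueError("mean_count must be positive")
    return 1.0 / math.sqrt(mean_count)


def counts_needed(target_cv):
    """Smallest mean count whose Poisson noise floor is at or below target_cv."""
    if target_cv <= 0:
        raise ValueError("target_cv must be positive")
    return math.ceil(1.0 / target_cv ** 2)
```

For instance, a mean of 100 counts gives a noise floor of 10%, and halving that floor requires roughly four times as many counts.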
Affiliation(s)
- Samuel G Regalado
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Silvia Domcke
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Diego Calderon
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Beth K Martin
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Xiaoyi Li
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Tony Li
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Chase C Suiter
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Choli Lee
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Cole Trapnell
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
|
23
|
Chin IM, Gardell ZA, Corces MR. Decoding polygenic diseases: advances in noncoding variant prioritization and validation. Trends Cell Biol 2024; 34:465-483. [PMID: 38719704 DOI: 10.1016/j.tcb.2024.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 03/12/2024] [Accepted: 03/21/2024] [Indexed: 06/09/2024]
Abstract
Genome-wide association studies (GWASs) provide a key foundation for elucidating the genetic underpinnings of common polygenic diseases. However, these studies have limitations in their ability to assign causality to particular genetic variants, especially those residing in the noncoding genome. Over the past decade, technological and methodological advances in both analytical and empirical prioritization of noncoding variants have enabled the identification of causative variants by leveraging orthogonal functional evidence at increasing scale. In this review, we present an overview of these approaches and describe how this workflow provides the groundwork necessary to move beyond associations toward genetically informed studies on the molecular and cellular mechanisms of polygenic disease.
Affiliation(s)
- Iris M Chin
  - Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Zachary A Gardell
  - Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- M Ryan Corces
  - Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
|
24
|
Cooper S, Obolenski S, Waters AJ, Bassett AR, Coelho MA. Analyzing the functional effects of DNA variants with gene editing. CELL REPORTS METHODS 2024; 4:100776. [PMID: 38744287 PMCID: PMC11133854 DOI: 10.1016/j.crmeth.2024.100776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/01/2024] [Accepted: 04/22/2024] [Indexed: 05/16/2024]
Abstract
Continual advancements in genomics have led to an ever-widening disparity between the rate of discovery of genetic variants and our current understanding of their functions and potential roles in disease. Systematic methods for phenotyping DNA variants are required to effectively translate genomics data into improved outcomes for patients with genetic diseases. To make the biggest impact, these approaches must be scalable and accurate, faithfully reflect disease biology, and define complex disease mechanisms. We compare current methods to analyze the function of variants in their endogenous DNA context using genome editing strategies, such as saturation genome editing, base editing and prime editing. We discuss how these technologies can be linked to high-content readouts to gain deep mechanistic insights into variant effects. Finally, we highlight key challenges that need to be addressed to bridge the genotype to phenotype gap, and ultimately improve the diagnosis and treatment of genetic diseases.
Affiliation(s)
- Sarah Cooper
  - Cellular and Gene Editing Research, Wellcome Sanger Institute, Hinxton, UK
- Sofia Obolenski
  - Experimental Cancer Genetics, Wellcome Sanger Institute, Hinxton, UK; Department of Dermatology, Leiden University Medical Center, Leiden, the Netherlands
- Andrew J Waters
  - Experimental Cancer Genetics, Wellcome Sanger Institute, Hinxton, UK
- Andrew R Bassett
  - Cellular and Gene Editing Research, Wellcome Sanger Institute, Hinxton, UK
|
25
|
Kosicki M, Cintrón DL, Page NF, Georgakopoulos-Soares I, Akiyama JA, Plajzer-Frick I, Novak CS, Kato M, Hunter RD, von Maydell K, Barton S, Godfrey P, Beckman E, Sanders SJ, Pennacchio LA, Ahituv N. Massively parallel reporter assays and mouse transgenic assays provide complementary information about neuronal enhancer activity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.22.590634. [PMID: 38712228 PMCID: PMC11071441 DOI: 10.1101/2024.04.22.590634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Genetic studies find hundreds of thousands of noncoding variants associated with psychiatric disorders. Massively parallel reporter assays (MPRAs) and in vivo transgenic mouse assays can be used to assay the impact of these variants. However, the relevance of MPRAs to in vivo function is unknown and transgenic assays suffer from low throughput. Here, we studied the utility of combining the two assays to study the impact of non-coding variants. We carried out an MPRA on over 50,000 sequences derived from enhancers validated in transgenic mouse assays and from multiple fetal neuronal ATAC-seq datasets. We also tested over 20,000 variants, including synthetic mutations in highly active neuronal enhancers and 177 common variants associated with psychiatric disorders. Variants with a high impact on MPRA activity were further tested in mice. We found a strong and specific correlation between MPRA and mouse neuronal enhancer activity including changes in neuronal enhancer activity in mouse embryos for variants with strong MPRA effects. Mouse assays also revealed pleiotropic variant effects that could not be observed in MPRA. Our work provides a large catalog of functional neuronal enhancers and variant effects and highlights the effectiveness of combining MPRAs and mouse transgenic assays.
Affiliation(s)
- Michael Kosicki
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Dianne Laboy Cintrón
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
- Nicholas F. Page
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
  - Department of Psychiatry and Behavioral Sciences, Kavli Institute for Fundamental Neuroscience, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Jennifer A. Akiyama
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Ingrid Plajzer-Frick
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Catherine S. Novak
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Momoe Kato
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Riana D. Hunter
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Kianna von Maydell
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Sarah Barton
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Patrick Godfrey
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Erik Beckman
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Stephan J. Sanders
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
  - Department of Psychiatry and Behavioral Sciences, Kavli Institute for Fundamental Neuroscience, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
  - Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Len A. Pennacchio
  - Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
|
26
|
Vaknin I, Willinger O, Mandl J, Heuberger H, Ben-Ami D, Zeng Y, Goldberg S, Orenstein Y, Amit R. A universal system for boosting gene expression in eukaryotic cell-lines. Nat Commun 2024; 15:2394. [PMID: 38493141 PMCID: PMC10944472 DOI: 10.1038/s41467-024-46573-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 03/04/2024] [Indexed: 03/18/2024] Open
Abstract
We demonstrate a transcriptional regulatory design algorithm that can boost expression in yeast and mammalian cell lines. The system consists of a simplified transcriptional architecture composed of a minimal core promoter and a synthetic upstream regulatory region (sURS) composed of up to three motifs selected from a list of 41 motifs conserved in the eukaryotic lineage. The sURS system was first characterized using an oligo-library containing 189,990 variants. We validate the resultant expression model using a set of 43 unseen sURS designs. The validation sURS experiments indicate that a generic set of grammar rules for boosting and attenuation may exist in yeast cells. Finally, we demonstrate that this generic set of grammar rules functions similarly in mammalian CHO-K1 and HeLa cells. Consequently, our work provides a design algorithm for boosting the expression of promoters used for expressing industrially relevant proteins in yeast and mammalian cell lines.
Affiliation(s)
- Inbal Vaknin
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Or Willinger
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Jonathan Mandl
  - Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
- Hadar Heuberger
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Dan Ben-Ami
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Yi Zeng
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Sarah Goldberg
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Yaron Orenstein
  - Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
- Roee Amit
  - Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
  - The Russell Berrie Nanotechnology Institute, Technion, Haifa, Israel
|
27
|
Xiao F, Zhang X, Morton SU, Kim SW, Fan Y, Gorham JM, Zhang H, Berkson PJ, Mazumdar N, Cao Y, Chen J, Hagen J, Liu X, Zhou P, Richter F, Shen Y, Ward T, Gelb BD, Seidman JG, Seidman CE, Pu WT. Functional dissection of human cardiac enhancers and noncoding de novo variants in congenital heart disease. Nat Genet 2024; 56:420-430. [PMID: 38378865 PMCID: PMC11218660 DOI: 10.1038/s41588-024-01669-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 01/23/2024] [Indexed: 02/22/2024]
Abstract
Rare coding mutations cause ∼45% of congenital heart disease (CHD). Noncoding mutations that perturb cis-regulatory elements (CREs) likely contribute to the remaining cases, but their identification has been problematic. Using a lentiviral massively parallel reporter assay (lentiMPRA) in human induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs), we functionally evaluated 6,590 noncoding de novo variants (ncDNVs) prioritized from the whole-genome sequencing of 750 CHD trios. A total of 403 ncDNVs substantially affected cardiac CRE activity. A majority increased enhancer activity, often at regions with undetectable reference sequence activity. Of ten DNVs tested by introduction into their native genomic context, four altered the expression of neighboring genes and iPSC-CM transcriptional state. To prioritize future DNVs for functional testing, we used the MPRA data to develop a regression model, EpiCard. Analysis of an independent CHD cohort by EpiCard found enrichment of DNVs. Together, we developed a scalable system to measure the effect of ncDNVs on CRE activity and deployed it to systematically assess the contribution of ncDNVs to CHD.
Affiliation(s)
- Feng Xiao
  - Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
- Xiaoran Zhang
  - Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
- Sarah U Morton
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA
- Seong Won Kim
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Youfei Fan
  - Department of Pediatrics, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
- Joshua M Gorham
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Huan Zhang
  - Department of Radiation Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Paul J Berkson
  - Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
- Neil Mazumdar
  - Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
- Yangpo Cao
  - Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
  - Department of Pharmacology, School of Medicine, Southern University of Science and Technology, Shenzhen, China
- Jian Chen
  - Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
- Jacob Hagen
  - Mindich Child Health and Development Institute and Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York City, NY, USA
- Xujie Liu
  - Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
- Pingzhu Zhou
  - Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
- Felix Richter
  - Mindich Child Health and Development Institute and Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York City, NY, USA
- Yufeng Shen
  - Departments of Systems Biology and Biomedical Informatics, Columbia University Medical Center, New York City, NY, USA
- Tarsha Ward
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Bruce D Gelb
  - Mindich Child Health and Development Institute and Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York City, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY, USA
- Christine E Seidman
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Division of Cardiology, Brigham and Women's Hospital, Boston, MA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- William T Pu
  - Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA
|
28
|
DaSilva LF, Senan S, Patel ZM, Janardhan Reddy A, Gabbita S, Nussbaum Z, Valdez Córdova CM, Wenteler A, Weber N, Tunjic TM, Ahmad Khan T, Li Z, Smith C, Bejan M, Karmel Louis L, Cornejo P, Connell W, Wong ES, Meuleman W, Pinello L. DNA-Diffusion: Leveraging Generative Models for Controlling Chromatin Accessibility and Gene Expression via Synthetic Regulatory Elements. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.01.578352. [PMID: 38352499 PMCID: PMC10862870 DOI: 10.1101/2024.02.01.578352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]
Abstract
The challenge of systematically modifying and optimizing regulatory elements for precise gene expression control is central to modern genomics and synthetic biology. Advancements in generative AI have paved the way for designing synthetic sequences with the aim of safely and accurately modulating gene expression. We leverage diffusion models to design context-specific DNA regulatory sequences, which hold significant potential toward enabling novel therapeutic applications requiring precise modulation of gene expression. Our framework uses a cell type-specific diffusion model to generate synthetic 200 bp regulatory elements based on chromatin accessibility across different cell types. We evaluate the generated sequences based on key metrics to ensure they retain properties of endogenous sequences: transcription factor binding site composition, potential for cell type-specific chromatin accessibility, and capacity for sequences generated by DNA diffusion to activate gene expression in different cell contexts using state-of-the-art prediction models. Our results demonstrate the ability to robustly generate DNA sequences with cell type-specific regulatory potential. DNA-Diffusion paves the way for revolutionizing a regulatory modulation approach to mammalian synthetic biology and precision gene therapy.
Affiliation(s)
- Lucas Ferreira DaSilva
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Simon Senan
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Zain Munir Patel
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Aniketh Janardhan Reddy
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Sameer Gabbita
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Johns Hopkins University, Baltimore, MD, USA
- Zelun Li
  - Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Cameron Smith
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Lithin Karmel Louis
  - Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Paola Cornejo
  - Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Emily S. Wong
  - Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Wouter Meuleman
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Luca Pinello
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
|
29
|
Sun J, Noss S, Banerjee D, Das M, Girirajan S. Strategies for dissecting the complexity of neurodevelopmental disorders. Trends Genet 2024; 40:187-202. [PMID: 37949722 PMCID: PMC10872993 DOI: 10.1016/j.tig.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 09/20/2023] [Accepted: 10/16/2023] [Indexed: 11/12/2023]
Abstract
Neurodevelopmental disorders (NDDs) are associated with a wide range of clinical features, affecting multiple pathways involved in brain development and function. Recent advances in high-throughput sequencing have unveiled numerous genetic variants associated with NDDs, which further contribute to disease complexity and make it challenging to infer disease causation and underlying mechanisms. Herein, we review current strategies for dissecting the complexity of NDDs using model organisms, induced pluripotent stem cells, single-cell sequencing technologies, and massively parallel reporter assays. We further highlight single-cell CRISPR-based screening techniques that allow genomic investigation of cellular transcriptomes with high efficiency, accuracy, and throughput. Overall, we provide an integrated review of experimental approaches that can be applicable for investigating a broad range of complex disorders.
Affiliation(s)
- Jiawan Sun
  - Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Serena Noss
  - Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Deepro Banerjee
  - Bioinformatics and Genomics Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Maitreya Das
  - Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
- Santhosh Girirajan
  - Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA; Bioinformatics and Genomics Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA; Department of Anthropology, Pennsylvania State University, University Park, PA 16802, USA
|
30
|
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024] Open
Abstract
Massively parallel reporter assay measures transcriptional activities of various cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational model framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms of cis-regulatory mutations of a synthetic enhancer that cause abnormal gene expression. We found that, in a human cell line, competitive binding between family transcription factors (TFs) with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single- or multi-mutations. We also discovered that even if various harmful mutations occurred in an activator binding site, CRM could stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance understanding the effect of SNPs and indels on CRMs and would help building robust custom-designed CRMs for biologics production and gene therapy.
Affiliation(s)
- Chan-Koo Kang
  - School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Ah-Ram Kim
  - School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
|
31
|
Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res 2024; 52:D1143-D1154. [PMID: 38183205 PMCID: PMC10767851 DOI: 10.1093/nar/gkad989] [Citation(s) in RCA: 42] [Impact Index Per Article: 42.0] [Received: 09/14/2023] [Revised: 10/14/2023] [Accepted: 10/17/2023] [Indexed: 01/07/2024] Open
Abstract
Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
Affiliation(s)
- Max Schubach
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Thorben Maass
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Lusiné Nazaretyan
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Sebastian Röner
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Martin Kircher
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
|
32
|
Chen Y, Paramo MI, Zhang Y, Yao L, Shah SR, Jin Y, Zhang J, Pan X, Yu H. Finding Needles in the Haystack: Strategies for Uncovering Noncoding Regulatory Variants. Annu Rev Genet 2023; 57:201-222. [PMID: 37562413 DOI: 10.1146/annurev-genet-030723-120717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Despite accumulating evidence implicating noncoding variants in human diseases, unraveling their functionality remains a significant challenge. Systematic annotations of the regulatory landscape and the growth of sequence variant data sets have fueled the development of tools and methods to identify causal noncoding variants and evaluate their regulatory effects. Here, we review the latest advances in the field and discuss potential future research avenues to gain a more in-depth understanding of noncoding regulatory variants.
Affiliation(s)
- You Chen
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Mauricio I Paramo
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Yingying Zhang
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Li Yao
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
  - Department of Computational Biology, Cornell University, Ithaca, New York, USA
- Sagar R Shah
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Yiyang Jin
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Junke Zhang
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
  - Department of Computational Biology, Cornell University, Ithaca, New York, USA
- Xiuqi Pan
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Haiyuan Yu
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
  - Department of Computational Biology, Cornell University, Ithaca, New York, USA
| |
|
33
|
Zhao J, Baltoumas FA, Konnaris MA, Mouratidis I, Liu Z, Sims J, Agarwal V, Pavlopoulos GA, Georgakopoulos-Soares I, Ahituv N. MPRAbase: A Massively Parallel Reporter Assay Database. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.19.567742. [PMID: 38045264 PMCID: PMC10690217 DOI: 10.1101/2023.11.19.567742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Massively parallel reporter assays (MPRAs) represent a set of high-throughput technologies that measure the functional effects of thousands of sequences/variants on gene regulatory activity. There are several different variations of MPRA technology and they are used for numerous applications, including regulatory element discovery, variant effect measurement, saturation mutagenesis, synthetic regulatory element generation or characterization of evolutionary gene regulatory differences. Despite their many designs and uses, there is no comprehensive database that incorporates the results of these experiments. To address this, we developed MPRAbase, a manually curated database that currently harbors 129 experiments, encompassing 17,718,677 elements tested across 35 cell types and 4 organisms. The MPRAbase web interface (http://www.mprabase.com) serves as a centralized user-friendly repository to download existing MPRA data for independent analysis and is designed with the ability to allow researchers to share their published data for rapid dissemination to the community.
Affiliation(s)
- Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
| | - Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
| | - Maxwell A. Konnaris
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Statistics, Penn State University, State College, PA, USA
| | - Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Statistics, Penn State University, State College, PA, USA
| | - Zhe Liu
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
| | - Jasmine Sims
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
| | - Vikram Agarwal
- mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA, USA
| | - Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
| | - Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Statistics, Penn State University, State College, PA, USA
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
| |
|
34
|
Foo JL, Kitano S, Susanto AV, Jin Z, Lin Y, Luo Z, Huang L, Liang Z, Mitchell LA, Yang K, Wong A, Cai Y, Cai J, Stracquadanio G, Bader JS, Boeke JD, Dai J, Chang MW. Establishing chromosomal design-build-test-learn through a synthetic chromosome and its combinatorial reconfiguration. CELL GENOMICS 2023; 3:100435. [PMID: 38020970 PMCID: PMC10667554 DOI: 10.1016/j.xgen.2023.100435] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/27/2023] [Revised: 08/19/2023] [Accepted: 10/06/2023] [Indexed: 12/01/2023]
Abstract
Chromosome-level design-build-test-learn cycles (chrDBTLs) allow systematic combinatorial reconfiguration of chromosomes with ease. Here, we established chrDBTL with a redesigned synthetic Saccharomyces cerevisiae chromosome XV, synXV. We designed and built synXV to harbor strategically inserted features, modified elements, and synonymously recoded genes throughout the chromosome. Based on the recoded chromosome, we developed a method to enable chrDBTL: CRISPR-Cas9-mediated mitotic recombination with endoreduplication (CRIMiRE). CRIMiRE allowed the creation of customized wild-type/synthetic combinations, accelerating genotype-phenotype mapping and synthetic chromosome redesign. We also leveraged synXV as a "build-to-learn" model organism for translation studies by ribosome profiling. We conducted a locus-to-locus comparison of ribosome occupancy between synXV and the wild-type chromosome, providing insight into the effects of codon changes and redesigned features on translation dynamics in vivo. Overall, we established synXV as a versatile reconfigurable system that advances chrDBTL for understanding biological mechanisms and engineering strains.
Affiliation(s)
- Jee Loon Foo
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore 117456, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
- Wilmar-NUS Corporate Laboratory (WIL@NUS), National University of Singapore, Singapore 117599, Singapore
| | - Shohei Kitano
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore 117456, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
- Wilmar-NUS Corporate Laboratory (WIL@NUS), National University of Singapore, Singapore 117599, Singapore
| | - Adelia Vicanatalita Susanto
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore 117456, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
- Wilmar-NUS Corporate Laboratory (WIL@NUS), National University of Singapore, Singapore 117599, Singapore
| | - Zhu Jin
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore 117456, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
- Wilmar-NUS Corporate Laboratory (WIL@NUS), National University of Singapore, Singapore 117599, Singapore
| | - Yicong Lin
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Zhouqing Luo
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Linsen Huang
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Zhenzhen Liang
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Leslie A. Mitchell
- Institute for Systems Genetics, NYU Langone Health, New York, NY 10016, USA
| | - Kun Yang
- Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY 11201, USA
| | - Adison Wong
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore 117456, Singapore
- Singapore Institute of Technology, 10 Dover Drive, Singapore 138683, Singapore
| | - Yizhi Cai
- Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
| | - Jitong Cai
- High-Throughput Biological Center and Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Giovanni Stracquadanio
- High-Throughput Biological Center and Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3BF, UK
| | - Joel S. Bader
- High-Throughput Biological Center and Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Jef D. Boeke
- Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA
- Institute for Systems Genetics, NYU Langone Health, New York, NY 10016, USA
- Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY 11201, USA
| | - Junbiao Dai
- CAS Key Laboratory of Quantitative Engineering Biology, Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Matthew Wook Chang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore 117456, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
- Wilmar-NUS Corporate Laboratory (WIL@NUS), National University of Singapore, Singapore 117599, Singapore
| |
|
35
|
Guo MG, Reynolds DL, Ang CE, Liu Y, Zhao Y, Donohue LKH, Siprashvili Z, Yang X, Yoo Y, Mondal S, Hong A, Kain J, Meservey L, Fabo T, Elfaki I, Kellman LN, Abell NS, Pershad Y, Bayat V, Etminani P, Holodniy M, Geschwind DH, Montgomery SB, Duncan LE, Urban AE, Altman RB, Wernig M, Khavari PA. Integrative analyses highlight functional regulatory variants associated with neuropsychiatric diseases. Nat Genet 2023; 55:1876-1891. [PMID: 37857935 PMCID: PMC10859123 DOI: 10.1038/s41588-023-01533-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/24/2021] [Accepted: 09/15/2023] [Indexed: 10/21/2023]
Abstract
Noncoding variants of presumed regulatory function contribute to the heritability of neuropsychiatric disease. A total of 2,221 noncoding variants connected to risk for ten neuropsychiatric disorders, including autism spectrum disorder, attention deficit hyperactivity disorder, bipolar disorder, borderline personality disorder, major depression, generalized anxiety disorder, panic disorder, post-traumatic stress disorder, obsessive-compulsive disorder and schizophrenia, were studied in developing human neural cells. Integrating epigenomic and transcriptomic data with massively parallel reporter assays identified differentially-active single-nucleotide variants (daSNVs) in specific neural cell types. Expression-gene mapping, network analyses and chromatin looping nominated candidate disease-relevant target genes modulated by these daSNVs. Follow-up integration of daSNV gene editing with clinical cohort analyses suggested that magnesium transport dysfunction may increase neuropsychiatric disease risk and indicated that common genetic pathomechanisms may mediate specific symptoms that are shared across multiple neuropsychiatric diseases.
Affiliation(s)
- Margaret G Guo
- Stanford Program in Biomedical Informatics, Stanford University, Stanford, CA, USA
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
| | - David L Reynolds
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
| | - Cheen E Ang
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Institute for Stem Cell Biology & Regenerative Medicine, Stanford University, Stanford, CA, USA
| | - Yingfei Liu
- Institute for Stem Cell Biology & Regenerative Medicine, Stanford University, Stanford, CA, USA
- Institute of Neurobiology, Xi'an Jiaotong University Health Science Center, Xi'an, China
| | - Yang Zhao
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
| | - Laura K H Donohue
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Zurab Siprashvili
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
| | - Xue Yang
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
- Stanford Program in Cancer Biology, Stanford University, Stanford, CA, USA
| | - Yongjin Yoo
- Institute for Stem Cell Biology & Regenerative Medicine, Stanford University, Stanford, CA, USA
| | - Smarajit Mondal
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
| | - Audrey Hong
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
| | - Jessica Kain
- Department of Genetics, Stanford University, Stanford, CA, USA
| | | | - Tania Fabo
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Ibtihal Elfaki
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Laura N Kellman
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
- Stanford Program in Cancer Biology, Stanford University, Stanford, CA, USA
| | - Nathan S Abell
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Yash Pershad
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | | | | | - Mark Holodniy
- Public Health Surveillance and Research, Department of Veterans Affairs, Washington, DC, USA
- Division of Infectious Disease & Geographic Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Daniel H Geschwind
- Program in Neurobehavioral Genetics, Semel Institute, UCLA, Los Angeles, CA, USA
| | - Stephen B Montgomery
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Laramie E Duncan
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
| | - Alexander E Urban
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
| | - Russ B Altman
- Stanford Program in Biomedical Informatics, Stanford University, Stanford, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Marius Wernig
- Department of Pathology, Stanford University, Stanford, CA, USA
- Institute for Stem Cell Biology & Regenerative Medicine, Stanford University, Stanford, CA, USA
| | - Paul A Khavari
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
- Stanford Program in Cancer Biology, Stanford University, Stanford, CA, USA
- Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
| |
|
36
|
Tovar A, Kyono Y, Nishino K, Bose M, Varshney A, Parker SCJ, Kitzman JO. Using a modular massively parallel reporter assay to discover context-specific regulatory grammars in type 2 diabetes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.08.561391. [PMID: 37873175 PMCID: PMC10592691 DOI: 10.1101/2023.10.08.561391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Recent genome-wide association studies have established that most complex disease-associated loci are found in noncoding regions where defining their function is nontrivial. In this study, we leverage a modular massively parallel reporter assay (MPRA) to uncover sequence features linked to context-specific regulatory activity. We screened enhancer activity across a panel of 198-bp fragments spanning over 10k type 2 diabetes- and metabolic trait-associated variants in the 832/13 rat insulinoma cell line, a relevant model of pancreatic beta cells. We explored these fragments' context sensitivity by comparing their activities when placed up- or downstream of a reporter gene, and in combination with either a synthetic housekeeping promoter (SCP1) or a more biologically relevant promoter corresponding to the human insulin gene (INS). We identified clear effects of MPRA construct design on measured fragment enhancer activity. Specifically, a subset of fragments (n = 702/11,656) displayed positional bias, evenly distributed across up- and downstream preference. A separate set of fragments exhibited promoter bias (n = 698/11,656), mostly towards the cell-specific INS promoter (73.4%). To identify sequence features associated with promoter preference, we used Lasso regression with 562 genomic annotations and discovered that fragments with INS promoter-biased activity are enriched for HNF1 motifs. HNF1 family transcription factors are key regulators of glucose metabolism disrupted in maturity onset diabetes of the young (MODY), suggesting genetic convergence between rare coding variants that cause MODY and common T2D-associated regulatory variants. We designed a follow-up MPRA containing HNF1 motif-enriched fragments and observed several instances where deletion or mutation of HNF1 motifs disrupted the INS promoter-biased enhancer activity, specifically in the beta cell model but not in a skeletal muscle cell line, another diabetes-relevant cell type.
Together, our study suggests that cell-specific regulatory activity is partially influenced by enhancer-promoter compatibility and indicates that careful attention should be paid when designing MPRA libraries to capture context-specific regulatory processes at disease-associated genetic signals.
|
37
|
Thomas HF, Buecker C. What is an enhancer? Bioessays 2023; 45:e2300044. [PMID: 37256273 PMCID: PMC11475577 DOI: 10.1002/bies.202300044] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/03/2023] [Revised: 05/11/2023] [Accepted: 05/15/2023] [Indexed: 06/01/2023]
Abstract
Tight control of the transcription process is essential for the correct spatial and temporal gene expression pattern during development and in homeostasis. Enhancers are at the core of correct transcriptional activation. The original definition of an enhancer is straightforward: a DNA sequence that activates transcription independent of orientation and direction. Dissection of numerous enhancer loci has shown that many enhancer-like elements might not conform to the original definition, suggesting that enhancers and enhancer-like elements might use multiple different mechanisms to contribute to transcriptional activation. Here, we review methodologies to identify enhancers and enhancer-like elements and discuss pitfalls and consequences for our understanding of transcriptional regulation.
|
38
|
Aktar A, Heit B. Role of the pioneer transcription factor GATA2 in health and disease. J Mol Med (Berl) 2023; 101:1191-1208. [PMID: 37624387 DOI: 10.1007/s00109-023-02359-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 08/26/2023]
Abstract
The transcription factor GATA2 is involved in human diseases ranging from hematopoietic disorders, to cancer, to infectious diseases. GATA2 is one of six GATA-family transcription factors that act as pioneering transcription factors which facilitate the opening of heterochromatin and the subsequent binding of other transcription factors to induce gene expression from previously inaccessible regions of the genome. Although GATA2 is essential for hematopoiesis and lymphangiogenesis, it is also expressed in other tissues such as the lung, prostate gland, gastrointestinal tract, central nervous system, placenta, fetal liver, and fetal heart. Gene or transcriptional abnormalities of GATA2 cause or predispose patients to several diseases including the hematological cancers acute myeloid leukemia and acute lymphoblastic leukemia, the primary immunodeficiency MonoMAC syndrome, and to cancers of the lung, prostate, uterus, kidney, breast, gastric tract, and ovaries. Recent data has also linked GATA2 expression and mutations to responses to infectious diseases including SARS-CoV-2 and Pneumocystis carinii pneumonia, and to inflammatory disorders such as atherosclerosis. In this article we review the role of GATA2 in the etiology and progression of these various diseases.
Affiliation(s)
- Amena Aktar
- Department of Microbiology and Immunology; the Western Infection, Immunity and Inflammation Centre, The University of Western Ontario, London, ON, N6A 5C1, Canada
| | - Bryan Heit
- Department of Microbiology and Immunology; the Western Infection, Immunity and Inflammation Centre, The University of Western Ontario, London, ON, N6A 5C1, Canada
- Robarts Research Institute, London, ON, N6A 3K7, Canada
| |
|
39
|
Ni P, Wu S, Su Z. Underlying causes for prevalent false positives and false negatives in STARR-seq data. NAR Genom Bioinform 2023; 5:lqad085. [PMID: 37745976 PMCID: PMC10516709 DOI: 10.1093/nargab/lqad085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 08/23/2023] [Accepted: 09/12/2023] [Indexed: 09/26/2023]
Abstract
Self-transcribing active regulatory region sequencing (STARR-seq) and its variants have been widely used to characterize enhancers. However, it has been reported that up to 87% of STARR-seq peaks are located in repressive chromatin and are not functional in the tested cells. While some of the STARR-seq peaks in repressive chromatin might be active in other cell/tissue types, some others might be false positives. Meanwhile, many active enhancers may not be identified by the current STARR-seq methods. Although methods have been proposed to mitigate systematic errors caused by the use of plasmid vectors, the artifacts due to the intrinsic limitations of current STARR-seq methods are still prevalent and the underlying causes are not fully understood. Based on predicted cis-regulatory modules (CRMs) and non-CRMs in the human genome as well as predicted active CRMs and non-active CRMs in a few human cell lines/tissues with STARR-seq data available, we reveal prevalent false positives and false negatives in STARR-seq peaks generated by major variants of STARR-seq methods and possible underlying causes. Our results will help design strategies to improve STARR-seq methods and interpret the results.
Affiliation(s)
- Pengyu Ni
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Siwen Wu
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Zhengchang Su
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| |
|
40
|
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA
| |
|
41
|
Guzman C, Duttke S, Zhu Y, De Arruda Saldanha C, Downes N, Benner C, Heinz S. Combining TSS-MPRA and sensitive TSS profile dissimilarity scoring to study the sequence determinants of transcription initiation. Nucleic Acids Res 2023; 51:e80. [PMID: 37403796 PMCID: PMC10450201 DOI: 10.1093/nar/gkad562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 06/13/2023] [Accepted: 06/20/2023] [Indexed: 07/06/2023]
Abstract
Cis-regulatory elements (CREs) can be classified by the shapes of their transcription start site (TSS) profiles, which are indicative of distinct regulatory mechanisms. Massively parallel reporter assays (MPRAs) are increasingly being used to study CRE regulatory mechanisms, yet the degree to which MPRAs replicate individual endogenous TSS profiles has not been determined. Here, we present a new low-input MPRA protocol (TSS-MPRA) that enables measuring TSS profiles of episomal reporters as well as after lentiviral reporter chromatinization. To sensitively compare MPRA and endogenous TSS profiles, we developed a novel dissimilarity scoring algorithm (WIP score) that outperforms the frequently used earth mover's distance on experimental data. Using TSS-MPRA and WIP scoring on 500 unique reporter inserts, we found that short (153 bp) MPRA promoter inserts replicate the endogenous TSS patterns of ∼60% of promoters. Lentiviral reporter chromatinization did not improve fidelity of TSS-MPRA initiation patterns, and increasing insert size frequently led to activation of extraneous TSS in the MPRA that are not active in vivo. We discuss the implications of our findings, which highlight important caveats when using MPRAs to study transcription mechanisms. Finally, we illustrate how TSS-MPRA and WIP scoring can provide novel insights into the impact of transcription factor motif mutations and genetic variants on TSS patterns and transcription levels.
Affiliation(s)
- Carlos Guzman
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Department of Bioengineering, Graduate Program in Bioinformatics & Systems Biology, U.C. San Diego, La Jolla, CA 92093, USA
| | - Sascha Duttke
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Yixin Zhu
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Camila De Arruda Saldanha
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Nicholas L Downes
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Christopher Benner
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Sven Heinz
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| |
|
42
|
The Impact of Genomic Variation on Function (IGVF) Consortium. ARXIV 2023:arXiv:2307.13708v1. [PMID: 37547663 PMCID: PMC10402186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Our genomes influence nearly every aspect of human biology from molecular and cellular functions to phenotypes in health and disease. Human genetics studies have now associated hundreds of thousands of differences in our DNA sequence ("genomic variation") with disease risk and other phenotypes, many of which could reveal novel mechanisms of human biology and uncover the basis of genetic predispositions to diseases, thereby guiding the development of new diagnostics and therapeutics. Yet, understanding how genomic variation alters genome function to influence phenotype has proven challenging. To unlock these insights, we need a systematic and comprehensive catalog of genome function and the molecular and cellular effects of genomic variants. Toward this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations, and predictive modeling to investigate the relationships among genomic variation, genome function, and phenotypes. Through systematic comparisons and benchmarking of experimental and computational methods, we aim to create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how both coding and noncoding variants may connect through gene regulatory and protein interaction networks. These experimental data, computational predictions, and accompanying standards and pipelines will be integrated into an open resource that will catalyze community efforts to explore genome function and the impact of genetic variation on human biology and disease across populations.
|
43
|
Oliveros W, Delfosse K, Lato DF, Kiriakopulos K, Mokhtaridoost M, Said A, McMurray BJ, Browning JW, Mattioli K, Meng G, Ellis J, Mital S, Melé M, Maass PG. Systematic characterization of regulatory variants of blood pressure genes. CELL GENOMICS 2023; 3:100330. [PMID: 37492106 PMCID: PMC10363820 DOI: 10.1016/j.xgen.2023.100330] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 03/29/2023] [Accepted: 04/28/2023] [Indexed: 07/27/2023]
Abstract
High blood pressure (BP) is the major risk factor for cardiovascular disease. Genome-wide association studies have identified genetic variants for BP, but functional insights into causality and related molecular mechanisms lag behind. We functionally characterize 4,608 genetic variants in linkage with 135 BP loci in vascular smooth muscle cells and cardiomyocytes by massively parallel reporter assays. High densities of regulatory variants at BP loci (i.e., ULK4, MAP4, CFDP1, PDE5A) indicate that multiple variants drive genetic association. Regulatory variants are enriched in repeats, alter cardiovascular-related transcription factor motifs, and spatially converge with genes controlling specific cardiovascular pathways. Using heuristic scoring, we define likely causal variants, and CRISPR prime editing finally determines causal variants for KCNK9, SFXN2, and PCGF6, which are candidates for developing high BP. Our systems-level approach provides a catalog of functionally relevant variants and their genomic architecture in two trait-relevant cell lines for a better understanding of BP gene regulation.
Affiliation(s)
- Winona Oliveros
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Catalonia, Spain
| | - Kate Delfosse
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Daniella F. Lato
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Katerina Kiriakopulos
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Milad Mokhtaridoost
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Abdelrahman Said
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Brandon J. McMurray
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Jared W.L. Browning
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Kaia Mattioli
- Division of Genetics, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
| | - Guoliang Meng
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - James Ellis
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Seema Mital
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Ted Rogers Centre for Heart Research, Toronto, ON M5G 1X8, Canada
- Department of Pediatrics, The Hospital for Sick Children, University of Toronto, Toronto, ON M5G 0A4, Canada
| | - Marta Melé
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Catalonia, Spain
| | - Philipp G. Maass
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| |
|
44
|
Rummel CK, Gagliardi M, Herholt A, Ahmad R, Murek V, Weigert L, Hausruckinger A, Maidl S, Jimenez-Barron L, Trastulla L, Eder M, Rossner M, Ziller MJ. Cell type and condition specific functional annotation of schizophrenia associated non-coding genetic variants. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.27.545266. [PMID: 37425902 PMCID: PMC10326990 DOI: 10.1101/2023.06.27.545266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/11/2023]
Abstract
Schizophrenia (SCZ) is a highly polygenic disease and genome wide association studies have identified thousands of genetic variants that are statistically associated with this psychiatric disorder. However, our ability to translate these associations into insights on the disease mechanisms has been challenging since the causal genetic variants, their molecular function and their target genes remain largely unknown. In order to address these questions, we established a functional genomics pipeline in combination with induced pluripotent stem cell technology to functionally characterize ~35,000 non-coding genetic variants associated with schizophrenia along with their target genes. This analysis identified a set of 620 (1.7%) single nucleotide polymorphisms as functional on a molecular level in a highly cell type and condition specific fashion. These results provide a high-resolution map of functional variant-gene combinations and offer comprehensive biological insights into the developmental context and stimulation dependent molecular processes modulated by SCZ associated genetic variation.
Affiliation(s)
- Christine K. Rummel
- Max Planck Institute of Psychiatry, Munich, Germany
- International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich, Germany
| | - Miriam Gagliardi
- Department of Psychiatry, University of Münster, Münster, Germany
| | - Alexander Herholt
- Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
| | - Ruhel Ahmad
- Max Planck Institute of Psychiatry, Munich, Germany
| | - Laura Jimenez-Barron
- Max Planck Institute of Psychiatry, Munich, Germany
- International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich, Germany
| | - Lucia Trastulla
- Department of Psychiatry, University of Münster, Münster, Germany
| | - Mathias Eder
- Max Planck Institute of Psychiatry, Munich, Germany
| | - Moritz Rossner
- Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
| | - Michael J. Ziller
- Max Planck Institute of Psychiatry, Munich, Germany
- Department of Psychiatry, University of Münster, Münster, Germany
- Center for Soft Nanoscience, University of Münster, Münster, Germany
| |
|
45
|
Georgakopoulos-Soares I, Deng C, Agarwal V, Chan CSY, Zhao J, Inoue F, Ahituv N. Transcription factor binding site orientation and order are major drivers of gene regulatory activity. Nat Commun 2023; 14:2333. [PMID: 37087538 PMCID: PMC10122648 DOI: 10.1038/s41467-023-37960-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 10/04/2022] [Accepted: 04/06/2023] [Indexed: 04/24/2023]
Abstract
The gene regulatory code and grammar remain largely unknown, precluding our ability to link phenotype to genotype in regulatory sequences. Here, using a massively parallel reporter assay (MPRA) of 209,440 sequences, we examine all possible pair and triplet combinations, permutations and orientations of eighteen liver-associated transcription factor binding sites (TFBS). We find that TFBS orientation and order have a major effect on gene regulatory activity. Corroborating these results with genomic analyses, we find clear human promoter TFBS orientation biases and similar TFBS orientation and order transcriptional effects in an MPRA that tested 164,307 liver candidate regulatory elements. Additionally, by adding TFBS orientation to a model that predicts expression from sequence we improve performance by 7.7%. Collectively, our results show that TFBS orientation and order have a significant effect on gene regulatory activity and need to be considered when analyzing the functional effect of variants on the activity of these sequences.
Affiliation(s)
- Ilias Georgakopoulos-Soares
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA.
| | - Chengyu Deng
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Vikram Agarwal
- mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA, USA
| | - Candace S Y Chan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.
| |
|
46
|
Das M, Hossain A, Banerjee D, Praul CA, Girirajan S. Challenges and considerations for reproducibility of STARR-seq assays. Genome Res 2023; 33:479-495. [PMID: 37130797 PMCID: PMC10234304 DOI: 10.1101/gr.277204.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 03/15/2023] [Indexed: 05/04/2023]
Abstract
High-throughput methods such as RNA-seq, ChIP-seq, and ATAC-seq have well-established guidelines, commercial kits, and analysis pipelines that enable consistency and wider adoption for understanding genome function and regulation. STARR-seq, a popular assay for directly quantifying the activities of thousands of enhancer sequences simultaneously, has seen limited standardization across studies. The assay is long, with more than 250 steps, and frequent customization of the protocol and variations in bioinformatics methods raise concerns for reproducibility of STARR-seq studies. Here, we assess each step of the protocol and analysis pipelines from published sources and in-house assays, and identify critical steps and quality control (QC) checkpoints necessary for reproducibility of the assay. We also provide guidelines for experimental design, protocol scaling, customization, and analysis pipelines for better adoption of the assay. These resources will allow better optimization of STARR-seq for specific research needs, enable comparisons and integration across studies, and improve the reproducibility of results.
Affiliation(s)
- Maitreya Das
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Molecular and Cellular Integrative Biosciences Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Ayaan Hossain
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Deepro Banerjee
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Craig Alan Praul
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Santhosh Girirajan
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Molecular and Cellular Integrative Biosciences Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| |
|
47
|
Ren N, Dai S, Ma S, Yang F. Strategies for activity analysis of single nucleotide polymorphisms associated with human diseases. Clin Genet 2023; 103:392-400. [PMID: 36527336 DOI: 10.1111/cge.14282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 12/10/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphism (SNP) sites associated with human diseases. In the annotation of human diseases, especially cancers, SNPs, as an important component of genetic factors, have gained increasing attention. Given that most of the SNPs are located in non-coding regions, the functional verification of these SNPs is a great challenge. The key to functional annotation for risk SNPs is to screen SNPs with regulatory activity from thousands of disease associated-SNPs. In this review, we systematically recapitulate the characteristics and functional roles of SNP sites, discuss three parallel reporter screening strategies in detail based on barcode tag classification, and recommend the common in silico strategies to help supplement the annotation of SNP sites with epigenetic activity analysis, prediction of target genes and trans-acting factors. We hope that this review will contribute to this exuberant research field by providing robust activity analysis strategies that can facilitate the translation of GWAS results into personalized diagnosis and prevention measures for human diseases.
Affiliation(s)
- Naixia Ren
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shangkun Dai
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shumin Ma
- School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
| | - Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| |
|
48
|
Zhang Z, Wang X, Park S, Song H, Ming GL. Development and Application of Brain Region-Specific Organoids for Investigating Psychiatric Disorders. Biol Psychiatry 2023; 93:594-605. [PMID: 36759261 PMCID: PMC9998354 DOI: 10.1016/j.biopsych.2022.12.015] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 08/18/2022] [Revised: 11/14/2022] [Accepted: 12/12/2022] [Indexed: 12/25/2022]
Abstract
Human society has been burdened by psychiatric disorders throughout the course of its history. The emergence and rapid advances of human brain organoid technology provide unprecedented opportunities for investigation of potential disease mechanisms and development of targeted or even personalized treatments for various psychiatric disorders. In this review, we summarize recent advances for generating organoids from human pluripotent stem cells to model distinct brain regions and diverse cell types. We also highlight recent progress, discuss limitations, and propose potential improvements in using patient-derived or genetically engineered brain region-specific organoids for investigating various psychiatric disorders.
Affiliation(s)
- Zhijian Zhang
- Department of Neuroscience and Mahoney Institute for Neurosciences, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
| | - Xin Wang
- Department of Neuroscience and Mahoney Institute for Neurosciences, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
| | - Sean Park
- Department of Neuroscience and Mahoney Institute for Neurosciences, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
| | - Hongjun Song
- Department of Neuroscience and Mahoney Institute for Neurosciences, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Institute for Regenerative Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
| | - Guo-Li Ming
- Department of Neuroscience and Mahoney Institute for Neurosciences, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Institute for Regenerative Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania.
| |
|
49
|
Zheng Y, VanDusen NJ. Massively Parallel Reporter Assays for High-Throughput In Vivo Analysis of Cis-Regulatory Elements. J Cardiovasc Dev Dis 2023; 10:jcdd10040144. [PMID: 37103023 PMCID: PMC10146671 DOI: 10.3390/jcdd10040144] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2023] [Revised: 03/24/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023]
Abstract
The rapid improvement of descriptive genomic technologies has fueled a dramatic increase in hypothesized connections between cardiovascular gene expression and phenotypes. However, in vivo testing of these hypotheses has predominantly been relegated to slow, expensive, and linear generation of genetically modified mice. In the study of genomic cis-regulatory elements, generation of mice featuring transgenic reporters or cis-regulatory element knockout remains the standard approach. While the data obtained is of high quality, the approach is insufficient to keep pace with candidate identification and therefore results in biases introduced during the selection of candidates for validation. However, recent advances across a range of disciplines are converging to enable functional genomic assays that can be conducted in a high-throughput manner. Here, we review one such method, massively parallel reporter assays (MPRAs), in which the activities of thousands of candidate genomic regulatory elements are simultaneously assessed via the next-generation sequencing of a barcoded reporter transcript. We discuss best practices for MPRA design and use, with a focus on practical considerations, and review how this emerging technology has been successfully deployed in vivo. Finally, we discuss how MPRAs are likely to evolve and be used in future cardiovascular research.
|
50
|
Kliesmete Z, Wange LE, Vieth B, Esgleas M, Radmer J, Hülsmann M, Geuder J, Richter D, Ohnuki M, Götz M, Hellmann I, Enard W. Regulatory and coding sequences of TRNP1 co-evolve with brain size and cortical folding in mammals. eLife 2023; 12:e83593. [PMID: 36947129 PMCID: PMC10032658 DOI: 10.7554/elife.83593] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/20/2022] [Accepted: 03/01/2023] [Indexed: 03/23/2023]
Abstract
Brain size and cortical folding have increased and decreased recurrently during mammalian evolution. Identifying genetic elements whose sequence or functional properties co-evolve with these traits can provide unique information on evolutionary and developmental mechanisms. A good candidate for such a comparative approach is TRNP1, as it controls proliferation of neural progenitors in mice and ferrets. Here, we investigate the contribution of both regulatory and coding sequences of TRNP1 to brain size and cortical folding in over 30 mammals. We find that the rate of TRNP1 protein evolution (ω) significantly correlates with brain size, slightly less with cortical folding and much less with body size. This brain correlation is stronger than for >95% of random control proteins. This co-evolution is likely affecting TRNP1 activity, as we find that TRNP1 from species with larger brains and more cortical folding induces higher proliferation rates in neural stem cells. Furthermore, we compare the activity of putative cis-regulatory elements (CREs) of TRNP1 in a massively parallel reporter assay and identify one CRE that likely co-evolves with cortical folding in Old World monkeys and apes. Our analyses indicate that coding and regulatory changes that increased TRNP1 activity were positively selected either as a cause or a consequence of increases in brain size and cortical folding. They also provide an example of how phylogenetic approaches can inform biological mechanisms, especially when combined with molecular phenotypes across several species.
Affiliation(s)
- Zane Kliesmete
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
| | - Lucas Esteban Wange
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
| | - Beate Vieth
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
| | - Miriam Esgleas
- Physiological Genomics, BioMedical Center - BMC, Ludwig-Maximilians-Universität, Munich, Germany
- Institute for Stem Cell Research, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
| | - Jessica Radmer
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
| | - Matthias Hülsmann
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
- Department of Environmental Microbiology, Eawag, Dübendorf, Switzerland
- Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland
| | - Johanna Geuder
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
| | - Daniel Richter
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
| | - Mari Ohnuki
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
| | - Magdalena Götz
- Physiological Genomics, BioMedical Center - BMC, Ludwig-Maximilians-Universität, Munich, Germany
- Institute for Stem Cell Research, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- SYNERGY, Excellence Cluster of Systems Neurology, BioMedical Center (BMC), Ludwig-Maximilians-Universität München, Munich, Germany
| | - Ines Hellmann
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
| | - Wolfgang Enard
- Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität, Munich, Germany
| |
|