1
|
Rong S, Root E, Reilly SK. Massively parallel approaches for characterizing noncoding functional variation in human evolution. Curr Opin Genet Dev 2024; 88:102256. [PMID: 39217658 DOI: 10.1016/j.gde.2024.102256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2024] [Revised: 08/02/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024]
Abstract
The genetic differences underlying unique phenotypes in humans compared to our closest primate relatives have long remained a mystery. Similarly, the genetic basis of adaptations between human groups during our expansion across the globe is poorly characterized. Uncovering the downstream phenotypic consequences of these genetic variants has been difficult, as a substantial portion lies in noncoding regions, such as cis-regulatory elements (CREs). Here, we review recent high-throughput approaches to measure the functions of CREs and the impact of variation within them. CRISPR screens can directly perturb CREs in the genome to understand downstream impacts on gene expression and phenotypes, while massively parallel reporter assays can decipher the regulatory impact of sequence variants. Machine learning has begun to be able to predict regulatory function from sequence alone, further scaling our ability to characterize genome function. Applying these tools across diverse phenotypes, model systems, and ancestries is beginning to revolutionize our understanding of noncoding variation underlying human evolution.
Collapse
Affiliation(s)
- Stephen Rong
- Department of Genetics, Yale University, New Haven, CT, USA.
| | - Elise Root
- Department of Genetics, Yale University, New Haven, CT, USA
| | - Steven K Reilly
- Department of Genetics, Yale University, New Haven, CT, USA; Wu Tsai Institute, Yale University, New Haven, CT, USA.
| |
Collapse
|
2
|
Fan S, Ma L, Song C, Han X, Zhong B, Lin Y. Promoter DNA methylation and transcription factor condensation are linked to transcriptional memory in mammalian cells. Cell Syst 2024; 15:808-823.e6. [PMID: 39243757 DOI: 10.1016/j.cels.2024.08.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2023] [Revised: 06/08/2024] [Accepted: 08/15/2024] [Indexed: 09/09/2024]
Abstract
The regulation of genes can be mathematically described by input-output functions that are typically assumed to be time invariant. This fundamental assumption underpins the design of synthetic gene circuits and the quantitative understanding of natural gene regulatory networks. Here, we found that this assumption is challenged in mammalian cells. We observed that a synthetic reporter gene can exhibit unexpected transcriptional memory, leading to a shift in the dose-response curve upon a second induction. Mechanistically, we investigated the cis-dependency of transcriptional memory, revealing the necessity of promoter DNA methylation in establishing memory. Furthermore, we showed that the synthetic transcription factor's effective DNA binding affinity underlies trans-dependency, which is associated with its capacity to undergo biomolecular condensation. These principles enabled modulating memory by perturbing either cis- or trans-regulation of genes. Together, our findings suggest the potential pervasiveness of transcriptional memory and implicate the need to model mammalian gene regulation with time-varying input-output functions. A record of this paper's transparent peer review process is included in the supplemental information.
Collapse
Affiliation(s)
- Shenqi Fan
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
| | - Liang Ma
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China.
| | - Chengzhi Song
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
| | - Xu Han
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
| | - Bijunyao Zhong
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Yihan Lin
- Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China; Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu 610213, Sichuan, China.
| |
Collapse
|
3
|
Bond ML, Quiroga-Barber IY, D’Costa S, Wu Y, Bell JL, McAfee JC, Kramer NE, Lee S, Patrucco M, Phanstiel DH, Won H. Deciphering the functional impact of Alzheimer's Disease-associated variants in resting and proinflammatory immune cells. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.09.13.24313654. [PMID: 39371155 PMCID: PMC11451667 DOI: 10.1101/2024.09.13.24313654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 10/08/2024]
Abstract
Genome-wide association studies have identified loci associated with Alzheimer's Disease (AD), but identifying the exact causal variants and genes at each locus is challenging due to linkage disequilibrium and their largely non-coding nature. To address this, we performed a massively parallel reporter assay of 3,576 AD-associated variants in THP-1 macrophages in both resting and proinflammatory states and identified 47 expression-modulating variants (emVars). To understand the endogenous chromatin context of emVars, we built an activity-by-contact model using epigenomic maps of macrophage inflammation and inferred condition-specific enhancer-promoter pairs. Intersection of emVars with enhancer-promoter pairs and microglia expression quantitative trait loci allowed us to connect 39 emVars to 76 putative AD risk genes enriched for AD-associated molecular signatures. Overall, systematic characterization of AD-associated variants enhances our understanding of the regulatory mechanisms underlying AD pathogenesis.
Collapse
Affiliation(s)
- Marielle L. Bond
- Curriculum in Genetics & Molecular Biology, University of North Carolina at Chapel Hill
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | | | - Susan D’Costa
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
| | - Yijia Wu
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | - Jessica L. Bell
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | - Jessica C. McAfee
- Curriculum in Genetics & Molecular Biology, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | - Nicole E. Kramer
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
| | - Sool Lee
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
| | - Mary Patrucco
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | - Douglas H. Phanstiel
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Department of Cell Biology & Physiology, University of North Carolina at Chapel Hill
| | - Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| |
Collapse
|
4
|
Petersen RM, Vockley CM, Lea AJ. Uncovering methylation-dependent genetic effects on regulatory element function in diverse genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.23.609412. [PMID: 39229133 PMCID: PMC11370585 DOI: 10.1101/2024.08.23.609412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2024]
Abstract
A major goal in evolutionary biology and biomedicine is to understand the complex interactions between genetic variants, the epigenome, and gene expression. However, the causal relationships between these factors remain poorly understood. mSTARR-seq, a methylation-sensitive massively parallel reporter assay, is capable of identifying methylation-dependent regulatory activity at many thousands of genomic regions simultaneously, and allows for the testing of causal relationships between DNA methylation and gene expression on a region-by-region basis. Here, we developed a multiplexed mSTARR-seq protocol to assay naturally occurring human genetic variation from 25 individuals sampled from 10 localities in Europe and Africa. We identified 6,957 regulatory elements in either the unmethylated or methylated state, and this set was enriched for enhancer and promoter annotations, as expected. The expression of 58% of these regulatory elements was modulated by methylation, which was generally associated with decreased RNA expression. Within our set of regulatory elements, we used allele-specific expression analyses to identify 8,020 sites with genetic effects on gene regulation; further, we found that 42.3% of these genetic effects varied between methylated and unmethylated states. Sites exhibiting methylation-dependent genetic effects were enriched for GWAS and EWAS annotations, implicating them in human disease. Compared to datasets that assay DNA from a single European individual, our multiplexed assay uncovers dramatically more genetic effects and methylation-dependent genetic effects, highlighting the importance of including diverse individuals in assays which aim to understand gene regulatory processes.
Collapse
|
5
|
Xu L, Liu Y. Identification, Design, and Application of Noncoding Cis-Regulatory Elements. Biomolecules 2024; 14:945. [PMID: 39199333 PMCID: PMC11352686 DOI: 10.3390/biom14080945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 09/01/2024] Open
Abstract
Cis-regulatory elements (CREs) play a pivotal role in orchestrating interactions with trans-regulatory factors such as transcription factors, RNA-binding proteins, and noncoding RNAs. These interactions are fundamental to the molecular architecture underpinning complex and diverse biological functions in living organisms, facilitating a myriad of sophisticated and dynamic processes. The rapid advancement in the identification and characterization of these regulatory elements has been marked by initiatives such as the Encyclopedia of DNA Elements (ENCODE) project, which represents a significant milestone in the field. Concurrently, the development of CRE detection technologies, exemplified by massively parallel reporter assays, has progressed at an impressive pace, providing powerful tools for CRE discovery. The exponential growth of multimodal functional genomic data has necessitated the application of advanced analytical methods. Deep learning algorithms, particularly large language models, have emerged as invaluable tools for deconstructing the intricate nucleotide sequences governing CRE function. These advancements facilitate precise predictions of CRE activity and enable the de novo design of CREs. A deeper understanding of CRE operational dynamics is crucial for harnessing their versatile regulatory properties. Such insights are instrumental in refining gene therapy techniques, enhancing the efficacy of selective breeding programs, pushing the boundaries of genetic innovation, and opening new possibilities in microbial synthetic biology.
Collapse
Affiliation(s)
- Lingna Xu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China;
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
| | - Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China;
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
| |
Collapse
|
6
|
Trégouët DA, Morange PE. Next-generation sequencing strategies in venous thromboembolism: in whom and for what purpose? J Thromb Haemost 2024; 22:1826-1834. [PMID: 38641321 DOI: 10.1016/j.jtha.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2024] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/21/2024]
Abstract
This invited review follows the oral presentation "To Sequence or Not to Sequence, That Is Not the Question; But 'When, Who, Which and What For?' Is" given during the State of the Art session "Translational Genomics in Thrombosis: From OMICs to Clinics" of the International Society on Thrombosis and Haemostasis 2023 Congress. Emphasizing the power of next-generation sequencing technologies and the diverse strategies associated with DNA variant analysis, this review highlights the unresolved questions and challenges in their implementation both for the clinical diagnosis of venous thromboembolism and in translational research.
Collapse
Affiliation(s)
- David-Alexandre Trégouët
- University of Bordeaux, Institut National de la Santé et de la Recherche Médicale, Bordeaux Population Health Research Center, Unité Mixte de Recherche 1219, Bordeaux, France.
| | - Pierre-Emmanuel Morange
- Cardiovascular and Nutrition Research Center (Centre de Recherche en CardioVasculaire et Nutrition), Institut National de la Santé et de la Recherche Médicale, Institut National de Recherche pour l'agriculture, l' Alimentation et l'Environnement, Aix-Marseille University, Marseille, France
| |
Collapse
|
7
|
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
Collapse
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Department of Biology, University of Florida, Gainesville, FL, 32611
| | - David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| | - Justin B. Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
| |
Collapse
|
8
|
Iñiguez-Muñoz S, Llinàs-Arias P, Ensenyat-Mendez M, Bedoya-López AF, Orozco JIJ, Cortés J, Roy A, Forsberg-Nilsson K, DiNome ML, Marzese DM. Hidden secrets of the cancer genome: unlocking the impact of non-coding mutations in gene regulatory elements. Cell Mol Life Sci 2024; 81:274. [PMID: 38902506 PMCID: PMC11335195 DOI: 10.1007/s00018-024-05314-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Revised: 12/07/2023] [Accepted: 06/06/2024] [Indexed: 06/22/2024]
Abstract
Discoveries in the field of genomics have revealed that non-coding genomic regions are not merely "junk DNA", but rather comprise critical elements involved in gene expression. These gene regulatory elements (GREs) include enhancers, insulators, silencers, and gene promoters. Notably, new evidence shows how mutations within these regions substantially influence gene expression programs, especially in the context of cancer. Advances in high-throughput sequencing technologies have accelerated the identification of somatic and germline single nucleotide mutations in non-coding genomic regions. This review provides an overview of somatic and germline non-coding single nucleotide alterations affecting transcription factor binding sites in GREs, specifically involved in cancer biology. It also summarizes the technologies available for exploring GREs and the challenges associated with studying and characterizing non-coding single nucleotide mutations. Understanding the role of GRE alterations in cancer is essential for improving diagnostic and prognostic capabilities in the precision medicine era, leading to enhanced patient-centered clinical outcomes.
Collapse
Affiliation(s)
- Sandra Iñiguez-Muñoz
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
| | - Pere Llinàs-Arias
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
| | - Miquel Ensenyat-Mendez
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
| | - Andrés F Bedoya-López
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
| | - Javier I J Orozco
- Saint John's Cancer Institute, Providence Saint John's Health Center, Santa Monica, CA, USA
| | - Javier Cortés
- International Breast Cancer Center (IBCC), Pangaea Oncology, Quiron Group, 08017, Barcelona, Spain
- Medica Scientia Innovation Research SL (MEDSIR), 08018, Barcelona, Spain
- Faculty of Biomedical and Health Sciences, Department of Medicine, Universidad Europea de Madrid, 28670, Madrid, Spain
| | - Ananya Roy
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Karin Forsberg-Nilsson
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- University of Nottingham Biodiscovery Institute, Nottingham, UK
| | - Maggie L DiNome
- Department of Surgery, Duke University School of Medicine, Durham, NC, USA
| | - Diego M Marzese
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain.
- Department of Surgery, Duke University School of Medicine, Durham, NC, USA.
| |
Collapse
|
9
|
Lalanne JB, Regalado SG, Domcke S, Calderon D, Martin BK, Li X, Li T, Suiter CC, Lee C, Trapnell C, Shendure J. Multiplex profiling of developmental cis-regulatory elements with quantitative single-cell expression reporters. Nat Methods 2024; 21:983-993. [PMID: 38724692 PMCID: PMC11166576 DOI: 10.1038/s41592-024-02260-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2023] [Accepted: 03/22/2024] [Indexed: 06/13/2024]
Abstract
The inability to scalably and precisely measure the activity of developmental cis-regulatory elements (CREs) in multicellular systems is a bottleneck in genomics. Here we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays. The resulting measurement of reporter expression is accurate over multiple orders of magnitude, with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode stabilization via circularization, these scalable single-cell quantitative expression reporters provide high-contrast readouts, analogous to classic in situ assays but entirely from sequencing. Screening >200 regions of accessible chromatin in a multicellular in vitro model of early mammalian development, we identify 13 (8 previously uncharacterized) autonomous and cell-type-specific developmental CREs. We further demonstrate that chimeric CRE pairs generate cognate two-cell-type activity profiles and assess gain- and loss-of-function multicellular expression phenotypes from CRE variants with perturbed transcription factor binding sites. Single-cell quantitative expression reporters can be applied in developmental and multicellular systems to quantitatively characterize native, perturbed and synthetic CREs at scale, with high sensitivity and at single-cell resolution.
Collapse
Affiliation(s)
| | - Samuel G Regalado
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Silvia Domcke
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Diego Calderon
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Xiaoyi Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Tony Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Chase C Suiter
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
| |
Collapse
|
10
|
Tong Y, Sun J, Chen Y, Yi C, Wang H, Li C, Dai N, Yang G. POSoligo software for in vitro gene synthesis. Sci Rep 2024; 14:11117. [PMID: 38750104 PMCID: PMC11096389 DOI: 10.1038/s41598-024-59497-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2023] [Accepted: 04/11/2024] [Indexed: 05/18/2024] Open
Abstract
Oligonucleotide synthesis is vital for molecular experiments. Bioinformatics has been employed to create various algorithmic tools for the in vitro synthesis of nucleotides. The main approach to synthesizing long-chain DNA molecules involves linking short-chain oligonucleotides through ligase chain reaction (LCR) and polymerase chain reaction (PCR). Short-chain DNA molecules have low mutation rates, while LCR requires complementary interfaces at both ends of the two nucleic acid molecules or may alter the conformation of the nucleotide chain, leading to termination of amplification. Therefore, molecular melting temperature, length, and specificity must be considered during experimental design. POSoligo is a specialized offline tool for nucleotide fragment synthesis. It optimizes the oligonucleotide length and specificity based on input single-stranded DNA, producing multiple contiguous long strands (COS) and short patch strands (POS) with complementary ends. This process ensures free 5'- and 3'-ends during oligonucleotide synthesis, preventing secondary structure formation and ensuring specific binding between COS and POS without relying on stabilizing the complementary strands based on Tm values. POSoligo was used to synthesize the linear RBD sequence of SARS-CoV-2 using only one DNA strand, several POSs for LCR ligation, and two pairs of primers for PCR amplification in a time- and cost-effective manner.
Collapse
Affiliation(s)
- Yingying Tong
- International Research Center for Biological Sciences, Ministry of Science and Technology, Shanghai Ocean University, Shanghai, 201306, China
- National Aquatic Animal Pathogen Collection Center, Shanghai Ocean University, Shanghai, 201306, China
- Aquatic Animal Genetics and Breeding Center, Shanghai Ocean University, Shanghai, 201306, China
| | - Jie Sun
- International Research Center for Biological Sciences, Ministry of Science and Technology, Shanghai Ocean University, Shanghai, 201306, China
- National Aquatic Animal Pathogen Collection Center, Shanghai Ocean University, Shanghai, 201306, China
- Aquatic Animal Genetics and Breeding Center, Shanghai Ocean University, Shanghai, 201306, China
| | - Yang Chen
- International Research Center for Biological Sciences, Ministry of Science and Technology, Shanghai Ocean University, Shanghai, 201306, China
- National Aquatic Animal Pathogen Collection Center, Shanghai Ocean University, Shanghai, 201306, China
- Aquatic Animal Genetics and Breeding Center, Shanghai Ocean University, Shanghai, 201306, China
| | - Changhua Yi
- International Research Center for Biological Sciences, Ministry of Science and Technology, Shanghai Ocean University, Shanghai, 201306, China
- National Aquatic Animal Pathogen Collection Center, Shanghai Ocean University, Shanghai, 201306, China
- Aquatic Animal Genetics and Breeding Center, Shanghai Ocean University, Shanghai, 201306, China
| | - Hua Wang
- Shanghai Telebio Biomedical Technology Co., LTD, Shanghai, China
| | - Caixin Li
- Shanghai Telebio Biomedical Technology Co., LTD, Shanghai, China
| | - Nana Dai
- Anhui Health College, Chizhou, 247100, Anhui, China
| | - Guanghua Yang
- International Research Center for Biological Sciences, Ministry of Science and Technology, Shanghai Ocean University, Shanghai, 201306, China.
- National Aquatic Animal Pathogen Collection Center, Shanghai Ocean University, Shanghai, 201306, China.
- Aquatic Animal Genetics and Breeding Center, Shanghai Ocean University, Shanghai, 201306, China.
| |
Collapse
|
11
|
Siraj L, Castro RI, Dewey H, Kales S, Nguyen TTL, Kanai M, Berenzy D, Mouri K, Wang QS, McCaw ZR, Gosai SJ, Aguet F, Cui R, Vockley CM, Lareau CA, Okada Y, Gusev A, Jones TR, Lander ES, Sabeti PC, Finucane HK, Reilly SK, Ulirsch JC, Tewhey R. Functional dissection of complex and molecular trait variants at single nucleotide resolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.05.592437. [PMID: 38766054 PMCID: PMC11100724 DOI: 10.1101/2024.05.05.592437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2024]
Abstract
Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. The majority of these variants have individually weak effects and lie in non-coding gene-regulatory elements where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we measured the activity of 221,412 trait-associated variants that had been statistically fine-mapped using a Massively Parallel Reporter Assay (MPRA) in 5 diverse cell-types. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.
Collapse
Affiliation(s)
- Layla Siraj
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biophysics, Harvard Graduate School of Arts and Sciences, Boston, MA, USA
- Harvard-Massachusetts Institute of Technology MD/PhD Program, Harvard Medical School, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | | | | | | | | | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA
| | | | | | - Qingbo S. Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
| | | | - Sager J. Gosai
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - François Aguet
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Ran Cui
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Caleb A. Lareau
- Program in Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
| | - Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
| | - Thouis R. Jones
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Eric S. Lander
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Pardis C. Sabeti
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Hilary K. Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
| | - Steven K. Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
| | - Jacob C. Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
| | - Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
| |
Collapse
|
12
|
Kosicki M, Cintrón DL, Page NF, Georgakopoulos-Soares I, Akiyama JA, Plajzer-Frick I, Novak CS, Kato M, Hunter RD, von Maydell K, Barton S, Godfrey P, Beckman E, Sanders SJ, Pennacchio LA, Ahituv N. Massively parallel reporter assays and mouse transgenic assays provide complementary information about neuronal enhancer activity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.22.590634. [PMID: 38712228 PMCID: PMC11071441 DOI: 10.1101/2024.04.22.590634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2024]
Abstract
Genetic studies find hundreds of thousands of noncoding variants associated with psychiatric disorders. Massively parallel reporter assays (MPRAs) and in vivo transgenic mouse assays can be used to assay the impact of these variants. However, the relevance of MPRAs to in vivo function is unknown and transgenic assays suffer from low throughput. Here, we studied the utility of combining the two assays to study the impact of non-coding variants. We carried out an MPRA on over 50,000 sequences derived from enhancers validated in transgenic mouse assays and from multiple fetal neuronal ATAC-seq datasets. We also tested over 20,000 variants, including synthetic mutations in highly active neuronal enhancers and 177 common variants associated with psychiatric disorders. Variants with a high impact on MPRA activity were further tested in mice. We found a strong and specific correlation between MPRA and mouse neuronal enhancer activity including changes in neuronal enhancer activity in mouse embryos for variants with strong MPRA effects. Mouse assays also revealed pleiotropic variant effects that could not be observed in MPRA. Our work provides a large catalog of functional neuronal enhancers and variant effects and highlights the effectiveness of combining MPRAs and mouse transgenic assays.
Collapse
Affiliation(s)
- Michael Kosicki
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Dianne Laboy Cintrón
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
| | - Nicholas F. Page
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Psychiatry and Behavioral Sciences, Kavli Institute for Fundamental Neuroscience, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
| | - Jennifer A. Akiyama
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Ingrid Plajzer-Frick
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Catherine S. Novak
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Momoe Kato
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Riana D. Hunter
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Kianna von Maydell
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Sarah Barton
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Patrick Godfrey
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Erik Beckman
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Stephan J. Sanders
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Psychiatry and Behavioral Sciences, Kavli Institute for Fundamental Neuroscience, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 16 7TY, UK
| | - Len A. Pennacchio
- Environmental Genomics & System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
| |
Collapse
|
13
|
Fu Y, Kelly JA, Gopalakrishnan J, Pelikan RC, Tessneer KL, Pasula S, Grundahl K, Murphy DA, Gaffney PM. Massively parallel reporter assay confirms regulatory potential of hQTLs and reveals important variants in lupus and other autoimmune diseases. HGG ADVANCES 2024; 5:100279. [PMID: 38389303 PMCID: PMC10943488 DOI: 10.1016/j.xhgg.2024.100279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2023] [Revised: 02/15/2024] [Accepted: 02/18/2024] [Indexed: 02/24/2024] Open
Abstract
We designed a massively parallel reporter assay (MPRA) in an Epstein-Barr virus transformed B cell line to directly characterize the potential for histone post-translational modifications, i.e., histone quantitative trait loci (hQTLs), expression QTLs (eQTLs), and variants on systemic lupus erythematosus (SLE) and autoimmune (AI) disease risk haplotypes to modulate regulatory activity in an allele-dependent manner. Our study demonstrates that hQTLs, as a group, are more likely to modulate regulatory activity in an MPRA compared with other variant classes tested, including a set of eQTLs previously shown to interact with hQTLs and tested AI risk variants. In addition, we nominate 17 variants (including 11 previously unreported) as putative causal variants for SLE and another 14 for various other AI diseases, prioritizing these variants for future functional studies in primary and immortalized B cells. Thus, we uncover important insights into the mechanistic relationships among genotype, epigenetics, and gene expression in SLE and AI disease phenotypes.
Collapse
Affiliation(s)
- Yao Fu
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
| | - Jennifer A Kelly
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
| | - Jaanam Gopalakrishnan
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA; Neuro-Immune Regulome Unit, National Eye Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Richard C Pelikan
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
| | - Kandice L Tessneer
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
| | - Satish Pasula
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
| | - Kiely Grundahl
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
| | - David A Murphy
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
| | - Patrick M Gaffney
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA.
| |
Collapse
|
14
|
Liu J, Ashuach T, Inoue F, Ahituv N, Yosef N, Kreimer A. Optimizing sequence design strategies for perturbation MPRAs: a computational evaluation framework. Nucleic Acids Res 2024; 52:1613-1627. [PMID: 38296821 PMCID: PMC10939410 DOI: 10.1093/nar/gkae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 12/26/2023] [Accepted: 01/12/2024] [Indexed: 02/02/2024] Open
Abstract
The advent of perturbation-based massively parallel reporter assays (MPRAs) technique has facilitated the delineation of the roles of non-coding regulatory elements in orchestrating gene expression. However, computational efforts remain scant to evaluate and establish guidelines for sequence design strategies for perturbation MPRAs. In this study, we propose a framework for evaluating and comparing various perturbation strategies for MPRA experiments. Within this framework, we benchmark three different perturbation approaches from the perspectives of alteration in motif-based profiles, consistency of MPRA outputs, and robustness of models that predict the activities of putative regulatory motifs. While our analyses show very similar results across multiple benchmarking metrics, the predictive modeling for the approach involving random nucleotide shuffling shows significant robustness compared with the other two approaches. Thus, we recommend designing sequences by randomly shuffling the nucleotides of the perturbed site in perturbation-MPRA, followed by a coherence check to prevent the introduction of other variations of the target motifs. In summary, our evaluation framework and the benchmarking findings create a resource of computational pipelines and highlight the potential of perturbation-MPRA in predicting non-coding regulatory activities.
Collapse
Affiliation(s)
- Jiayi Liu
- Graduate Program in Cell & Developmental Biology, Rutgers, The State University of New Jersey, 604 Allison Rd, Piscataway, NJ 08854, USA
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ 08854, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, Piscataway, NJ 08854, USA
| | - Tal Ashuach
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, 387 Soda Hall, Berkeley, CA 94720, USA
| | - Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Faculty of Medicine Building B, Yoshidatachibanacho, Sakyo Ward, Kyoto 606-8303, Japan
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, 1700 4th Street, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California, 513 Parnassus Ave, San Francisco, CA 94143, USA
| | - Nir Yosef
- Department of Systems Immunology, Weizmann Institute of Science, 234 Herzl Street, Rehovot 7610001, Israel
- Chan-Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158, USA
- Department of Systems Immunology, Ragon Institute of MGH, MIT, and Harvard Institute of Science, 400 Technology Square, Cambridge, MA 02139, USA
| | - Anat Kreimer
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ 08854, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, Piscataway, NJ 08854, USA
| |
Collapse
|
15
|
Kwak IY, Kim BC, Lee J, Kang T, Garry DJ, Zhang J, Gong W. Proformer: a hybrid macaron transformer model predicts expression values from promoter sequences. BMC Bioinformatics 2024; 25:81. [PMID: 38378442 PMCID: PMC10877777 DOI: 10.1186/s12859-024-05645-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] Open
Abstract
The breakthrough high-throughput measurement of the cis-regulatory activity of millions of randomly generated promoters provides an unprecedented opportunity to systematically decode the cis-regulatory logic that determines the expression values. We developed an end-to-end transformer encoder architecture named Proformer to predict the expression values from DNA sequences. Proformer used a Macaron-like Transformer encoder architecture, where two half-step feed forward (FFN) layers were placed at the beginning and the end of each encoder block, and a separable 1D convolution layer was inserted after the first FFN layer and in front of the multi-head attention layer. The sliding k-mers from one-hot encoded sequences were mapped onto a continuous embedding, combined with the learned positional embedding and strand embedding (forward strand vs. reverse complemented strand) as the sequence input. Moreover, Proformer introduced multiple expression heads with mask filling to prevent the transformer models from collapsing when training on relatively small amount of data. We empirically determined that this design had significantly better performance than the conventional design such as using the global pooling layer as the output layer for the regression task. These analyses support the notion that Proformer provides a novel method of learning and enhances our understanding of how cis-regulatory sequences determine the expression values.
Collapse
Affiliation(s)
- Il-Youp Kwak
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
| | - Byeong-Chan Kim
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
| | - Juhyun Lee
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
| | - Taein Kang
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
| | - Daniel J Garry
- Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA.
- Stem Cell Institute, University of Minnesota, Minneapolis, MN, 55455, USA.
- Paul and Sheila Wellstone Muscular Dystrophy Center, University of Minnesota, Minneapolis, MN, 55455, USA.
| | - Jianyi Zhang
- Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, 35233, USA
| | - Wuming Gong
- Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA.
| |
Collapse
|
16
|
Boye C, Kalita CA, Findley AS, Alazizi A, Wei J, Wen X, Pique-Regi R, Luca F. Characterization of caffeine response regulatory variants in vascular endothelial cells. eLife 2024; 13:e85235. [PMID: 38334359 PMCID: PMC10901511 DOI: 10.7554/elife.85235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2022] [Accepted: 02/08/2024] [Indexed: 02/10/2024] Open
Abstract
Genetic variants in gene regulatory sequences can modify gene expression and mediate the molecular response to environmental stimuli. In addition, genotype-environment interactions (GxE) contribute to complex traits such as cardiovascular disease. Caffeine is the most widely consumed stimulant and is known to produce a vascular response. To investigate GxE for caffeine, we treated vascular endothelial cells with caffeine and used a massively parallel reporter assay to measure allelic effects on gene regulation for over 43,000 genetic variants. We identified 665 variants with allelic effects on gene regulation and 6 variants that regulate the gene expression response to caffeine (GxE, false discovery rate [FDR] < 5%). When overlapping our GxE results with expression quantitative trait loci colocalized with coronary artery disease and hypertension, we dissected their regulatory mechanisms and showed a modulatory role for caffeine. Our results demonstrate that massively parallel reporter assay is a powerful approach to identify and molecularly characterize GxE in the specific context of caffeine consumption.
Collapse
Affiliation(s)
- Carly Boye
- Center for Molecular Medicine and Genetics, Wayne State UniversityDetroitUnited States
| | - Cynthia A Kalita
- Center for Molecular Medicine and Genetics, Wayne State UniversityDetroitUnited States
| | - Anthony S Findley
- Center for Molecular Medicine and Genetics, Wayne State UniversityDetroitUnited States
| | - Adnan Alazizi
- Center for Molecular Medicine and Genetics, Wayne State UniversityDetroitUnited States
| | - Julong Wei
- Center for Molecular Medicine and Genetics, Wayne State UniversityDetroitUnited States
| | - Xiaoquan Wen
- Department of Biostatistics, University of MichiganAnn ArborUnited States
| | - Roger Pique-Regi
- Center for Molecular Medicine and Genetics, Wayne State UniversityDetroitUnited States
- Department of Obstetrics and Gynecology, Wayne State UniversityDetroitUnited States
| | - Francesca Luca
- Center for Molecular Medicine and Genetics, Wayne State UniversityDetroitUnited States
- Department of Obstetrics and Gynecology, Wayne State UniversityDetroitUnited States
- Department of Biology, University of Rome Tor VergataRomeItaly
| |
Collapse
|
17
|
Taskiran II, Spanier KI, Dickmänken H, Kempynck N, Pančíková A, Ekşi EC, Hulselmans G, Ismail JN, Theunis K, Vandepoel R, Christiaens V, Mauduit D, Aerts S. Cell-type-directed design of synthetic enhancers. Nature 2024; 626:212-220. [PMID: 38086419 PMCID: PMC10830415 DOI: 10.1038/s41586-023-06936-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Accepted: 12/05/2023] [Indexed: 01/19/2024]
Abstract
Transcriptional enhancers act as docking stations for combinations of transcription factors and thereby regulate spatiotemporal activation of their target genes1. It has been a long-standing goal in the field to decode the regulatory logic of an enhancer and to understand the details of how spatiotemporal gene expression is encoded in an enhancer sequence. Here we show that deep learning models2-6, can be used to efficiently design synthetic, cell-type-specific enhancers, starting from random sequences, and that this optimization process allows detailed tracing of enhancer features at single-nucleotide resolution. We evaluate the function of fully synthetic enhancers to specifically target Kenyon cells or glial cells in the fruit fly brain using transgenic animals. We further exploit enhancer design to create 'dual-code' enhancers that target two cell types and minimal enhancers smaller than 50 base pairs that are fully functional. By examining the state space searches towards local optima, we characterize enhancer codes through the strength, combination and arrangement of transcription factor activator and transcription factor repressor motifs. Finally, we apply the same strategies to successfully design human enhancers, which adhere to enhancer rules similar to those of Drosophila enhancers. Enhancer design guided by deep learning leads to better understanding of how enhancers work and shows that their code can be exploited to manipulate cell states.
Collapse
Affiliation(s)
- Ibrahim I Taskiran
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Katina I Spanier
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Hannah Dickmänken
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Niklas Kempynck
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Alexandra Pančíková
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB-KULeuven Center for Cancer Biology, Leuven, Belgium
| | - Eren Can Ekşi
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Gert Hulselmans
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Joy N Ismail
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- UK Dementia Research Institute at Imperial College London, London, UK
| | - Koen Theunis
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Roel Vandepoel
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Valerie Christiaens
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - David Mauduit
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Stein Aerts
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium.
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium.
- Department of Human Genetics, KU Leuven, Leuven, Belgium.
| |
Collapse
|
18
|
Zhu X, Huang Q, Huang L, Luo J, Li Q, Kong D, Deng B, Gu Y, Wang X, Li C, Kong S, Zhang Y. MAE-seq refines regulatory elements across the genome. Nucleic Acids Res 2024; 52:e9. [PMID: 38038259 PMCID: PMC10810209 DOI: 10.1093/nar/gkad1129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2022] [Revised: 10/23/2023] [Accepted: 11/10/2023] [Indexed: 12/02/2023] Open
Abstract
Proper cell fate determination relies on precise spatial and temporal genome-wide cooperation between regulatory elements (REs) and their targeted genes. However, the lengths of REs defined using different methods vary, which indicates that there is sequence redundancy and that the context of the genome may be unintelligible. We developed a method called MAE-seq (Massive Active Enhancers by Sequencing) to experimentally identify functional REs at a 25-bp scale. In this study, MAE-seq was used to identify 626879, 541617 and 554826 25-bp enhancers in mouse embryonic stem cells (mESCs), C2C12 and HEK 293T, respectively. Using ∼1.6 trillion 25 bp DNA fragments and screening 12 billion cells, we identified 626879 as active enhancers in mESCs as an example. Comparative analysis revealed that most of the histone modification datasets were annotated by MAE-Seq loci. Furthermore, 33.85% (212195) of the identified enhancers were identified as de novo ones with no epigenetic modification. Intriguingly, distinct chromatin states dictate the requirement for dissimilar cofactors in governing novel and known enhancers. Validation results show that these 25-bp sequences could act as a functional unit, which shows identical or similar expression patterns as the previously defined larger elements, Enhanced resolution facilitated the identification of numerous cell-specific enhancers and their accurate annotation as super enhancers. Moreover, we characterized novel elements capable of augmenting gene activity. By integrating with high-resolution Hi-C data, over 55.64% of novel elements may have a distal association with different targeted genes. For example, we found that the Cdh1 gene interacts with one novel and two known REs in mESCs. The biological effects of these interactions were investigated using CRISPR-Cas9, revealing their role in coordinating Cdh1 gene expression and mESC proliferation. Our study presents an experimental approach to refine the REs at 25-bp resolution, advancing the precision of genome annotation and unveiling the underlying genome context. This novel approach not only advances our understanding of gene regulation but also opens avenues for comprehensive exploration of the genomic landscape.
Collapse
Affiliation(s)
- Xiusheng Zhu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Qitong Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Department of animal sciences, Wageningen University & Research, Wageningen, 6708PB, Netherlands
| | - Lei Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Jing Luo
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Qing Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Dashuai Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Biao Deng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Yi Gu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Xueyan Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Chenying Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Siyuan Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Yubo Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Kunpeng Institute of Modern Agriculture at Foshan, Foshan, 528225, China
| |
Collapse
|
19
|
Brown EA, Kales S, Boyle MJ, Vitti J, Kotliar D, Schaffner S, Tewhey R, Sabeti PC. Three linked variants have opposing regulatory effects on isovaleryl-CoA dehydrogenase gene expression. Hum Mol Genet 2024; 33:270-283. [PMID: 37930192 PMCID: PMC10800014 DOI: 10.1093/hmg/ddad177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2023] [Revised: 10/03/2023] [Accepted: 10/09/2023] [Indexed: 11/07/2023] Open
Abstract
While genome-wide association studies (GWAS) and positive selection scans identify genomic loci driving human phenotypic diversity, functional validation is required to discover the variant(s) responsible. We dissected the IVD gene locus-which encodes the isovaleryl-CoA dehydrogenase enzyme-implicated by selection statistics, multiple GWAS, and clinical genetics as important to function and fitness. We combined luciferase assays, CRISPR/Cas9 genome-editing, massively parallel reporter assays (MPRA), and a deletion tiling MPRA strategy across regulatory loci. We identified three regulatory variants, including an indel, that may underpin GWAS signals for pulmonary fibrosis and testosterone, and that are linked on a positively selected haplotype in the Japanese population. These regulatory variants exhibit synergistic and opposing effects on IVD expression experimentally. Alleles at these variants lie on a haplotype tagged by the variant most strongly associated with IVD expression and metabolites, but with no functional evidence itself. This work demonstrates how comprehensive functional investigation and multiple technologies are needed to discover the true genetic drivers of phenotypic diversity.
Collapse
Affiliation(s)
- Elizabeth A Brown
- The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
- Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
| | - Susan Kales
- The Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, United States
| | - Michael James Boyle
- The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
| | - Joseph Vitti
- The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
- Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
| | - Dylan Kotliar
- The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
- Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
| | - Steve Schaffner
- Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
| | - Ryan Tewhey
- The Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, United States
| | - Pardis C Sabeti
- The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
- Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
- Howard Hughes Medical Institute, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
| |
Collapse
|
20
|
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024] Open
Abstract
Massively parallel reporter assay measures transcriptional activities of various cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational model framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms of cis-regulatory mutations of a synthetic enhancer that cause abnormal gene expression. We found that, in a human cell line, competitive binding between family transcription factors (TFs) with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single- or multi-mutations. We also discovered that even if various harmful mutations occurred in an activator binding site, CRM could stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance understanding the effect of SNPs and indels on CRMs and would help building robust custom-designed CRMs for biologics production and gene therapy.
Collapse
Affiliation(s)
- Chan-Koo Kang
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
| | - Ah-Ram Kim
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
| |
Collapse
|
21
|
Bravo González-Blas C, Matetovici I, Hillen H, Taskiran II, Vandepoel R, Christiaens V, Sansores-García L, Verboven E, Hulselmans G, Poovathingal S, Demeulemeester J, Psatha N, Mauduit D, Halder G, Aerts S. Single-cell spatial multi-omics and deep learning dissect enhancer-driven gene regulatory networks in liver zonation. Nat Cell Biol 2024; 26:153-167. [PMID: 38182825 PMCID: PMC10791584 DOI: 10.1038/s41556-023-01316-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2022] [Accepted: 11/15/2023] [Indexed: 01/07/2024]
Abstract
In the mammalian liver, hepatocytes exhibit diverse metabolic and functional profiles based on their location within the liver lobule. However, it is unclear whether this spatial variation, called zonation, is governed by a well-defined gene regulatory code. Here, using a combination of single-cell multiomics, spatial omics, massively parallel reporter assays and deep learning, we mapped enhancer-gene regulatory networks across mouse liver cell types. We found that zonation affects gene expression and chromatin accessibility in hepatocytes, among other cell types. These states are driven by the repressors TCF7L1 and TBX3, alongside other core hepatocyte transcription factors, such as HNF4A, CEBPA, FOXA1 and ONECUT1. To examine the architecture of the enhancers driving these cell states, we trained a hierarchical deep learning model called DeepLiver. Our study provides a multimodal understanding of the regulatory code underlying hepatocyte identity and their zonation state that can be used to engineer enhancers with specific activity levels and zonation patterns.
Collapse
Affiliation(s)
- Carmen Bravo González-Blas
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Irina Matetovici
- VIB Center for Brain & Disease Research, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
- VIB Tech Watch, VIB Headquarters, Ghent, Belgium
| | - Hanne Hillen
- VIB Center for Cancer Biology, Leuven, Belgium
- Department of Oncology, KU Leuven, Leuven, Belgium
| | - Ibrahim Ihsan Taskiran
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
| | - Roel Vandepoel
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
| | - Valerie Christiaens
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
| | - Leticia Sansores-García
- VIB Center for Cancer Biology, Leuven, Belgium
- Department of Oncology, KU Leuven, Leuven, Belgium
| | - Elisabeth Verboven
- VIB Center for Cancer Biology, Leuven, Belgium
- Department of Oncology, KU Leuven, Leuven, Belgium
| | - Gert Hulselmans
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
| | | | - Jonas Demeulemeester
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Nikoleta Psatha
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - David Mauduit
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
| | - Georg Halder
- VIB Center for Cancer Biology, Leuven, Belgium
- Department of Oncology, KU Leuven, Leuven, Belgium
| | - Stein Aerts
- VIB Center for Brain & Disease Research, Leuven, Belgium.
- Department of Human Genetics, KU Leuven, Leuven, Belgium.
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium.
| |
Collapse
|
22
|
Hollingsworth EW, Liu TA, Jacinto SH, Chen CX, Alcantara JA, Kvon EZ. Rapid and Quantitative Functional Interrogation of Human Enhancer Variant Activity in Live Mice. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.10.570890. [PMID: 38105996 PMCID: PMC10723448 DOI: 10.1101/2023.12.10.570890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/19/2023]
Abstract
Functional analysis of non-coding variants associated with human congenital disorders remains challenging due to the lack of efficient in vivo models. Here we introduce dual-enSERT, a robust Cas9-based two-color fluorescent reporter system which enables rapid, quantitative comparison of enhancer allele activities in live mice of any genetic background. We use this new technology to examine and measure the gain- and loss-of-function effects of enhancer variants linked to limb polydactyly, autism, and craniofacial malformation. By combining dual-enSERT with single-cell transcriptomics, we characterize variant enhancer alleles at cellular resolution, thereby implicating candidate molecular pathways in pathogenic enhancer misregulation. We further show that independent, polydactyly-linked enhancer variants lead to ectopic expression in the same cell populations, indicating shared genetic mechanisms underlying non-coding variant pathogenesis. Finally, we streamline dual-enSERT for analysis in F0 animals by placing both reporters on the same transgene separated by a synthetic insulator. Dual-enSERT allows researchers to go from identifying candidate enhancer variants to analysis of comparative enhancer activity in live embryos in under two weeks.
Collapse
Affiliation(s)
- Ethan W. Hollingsworth
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
- Medical Scientist Training Program, University of California, Irvine School of Medicine, Irvine, CA 92697, USA
| | - Taryn A. Liu
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
| | - Sandra H. Jacinto
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
| | - Cindy X. Chen
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
| | - Joshua A. Alcantara
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
| | - Evgeny Z. Kvon
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
| |
Collapse
|
23
|
Abstract
Enhancers are cis-regulatory elements that can stimulate gene expression from distance, and drive precise spatiotemporal gene expression profiles during development. Functional enhancers display specific features including an open chromatin conformation, Histone H3 lysine 27 acetylation, Histone H3 lysine 4 mono-methylation enrichment, and enhancer RNAs production. These features are modified upon developmental cues which impacts their activity. In this review, we describe the current state of knowledge about enhancer functions and the diverse chromatin signatures found on enhancers. We also discuss the dynamic changes of enhancer chromatin signatures, and their impact on lineage specific gene expression profiles, during development or cellular differentiation.
Collapse
Affiliation(s)
- Amandine Barral
- Institute for Regenerative Medicine, Epigenetics Institute, Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA,CONTACT Amandine Barral Institute for Regenerative Medicine, Epigenetics Institute, Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania. 3400 Civic Blvd, Philadelphia, Pennsylvania19104, USA
| | - Jérôme Déjardin
- Biology of repetitive sequences, Institute of Human Genetics CNRS-Université de Montpellier UMR 9002, Montpellier, France,Jérôme Déjardin Biology of repetitive sequences, Institute of Human Genetics CNRS-Université de Montpellier UMR 9002, 141 rue de la Cardonille, Montpellier34000, France
| |
Collapse
|
24
|
Lee D, Han SK, Yaacov O, Berk-Rauch H, Mathiyalagan P, Ganesh SK, Chakravarti A. Tissue-specific and tissue-agnostic effects of genome sequence variation modulating blood pressure. Cell Rep 2023; 42:113351. [PMID: 37910504 PMCID: PMC10726310 DOI: 10.1016/j.celrep.2023.113351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2022] [Revised: 09/21/2023] [Accepted: 10/11/2023] [Indexed: 11/03/2023] Open
Abstract
Genome-wide association studies (GWASs) have identified numerous variants associated with polygenic traits and diseases. However, with few exceptions, a mechanistic understanding of which variants affect which genes in which tissues to modulate trait variation is lacking. Here, we present genomic analyses to explain trait heritability of blood pressure (BP) through the genetics of transcriptional regulation using GWASs, multiomics data from different tissues, and machine learning approaches. Approximately 500,000 predicted regulatory variants across four tissues explain 33.4% of variant heritability: 2.5%, 5.3%, 7.7%, and 11.8% for kidney-, adrenal-, heart-, and artery-specific variants, respectively. Variation in the enhancers involved shows greater tissue specificity than in the genes they regulate, suggesting that gene regulatory networks perturbed by enhancer variants in a tissue relevant to a phenotype are the major source of interindividual variation in BP. Thus, our study provides an approach to scan human tissue and cell types for their physiological contribution to any trait.
Collapse
Affiliation(s)
- Dongwon Lee
- Department of Pediatrics, Division of Nephrology, Boston Children's Hospital, Boston & Harvard Medical School, Boston, MA, USA.
| | - Seong Kyu Han
- Department of Pediatrics, Division of Nephrology, Boston Children's Hospital, Boston & Harvard Medical School, Boston, MA, USA
| | - Or Yaacov
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA
| | - Hanna Berk-Rauch
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA
| | - Prabhu Mathiyalagan
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA
| | - Santhi K Ganesh
- Department of Internal Medicine & Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Aravinda Chakravarti
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA.
| |
Collapse
|
25
|
Zhao J, Baltoumas FA, Konnaris MA, Mouratidis I, Liu Z, Sims J, Agarwal V, Pavlopoulos GA, Georgakopoulos-Soares I, Ahituv N. MPRAbase: A Massively Parallel Reporter Assay Database. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.19.567742. [PMID: 38045264 PMCID: PMC10690217 DOI: 10.1101/2023.11.19.567742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/05/2023]
Abstract
Massively parallel reporter assays (MPRAs) represent a set of high-throughput technologies that measure the functional effects of thousands of sequences/variants on gene regulatory activity. There are several different variations of MPRA technology and they are used for numerous applications, including regulatory element discovery, variant effect measurement, saturation mutagenesis, synthetic regulatory element generation or characterization of evolutionary gene regulatory differences. Despite their many designs and uses, there is no comprehensive database that incorporates the results of these experiments. To address this, we developed MPRAbase, a manually curated database that currently harbors 129 experiments, encompassing 17,718,677 elements tested across 35 cell types and 4 organisms. The MPRAbase web interface ( http://www.mprabase.com ) serves as a centralized user-friendly repository to download existing MPRA data for independent analysis and is designed with the ability to allow researchers to share their published data for rapid dissemination to the community.
Collapse
|
26
|
Tovar A, Kyono Y, Nishino K, Bose M, Varshney A, Parker SCJ, Kitzman JO. Using a modular massively parallel reporter assay to discover context-specific regulatory grammars in type 2 diabetes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.08.561391. [PMID: 37873175 PMCID: PMC10592691 DOI: 10.1101/2023.10.08.561391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/25/2023]
Abstract
Recent genome-wide association studies have established that most complex disease-associated loci are found in noncoding regions where defining their function is nontrivial. In this study, we leverage a modular massively parallel reporter assay (MPRA) to uncover sequence features linked to context-specific regulatory activity. We screened enhancer activity across a panel of 198-bp fragments spanning over 10k type 2 diabetes- and metabolic trait-associated variants in the 832/13 rat insulinoma cell line, a relevant model of pancreatic beta cells. We explored these fragments' context sensitivity by comparing their activities when placed up-or downstream of a reporter gene, and in combination with either a synthetic housekeeping promoter (SCP1) or a more biologically relevant promoter corresponding to the human insulin gene ( INS ). We identified clear effects of MPRA construct design on measured fragment enhancer activity. Specifically, a subset of fragments (n = 702/11,656) displayed positional bias, evenly distributed across up- and downstream preference. A separate set of fragments exhibited promoter bias (n = 698/11,656), mostly towards the cell-specific INS promoter (73.4%). To identify sequence features associated with promoter preference, we used Lasso regression with 562 genomic annotations and discovered that fragments with INS promoter-biased activity are enriched for HNF1 motifs. HNF1 family transcription factors are key regulators of glucose metabolism disrupted in maturity onset diabetes of the young (MODY), suggesting genetic convergence between rare coding variants that cause MODY and common T2D-associated regulatory variants. We designed a follow-up MPRA containing HNF1 motif-enriched fragments and observed several instances where deletion or mutation of HNF1 motifs disrupted the INS promoter-biased enhancer activity, specifically in the beta cell model but not in a skeletal muscle cell line, another diabetes-relevant cell type. Together, our study suggests that cell-specific regulatory activity is partially influenced by enhancer-promoter compatibility and indicates that careful attention should be paid when designing MPRA libraries to capture context-specific regulatory processes at disease-associated genetic signals.
Collapse
|
27
|
Tan Y, Yan X, Sun J, Wan J, Li X, Huang Y, Li L, Niu L, Hou C. Genome-wide enhancer identification by massively parallel reporter assay in Arabidopsis. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2023; 116:234-250. [PMID: 37387536 DOI: 10.1111/tpj.16373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 05/29/2023] [Accepted: 06/27/2023] [Indexed: 07/01/2023]
Abstract
Enhancers are critical cis-regulatory elements controlling gene expression during cell development and differentiation. However, genome-wide enhancer characterization has been challenging due to the lack of a well-defined relationship between enhancers and genes. Function-based methods are the gold standard for determining the biological function of cis-regulatory elements; however, these methods have not been widely applied to plants. Here, we applied a massively parallel reporter assay on Arabidopsis to measure enhancer activities across the genome. We identified 4327 enhancers with various combinations of epigenetic modifications distinctively different from animal enhancers. Furthermore, we showed that enhancers differ from promoters in their preference for transcription factors. Although some enhancers are not conserved and overlap with transposable elements forming clusters, enhancers are generally conserved across thousand Arabidopsis accessions, suggesting they are selected under evolution pressure and could play critical roles in the regulation of important genes. Moreover, comparison analysis reveals that enhancers identified by different strategies do not overlap, suggesting these methods are complementary in nature. In sum, we systematically investigated the features of enhancers identified by functional assay in A. thaliana, which lays the foundation for further investigation into enhancers' functional mechanisms in plants.
Collapse
Affiliation(s)
- Yongjun Tan
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China
- Department of Biology, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Xiaohao Yan
- Department of Biology, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Jialei Sun
- Department of Biology, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Jing Wan
- Department of Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xinxin Li
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Cardiovascular Health and Precision Medicine, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Yingzhang Huang
- Department of Biology, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Li Li
- Department of Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Longjian Niu
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Cardiovascular Health and Precision Medicine, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Chunhui Hou
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China
| |
Collapse
|
28
|
Ye W, Yu Y, Zhu X, Wan W, Liu Y, Zou H, Zhu Z. A Common Functional Variant at the Enhancer of the Rheumatoid Arthritis Risk Gene ORMDL3 Regulates its Expression Through Allele-Specific JunD Binding. PHENOMICS (CHAM, SWITZERLAND) 2023; 3:485-495. [PMID: 37881318 PMCID: PMC10593690 DOI: 10.1007/s43657-023-00107-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Revised: 03/27/2023] [Accepted: 03/29/2023] [Indexed: 10/27/2023]
Abstract
Genome-wide association studies (GWASs) have identified over 100 loci associated with rheumatoid arthritis (RA); however, the functionally affected genes and the underlying molecular mechanisms contributing to these associations are often unknown. In this study, we conducted an integrative genomic analysis incorporating multiple "omics" data and identified a functional regulatory DNA variant, rs56199421, and a plausible mechanism by which it regulates the expression of a putative RA risk gene, ORMDL Sphingolipid Biosynthesis Regulator 3 (ORMDL3). The T allele of rs56199421, located in the enhancer region of ORMDL3, exhibited stronger direct binding ability than the other C allele of rs56199421 did in vitro with the transcription factor JunD and demonstrated higher transcriptional activity. Moreover, the T allele of rs56199421 is associated with elevated RA risk, and ORMDL3 expression is increased in RA patients. Thus, these findings suggest that the T allele of rs56199421 enhances JunD transcription factor binding, increases enhancer activity, and elevates the expression of the RA risk gene ORMDL3. Supplementary Information The online version contains supplementary material available at 10.1007/s43657-023-00107-z.
Collapse
Affiliation(s)
- Wenjing Ye
- Division of Rheumatology and Immunology, Huashan Hospital, Fudan University, Shanghai, 200040 China
- Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, 200040 China
| | - Yiyun Yu
- Division of Rheumatology and Immunology, Huashan Hospital, Fudan University, Shanghai, 200040 China
- Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, 200040 China
| | - Xiaoxia Zhu
- Division of Rheumatology and Immunology, Huashan Hospital, Fudan University, Shanghai, 200040 China
- Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, 200040 China
| | - Weiguo Wan
- Division of Rheumatology and Immunology, Huashan Hospital, Fudan University, Shanghai, 200040 China
- Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, 200040 China
| | - Yun Liu
- The Ministry of Education Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032 China
| | - Hejian Zou
- Division of Rheumatology and Immunology, Huashan Hospital, Fudan University, Shanghai, 200040 China
- Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, 200040 China
| | - Zaihua Zhu
- Division of Rheumatology and Immunology, Huashan Hospital, Fudan University, Shanghai, 200040 China
- Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, 200040 China
| |
Collapse
|
29
|
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Collapse
Affiliation(s)
- Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA.
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA.
- Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA.
| |
Collapse
|
30
|
Fu Y, Kelly JA, Gopalakrishnan J, Pelikan RC, Tessneer KL, Pasula S, Grundahl K, Murphy DA, Gaffney PM. Massively Parallel Reporter Assay Confirms Regulatory Potential of hQTLs and Reveals Important Variants in Lupus and Other Autoimmune Diseases. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.17.553722. [PMID: 37645944 PMCID: PMC10462090 DOI: 10.1101/2023.08.17.553722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/31/2023]
Abstract
Objective To systematically characterize the potential for histone post-translational modifications, i.e., histone quantitative trait loci (hQTLs), expression QTLs (eQTLs), and variants on systemic lupus erythematosus (SLE) and autoimmune (AI) disease risk haplotypes to modulate gene expression in an allele dependent manner. Methods We designed a massively parallel reporter assay (MPRA) containing ~32K variants and transfected it into an Epstein-Barr virus transformed B cell line generated from an SLE case. Results Our study expands our understanding of hQTLs, illustrating that epigenetic QTLs are more likely to contribute to functional mechanisms than eQTLs and other variant types, and a large proportion of hQTLs overlap transcription start sites (TSS) of noncoding RNAs. In addition, we nominate 17 variants (including 11 novel) as putative causal variants for SLE and another 14 for various other AI diseases, prioritizing these variants for future functional studies primary and immortalized B cells. Conclusion We uncover important insights into the mechanistic relationships between genotype, epigenetics, gene expression, and SLE and AI disease phenotypes.
Collapse
Affiliation(s)
- Yao Fu
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104, USA
| | - Jennifer A Kelly
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104, USA
| | - Jaanam Gopalakrishnan
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104, USA
- Neuro-Immune Regulome Unit, National Eye Institute, National Institute of Health, Bethesda, MD, 20892, USA
| | - Richard C Pelikan
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104, USA
| | - Kandice L Tessneer
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104, USA
| | - Satish Pasula
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104, USA
| | - Kiely Grundahl
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104, USA
| | - David A Murphy
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104, USA
| | - Patrick M Gaffney
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104, USA
| |
Collapse
|
31
|
Monti R, Ohler U. Toward Identification of Functional Sequences and Variants in Noncoding DNA. Annu Rev Biomed Data Sci 2023; 6:191-210. [PMID: 37262323 DOI: 10.1146/annurev-biodatasci-122120-110102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Understanding the noncoding part of the genome, which encodes gene regulation, is necessary to identify genetic mechanisms of disease and translate findings from genome-wide association studies into actionable results for treatments and personalized care. Here we provide an overview of the computational analysis of noncoding regions, starting from gene-regulatory mechanisms and their representation in data. Deep learning methods, when applied to these data, highlight important regulatory sequence elements and predict the functional effects of genetic variants. These and other algorithms are used to predict damaging sequence variants. Finally, we introduce rare-variant association tests that incorporate functional annotations and predictions in order to increase interpretability and statistical power.
Collapse
Affiliation(s)
- Remo Monti
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany;
- Digital Health-Machine Learning, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | - Uwe Ohler
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany;
| |
Collapse
|
32
|
Fan K, Pfister E, Weng Z. Toward a comprehensive catalog of regulatory elements. Hum Genet 2023; 142:1091-1111. [PMID: 36935423 DOI: 10.1007/s00439-023-02519-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2022] [Accepted: 01/03/2023] [Indexed: 03/21/2023]
Abstract
Regulatory elements are the genomic regions that interact with transcription factors to control cell-type-specific gene expression in different cellular environments. A precise and complete catalog of functional elements encoded by the human genome is key to understanding mammalian gene regulation. Here, we review the current state of regulatory element annotation. We first provide an overview of assays for characterizing functional elements, including genome, epigenome, transcriptome, three-dimensional chromatin interaction, and functional validation assays. We then discuss computational methods for defining regulatory elements, including peak-calling and other statistical modeling methods. Finally, we introduce several high-quality lists of regulatory element annotations and suggest potential future directions.
Collapse
Affiliation(s)
- Kaili Fan
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, 02138, USA
| | - Edith Pfister
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA.
| |
Collapse
|
33
|
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023] Open
Abstract
Sequencing technologies have rapidly evolved over the past two decades, and new technologies are being continually developed and commercialized. The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.
Collapse
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
| | - Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India;
| | - Trushar Shah
- International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya;
| | - Vivek Thakur
- Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India;
| | - Sanjay Kalia
- Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India;
| | - Sean Mayes
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
| | - Abhishek Rathore
- Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
| |
Collapse
|
34
|
Armendariz DA, Sundarrajan A, Hon GC. Breaking enhancers to gain insights into developmental defects. eLife 2023; 12:e88187. [PMID: 37497775 PMCID: PMC10374278 DOI: 10.7554/elife.88187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2023] [Accepted: 07/19/2023] [Indexed: 07/28/2023] Open
Abstract
Despite ground-breaking genetic studies that have identified thousands of risk variants for developmental diseases, how these variants lead to molecular and cellular phenotypes remains a gap in knowledge. Many of these variants are non-coding and occur at enhancers, which orchestrate key regulatory programs during development. The prevailing paradigm is that non-coding variants alter the activity of enhancers, impacting gene expression programs, and ultimately contributing to disease risk. A key obstacle to progress is the systematic functional characterization of non-coding variants at scale, especially since enhancer activity is highly specific to cell type and developmental stage. Here, we review the foundational studies of enhancers in developmental disease and current genomic approaches to functionally characterize developmental enhancers and their variants at scale. In the coming decade, we anticipate systematic enhancer perturbation studies to link non-coding variants to molecular mechanisms, changes in cell state, and disease phenotypes.
Collapse
Affiliation(s)
- Daniel A Armendariz
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
| | - Anjana Sundarrajan
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
| | - Gary C Hon
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, United States
- Lyda Hill Department of Bioinformatics, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, United States
| |
Collapse
|
35
|
Oliveros W, Delfosse K, Lato DF, Kiriakopulos K, Mokhtaridoost M, Said A, McMurray BJ, Browning JW, Mattioli K, Meng G, Ellis J, Mital S, Melé M, Maass PG. Systematic characterization of regulatory variants of blood pressure genes. CELL GENOMICS 2023; 3:100330. [PMID: 37492106 PMCID: PMC10363820 DOI: 10.1016/j.xgen.2023.100330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Revised: 03/29/2023] [Accepted: 04/28/2023] [Indexed: 07/27/2023]
Abstract
High blood pressure (BP) is the major risk factor for cardiovascular disease. Genome-wide association studies have identified genetic variants for BP, but functional insights into causality and related molecular mechanisms lag behind. We functionally characterize 4,608 genetic variants in linkage with 135 BP loci in vascular smooth muscle cells and cardiomyocytes by massively parallel reporter assays. High densities of regulatory variants at BP loci (i.e., ULK4, MAP4, CFDP1, PDE5A) indicate that multiple variants drive genetic association. Regulatory variants are enriched in repeats, alter cardiovascular-related transcription factor motifs, and spatially converge with genes controlling specific cardiovascular pathways. Using heuristic scoring, we define likely causal variants, and CRISPR prime editing finally determines causal variants for KCNK9, SFXN2, and PCGF6, which are candidates for developing high BP. Our systems-level approach provides a catalog of functionally relevant variants and their genomic architecture in two trait-relevant cell lines for a better understanding of BP gene regulation.
Collapse
Affiliation(s)
- Winona Oliveros
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Catalonia, Spain
| | - Kate Delfosse
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Daniella F. Lato
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Katerina Kiriakopulos
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Milad Mokhtaridoost
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Abdelrahman Said
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Brandon J. McMurray
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Jared W.L. Browning
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Kaia Mattioli
- Division of Genetics, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
| | - Guoliang Meng
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - James Ellis
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Seema Mital
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Ted Rogers Centre for Heart Research, Toronto, ON M5G 1X8, Canada
- Department of Pediatrics, The Hospital for Sick Children, University of Toronto, Toronto, ON M5G 0A4, Canada
| | - Marta Melé
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Catalonia, Spain
| | - Philipp G. Maass
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| |
Collapse
|
36
|
FitzPatrick VD, Leemans C, van Arensbergen J, van Steensel B, Bussemaker H. Defining the fine structure of promoter activity on a genome-wide scale with CISSECTOR. Nucleic Acids Res 2023; 51:5499-5511. [PMID: 37013986 PMCID: PMC10287907 DOI: 10.1093/nar/gkad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2022] [Revised: 03/08/2023] [Accepted: 03/22/2023] [Indexed: 04/05/2023] Open
Abstract
Classic promoter mutagenesis strategies can be used to study how proximal promoter regions regulate the expression of particular genes of interest. This is a laborious process, in which the smallest sub-region of the promoter still capable of recapitulating expression in an ectopic setting is first identified, followed by targeted mutation of putative transcription factor binding sites. Massively parallel reporter assays such as survey of regulatory elements (SuRE) provide an alternative way to study millions of promoter fragments in parallel. Here we show how a generalized linear model (GLM) can be used to transform genome-scale SuRE data into a high-resolution genomic track that quantifies the contribution of local sequence to promoter activity. This coefficient track helps identify regulatory elements and can be used to predict promoter activity of any sub-region in the genome. It thus allows in silico dissection of any promoter in the human genome to be performed. We developed a web application, available at cissector.nki.nl, that lets researchers easily perform this analysis as a starting point for their research into any promoter of interest.
Collapse
Affiliation(s)
- Vincent D FitzPatrick
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | - Christ Leemans
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Joris van Arensbergen
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Bas van Steensel
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Department of Cell Biology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| |
Collapse
|
37
|
Rong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, Meyerson M, Evans BJ, Fairbrother WG. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A 2023; 120:e2218308120. [PMID: 37192163 PMCID: PMC10214146 DOI: 10.1073/pnas.2218308120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2022] [Accepted: 04/12/2023] [Indexed: 05/18/2023] Open
Abstract
Humans coexisted and interbred with other hominins which later became extinct. These archaic hominins are known to us only through fossil records and for two cases, genome sequences. Here, we engineer Neanderthal and Denisovan sequences into thousands of artificial genes to reconstruct the pre-mRNA processing patterns of these extinct populations. Of the 5,169 alleles tested in this massively parallel splicing reporter assay (MaPSy), we report 962 exonic splicing mutations that correspond to differences in exon recognition between extant and extinct hominins. Using MaPSy splicing variants, predicted splicing variants, and splicing quantitative trait loci, we show that splice-disrupting variants experienced greater purifying selection in anatomically modern humans than that in Neanderthals. Adaptively introgressed variants were enriched for moderate-effect splicing variants, consistent with positive selection for alternative spliced alleles following introgression. As particularly compelling examples, we characterized a unique tissue-specific alternative splicing variant at the adaptively introgressed innate immunity gene TLR1, as well as a unique Neanderthal introgressed alternative splicing variant in the gene HSPG2 that encodes perlecan. We further identified potentially pathogenic splicing variants found only in Neanderthals and Denisovans in genes related to sperm maturation and immunity. Finally, we found splicing variants that may contribute to variation among modern humans in total bilirubin, balding, hemoglobin levels, and lung capacity. Our findings provide unique insights into natural selection acting on splicing in human evolution and demonstrate how functional assays can be used to identify candidate causal variants underlying differences in gene regulation and phenotype.
Collapse
Affiliation(s)
- Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, RI02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI02912
| | - Christopher R. Neil
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI02912
| | - Anastasia Welch
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI02912
| | - Chaorui Duan
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI02912
| | - Samantha Maguire
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI02912
| | - Ijeoma C. Meremikwu
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI02912
| | - Malcolm Meyerson
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI02912
| | - Ben J. Evans
- Department of Biology, McMaster University, Hamilton, ONL8S 4K1, Canada
| | - William G. Fairbrother
- Center for Computational Molecular Biology, Brown University, Providence, RI02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI02912
- Hassenfeld Child Health Innovation Institute of Brown University, Providence, RI02912
| |
Collapse
|
38
|
Li XC, Fuqua T, van Breugel ME, Crocker J. Mutational scans reveal differential evolvability of Drosophila promoters and enhancers. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220054. [PMID: 37004721 PMCID: PMC10067265 DOI: 10.1098/rstb.2022.0054] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/04/2023] Open
Abstract
Rapid enhancer and slow promoter evolution have been demonstrated through comparative genomics. However, it is not clear how this information is encoded genetically and if this can be used to place evolution in a predictive context. Part of the challenge is that our understanding of the potential for regulatory evolution is biased primarily toward natural variation or limited experimental perturbations. Here, to explore the evolutionary capacity of promoter variation, we surveyed an unbiased mutation library for three promoters in Drosophila melanogaster. We found that mutations in promoters had limited to no effect on spatial patterns of gene expression. Compared to developmental enhancers, promoters are more robust to mutations and have more access to mutations that can increase gene expression, suggesting that their low activity might be a result of selection. Consistent with these observations, increasing the promoter activity at the endogenous locus of shavenbaby led to increased transcription yet limited phenotypic changes. Taken together, developmental promoters may encode robust transcriptional outputs allowing evolvability through the integration of diverse developmental enhancers. This article is part of the theme issue ‘Interdisciplinary approaches to predicting evolutionary biology’.
Collapse
Affiliation(s)
- Xueying C. Li
- European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany
| | - Timothy Fuqua
- European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany
| | | | - Justin Crocker
- European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany
| |
Collapse
|
39
|
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023] Open
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Collapse
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
| | - Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia.
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia.
| |
Collapse
|
40
|
Guo Q, Wu S, Geschwind DH. Characterization of Gene Regulatory Elements in Human Fetal Cortical Development: Enhancing Our Understanding of Neurodevelopmental Disorders and Evolution. Dev Neurosci 2023; 46:69-83. [PMID: 37231806 DOI: 10.1159/000530929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2023] [Accepted: 04/24/2023] [Indexed: 05/27/2023] Open
Abstract
The neocortex is the region that most distinguishes human brain from other mammals and primates [Annu Rev Genet. 2021 Nov;55(1):555-81]. Studying the development of human cortex is important in understanding the evolutionary changes occurring in humans relative to other primates, as well as in elucidating mechanisms underlying neurodevelopmental disorders. Cortical development is a highly regulated process, spatially and temporally coordinated by expression of essential transcriptional factors in response to signaling pathways [Neuron. 2019 Sep;103(6):980-1004]. Enhancers are the most well-understood cis-acting, non-protein-coding regulatory elements that regulate gene expression [Nat Rev Genet. 2014 Apr;15(4):272-86]. Importantly, given the conservation of both DNA sequence and molecular function of the majority of proteins across mammals [Genome Res. 2003 Dec;13(12):2507-18], enhancers [Science. 2015 Mar;347(6226):1155-9], which are far more divergent at the sequence level, likely account for the phenotypes that distinguish the human brain by changing the regulation of gene expression. In this review, we will revisit the conceptual framework of gene regulation during human brain development, as well as the evolution of technologies to study transcriptional regulation, with recent advances in genome biology that open a window allowing us to systematically characterize cis-regulatory elements in developing human brain [Hum Mol Genet. 2022 Oct;31(R1):R84-96]. We provide an update on work to characterize the suite of all enhancers in the developing human brain and the implications for understanding neuropsychiatric disorders. Finally, we discuss emerging therapeutic ideas that utilize our emerging knowledge of enhancer function.
Collapse
Affiliation(s)
- Qiuyu Guo
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
- Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, California, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
| | - Sarah Wu
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
| | - Daniel H Geschwind
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
- Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, California, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- Institute of Precision Health, University of California Los Angeles, Los Angeles, California, USA
| |
Collapse
|
41
|
Das M, Hossain A, Banerjee D, Praul CA, Girirajan S. Challenges and considerations for reproducibility of STARR-seq assays. Genome Res 2023; 33:479-495. [PMID: 37130797 PMCID: PMC10234304 DOI: 10.1101/gr.277204.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2022] [Accepted: 03/15/2023] [Indexed: 05/04/2023]
Abstract
High-throughput methods such as RNA-seq, ChIP-seq, and ATAC-seq have well-established guidelines, commercial kits, and analysis pipelines that enable consistency and wider adoption for understanding genome function and regulation. STARR-seq, a popular assay for directly quantifying the activities of thousands of enhancer sequences simultaneously, has seen limited standardization across studies. The assay is long, with more than 250 steps, and frequent customization of the protocol and variations in bioinformatics methods raise concerns for reproducibility of STARR-seq studies. Here, we assess each step of the protocol and analysis pipelines from published sources and in-house assays, and identify critical steps and quality control (QC) checkpoints necessary for reproducibility of the assay. We also provide guidelines for experimental design, protocol scaling, customization, and analysis pipelines for better adoption of the assay. These resources will allow better optimization of STARR-seq for specific research needs, enable comparisons and integration across studies, and improve the reproducibility of results.
Collapse
Affiliation(s)
- Maitreya Das
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- Molecular and Cellular Integrative Biosciences Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Ayaan Hossain
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Deepro Banerjee
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Craig Alan Praul
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Santhosh Girirajan
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- Molecular and Cellular Integrative Biosciences Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| |
Collapse
|
42
|
Ren N, Dai S, Ma S, Yang F. Strategies for activity analysis of single nucleotide polymorphisms associated with human diseases. Clin Genet 2023; 103:392-400. [PMID: 36527336 DOI: 10.1111/cge.14282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 12/10/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphism (SNP) sites associated with human diseases. In the annotation of human diseases, especially cancers, SNPs, as an important component of genetic factors, have gained increasing attention. Given that most of the SNPs are located in non-coding regions, the functional verification of these SNPs is a great challenge. The key to functional annotation for risk SNPs is to screen SNPs with regulatory activity from thousands of disease associated-SNPs. In this review, we systematically recapitulate the characteristics and functional roles of SNP sites, discuss three parallel reporter screening strategies in detail based on barcode tag classification, and recommend the common in silico strategies to help supplement the annotation of SNP sites with epigenetic activity analysis, prediction of target genes and trans-acting factors. We hope that this review will contribute to this exuberant research field by providing robust activity analysis strategies that can facilitate the translation of GWAS results into personalized diagnosis and prevention measures for human diseases.
Collapse
Affiliation(s)
- Naixia Ren
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shangkun Dai
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shumin Ma
- School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
| | - Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| |
Collapse
|
43
|
Fair T, Pollen AA. Genetic architecture of human brain evolution. Curr Opin Neurobiol 2023; 80:102710. [PMID: 37003107 DOI: 10.1016/j.conb.2023.102710] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2022] [Revised: 02/20/2023] [Accepted: 02/26/2023] [Indexed: 04/03/2023]
Abstract
Comparative studies of hominids have long sought to identify mutational events that shaped the evolution of the human nervous system. However, functional genetic differences are outnumbered by millions of nearly neutral mutations, and the developmental mechanisms underlying human nervous system specializations are difficult to model and incompletely understood. Candidate-gene studies have attempted to map select human-specific genetic differences to neurodevelopmental functions, but it remains unclear how to contextualize the relative effects of genes that are investigated independently. Considering these limitations, we discuss scalable approaches for probing the functional contributions of human-specific genetic differences. We propose that a systems-level view will enable a more quantitative and integrative understanding of the genetic, molecular and cellular underpinnings of human nervous system evolution.
Collapse
Affiliation(s)
- Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA. https://twitter.com/@TylerFair_
| | - Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA.
| |
Collapse
|
44
|
Zheng Y, VanDusen NJ. Massively Parallel Reporter Assays for High-Throughput In Vivo Analysis of Cis-Regulatory Elements. J Cardiovasc Dev Dis 2023; 10:jcdd10040144. [PMID: 37103023 PMCID: PMC10146671 DOI: 10.3390/jcdd10040144] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2023] [Revised: 03/24/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023] Open
Abstract
The rapid improvement of descriptive genomic technologies has fueled a dramatic increase in hypothesized connections between cardiovascular gene expression and phenotypes. However, in vivo testing of these hypotheses has predominantly been relegated to slow, expensive, and linear generation of genetically modified mice. In the study of genomic cis-regulatory elements, generation of mice featuring transgenic reporters or cis-regulatory element knockout remains the standard approach. While the data obtained is of high quality, the approach is insufficient to keep pace with candidate identification and therefore results in biases introduced during the selection of candidates for validation. However, recent advances across a range of disciplines are converging to enable functional genomic assays that can be conducted in a high-throughput manner. Here, we review one such method, massively parallel reporter assays (MPRAs), in which the activities of thousands of candidate genomic regulatory elements are simultaneously assessed via the next-generation sequencing of a barcoded reporter transcript. We discuss best practices for MPRA design and use, with a focus on practical considerations, and review how this emerging technology has been successfully deployed in vivo. Finally, we discuss how MPRAs are likely to evolve and be used in future cardiovascular research.
Collapse
|
45
|
Moeckel C, Zaravinos A, Georgakopoulos-Soares I. Strand Asymmetries Across Genomic Processes. Comput Struct Biotechnol J 2023; 21:2036-2047. [PMID: 36968020 PMCID: PMC10030826 DOI: 10.1016/j.csbj.2023.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2023] [Revised: 03/08/2023] [Accepted: 03/08/2023] [Indexed: 03/12/2023] Open
Abstract
Across biological systems, a number of genomic processes, including transcription, replication, DNA repair, and transcription factor binding, display intrinsic directionalities. These directionalities are reflected in the asymmetric distribution of nucleotides, motifs, genes, transposon integration sites, and other functional elements across the two complementary strands. Strand asymmetries, including GC skews and mutational biases, have shaped the nucleotide composition of diverse organisms. The investigation of strand asymmetries often serves as a method to understand underlying biological mechanisms, including protein binding preferences, transcription factor interactions, retrotransposition, DNA damage and repair preferences, transcription-replication collisions, and mutagenesis mechanisms. Research into this subject also enables the identification of functional genomic sites, such as replication origins and transcription start sites. Improvements in our ability to detect and quantify DNA strand asymmetries will provide insights into diverse functionalities of the genome, the contribution of different mutational mechanisms in germline and somatic mutagenesis, and our knowledge of genome instability and evolution, which all have significant clinical implications in human disease, including cancer. In this review, we describe key developments that have been made across the field of genomic strand asymmetries, as well as the discovery of associated mechanisms.
Collapse
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - Apostolos Zaravinos
- Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus
- Cancer Genetics, Genomics and Systems Biology laboratory, Basic and Translational Cancer Research Center (BTCRC), Nicosia 1516, Cyprus
- Corresponding author at: Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus.
| | - Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Corresponding author.
| |
Collapse
|
46
|
Stankey CT, Lee JC. Translating non-coding genetic associations into a better understanding of immune-mediated disease. Dis Model Mech 2023; 16:297044. [PMID: 36897113 PMCID: PMC10040244 DOI: 10.1242/dmm.049790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/11/2023] Open
Abstract
Genome-wide association studies have identified hundreds of genetic loci that are associated with immune-mediated diseases. Most disease-associated variants are non-coding, and a large proportion of these variants lie within enhancers. As a result, there is a pressing need to understand how common genetic variation might affect enhancer function and thereby contribute to immune-mediated (and other) diseases. In this Review, we first describe statistical and experimental methods to identify causal genetic variants that modulate gene expression, including statistical fine-mapping and massively parallel reporter assays. We then discuss approaches to characterise the mechanisms by which these variants modulate immune function, such as clustered regularly interspaced short palindromic repeats (CRISPR)-based screens. We highlight examples of studies that, by elucidating the effects of disease variants within enhancers, have provided important insights into immune function and uncovered key pathways of disease.
Collapse
Affiliation(s)
- Christina T Stankey
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Department of Immunology and Inflammation, Imperial College London, London W12 0NN, UK
| | - James C Lee
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Institute of Liver and Digestive Health, Royal Free Hospital, University College London, London NW3 2PF, UK
| |
Collapse
|
47
|
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023] Open
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Collapse
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia. .,School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia. .,The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia. .,Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia.
| | - Amanda J. Lea
- grid.152326.10000 0001 2264 7217Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240 USA ,grid.152326.10000 0001 2264 7217Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240 USA ,grid.152326.10000 0001 2264 7217Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240 USA ,Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
| |
Collapse
|
48
|
Kim S, Wysocka J. Deciphering the multi-scale, quantitative cis-regulatory code. Mol Cell 2023; 83:373-392. [PMID: 36693380 PMCID: PMC9898153 DOI: 10.1016/j.molcel.2022.12.032] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Revised: 12/29/2022] [Accepted: 12/30/2022] [Indexed: 01/24/2023]
Abstract
Uncovering the cis-regulatory code that governs when and how much each gene is transcribed in a given genome and cellular state remains a central goal of biology. Here, we discuss major layers of regulation that influence how transcriptional outputs are encoded by DNA sequence and cellular context. We first discuss how transcription factors bind specific DNA sequences in a dosage-dependent and cooperative manner and then proceed to the cofactors that facilitate transcription factor function and mediate the activity of modular cis-regulatory elements such as enhancers, silencers, and promoters. We then consider the complex and poorly understood interplay of these diverse elements within regulatory landscapes and its relationships with chromatin states and nuclear organization. We propose that a mechanistically informed, quantitative model of transcriptional regulation that integrates these multiple regulatory layers will be the key to ultimately cracking the cis-regulatory code.
Collapse
Affiliation(s)
- Seungsoo Kim
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Joanna Wysocka
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
| |
Collapse
|
49
|
Zhao S, Hong CKY, Myers CA, Granas DM, White MA, Corbo JC, Cohen BA. A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat Genet 2023; 55:346-354. [PMID: 36635387 PMCID: PMC9931678 DOI: 10.1038/s41588-022-01278-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Accepted: 12/05/2022] [Indexed: 01/14/2023]
Abstract
Massively parallel reporter gene assays are key tools in regulatory genomics but cannot be used to identify cell-type-specific regulatory elements without performing assays serially across different cell types. To address this problem, we developed a single-cell massively parallel reporter assay (scMPRA) to measure the activity of libraries of cis-regulatory sequences (CRSs) across multiple cell types simultaneously. We assayed a library of core promoters in a mixture of HEK293 and K562 cells and showed that scMPRA is a reproducible, highly parallel, single-cell reporter gene assay that detects cell-type-specific cis-regulatory activity. We then measured a library of promoter variants across multiple cell types in live mouse retinas and showed that subtle genetic variants can produce cell-type-specific effects on cis-regulatory activity. We anticipate that scMPRA will be widely applicable for studying the role of CRSs across diverse cell types.
Collapse
Affiliation(s)
- Siqi Zhao
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Ginkgo Bioworks, Boston, MA, USA
| | - Clarice K Y Hong
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Connie A Myers
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
| | - David M Granas
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Michael A White
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Joseph C Corbo
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
| | - Barak A Cohen
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA.
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA.
| |
Collapse
|
50
|
Mouri K, Dewey HB, Castro R, Berenzy D, Kales S, Tewhey R. Whole-genome functional characterization of RE1 silencers using a modified massively parallel reporter assay. CELL GENOMICS 2023; 3:100234. [PMID: 36777181 PMCID: PMC9903721 DOI: 10.1016/j.xgen.2022.100234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/11/2022] [Revised: 09/12/2022] [Accepted: 11/23/2022] [Indexed: 12/23/2022]
Abstract
Both upregulation and downregulation by cis-regulatory elements help modulate precise gene expression. However, our understanding of repressive elements is far more limited than activating elements. To address this gap, we characterized RE1, a group of transcriptional silencers bound by REST, at genome-wide scale using a modified massively parallel reporter assay (MPRAduo). MPRAduo empirically defined a minimal binding strength of REST (REST motif-intrinsic value [m-value]), above which cofactors colocalize and silence transcription. We identified 1,500 human variants that alter RE1 silencing and found that their effect sizes are predictable when they overlap with REST-binding sites above the m-value. Additionally, we demonstrate that non-canonical REST-binding motifs exhibit silencer function only if they precisely align half sites with specific spacer lengths. Our results show mechanistic insights into RE1, which allow us to predict its activity and effect of variants on RE1, providing a paradigm for performing genome-wide functional characterization of transcription-factor-binding sites.
Collapse
Affiliation(s)
| | | | | | | | - Susan Kales
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
| |
Collapse
|