1
Maritato R, Medugno A, D'Andretta E, De Riso G, Lupo M, Botta S, Marrocco E, Renda M, Sofia M, Mussolino C, Bacci ML, Surace EM. A DNA base-specific sequence interposed between CRX and NRL contributes to RHODOPSIN expression. Sci Rep 2024; 14:26313. [PMID: 39487168 PMCID: PMC11530525 DOI: 10.1038/s41598-024-76664-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Accepted: 10/15/2024] [Indexed: 11/04/2024] Open
Abstract
Gene expression emerges from DNA sequences through the interaction of transcription factors (TFs) with DNA cis-regulatory sequences. In eukaryotes, TFs bind to transcription factor binding sites (TFBSs) with differential affinities, enabling cell-specific gene expression. In this view, DNA enables TF binding along a continuum ranging from low to high affinity depending on its sequence composition; however, it is not known whether evolution has entailed a further level of entanglement between DNA-protein interaction. Here we found that the composition and length (22 bp) of the DNA sequence interposed between the CRX and NRL retinal TFs in the proximal promoter of RHODOPSIN (RHO) largely controls the expression levels of RHO. Mutagenesis of CRX-NRL DNA linking sequences (here termed "DNA-linker") results in uncorrelated gene expression variation. In contrast, mutual exchange of naturally occurring divergent human and mouse Rho cis-regulatory elements conferred similar yet species-specific Rho expression levels. Two orthogonal DNA-binding proteins targeted to the DNA-linker either activate or repress the expression of Rho depending on the DNA-linker orientation relative to the CRX and NRL binding sites. These results argue that, in this instance, DNA itself contributes to CRX and NRL activities through a code based on specific base sequences of a defined length, ultimately determining optimal RHO expression levels.
Affiliation(s)
- Rosa Maritato
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
- Alessia Medugno
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
- Emanuela D'Andretta
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
- Giulia De Riso
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy
  - AOU Federico II, Naples, Italy
- Mariangela Lupo
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Salvatore Botta
  - Department of Translational Medical Science, University of Campania Luigi Vanvitelli, Naples, Italy
- Elena Marrocco
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Mario Renda
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Martina Sofia
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Maria Laura Bacci
  - Department of Veterinary Medical Sciences, University of Bologna, Bologna, Italy
- Enrico Maria Surace
  - Department of Translational Medicine, University of Naples Federico II, Naples, Italy
2
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
  - Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
3
Bruner WS, Grant SFA. Translation of genome-wide association study: from genomic signals to biological insights. Front Genet 2024; 15:1375481. [PMID: 39421299 PMCID: PMC11484060 DOI: 10.3389/fgene.2024.1375481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 09/24/2024] [Indexed: 10/19/2024] Open
Abstract
Since the turn of the 21st century, genome-wide association studies (GWAS) have successfully identified genetic signals associated with a myriad of common complex traits and diseases. As we transition from establishing robust genetic associations with diverse phenotypes, the central challenge is now focused on characterizing the underlying functional mechanisms driving these signals. Previous GWAS efforts have revealed multiple variants, each conferring relatively subtle susceptibility, collectively contributing to the pathogenesis of various common diseases. Such variants can further exhibit associations with multiple other traits and differ across ancestries; in addition, disentangling causal from non-causal variants is complicated by linkage disequilibrium, which makes it difficult to draw direct biological conclusions. Combined with cellular context considerations, such challenges can reduce the capacity to definitively elucidate the biological significance of GWAS signals, limiting the potential to define mechanistic insights. This review will detail current and anticipated approaches for functional interpretation of GWAS signals, both in terms of characterizing the underlying causal variants and the corresponding effector genes.
Affiliation(s)
- Winter S. Bruner
  - Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
  - Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Struan F. A. Grant
  - Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
  - Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
  - Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
4
Petersen RM, Vockley CM, Lea AJ. Uncovering methylation-dependent genetic effects on regulatory element function in diverse genomes. bioRxiv 2024:2024.08.23.609412. [PMID: 39229133 PMCID: PMC11370585 DOI: 10.1101/2024.08.23.609412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
A major goal in evolutionary biology and biomedicine is to understand the complex interactions between genetic variants, the epigenome, and gene expression. However, the causal relationships between these factors remain poorly understood. mSTARR-seq, a methylation-sensitive massively parallel reporter assay, is capable of identifying methylation-dependent regulatory activity at many thousands of genomic regions simultaneously, and allows for the testing of causal relationships between DNA methylation and gene expression on a region-by-region basis. Here, we developed a multiplexed mSTARR-seq protocol to assay naturally occurring human genetic variation from 25 individuals sampled from 10 localities in Europe and Africa. We identified 6,957 regulatory elements in either the unmethylated or methylated state, and this set was enriched for enhancer and promoter annotations, as expected. The expression of 58% of these regulatory elements was modulated by methylation, which was generally associated with decreased RNA expression. Within our set of regulatory elements, we used allele-specific expression analyses to identify 8,020 sites with genetic effects on gene regulation; further, we found that 42.3% of these genetic effects varied between methylated and unmethylated states. Sites exhibiting methylation-dependent genetic effects were enriched for GWAS and EWAS annotations, implicating them in human disease. Compared to datasets that assay DNA from a single European individual, our multiplexed assay uncovers dramatically more genetic effects and methylation-dependent genetic effects, highlighting the importance of including diverse individuals in assays which aim to understand gene regulatory processes.
5
Xu L, Liu Y. Identification, Design, and Application of Noncoding Cis-Regulatory Elements. Biomolecules 2024; 14:945. [PMID: 39199333 PMCID: PMC11352686 DOI: 10.3390/biom14080945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 09/01/2024] Open
Abstract
Cis-regulatory elements (CREs) play a pivotal role in orchestrating interactions with trans-regulatory factors such as transcription factors, RNA-binding proteins, and noncoding RNAs. These interactions are fundamental to the molecular architecture underpinning complex and diverse biological functions in living organisms, facilitating a myriad of sophisticated and dynamic processes. The rapid advancement in the identification and characterization of these regulatory elements has been marked by initiatives such as the Encyclopedia of DNA Elements (ENCODE) project, which represents a significant milestone in the field. Concurrently, the development of CRE detection technologies, exemplified by massively parallel reporter assays, has progressed at an impressive pace, providing powerful tools for CRE discovery. The exponential growth of multimodal functional genomic data has necessitated the application of advanced analytical methods. Deep learning algorithms, particularly large language models, have emerged as invaluable tools for deconstructing the intricate nucleotide sequences governing CRE function. These advancements facilitate precise predictions of CRE activity and enable the de novo design of CREs. A deeper understanding of CRE operational dynamics is crucial for harnessing their versatile regulatory properties. Such insights are instrumental in refining gene therapy techniques, enhancing the efficacy of selective breeding programs, pushing the boundaries of genetic innovation, and opening new possibilities in microbial synthetic biology.
Affiliation(s)
- Lingna Xu
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yuwen Liu
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
  - Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
6
Padigepati SR, Stafford DA, Tan CA, Silvis MR, Jamieson K, Keyser A, Nunez PAC, Nicoludis JM, Manders T, Fresard L, Kobayashi Y, Araya CL, Aradhya S, Johnson B, Nykamp K, Reuter JA. Scalable approaches for generating, validating and incorporating data from high-throughput functional assays to improve clinical variant classification. Hum Genet 2024; 143:995-1004. [PMID: 39085601 PMCID: PMC11303574 DOI: 10.1007/s00439-024-02691-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 07/12/2024] [Indexed: 08/02/2024]
Abstract
As the adoption and scope of genetic testing continue to expand, interpreting the clinical significance of DNA sequence variants at scale remains a formidable challenge, with a high proportion classified as variants of uncertain significance (VUSs). Genetic testing laboratories have historically relied, in part, on functional data from academic literature to support variant classification. High-throughput functional assays or multiplex assays of variant effect (MAVEs), designed to assess the effects of DNA variants on protein stability and function, represent an important and increasingly available source of evidence for variant classification, but their potential is just beginning to be realized in clinical lab settings. Here, we describe a framework for generating, validating and incorporating data from MAVEs into a semi-quantitative variant classification method applied to clinical genetic testing. Using single-cell gene expression measurements, cellular evidence models were built to assess the effects of DNA variation in 44 genes of clinical interest. This framework was also applied to models for an additional 22 genes with previously published MAVE datasets. In total, modeling data was incorporated from 24 genes into our variant classification method. These data contributed evidence for classifying 4043 observed variants in over 57,000 individuals. Genetic testing laboratories are uniquely positioned to generate, analyze, validate, and incorporate evidence from high-throughput functional data and ultimately enable the use of these data to provide definitive clinical variant classifications for more patients.
Affiliation(s)
- Melanie R Silvis
  - Invitae Corporation, San Francisco, CA, 94103, USA
  - Epic Bio, South San Francisco, CA, 94080, USA
- Kirsty Jamieson
  - Invitae Corporation, San Francisco, CA, 94103, USA
  - Epic Bio, South San Francisco, CA, 94080, USA
- Andrew Keyser
  - Invitae Corporation, San Francisco, CA, 94103, USA
  - Calico Life Sciences, South San Francisco, CA, 94080, USA
- John M Nicoludis
  - Invitae Corporation, San Francisco, CA, 94103, USA
  - Department of Structural Biology, Genentech, South San Francisco, CA, 94080, USA
- Toby Manders
  - Invitae Corporation, San Francisco, CA, 94103, USA
- Carlos L Araya
  - Invitae Corporation, San Francisco, CA, 94103, USA
  - Tapanti.org, Santa Barbara, CA, 93108, USA
- Britt Johnson
  - Invitae Corporation, San Francisco, CA, 94103, USA
  - GeneDx, Stamford, CT, 06902, USA
- Keith Nykamp
  - Invitae Corporation, San Francisco, CA, 94103, USA
7
Gordon MG, Kathail P, Choy B, Kim MC, Mazumder T, Gearing M, Ye CJ. Population Diversity at the Single-Cell Level. Annu Rev Genomics Hum Genet 2024; 25:27-49. [PMID: 38382493 DOI: 10.1146/annurev-genom-021623-083207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Population-scale single-cell genomics is a transformative approach for unraveling the intricate links between genetic and cellular variation. This approach is facilitated by cutting-edge experimental methodologies, including the development of high-throughput single-cell multiomics and advances in multiplexed environmental and genetic perturbations. Examining the effects of natural or synthetic genetic variants across cellular contexts provides insights into the mutual influence of genetics and the environment in shaping cellular heterogeneity. The development of computational methodologies further enables detailed quantitative analysis of molecular variation, offering an opportunity to examine the respective roles of stochastic, intercellular, and interindividual variation. Future opportunities lie in leveraging long-read sequencing, refining disease-relevant cellular models, and embracing predictive and generative machine learning models. These advancements hold the potential for a deeper understanding of the genetic architecture of human molecular traits, which in turn has important implications for understanding the genetic causes of human disease.
Affiliation(s)
- Pooja Kathail
  - Center for Computational Biology, University of California, Berkeley, California, USA
- Bryson Choy
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
  - Institute for Human Genetics, University of California, San Francisco, California, USA
- Min Cheol Kim
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
  - Institute for Human Genetics, University of California, San Francisco, California, USA
- Thomas Mazumder
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
  - Institute for Human Genetics, University of California, San Francisco, California, USA
- Melissa Gearing
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
  - Institute for Human Genetics, University of California, San Francisco, California, USA
- Chun Jimmie Ye
  - Arc Institute, Palo Alto, California, USA
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
  - Institute for Human Genetics, University of California, San Francisco, California, USA
  - Bakar Computational Health Sciences Institute, Gladstone-UCSF Institute of Genomic Immunology, Parker Institute for Cancer Immunotherapy, Department of Epidemiology and Biostatistics, Department of Microbiology and Immunology, and Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
8
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. bioRxiv 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
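As an illustration of the gauge freedom this abstract describes (a minimal sketch, not the authors' method or code): in an additive model, f(s) = theta0 + sum over positions l of theta[l][s_l], adding a constant to every parameter at one position and subtracting it from theta0 leaves every prediction unchanged. The Python sketch below removes that freedom by imposing the commonly used zero-sum constraint at each position. The function names, the four-letter DNA alphabet, the model length, and the random parameter values are all assumptions introduced here for illustration.

```python
import itertools
import random

ALPHABET = "ACGT"  # assumed alphabet for this illustration


def predict(theta0, theta, seq):
    """Additive model: constant term plus one parameter per (position, character)."""
    return theta0 + sum(theta[pos][ALPHABET.index(ch)] for pos, ch in enumerate(seq))


def fix_zero_sum_gauge(theta0, theta):
    """Return equivalent parameters whose per-position values sum to zero.

    The mean of each position's parameters is moved into the constant term,
    so model predictions are unchanged.
    """
    new_theta0 = theta0
    new_theta = []
    for row in theta:
        mean = sum(row) / len(row)
        new_theta0 += mean
        new_theta.append([value - mean for value in row])
    return new_theta0, new_theta


# Hypothetical parameters for a length-3 model (random values, illustration only).
random.seed(0)
LENGTH = 3
theta0 = 0.5
theta = [[random.gauss(0, 1) for _ in ALPHABET] for _ in range(LENGTH)]

g0, g = fix_zero_sum_gauge(theta0, theta)

# Largest prediction difference between the two parameter sets over all sequences.
max_diff = max(
    abs(predict(theta0, theta, "".join(s)) - predict(g0, g, "".join(s)))
    for s in itertools.product(ALPHABET, repeat=LENGTH)
)
print("max prediction difference:", max_diff)
```

Both parameter sets give the same predictions for all 64 sequences (up to floating-point rounding). In the zero-sum version, the constant term equals the mean prediction over all sequences and each position-specific parameter reads as a deviation from that mean, which is what makes the values interpretable.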
Affiliation(s)
- Anna Posfai
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Juannan Zhou
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
  - Department of Biology, University of Florida, Gainesville, FL, 32611
- David M. McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Justin B. Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
9
Shepherdson JL, Friedman RZ, Zheng Y, Sun C, Oh IY, Granas DM, Cohen BA, Chen S, White MA. Pathogenic variants in CRX have distinct cis-regulatory effects on enhancers and silencers in photoreceptors. Genome Res 2024; 34:243-255. [PMID: 38355306 PMCID: PMC10984388 DOI: 10.1101/gr.278133.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024]
Abstract
Dozens of variants in the gene for the homeodomain transcription factor (TF) cone-rod homeobox (CRX) are linked with human blinding diseases that vary in their severity and age of onset. How different variants in this single TF alter its function in ways that lead to a range of phenotypes is unclear. We characterized the effects of human disease-causing variants on CRX cis-regulatory function by deploying massively parallel reporter assays (MPRAs) in mouse retina explants carrying knock-ins of two variants, one in the DNA-binding domain (p.R90W) and the other in the transcriptional effector domain (p.E168d2). The degree of reporter gene dysregulation in these mutant Crx retinas corresponds with their phenotypic severity. The two variants affect similar sets of enhancers, and p.E168d2 has distinct effects on silencers. Cis-regulatory elements (CREs) near cone photoreceptor genes are enriched for silencers that are derepressed in the presence of p.E168d2. Chromatin environments of CRX-bound loci are partially predictive of episomal MPRA activity, and distal elements whose accessibility increases later in retinal development are enriched for CREs with silencer activity. We identified a set of potentially pleiotropic regulatory elements that convert from silencers to enhancers in retinas that lack a functional CRX effector domain. Our findings show that phenotypically distinct variants in different domains of CRX have partially overlapping effects on its cis-regulatory function, leading to misregulation of similar sets of enhancers while having a qualitatively different impact on silencers.
Affiliation(s)
- James L Shepherdson
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Ryan Z Friedman
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Yiqiao Zheng
  - Department of Ophthalmology and Visual Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Chi Sun
  - Department of Ophthalmology and Visual Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Inez Y Oh
  - Department of Ophthalmology and Visual Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- David M Granas
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Barak A Cohen
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Shiming Chen
  - Department of Ophthalmology and Visual Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
  - Department of Developmental Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- Michael A White
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
10
Hurabielle C, LaFlam TN, Gearing M, Ye CJ. Functional genomics in inborn errors of immunity. Immunol Rev 2024; 322:53-70. [PMID: 38329267 PMCID: PMC10950534 DOI: 10.1111/imr.13309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Inborn errors of immunity (IEI) comprise a diverse spectrum of 485 disorders as recognized by the International Union of Immunological Societies Committee on Inborn Error of Immunity in 2022. While IEI are monogenic by definition, they illuminate various pathways involved in the pathogenesis of polygenic immune dysregulation as in autoimmune or autoinflammatory syndromes, or in more common infectious diseases that may not have a significant genetic basis. Rapid improvement in genomic technologies has been the main driver of the accelerated rate of discovery of IEI and has led to the development of innovative treatment strategies. In this review, we will explore various facets of IEI, delving into the distinctions between PIDD and PIRD. We will examine how Mendelian inheritance patterns contribute to these disorders and discuss advancements in functional genomics that aid in characterizing new IEI. Additionally, we will explore how emerging genomic tools help to characterize new IEI as well as how they are paving the way for innovative treatment approaches for managing and potentially curing these complex immune conditions.
Affiliation(s)
- Charlotte Hurabielle
  - Division of Rheumatology, Department of Medicine, UCSF, San Francisco, California, USA
- Taylor N LaFlam
  - Division of Pediatric Rheumatology, Department of Pediatrics, UCSF, San Francisco, California, USA
- Melissa Gearing
  - Division of Rheumatology, Department of Medicine, UCSF, San Francisco, California, USA
- Chun Jimmie Ye
  - Institute for Human Genetics, UCSF, San Francisco, California, USA
  - Institute of Computational Health Sciences, UCSF, San Francisco, California, USA
  - Gladstone Genomic Immunology Institute, San Francisco, California, USA
  - Parker Institute for Cancer Immunotherapy, UCSF, San Francisco, California, USA
  - Department of Epidemiology and Biostatistics, UCSF, San Francisco, California, USA
  - Department of Microbiology and Immunology, UCSF, San Francisco, California, USA
  - Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, California, USA
  - Arc Institute, Palo Alto, California, USA
11
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024] Open
Abstract
Massively parallel reporter assays measure the transcriptional activities of many cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational model framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms of cis-regulatory mutations of a synthetic enhancer that cause abnormal gene expression. We found that, in a human cell line, competitive binding between family transcription factors (TFs) with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single- or multi-mutations. We also discovered that even if various harmful mutations occurred in an activator binding site, a CRM could stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance understanding of the effects of SNPs and indels on CRMs and would help build robust custom-designed CRMs for biologics production and gene therapy.
Affiliation(s)
- Chan-Koo Kang
  - School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Ah-Ram Kim
  - School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
12
Loell KJ, Friedman RZ, Myers CA, Corbo JC, Cohen BA, White MA. Transcription factor interactions explain the context-dependent activity of CRX binding sites. PLoS Comput Biol 2024; 20:e1011802. [PMID: 38227575 PMCID: PMC10817189 DOI: 10.1371/journal.pcbi.1011802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 01/26/2024] [Accepted: 01/06/2024] [Indexed: 01/18/2024] Open
Abstract
The effects of transcription factor binding sites (TFBSs) on the activity of a cis-regulatory element (CRE) depend on the local sequence context. In rod photoreceptors, binding sites for the transcription factor (TF) Cone-rod homeobox (CRX) occur in both enhancers and silencers, but the sequence context that determines whether CRX binding sites contribute to activation or repression of transcription is not understood. To investigate the context-dependent activity of CRX sites, we fit neural network-based models to the activities of synthetic CREs composed of photoreceptor TFBSs. The models revealed that CRX binding sites consistently make positive, independent contributions to CRE activity, while negative homotypic interactions between sites cause CREs composed of multiple CRX sites to function as silencers. The effects of negative homotypic interactions can be overcome by the presence of other TFBSs that either interact cooperatively with CRX sites or make independent positive contributions to activity. The context-dependent activity of CRX sites is thus determined by the balance between positive heterotypic interactions, independent contributions of TFBSs, and negative homotypic interactions. Our findings explain observed patterns of activity among genomic CRX-bound enhancers and silencers, and suggest that enhancers may require diverse TFBSs to overcome negative homotypic interactions between TFBSs.
Affiliation(s)
- Kaiser J. Loell
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Ryan Z. Friedman
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Connie A. Myers
  - Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Joseph C. Corbo
  - Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Barak A. Cohen
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Michael A. White
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
|
13
|
Martyn GE, Montgomery MT, Jones H, Guo K, Doughty BR, Linder J, Chen Z, Cochran K, Lawrence KA, Munson G, Pampari A, Fulco CP, Kelley DR, Lander ES, Kundaje A, Engreitz JM. Rewriting regulatory DNA to dissect and reprogram gene expression. bioRxiv 2023:2023.12.20.572268. [PMID: 38187584 PMCID: PMC10769263 DOI: 10.1101/2023.12.20.572268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Regulatory DNA sequences within enhancers and promoters bind transcription factors to encode cell type-specific patterns of gene expression. However, the regulatory effects and programmability of such DNA sequences remain difficult to map or predict because we have lacked scalable methods to precisely edit regulatory DNA and quantify the effects in an endogenous genomic context. Here we present an approach to measure the quantitative effects of hundreds of designed DNA sequence variants on gene expression, by combining pooled CRISPR prime editing with RNA fluorescence in situ hybridization and cell sorting (Variant-FlowFISH). We apply this method to mutagenize and rewrite regulatory DNA sequences in an enhancer and the promoter of PPIF in two immune cell lines. Of 672 variant-cell type pairs, we identify 497 that affect PPIF expression. These variants appear to act through a variety of mechanisms including disruption or optimization of existing transcription factor binding sites, as well as creation of de novo sites. Disrupting a single endogenous transcription factor binding site often led to large changes in expression (up to -40% in the enhancer, and -50% in the promoter). The same variant often had different effects across cell types and states, demonstrating a highly tunable regulatory landscape. We use these data to benchmark performance of sequence-based predictive models of gene regulation, and find that certain types of variants are not accurately predicted by existing models. Finally, we computationally design 185 small sequence variants (≤10 bp) and optimize them for specific effects on expression in silico. 84% of these rationally designed edits showed the intended direction of effect, and some had dramatic effects on expression (-100% to +202%). 
Variant-FlowFISH thus provides a powerful tool to map the effects of variants and transcription factor binding sites on gene expression, test and improve computational models of gene regulation, and reprogram regulatory DNA.
Affiliation(s)
- Gabriella E Martyn
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- Michael T Montgomery
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- Hank Jones
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- Katherine Guo
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- Benjamin R Doughty
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ziwei Chen
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Kelly Cochran
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Kathryn A Lawrence
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Glen Munson
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anusri Pampari
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Charles P Fulco
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Present Address: Sanofi, Cambridge, MA, USA
- Eric S Lander
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biology, MIT, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Jesse M Engreitz
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanford Cardiovascular Institute, Stanford University, Stanford, CA, USA
|
14
|
Shepherdson JL, Friedman RZ, Zheng Y, Sun C, Oh IY, Granas DM, Cohen BA, Chen S, White MA. Pathogenic variants in Crx have distinct cis-regulatory effects on enhancers and silencers in photoreceptors. bioRxiv 2023:2023.05.27.542576. [PMID: 37292699 PMCID: PMC10245955 DOI: 10.1101/2023.05.27.542576] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Dozens of variants in the photoreceptor-specific transcription factor (TF) CRX are linked with human blinding diseases that vary in their severity and age of onset. It is unclear how different variants in this single TF alter its function in ways that lead to a range of phenotypes. We examined the effects of human disease-causing variants on CRX cis-regulatory function by deploying massively parallel reporter assays (MPRAs) in live mouse retinas carrying knock-ins of two variants, one in the DNA binding domain (p.R90W) and the other in the transcriptional effector domain (p.E168d2). The degree of reporter gene dysregulation caused by the variants corresponds with their phenotypic severity. The two variants affect similar sets of enhancers, while p.E168d2 has stronger effects on silencers. Cis-regulatory elements (CREs) near cone photoreceptor genes are enriched for silencers that are de-repressed in the presence of p.E168d2. Chromatin environments of CRX-bound loci were partially predictive of episomal MPRA activity, and silencers were notably enriched among distal elements whose accessibility increases later in retinal development. We identified a set of potentially pleiotropic regulatory elements that convert from silencers to enhancers in retinas that lack a functional CRX effector domain. Our findings show that phenotypically distinct variants in different domains of CRX have partially overlapping effects on its cis-regulatory function, leading to misregulation of similar sets of enhancers, while having a qualitatively different impact on silencers.
Affiliation(s)
- James L. Shepherdson
  - Department of Genetics
  - Edison Family Center for Genome Sciences & Systems Biology
- Ryan Z. Friedman
  - Department of Genetics
  - Edison Family Center for Genome Sciences & Systems Biology
- Chi Sun
  - Department of Ophthalmology and Visual Sciences
- Inez Y. Oh
  - Department of Ophthalmology and Visual Sciences
- David M. Granas
  - Department of Genetics
  - Edison Family Center for Genome Sciences & Systems Biology
- Barak A. Cohen
  - Department of Genetics
  - Edison Family Center for Genome Sciences & Systems Biology
- Shiming Chen
  - Department of Ophthalmology and Visual Sciences
  - Department of Developmental Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110
- Michael A. White
  - Department of Genetics
  - Edison Family Center for Genome Sciences & Systems Biology
|
15
|
Zheng Y, Sun C, Zhang X, Ruzycki PA, Chen S. Missense mutations in CRX homeodomain cause dominant retinopathies through two distinct mechanisms. eLife 2023; 12:RP87147. [PMID: 37963072 PMCID: PMC10645426 DOI: 10.7554/elife.87147] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/16/2023] Open
Abstract
Homeodomain transcription factors (HD TFs) are instrumental to vertebrate development. Mutations in HD TFs have been linked to human diseases, but their pathogenic mechanisms remain elusive. Here, we use Cone-Rod Homeobox (CRX) as a model to decipher the disease-causing mechanisms of two HD mutations, p.E80A and p.K88N, that produce severe dominant retinopathies. Through integrated analysis of molecular and functional evidence in vitro and in knock-in mouse models, we uncover two novel gain-of-function mechanisms: p.E80A increases CRX-mediated transactivation of canonical CRX target genes in developing photoreceptors; p.K88N alters CRX DNA-binding specificity resulting in binding at ectopic sites and severe perturbation of CRX target gene expression. Both mechanisms produce novel retinal morphological defects and hinder photoreceptor maturation distinct from loss-of-function models. This study reveals the distinct roles of E80 and K88 residues in CRX HD regulatory functions and emphasizes the importance of transcriptional precision in normal development.
Affiliation(s)
- Yiqiao Zheng
  - Molecular Genetic and Genomics Graduate Program, Division of Biological and Biomedical Sciences, Washington University in St Louis, Saint Louis, United States
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, United States
- Chi Sun
  - Molecular Genetic and Genomics Graduate Program, Division of Biological and Biomedical Sciences, Washington University in St Louis, Saint Louis, United States
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, United States
- Xiaodong Zhang
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, United States
- Philip A Ruzycki
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, United States
  - Department of Genetics, Washington University in St Louis, Saint Louis, United States
- Shiming Chen
  - Molecular Genetic and Genomics Graduate Program, Division of Biological and Biomedical Sciences, Washington University in St Louis, Saint Louis, United States
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, United States
  - Department of Developmental Biology, Washington University in St Louis, Saint Louis, United States
|
16
|
Mulet-Lazaro R, Delwel R. From Genotype to Phenotype: How Enhancers Control Gene Expression and Cell Identity in Hematopoiesis. Hemasphere 2023; 7:e969. [PMID: 37953829 PMCID: PMC10635615 DOI: 10.1097/hs9.0000000000000969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 09/11/2023] [Indexed: 11/14/2023] Open
Abstract
Blood comprises a wide array of specialized cells, all of which share the same genetic information and ultimately derive from the same precursor, the hematopoietic stem cell (HSC). This diversity of phenotypes is underpinned by unique transcriptional programs gradually acquired in the process known as hematopoiesis. Spatiotemporal regulation of gene expression depends on many factors, but critical among them are enhancers: sequences of DNA that bind transcription factors and increase transcription of genes under their control. Thus, hematopoiesis involves the activation of specific enhancer repertoires in HSCs and their progeny, driving the expression of sets of genes that collectively determine morphology and function. Disruption of this tightly regulated process can have catastrophic consequences: in hematopoietic malignancies, dysregulation of transcriptional control by enhancers leads to misexpression of oncogenes that ultimately drive transformation. This review attempts to provide a basic understanding of enhancers and their role in transcriptional regulation, with a focus on normal and malignant hematopoiesis. We present examples of enhancers controlling master regulators of hematopoiesis and discuss the main mechanisms leading to enhancer dysregulation in leukemia and lymphoma.
Affiliation(s)
- Roger Mulet-Lazaro
  - Department of Hematology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands
  - Oncode Institute, Utrecht, the Netherlands
- Ruud Delwel
  - Department of Hematology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands
  - Oncode Institute, Utrecht, the Netherlands
|
17
|
Aradhya S, Facio FM, Metz H, Manders T, Colavin A, Kobayashi Y, Nykamp K, Johnson B, Nussbaum RL. Applications of artificial intelligence in clinical laboratory genomics. Am J Med Genet C Semin Med Genet 2023; 193:e32057. [PMID: 37507620 DOI: 10.1002/ajmg.c.32057] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/15/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of "big data" in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre-test and post-test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
Affiliation(s)
- Swaroop Aradhya
  - Invitae Corporation, San Francisco, California, USA
  - Adjunct Clinical Faculty, Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Hillery Metz
  - Invitae Corporation, San Francisco, California, USA
- Toby Manders
  - Invitae Corporation, San Francisco, California, USA
- Keith Nykamp
  - Invitae Corporation, San Francisco, California, USA
- Robert L Nussbaum
  - Invitae Corporation, San Francisco, California, USA
  - Volunteer Faculty, School of Medicine, University of California San Francisco, San Francisco, California, USA
|
18
|
Guzman C, Duttke S, Zhu Y, De Arruda Saldanha C, Downes N, Benner C, Heinz S. Combining TSS-MPRA and sensitive TSS profile dissimilarity scoring to study the sequence determinants of transcription initiation. Nucleic Acids Res 2023; 51:e80. [PMID: 37403796 PMCID: PMC10450201 DOI: 10.1093/nar/gkad562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 06/13/2023] [Accepted: 06/20/2023] [Indexed: 07/06/2023] Open
Abstract
Cis-regulatory elements (CREs) can be classified by the shapes of their transcription start site (TSS) profiles, which are indicative of distinct regulatory mechanisms. Massively parallel reporter assays (MPRAs) are increasingly being used to study CRE regulatory mechanisms, yet the degree to which MPRAs replicate individual endogenous TSS profiles has not been determined. Here, we present a new low-input MPRA protocol (TSS-MPRA) that enables measuring TSS profiles of episomal reporters as well as after lentiviral reporter chromatinization. To sensitively compare MPRA and endogenous TSS profiles, we developed a novel dissimilarity scoring algorithm (WIP score) that outperforms the frequently used earth mover's distance on experimental data. Using TSS-MPRA and WIP scoring on 500 unique reporter inserts, we found that short (153 bp) MPRA promoter inserts replicate the endogenous TSS patterns of ∼60% of promoters. Lentiviral reporter chromatinization did not improve fidelity of TSS-MPRA initiation patterns, and increasing insert size frequently led to activation of extraneous TSS in the MPRA that are not active in vivo. We discuss the implications of our findings, which highlight important caveats when using MPRAs to study transcription mechanisms. Finally, we illustrate how TSS-MPRA and WIP scoring can provide novel insights into the impact of transcription factor motif mutations and genetic variants on TSS patterns and transcription levels.
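For readers unfamiliar with the baseline metric named in this abstract, the sketch below shows one common way to compute the earth mover's distance between two TSS profiles. It is not the WIP score, whose definition the abstract does not give. It assumes both profiles are read counts over the same ordered positions, and the function name `earth_movers_distance` is hypothetical.

```python
from itertools import accumulate


def earth_movers_distance(profile_x, profile_y):
    """1-D earth mover's distance between two count profiles.

    Both profiles are normalized to sum to 1; the distance is then the sum
    of absolute differences between their cumulative distributions, in
    units of positions.
    """
    if len(profile_x) != len(profile_y):
        raise ValueError("profiles must cover the same positions")
    total_x, total_y = sum(profile_x), sum(profile_y)
    if total_x <= 0 or total_y <= 0:
        raise ValueError("each profile needs a positive total count")
    diffs = [a / total_x - b / total_y for a, b in zip(profile_x, profile_y)]
    # Running sum of diffs = difference between the two cumulative profiles.
    return sum(abs(c) for c in accumulate(diffs))


if __name__ == "__main__":
    focused = [0, 0, 40, 0, 0]    # hypothetical sharp TSS profile
    dispersed = [5, 10, 10, 10, 5]  # hypothetical broad TSS profile
    print(earth_movers_distance(focused, dispersed))
```

Because the profiles are normalized first, the measure compares the shapes of the initiation patterns rather than overall expression levels.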
Affiliation(s)
- Carlos Guzman
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
  - Department of Bioengineering, Graduate Program in Bioinformatics & Systems Biology, U.C. San Diego, La Jolla, CA 92093, USA
- Sascha Duttke
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Yixin Zhu
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Camila De Arruda Saldanha
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Nicholas L Downes
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Christopher Benner
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Sven Heinz
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
|
19
|
Friedman RZ, Ramu A, Lichtarge S, Myers CA, Granas DM, Gause M, Corbo JC, Cohen BA, White MA. Active learning of enhancer and silencer regulatory grammar in photoreceptors. bioRxiv 2023:2023.08.21.554146. [PMID: 37662358 PMCID: PMC10473580 DOI: 10.1101/2023.08.21.554146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/05/2023]
Abstract
Cis-regulatory elements (CREs) direct gene expression in health and disease, and models that can accurately predict their activities from DNA sequences are crucial for biomedicine. Deep learning represents one emerging strategy to model the regulatory grammar that relates CRE sequence to function. However, these models require training data on a scale that exceeds the number of CREs in the genome. We address this problem using active machine learning to iteratively train models on multiple rounds of synthetic DNA sequences assayed in live mammalian retinas. During each round of training the model actively selects sequence perturbations to assay, thereby efficiently generating informative training data. We iteratively trained a model that predicts the activities of sequences containing binding motifs for the photoreceptor transcription factor Cone-rod homeobox (CRX) using an order of magnitude less training data than current approaches. The model's internal confidence estimates of its predictions are reliable guides for designing sequences with high activity. The model correctly identified critical sequence differences between active and inactive sequences with nearly identical transcription factor binding sites, and revealed order and spacing preferences for combinations of motifs. Our results establish active learning as an effective method to train accurate deep learning models of cis-regulatory function after exhausting naturally occurring training examples in the genome.
Affiliation(s)
- Ryan Z. Friedman
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Avinash Ramu
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Sara Lichtarge
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Connie A. Myers
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110
- David M. Granas
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Maria Gause
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110
- Joseph C. Corbo
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110
- Barak A. Cohen
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Michael A. White
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
|
20
|
Snyman M, Xu S. The effects of mutations on gene expression and alternative splicing. Proc Biol Sci 2023; 290:20230565. [PMID: 37403507 PMCID: PMC10320348 DOI: 10.1098/rspb.2023.0565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 06/12/2023] [Indexed: 07/06/2023] Open
Abstract
Understanding the relationship between mutations and their genomic and phenotypic consequences has been a longstanding goal of evolutionary biology. However, few studies have investigated the impact of mutations on gene expression and alternative splicing on the genome-wide scale. In this study, we aim to bridge this knowledge gap by utilizing whole-genome sequencing data and RNA sequencing data from 16 obligately parthenogenetic Daphnia mutant lines to investigate the effects of ethyl methanesulfonate-induced mutations on gene expression and alternative splicing. Using rigorous analyses of mutations, expression changes and alternative splicing, we show that trans-effects are the major contributor to the variance in gene expression and alternative splicing between the wild-type and mutant lines, whereas cis mutations only affected a limited number of genes and do not always alter gene expression. Moreover, we show that there is a significant association between differentially expressed genes and exonic mutations, indicating that exonic mutations are an important driver of altered gene expression.
Affiliation(s)
- Marelize Snyman
  - Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
- Sen Xu
  - Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
|
21
|
Zheng Y, Sun C, Zhang X, Ruzycki PA, Chen S. Missense mutations in CRX homeodomain cause dominant retinopathies through two distinct mechanisms. bioRxiv 2023:2023.02.01.526652. [PMID: 36778408 PMCID: PMC9915647 DOI: 10.1101/2023.02.01.526652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
Homeodomain transcription factors (HD TFs) are instrumental to vertebrate development. Mutations in HD TFs have been linked to human diseases, but their pathogenic mechanisms remain elusive. Here we use Cone-Rod Homeobox (CRX) as a model to decipher the disease-causing mechanisms of two HD mutations, p.E80A and p.K88N, that produce severe dominant retinopathies. Through integrated analysis of molecular and functional evidence in vitro and in knock-in mouse models, we uncover two novel gain-of-function mechanisms: p.E80A increases CRX-mediated transactivation of canonical CRX target genes in developing photoreceptors; p.K88N alters CRX DNA-binding specificity resulting in binding at ectopic sites and severe perturbation of CRX target gene expression. Both mechanisms produce novel retinal morphological defects and hinder photoreceptor maturation distinct from loss-of-function models. This study reveals the distinct roles of E80 and K88 residues in CRX HD regulatory functions and emphasizes the importance of transcriptional precision in normal development.
Affiliation(s)
- Yiqiao Zheng
  - Molecular Genetic and Genomics Graduate Program, Division of Biological and Biomedical Sciences, Washington University in St Louis, Saint Louis, Missouri, USA
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, Missouri, USA
- Chi Sun
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, Missouri, USA
- Xiaodong Zhang
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, Missouri, USA
- Philip A. Ruzycki
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, Missouri, USA
  - Department of Genetics, Washington University in St Louis, Saint Louis, Missouri, USA
- Shiming Chen
  - Molecular Genetic and Genomics Graduate Program, Division of Biological and Biomedical Sciences, Washington University in St Louis, Saint Louis, Missouri, USA
  - Department of Ophthalmology and Visual Sciences, Washington University in St Louis, Saint Louis, Missouri, USA
  - Department of Developmental Biology, Washington University in St Louis, Saint Louis, Missouri, USA
|
22
|
Li XC, Fuqua T, van Breugel ME, Crocker J. Mutational scans reveal differential evolvability of Drosophila promoters and enhancers. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220054. [PMID: 37004721 PMCID: PMC10067265 DOI: 10.1098/rstb.2022.0054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 04/04/2023] Open
Abstract
Rapid enhancer and slow promoter evolution have been demonstrated through comparative genomics. However, it is not clear how this information is encoded genetically and if this can be used to place evolution in a predictive context. Part of the challenge is that our understanding of the potential for regulatory evolution is biased primarily toward natural variation or limited experimental perturbations. Here, to explore the evolutionary capacity of promoter variation, we surveyed an unbiased mutation library for three promoters in Drosophila melanogaster. We found that mutations in promoters had limited to no effect on spatial patterns of gene expression. Compared to developmental enhancers, promoters are more robust to mutations and have more access to mutations that can increase gene expression, suggesting that their low activity might be a result of selection. Consistent with these observations, increasing the promoter activity at the endogenous locus of shavenbaby led to increased transcription yet limited phenotypic changes. Taken together, developmental promoters may encode robust transcriptional outputs allowing evolvability through the integration of diverse developmental enhancers. This article is part of the theme issue ‘Interdisciplinary approaches to predicting evolutionary biology’.
Affiliation(s)
- Xueying C. Li
- European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany
- Timothy Fuqua
- European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany
- Justin Crocker
- European Molecular Biology Laboratory, Heidelberg, Baden-Württemberg 69117, Germany
23
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023] Open
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
24
Ren N, Dai S, Ma S, Yang F. Strategies for activity analysis of single nucleotide polymorphisms associated with human diseases. Clin Genet 2023; 103:392-400. [PMID: 36527336 DOI: 10.1111/cge.14282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 12/10/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphism (SNP) sites associated with human diseases. In the annotation of human diseases, especially cancers, SNPs, as an important component of genetic factors, have gained increasing attention. Given that most of the SNPs are located in non-coding regions, the functional verification of these SNPs is a great challenge. The key to functional annotation for risk SNPs is to screen SNPs with regulatory activity from thousands of disease associated-SNPs. In this review, we systematically recapitulate the characteristics and functional roles of SNP sites, discuss three parallel reporter screening strategies in detail based on barcode tag classification, and recommend the common in silico strategies to help supplement the annotation of SNP sites with epigenetic activity analysis, prediction of target genes and trans-acting factors. We hope that this review will contribute to this exuberant research field by providing robust activity analysis strategies that can facilitate the translation of GWAS results into personalized diagnosis and prevention measures for human diseases.
Affiliation(s)
- Naixia Ren
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
- Shangkun Dai
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
- Shumin Ma
- School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
- Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
25
Zheng Y, VanDusen NJ. Massively Parallel Reporter Assays for High-Throughput In Vivo Analysis of Cis-Regulatory Elements. J Cardiovasc Dev Dis 2023; 10:144. [PMID: 37103023 PMCID: PMC10146671 DOI: 10.3390/jcdd10040144] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/27/2023] [Revised: 03/24/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023] Open
Abstract
The rapid improvement of descriptive genomic technologies has fueled a dramatic increase in hypothesized connections between cardiovascular gene expression and phenotypes. However, in vivo testing of these hypotheses has predominantly been relegated to slow, expensive, and linear generation of genetically modified mice. In the study of genomic cis-regulatory elements, generation of mice featuring transgenic reporters or cis-regulatory element knockout remains the standard approach. While the data obtained is of high quality, the approach is insufficient to keep pace with candidate identification and therefore results in biases introduced during the selection of candidates for validation. However, recent advances across a range of disciplines are converging to enable functional genomic assays that can be conducted in a high-throughput manner. Here, we review one such method, massively parallel reporter assays (MPRAs), in which the activities of thousands of candidate genomic regulatory elements are simultaneously assessed via the next-generation sequencing of a barcoded reporter transcript. We discuss best practices for MPRA design and use, with a focus on practical considerations, and review how this emerging technology has been successfully deployed in vivo. Finally, we discuss how MPRAs are likely to evolve and be used in future cardiovascular research.
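As a rough illustration of the readout described in this abstract, MPRA activity is commonly summarized as the log2 ratio of barcode abundance in reporter RNA to barcode abundance in the input DNA library. The sketch below is a minimal version under assumptions of my own: the function name `mpra_activity`, the pseudocount, the per-element averaging, and the toy counts are all hypothetical and are not taken from the review.

```python
import math
from collections import defaultdict


def mpra_activity(barcode_to_element, rna_counts, dna_counts, pseudocount=1.0):
    """Return the mean log2(RNA/DNA) activity per element across its barcodes.

    Counts are normalized by library size so that RNA and DNA libraries of
    different sequencing depth are comparable. A pseudocount (an assumption of
    this sketch) avoids taking the log of zero for unobserved barcodes.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    per_element = defaultdict(list)
    for barcode, element in barcode_to_element.items():
        rna = (rna_counts.get(barcode, 0) + pseudocount) / rna_total
        dna = (dna_counts.get(barcode, 0) + pseudocount) / dna_total
        per_element[element].append(math.log2(rna / dna))
    # Average the per-barcode ratios to get one activity value per element.
    return {element: sum(vals) / len(vals) for element, vals in per_element.items()}


# Hypothetical toy data: two barcodes tag one candidate element, one tags a control.
barcode_to_element = {"AAAC": "candidate_A", "CCGT": "candidate_A", "GGTA": "control"}
rna_counts = {"AAAC": 79, "CCGT": 79, "GGTA": 39}
dna_counts = {"AAAC": 39, "CCGT": 39, "GGTA": 39}

activity = mpra_activity(barcode_to_element, rna_counts, dna_counts)
```

In this toy example the candidate element's barcodes are twice as abundant in RNA as the control's at equal DNA input, so its activity sits one log2 unit above the control. Published analyses typically add replicate handling and statistical testing, which this sketch omits.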
26
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023] Open
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia
- School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia
- The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia
- Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia
- Amanda J. Lea
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240, USA
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240, USA
- Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
27
Zhao S, Hong CKY, Myers CA, Granas DM, White MA, Corbo JC, Cohen BA. A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat Genet 2023; 55:346-354. [PMID: 36635387 PMCID: PMC9931678 DOI: 10.1038/s41588-022-01278-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 12/07/2021] [Accepted: 12/05/2022] [Indexed: 01/14/2023]
Abstract
Massively parallel reporter gene assays are key tools in regulatory genomics but cannot be used to identify cell-type-specific regulatory elements without performing assays serially across different cell types. To address this problem, we developed a single-cell massively parallel reporter assay (scMPRA) to measure the activity of libraries of cis-regulatory sequences (CRSs) across multiple cell types simultaneously. We assayed a library of core promoters in a mixture of HEK293 and K562 cells and showed that scMPRA is a reproducible, highly parallel, single-cell reporter gene assay that detects cell-type-specific cis-regulatory activity. We then measured a library of promoter variants across multiple cell types in live mouse retinas and showed that subtle genetic variants can produce cell-type-specific effects on cis-regulatory activity. We anticipate that scMPRA will be widely applicable for studying the role of CRSs across diverse cell types.
Affiliation(s)
- Siqi Zhao
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Ginkgo Bioworks, Boston, MA, USA
- Clarice K Y Hong
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Connie A Myers
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
- David M Granas
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Michael A White
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Joseph C Corbo
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Barak A Cohen
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
28
Application of massive parallel reporter analysis in biotechnology and medicine. КЛИНИЧЕСКАЯ ПРАКТИКА 2023. [DOI: 10.17816/clinpract115063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023] Open
Abstract
The development and functioning of an organism relies on tissue-specific gene programs. Genome regulatory elements play a key role in the regulation of such programs, and disruptions in their function can lead to the development of various pathologies, including cancers, malformations and autoimmune diseases. The emergence of high-throughput genomic studies has led to massively parallel reporter analysis (MPRA) methods, which allow the functional verification and identification of regulatory elements on a genome-wide scale. Initially MPRA was used as a tool to investigate fundamental aspects of epigenetics, but the approach also has great potential for clinical and practical biotechnology. Currently, MPRA is used for validation of clinically significant mutations, identification of tissue-specific regulatory elements, search for the most promising loci for transgene integration, and is an indispensable tool for creating highly efficient expression systems, the range of application of which extends from approaches for protein development and design of next-generation therapeutic antibody superproducers to gene therapy. In this review, the main principles and areas of practical application of high-throughput reporter assays will be discussed.
29
Du AY, Zhuo X, Sundaram V, Jensen NO, Chaudhari HG, Saccone NL, Cohen BA, Wang T. Functional characterization of enhancer activity during a long terminal repeat's evolution. Genome Res 2022; 32:1840-1851. [PMID: 36192170 PMCID: PMC9712623 DOI: 10.1101/gr.276863.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 08/23/2022] [Indexed: 11/24/2022]
Abstract
Many transposable elements (TEs) contain transcription factor binding sites and are implicated as potential regulatory elements. However, TEs are rarely functionally tested for regulatory activity, which in turn limits our understanding of how TE regulatory activity has evolved. We systematically tested the human LTR18A subfamily for regulatory activity using massively parallel reporter assay (MPRA) and found AP-1- and CEBP-related binding motifs as drivers of enhancer activity. Functional analysis of evolutionarily reconstructed ancestral sequences revealed that LTR18A elements have generally lost regulatory activity over time through sequence changes, with the largest effects occurring owing to mutations in the AP-1 and CEBP motifs. We observed that the two motifs are conserved at higher rates than expected based on neutral evolution. Finally, we identified LTR18A elements as potential enhancers in the human genome, primarily in epithelial cells. Together, our results provide a model for the origin, evolution, and co-option of TE-derived regulatory elements.
Affiliation(s)
- Alan Y Du
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Xiaoyu Zhuo
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Vasavi Sundaram
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Nicholas O Jensen
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Hemangi G Chaudhari
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Nancy L Saccone
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Barak A Cohen
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
30
Boytsov A, Abramov S, Aiusheeva AZ, Kasianova A, Baulin E, Kuznetsov I, Aulchenko Y, Kolmykov S, Yevshin I, Kolpakov F, Vorontsov I, Makeev V, Kulakovskiy I. ANANASTRA: annotation and enrichment analysis of allele-specific transcription factor binding at SNPs. Nucleic Acids Res 2022; 50:W51-W56. [PMID: 35446421 PMCID: PMC9252736 DOI: 10.1093/nar/gkac262] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/27/2022] [Revised: 03/15/2022] [Accepted: 04/04/2022] [Indexed: 11/12/2022] Open
Abstract
We present ANANASTRA, https://ananastra.autosome.org, a web server for the identification and annotation of regulatory single-nucleotide polymorphisms (SNPs) with allele-specific binding events. ANANASTRA accepts a list of dbSNP IDs or a VCF file and reports allele-specific binding (ASB) sites of particular transcription factors or in specific cell types, highlighting those with ASBs significantly enriched at SNPs in the query list. ANANASTRA is built on top of a systematic analysis of allelic imbalance in ChIP-Seq experiments and performs the ASB enrichment test against background sets of SNPs found in the same source experiments as ASB sites but not displaying significant allelic imbalance. We illustrate ANANASTRA usage with selected case studies and expect that ANANASTRA will help to conduct the follow-up of GWAS in terms of establishing functional hypotheses and designing experimental verification.
Affiliation(s)
- Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, 141701, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, 420008, Russia
- Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, 141701, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, 420008, Russia
- Ariuna Z Aiusheeva
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russia
- Alexandra M Kasianova
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russia
- Southern Federal University, Rostov-on-Don, 344006, Russia
- Eugene Baulin
- Moscow Institute of Physics and Technology, Dolgoprudny, 141701, Russia
- Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, 142290, Russia
- Ivan A Kuznetsov
- Skolkovo Institute of Science and Technology, Moscow, 121205, Russia
- Yurii S Aulchenko
- Institute of Cytology and Genetics SB RAS, Novosibirsk, 630090, Russia
- PolyKnomics BV, ’s-Hertogenbosch, 5237 PA, Netherlands
- Semyon Kolmykov
- Sirius University of Science and Technology, Sochi, 354340, Russia
- Biosoft.Ru LLC, Novosibirsk, 630090, Russia
- Ivan Yevshin
- Sirius University of Science and Technology, Sochi, 354340, Russia
- Biosoft.Ru LLC, Novosibirsk, 630090, Russia
- Fedor Kolpakov
- Sirius University of Science and Technology, Sochi, 354340, Russia
- Federal Research Center for Information and Computational Technologies, Novosibirsk, 630090, Russia
- Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russia
- Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, 141701, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, 420008, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia
- Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
- Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, 420008, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russia
31
Perkins ML, Gandara L, Crocker J. A synthetic synthesis to explore animal evolution and development. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200517. [PMID: 35634925 PMCID: PMC9149795 DOI: 10.1098/rstb.2020.0517] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022] Open
Abstract
Identifying the general principles by which genotypes are converted into phenotypes remains a challenge in the post-genomic era. We still lack a predictive understanding of how genes shape interactions among cells and tissues in response to signalling and environmental cues, and hence how regulatory networks generate the phenotypic variation required for adaptive evolution. Here, we discuss how techniques borrowed from synthetic biology may facilitate a systematic exploration of evolvability across biological scales. Synthetic approaches permit controlled manipulation of both endogenous and fully engineered systems, providing a flexible platform for investigating causal mechanisms in vivo. Combining synthetic approaches with multi-level phenotyping (phenomics) will supply a detailed, quantitative characterization of how internal and external stimuli shape the morphology and behaviour of living organisms. We advocate integrating high-throughput experimental data with mathematical and computational techniques from a variety of disciplines in order to pursue a comprehensive theory of evolution. This article is part of the theme issue ‘Genetic basis of adaptation and speciation: from loci to causative mutations’.
Affiliation(s)
- Mindy Liu Perkins
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Lautaro Gandara
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Justin Crocker
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
32
Spielmann M, Kircher M. Computational and experimental methods for classifying variants of unknown clinical significance. Cold Spring Harb Mol Case Stud 2022; 8:a006196. [PMID: 35483875 PMCID: PMC9059783 DOI: 10.1101/mcs.a006196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022] Open
Abstract
The increase in sequencing capacity, reduction in costs, and national and international coordinated efforts have led to the widespread introduction of next-generation sequencing (NGS) technologies in patient care. More generally, human genetics and genomic medicine are gaining importance for more and more patients. Some communities are already discussing the prospect of sequencing each individual's genome at time of birth. Together with digital health records, this shall enable individualized treatments and preventive measures, so-called precision medicine. A central step in this process is the identification of disease causal mutations or variant combinations that make us more susceptible for diseases. Although various technological advances have improved the identification of genetic alterations, the interpretation and ranking of the identified variants remains a major challenge. Based on our knowledge of molecular processes or previously identified disease variants, we can identify potentially functional genetic variants and, using different lines of evidence, we are sometimes able to demonstrate their pathogenicity directly. However, the vast majority of variants are classified as variants of uncertain clinical significance (VUSs) with not enough experimental evidence to determine their pathogenicity. In these cases, computational methods may be used to improve the prioritization and an increasing toolbox of experimental methods is emerging that can be used to assay the molecular effects of VUSs. Here, we discuss how computational and experimental methods can be used to create catalogs of variant effects for a variety of molecular and cellular phenotypes. We discuss the prospects of integrating large-scale functional data with machine learning and clinical knowledge for the development of accurate pathogenicity predictions for clinical applications.
Affiliation(s)
- Malte Spielmann
- Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany
- Institute of Human Genetics, Christian-Albrechts-Universität, 24105 Kiel, Germany
- Human Molecular Genomics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
- Martin Kircher
- Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany
- Berlin Institute of Health at Charité—Universitätsmedizin Berlin, 10117 Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Berlin, 10115 Berlin, Germany
33
VandenBosch LS, Luu K, Timms AE, Challam S, Wu Y, Lee AY, Cherry TJ. Machine Learning Prediction of Non-Coding Variant Impact in Human Retinal cis-Regulatory Elements. Transl Vis Sci Technol 2022; 11:16. [PMID: 35435921 PMCID: PMC9034719 DOI: 10.1167/tvst.11.4.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 03/25/2022] [Indexed: 11/24/2022] Open
Abstract
Purpose: Prior studies have demonstrated the significance of specific cis-regulatory variants in retinal disease; however, determining the functional impact of regulatory variants remains a major challenge. In this study, we utilized a machine learning approach, trained on epigenomic data from the adult human retina, to systematically quantify the predicted impact of cis-regulatory variants.
Methods: We used human retinal DNA accessibility data (ATAC-seq) to determine a set of 18.9k high-confidence, putative cis-regulatory elements. Eighty percent of these elements were used to train a machine learning model utilizing a gapped k-mer support vector machine-based approach. In silico saturation mutagenesis and variant scoring was applied to predict the functional impact of all potential single nucleotide variants within cis-regulatory elements. Impact scores were tested in a 20% hold-out dataset and compared to allele population frequency, phylogenetic conservation, transcription factor (TF) binding motifs, and existing massively parallel reporter assay data.
Results: We generated a model that distinguishes between human retinal regulatory elements and negative test sequences with 95% accuracy. Among a hold-out test set of 3.7k human retinal CREs, all possible single nucleotide variants were scored. Variants with negative impact scores correlated with higher phylogenetic conservation of the reference allele, disruption of predicted TF binding motifs, and massively parallel reporter expression.
Conclusions: We demonstrated the utility of human retinal epigenomic data to train a machine learning model for the purpose of predicting the impact of non-coding regulatory sequence variants. Our model accurately scored sequences and predicted putative transcription factor binding motifs. This approach has the potential to expedite the characterization of pathogenic non-coding sequence variants in the context of unexplained retinal disease.
Translational Relevance: This workflow and resulting dataset serve as a promising genomic tool to facilitate the clinical prioritization of functionally disruptive non-coding mutations in the retina.
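The in silico saturation mutagenesis step mentioned in this abstract can be sketched as follows: every position of a sequence is substituted with each of the three alternative bases, and each variant is scored as the change in model output relative to the reference. This is only a minimal sketch under stated assumptions. The study used a gapped k-mer support vector machine; `toy_score` below is a hypothetical stand-in (best match to one illustrative motif), and the function names, motif, and example sequence are not from the paper.

```python
def saturation_mutagenesis(sequence, score):
    """Score every single-nucleotide variant of `sequence`.

    Returns a list of (position, ref_base, alt_base, delta) tuples, where
    delta = score(mutant) - score(reference). Negative deltas mark variants
    predicted to reduce regulatory activity under the supplied scoring model.
    """
    sequence = sequence.upper()
    ref_score = score(sequence)
    results = []
    for pos, ref in enumerate(sequence):
        for alt in "ACGT":
            if alt == ref:
                continue  # skip the reference base itself
            mutant = sequence[:pos] + alt + sequence[pos + 1:]
            results.append((pos, ref, alt, score(mutant) - ref_score))
    return results


def toy_score(seq, motif="TAATCC"):
    """Hypothetical scoring model: best fraction of bases matching `motif`
    over all windows. A trained classifier would take its place in practice."""
    best = 0
    for i in range(len(seq) - len(motif) + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + len(motif)], motif))
        best = max(best, matches)
    return best / len(motif)


variants = saturation_mutagenesis("GGTAATCCAG", toy_score)
disruptive = [v for v in variants if v[3] < 0]
```

For a sequence of length n this yields 3n variants. With the toy model, only substitutions inside the motif lower the score, which mirrors the abstract's observation that negative impact scores coincide with disruption of predicted TF binding motifs.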
Affiliation(s)
- Leah S. VandenBosch
- Center for Developmental Biology and Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA, USA
- Kelsey Luu
- Center for Developmental Biology and Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA, USA
- Andrew E. Timms
- Center for Developmental Biology and Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA, USA
- Shriya Challam
- Center for Developmental Biology and Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA, USA
- Yue Wu
- University of Washington Department of Ophthalmology, Seattle, WA, USA
- Aaron Y. Lee
- University of Washington Department of Ophthalmology, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Timothy J. Cherry
- Center for Developmental Biology and Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- University of Washington Department of Pediatrics, Seattle, WA, USA
34
Abell NS, DeGorter MK, Gloudemans MJ, Greenwald E, Smith KS, He Z, Montgomery SB. Multiple causal variants underlie genetic associations in humans. Science 2022; 375:1247-1254. [PMID: 35298243 PMCID: PMC9725108 DOI: 10.1126/science.abj5117] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Indexed: 12/16/2022]
Abstract
Associations between genetic variation and traits are often in noncoding regions with strong linkage disequilibrium (LD), where a single causal variant is assumed to underlie the association. We applied a massively parallel reporter assay (MPRA) to functionally evaluate genetic variants in high, local LD for independent cis-expression quantitative trait loci (eQTL). We found that 17.7% of eQTLs exhibit more than one major allelic effect in tight LD. The detected regulatory variants were highly and specifically enriched for activating chromatin structures and allelic transcription factor binding. Integration of MPRA profiles with eQTL/complex trait colocalizations across 114 human traits and diseases identified causal variant sets demonstrating how genetic association signals can manifest through multiple, tightly linked causal variants.
Affiliation(s)
- Nathan S. Abell
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Marianne K. DeGorter
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Emily Greenwald
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Kevin S. Smith
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, 94305, USA
| | - Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
| | - Stephen B. Montgomery
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, 94305, USA
| |
Collapse
|
35
|
The evolution, evolvability and engineering of gene regulatory DNA. Nature 2022; 603:455-463. [PMID: 35264797 DOI: 10.1038/s41586-022-04506-6] [Citation(s) in RCA: 106] [Impact Index Per Article: 35.3] [Received: 02/08/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022]
Abstract
Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness [1-3]. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces [4-6]. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.
|
36
|
Waters CT, Gisselbrecht SS, Sytnikova YA, Cafarelli TM, Hill DE, Bulyk ML. Quantitative-enhancer-FACS-seq (QeFS) reveals epistatic interactions among motifs within transcriptional enhancers in developing Drosophila tissue. Genome Biol 2021; 22:348. [PMID: 34930411 PMCID: PMC8686523 DOI: 10.1186/s13059-021-02574-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/30/2020] [Accepted: 12/10/2021] [Indexed: 11/16/2022] Open
Abstract
Understanding the contributions of transcription factor DNA binding sites to transcriptional enhancers is a significant challenge. We developed Quantitative enhancer-FACS-Seq for highly parallel quantification of enhancer activities from a genomically integrated reporter in Drosophila melanogaster embryos. We investigate the contributions of the DNA binding motifs of four poorly characterized TFs to the activities of twelve embryonic mesodermal enhancers. We measure quantitative changes in enhancer activity and discover a range of epistatic interactions among the motifs, both synergistic and alleviating. We find that understanding the regulatory consequences of TF binding motifs requires that they be investigated in combination across enhancer contexts.
Affiliation(s)
- Colin T Waters
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, 02138, USA
- Stephen S Gisselbrecht
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Yuliya A Sytnikova
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Tiziana M Cafarelli
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- David E Hill
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Martha L Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, 02138, USA
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
|
37
|
Romanov SE, Kalashnikova DA, Laktionov PP. Methods of massive parallel reporter assays for investigation of enhancers. Vavilovskii Zhurnal Genet Selektsii 2021; 25:344-355. [PMID: 34901731 PMCID: PMC8627875 DOI: 10.18699/vj21.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2020] [Revised: 03/28/2021] [Accepted: 03/28/2021] [Indexed: 11/19/2022] Open
Abstract
The correct deployment of genetic programs for development and differentiation relies on finely coordinated regulation of specific gene sets. Genomic regulatory elements play an exceptional role in this process. There are few types of gene regulatory elements, including promoters, enhancers, insulators and silencers. Alterations of gene regulatory elements may cause various pathologies, including cancer, congenital disorders and autoimmune diseases. The development of high-throughput genomic assays has made it possible to significantly accelerate the accumulation of information about the characteristic epigenetic properties of regulatory elements. In combination with high-throughput studies focused on the genome-wide distribution of epigenetic marks, regulatory proteins and the spatial structure of chromatin, this significantly expands the understanding of the principles of epigenetic regulation of genes and allows potential regulatory elements to be searched for in silico. However, common experimental approaches used to study the local characteristics of chromatin have a number of technical limitations that may reduce the reliability of computational identification of genomic regulatory sequences. Taking into account the variability of the functions of epigenetic determinants and complex multicomponent regulation of genomic elements activity, their functional verification is often required. A plethora of methods have been developed to study the functional role of regulatory elements on the genome scale. Common experimental approaches for in silico identification of regulatory elements and their inherent technical limitations will be described. The present review is focused on original high-throughput methods of enhancer activity reporter analysis that are currently used to validate predicted regulatory elements and to perform de novo searches. 
The methods described make it possible to assess the functional role of the nucleotide sequence of a regulatory element, to determine its exact boundaries and to assess the influence of the local state of chromatin on the activity of enhancers and gene expression. These approaches have contributed substantially to the understanding of the fundamental principles of gene regulation.
Affiliation(s)
- S E Romanov
  - Novosibirsk State University, Epigenetics Laboratory, Department of Natural Sciences, Novosibirsk, Russia
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Genomics Laboratory, Novosibirsk, Russia
- D A Kalashnikova
  - Novosibirsk State University, Epigenetics Laboratory, Department of Natural Sciences, Novosibirsk, Russia
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Genomics Laboratory, Novosibirsk, Russia
- P P Laktionov
  - Novosibirsk State University, Epigenetics Laboratory, Department of Natural Sciences, Novosibirsk, Russia
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Genomics Laboratory, Novosibirsk, Russia
|
38
|
Schweizer G, Wagner A. Both Binding Strength and Evolutionary Accessibility Affect the Population Frequency of Transcription Factor Binding Sequences in Arabidopsis thaliana. Genome Biol Evol 2021; 13:6459646. [PMID: 34894231 PMCID: PMC8712246 DOI: 10.1093/gbe/evab273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2021] [Indexed: 11/22/2022] Open
Abstract
Mutations in DNA sequences that bind transcription factors and thus modulate gene expression are a source of adaptive variation in gene expression. To understand how transcription factor binding sequences evolve in natural populations of the thale cress Arabidopsis thaliana, we integrated genomic polymorphism data for loci bound by transcription factors with in vitro data on binding affinity for these transcription factors. Specifically, we studied 19 different transcription factors, and the allele frequencies of 8,333 genomic loci bound in vivo by these transcription factors in 1,135 A. thaliana accessions. We find that transcription factor binding sequences show very low genetic diversity, suggesting that they are subject to purifying selection. High frequency alleles of such binding sequences tend to bind transcription factors strongly. Conversely, alleles that are absent from the population tend to bind them weakly. In addition, alleles with high frequencies also tend to be the endpoints of many accessible evolutionary paths leading to these alleles. We show that both high affinity and high evolutionary accessibility contribute to high allele frequency for at least some transcription factors. Although binding sequences with stronger affinity are more frequent, we did not find them to be associated with higher gene expression levels. Epistatic interactions among individual mutations that alter binding affinity are pervasive and can help explain variation in accessibility among binding sequences. In summary, combining in vitro binding affinity data with in vivo binding sequence data can help understand the forces that affect the evolution of transcription factor binding sequences in natural populations.
Affiliation(s)
- Gabriel Schweizer
  - Department of Evolutionary Biology and Environmental Studies, University of Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
  - Santa Fe Institute, Santa Fe, New Mexico, USA
  - Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, South Africa
|
39
|
Friedman RZ, Granas DM, Myers CA, Corbo JC, Cohen BA, White MA. Information content differentiates enhancers from silencers in mouse photoreceptors. eLife 2021; 10:67403. [PMID: 34486522 PMCID: PMC8492058 DOI: 10.7554/elife.67403] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/09/2021] [Accepted: 09/03/2021] [Indexed: 12/12/2022] Open
Abstract
Enhancers and silencers often depend on the same transcription factors (TFs) and are conflated in genomic assays of TF binding or chromatin state. To identify sequence features that distinguish enhancers and silencers, we assayed massively parallel reporter libraries of genomic sequences targeted by the photoreceptor TF cone-rod homeobox (CRX) in mouse retinas. Both enhancers and silencers contain more TF motifs than inactive sequences, but relative to silencers, enhancers contain motifs from a more diverse collection of TFs. We developed a measure of information content that describes the number and diversity of motifs in a sequence and found that, while both enhancers and silencers depend on CRX motifs, enhancers have higher information content. The ability of information content to distinguish enhancers and silencers targeted by the same TF illustrates how motif context determines the activity of cis-regulatory sequences.

Different cell types are established by activating and repressing the activity of specific sets of genes, a process controlled by proteins called transcription factors. Transcription factors work by recognizing and binding short stretches of DNA in parts of the genome called cis-regulatory sequences. A cis-regulatory sequence that increases the activity of a gene when bound by transcription factors is called an enhancer, while a sequence that causes a decrease in gene activity is called a silencer. To establish a cell type, a particular transcription factor will act on both enhancers and silencers that control the activity of different genes. For example, the transcription factor cone-rod homeobox (CRX) is critical for specifying different types of cells in the retina, and it acts on both enhancers and silencers. In rod photoreceptors, CRX activates rod genes by binding their enhancers, while repressing cone photoreceptor genes by binding their silencers.
However, CRX always recognizes and binds to the same DNA sequence, known as its binding site, making it unclear why some cis-regulatory sequences bound to CRX act as silencers, while others act as enhancers. Friedman et al. sought to understand how enhancers and silencers, both bound by CRX, can have different effects on the genes they control. Since both enhancers and silencers contain CRX binding sites, the difference between the two must lie in the sequence of the DNA surrounding these binding sites. Using retinas that have been explanted from mice and kept alive in the laboratory, Friedman et al. tested the activity of thousands of CRX-binding sequences from the mouse genome. This showed that both enhancers and silencers have more copies of CRX-binding sites than sequences of the genome that are inactive. Additionally, the results revealed that enhancers have a diverse collection of binding sites for other transcription factors, while silencers do not. Friedman et al. developed a new metric they called information content, which captures the diverse combinations of different transcription binding sites that cis-regulatory sequences can have. Using this metric, Friedman et al. showed that it is possible to distinguish enhancers from silencers based on their information content. It is critical to understand how the DNA sequences of cis-regulatory regions determine their activity, because mutations in these regions of the genome can cause disease. However, since every person has thousands of benign mutations in cis-regulatory sequences, it is a challenge to identify specific disease-causing mutations, which are relatively rare. One long-term goal of models of enhancers and silencers, such as Friedman et al.’s information content model, is to understand how mutations can affect cis-regulatory sequences, and, in some cases, lead to disease.
Affiliation(s)
- Ryan Z Friedman
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, United States
  - Department of Genetics, Washington University School of Medicine, St. Louis, United States
- David M Granas
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, United States
  - Department of Genetics, Washington University School of Medicine, St. Louis, United States
- Connie A Myers
  - Department of Pathology and Immunology, Washington University School of Medicine, St Louis, United States
- Joseph C Corbo
  - Department of Pathology and Immunology, Washington University School of Medicine, St Louis, United States
- Barak A Cohen
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, United States
  - Department of Genetics, Washington University School of Medicine, St. Louis, United States
- Michael A White
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, United States
  - Department of Genetics, Washington University School of Medicine, St. Louis, United States
|
40
|
Findley AS, Zhang X, Boye C, Lin YL, Kalita CA, Barreiro L, Lohmueller KE, Pique-Regi R, Luca F. A signature of Neanderthal introgression on molecular mechanisms of environmental responses. PLoS Genet 2021; 17:e1009493. [PMID: 34570765 PMCID: PMC8509894 DOI: 10.1371/journal.pgen.1009493] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/18/2021] [Revised: 10/12/2021] [Accepted: 08/18/2021] [Indexed: 12/17/2022] Open
Abstract
Ancient human migrations led to the settlement of population groups in varied environmental contexts worldwide. The extent to which adaptation to local environments has shaped human genetic diversity is a longstanding question in human evolution. Recent studies have suggested that introgression of archaic alleles in the genome of modern humans may have contributed to adaptation to environmental pressures such as pathogen exposure. Functional genomic studies have demonstrated that variation in gene expression across individuals and in response to environmental perturbations is a main mechanism underlying complex trait variation. We considered gene expression response to in vitro treatments as a molecular phenotype to identify genes and regulatory variants that may have played an important role in adaptations to local environments. We investigated if Neanderthal introgression in the human genome may contribute to the transcriptional response to environmental perturbations. To this end we used eQTLs for genes differentially expressed in a panel of 52 cellular environments, resulting from 5 cell types and 26 treatments, including hormones, vitamins, drugs, and environmental contaminants. We found that SNPs with introgressed Neanderthal alleles (N-SNPs) disrupt binding of transcription factors important for environmental responses, including ionizing radiation and hypoxia, and for glucose metabolism. We identified an enrichment for N-SNPs among eQTLs for genes differentially expressed in response to 8 treatments, including glucocorticoids, caffeine, and vitamin D. Using Massively Parallel Reporter Assays (MPRA) data, we validated the regulatory function of 21 introgressed Neanderthal variants in the human genome, corresponding to 8 eQTLs regulating 15 genes that respond to environmental perturbations. 
These findings expand the set of environments where archaic introgression may have contributed to adaptations to local environments in modern humans and provide experimental validation for the regulatory function of introgressed variants.
Affiliation(s)
- Anthony S. Findley
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Xinjun Zhang
  - Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, California, United States of America
- Carly Boye
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Yen Lung Lin
  - Genetics Section, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Cynthia A. Kalita
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Luis Barreiro
  - Genetics Section, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Kirk E. Lohmueller
  - Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California, United States of America
- Roger Pique-Regi
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
- Francesca Luca
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
|
41
|
Duveau F, Vande Zande P, Metzger BP, Diaz CJ, Walker EA, Tryban S, Siddiq MA, Yang B, Wittkopp PJ. Mutational sources of trans-regulatory variation affecting gene expression in Saccharomyces cerevisiae. eLife 2021; 10:67806. [PMID: 34463616 PMCID: PMC8456550 DOI: 10.7554/elife.67806] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/23/2021] [Accepted: 08/03/2021] [Indexed: 12/15/2022] Open
Abstract
Heritable variation in a gene’s expression arises from mutations impacting cis- and trans-acting components of its regulatory network. Here, we investigate how trans-regulatory mutations are distributed within the genome and within a gene regulatory network by identifying and characterizing 69 mutations with trans-regulatory effects on expression of the same focal gene in Saccharomyces cerevisiae. Relative to 1766 mutations without effects on expression of this focal gene, we found that these trans-regulatory mutations were enriched in coding sequences of transcription factors previously predicted to regulate expression of the focal gene. However, over 90% of the trans-regulatory mutations identified mapped to other types of genes involved in diverse biological processes including chromatin state, metabolism, and signal transduction. These data show how genetic changes in diverse types of genes can impact a gene’s expression in trans, revealing properties of trans-regulatory mutations that provide the raw material for trans-regulatory variation segregating within natural populations.
Affiliation(s)
- Fabien Duveau
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
  - Laboratory of Biology and Modeling of the Cell, Ecole Normale Supérieure de Lyon, CNRS, Université Claude Bernard Lyon, Université de Lyon, Lyon, France
- Petra Vande Zande
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, United States
- Brian Ph Metzger
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
- Crisandra J Diaz
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, United States
- Elizabeth A Walker
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
- Stephen Tryban
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
- Mohammad A Siddiq
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
- Bing Yang
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, United States
- Patricia J Wittkopp
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, United States
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, United States
|
42
|
Sinyakov AN, Ryabinin VA, Kostina EV. Application of Array-Based Oligonucleotides for Synthesis of Genetic Designs. Mol Biol 2021. [DOI: 10.1134/s0026893321030109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
43
|
Lange M, Begolli R, Giakountis A. Non-Coding Variants in Cancer: Mechanistic Insights and Clinical Potential for Personalized Medicine. Noncoding RNA 2021; 7:47. [PMID: 34449663 PMCID: PMC8395730 DOI: 10.3390/ncrna7030047] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/28/2021] [Revised: 07/26/2021] [Accepted: 08/01/2021] [Indexed: 12/11/2022] Open
Abstract
The cancer genome is characterized by extensive variability, in the form of Single Nucleotide Polymorphisms (SNPs) or structural variations such as Copy Number Alterations (CNAs) across wider genomic areas. At the molecular level, most SNPs and/or CNAs reside in non-coding sequences, ultimately affecting the regulation of oncogenes and/or tumor-suppressors in a cancer-specific manner. Notably, inherited non-coding variants can predispose for cancer decades prior to disease onset. Furthermore, accumulation of additional non-coding driver mutations during progression of the disease, gives rise to genomic instability, acting as the driving force of neoplastic development and malignant evolution. Therefore, detection and characterization of such mutations can improve risk assessment for healthy carriers and expand the diagnostic and therapeutic toolbox for the patient. This review focuses on functional variants that reside in transcribed or not transcribed non-coding regions of the cancer genome and presents a collection of appropriate state-of-the-art methodologies to study them.
Affiliation(s)
- Marios Lange
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Rodiola Begolli
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Antonis Giakountis
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
  - Institute for Fundamental Biomedical Research, B.S.R.C “Alexander Fleming”, 34 Fleming Str., 16672 Vari, Greece
|
44
|
Mulvey B, Dougherty JD. Transcriptional-regulatory convergence across functional MDD risk variants identified by massively parallel reporter assays. Transl Psychiatry 2021; 11:403. [PMID: 34294677 PMCID: PMC8298436 DOI: 10.1038/s41398-021-01493-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/15/2021] [Revised: 06/02/2021] [Accepted: 06/16/2021] [Indexed: 02/07/2023] Open
Abstract
Family and population studies indicate clear heritability of major depressive disorder (MDD), though its underlying biology remains unclear. The majority of single-nucleotide polymorphism (SNP) linkage blocks associated with MDD by genome-wide association studies (GWASes) are believed to alter transcriptional regulators (e.g., enhancers, promoters) based on enrichment of marks correlated with these functions. A key to understanding MDD pathophysiology will be elucidation of which SNPs are functional and how such functional variants biologically converge to elicit the disease. Furthermore, retinoids can elicit MDD in patients and promote depressive-like behaviors in rodent models, acting via a regulatory system of retinoid receptor transcription factors (TFs). We therefore sought to simultaneously identify functional genetic variants and assess retinoid pathway regulation of MDD risk loci. Using Massively Parallel Reporter Assays (MPRAs), we functionally screened over 1000 SNPs prioritized from 39 neuropsychiatric trait/disease GWAS loci, selecting SNPs based on overlap with predicted regulatory features, including expression quantitative trait loci (eQTL) and histone marks, from human brains and cell cultures. We identified >100 SNPs with allelic effects on expression in a retinoid-responsive model system. Functional SNPs were enriched for binding sequences of retinoic acid-receptive transcription factors (TFs), with additional allelic differences unmasked by treatment with all-trans retinoic acid (ATRA). Finally, motifs overrepresented across functional SNPs corresponded to TFs highly specific to serotonergic neurons, suggesting an in vivo site of action. Our application of MPRAs to screen MDD-associated SNPs suggests a shared transcriptional-regulatory program across loci, a component of which is unmasked by retinoids.
Affiliation(s)
- Bernard Mulvey
  - Departments of Genetics and Psychiatry, Washington University in St. Louis, St. Louis, MO, USA
- Joseph D Dougherty
  - Departments of Genetics and Psychiatry, Washington University in St. Louis, St. Louis, MO, USA
|
45
|
Lee D, Kapoor A, Lee C, Mudgett M, Beer MA, Chakravarti A. Sequence-based correction of barcode bias in massively parallel reporter assays. Genome Res 2021; 31:1638-1645. [PMID: 34285053 PMCID: PMC8415370 DOI: 10.1101/gr.268599.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/18/2020] [Accepted: 07/07/2021] [Indexed: 11/24/2022]
Abstract
Massively parallel reporter assays (MPRAs) are a high-throughput method for evaluating in vitro activities of thousands of candidate cis-regulatory elements (CREs). In these assays, candidate sequences are cloned upstream or downstream from a reporter gene tagged by unique DNA sequences. However, tag sequences may themselves affect reporter gene expression and lead to major potential biases in the measured cis-regulatory activity. Here, we present a sequence-based method for correcting tag-sequence-specific effects and show that our method can significantly reduce this source of variation and improve the identification of functional regulatory variants by MPRAs. We also show that our model captures sequence features associated with post-transcriptional regulation of mRNA. Thus, this new method helps not only to improve detection of regulatory signals in MPRA experiments but also to design better MPRA protocols.
Affiliation(s)
- Ashish Kapoor
  - University of Texas Health Science Center at Houston
|
46
|
Asma H, Halfon MS. Annotating the Insect Regulatory Genome. Insects 2021; 12:591. [PMID: 34209769 PMCID: PMC8305585 DOI: 10.3390/insects12070591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/28/2021] [Revised: 06/23/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be paired together in a powerful combination to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.
Affiliation(s)
- Hasiba Asma
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Marc S. Halfon
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203, USA
|
47
|
Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022] Open
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
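The normalization described in this abstract (BC expression divided by BC abundance in the plasmid DNA, then averaged per ROI) can be illustrated with a minimal sketch. This is not the MPRAdecoder implementation: the function name `normalized_expression`, its arguments, the use of library-size fractions, and the `min_dna` cutoff are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def normalized_expression(rna_counts, dna_counts, bc_to_roi, min_dna=1):
    """Return (per-barcode RNA/DNA ratios, per-ROI mean ratios).

    All names and the min_dna cutoff are hypothetical; counts are assumed
    to be raw read counts per barcode from the RNA and plasmid DNA samples.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())

    per_bc = {}
    for bc in bc_to_roi:
        dna = dna_counts.get(bc, 0)
        if dna < min_dna:
            # Barcodes absent from the plasmid sample cannot be normalized.
            continue
        rna_frac = rna_counts.get(bc, 0) / rna_total
        dna_frac = dna / dna_total
        per_bc[bc] = rna_frac / dna_frac

    # Average the normalized values of all barcodes linked to the same ROI.
    by_roi = defaultdict(list)
    for bc, value in per_bc.items():
        by_roi[bc_to_roi[bc]].append(value)
    per_roi = {roi: mean(values) for roi, values in by_roi.items()}
    return per_bc, per_roi


rna = {"AAA": 30, "CCC": 10, "GGG": 60}
dna = {"AAA": 10, "CCC": 10, "GGG": 20}
mapping = {"AAA": "roi1", "CCC": "roi1", "GGG": "roi2", "TTT": "roi2"}
per_bc, per_roi = normalized_expression(rna, dna, mapping)
```

Here barcode `TTT` is dropped because it has no plasmid DNA reads, mirroring the idea that uncertain cases are excluded from the analysis; a real pipeline would also need to handle sequencing errors and ambiguous BC-ROI associations, which this sketch ignores.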
Affiliation(s)
- Anna E Letiagina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Evgeniya S Omelina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anton V Ivankin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Alexey V Pindyurin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
|
48
|
Roberts BS, Partridge EC, Moyers BA, Agarwal V, Newberry KM, Martin BK, Shendure J, Myers RM, Cooper GM. Genome-wide strand asymmetry in massively parallel reporter activity favors genic strands. Genome Res 2021; 31:866-876. [PMID: 33879525 PMCID: PMC8092006 DOI: 10.1101/gr.270751.120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2020] [Accepted: 02/18/2021] [Indexed: 11/24/2022]
Abstract
Massively parallel reporter assays (MPRAs) are useful tools to characterize regulatory elements in human genomes. An aspect of MPRAs that is not typically the focus of analysis is their intrinsic ability to differentiate activity levels for a given sequence element when placed in both of its possible orientations relative to the reporter construct. Here, we describe pervasive strand asymmetry of MPRA signals in data sets from multiple reporter configurations in both published and newly reported data. These effects are reproducible across different cell types and in different treatments within a cell type and are observed both within and outside of annotated regulatory elements. From elements in gene bodies, MPRA strand asymmetry favors the sense strand, suggesting that function related to endogenous transcription is driving the phenomenon. Similarly, we find that within Alu mobile element insertions, strand asymmetry favors the transcribed strand of the ancestral retrotransposon. The effect is consistent across the multiplicity of Alu elements in human genomes and is more pronounced in less diverged Alu elements. We find sequence features driving MPRA strand asymmetry and show its prediction from sequence alone. We see some evidence for RNA stabilization and transcriptional activation mechanisms and hypothesize that the effect is driven by natural selection favoring efficient transcription. Our results indicate that strand asymmetry is a pervasive and reproducible feature in MPRA data. More importantly, the fact that MPRA asymmetry favors naturally transcribed strands suggests that it stems from preserved biological functions that have a substantial, global impact on gene and genome evolution.
Affiliation(s)
- Brian S Roberts
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
  - Department of Biological Sciences, The University of Alabama in Huntsville, Huntsville, Alabama 35899, USA
- Bryan A Moyers
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Vikram Agarwal
  - Calico Life Sciences LLC, South San Francisco, California 94080, USA
- Beth K Martin
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
  - Howard Hughes Medical Institute, Seattle, Washington 98195, USA
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, Washington 98195, USA
- Richard M Myers
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Gregory M Cooper
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
|
49
|
Mulvey B, Lagunas T, Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 2021; 89:76-89. [PMID: 32843144 PMCID: PMC7938388 DOI: 10.1016/j.biopsych.2020.06.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/11/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Neuropsychiatric phenotypes have long been known to be influenced by heritable risk factors, directly confirmed by the past decade of genetic studies that have revealed specific genetic variants enriched in disease cohorts. However, the initial hope that a small set of genes would be responsible for a given disorder proved false. The more complex reality is that a given disorder may be influenced by myriad small-effect noncoding variants and/or by rare but severe coding variants, many de novo. Noncoding genomic sequences, for which molecular functions cannot usually be inferred, harbor a large portion of these variants, creating a substantial barrier to understanding higher-order molecular and biological systems of disease. Fortunately, novel genetic technologies, namely scalable oligonucleotide synthesis, RNA sequencing, and CRISPR (clustered regularly interspaced short palindromic repeats), have opened novel avenues to experimentally identify biologically significant variants en masse. Massively parallel reporter assays (MPRAs) are an especially versatile technique resulting from such innovations. MPRAs are powerful molecular genetics tools that can be used to screen thousands of untranscribed or untranslated sequences and their variants for functional effects in a single experiment. This approach, though underutilized in psychiatric genetics, has several useful features for the field. We review methods for assaying putatively functional genetic variants and regions, emphasizing MPRAs and the opportunities they hold for dissection of psychiatric polygenicity. We discuss literature applying functional assays in neurogenetics, highlighting strengths, caveats, and design considerations, especially regarding disease-relevant variables (cell type, neurodevelopment, and sex), and we ultimately propose applications of MPRA to both computational and experimental neurogenetics of polygenic disease risk.
Affiliation(s)
- Bernard Mulvey
  - Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Tomás Lagunas
  - Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Joseph D Dougherty
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
|
50
|
Molecular and evolutionary processes generating variation in gene expression. Nat Rev Genet 2020; 22:203-215. [PMID: 33268840 DOI: 10.1038/s41576-020-00304-w] [Citation(s) in RCA: 128] [Impact Index Per Article: 25.6] [Accepted: 10/21/2020] [Indexed: 12/18/2022]
Abstract
Heritable variation in gene expression is common within and between species. This variation arises from mutations that alter the form or function of molecular gene regulatory networks that are then filtered by natural selection. High-throughput methods for introducing mutations and characterizing their cis- and trans-regulatory effects on gene expression (particularly, transcription) are revealing how different molecular mechanisms generate regulatory variation, and studies comparing these mutational effects with variation seen in the wild are teasing apart the role of neutral and non-neutral evolutionary processes. This integration of molecular and evolutionary biology allows us to understand how the variation in gene expression we see today came to be and to predict how it is most likely to evolve in the future.
|