101
Abdalla M, Abdalla M. A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules. PLoS Comput Biol 2022; 18:e1010028. [PMID: 35421087 PMCID: PMC9041867 DOI: 10.1371/journal.pcbi.1010028] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/03/2021] [Revised: 04/26/2022] [Accepted: 03/16/2022] [Indexed: 11/18/2022] Open
Abstract
Genome-wide association studies (GWASs) for complex traits have implicated thousands of genetic loci. Most GWAS-nominated variants lie in noncoding regions, complicating the systematic translation of these findings into functional understanding. Here, we leverage convolutional neural networks to assist in this challenge. Our computational framework, peaBrain, models the transcriptional machinery of a tissue as a two-stage process: first, predicting the mean tissue-specific abundance of all genes and second, incorporating the transcriptomic consequences of genotype variation to predict individual abundance on a subject-by-subject basis. We demonstrate that peaBrain accounts for the majority (>50%) of variance observed in mean transcript abundance across most tissues and outperforms regularized linear models in predicting the consequences of individual genotype variation. We highlight the validity of the peaBrain model by calculating non-coding impact scores that correlate with nucleotide evolutionary constraint and that are also predictive of disease-associated variation and allele-specific transcription factor binding. We further show how these tissue-specific peaBrain scores can be leveraged to pinpoint functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals. We subsequently: (a) derive continuous dense embeddings of genes for downstream applications; (b) highlight the utility of the model in predicting transcriptomic impact of small molecules and shRNA (on par with in vitro experimental replication of external test sets); (c) explore how peaBrain can be used to model difficult-to-study processes (such as neural induction); and (d) identify putatively functional eQTLs that are missed by high-throughput experimental approaches.
Affiliation(s)
- Moustafa Abdalla
  - Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom
  - Computational Statistics and Machine Learning, Department of Statistics, University of Oxford, Oxford, United Kingdom
  - Department of Surgery, Harvard Medical School, Boston, Massachusetts, United States of America
- Mohamed Abdalla
  - Vector Institute for Artificial Intelligence, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
102
Chen L, Wang Y, Zhao F. Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence. Bioinformatics 2022; 38:3164-3172. [PMID: 35389435 PMCID: PMC9890318 DOI: 10.1093/bioinformatics/btac214] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/03/2021] [Revised: 03/04/2022] [Accepted: 04/06/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION: Though genome-wide association studies have identified tens of thousands of variants associated with complex traits and most of them fall within the non-coding regions, they may not be the causal ones. The development of high-throughput functional assays has led to the discovery of experimentally validated non-coding functional variants. However, these validated variants are rare due to technical difficulty and financial cost. The small sample size of validated variants makes it less reliable to develop a supervised machine learning model for achieving a whole genome-wide prediction of non-coding causal variants.
RESULTS: We exploit a deep transfer learning model, based on a convolutional neural network, to improve the prediction of functional non-coding variants (NCVs). To address the challenge of small sample size, the transfer learning model leverages both large-scale generic functional NCVs to improve the learning of low-level features and context-specific functional NCVs to learn high-level features toward the context-specific prediction task. By evaluating the deep transfer learning model on three MPRA datasets and 16 GWAS datasets, we demonstrate that the proposed model outperforms deep learning models without pretraining or retraining. In addition, the deep transfer learning model outperforms 18 existing computational methods in both MPRA and GWAS datasets.
AVAILABILITY AND IMPLEMENTATION: https://github.com/lichen-lab/TLVar.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li Chen
  - To whom correspondence should be addressed.
- Fengdi Zhao
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
103
Spielmann M, Kircher M. Computational and experimental methods for classifying variants of unknown clinical significance. Cold Spring Harb Mol Case Stud 2022; 8:mcs.a006196. [PMID: 35483875 PMCID: PMC9059783 DOI: 10.1101/mcs.a006196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022] Open
Abstract
The increase in sequencing capacity, reduction in costs, and national and international coordinated efforts have led to the widespread introduction of next-generation sequencing (NGS) technologies in patient care. More generally, human genetics and genomic medicine are gaining importance for more and more patients. Some communities are already discussing the prospect of sequencing each individual's genome at the time of birth. Together with digital health records, this should enable individualized treatments and preventive measures, so-called precision medicine. A central step in this process is the identification of disease-causal mutations or variant combinations that make us more susceptible to diseases. Although various technological advances have improved the identification of genetic alterations, the interpretation and ranking of the identified variants remain a major challenge. Based on our knowledge of molecular processes or previously identified disease variants, we can identify potentially functional genetic variants and, using different lines of evidence, we are sometimes able to demonstrate their pathogenicity directly. However, the vast majority of variants are classified as variants of uncertain clinical significance (VUSs) with not enough experimental evidence to determine their pathogenicity. In these cases, computational methods may be used to improve the prioritization, and an increasing toolbox of experimental methods is emerging that can be used to assay the molecular effects of VUSs. Here, we discuss how computational and experimental methods can be used to create catalogs of variant effects for a variety of molecular and cellular phenotypes. We discuss the prospects of integrating large-scale functional data with machine learning and clinical knowledge for the development of accurate pathogenicity predictions for clinical applications.
Affiliation(s)
- Malte Spielmann
  - Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany
  - Institute of Human Genetics, Christian-Albrechts-Universität, 24105 Kiel, Germany
  - Human Molecular Genomics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
  - DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
- Martin Kircher
  - Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany
  - Berlin Institute of Health at Charité—Universitätsmedizin Berlin, 10117 Berlin, Germany
  - DZHK (German Centre for Cardiovascular Research), partner site Berlin, 10115 Berlin, Germany
104
Abell NS, DeGorter MK, Gloudemans MJ, Greenwald E, Smith KS, He Z, Montgomery SB. Multiple causal variants underlie genetic associations in humans. Science 2022; 375:1247-1254. [PMID: 35298243 PMCID: PMC9725108 DOI: 10.1126/science.abj5117] [Citation(s) in RCA: 65] [Impact Index Per Article: 32.5] [Indexed: 12/16/2022]
Abstract
Associations between genetic variation and traits are often in noncoding regions with strong linkage disequilibrium (LD), where a single causal variant is assumed to underlie the association. We applied a massively parallel reporter assay (MPRA) to functionally evaluate genetic variants in high, local LD for independent cis-expression quantitative trait loci (eQTL). We found that 17.7% of eQTLs exhibit more than one major allelic effect in tight LD. The detected regulatory variants were highly and specifically enriched for activating chromatin structures and allelic transcription factor binding. Integration of MPRA profiles with eQTL/complex trait colocalizations across 114 human traits and diseases identified causal variant sets demonstrating how genetic association signals can manifest through multiple, tightly linked causal variants.
Affiliation(s)
- Nathan S. Abell
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Marianne K. DeGorter
  - Department of Pathology, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Emily Greenwald
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Kevin S. Smith
  - Department of Pathology, School of Medicine, Stanford University, Stanford, CA, 94305, USA
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Stephen B. Montgomery
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Pathology, School of Medicine, Stanford University, Stanford, CA, 94305, USA
105
Li X, Yung G, Zhou H, Sun R, Li Z, Hou K, Zhang MJ, Liu Y, Arapoglou T, Wang C, Ionita-Laza I, Lin X. A multi-dimensional integrative scoring framework for predicting functional variants in the human genome. Am J Hum Genet 2022; 109:446-456. [PMID: 35216679 PMCID: PMC8948160 DOI: 10.1016/j.ajhg.2022.01.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/04/2021] [Accepted: 01/26/2022] [Indexed: 12/26/2022] Open
Abstract
Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.
Affiliation(s)
- Xihao Li
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Godwin Yung
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Methods, Collaboration and Outreach Group, Genentech/Roche, South San Francisco, CA 94080, USA
- Hufeng Zhou
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Ryan Sun
  - Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Zilin Li
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Kangcheng Hou
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Martin Jinye Zhang
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Yaowu Liu
  - School of Statistics, Southwestern University of Finance and Economics, Chengdu, Sichuan, China
- Theodore Arapoglou
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Chen Wang
  - Department of Biostatistics, Columbia University Mailman School of Public Health, New York, NY 10032, USA
- Iuliana Ionita-Laza
  - Department of Biostatistics, Columbia University Mailman School of Public Health, New York, NY 10032, USA
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Department of Statistics, Harvard University, Cambridge, MA, 02138, USA
106
Jimeno-Martín A, Sousa E, Brocal-Ruiz R, Daroqui N, Maicas M, Flames N. Joint actions of diverse transcription factor families establish neuron-type identities and promote enhancer selectivity. Genome Res 2022; 32:459-473. [PMID: 35074859 PMCID: PMC8896470 DOI: 10.1101/gr.275623.121] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/08/2021] [Accepted: 01/19/2022] [Indexed: 11/24/2022]
Abstract
To systematically investigate the complexity of neuron specification regulatory networks, we performed an RNA interference (RNAi) screen against all 875 transcription factors (TFs) encoded in Caenorhabditis elegans genome and searched for defects in nine different neuron types of the monoaminergic (MA) superclass and two cholinergic motoneurons. We identified 91 TF candidates to be required for correct generation of these neuron types, of which 28 were confirmed by mutant analysis. We found that correct reporter expression in each individual neuron type requires at least nine different TFs. Individual neuron types do not usually share TFs involved in their specification but share a common pattern of TFs belonging to the five most common TF families: homeodomain (HD), basic helix loop helix (bHLH), zinc finger (ZF), basic leucine zipper domain (bZIP), and nuclear hormone receptors (NHR). HD TF members are overrepresented, supporting a key role for this family in the establishment of neuronal identities. These five TF families are also prevalent when considering mutant alleles with previously reported neuronal phenotypes in C. elegans, Drosophila, and mouse. In addition, we studied terminal differentiation complexity focusing on the dopaminergic terminal regulatory program. We found two HD TFs (UNC-62 and VAB-3) that work together with known dopaminergic terminal selectors (AST-1, CEH-43, CEH-20). Combined TF binding sites for these five TFs constitute a cis-regulatory signature enriched in the regulatory regions of dopaminergic effector genes. Our results provide new insights on neuron-type regulatory programs in C. elegans that could help better understand neuron specification and evolution of neuron types.
Affiliation(s)
- Angela Jimeno-Martín
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Erick Sousa
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Rebeca Brocal-Ruiz
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Noemi Daroqui
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Miren Maicas
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Nuria Flames
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
107
Snetkova V, Pennacchio LA, Visel A, Dickel DE. Perfect and imperfect views of ultraconserved sequences. Nat Rev Genet 2022; 23:182-194. [PMID: 34764456 PMCID: PMC8858888 DOI: 10.1038/s41576-021-00424-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Accepted: 09/30/2021] [Indexed: 12/12/2022]
Abstract
Across the human genome, there are nearly 500 'ultraconserved' elements: regions of at least 200 contiguous nucleotides that are perfectly conserved in both the mouse and rat genomes. Remarkably, the majority of these sequences are non-coding, and many can function as enhancers that activate tissue-specific gene expression during embryonic development. From their first description more than 15 years ago, their extreme conservation has both fascinated and perplexed researchers in genomics and evolutionary biology. The intrigue around ultraconserved elements only grew with the observation that they are dispensable for viability. Here, we review recent progress towards understanding the general importance and the specific functions of ultraconserved sequences in mammalian development and human disease and discuss possible explanations for their extreme conservation.
Affiliation(s)
- Valentina Snetkova
  - Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Department of Molecular Biology, Genentech, South San Francisco, CA, USA
- Len A Pennacchio
  - Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Comparative Biochemistry Program, University of California, Berkeley, CA, USA
  - US Department of Energy Joint Genome Institute, Berkeley, CA, USA
- Axel Visel
  - Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - US Department of Energy Joint Genome Institute, Berkeley, CA, USA
  - School of Natural Sciences, University of California, Merced, Merced, CA, USA
- Diane E Dickel
  - Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
108
Toropainen A, Stolze LK, Örd T, Whalen MB, Torrell PM, Link VM, Kaikkonen MU, Romanoski CE. Functional noncoding SNPs in human endothelial cells fine-map vascular trait associations. Genome Res 2022; 32:409-424. [PMID: 35193936 PMCID: PMC8896458 DOI: 10.1101/gr.276064.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/30/2021] [Accepted: 01/06/2022] [Indexed: 11/25/2022]
Abstract
Functional consequences of genetic variation in the noncoding human genome are difficult to ascertain despite demonstrated associations to common, complex disease traits. To elucidate properties of functional noncoding SNPs with effects in human endothelial cells (ECs), we utilized our previous molecular quantitative trait locus (molQTL) analysis for transcription factor binding, chromatin accessibility, and H3K27 acetylation to nominate a set of likely functional noncoding SNPs. Together with information from genome-wide association studies (GWASs) for vascular disease traits, we tested the ability of 34,344 variants to perturb enhancer function in ECs using the highly multiplexed STARR-seq assay. Of these, 5711 variants validated, whose enriched attributes included: (1) mutations to TF binding motifs for ETS or AP-1 that are regulators of the EC state; (2) location in accessible and H3K27ac-marked EC chromatin; and (3) molQTL associations whereby alleles associate with differences in chromatin accessibility and TF binding across genetically diverse ECs. Next, using pro-inflammatory IL1B as an activator of cell state, we observed robust evidence (>50%) of context-specific SNP effects, underscoring the prevalence of noncoding gene-by-environment (GxE) effects. Lastly, using these cumulative data, we fine-mapped vascular disease loci and highlighted evidence suggesting mechanisms by which noncoding SNPs at two loci affect risk for pulse pressure/large artery stroke and abdominal aortic aneurysm through respective effects on transcriptional regulation of POU4F1 and LDAH. Together, we highlight the attributes and context dependence of functional noncoding SNPs and provide new mechanisms underlying vascular disease risk.
Affiliation(s)
- Anu Toropainen
  - A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio 70211, Finland
- Lindsey K Stolze
  - The Department of Cellular and Molecular Medicine, The University of Arizona, Tucson, Arizona 85721, USA
  - The Genetics Interdisciplinary Graduate Program, The University of Arizona, Tucson, Arizona 85721, USA
- Tiit Örd
  - A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio 70211, Finland
- Michael B Whalen
  - The Department of Cellular and Molecular Medicine, The University of Arizona, Tucson, Arizona 85721, USA
- Paula Martí Torrell
  - A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio 70211, Finland
- Verena M Link
  - Metaorganism Immunity Section, Laboratory of Host Immunity and Microbiome, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Minna U Kaikkonen
  - A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio 70211, Finland
- Casey E Romanoski
  - The Department of Cellular and Molecular Medicine, The University of Arizona, Tucson, Arizona 85721, USA
  - The Genetics Interdisciplinary Graduate Program, The University of Arizona, Tucson, Arizona 85721, USA
109
Wu T, Jiang D, Zou M, Sun W, Wu D, Cui J, Huntress I, Peng X, Li G. Coupling high-throughput mapping with proteomics analysis delineates cis-regulatory elements at high resolution. Nucleic Acids Res 2022; 50:e5. [PMID: 34634809 PMCID: PMC8754656 DOI: 10.1093/nar/gkab890] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/19/2021] [Revised: 08/20/2021] [Accepted: 09/17/2021] [Indexed: 12/30/2022] Open
Abstract
Growing evidence suggests that functional cis-regulatory elements (cis-REs) not only exist in epigenetically marked but also in unmarked sites of the human genome. While it is already difficult to identify cis-REs in the epigenetically marked sites, interrogating cis-REs residing within the unmarked sites is even more challenging. Here, we report adapting Reel-seq, an in vitro high-throughput (HTP) technique, to fine-map cis-REs at high resolution over a large region of the human genome in a systematic and continuous manner. Using Reel-seq, as a proof-of-principle, we identified 408 candidate cis-REs by mapping a 58 kb core region on the aging-related CDKN2A/B locus that harbors p16INK4a. By coupling Reel-seq with FREP-MS, a proteomics analysis technique, we characterized two cis-REs, one in an epigenetically marked site and the other in an epigenetically unmarked site. These elements are shown to regulate the p16INK4a expression over an ∼100 kb distance by recruiting the poly(A) binding protein PABPC1 and the transcription factor FOXC2. Downregulation of either PABPC1 or FOXC2 in human endothelial cells (ECs) can induce the p16INK4a-dependent cellular senescence. Thus, we confirmed the utility of Reel-seq and FREP-MS analyses for the systematic identification of cis-REs at high resolution over a large region of the human genome.
Affiliation(s)
- Ting Wu
  - Aging Institute, University of Pittsburgh, Pittsburgh, PA 15219, USA
  - Department of Medicine, Xiangya School of Medicine, Central South University, Changsha 410083, China
- Danli Jiang
  - Aging Institute, University of Pittsburgh, Pittsburgh, PA 15219, USA
- Meijuan Zou
  - Aging Institute, University of Pittsburgh, Pittsburgh, PA 15219, USA
- Wei Sun
  - Center for Pulmonary Vascular Biology and Medicine, Pittsburgh Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh School of Medicine and University of Pittsburgh Medical Center, Pittsburgh, PA 15261, USA
- Di Wu
  - Division of Oral Craniofacial Health Science, Adams School of Dentistry, Department of Biostatistics, UNC Gillings School of Global Public Health, University of North Carolina, NC 27599, USA
- Jing Cui
  - Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA 02115, USA
- Ian Huntress
  - Department of Molecular Biomedical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC 27607, USA
  - Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC 27695, USA
- Xinxia Peng
  - Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC 27695, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
- Gang Li
  - Aging Institute, University of Pittsburgh, Pittsburgh, PA 15219, USA
  - Department of Medicine, Division of Cardiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15223, USA
110
Ajore R, Niroula A, Pertesi M, Cafaro C, Thodberg M, Went M, Bao EL, Duran-Lozano L, Lopez de Lapuente Portilla A, Olafsdottir T, Ugidos-Damboriena N, Magnusson O, Samur M, Lareau CA, Halldorsson GH, Thorleifsson G, Norddahl GL, Gunnarsdottir K, Försti A, Goldschmidt H, Hemminki K, van Rhee F, Kimber S, Sperling AS, Kaiser M, Anderson K, Jonsdottir I, Munshi N, Rafnar T, Waage A, Weinhold N, Thorsteinsdottir U, Sankaran VG, Stefansson K, Houlston R, Nilsson B. Functional dissection of inherited non-coding variation influencing multiple myeloma risk. Nat Commun 2022; 13:151. [PMID: 35013207 PMCID: PMC8748989 DOI: 10.1038/s41467-021-27666-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/05/2020] [Accepted: 12/02/2021] [Indexed: 12/16/2022] Open
Abstract
Thousands of non-coding variants have been associated with increased risk of human diseases, yet the causal variants and their mechanisms-of-action remain obscure. In an integrative study combining massively parallel reporter assays (MPRA), expression analyses (eQTL, meQTL, PCHiC) and chromatin accessibility analyses in primary cells (caQTL), we investigate 1,039 variants associated with multiple myeloma (MM). We demonstrate that MM susceptibility is mediated by gene-regulatory changes in plasma cells and B-cells, and identify putative causal variants at six risk loci (SMARCD3, WAC, ELL2, CDCA7L, CEP120, and PREX1). Notably, three of these variants co-localize with significant plasma cell caQTLs, signaling the presence of causal activity at these precise genomic positions in an endogenous chromosomal context in vivo. Our results provide a systematic functional dissection of risk loci for a hematologic malignancy.
Collapse
Affiliation(s)
- Ram Ajore
- Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
| | - Abhishek Niroula
- Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
| | - Maroulio Pertesi
- Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
| | - Caterina Cafaro
- Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
| | - Malte Thodberg
- Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Molly Went
- Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Erik L Bao
- Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Laura Duran-Lozano
- Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Nerea Ugidos-Damboriena
- Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Olafur Magnusson
- deCODE Genetics/Amgen Inc., Sturlugata 8, 101, Reykjavik, Iceland
- Mehmet Samur
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Caleb A Lareau
- Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Asta Försti
- German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120, Heidelberg, Germany
- Hopp Children's Cancer Center, Heidelberg, Germany
- Hartmut Goldschmidt
- Department of Internal Medicine V, University Hospital of Heidelberg, 69120, Heidelberg, Germany
- Kari Hemminki
- German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120, Heidelberg, Germany
- Faculty of Medicine and Biomedical Center in Pilsen, Charles University in Prague, Prague, 30605, Czech Republic
- Scott Kimber
- Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Adam S Sperling
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Martin Kaiser
- Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Kenneth Anderson
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Nikhil Munshi
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Thorunn Rafnar
- deCODE Genetics/Amgen Inc., Sturlugata 8, 101, Reykjavik, Iceland
- Anders Waage
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Box 8905, N-7491, Trondheim, Norway
- Niels Weinhold
- German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120, Heidelberg, Germany
- Department of Internal Medicine V, University Hospital of Heidelberg, 69120, Heidelberg, Germany
- Vijay G Sankaran
- Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Harvard Stem Cell Institute, Cambridge, MA, USA
- Kari Stefansson
- deCODE Genetics/Amgen Inc., Sturlugata 8, 101, Reykjavik, Iceland
- Richard Houlston
- Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Björn Nilsson
- Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden.
- Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA.
|
111
|
Jagoda E, Xue JR, Reilly SK, Dannemann M, Racimo F, Huerta-Sanchez E, Sankararaman S, Kelso J, Pagani L, Sabeti PC, Capellini TD. Detection of Neanderthal Adaptively Introgressed Genetic Variants That Modulate Reporter Gene Expression in Human Immune Cells. Mol Biol Evol 2022; 39:msab304. [PMID: 34662402 PMCID: PMC8760939 DOI: 10.1093/molbev/msab304] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 11/14/2022] Open
Abstract
Although some variation introgressed from Neanderthals has undergone selective sweeps, little is known about its functional significance. We used a Massively Parallel Reporter Assay (MPRA) to assay 5,353 high-frequency introgressed variants for their ability to modulate the gene expression within 170 bp of endogenous sequence. We identified 2,548 variants in active putative cis-regulatory elements (CREs) and 292 expression-modulating variants (emVars). These emVars are predicted to alter the binding motifs of important immune transcription factors, are enriched for associations with neutrophil and white blood cell count, and are associated with the expression of genes that function in innate immune pathways including inflammatory response and antiviral defense. We combined the MPRA data with other data sets to identify strong candidates to be driver variants of positive selection including an emVar that may contribute to protection against severe COVID-19 response. We endogenously deleted two CREs containing expression-modulation variants linked to immune function, rs11624425 and rs80317430, identifying their primary genic targets as ELMSAN1, and PAN2 and STAT2, respectively, three genes differentially expressed during influenza infection. Overall, we present the first database of experimentally identified expression-modulating Neanderthal-introgressed alleles contributing to potential immune response in modern humans.
Affiliation(s)
- Evelyn Jagoda
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- James R Xue
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven K Reilly
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael Dannemann
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Fernando Racimo
- Lundbeck GeoGenetics Centre, The Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Emilia Huerta-Sanchez
- Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, USA
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Sriram Sankararaman
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, UCLA, Los Angeles, CA, USA
- Janet Kelso
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Luca Pagani
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Department of Biology, University of Padova, Padova, Italy
- Pardis C Sabeti
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Terence D Capellini
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
112
|
Gokuladhas S, Zaied RE, Schierding W, Farrow S, Fadason T, O'Sullivan JM. Integrating Multimorbidity into a Whole-Body Understanding of Disease Using Spatial Genomics. Results Probl Cell Differ 2022; 70:157-187. [PMID: 36348107 DOI: 10.1007/978-3-031-06573-6_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Multimorbidity is characterized by multidimensional complexity emerging from interactions between multiple diseases across levels of biological (including genetic) and environmental determinants and the complex array of interactions between and within cells, tissues and organ systems. Advances in spatial genomic research have led to an unprecedented expansion in our ability to link alterations in genome folding with changes that are associated with human disease. Studying disease-associated genetic variants in the context of the spatial genome has enabled the discovery of transcriptional regulatory programmes that potentially link dysregulated genes to disease development. However, the approaches that have been used have typically been applied to uncover pathological molecular mechanisms occurring in a specific disease-relevant tissue. These forms of reductionist, targeted investigations are not appropriate for the molecular dissection of multimorbidity that typically involves contributions from multiple tissues. In this perspective, we emphasize the importance of a whole-body understanding of multimorbidity and discuss how spatial genomics, when integrated with additional omic datasets, could provide novel insights into the molecular underpinnings of multimorbidity.
Affiliation(s)
- Roan E Zaied
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- William Schierding
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Sophie Farrow
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- Tayaza Fadason
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Justin M O'Sullivan
- Liggins Institute, The University of Auckland, Auckland, New Zealand.
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand.
- Australian Parkinson's Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia.
- MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK.
|
113
|
Ding J, Frantzeskos A, Orozco G. Functional interrogation of autoimmune disease genetics using CRISPR/Cas9 technologies and massively parallel reporter assays. Semin Immunopathol 2022; 44:137-147. [PMID: 34508276 PMCID: PMC8837574 DOI: 10.1007/s00281-021-00887-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/30/2021] [Accepted: 08/13/2021] [Indexed: 02/07/2023]
Abstract
Genetic studies, including genome-wide association studies, have identified many common variants that are associated with autoimmune diseases. Strikingly, in addition to being frequently observed in healthy individuals, a number of these variants are shared across diseases with diverse clinical presentations. This highlights the potential for improved autoimmune disease understanding which could be achieved by characterising the mechanism by which variants lead to increased risk of disease. Of particular interest is the potential for identifying novel drug targets or of repositioning drugs currently used in other diseases. The majority of autoimmune disease variants do not alter coding regions and it is often difficult to generate a plausible hypothetical mechanism by which variants affect disease-relevant genes and pathways. Given the interest in this area, considerable effort has been invested in developing and applying appropriate methodologies. Two of the most important technologies in this space include both low- and high-throughput genomic perturbation using the CRISPR/Cas9 system and massively parallel reporter assays. In this review, we introduce the field of autoimmune disease functional genomics and use numerous examples to demonstrate the recent and potential future impact of these technologies.
Affiliation(s)
- James Ding
- Centre for Genetics and Genomics Versus Arthritis, Division of Musculoskeletal and Dermatological Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, AV Hill Building, Oxford Road, Manchester, M13 9LJ, UK.
- Antonios Frantzeskos
- Centre for Genetics and Genomics Versus Arthritis, Division of Musculoskeletal and Dermatological Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, AV Hill Building, Oxford Road, Manchester, M13 9LJ, UK
- Gisela Orozco
- Centre for Genetics and Genomics Versus Arthritis, Division of Musculoskeletal and Dermatological Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, AV Hill Building, Oxford Road, Manchester, M13 9LJ, UK
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK
|
114
|
Pratt BM, Won H. Advances in profiling chromatin architecture shed light on the regulatory dynamics underlying brain disorders. Semin Cell Dev Biol 2022; 121:153-160. [PMID: 34483043 PMCID: PMC8761161 DOI: 10.1016/j.semcdb.2021.08.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/29/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 01/03/2023]
Abstract
Understanding the exquisitely complex nature of the three-dimensional organization of the genome and how it affects gene regulation remains a central question in biology. Recent advances in sequencing- and imaging-based approaches in decoding the three-dimensional chromatin landscape have enabled a systematic characterization of gene regulatory architecture. In this review, we outline how chromatin architecture provides a reference atlas to predict the functional consequences of non-coding variants associated with human traits and disease. High-throughput perturbation assays such as massively parallel reporter assays (MPRA) and CRISPR-based genome engineering in combination with a reference atlas opened an avenue for going beyond observational studies to experimentally validating the regulatory principles of the genome. We conclude by providing a suggested path forward by calling attention to barriers that can be addressed for a more complete understanding of the regulatory landscape of the human brain.
Affiliation(s)
- Brandon M Pratt
- Department of Pharmacology, University of North Carolina, Chapel Hill, NC 27599, USA
- Hyejung Won
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA; UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC 27599, USA.
|
115
|
Wang QS, Huang H. Methods for statistical fine-mapping and their applications to auto-immune diseases. Semin Immunopathol 2022; 44:101-113. [PMID: 35041074 PMCID: PMC8837575 DOI: 10.1007/s00281-021-00902-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/06/2021] [Accepted: 10/22/2021] [Indexed: 01/07/2023]
Abstract
Although genome-wide association studies (GWAS) have identified thousands of loci in the human genome that are associated with different traits, understanding the biological mechanisms underlying the association signals identified in GWAS remains challenging. Statistical fine-mapping is a method aiming to refine GWAS signals by evaluating which variant(s) are truly causal to the phenotype. Here, we review the types of statistical fine-mapping methods that have been widely used to date, with a focus on recently developed functionally informed fine-mapping (FIFM) methods that utilize functional annotations. We then systematically review the applications of statistical fine-mapping in autoimmune disease studies to highlight the value of statistical fine-mapping in biological contexts.
Affiliation(s)
- Qingbo S Wang
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
|
116
|
Miller EC, Wilczek A, Bello NA, Tom S, Wapner R, Suh Y. Pregnancy, preeclampsia and maternal aging: From epidemiology to functional genomics. Ageing Res Rev 2022; 73:101535. [PMID: 34871806 PMCID: PMC8827396 DOI: 10.1016/j.arr.2021.101535] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/04/2021] [Revised: 11/15/2021] [Accepted: 12/01/2021] [Indexed: 01/03/2023]
Abstract
Women live longer than men but experience greater disability and a longer period of illness as they age. Despite clear sex differences in aging, the impact of pregnancy and its complications, such as preeclampsia, on aging is an underexplored area of geroscience. This review summarizes our current knowledge about the complex links between pregnancy and age-related diseases, including evidence from epidemiology, clinical research, and genetics. We discuss the relationship between normal and pathological pregnancy and maternal aging, using preeclampsia as a primary example. We review the results of human genetics studies of preeclampsia, including genome wide association studies (GWAS), and attempt to catalog genes involved in preeclampsia as a gateway to mechanisms underlying an increased risk of later life cardio- and neuro- vascular events. Lastly, we discuss challenges in interpreting the GWAS of preeclampsia and provide a functional genomics framework for future research needed to fully realize the promise of GWAS in identifying targets for geroprotective prevention and therapeutics against preeclampsia.
Affiliation(s)
- Eliza C. Miller
- Department of Neurology, Division of Stroke and Cerebrovascular Disease, Columbia University Irving Medical Center, New York, NY, USA
- Ashley Wilczek
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
- Natalie A. Bello
- Department of Medicine, Division of Cardiology, Columbia University Irving Medical Center, New York, NY, USA
- Sarah Tom
- Department of Neurology, Division of Neurology Clinical Outcomes Research and Population Science and the Department of Epidemiology, Columbia University Irving Medical Center, New York, NY, USA
- Ronald Wapner
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA.
- Yousin Suh
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA; Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY, USA.
|
117
|
Thibodeau A, Khetan S, Eroglu A, Tewhey R, Stitzel ML, Ucar D. CoRE-ATAC: A deep learning model for the functional classification of regulatory elements from single cell and bulk ATAC-seq data. PLoS Comput Biol 2021; 17:e1009670. [PMID: 34898596 PMCID: PMC8699717 DOI: 10.1371/journal.pcbi.1009670] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/10/2020] [Revised: 12/23/2021] [Accepted: 11/19/2021] [Indexed: 02/06/2023] Open
Abstract
Cis-Regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n = 6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n = 40 samples) that were not used in model training (mean average precision = 0.80, mean F1 score = 0.70). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.
Affiliation(s)
- Asa Thibodeau
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
- Shubham Khetan
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
- Alper Eroglu
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
- Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Michael L. Stitzel
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
- Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut, United States of America
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut, United States of America
- Duygu Ucar
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
- Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut, United States of America
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut, United States of America
|
118
|
Ishigaki K. Beyond GWAS: from simple associations to functional insights. Semin Immunopathol 2021; 44:3-14. [PMID: 34605948 DOI: 10.1007/s00281-021-00894-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/30/2021] [Accepted: 09/08/2021] [Indexed: 12/31/2022]
Abstract
Each human, when born, has slightly different DNA sequences, which make each of us unique. The variations in DNA sequences are called genetic variants. The primary aim of genome-wide association study (GWAS) is to detect associations between genetic variants and human phenotypes. Since GWAS focuses on germ-line variants, there is no reverse causation. Therefore, GWAS is one of the few tools that can assess the causality of human diseases. In the past 10 years, many large-scale GWAS have been conducted. Although the primary outputs of GWAS are just a series of statistics, its downstream analyses provided many insights beyond simple associations: the causal mechanisms for autoimmune diseases and shared etiology between diseases. Moreover, GWAS downstream analyses generated scores potentially helpful in predicting clinical outcomes of each patient. This review focuses on GWAS for autoimmune diseases and introduces significant achievements of its downstream analyses. We also provide future directions that potentially overcome current limitations. We restrict our discussion to common autoimmune diseases (e.g., rheumatoid arthritis) since rare Mendelian diseases possess distinct genetic etiologies and are not tested by GWAS.
Affiliation(s)
- Kazuyoshi Ishigaki
- Laboratory for Human Immunogenetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan.
|
119
|
Findlay GM. Linking genome variants to disease: scalable approaches to test the functional impact of human mutations. Hum Mol Genet 2021; 30:R187-R197. [PMID: 34338757 PMCID: PMC8490018 DOI: 10.1093/hmg/ddab219] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/19/2021] [Revised: 07/19/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022] Open
Abstract
The application of genomics to medicine has accelerated the discovery of mutations underlying disease and has enhanced our knowledge of the molecular underpinnings of diverse pathologies. As the amount of human genetic material queried via sequencing has grown exponentially in recent years, so too has the number of rare variants observed. Despite progress, our ability to distinguish which rare variants have clinical significance remains limited. Over the last decade, however, powerful experimental approaches have emerged to characterize variant effects orders of magnitude faster than before. Fueled by improved DNA synthesis and sequencing and, more recently, by CRISPR/Cas9 genome editing, multiplex functional assays provide a means of generating variant effect data in wide-ranging experimental systems. Here, I review recent applications of multiplex assays that link human variants to disease phenotypes and I describe emerging strategies that will enhance their clinical utility in coming years.
Affiliation(s)
- Gregory M Findlay
- The Francis Crick Institute, The Genome Function Laboratory, London NW1 1AT, UK
|
120
|
Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH, Luban J, Montgomery SB, Finucane HK, Novina CD, Tewhey R, Sabeti PC. Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution. Cell 2021; 184:5247-5260.e19. [PMID: 34534445 PMCID: PMC8487971 DOI: 10.1016/j.cell.2021.08.025] [Citation(s) in RCA: 79] [Impact Index Per Article: 26.3] [Received: 12/12/2020] [Revised: 05/25/2021] [Accepted: 08/19/2021] [Indexed: 12/11/2022]
Abstract
3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
Affiliation(s)
- Dustin Griesemer
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- James R Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department Of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA.
- Steven K Reilly
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department Of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
- Jacob C Ulirsch
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Kalki Kukreja
- Department of Molecular and Cell Biology, Harvard University, Cambridge, MA 02138, USA
- Joe R Davis
- BigHat Biosciences, San Carlos, CA 94070, USA
- Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- David K Yang
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA
- John C Butts
- The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA
- Mehmet H Guney
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Jeremy Luban
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Stephen B Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Hilary K Finucane
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Carl D Novina
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA; Tufts University School of Medicine, Boston, MA 02111, USA
- Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department Of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
|
121
|
Abstract
[Figure: see text].
Affiliation(s)
- Tuuli Lappalainen
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- New York Genome Center, New York, NY, USA
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
122
|
Khetan S, Kales S, Kursawe R, Jillette A, Ulirsch JC, Reilly SK, Ucar D, Tewhey R, Stitzel ML. Functional characterization of T2D-associated SNP effects on baseline and ER stress-responsive β cell transcriptional activation. Nat Commun 2021; 12:5242. [PMID: 34475398 PMCID: PMC8413311 DOI: 10.1038/s41467-021-25514-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/13/2020] [Accepted: 08/10/2021] [Indexed: 11/08/2022] Open
Abstract
Genome-wide association studies (GWAS) have linked single nucleotide polymorphisms (SNPs) at >250 loci in the human genome to type 2 diabetes (T2D) risk. For each locus, identifying the functional variant(s) among multiple SNPs in high linkage disequilibrium is critical to understand molecular mechanisms underlying T2D genetic risk. Using massively parallel reporter assays (MPRA), we test the cis-regulatory effects of SNPs associated with T2D and altered in vivo islet chromatin accessibility in MIN6 β cells under steady state and pathophysiologic endoplasmic reticulum (ER) stress conditions. We identify 1,982/6,621 (29.9%) SNP-containing elements that activate transcription in MIN6 and 879 SNP alleles that modulate MPRA activity. Multiple T2D-associated SNPs alter the activity of short interspersed nuclear element (SINE)-containing elements that are strongly induced by ER stress. We identify 220 functional variants at 104 T2D association signals, narrowing 54 signals to a single candidate SNP. Together, this study identifies elements driving β cell steady state and ER stress-responsive transcriptional activation, nominates causal T2D SNPs, and uncovers potential roles for repetitive elements in β cell transcriptional stress response and T2D genetics.
Affiliation(s)
- Shubham Khetan
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT, USA
- Susan Kales
  - The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA
- Romy Kursawe
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jacob C Ulirsch
  - Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Duygu Ucar
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT, USA
  - Institute of Systems Genomics, University of Connecticut, Farmington, CT, USA
- Ryan Tewhey
  - The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA
  - Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
  - Tufts University School of Medicine, Boston, MA, USA
- Michael L Stitzel
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT, USA
  - Institute of Systems Genomics, University of Connecticut, Farmington, CT, USA
|
123
|
Findley AS, Zhang X, Boye C, Lin YL, Kalita CA, Barreiro L, Lohmueller KE, Pique-Regi R, Luca F. A signature of Neanderthal introgression on molecular mechanisms of environmental responses. PLoS Genet 2021; 17:e1009493. [PMID: 34570765 PMCID: PMC8509894 DOI: 10.1371/journal.pgen.1009493] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/18/2021] [Revised: 10/12/2021] [Accepted: 08/18/2021] [Indexed: 12/17/2022] Open
Abstract
Ancient human migrations led to the settlement of population groups in varied environmental contexts worldwide. The extent to which adaptation to local environments has shaped human genetic diversity is a longstanding question in human evolution. Recent studies have suggested that introgression of archaic alleles in the genome of modern humans may have contributed to adaptation to environmental pressures such as pathogen exposure. Functional genomic studies have demonstrated that variation in gene expression across individuals and in response to environmental perturbations is a main mechanism underlying complex trait variation. We considered gene expression response to in vitro treatments as a molecular phenotype to identify genes and regulatory variants that may have played an important role in adaptations to local environments. We investigated if Neanderthal introgression in the human genome may contribute to the transcriptional response to environmental perturbations. To this end we used eQTLs for genes differentially expressed in a panel of 52 cellular environments, resulting from 5 cell types and 26 treatments, including hormones, vitamins, drugs, and environmental contaminants. We found that SNPs with introgressed Neanderthal alleles (N-SNPs) disrupt binding of transcription factors important for environmental responses, including ionizing radiation and hypoxia, and for glucose metabolism. We identified an enrichment for N-SNPs among eQTLs for genes differentially expressed in response to 8 treatments, including glucocorticoids, caffeine, and vitamin D. Using Massively Parallel Reporter Assays (MPRA) data, we validated the regulatory function of 21 introgressed Neanderthal variants in the human genome, corresponding to 8 eQTLs regulating 15 genes that respond to environmental perturbations. 
These findings expand the set of environments where archaic introgression may have contributed to adaptations to local environments in modern humans and provide experimental validation for the regulatory function of introgressed variants.
Affiliation(s)
- Anthony S. Findley
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Xinjun Zhang
  - Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, California, United States of America
- Carly Boye
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Yen Lung Lin
  - Genetics Section, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Cynthia A. Kalita
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
- Luis Barreiro
  - Genetics Section, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Kirk E. Lohmueller
  - Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California, United States of America
- Roger Pique-Regi
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
- Francesca Luca
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
|
124
|
Shih CH, Fay J. Cis-regulatory variants affect gene expression dynamics in yeast. eLife 2021; 10:e68469. [PMID: 34369376 PMCID: PMC8367379 DOI: 10.7554/elife.68469] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/17/2021] [Accepted: 08/06/2021] [Indexed: 12/14/2022] Open
Abstract
Evolution of cis-regulatory sequences depends on how they affect gene expression and motivates both the identification and prediction of cis-regulatory variants responsible for expression differences within and between species. While much progress has been made in relating cis-regulatory variants to expression levels, the timing of gene activation and repression may also be important to the evolution of cis-regulatory sequences. We investigated allele-specific expression (ASE) dynamics within and between Saccharomyces species during the diauxic shift and found appreciable cis-acting variation in gene expression dynamics. Within-species ASE is associated with intergenic variants, and ASE dynamics are more strongly associated with insertions and deletions than ASE levels. To refine these associations, we used a high-throughput reporter assay to test promoter regions and individual variants. Within the subset of regions that recapitulated endogenous expression, we identified and characterized cis-regulatory variants that affect expression dynamics. Between species, chimeric promoter regions generate novel patterns and indicate constraints on the evolution of gene expression dynamics. We conclude that changes in cis-regulatory sequences can tune gene expression dynamics and that the interplay between expression dynamics and other aspects of expression is relevant to the evolution of cis-regulatory sequences.
Affiliation(s)
- Ching-Hua Shih
  - Department of Biology, University of Rochester, Rochester, United States
- Justin Fay
  - Department of Biology, University of Rochester, Rochester, United States
|
125
|
Janowski M, Milewska M, Zare P, Pękowska A. Chromatin Alterations in Neurological Disorders and Strategies of (Epi)Genome Rescue. Pharmaceuticals (Basel) 2021; 14:765. [PMID: 34451862 PMCID: PMC8399958 DOI: 10.3390/ph14080765] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/06/2021] [Revised: 07/23/2021] [Accepted: 07/24/2021] [Indexed: 12/26/2022] Open
Abstract
Neurological disorders (NDs) comprise a heterogeneous group of conditions that affect the function of the nervous system. Often incurable, NDs have profound and detrimental consequences for the lives of affected individuals. NDs have complex etiologies but commonly feature altered gene expression and dysfunction of essential chromatin-modifying factors. Hence, compounds that target DNA and histone modification pathways, the so-called epidrugs, constitute promising tools to treat NDs. Yet, targeting the entire epigenome might prove insufficient to modify the expression of a chosen gene, or even unnecessary and detrimental to the patients' health. New technologies hold promise to expand the clinical toolkit in the fight against NDs. (Epi)genome engineering using designer nucleases, including CRISPR-Cas9 and TALENs, can potentially help restore the correct gene expression patterns by targeting a defined gene or pathway, both genetically and epigenetically, with minimal off-target activity. Here, we review the involvement of the epigenetic machinery in NDs. We outline syndromes caused by mutations in chromatin-modifying enzymes and discuss the functional consequences of mutations in regulatory DNA in NDs. We review the approaches that allow modifying the (epi)genome, including tools based on TALENs and CRISPR-Cas9 technologies, and we highlight how these new strategies could potentially change clinical practices in the treatment of NDs.
Affiliation(s)
- Aleksandra Pękowska
  - Dioscuri Centre for Chromatin Biology and Epigenomics, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Pasteur Street, 02-093 Warsaw, Poland; (M.J.); (M.M.); (P.Z.)
|
126
|
Yang Z, Wang C, Erjavec S, Petukhova L, Christiano A, Ionita-Laza I. A semi-supervised model to predict regulatory effects of genetic variants at single nucleotide resolution using massively parallel reporter assays. Bioinformatics 2021; 37:1953-1962. [PMID: 33515242 PMCID: PMC8337004 DOI: 10.1093/bioinformatics/btab040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/02/2020] [Revised: 01/04/2021] [Accepted: 01/07/2021] [Indexed: 12/28/2022] Open
Abstract
MOTIVATION: Predicting regulatory effects of genetic variants is a challenging but important problem in functional genomics. Given the relatively low sensitivity of functional assays, and the pervasiveness of class imbalance in functional genomic data, popular statistical prediction models can sharply underestimate the probability of a regulatory effect. We describe here the presence-only model (PO-EN), a type of semi-supervised model, to predict regulatory effects of genetic variants at sequence-level resolution in a context of interest by integrating a large number of epigenetic features and massively parallel reporter assays (MPRAs). RESULTS: Using experimental data from a variety of MPRAs, we show that the presence-only model produces better calibrated predicted probabilities and has increased accuracy relative to state-of-the-art prediction models. Furthermore, we show that the predictions based on pre-trained PO-EN models are useful for prioritizing functional variants among candidate eQTLs and significant SNPs at GWAS loci. In particular, for the costimulatory locus, associated with multiple autoimmune diseases, we show evidence of a regulatory variant residing in an enhancer 24.4 kb downstream of CTLA4, with evidence from capture Hi-C of interaction with CTLA4. Furthermore, the risk allele of the regulatory variant is on the same risk increasing haplotype as a functional coding variant in exon 1 of CTLA4, suggesting that the regulatory variant acts jointly with the coding variant leading to increased risk to disease. AVAILABILITY: The presence-only model is implemented in the R package 'PO.EN', freely available on CRAN. A vignette describing a detailed demonstration of using the proposed PO-EN model can be found on GitHub at https://github.com/Iuliana-Ionita-Laza/PO.EN/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zikun Yang
  - Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Chen Wang
  - Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Stephanie Erjavec
  - Department of Genetics and Development, Columbia University, New York, NY 10032, USA
- Lynn Petukhova
  - Department of Epidemiology, Columbia University, New York, NY 10032, USA
  - Department of Dermatology, Columbia University, New York, NY 10032, USA
- Angela Christiano
  - Department of Genetics and Development, Columbia University, New York, NY 10032, USA
  - Department of Dermatology, Columbia University, New York, NY 10032, USA
|
127
|
Reilly SK, Gosai SJ, Gutierrez A, Mackay-Smith A, Ulirsch JC, Kanai M, Mouri K, Berenzy D, Kales S, Butler GM, Gladden-Young A, Bhuiyan RM, Stitzel ML, Finucane HK, Sabeti PC, Tewhey R. Direct characterization of cis-regulatory elements and functional dissection of complex genetic associations using HCR-FlowFISH. Nat Genet 2021; 53:1166-1176. [PMID: 34326544 PMCID: PMC8925018 DOI: 10.1038/s41588-021-00900-4] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 05/27/2020] [Accepted: 06/23/2021] [Indexed: 12/26/2022]
Abstract
Effective interpretation of genome function and genetic variation requires a shift from epigenetic mapping of cis-regulatory elements (CREs) to characterization of endogenous function. We developed hybridization chain reaction fluorescence in situ hybridization coupled with flow cytometry (HCR-FlowFISH), a broadly applicable approach to characterize CRISPR-perturbed CREs via accurate quantification of native transcripts, alongside CRISPR activity screen analysis (CASA), a hierarchical Bayesian model to quantify CRE activity. Across >325,000 perturbations, we provide evidence that CREs can regulate multiple genes, skip over the nearest gene and display activating and/or silencing effects. At the cholesterol-level-associated FADS locus, we combine endogenous screens with reporter assays to exhaustively characterize multiple genome-wide association signals, functionally nominate causal variants and, importantly, identify their target genes.
Affiliation(s)
- Steven K Reilly
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Sager J Gosai
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Harvard Graduate Program in Biological and Biomedical Science, Boston, MA, USA
- Jacob C Ulirsch
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Graduate Program in Biological and Biomedical Science, Boston, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Masahiro Kanai
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
- Gina M Butler
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Redwan M Bhuiyan
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT, USA
- Michael L Stitzel
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT, USA
  - Institute of Systems Genomics, University of Connecticut, Farmington, CT, USA
- Hilary K Finucane
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Pardis C Sabeti
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Department of Immunology and Infectious Disease, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Ryan Tewhey
  - The Jackson Laboratory, Bar Harbor, ME, USA
  - Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
  - Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
|
128
|
Baglaenko Y, Macfarlane D, Marson A, Nigrovic PA, Raychaudhuri S. Genome editing to define the function of risk loci and variants in rheumatic disease. Nat Rev Rheumatol 2021; 17:462-474. [PMID: 34188205 PMCID: PMC10782829 DOI: 10.1038/s41584-021-00637-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 05/20/2021] [Indexed: 02/06/2023]
Abstract
Discoveries in human genetic studies have revolutionized our understanding of complex rheumatic and autoimmune diseases, including the identification of hundreds of genetic loci and single nucleotide polymorphisms that potentially predispose individuals to disease. However, in most cases, the exact disease-causing variants and their mechanisms of action remain unresolved. Functional follow-up of these findings is most challenging for genomic variants that are in non-coding genomic regions, where the large majority of common disease-associated variants are located, and/or that probably affect disease progression via cell type-specific gene regulation. To deliver on the therapeutic promise of human genetic studies, defining the mechanisms of action of these alleles is essential. Genome editing technology, such as CRISPR-Cas, has created a vast toolbox for targeted genetic and epigenetic modifications that presents unprecedented opportunities to decipher disease-causing loci, genes and variants in autoimmunity. In this Review, we discuss the past 5-10 years of progress in resolving the mechanisms underlying rheumatic disease-associated alleles, with an emphasis on how genomic editing techniques can enable targeted dissection and mechanistic studies of causal autoimmune risk variants.
Affiliation(s)
- Yuriy Baglaenko
  - Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
- Dana Macfarlane
  - Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Alexander Marson
  - Gladstone Institutes, San Francisco, CA, USA
  - Department of Medicine, University of California, San Francisco, CA, USA
  - Department of Microbiology and Immunology, University of California, San Francisco, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
  - Rosalind Russell/Ephraim P. Engleman Rheumatology Research Center, University of California, San Francisco, San Francisco, CA, USA
  - Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Peter A Nigrovic
  - Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Immunology, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Soumya Raychaudhuri
  - Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Faculty of Medical and Human Sciences, University of Manchester, Manchester, UK
|
129
|
Mulvey B, Dougherty JD. Transcriptional-regulatory convergence across functional MDD risk variants identified by massively parallel reporter assays. Transl Psychiatry 2021; 11:403. [PMID: 34294677 PMCID: PMC8298436 DOI: 10.1038/s41398-021-01493-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/15/2021] [Revised: 06/02/2021] [Accepted: 06/16/2021] [Indexed: 02/07/2023] Open
Abstract
Family and population studies indicate clear heritability of major depressive disorder (MDD), though its underlying biology remains unclear. The majority of single-nucleotide polymorphism (SNP) linkage blocks associated with MDD by genome-wide association studies (GWASes) are believed to alter transcriptional regulators (e.g., enhancers, promoters) based on enrichment of marks correlated with these functions. A key to understanding MDD pathophysiology will be elucidation of which SNPs are functional and how such functional variants biologically converge to elicit the disease. Furthermore, retinoids can elicit MDD in patients and promote depressive-like behaviors in rodent models, acting via a regulatory system of retinoid receptor transcription factors (TFs). We therefore sought to simultaneously identify functional genetic variants and assess retinoid pathway regulation of MDD risk loci. Using Massively Parallel Reporter Assays (MPRAs), we functionally screened over 1000 SNPs prioritized from 39 neuropsychiatric trait/disease GWAS loci, selecting SNPs based on overlap with predicted regulatory features, including expression quantitative trait loci (eQTL) and histone marks, from human brains and cell cultures. We identified >100 SNPs with allelic effects on expression in a retinoid-responsive model system. Functional SNPs were enriched for binding sequences of retinoic acid-receptive transcription factors (TFs), with additional allelic differences unmasked by treatment with all-trans retinoic acid (ATRA). Finally, motifs overrepresented across functional SNPs corresponded to TFs highly specific to serotonergic neurons, suggesting an in vivo site of action. Our application of MPRAs to screen MDD-associated SNPs suggests a shared transcriptional-regulatory program across loci, a component of which is unmasked by retinoids.
Affiliation(s)
- Bernard Mulvey
  - Departments of Genetics and Psychiatry, Washington University in St. Louis, St. Louis, MO, USA
- Joseph D Dougherty
  - Departments of Genetics and Psychiatry, Washington University in St. Louis, St. Louis, MO, USA
|
130
|
Lee D, Kapoor A, Lee C, Mudgett M, Beer MA, Chakravarti A. Sequence-based correction of barcode bias in massively parallel reporter assays. Genome Res 2021; 31:1638-1645. [PMID: 34285053 PMCID: PMC8415370 DOI: 10.1101/gr.268599.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/18/2020] [Accepted: 07/07/2021] [Indexed: 11/24/2022]
Abstract
Massively parallel reporter assays (MPRAs) are a high-throughput method for evaluating in vitro activities of thousands of candidate cis-regulatory elements (CREs). In these assays, candidate sequences are cloned upstream or downstream from a reporter gene tagged by unique DNA sequences. However, tag sequences may themselves affect reporter gene expression and lead to major potential biases in the measured cis-regulatory activity. Here, we present a sequence-based method for correcting tag-sequence-specific effects and show that our method can significantly reduce this source of variation and improve the identification of functional regulatory variants by MPRAs. We also show that our model captures sequence features associated with post-transcriptional regulation of mRNA. Thus, this new method helps not only to improve detection of regulatory signals in MPRA experiments but also to design better MPRA protocols.
Affiliation(s)
- Ashish Kapoor
  - University of Texas Health Science Center at Houston
|
131
|
Umer HM, Smolinska K, Komorowski J, Wadelius C. Functional annotation of noncoding mutations in cancer. Life Sci Alliance 2021; 4:e201900523. [PMID: 34282050 PMCID: PMC8321657 DOI: 10.26508/lsa.201900523] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/14/2019] [Revised: 06/29/2021] [Accepted: 06/29/2021] [Indexed: 02/06/2023] Open
Abstract
Recurrent regulatory mutations affecting transcription factor binding sites in 2,500 cancer samples. In a cancer genome, the noncoding sequence contains the vast majority of somatic mutations. While very few are expected to be cancer drivers, those affecting regulatory elements have the potential to have downstream effects on gene regulation that may contribute to cancer progression. To prioritize regulatory mutations, we screened somatic mutations in the Pan-Cancer Analysis of Whole Genomes cohort of 2,515 cancer genomes on individual bases to assess their potential regulatory roles in their respective cancer types. We found a highly significant enrichment of regulatory mutations associated with the deamination signature overlapping a CpG site in the CCAAT/Enhancer Binding Protein β recognition sites in many cancer types. Overall, 5,749 mutated regulatory elements were identified in 1,844 tumor samples from 39 cohorts containing 11,962 candidate regulatory mutations. Our analysis indicated 20 or more regulatory mutations in 5.5% of the samples, and an overall average of six per tumor. Several recurrent elements were identified, and major cancer-related pathways were significantly enriched for genes nearby the mutated regulatory elements. Our results provide a detailed view of the role of regulatory elements in cancer genomes.
Affiliation(s)
- Husen M Umer
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
  - Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
- Karolina Smolinska
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Jan Komorowski
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
  - Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
  - Swedish Collegium for Advanced Study, Uppsala, Sweden
  - Washington National Primate Research Center, Seattle, WA, USA
- Claes Wadelius
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
|
132
|
Wang Y, Shi FY, Liang Y, Gao G. REVA as A Well-curated Database for Human Expression-modulating Variants. Genomics Proteomics Bioinformatics 2021; 19:590-601. [PMID: 34224878 PMCID: PMC9040024 DOI: 10.1016/j.gpb.2021.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2020] [Revised: 06/22/2021] [Accepted: 06/25/2021] [Indexed: 10/25/2022]
Abstract
More than 90% of disease- and trait-associated human variants are noncoding. By systematically screening multiple large-scale studies, we compiled REVA, a manually curated database for over 11.8 million experimentally tested noncoding variants with expression-modulating potentials. We provided 2424 functional annotations that could be used to pinpoint the plausible regulatory mechanism of these variants. We further benchmarked multiple state-of-the-art computational tools and found their limited sensitivity remains a serious challenge for effective large-scale analysis. REVA provides high-quality experimentally tested expression-modulating variants with extensive functional annotations, which will be useful for users in the noncoding variants community. REVA is available at http://reva.gao-lab.org.
Affiliation(s)
- Yu Wang
  - Biomedical Pioneering Innovation Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI) and State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
- Fang-Yuan Shi
  - Biomedical Pioneering Innovation Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI) and State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
- Yu Liang
  - Human Aging Research Institute, School of Life Sciences, Nanchang University, Nanchang 330031, China
- Ge Gao
  - Biomedical Pioneering Innovation Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI) and State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
|
133
|
Degtyareva AO, Antontseva EV, Merkulova TI. Regulatory SNPs: Altered Transcription Factor Binding Sites Implicated in Complex Traits and Diseases. Int J Mol Sci 2021; 22:6454. [PMID: 34208629 PMCID: PMC8235176 DOI: 10.3390/ijms22126454] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 05/05/2021] [Revised: 06/15/2021] [Accepted: 06/15/2021] [Indexed: 12/19/2022] Open
Abstract
The vast majority of the genetic variants (mainly SNPs) associated with various human traits and diseases map to the noncoding part of the genome and are enriched in its regulatory compartment, suggesting that many causal variants may affect gene expression. The leading mechanism of action of these SNPs is the alteration of transcription factor binding, via creation or disruption of transcription factor binding sites (TFBSs) or a change in the affinity of these regulatory proteins for their cognate sites. In this review, we first focus on the history of the discovery of regulatory SNPs (rSNPs) and give a systematized description of the existing methodological approaches to their study. Then, we briefly present recent comprehensive examples of rSNPs studied from the discovery of the changes in the TFBS sequence as a result of a nucleotide substitution, to identification of its effect on target gene expression and, eventually, on phenotype. We also describe state-of-the-art genome-wide approaches to the identification of regulatory variants, including both making molecular sense of genome-wide association studies (GWAS) and alternative approaches whose primary goal is to determine the functionality of genetic variants. Among these approaches, special attention is paid to expression quantitative trait loci (eQTL) analysis and the search for allele-specific events in RNA-seq (ASE events) as well as in ChIP-seq, DNase-seq, and ATAC-seq (ASB events) data.
Affiliation(s)
- Arina O. Degtyareva
  - Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
- Elena V. Antontseva
  - Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
- Tatiana I. Merkulova
  - Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
|
134
|
Wang QS, Kelley DR, Ulirsch J, Kanai M, Sadhuka S, Cui R, Albors C, Cheng N, Okada Y, Aguet F, Ardlie KG, MacArthur DG, Finucane HK. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat Commun 2021; 12:3394. [PMID: 34099641 PMCID: PMC8184741 DOI: 10.1038/s41467-021-23134-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/23/2020] [Accepted: 04/15/2021] [Indexed: 02/05/2023] Open
Abstract
The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants' effect on gene expressions in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors thus have lacked large gold standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
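To make the idea of using a functional score as a fine-mapping prior concrete, here is a minimal sketch. It is not the method used in the paper: it assumes exactly one causal variant per locus, assumes per-variant Bayes factors are already available, and uses a hypothetical function name (`pips_single_causal`). Under those assumptions, the posterior inclusion probability of variant i is its prior times its Bayes factor, divided by the same product summed over all variants in the locus.

```python
def pips_single_causal(bayes_factors, priors=None):
    """Posterior inclusion probabilities under a one-causal-variant assumption.

    bayes_factors: per-variant Bayes factors for association (assumed given).
    priors: per-variant prior weights, e.g. derived from a functional score;
            a uniform prior is used when omitted. Illustrative only.
    """
    n = len(bayes_factors)
    if n == 0:
        raise ValueError("need at least one variant")
    if priors is None:
        priors = [1.0 / n] * n
    if len(priors) != n:
        raise ValueError("priors and bayes_factors must have equal length")

    # Unnormalized posterior weight for each variant: prior * Bayes factor.
    weights = [p * bf for p, bf in zip(priors, bayes_factors)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


# Two variants in tight linkage have identical statistical evidence.
bfs = [10.0, 10.0, 1.0]
uniform = pips_single_causal(bfs)                      # the two stay tied
informed = pips_single_causal(bfs, [0.6, 0.2, 0.2])    # the prior breaks the tie
```

With a uniform prior the two linked variants are indistinguishable; a prior that favors the first one shifts posterior mass toward it, which is the intuition behind functionally informed fine-mapping.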
Affiliation(s)
- Qingbo S Wang
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - PhD program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
- Jacob Ulirsch
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - PhD program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Masahiro Kanai
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - PhD program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan
- Shuvom Sadhuka
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard College, Cambridge, MA, USA
- Ran Cui
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Carlos Albors
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Nathan Cheng
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Yukinori Okada
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan
  - Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Osaka, Japan
  - Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Osaka, Japan
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Parkville, VIC, Australia
- Hilary K Finucane
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
|
135
|
Wang Y, Jiang Y, Yao B, Huang K, Liu Y, Wang Y, Qin X, Saykin AJ, Chen L. WEVar: a novel statistical learning framework for predicting noncoding regulatory variants. Brief Bioinform 2021; 22:6279833. [PMID: 34021560 DOI: 10.1093/bib/bbab189] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/18/2021] [Revised: 04/05/2021] [Accepted: 04/23/2021] [Indexed: 11/15/2022] Open
Abstract
Understanding the functional consequence of noncoding variants is of great interest. Though genome-wide association studies or quantitative trait locus analyses have identified variants associated with traits or molecular phenotypes, most of them are located in the noncoding regions, making the identification of causal variants a particular challenge. Existing computational approaches developed for prioritizing noncoding variants produce inconsistent and even conflicting results. To address these challenges, we propose a novel statistical learning framework, which directly integrates the precomputed functional scores from representative scoring methods. It will maximize the usage of integrated methods by automatically learning the relative contribution of each method and produce an ensemble score as the final prediction. The framework consists of two modes. The first 'context-free' mode is trained using curated causal regulatory variants from a wide range of context and is applicable to predict regulatory variants of unknown and diverse context. The second 'context-dependent' mode further improves the prediction when the training and testing variants are from the same context. By evaluating the framework via both simulation and empirical studies, we demonstrate that it outperforms integrated scoring methods and the ensemble score successfully prioritizes experimentally validated regulatory variants in multiple risk loci.
Affiliation(s)
- Ye Wang
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Yuchao Jiang
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Bing Yao
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA
- Kun Huang
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA
- Yunlong Liu
  - Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yue Wang
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Xiao Qin
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Andrew J Saykin
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Li Chen
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
|
136
|
Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022] Open
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
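The normalization step described in this abstract (barcode expression divided by barcode abundance in the plasmid DNA, then averaged per ROI) can be sketched as follows. This is a minimal illustration and not the MPRAdecoder implementation: the function name `normalize_mpra`, the dictionary inputs, the `min_dna` threshold, and the choice to convert raw counts to library fractions before taking the RNA/DNA ratio are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def normalize_mpra(rna_counts, dna_counts, bc_to_roi, min_dna=1):
    """Return (per-barcode normalized expression, per-ROI mean expression).

    rna_counts: barcode -> read count in the transcript (cDNA) sample.
    dna_counts: barcode -> read count in the plasmid DNA sample.
    bc_to_roi:  barcode -> ROI for unambiguous associations only.
    All names and the min_dna cutoff are hypothetical.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both samples must contain reads")

    bc_expr = {}
    for bc in bc_to_roi:
        dna = dna_counts.get(bc, 0)
        if dna < min_dna:
            continue  # too few plasmid reads to normalize against
        rna_frac = rna_counts.get(bc, 0) / rna_total
        dna_frac = dna / dna_total
        bc_expr[bc] = rna_frac / dna_frac

    # Average the normalized barcode values belonging to the same ROI.
    by_roi = defaultdict(list)
    for bc, value in bc_expr.items():
        by_roi[bc_to_roi[bc]].append(value)
    roi_expr = {roi: mean(values) for roi, values in by_roi.items()}
    return bc_expr, roi_expr


# Toy data: two barcodes tagged to roi1, one barcode tagged to roi2.
rna = {"AAA": 30, "CCC": 10, "GGG": 60}
dna = {"AAA": 25, "CCC": 25, "GGG": 50}
mapping = {"AAA": "roi1", "CCC": "roi1", "GGG": "roi2"}
per_barcode, per_roi = normalize_mpra(rna, dna, mapping)
```

Barcodes missing from the mapping never enter the calculation, which mirrors the abstract's requirement that uncertain BC-ROI associations be excluded from the analysis.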
Affiliation(s)
- Anna E Letiagina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Evgeniya S Omelina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anton V Ivankin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Alexey V Pindyurin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
|
137
|
Akinci E, Hamilton MC, Khowpinitchai B, Sherwood RI. Using CRISPR to understand and manipulate gene regulation. Development 2021; 148:dev182667. [PMID: 33913466 PMCID: PMC8126405 DOI: 10.1242/dev.182667] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/16/2022]
Abstract
Understanding how genes are expressed in the correct cell types and at the correct level is a key goal of developmental biology research. Gene regulation has traditionally been approached largely through observational methods, whereas perturbational approaches have lacked precision. CRISPR-Cas9 has begun to transform the study of gene regulation, allowing for precise manipulation of genomic sequences, epigenetic functionalization and gene expression. CRISPR-Cas9 technology has already led to the discovery of new paradigms in gene regulation and, as new CRISPR-based tools and methods continue to be developed, promises to transform our knowledge of the gene regulatory code and our ability to manipulate cell fate. Here, we discuss the current and future application of the emerging CRISPR toolbox toward predicting gene regulatory network behavior, improving stem cell disease modeling, dissecting the epigenetic code, reprogramming cell fate and treating diseases of gene dysregulation.
Affiliation(s)
- Ersin Akinci
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Department of Agricultural Biotechnology, Faculty of Agriculture, Akdeniz University, Antalya, 07070, Turkey
- Marisa C. Hamilton
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Benyapa Khowpinitchai
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Richard I. Sherwood
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Hubrecht Institute, 3584 CT, Utrecht, The Netherlands
|
138
|
Weiss CV, Harshman L, Inoue F, Fraser HB, Petrov DA, Ahituv N, Gokhman D. The cis-regulatory effects of modern human-specific variants. eLife 2021; 10:e63713. [PMID: 33885362 PMCID: PMC8062137 DOI: 10.7554/elife.63713] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/03/2020] [Accepted: 03/30/2021] [Indexed: 12/24/2022] Open
Abstract
The Neanderthal and Denisovan genomes enabled the discovery of sequences that differ between modern and archaic humans, the majority of which are noncoding. However, our understanding of the regulatory consequences of these differences remains limited, in part due to the decay of regulatory marks in ancient samples. Here, we used a massively parallel reporter assay in embryonic stem cells, neural progenitor cells, and bone osteoblasts to investigate the regulatory effects of the 14,042 single-nucleotide modern human-specific variants. Overall, 1791 (13%) of sequences containing these variants showed active regulatory activity, and 407 (23%) of these drove differential expression between human groups. Differentially active sequences were associated with divergent transcription factor binding motifs, and with genes enriched for vocal tract and brain anatomy and function. This work provides insight into the regulatory function of variants that emerged along the modern human lineage and the recent evolution of human gene expression.
Affiliation(s)
- Carly V Weiss
  - Department of Biology, Stanford University, Stanford, United States
- Lana Harshman
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, United States
  - Institute for Human Genetics, University of California San Francisco, San Francisco, United States
- Fumitaka Inoue
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, United States
  - Institute for Human Genetics, University of California San Francisco, San Francisco, United States
- Hunter B Fraser
  - Department of Biology, Stanford University, Stanford, United States
- Dmitri A Petrov
  - Department of Biology, Stanford University, Stanford, United States
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, United States
  - Institute for Human Genetics, University of California San Francisco, San Francisco, United States
- David Gokhman
  - Department of Biology, Stanford University, Stanford, United States
|
139
|
Mehra P, Wells AD. Variant to Gene Mapping to Discover New Targets for Immune Tolerance. Front Immunol 2021; 12:633219. [PMID: 33936046 PMCID: PMC8082446 DOI: 10.3389/fimmu.2021.633219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/24/2020] [Accepted: 03/16/2021] [Indexed: 01/04/2023] Open
Abstract
The breakdown of immunological tolerance leads to autoimmune disease, and the mechanisms that maintain self-tolerance, especially in humans, are not fully understood. Genome-wide association studies (GWAS) have identified hundreds of human genetic loci statistically linked to autoimmune disease risk, and epigenetic modifications of DNA and chromatin at these loci have been associated with autoimmune disease risk. Because the vast majority of these signals are located far from genes, identifying causal variants, and their functional consequences on the correct effector genes, has been challenging. These limitations have hampered the translation of GWAS findings into novel drug targets and clinical interventions, but recent advances in understanding the spatial organization of the genome in the nucleus have offered mechanistic insights into gene regulation and answers to questions left open by GWAS. Here we discuss the potential for 'variant-to-gene mapping' approaches that integrate GWAS with 3D functional genomic data to identify human genes involved in the maintenance of tolerance.
Affiliation(s)
- Parul Mehra
  - Department of Pathology, The Children's Hospital of Philadelphia, Philadelphia, PA, United States
- Andrew D Wells
  - Department of Pathology, The Children's Hospital of Philadelphia, Philadelphia, PA, United States
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
|
140
|
Global discovery of lupus genetic risk variant allelic enhancer activity. Nat Commun 2021; 12:1611. [PMID: 33712590 PMCID: PMC7955039 DOI: 10.1038/s41467-021-21854-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 01/18/2020] [Accepted: 02/16/2021] [Indexed: 12/17/2022] Open
Abstract
Genome-wide association studies of Systemic Lupus Erythematosus (SLE) nominate 3073 genetic variants at 91 risk loci. To systematically screen these variants for allelic transcriptional enhancer activity, we construct a massively parallel reporter assay (MPRA) library comprising 12,396 DNA oligonucleotides containing the genomic context around every allele of each SLE variant. Transfection into the Epstein-Barr virus-transformed B cell line GM12878 reveals 482 variants with enhancer activity, with 51 variants showing genotype-dependent (allelic) enhancer activity at 27 risk loci. Comparison of MPRA results in GM12878 and Jurkat T cell lines highlights shared and unique allelic transcriptional regulatory mechanisms at SLE risk loci. In-depth analysis of allelic transcription factor (TF) binding at and around allelic variants identifies one class of TFs whose DNA-binding motif tends to be directly altered by the risk variant and a second class of TFs that bind allelically without direct alteration of their motif by the variant. Collectively, our approach provides a blueprint for the discovery of allelic gene regulation at risk loci for any disease and offers insight into the transcriptional regulatory mechanisms underlying SLE. Thousands of genetic variants have been associated with lupus, but causal variants and mechanisms are unknown. Here, the authors combine a massively parallel reporter assay with genome-wide ChIP experiments to identify risk variants with allelic enhancer activity mediated through transcription factor binding.
|
141
|
Rao S, Yao Y, Bauer DE. Editing GWAS: experimental approaches to dissect and exploit disease-associated genetic variation. Genome Med 2021; 13:41. [PMID: 33691767 PMCID: PMC7948363 DOI: 10.1186/s13073-021-00857-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 07/20/2020] [Accepted: 02/12/2021] [Indexed: 12/17/2022] Open
Abstract
Genome-wide association studies (GWAS) have uncovered thousands of genetic variants that influence risk for human diseases and traits. Yet understanding the mechanisms by which these genetic variants, mainly noncoding, have an impact on associated diseases and traits remains a significant hurdle. In this review, we discuss emerging experimental approaches that are being applied for functional studies of causal variants and translational advances from GWAS findings to disease prevention and treatment. We highlight the use of genome editing technologies in GWAS functional studies to modify genomic sequences, with proof-of-principle examples. We discuss the challenges in interrogating causal variants, points for consideration in experimental design and interpretation of GWAS locus mechanisms, and the potential for novel therapeutic opportunities. With the accumulation of knowledge of functional genetics, therapeutic genome editing based on GWAS discoveries will become increasingly feasible.
Affiliation(s)
- Shuquan Rao
  - Division of Hematology/Oncology, Boston Children's Hospital; Department of Pediatric Oncology, Dana-Farber Cancer Institute; Harvard Stem Cell Institute; Broad Institute; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Yao Yao
  - Division of Hematology/Oncology, Boston Children's Hospital; Department of Pediatric Oncology, Dana-Farber Cancer Institute; Harvard Stem Cell Institute; Broad Institute; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - School of Basic Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Daniel E Bauer
  - Division of Hematology/Oncology, Boston Children's Hospital; Department of Pediatric Oncology, Dana-Farber Cancer Institute; Harvard Stem Cell Institute; Broad Institute; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
|
142
|
Integrative analysis of liver-specific non-coding regulatory SNPs associated with the risk of coronary artery disease. Am J Hum Genet 2021; 108:411-430. [PMID: 33626337 DOI: 10.1016/j.ajhg.2021.02.006] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/02/2020] [Accepted: 02/04/2021] [Indexed: 02/08/2023] Open
Abstract
Genetic factors underlying coronary artery disease (CAD) have been widely studied using genome-wide association studies (GWASs). However, the functional understanding of the CAD loci has been limited by the fact that a majority of GWAS variants are located within non-coding regions with no functional role. High cholesterol and dysregulation of the liver metabolism such as non-alcoholic fatty liver disease confer an increased risk of CAD. Here, we studied the function of non-coding single-nucleotide polymorphisms in CAD GWAS loci located within liver-specific enhancer elements by identifying their potential target genes using liver cis-eQTL analysis and promoter Capture Hi-C in HepG2 cells. Altogether, 734 target genes were identified of which 121 exhibited correlations to liver-related traits. To identify potentially causal regulatory SNPs, the allele-specific enhancer activity was analyzed by (1) sequence-based computational predictions, (2) quantification of allele-specific transcription factor binding, and (3) STARR-seq massively parallel reporter assay. Altogether, our analysis identified 1,277 unique SNPs that display allele-specific regulatory activity. Among these, susceptibility enhancers near important cholesterol homeostasis genes (APOB, APOC1, APOE, and LIPA) were identified, suggesting that altered gene regulatory activity could represent another way by which genetic variation regulates serum lipoprotein levels. Using CRISPR-based perturbation, we demonstrate how the deletion/activation of a single enhancer leads to changes in the expression of many target genes located in a shared chromatin interaction domain. Our integrative genomics approach represents a comprehensive effort in identifying putative causal regulatory regions and target genes that could predispose to clinical manifestation of CAD by affecting liver function.
|
143
|
Zheng A, Lamkin M, Zhao H, Wu C, Su H, Gymrek M. Deep neural networks identify sequence context features predictive of transcription factor binding. Nat Mach Intell 2021; 3:172-180. [PMID: 33796819 PMCID: PMC8009085 DOI: 10.1038/s42256-020-00282-y] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 12/17/2019] [Accepted: 12/10/2020] [Indexed: 12/11/2022]
Abstract
Transcription factors (TFs) bind DNA by recognizing specific sequence motifs, typically 6-12 bp long. A motif can occur many thousands of times in the human genome, but only a subset of those sites is actually bound. Here we present a machine learning framework leveraging existing convolutional neural network architectures and model interpretation techniques to identify and interpret sequence context features most important for predicting whether a particular motif instance will be bound. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line, score the importance of context sequences at base-pair resolution, and characterize context features most predictive of binding. We find that the choice of training data heavily influences classification accuracy and the relative importance of features such as open chromatin. Overall, our framework enables novel insights into features predictive of TF binding and is likely to inform future deep learning applications to interpret non-coding genetic variants.
Affiliation(s)
- An Zheng
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Michael Lamkin
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Hanqing Zhao
  - Department of Biology, University of California San Diego, La Jolla, CA, USA
- Cynthia Wu
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Hao Su
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
|
144
|
Yu TC, Liu WL, Brinck MS, Davis JE, Shek J, Bower G, Einav T, Insigne KD, Phillips R, Kosuri S, Urtecho G. Multiplexed characterization of rationally designed promoter architectures deconstructs combinatorial logic for IPTG-inducible systems. Nat Commun 2021; 12:325. [PMID: 33436562 PMCID: PMC7804116 DOI: 10.1038/s41467-020-20094-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/14/2020] [Accepted: 11/04/2020] [Indexed: 12/21/2022] Open
Abstract
A crucial step towards engineering biological systems is the ability to precisely tune the genetic response to environmental stimuli. In the case of Escherichia coli inducible promoters, our incomplete understanding of the relationship between sequence composition and gene expression hinders our ability to predictably control transcriptional responses. Here, we profile the expression dynamics of 8269 rationally designed, IPTG-inducible promoters that collectively explore the individual and combinatorial effects of RNA polymerase and LacI repressor binding site strengths. We then fit a statistical mechanics model to measured expression that accurately models gene expression and reveals properties of theoretically optimal inducible promoters. Furthermore, we characterize three alternative promoter architectures and show that repositioning binding sites within promoters influences the types of combinatorial effects observed between promoter elements. In total, this approach enables us to deconstruct relationships between inducible promoter elements and discover practical insights for engineering inducible promoters with desirable characteristics.
Affiliation(s)
- Timothy C Yu
  - Department of Bioengineering, University of California, Los Angeles, CA, 90095, USA
- Winnie L Liu
  - Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095, USA
- Marcia S Brinck
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA, 90095, USA
- Jessica E Davis
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
- Jeremy Shek
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
- Grace Bower
  - Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095, USA
- Tal Einav
  - Department of Physics, California Institute of Technology, Pasadena, CA, 91125, USA
- Kimberly D Insigne
  - Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, CA, 90095, USA
- Rob Phillips
  - Department of Physics, California Institute of Technology, Pasadena, CA, 91125, USA
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
  - Department of Applied Physics, California Institute of Technology, Pasadena, CA, 91125, USA
- Sriram Kosuri
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095, USA
  - UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA, 90095, USA
  - Institute for Quantitative and Computational Biosciences (QCB), University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, USA
  - Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, CA, 90095, USA
- Guillaume Urtecho
  - Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, CA, 90095, USA
|
145
|
Uebbing S, Gockley J, Reilly SK, Kocher AA, Geller E, Gandotra N, Scharfe C, Cotney J, Noonan JP. Massively parallel discovery of human-specific substitutions that alter enhancer activity. Proc Natl Acad Sci U S A 2021; 118:e2007049118. [PMID: 33372131 PMCID: PMC7812811 DOI: 10.1073/pnas.2007049118] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Indexed: 12/30/2022] Open
Abstract
Genetic changes that altered the function of gene regulatory elements have been implicated in the evolution of human traits such as the expansion of the cerebral cortex. However, identifying the particular changes that modified regulatory activity during human evolution remains challenging. Here we used massively parallel enhancer assays in neural stem cells to quantify the functional impact of >32,000 human-specific substitutions in >4,300 human accelerated regions (HARs) and human gain enhancers (HGEs), which include enhancers with novel activities in humans. We found that >30% of active HARs and HGEs exhibited differential activity between human and chimpanzee. We isolated the effects of human-specific substitutions from background genetic variation to identify the effects of genetic changes most relevant to human evolution. We found that substitutions interacted in both additive and nonadditive ways to modify enhancer function. Substitutions within HARs, which are highly constrained compared to HGEs, showed smaller effects on enhancer activity, suggesting that the impact of human-specific substitutions is buffered in enhancers with constrained ancestral functions. Our findings yield insight into how human-specific genetic changes altered enhancer function and provide a rich set of candidates for studies of regulatory evolution in humans.
Affiliation(s)
- Severin Uebbing
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510
- Jake Gockley
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510
- Steven K Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510
- Acadia A Kocher
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510
- Evan Geller
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510
- Neeru Gandotra
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510
- Curt Scharfe
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510
- Justin Cotney
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510
- James P Noonan
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510
- Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510
146
Suzuki A, Guerrini MM, Yamamoto K. Functional genomics of autoimmune diseases. Ann Rheum Dis 2021; 80:689-697. [PMID: 33408079 DOI: 10.1136/annrheumdis-2019-216794] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/30/2020] [Accepted: 12/06/2020] [Indexed: 12/22/2022]
Abstract
For more than a decade, genome-wide association studies have been applied to autoimmune diseases and have expanded our understanding of their pathogenesis. Genetic risk factors associated with diseases and traits are essentially causative. However, elucidating the biological mechanism of disease from genetic factors is challenging. In fact, it is difficult to identify the causal variant among multiple variants located on the same haplotype or linkage disequilibrium block, and thus the responsible genes remain elusive. Recently, multiple studies have revealed that the majority of risk variants are located in non-coding regions of the genome and are most likely to regulate gene expression, for example as quantitative trait loci. Enhancers, promoters and long non-coding RNAs appear to be the main target mechanisms of the risk variants. In this review, we discuss functional genetics approaches to address these puzzles.
Affiliation(s)
- Akari Suzuki
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Matteo Maurizio Guerrini
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Kazuhiko Yamamoto
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
147
PsychENCODE and beyond: transcriptomics and epigenomics of brain development and organoids. Neuropsychopharmacology 2021; 46:70-85. [PMID: 32659782 PMCID: PMC7689467 DOI: 10.1038/s41386-020-0763-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/13/2020] [Revised: 06/24/2020] [Accepted: 06/25/2020] [Indexed: 12/13/2022]
Abstract
Crucial decisions involving cell fate and connectivity that shape the distinctive development of the human brain occur in the embryonic and fetal stages, which are difficult to access and investigate in humans. The last decade has seen an impressive increase in resources, from atlases and databases to biological models, that is progressively lifting the curtain on this critical period. In this review, we describe the current state of genomic, transcriptomic, and epigenomic datasets charting the development of normal human brain with a particular focus on recent single-cell technologies. We discuss the emergence of brain organoids generated from pluripotent stem cells as a model to compensate for the limited availability of fetal tissue. Indeed, comparisons of neural lineages, transcriptional dynamics, and noncoding element activity between fetal brain and organoids have helped identify gene regulatory networks functioning at early stages of brain development. Altogether, we argue that large multi-omics investigations have pushed brain development into the "big data" era, and that current and future transversal approaches needed to leverage both fetal brain and organoid resources promise to answer major questions of brain biology and psychiatry.
148
Promoter-interacting expression quantitative trait loci are enriched for functional genetic variants. Nat Genet 2021; 53:110-119. [PMID: 33349701 PMCID: PMC8053422 DOI: 10.1038/s41588-020-00745-3] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 09/12/2019] [Accepted: 11/02/2020] [Indexed: 01/28/2023]
Abstract
Expression quantitative trait loci (eQTLs) studies provide associations of genetic variants with gene expression but fall short of pinpointing functionally important eQTLs. Here, using H3K27ac HiChIP assays, we mapped eQTLs overlapping active cis-regulatory elements that interact with their target gene promoters (promoter-interacting eQTLs, pieQTLs) in five common immune cell types (Database of Immune Cell Expression, Expression quantitative trait loci and Epigenomics (DICE) cis-interactome project). This approach allowed us to identify functionally important eQTLs and show mechanisms that explain their cell-type restriction. We also devised an approach to eQTL discovery that relies on HiChIP-based promoter interaction maps as a structural framework for deciding which SNPs to test for association with gene expression, and observe ultra-long-distance pieQTLs (>1 megabase away), including several disease-risk variants. We validated the functional role of pieQTLs using reporter assays, CRISPRi, dCas9-tiling guides and Cas9-mediated base-pair editing. In this article we present a method for functional eQTL discovery and provide insights into relevance of noncoding variants for cell-specific gene regulation and for disease association beyond conventional eQTL mapping.
149
Mulvey B, Lagunas T, Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 2021; 89:76-89. [PMID: 32843144 PMCID: PMC7938388 DOI: 10.1016/j.biopsych.2020.06.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 02/11/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Neuropsychiatric phenotypes have long been known to be influenced by heritable risk factors, directly confirmed by the past decade of genetic studies that have revealed specific genetic variants enriched in disease cohorts. However, the initial hope that a small set of genes would be responsible for a given disorder proved false. The more complex reality is that a given disorder may be influenced by myriad small-effect noncoding variants and/or by rare but severe coding variants, many de novo. Noncoding genomic sequences, for which molecular functions cannot usually be inferred, harbor a large portion of these variants, creating a substantial barrier to understanding higher-order molecular and biological systems of disease. Fortunately, novel genetic technologies, namely scalable oligonucleotide synthesis, RNA sequencing, and CRISPR (clustered regularly interspaced short palindromic repeats), have opened novel avenues to experimentally identify biologically significant variants en masse. Massively parallel reporter assays (MPRAs) are an especially versatile technique resulting from such innovations. MPRAs are powerful molecular genetics tools that can be used to screen thousands of untranscribed or untranslated sequences and their variants for functional effects in a single experiment. This approach, though underutilized in psychiatric genetics, has several useful features for the field. We review methods for assaying putatively functional genetic variants and regions, emphasizing MPRAs and the opportunities they hold for dissection of psychiatric polygenicity. We discuss literature applying functional assays in neurogenetics, highlighting strengths, caveats, and design considerations, especially regarding disease-relevant variables (cell type, neurodevelopment, and sex), and we ultimately propose applications of MPRA to both computational and experimental neurogenetics of polygenic disease risk.
Affiliation(s)
- Bernard Mulvey
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Tomás Lagunas
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
150
Ainsworth HC, Howard TD, Langefeld CD. Intrinsic DNA topology as a prioritization metric in genomic fine-mapping studies. Nucleic Acids Res 2020; 48:11304-11321. [PMID: 33084892 PMCID: PMC7672465 DOI: 10.1093/nar/gkaa877] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2020] [Revised: 08/23/2020] [Accepted: 09/25/2020] [Indexed: 12/15/2022] Open
Abstract
In genomic fine-mapping studies, some approaches leverage annotation data to prioritize likely functional polymorphisms. However, existing annotation resources can present challenges as many lack information for novel variants and/or may be uninformative for non-coding regions. We propose a novel annotation source, sequence-dependent DNA topology, as a prioritization metric for fine-mapping. DNA topology and function are well-intertwined, and as an intrinsic DNA property, it is readily applicable to any genomic region. Here, we constructed and applied Minor Groove Width (MGW) as a prioritization metric. Using an established MGW-prediction method, we generated an MGW census for 199 038 197 SNPs across the human genome. Summarizing a SNP's change in MGW (ΔMGW) as a Euclidean distance, ΔMGW exhibited a strongly right-skewed distribution, highlighting the infrequency of SNPs that generate dissimilar shape profiles. We hypothesized that phenotypically-associated SNPs can be prioritized by ΔMGW. We tested this hypothesis in 116 regions analyzed by a Massively Parallel Reporter Assay and observed enrichment of large ΔMGW for functional polymorphisms (P = 0.0007). To illustrate application in fine-mapping studies, we applied our MGW-prioritization approach to three non-coding regions associated with systemic lupus erythematosus. Together, this study presents the first usage of sequence-dependent DNA topology as a prioritization metric in genomic association studies.
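As a rough illustration of the ΔMGW summary mentioned in the abstract above, the sketch below computes a Euclidean distance between two minor groove width profiles, one for the reference allele and one for the alternate allele. The function name `delta_mgw`, the five-position window, and the numeric values are hypothetical. The abstract does not state the window size or the MGW-prediction method, so this should not be read as the authors' implementation.

```python
import math


def delta_mgw(ref_profile, alt_profile):
    """Euclidean distance between two per-position MGW profiles.

    Both arguments are sequences of predicted minor groove widths
    covering the same positions around a SNP (hypothetical inputs).
    """
    if len(ref_profile) != len(alt_profile):
        raise ValueError("profiles must cover the same number of positions")
    # Sum the squared per-position differences, then take the square root.
    squared = sum((a - r) ** 2 for r, a in zip(ref_profile, alt_profile))
    return math.sqrt(squared)


# Made-up profiles for illustration only: the alternate allele narrows
# the groove at two of the five positions.
ref = [5.0, 4.8, 4.6, 5.1, 5.3]
alt = [5.0, 4.5, 4.2, 5.1, 5.3]
print(delta_mgw(ref, alt))
```

Under this reading, a SNP that leaves the predicted profile unchanged scores 0, and larger values indicate more dissimilar shape profiles.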
Affiliation(s)
- Hannah C Ainsworth
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA; Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Timothy D Howard
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA; Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Carl D Langefeld
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA; Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA; Comprehensive Cancer Center of Wake Forest Baptist Medical Center, Winston-Salem, NC 27157, USA