1
|
Zou X, Gomez ZW, Reddy TE, Allen AS, Majoros WH. Bayesian Estimation of Allele-Specific Expression in the Presence of Phasing Uncertainty. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.09.607371. [PMID: 39211106 PMCID: PMC11361064 DOI: 10.1101/2024.08.09.607371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
Motivation: Allele-specific expression (ASE) analyses aim to detect imbalanced expression of maternal versus paternal copies of an autosomal gene. Such allelic imbalance can result from a variety of cis-acting causes, including disruptive mutations within one copy of a gene that impact the stability of transcripts, as well as regulatory variants outside the gene that impact transcription initiation. Current methods for ASE estimation suffer from a number of shortcomings, such as relying on only one variant within a gene, assuming perfect phasing information across multiple variants within a gene, or failing to account for alignment biases and possible genotyping errors.
Results: We developed BEASTIE, a Bayesian hierarchical model designed for precise ASE quantification at the gene level, based on given genotypes and RNA-Seq data. BEASTIE addresses the complexities of allelic mapping bias, genotyping error, and phasing errors by incorporating empirical phasing error rates derived from Genome-in-a-Bottle individual NA12878. BEASTIE surpasses existing methods in accuracy, especially in scenarios with high phasing errors. This improvement is critical for identifying rare genetic variants often obscured by such errors. Through rigorous validation on simulated data and application to real data from the 1000 Genomes Project, we establish the robustness of BEASTIE. These findings underscore the value of BEASTIE in revealing patterns of ASE across gene sets and pathways.
Availability and Implementation: The software is freely available from https://github.com/x811zou/BEASTIE. BEASTIE is available as Python source code and as a Docker image.
Supplementary information: Additional information is available online.
|
2
|
Abedini SS, Akhavantabasi S, Liang Y, Heng JIT, Alizadehsani R, Dehzangi I, Bauer DC, Alinejad-Rokny H. A critical review of the impact of candidate copy number variants on autism spectrum disorder. MUTATION RESEARCH. REVIEWS IN MUTATION RESEARCH 2024; 794:108509. [PMID: 38977176 DOI: 10.1016/j.mrrev.2024.108509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2023] [Revised: 04/14/2024] [Accepted: 07/02/2024] [Indexed: 07/10/2024]
Abstract
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder (NDD) influenced by genetic, epigenetic, and environmental factors. Recent advancements in genomic analysis have shed light on numerous genes associated with ASD, highlighting the significant role of both common and rare genetic mutations, as well as copy number variations (CNVs), single nucleotide polymorphisms (SNPs) and unique de novo variants. These genetic variations disrupt neurodevelopmental pathways, contributing to the disorder's complexity. Notably, CNVs are present in 10 %-20 % of individuals with autism, with 3 %-7 % detectable through cytogenetic methods. While the role of submicroscopic CNVs in ASD has been recently studied, their association with genomic loci and genes has not been thoroughly explored. In this review, we focus on 47 CNV regions linked to ASD, encompassing 1632 genes, including protein-coding genes and long non-coding RNAs (lncRNAs), of which 659 show significant brain expression. Using a list of ASD-associated genes from SFARI, we detect 17 regions harboring at least one known ASD-related protein-coding gene. Of the remaining 30 regions, we identify 24 regions containing at least one protein-coding gene with brain-enriched expression and a nervous system phenotype in mouse mutants, and one lncRNA with both brain-enriched expression and upregulation in iPSC to neuron differentiation. This review not only expands our understanding of the genetic diversity associated with ASD but also underscores the potential of lncRNAs in contributing to its etiology. Additionally, the discovered CNVs will be a valuable resource for future diagnostic, therapeutic, and research endeavors aimed at prioritizing genetic variations in ASD.
Affiliation(s)
- Seyedeh Sedigheh Abedini
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia; School of Biotechnology & Biomolecular Sciences, UNSW Sydney, Sydney, NSW 2052, Australia
| | - Shiva Akhavantabasi
- Department of Molecular Biology and Genetics, Yeni Yuzyil University, Istanbul, Turkey; Ghiaseddin Jamshid Kashani University, Andisheh University Town, Danesh Blvd, 3441356611, Abyek, Qazvin, Iran
| | - Yuheng Liang
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
| | - Julian Ik-Tsen Heng
- Curtin Health Innovation Research Institute, Curtin University, Bentley 6845, Australia
| | - Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Victoria, Australia
| | - Iman Dehzangi
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; Department of Computer Science, Rutgers University, Camden, NJ 08102, USA
| | - Denis C Bauer
- Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, Australia; Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, Australia
| | - Hamid Alinejad-Rokny
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia; Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, Sydney, NSW 2052, Australia.
|
3
|
Mintoff D, Pace NP, Borg I. Interpreting the spectrum of gamma-secretase complex missense variation in the context of hidradenitis suppurativa—An in-silico study. Front Genet 2022; 13:962449. [PMID: 36118898 PMCID: PMC9478468 DOI: 10.3389/fgene.2022.962449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/08/2022] [Indexed: 11/23/2022]
Abstract
Hidradenitis suppurativa (HS) is a disease of the pilosebaceous unit characterized by recurrent nodules, abscesses and draining tunnels with a predilection for intertriginous skin. The pathophysiology of HS is complex. However, it is known that inflammation and hyperkeratinization at the hair follicle play crucial roles in disease manifestation. Genetic and environmental factors are considered the main drivers of these two pathophysiological processes. Although a considerable proportion of patients have a positive family history of disease, only a minority of patients suffering from HS have been found to harbor monogenic variants that segregate in affected kindreds. Most of these variants are in the γ-secretase complex (GSC) protein-coding genes. In this manuscript, we set out to characterize the burden of missense pathogenic variants in a healthy reference population using a large-scale genomic dataset, thereby providing a standard for comparing genomic variation in GSC protein-coding genes in the HS patient cohort.
Affiliation(s)
- Dillon Mintoff
- Department of Pathology, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
- Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta
| | - Nikolai P. Pace
- Centre for Molecular Biology and Biobanking, University of Malta, Msida, Malta
- Department of Anatomy, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
- *Correspondence: Nikolai P. Pace,
| | - Isabella Borg
- Department of Pathology, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
- Centre for Molecular Biology and Biobanking, University of Malta, Msida, Malta
- Department of Pathology, Mater Dei Hospital, Msida, Malta
|
4
|
Kalita CA, Gusev A. DeCAF: a novel method to identify cell-type specific regulatory variants and their role in cancer risk. Genome Biol 2022; 23:152. [PMID: 35804456 PMCID: PMC9264694 DOI: 10.1186/s13059-022-02708-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 06/15/2022] [Indexed: 01/09/2023]
Abstract
Here, we propose DeCAF (DEconvoluted cell type Allele specific Function), a new method to identify cell-fraction (cf) QTLs in tumors by leveraging both allelic and total expression information. Applying DeCAF to RNA-seq data from TCGA, we identify 3664 genes with cfQTLs (at 10% FDR) in 14 cell types, a 5.63× increase in discovery over conventional interaction-eQTL mapping. cfQTLs replicated in external cell-type-specific eQTL data are more enriched for cancer risk than conventional eQTLs. Our new method, DeCAF, empowers the discovery of biologically meaningful cfQTLs from bulk RNA-seq data in moderately sized studies.
Affiliation(s)
- Cynthia A. Kalita
- Division of Population Sciences, Dana–Farber Cancer Institute & Harvard Medical School, Boston, USA
| | - Alexander Gusev
- Division of Population Sciences, Dana–Farber Cancer Institute & Harvard Medical School, Boston, USA; The Broad Institute, Boston, USA; Division of Genetics, Brigham & Women’s Hospital, Boston, USA
|
5
|
Rowlands CF, Taylor A, Rice G, Whiffin N, Hall HN, Newman WG, Black GCM, O'Keefe RT, Hubbard S, Douglas AGL, Baralle D, Briggs TA, Ellingford JM. MRSD: A quantitative approach for assessing suitability of RNA-seq in the investigation of mis-splicing in Mendelian disease. Am J Hum Genet 2022; 109:210-222. [PMID: 35065709 PMCID: PMC8874219 DOI: 10.1016/j.ajhg.2021.12.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/10/2021] [Accepted: 12/12/2021] [Indexed: 12/16/2022]
Abstract
Variable levels of gene expression between tissues complicate the use of RNA sequencing of patient biosamples to delineate the impact of genomic variants. Here, we describe a gene- and tissue-specific metric to inform the feasibility of RNA sequencing. This overcomes limitations of using expression values alone as a metric to predict RNA-sequencing utility. We have derived a metric, minimum required sequencing depth (MRSD), that estimates the depth of sequencing required from RNA sequencing to achieve user-specified sequencing coverage of a gene, transcript, or group of genes. We applied MRSD across four human biosamples: whole blood, lymphoblastoid cell lines (LCLs), skeletal muscle, and cultured fibroblasts. MRSD has high precision (90.1%-98.2%) and overcomes transcript region-specific sequencing biases. Applying MRSD scoring to established disease gene panels shows that fibroblasts, of these four biosamples, are the optimum source of RNA for 63.1% of gene panels. Using this approach, up to 67.8% of the variants of uncertain significance in ClinVar that are predicted to impact splicing could be assayed by RNA sequencing in at least one of the biosamples. We demonstrate the utility and benefits of MRSD as a metric to inform functional assessment of splicing aberrations, in particular in the context of Mendelian genetic disorders to improve diagnostic yield.
Affiliation(s)
- Charlie F Rowlands
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK; Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
| | - Algy Taylor
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
| | - Gillian Rice
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK
| | - Nicola Whiffin
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
| | - Hildegard Nikki Hall
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
| | - William G Newman
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK; Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
| | - Graeme C M Black
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK; Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
| | - Raymond T O'Keefe
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK
| | - Simon Hubbard
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK
| | - Andrew G L Douglas
- Wessex Clinical Genetics Service, Princess Anne Hospital, University Hospital Southampton NHS Foundation Trust, Coxford Rd, Southampton SO16 5YA, UK; Faculty of Medicine, University of Southampton, Duthie Building, Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK
| | - Diana Baralle
- Wessex Clinical Genetics Service, Princess Anne Hospital, University Hospital Southampton NHS Foundation Trust, Coxford Rd, Southampton SO16 5YA, UK; Faculty of Medicine, University of Southampton, Duthie Building, Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK
| | - Tracy A Briggs
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK; Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
| | - Jamie M Ellingford
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PT, UK; Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK.
|
6
|
RNA-seq for revealing the function of the transcriptome. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00002-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
7
|
Sherbina K, León-Novelo LG, Nuzhdin SV, McIntyre LM, Marroni F. Power calculator for detecting allelic imbalance using hierarchical Bayesian model. BMC Res Notes 2021; 14:436. [PMID: 34838135 PMCID: PMC8626927 DOI: 10.1186/s13104-021-05851-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Allelic imbalance (AI) is the differential expression of the two alleles in a diploid. AI can vary between tissues, treatments, and environments. Methods for testing AI exist, but methods are needed to estimate type I error and power for detecting AI and difference of AI between conditions. As the costs of the technology plummet, what is more important: reads or replicates? RESULTS We find that a minimum of 2400, 480, and 240 allele specific reads divided equally among 12, 5, and 3 replicates is needed to detect a 10, 20, and 30%, respectively, deviation from allelic balance in a condition with power > 80%. A minimum of 960 and 240 allele specific reads divided equally among 8 replicates is needed to detect a 20 or 30% difference in AI between conditions with comparable power. Higher numbers of replicates increase power more than adding coverage without affecting type I error. We provide a Python package that enables simulation of AI scenarios and enables individuals to estimate type I error and power in detecting AI and differences in AI between conditions.
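The read and replicate counts quoted in this abstract can be turned into a per-replicate read budget with simple division. The sketch below is only that arithmetic; it is not the authors' Python package, and the names `designs` and `reads_per_replicate` are hypothetical, introduced here for illustration.

```python
# Per-replicate read budget implied by the totals quoted in the abstract.
# Illustrative arithmetic only; not the authors' power calculator.

# (deviation from allelic balance in %, total allele-specific reads, replicates)
designs = [
    (10, 2400, 12),
    (20, 480, 5),
    (30, 240, 3),
]


def reads_per_replicate(total_reads, replicates):
    """Reads available to each replicate when reads are divided equally."""
    return total_reads / replicates


for deviation, total, reps in designs:
    per_rep = reads_per_replicate(total, reps)
    print(f"{deviation}% deviation: {total} reads / {reps} replicates "
          f"= {per_rep:g} reads per replicate")
```

Under equal division, the three single-condition designs work out to 200, 96, and 80 allele-specific reads per replicate, respectively.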
Affiliation(s)
- Katrina Sherbina
- Quantitative and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
| | - Luis G León-Novelo
- Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston-School of Public Health, Houston, TX, 77030, USA
| | - Sergey V Nuzhdin
- Molecular and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
| | - Lauren M McIntyre
- Genetics Institute and Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32603, USA
| | - Fabio Marroni
- Dipartimento di Scienze Agroalimentari, Ambientali e Animali, Università di Udine, 33100, Udine, Italy.
|
8
|
Teran NA, Nachun DC, Eulalio T, Ferraro NM, Smail C, Rivas MA, Montgomery SB. Nonsense-mediated decay is highly stable across individuals and tissues. Am J Hum Genet 2021; 108:1401-1408. [PMID: 34216550 DOI: 10.1016/j.ajhg.2021.06.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/04/2021] [Accepted: 06/09/2021] [Indexed: 10/21/2022]
Abstract
Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype Tissue Expression v.8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.
|
9
|
Lindquist P, Madsen JS, Bräuner-Osborne H, Rosenkilde MM, Hauser AS. Mutational Landscape of the Proglucagon-Derived Peptides. Front Endocrinol (Lausanne) 2021; 12:698511. [PMID: 34220721 PMCID: PMC8248487 DOI: 10.3389/fendo.2021.698511] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/21/2021] [Accepted: 05/24/2021] [Indexed: 12/18/2022]
Abstract
Strong efforts have been placed on understanding the physiological roles and therapeutic potential of the proglucagon peptide hormones including glucagon, GLP-1 and GLP-2. However, little is known about the extent and magnitude of variability in the amino acid composition of the proglucagon precursor and its mature peptides. Here, we identified 184 unique missense variants in the human proglucagon gene GCG obtained from exome and whole-genome sequencing of more than 450,000 individuals across diverse sub-populations. This provides an unprecedented source of population-wide genetic variation data on missense mutations and insights into the evolutionary constraint spectrum of proglucagon-derived peptides. We show that the stereotypical peptides glucagon, GLP-1 and GLP-2 display fewer evolutionary alterations and are more likely to be functionally affected by genetic variation compared to the rest of the gene products. Elucidating the spectrum of genetic variations and estimating the impact of how a peptide variant may influence human physiology and pathophysiology through changes in ligand binding and/or receptor signalling, are vital and serve as the first important step in understanding variability in glucose homeostasis, amino acid metabolism, intestinal epithelial growth, bone strength, appetite regulation, and other key physiological parameters controlled by these hormones.
Affiliation(s)
- Peter Lindquist
- Department of Biomedical Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Jakob S. Madsen
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Hans Bräuner-Osborne
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Mette M. Rosenkilde
- Department of Biomedical Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Alexander S. Hauser
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
|
10
|
Zhang Z, van Dijk F, de Klein N, van Gijn ME, Franke LH, Sinke RJ, Swertz MA, van der Velde KJ. Feasibility of predicting allele specific expression from DNA sequencing using machine learning. Sci Rep 2021; 11:10606. [PMID: 34012022 PMCID: PMC8134421 DOI: 10.1038/s41598-021-89904-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/21/2021] [Accepted: 05/04/2021] [Indexed: 11/09/2022]
Abstract
Allele specific expression (ASE) concerns divergent expression quantity of alternative alleles and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in absence of RNA sequencing, it must be predicted using only DNA variation. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This resulted in predicted ASE effects for 27 variants, of which 10 were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants or regulators thereof for follow-up validation by RNA sequencing. All used code and machine learning models are available at GitHub and Zenodo.
Affiliation(s)
- Zhenhua Zhang
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Freerk van Dijk
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Prinses Maxima Center for Child Oncology, Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
| | - Niek de Klein
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Mariëlle E van Gijn
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Lude H Franke
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Richard J Sinke
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Morris A Swertz
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - K Joeri van der Velde
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands.
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands.
|
11
|
Atak ZK, Taskiran II, Demeulemeester J, Flerin C, Mauduit D, Minnoye L, Hulselmans G, Christiaens V, Ghanem GE, Wouters J, Aerts S. Interpretation of allele-specific chromatin accessibility using cell state-aware deep learning. Genome Res 2021; 31:1082-1096. [PMID: 33832990 PMCID: PMC8168584 DOI: 10.1101/gr.260851.120] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 01/30/2020] [Accepted: 04/05/2021] [Indexed: 12/26/2022]
Abstract
Genomic sequence variation within enhancers and promoters can have a significant impact on the cellular state and phenotype. However, sifting through the millions of candidate variants in a personal genome or a cancer genome, to identify those that impact cis-regulatory function, remains a major challenge. Interpretation of noncoding genome variation benefits from explainable artificial intelligence to predict and interpret the impact of a mutation on gene regulation. Here we generate phased whole genomes with matched chromatin accessibility, histone modifications, and gene expression for 10 melanoma cell lines. We find that training a specialized deep learning model, called DeepMEL2, on melanoma chromatin accessibility data can capture the various regulatory programs of the melanocytic and mesenchymal-like melanoma cell states. This model outperforms motif-based variant scoring, as well as more generic deep learning models. We detect hundreds to thousands of allele-specific chromatin accessibility variants (ASCAVs) in each melanoma genome, of which 15%-20% can be explained by gains or losses of transcription factor binding sites. A considerable fraction of ASCAVs are caused by changes in AP-1 binding, as confirmed by matched ChIP-seq data to identify allele-specific binding of JUN and FOSL1. Finally, by augmenting the DeepMEL2 model with ChIP-seq data for GABPA, the TERT promoter mutation, as well as additional ETS motif gains, can be identified with high confidence. In conclusion, we present a new integrative genomics approach and a deep learning model to identify and interpret functional enhancer mutations with allelic imbalance of chromatin accessibility and gene expression.
Affiliation(s)
- Zeynep Kalender Atak
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
| | - Ibrahim Ihsan Taskiran
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
| | - Jonas Demeulemeester
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium; Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Christopher Flerin
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
| | - David Mauduit
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
| | - Liesbeth Minnoye
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
| | - Gert Hulselmans
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
| | - Valerie Christiaens
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
| | - Ghanem-Elias Ghanem
- Institut Jules Bordet, Université Libre de Bruxelles, 1000 Brussels, Belgium
| | - Jasper Wouters
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
| | - Stein Aerts
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
|
12
|
Ura H, Togi S, Niida Y. Targeted Double-Stranded cDNA Sequencing-Based Phase Analysis to Identify Compound Heterozygous Mutations and Differential Allelic Expression. BIOLOGY 2021; 10:biology10040256. [PMID: 33804940 PMCID: PMC8063809 DOI: 10.3390/biology10040256] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/05/2021] [Revised: 03/22/2021] [Accepted: 03/22/2021] [Indexed: 11/16/2022]
Abstract
Simple Summary Phase analysis to distinguish between in cis and in trans heterozygous mutations is important for clinical diagnosis because in trans compound heterozygous mutations cause autosomal recessive diseases. However, conventional phase analysis is limited because of the large target size of genomic DNA. Here, we performed a targeted double-stranded cDNA sequencing-based phase analysis to resolve the limitation of distance using direct adapter ligation library preparation and paired-end sequencing; we elucidated that two heterozygous mutations on a patient with Wilson disease are in trans compound heterozygous mutations. Furthermore, we detected the differential allelic expression. Our results indicate that a targeted double-stranded cDNA sequencing-based phase analysis is useful for determining compound heterozygous mutations and confers information on allelic expression. Abstract There are two combinations of heterozygous mutation, i.e., in trans, which carries mutations on different alleles, and in cis, which carries mutations on the same allele. Because only in trans compound heterozygous mutations have been implicated in autosomal recessive diseases, it is important to distinguish them for clinical diagnosis. However, conventional phase analysis is limited because of the large target size of genomic DNA. Here, we performed a genetic analysis on a patient with Wilson disease, and we detected two heterozygous mutations chr13:51958362;G>GG (NM_000053.4:c.2304dup r.2304dup p.Met769HisfsTer26) and chr13:51964900;C>T (NM_000053.4:c.1841G>A r.1841g>a p.Gly614Asp) in the causative gene ATP7B. The distance between the two mutations was 6.5 kb in genomic DNA but 464 bp in mRNA. Targeted double-stranded cDNA sequencing-based phase analysis was performed using direct adapter ligation library preparation and paired-end sequencing, and we elucidated they are in trans compound heterozygous mutations. 
Trio analysis showed that the mutation (chr13:51964900;C>T) derived from the father and the other mutation from the mother, validating that the mutations are in trans composition. Furthermore, targeted double-stranded cDNA sequencing-based phase analysis detected the differential allelic expression, suggesting that the mutation (chr13:51958362;G>GG) caused downregulation of expression by nonsense-mediated mRNA decay. Our results indicate that targeted double-stranded cDNA sequencing-based phase analysis is useful for determining compound heterozygous mutations and confers information on allelic expression.
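As a rough illustration of the read-based phasing idea in this abstract, the sketch below tallies paired-end fragments that cover both variant sites and asks whether the two mutant alleles tend to appear on the same fragment (in cis) or on different fragments (in trans). The function name, the input representation, and the example counts are hypothetical and are not taken from the paper; real data would also need handling of sequencing errors and a statistical threshold.

```python
from collections import Counter


def phase_two_variants(fragments):
    """Classify two heterozygous variants as in cis or in trans.

    `fragments` is an iterable of (allele_at_site1, allele_at_site2) tuples,
    one per read pair covering both sites, with each allele given as
    "ref" or "alt". This representation is an assumption for illustration.
    """
    counts = Counter(fragments)
    # Fragments consistent with both mutations on the same chromosome copy.
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    # Fragments consistent with the mutations on opposite chromosome copies.
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]
    if cis == trans:
        call = "undetermined"
    elif trans > cis:
        call = "in trans"
    else:
        call = "in cis"
    return call, cis, trans


if __name__ == "__main__":
    # Hypothetical counts: most fragments carry exactly one of the two mutations.
    example = (
        [("alt", "ref")] * 40
        + [("ref", "alt")] * 55
        + [("alt", "alt")] * 2
    )
    print(phase_two_variants(example))
```

In this toy input, the unequal counts of the two trans configurations (40 versus 55) are the kind of signal that would be read as differential allelic expression, although a formal test would be needed before drawing that conclusion.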
Affiliation(s)
- Hiroki Ura
- Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan; (S.T.); (Y.N.)
- Division of Genomic Medicine, Department of Advanced Medicine, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan
- Correspondence: Tel.: +81-076-286-2211 (ext. 8353)
- Sumihito Togi
- Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan; (S.T.); (Y.N.)
- Division of Genomic Medicine, Department of Advanced Medicine, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan
- Yo Niida
- Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan; (S.T.); (Y.N.)
- Division of Genomic Medicine, Department of Advanced Medicine, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Kahoku, Ishikawa 920-0923, Japan
|
13
|
aScan: A Novel Method for the Study of Allele Specific Expression in Single Individuals. J Mol Biol 2021; 433:166829. [PMID: 33508309 DOI: 10.1016/j.jmb.2021.166829] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2020] [Revised: 01/08/2021] [Accepted: 01/09/2021] [Indexed: 02/06/2023]
Abstract
In diploid organisms, two copies of each allele are normally inherited from parents. Paternal and maternal alleles can be regulated and expressed unequally, which is referred to as allele-specific expression (ASE). In this work, we present aScan, a novel method for the identification of ASE from the analysis of matched individual genomic and RNA sequencing data. By performing extensive analyses of both real and simulated data, we demonstrate that aScan can correctly identify ASE with high accuracy and sensitivity in different experimental settings. Additionally, by applying our method to a small cohort of individuals that are not included in publicly available databases of human genetic variation, we outline the value of possible applications of ASE analysis in single individuals for deriving a more accurate annotation of "private" low-frequency genetic variants associated with regulatory effects on transcription. All in all, we believe that aScan will represent a beneficial addition to the set of bioinformatics tools for the analysis of ASE. Finally, while our method was initially conceived for the analysis of RNA-seq data, it can in principle be applied to any quantitative NGS assay for which matched genotypic and expression data are available. AVAILABILITY: aScan is currently available in the form of an open source standalone software package at: https://github.com/Federico77z/aScan/. aScan version 1.0.3, available at https://github.com/Federico77z/aScan/releases/tag/1.0.3, has been used for all the analyses included in this manuscript. A Docker image of the tool has also been made available at https://github.com/pmandreoli/aScanDocker.
|
14
|
Genome-wide analysis of spatiotemporal allele-specific expression in F1 hybrids of meat- and egg-type chickens. Gene 2020; 747:144671. [PMID: 32304782 DOI: 10.1016/j.gene.2020.144671] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/18/2020] [Revised: 04/04/2020] [Accepted: 04/12/2020] [Indexed: 12/20/2022]
Abstract
In diploid organisms, each gene locus is composed of two parental alleles, which interact with each other to determine phenotypic variation. A better understanding of allele-specific expression (ASE) in farm animals is important for exploring the genetic basis of economically important traits, which remains poorly understood. In this study, genome-wide analysis was applied to explore the spatiotemporal pattern of ASE in F1 hybrid chickens. First, meat- and egg-type chickens were selected to produce a full-sib F1 hybrid population (n = 57). Then, genome resequencing of two parents and 38 offspring was performed, and liver and breast muscle samples (n = 38) were subjected to strand-specific RNA sequencing (ssRNA-seq) for ASE detection at 1, 28, and 56 days of age. The results identified a total of 465 informative genes that could be distinguished with respect to their parental origins. Between 0.4% and 4.1% of informative genes showed ASE, and 57 of them were found across different tissues and time points. In addition, most ASE genes in chickens were tissue-specific, and regardless of the time-point pattern of an ASE gene, the same parental allele of that gene showed almost consistently higher or lower expression across all time points in the same tissue type. In conclusion, this study indicated that most ASE genes were tissue-specific and time-dependent.
|
15
|
Frochaux MV, Bou Sleiman M, Gardeux V, Dainese R, Hollis B, Litovchenko M, Braman VS, Andreani T, Osman D, Deplancke B. cis-regulatory variation modulates susceptibility to enteric infection in the Drosophila genetic reference panel. Genome Biol 2020; 21:6. [PMID: 31948474 PMCID: PMC6966807 DOI: 10.1186/s13059-019-1912-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/15/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Resistance to enteric pathogens is a complex trait at the crossroads of multiple biological processes. We have previously shown in the Drosophila Genetic Reference Panel (DGRP) that resistance to infection is highly heritable, but our understanding of how the effects of genetic variants affect different molecular mechanisms to determine gut immunocompetence is still limited. RESULTS To address this, we perform a systems genetics analysis of the gut transcriptomes from 38 DGRP lines that were orally infected with Pseudomonas entomophila. We identify a large number of condition-specific, expression quantitative trait loci (local-eQTLs) with infection-specific ones located in regions enriched for FOX transcription factor motifs. By assessing the allelic imbalance in the transcriptomes of 19 F1 hybrid lines from a large round robin design, we independently attribute a robust cis-regulatory effect to only 10% of these detected local-eQTLs. However, additional analyses indicate that many local-eQTLs may act in trans instead. Comparison of the transcriptomes of DGRP lines that were either susceptible or resistant to Pseudomonas entomophila infection reveals nutcracker as the only differentially expressed gene. Interestingly, we find that nutcracker is linked to infection-specific eQTLs that correlate with its expression level and to enteric infection susceptibility. Further regulatory analysis reveals one particular eQTL that significantly decreases the binding affinity for the repressor Broad, driving differential allele-specific nutcracker expression. CONCLUSIONS Our collective findings point to a large number of infection-specific cis- and trans-acting eQTLs in the DGRP, including one common non-coding variant that lowers enteric infection susceptibility.
Affiliation(s)
- Michael V. Frochaux
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Maroun Bou Sleiman
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Current Address: Laboratory of Integrative Systems Physiology, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Vincent Gardeux
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Riccardo Dainese
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Brian Hollis
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Current Address: Department of Biological Sciences, University of South Carolina, Columbia, South Carolina USA
- Maria Litovchenko
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Virginie S. Braman
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Tommaso Andreani
- Computational Biology and Data Mining Group, Institute of Molecular Biology, Johannes Gutenberg-Universität Mainz, Mainz, Germany
- Dani Osman
- Faculty of Sciences III and Azm Center for Research in Biotechnology and its Applications, LBA3B, EDST, Lebanese University, Tripoli, 1300 Lebanon
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
|
16
|
Wang Q, Jia Y, Wang Y, Jiang Z, Zhou X, Zhang Z, Nie C, Li J, Yang N, Qu L. Evolution of cis- and trans-regulatory divergence in the chicken genome between two contrasting breeds analyzed using three tissue types at one-day-old. BMC Genomics 2019; 20:933. [PMID: 31805870 PMCID: PMC6896592 DOI: 10.1186/s12864-019-6342-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/11/2019] [Accepted: 11/27/2019] [Indexed: 11/10/2022] Open
Abstract
Background Gene expression variation is a key underlying factor influencing phenotypic variation, and can occur via cis- or trans-regulation. To understand the role of cis- and trans-regulatory variation on population divergence in chicken, we developed reciprocal crosses of two chicken breeds, White Leghorn and Cornish Game, which exhibit major differences in body size and reproductive traits, and used them to determine the degree of cis versus trans variation in the brain, liver, and muscle tissue of male and female 1-day-old specimens. Results We provided an overview of how transcriptomes are regulated in hybrid progenies of two contrasting breeds based on allele specific expression analysis. Compared with cis-regulatory divergence, trans-acting genes were more extensive in the chicken genome. In addition, considerable compensatory cis- and trans-regulatory changes exist in the chicken genome. Most importantly, stronger purifying selection was observed on genes regulated by trans-variations than in genes regulated by the cis elements. Conclusions We present a pipeline to explore allele-specific expression in hybrid progenies of inbred lines without a specific reference genome. Our research is the first study to describe the regulatory divergence between two contrasting breeds. The results suggest that artificial selection associated with domestication in chicken could have acted more on trans-regulatory divergence than on cis-regulatory divergence.
Affiliation(s)
- Qiong Wang
- State Key Laboratory of Animal Nutrition, Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China; Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture and Rural, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, China
- Yaxiong Jia
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Yuan Wang
- Department of Animal Science and Technology, Qingdao Agricultural University, Qingdao, China
- Zhihua Jiang
- Department of Animal Sciences, Center for Reproductive Biology, Veterinary and Biomedical Research Building, Washington State University, Pullman, USA
- Xiang Zhou
- College of Animal Sciences and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Zebin Zhang
- State Key Laboratory of Animal Nutrition, Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Changsheng Nie
- State Key Laboratory of Animal Nutrition, Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Junying Li
- State Key Laboratory of Animal Nutrition, Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Ning Yang
- State Key Laboratory of Animal Nutrition, Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Lujiang Qu
- State Key Laboratory of Animal Nutrition, Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China.
|
17
|
Lee C, Kang EY, Gandal MJ, Eskin E, Geschwind DH. Profiling allele-specific gene expression in brains from individuals with autism spectrum disorder reveals preferential minor allele usage. Nat Neurosci 2019; 22:1521-1532. [PMID: 31455884 PMCID: PMC6750256 DOI: 10.1038/s41593-019-0461-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/12/2017] [Accepted: 07/09/2019] [Indexed: 12/21/2022]
Abstract
One fundamental but understudied mechanism of gene regulation in disease is allele-specific expression (ASE), the preferential expression of one allele. We leveraged RNA-sequencing data from human brain to assess ASE in autism spectrum disorder (ASD). When ASE is observed in ASD, the allele with lower population frequency (minor allele) is preferentially more highly expressed than the major allele, opposite to the canonical pattern. Importantly, genes showing ASE in ASD are enriched in those downregulated in ASD postmortem brains and in genes harboring de novo mutations in ASD. Two regions, 14q32 and 15q11, containing all known orphan C/D box small nucleolar RNAs (snoRNAs), are particularly enriched in shifts to higher minor allele expression. We demonstrate that this allele shifting enhances snoRNA-targeted splicing changes in ASD-related target genes in idiopathic ASD and 15q11-q13 duplication syndrome. Together, these results implicate allelic imbalance and dysregulation of orphan C/D box snoRNAs in ASD pathogenesis.
Affiliation(s)
- Changhoon Lee
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Neuroscience, Peter O'Donnell Jr. Brain Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Eun Yong Kang
- Department of Computer Science, Henry Samueli School of Engineering, University of California, Los Angeles, Los Angeles, CA, USA
- Michael J Gandal
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Center for Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Eleazar Eskin
- Department of Computer Science, Henry Samueli School of Engineering, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Daniel H Geschwind
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Center for Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
|
18
|
Miao Z, Alvarez M, Pajukanta P, Ko A. ASElux: an ultra-fast and accurate allelic reads counter. Bioinformatics 2019; 34:1313-1320. [PMID: 29186329 DOI: 10.1093/bioinformatics/btx762] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/17/2017] [Accepted: 11/22/2017] [Indexed: 11/12/2022] Open
Abstract
Motivation Mapping bias causes preferential alignment to the reference allele, forming a major obstacle in allele-specific expression (ASE) analysis. The existing methods, such as simulation and SNP-aware alignment, are either inaccurate or relatively slow. To count allelic reads for ASE analysis quickly and accurately, we developed a novel approach, ASElux, which utilizes personal SNP information and counts allelic reads directly from unmapped RNA-sequence (RNA-seq) data. ASElux significantly reduces runtime by disregarding reads outside single nucleotide polymorphisms (SNPs) during the alignment. Results When compared to other tools on simulated and experimental data, ASElux achieves higher accuracy in ASE estimation than non-SNP-aware aligners and requires much less time than the benchmark SNP-aware aligner, GSNAP, with just a slight loss in performance. ASElux can process 40 million read-pairs from an RNA-seq sample and count allelic reads within 10 min, which is comparable to directly counting the allelic reads from alignments based on other tools. Furthermore, processing an RNA-seq sample using ASElux in conjunction with a general aligner, such as STAR, is more accurate and still ∼4× faster than STAR + WASP, and ∼33× faster than the lead SNP-aware aligner, GSNAP, making ASElux ideal for ASE analysis of large-scale transcriptomic studies. We applied ASElux to 273 lung RNA-seq samples from GTEx and identified a splice-QTL rs11078928 in lung which explains the mechanism underlying an asthma GWAS SNP rs11078927. Thus, our analysis demonstrated ASE as a highly powerful complementary tool to cis-expression quantitative trait locus (eQTL) analysis. Availability and implementation The software can be downloaded from https://github.com/abl0719/ASElux. Contact zmiao@ucla.edu or a5ko@ucla.edu. Supplementary information Supplementary data are available at Bioinformatics online.
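To make the notions of allelic imbalance and reference mapping bias concrete, here is a minimal sketch of a common downstream check on allelic read counts: an exact two-sided binomial test against an expected reference-allele fraction. This is not ASElux's algorithm (ASElux produces the counts); the function name, the `p_ref` parameter, and the example counts are assumptions for illustration only.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one SNP.

    `p_ref` is the expected fraction of reference-allele reads under
    balanced expression. Setting it slightly above 0.5 is one simple way
    to model residual reference mapping bias (an assumed convention).
    Intended for modest read depths; very deep sites would need a
    log-space or library implementation.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the outcomes that are no more likely than the one observed.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts at a single heterozygous SNP.
    print(allelic_imbalance_p(70, 30))
    # Same counts, but allowing for a mild bias toward the reference allele.
    print(allelic_imbalance_p(70, 30, p_ref=0.55))
```

Comparing the two calls shows why uncorrected mapping bias matters: the same counts look less extreme once some excess of reference reads is expected under the null.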
Affiliation(s)
- Zong Miao
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90024, USA; Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA 90024, USA
- Marcus Alvarez
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90024, USA
- Päivi Pajukanta
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90024, USA; Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA 90024, USA; Molecular Biology Institute, UCLA, Los Angeles, CA 90024, USA
- Arthur Ko
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90024, USA; Molecular Biology Institute, UCLA, Los Angeles, CA 90024, USA
|
19
|
Zhao C, Xie S, Wu H, Luan Y, Hu S, Ni J, Lin R, Zhao S, Zhang D, Li X. Quantification of allelic differential expression using a simple Fluorescence primer PCR-RFLP-based method. Sci Rep 2019; 9:6334. [PMID: 31004110 PMCID: PMC6474871 DOI: 10.1038/s41598-019-42815-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2018] [Accepted: 03/29/2019] [Indexed: 12/04/2022] Open
Abstract
Allelic differential expression (ADE) is common in diploid organisms, and is often the key reason for specific phenotype variations. Thus, ADE detection is important for identification of major genes and causal mutations. To date, sensitive and simple methods to detect ADE are still lacking. In this study, we have developed an accurate, simple, and sensitive method, named fluorescence primer PCR-RFLP quantitative method (fPCR-RFLP), for ADE analysis. This method involves two rounds of PCR amplification using a pair of primers, one of which is double-labeled with an overhang 6-FAM. The two alleles are then separated by RFLP and quantified by fluorescence density. fPCR-RFLP could precisely distinguish ADE across a range of 1- to 32-fold differences. Using this method, we verified that PLAG1 and KIT, two candidate genes related to growth rate and immune response traits of pigs, show ADE both at different developmental stages and in different tissues. Our data demonstrate that fPCR-RFLP is an accurate and sensitive method for detecting ADE at both the DNA and RNA levels. Therefore, this powerful tool provides a way to analyze mutations that cause ADE.
Affiliation(s)
- Changzhi Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Shengsong Xie
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Hui Wu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Yu Luan
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Suqin Hu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Juan Ni
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Ruiyi Lin
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Dingxiao Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
|
20
|
Liu Z, Dong X, Li Y. A Genome-Wide Study of Allele-Specific Expression in Colorectal Cancer. Front Genet 2018; 9:570. [PMID: 30538721 PMCID: PMC6277598 DOI: 10.3389/fgene.2018.00570] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 07/19/2018] [Accepted: 11/06/2018] [Indexed: 12/30/2022] Open
Abstract
Accumulating evidence from small-scale studies has suggested that allele-specific expression (ASE) plays an important role in tumor initiation and progression. However, little is known about genome-wide ASE in tumors. In this study, we conducted a comprehensive analysis of ASE in individuals with colorectal cancer (CRC) on a genome-wide scale. We identified 5.4 thousand genome-wide ASEs of single nucleotide variations (SNVs) from tumor and normal tissues of 59 individuals with CRC. We observed an increased ASE level in tumor samples, and the ASEs were enriched as hotspots on the genome. Around 63% of the genes located there were previously reported to contain complex regulatory elements, e.g., human leukocyte antigen (HLA), or were implicated in tumor progression. Focussing on the allelic expression of somatic mutations, we found that 37.5% of them exhibited ASE, and genes harboring such somatic mutations were enriched in important pathways implicated in cancers. In addition, by comparing the expected and observed ASE events in tumor samples, we identified 50 tumor-specific ASEs which possibly contributed to the somatic events in the regulatory regions of the genes and were significantly enriched in known cancer driver genes. By analyzing CRC ASEs from several perspectives, we provided a systematic understanding of how ASE is implicated in both tumor and normal tissues, which will be of critical value in guiding ASE studies in cancer.
Affiliation(s)
- Zhi Liu
- Department of Epidemiology and Biostatistics, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Xiao Dong
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, United States
- Yixue Li
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Bioinformation Technology, Shanghai Industrial Technology Institute, Shanghai, China; Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
|
21
|
Lin CY, Chang KW, Lin CY, Wu JY, Coon H, Huang PH, Ho HN, Akbarian S, Gau SSF, Huang HS. Allele-specific expression in a family quartet with autism reveals mono-to-biallelic switch and novel transcriptional processes of autism susceptibility genes. Sci Rep 2018; 8:4277. [PMID: 29523860 PMCID: PMC5844893 DOI: 10.1038/s41598-018-22753-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/14/2017] [Accepted: 02/28/2018] [Indexed: 02/07/2023] Open
Abstract
Autism spectrum disorder (ASD) is a highly prevalent neurodevelopmental disorder, and the exact causal mechanism is unknown. Dysregulated allele-specific expression (ASE) has been identified in persons with ASD; however, a comprehensive analysis of ASE has not been conducted in a family quartet with ASD. To fill this gap, we analyzed ASE using genomic DNA from parent and offspring and RNA from offspring's postmortem prefrontal cortex (PFC); one of the two offspring had been diagnosed with ASD. DNA- and RNA-sequencing revealed distinct ASE patterns from the PFC of both offspring. However, only the PFC of the offspring with ASD exhibited a mono-to-biallelic switch for LRP2BP and ZNF407. We also identified a novel site of RNA-editing in KMT2C in addition to new monoallelically-expressed genes and miRNAs. Our results demonstrate the prevalence of ASE in human PFC and ASE abnormalities in the PFC of a person with ASD. Taken together, these findings may provide mechanistic insights into the pathogenesis of ASD.
Affiliation(s)
- Chun-Yen Lin
- Graduate Institute of Brain and Mind Sciences, College of Medicine, National Taiwan University, Taipei, 10051, Taiwan
- Department of Pediatrics, Yong-He Cardinal Tien Hospital, Taipei, Taiwan
- Kai-Wei Chang
- Graduate Institute of Brain and Mind Sciences, College of Medicine, National Taiwan University, Taipei, 10051, Taiwan
- Chia-Yi Lin
- Graduate Institute of Brain and Mind Sciences, College of Medicine, National Taiwan University, Taipei, 10051, Taiwan
- Jia-Ying Wu
- Graduate Institute of Brain and Mind Sciences, College of Medicine, National Taiwan University, Taipei, 10051, Taiwan
- Hilary Coon
- Department of Psychiatry, University of Utah School of Medicine, Salt Lake City, UT, 84108, USA
- Pei-Hsin Huang
- Department of Pathology, National Taiwan University Hospital and College of Medicine, National Taiwan University, Taipei, 10051, Taiwan
- Hong-Nerng Ho
- Department of Obstetrics and Gynecology, National Taiwan University Hospital and College of Medicine, National Taiwan University, Taipei, 10051, Taiwan
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, 10051, Taiwan
- Schahram Akbarian
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, NY, 10029, USA
| | - Susan Shur-Fen Gau
- Graduate Institute of Brain and Mind Sciences, College of Medicine, National Taiwan University, Taipei, 10051, Taiwan
- Department of Psychiatry, National Taiwan University Hospital and College of Medicine, National Taiwan University, Taipei, 10051, Taiwan
| | - Hsien-Sung Huang
- Graduate Institute of Brain and Mind Sciences, College of Medicine, National Taiwan University, Taipei, 10051, Taiwan.
- Neurodevelopment Club in Taiwan, Taipei, 10051, Taiwan.
|
22
|
Van Baak TE, Coarfa C, Dugué PA, Fiorito G, Laritsky E, Baker MS, Kessler NJ, Dong J, Duryea JD, Silver MJ, Saffari A, Prentice AM, Moore SE, Ghantous A, Routledge MN, Gong YY, Herceg Z, Vineis P, Severi G, Hopper JL, Southey MC, Giles GG, Milne RL, Waterland RA. Epigenetic supersimilarity of monozygotic twin pairs. Genome Biol 2018; 19:2. [PMID: 29310692 PMCID: PMC5759268 DOI: 10.1186/s13059-017-1374-0] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 10/13/2017] [Accepted: 12/06/2017] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Monozygotic twins have long been studied to estimate heritability and explore epigenetic influences on phenotypic variation. The phenotypic and epigenetic similarities of monozygotic twins have been assumed to be largely due to their genetic identity. RESULTS Here, by analyzing data from a genome-scale study of DNA methylation in monozygotic and dizygotic twins, we identified genomic regions at which the epigenetic similarity of monozygotic twins is substantially greater than can be explained by their genetic identity. This "epigenetic supersimilarity" apparently results from locus-specific establishment of epigenotype prior to embryo cleavage during twinning. Epigenetically supersimilar loci exhibit systemic interindividual epigenetic variation and plasticity to periconceptional environment and are enriched in sub-telomeric regions. In case-control studies nested in a prospective cohort, blood DNA methylation at these loci years before diagnosis is associated with risk of developing several types of cancer. CONCLUSIONS These results establish a link between early embryonic epigenetic development and adult disease. More broadly, epigenetic supersimilarity is a previously unrecognized phenomenon that may contribute to the phenotypic similarity of monozygotic twins.
Affiliation(s)
- Timothy E Van Baak
- USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Cristian Coarfa
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
| | - Pierre-Antoine Dugué
- Cancer Epidemiology and Intelligence Division, Cancer Council Victoria, Melbourne, VIC, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School for Global and Population Health, University of Melbourne, Melbourne, VIC, Australia
| | - Giovanni Fiorito
- Department of Medical Sciences, University of Torino and Italian Institute for Genomic Medicine, Torino, Italy
| | - Eleonora Laritsky
- USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Maria S Baker
- USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Noah J Kessler
- MRC Unit The Gambia, Keneba, Gambia
- MRC International Nutrition Group at LSHTM, London, UK
| | - Jianrong Dong
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
| | - Jack D Duryea
- USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Matt J Silver
- MRC Unit The Gambia, Keneba, Gambia
- MRC International Nutrition Group at LSHTM, London, UK
| | - Ayden Saffari
- MRC Unit The Gambia, Keneba, Gambia
- MRC International Nutrition Group at LSHTM, London, UK
| | - Andrew M Prentice
- MRC Unit The Gambia, Keneba, Gambia
- MRC International Nutrition Group at LSHTM, London, UK
| | - Sophie E Moore
- MRC Unit The Gambia, Keneba, Gambia
- Division of Women's Health, King's College London, London, UK
| | - Akram Ghantous
- Epigenetics Group, International Agency for Research on Cancer, Lyon, France
| | - Yun Yun Gong
- School of Food Science & Nutrition, University of Leeds, Leeds, UK
| | - Zdenko Herceg
- Epigenetics Group, International Agency for Research on Cancer, Lyon, France
| | - Paolo Vineis
- MRC-PHE Center for Environment and Health, School of Public Health, Imperial College London, London, UK
- Italian Institute for Genomic Medicine, Torino, Italy
| | - Gianluca Severi
- Centre for Epidemiology and Biostatistics, Melbourne School for Global and Population Health, University of Melbourne, Melbourne, VIC, Australia
- Italian Institute for Genomic Medicine, Torino, Italy
- CESP Inserm, Facultés de medicine Université Paris-Sud, Paris, France
| | - John L Hopper
- Centre for Epidemiology and Biostatistics, Melbourne School for Global and Population Health, University of Melbourne, Melbourne, VIC, Australia
| | - Melissa C Southey
- Genetic Epidemiology Laboratory, Department of Pathology, University of Melbourne, Melbourne, Victoria, Australia
| | - Graham G Giles
- Cancer Epidemiology and Intelligence Division, Cancer Council Victoria, Melbourne, VIC, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School for Global and Population Health, University of Melbourne, Melbourne, VIC, Australia
| | - Roger L Milne
- Cancer Epidemiology and Intelligence Division, Cancer Council Victoria, Melbourne, VIC, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School for Global and Population Health, University of Melbourne, Melbourne, VIC, Australia
| | - Robert A Waterland
- USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
|
23
|
Hauser AS, Chavali S, Masuho I, Jahn LJ, Martemyanov KA, Gloriam DE, Babu MM. Pharmacogenomics of GPCR Drug Targets. Cell 2017; 172:41-54.e19. [PMID: 29249361 PMCID: PMC5766829 DOI: 10.1016/j.cell.2017.11.033] [Citation(s) in RCA: 392] [Impact Index Per Article: 56.0] [Received: 05/11/2017] [Revised: 09/11/2017] [Accepted: 11/16/2017] [Indexed: 12/14/2022]
Abstract
Natural genetic variation in the human genome is a cause of individual differences in responses to medications and is an underappreciated burden on public health. Although 108 G-protein-coupled receptors (GPCRs) are the targets of 475 (∼34%) Food and Drug Administration (FDA)-approved drugs and account for a global sales volume of over 180 billion US dollars annually, the prevalence of genetic variation among GPCRs targeted by drugs is unknown. By analyzing data from 68,496 individuals, we find that GPCRs targeted by drugs show genetic variation within functional regions such as drug- and effector-binding sites in the human population. We experimentally show that certain variants of μ-opioid and Cholecystokinin-A receptors could lead to altered or adverse drug response. By analyzing UK National Health Service drug prescription and sales data, we suggest that characterizing GPCR variants could increase prescription precision, improving patients’ quality of life, and relieve the economic and societal burden due to variable drug responsiveness.
- GPCRs targeted by FDA-approved drugs show genetic variation in the human population
- Genetic variation occurs in functional sites and may result in altered drug response
- We present an online resource of GPCR genetic variants for pharmacogenomics research
- Understanding variation in drug targets may help alleviate economic healthcare burden
Affiliation(s)
- Alexander S Hauser
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark.
| | - Sreenivas Chavali
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Ikuo Masuho
- Department of Neuroscience, The Scripps Research Institute Florida, Jupiter, FL 33458, USA
| | - Leonie J Jahn
- The Novo Nordisk Foundation Center for Biosustainability, Technical University Denmark, Kemitorvet 2800 Kgs. Lyngby, Denmark
| | - Kirill A Martemyanov
- Department of Neuroscience, The Scripps Research Institute Florida, Jupiter, FL 33458, USA
| | - David E Gloriam
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
| | - M Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK.
|
24
|
Stranger BE, Brigham LE, Hasz R, Hunter M, Johns C, Johnson M, Kopen G, Leinweber WF, Lonsdale JT, McDonald A, Mestichelli B, Myer K, Roe B, Salvatore M, Shad S, Thomas JA, Walters G, Washington M, Wheeler J, Bridge J, Foster BA, Gillard BM, Karasik E, Kumar R, Miklos M, Moser MT, Jewell SD, Montroy RG, Rohrer DC, Valley D, Davis DA, Mash DC, Gould SE, Guan P, Koester S, Little AR, Martin C, Moore HM, Rao A, Struewing JP, Volpi S, Hansen KD, Hickey PF, Rizzardi LF, Hou L, Liu Y, Molinie B, Park Y, Rinaldi N, Wang LB, Van Wittenberghe N, Claussnitzer M, Gelfand ET, Li Q, Linder S, Smith KS, Tsang EK, Demanelis K, Doherty JA, Jasmine F, Kibriya MG, Jiang L, Lin S, Wang M, Jian R, Li X, Chan J, Bates D, Diegel M, Halow J, Haugen E, Johnson A, Kaul R, Lee K, Maurano MT, Nelson J, Neri FJ, Sandstrom R, Fernando MS, Linke C, Oliva M, Skol A, Wu F, Akey JM, Feinberg AP, Li JB, Pierce BL, Stamatoyannopoulos JA, Tang H, Ardlie KG, Kellis M, Snyder MP, Montgomery SB. Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nat Genet 2017; 49:1664-1670. [PMID: 29019975 PMCID: PMC6636856 DOI: 10.1038/ng.3969] [Citation(s) in RCA: 133] [Impact Index Per Article: 19.0] [Indexed: 02/07/2023]
Abstract
Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.
Affiliation(s)
- Barbara E. Stranger
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, USA
- Center for Data Intensive Science, The University of Chicago, Chicago, IL 60637, USA
| | - Lori E. Brigham
- Washington Regional Transplant Community, Annandale, VA 22003, USA
| | - Richard Hasz
- Gift of Life Donor Program, Philadelphia, PA 19103, USA
| | - Gene Kopen
- National Disease Research Interchange, Philadelphia, PA 19103, USA
| | - John T. Lonsdale
- National Disease Research Interchange, Philadelphia, PA 19103, USA
| | - Alisa McDonald
- National Disease Research Interchange, Philadelphia, PA 19103, USA
| | - Saboor Shad
- National Disease Research Interchange, Philadelphia, PA 19103, USA
| | - Joseph Wheeler
- Center for Organ Recovery and Education, Pittsburgh, PA 15238, USA
| | - Barbara A. Foster
- Pharmacology and Therapeutics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
| | - Bryan M. Gillard
- Pharmacology and Therapeutics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
| | - Ellen Karasik
- Pharmacology and Therapeutics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
| | - Rachna Kumar
- Pharmacology and Therapeutics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
| | - Mark Miklos
- Pharmacology and Therapeutics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
| | - Michael T. Moser
- Pharmacology and Therapeutics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
| | - Dana Valley
- Van Andel Research Institute, Grand Rapids, MI 49503, USA
| | - David A. Davis
- Brain Endowment Bank, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| | - Deborah C. Mash
- Brain Endowment Bank, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| | - Sarah E. Gould
- Division of Genomic Medicine, National Human Genome Research Institute, Rockville, MD 20852, USA
| | - Ping Guan
- Biorepositories and Biospecimen Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD 20892, USA
| | - Susan Koester
- Division of Neuroscience and Basic Behavioral Science, National Institute of Mental Health, NIH, Bethesda, MD 20892, USA
| | - A. Roger Little
- National Institute on Drug Abuse, NIH, Bethesda, MD 20892, USA
| | - Casey Martin
- Division of Genomic Medicine, National Human Genome Research Institute, Rockville, MD 20852, USA
| | - Helen M. Moore
- Biorepositories and Biospecimen Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD 20892, USA
| | - Abhi Rao
- Biorepositories and Biospecimen Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD 20892, USA
| | - Jeffery P. Struewing
- Division of Genomic Medicine, National Human Genome Research Institute, Rockville, MD 20852, USA
| | - Simona Volpi
- Division of Genomic Medicine, National Human Genome Research Institute, Rockville, MD 20852, USA
| | - Kasper D. Hansen
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Peter F. Hickey
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Lindsay F. Rizzardi
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Lei Hou
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Yaping Liu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Benoit Molinie
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Yongjin Park
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Nicola Rinaldi
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Li B. Wang
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Nicholas Van Wittenberghe
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Melina Claussnitzer
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
- Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Technical University Munich, 8350 Freising, Germany
| | - Ellen T. Gelfand
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Qin Li
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Sandra Linder
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
| | - Kevin S. Smith
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
| | - Emily K. Tsang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
- Biomedical Informatics Program, Stanford University, Stanford, CA 94305, USA
| | - Kathryn Demanelis
- Department of Public Health Sciences, The University of Chicago, Chicago, IL 60637, USA
| | - Jennifer A. Doherty
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH 03756, USA
| | - Farzana Jasmine
- Department of Public Health Sciences, The University of Chicago, Chicago, IL 60637, USA
| | - Muhammad G. Kibriya
- Department of Public Health Sciences, The University of Chicago, Chicago, IL 60637, USA
| | - Lihua Jiang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Shin Lin
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Medicine, University of Washington, Seattle, WA 98195, USA
| | - Meng Wang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Ruiqi Jian
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Xiao Li
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Joanne Chan
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Daniel Bates
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Morgan Diegel
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Jessica Halow
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Eric Haugen
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Audra Johnson
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Rajinder Kaul
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Kristen Lee
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Matthew T. Maurano
- Institute for Systems Genetics, New York University Langone Medical Center, New York, NY 10016, USA
| | - Jemma Nelson
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Fidencio J. Neri
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
| | - Marian S. Fernando
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, USA
| | - Caroline Linke
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, USA
| | - Meritxell Oliva
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, USA
| | - Andrew Skol
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, USA
- Center for Data Intensive Science, The University of Chicago, Chicago, IL 60637, USA
| | - Fan Wu
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, USA
| | - Joshua M. Akey
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Andrew P. Feinberg
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Mental Health, Johns Hopkins University School of Public Health, Baltimore, MD 21205, USA
| | - Jin Billy Li
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Brandon L. Pierce
- Department of Public Health Sciences, The University of Chicago, Chicago, IL 60637, USA
| | - Hua Tang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Kristin G. Ardlie
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Michael P. Snyder
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Stephen B. Montgomery
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
|
25
|
Mohammadi P, Castel SE, Brown AA, Lappalainen T. Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. Genome Res 2017; 27:1872-1884. [PMID: 29021289 PMCID: PMC5668944 DOI: 10.1101/gr.216747.116] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 09/30/2016] [Accepted: 06/05/2017] [Indexed: 12/11/2022]
Abstract
Mapping cis-acting expression quantitative trait loci (cis-eQTL) has become a popular approach for characterizing proximal genetic regulatory variants. In this paper, we describe and characterize log allelic fold change (aFC), the magnitude of expression change associated with a given genetic variant, as a biologically interpretable unit for quantifying the effect size of cis-eQTLs and a mathematically convenient approach for systematic modeling of cis-regulation. This measure is mathematically independent from expression level and allele frequency, additive, applicable to multiallelic variants, and generalizable to multiple independent variants. We provide efficient tools and guidelines for estimating aFC from both eQTL and allelic expression data sets and apply it to Genotype Tissue Expression (GTEx) data. We show that aFC estimates independently derived from eQTL and allelic expression data are highly consistent, and identify technical and biological correlates of eQTL effect size. We generalize aFC to analyze genes with two eQTLs in GTEx and show that in nearly all cases the two eQTLs act independently in regulating gene expression. In summary, aFC is a solid measure of cis-regulatory effect size that allows quantitative interpretation of cellular regulatory events from population data, and it is a valuable approach for investigating novel aspects of eQTL data sets.
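As a rough illustration of the quantity described in this abstract, the sketch below computes a log2 allelic fold change from allele-specific read counts at a single heterozygous site. It is not the authors' tool: the function names `log2_afc` and `combined_log2_afc`, the pseudocount, and the example counts are all assumptions made for illustration. Summing log-scale effects reflects the abstract's statement that the measure is additive and generalizes to multiple independent variants.

```python
import math
from typing import Sequence


def log2_afc(ref_count: float, alt_count: float, pseudocount: float = 1.0) -> float:
    """Log2 ratio of reads from the alternative-allele haplotype to the reference one.

    The pseudocount is an illustrative assumption that avoids division by zero.
    """
    if ref_count < 0 or alt_count < 0:
        raise ValueError("read counts must be non-negative")
    return math.log2((alt_count + pseudocount) / (ref_count + pseudocount))


def combined_log2_afc(effects: Sequence[float]) -> float:
    """Sum per-variant log2 aFC values, assuming the variants act independently."""
    return sum(effects)


# Hypothetical counts: (199 + 1) / (99 + 1) = 2, so the log2 ratio is 1.0.
single = log2_afc(ref_count=99, alt_count=199)
print(single)

# Two hypothetical independent variants on the same haplotype: 1.0 + (-0.5) = 0.5.
print(combined_log2_afc([single, -0.5]))
```

A positive value means the haplotype carrying the alternative allele is more highly expressed; a value of 0 means balanced expression.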
Affiliation(s)
- Pejman Mohammadi
- New York Genome Center, New York, New York 10013, USA
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
| | - Stephane E Castel
- New York Genome Center, New York, New York 10013, USA
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
| | - Andrew A Brown
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211, Switzerland
- Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, 1211, Switzerland
- Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland
| | - Tuuli Lappalainen
- New York Genome Center, New York, New York 10013, USA
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
|
26
|
Analysis of population-specific pharmacogenomic variants using next-generation sequencing data. Sci Rep 2017; 7:8416. [PMID: 28871186 PMCID: PMC5583360 DOI: 10.1038/s41598-017-08468-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 12/01/2016] [Accepted: 07/11/2017] [Indexed: 02/03/2023] Open
Abstract
Functional rare variants in drug-related genes are believed to be highly differentiated between ethnic or racial populations. However, knowledge of the population differentiation (PD) of rare single-nucleotide variants (SNVs) remains widely lacking, and the highest fixation indices (Fst values) from rare and common variants annotated to specific genes have been used only marginally to understand PD at the gene level. In this study, we propose a new gene-based PD method, PD of Rare and Common variants (PDRC), inspired by Generalized Cochran-Mantel-Haenszel (GCMH) statistics, for analyzing rare variants and identifying highly population-differentiated drug response-related genes (“pharmacogenes”). Through simulation studies, we show that PDRC adequately summarizes the PD of rare and common variants over a specific gene. We also applied the proposed method to a real whole-exome sequencing dataset consisting of 10,000 datasets from the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) initiative and 3,000 datasets from the Genetics of Type 2 diabetes (Go-T2D) repository. Among the 48 genes annotated with Very Important Pharmacogenetic summaries (VIPgenes) in the PharmGKB database, our PD method successfully identified candidate genes with high PD, including ACE, CYP2B6, DPYD, F5, MTHFR, and SCN5A.
|
27
|
Abstract
Whole-genome and exome sequencing in human populations has revealed the tolerance of each gene for loss-of-function variation. By understanding this tolerance, it has become increasingly possible to identify genes that would make safe therapeutic targets and to identify rare genetic risk factors and phenotypes at the scale of individual genomes. To date, the vast majority of surveyed loss-of-function variants are in protein-coding regions of the genome mainly due to the focus on these regions by exome-based sequencing projects and their relative ease of interpretability. As whole-genome sequencing becomes more prevalent, new strategies will be required to uncover impactful variation in non-coding regions of the genome where the architecture of genome function is more complex. In this review, we investigate recent studies of loss-of-function variation and emerging approaches for interpreting whole-genome sequencing data to identify rare and impactful non-coding loss-of-function variants.
Affiliation(s)
- Zachary Zappala
- Department of Genetics, Stanford University, California, USA
| | - Stephen B. Montgomery
- Department of Genetics, Stanford University, California, USA
- Department of Pathology, Stanford University, California, USA
|
28
|
Genetic Adaptation and Neandertal Admixture Shaped the Immune System of Human Populations. Cell 2016; 167:643-656.e17. [PMID: 27768888 PMCID: PMC5075285 DOI: 10.1016/j.cell.2016.09.024] [Citation(s) in RCA: 275] [Impact Index Per Article: 34.4] [Received: 04/12/2016] [Revised: 07/14/2016] [Accepted: 09/15/2016] [Indexed: 12/30/2022]
Abstract
Humans differ in the outcome that follows exposure to life-threatening pathogens, yet the extent of population differences in immune responses and their genetic and evolutionary determinants remain undefined. Here, we characterized, using RNA sequencing, the transcriptional response of primary monocytes from Africans and Europeans to bacterial and viral stimuli (ligands activating Toll-like receptor pathways [TLR1/2, TLR4, and TLR7/8] and influenza virus) and mapped expression quantitative trait loci (eQTLs). We identify numerous cis-eQTLs that contribute to the marked differences in immune responses detected within and between populations and a strong trans-eQTL hotspot at TLR1 that decreases expression of pro-inflammatory genes in Europeans only. We find that immune-responsive regulatory variants are enriched in population-specific signals of natural selection and show that admixture with Neandertals introduced regulatory variants into European genomes, preferentially affecting responses to viral challenges. Together, our study uncovers evolutionarily important determinants of differences in host immune responsiveness between human populations.
|
29
|
Moyerbrailean GA, Richards AL, Kurtz D, Kalita CA, Davis GO, Harvey CT, Alazizi A, Watza D, Sorokin Y, Hauff N, Zhou X, Wen X, Pique-Regi R, Luca F. High-throughput allele-specific expression across 250 environmental conditions. Genome Res 2016; 26:1627-1638. [PMID: 27934696 PMCID: PMC5131815 DOI: 10.1101/gr.209759.116] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 05/12/2016] [Accepted: 10/13/2016] [Indexed: 11/24/2022]
Abstract
Gene-by-environment (GxE) interactions determine common disease risk factors and biomedically relevant complex traits. However, quantifying how the environment modulates genetic effects on human quantitative phenotypes presents unique challenges. Environmental covariates are complex and difficult to measure and control at the organismal level, as found in GWAS and epidemiological studies. An alternative approach focuses on the cellular environment using in vitro treatments as a proxy for the organismal environment. These cellular environments simplify the organism-level environmental exposures to provide a tractable influence on subcellular phenotypes, such as gene expression. Expression quantitative trait loci (eQTL) mapping studies identified GxE interactions in response to drug treatment and pathogen exposure. However, eQTL mapping approaches are infeasible for large-scale analysis of multiple cellular environments. Recently, allele-specific expression (ASE) analysis emerged as a powerful tool to identify GxE interactions in gene expression patterns by exploiting naturally occurring environmental exposures. Here we characterized genetic effects on the transcriptional response to 50 treatments in five cell types. We discovered 1455 genes with ASE (FDR < 10%) and 215 genes with GxE interactions. We demonstrated a major role for GxE interactions in complex traits. Genes with a transcriptional response to environmental perturbations showed sevenfold higher odds of being found in GWAS. Additionally, 105 genes that indicated GxE interactions (49%) were identified by GWAS as associated with complex traits. Examples include the GIPR–caffeine interaction and obesity, and the LAMP3–selenium interaction and Parkinson disease. Our results demonstrate that comprehensive catalogs of GxE interactions are indispensable to thoroughly annotate genes and bridge epidemiological and genome-wide association studies.
Affiliation(s)
- Gregory A Moyerbrailean
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA
| | - Allison L Richards
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA
| | - Daniel Kurtz
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA
| | - Cynthia A Kalita
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA
| | - Gordon O Davis
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA
| | - Chris T Harvey
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA
| | - Adnan Alazizi
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA
| | - Donovan Watza
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA
| | - Yoram Sorokin
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan 48201, USA
| | - Nancy Hauff
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan 48201, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Xiaoquan Wen
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Roger Pique-Regi
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA.,Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan 48201, USA
| | - Francesca Luca
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA.,Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan 48201, USA
|
30
|
Jagannathan S, Bradley RK. Translational plasticity facilitates the accumulation of nonsense genetic variants in the human population. Genome Res 2016; 26:1639-1650. [PMID: 27646533 PMCID: PMC5131816 DOI: 10.1101/gr.205070.116] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 02/02/2016] [Accepted: 09/16/2016] [Indexed: 01/12/2023]
Abstract
Genetic variants that disrupt protein-coding DNA are ubiquitous in the human population, with about 100 such loss-of-function variants per individual. While most loss-of-function variants are rare, a subset have risen to high frequency and occur in a homozygous state in healthy individuals. It is unknown why these common variants are well tolerated, even though some affect essential genes implicated in Mendelian disease. Here, we combine genomic, proteomic, and biochemical data to demonstrate that many common nonsense variants do not ablate protein production from their host genes. We provide computational and experimental evidence for diverse mechanisms of gene rescue, including alternative splicing, stop codon readthrough, alternative translation initiation, and C-terminal truncation. Our results suggest a molecular explanation for the mild fitness costs of many common nonsense variants and indicate that translational plasticity plays a prominent role in shaping human genetic diversity.
Affiliation(s)
- Sujatha Jagannathan
- Computational Biology Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.,Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.,Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
| | - Robert K Bradley
- Computational Biology Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.,Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
|
31
|
Hodgkinson A, Grenier JC, Gbeha E, Awadalla P. A haplotype-based normalization technique for the analysis and detection of allele specific expression. BMC Bioinformatics 2016; 17:364. [PMID: 27618913 PMCID: PMC5020486 DOI: 10.1186/s12859-016-1238-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/14/2016] [Accepted: 09/02/2016] [Indexed: 12/17/2022] Open
Abstract
Background Allele specific expression (ASE) has become an important phenotype, being utilized for the detection of cis-regulatory variation, nonsense mediated decay and imprinting in the personal genome, and has been used to both identify disease loci and consider the penetrance of damaging alleles. The detection of ASE using high throughput technologies relies on aligning short-read sequencing data, a process that has inherent biases, and there is still a need to develop fast and accurate methods to detect ASE given the unprecedented growth of sequencing information in big data projects. Results Here, we present a new approach to normalize RNA sequencing data in order to call ASE events with high precision in a short time-frame. Using simulated datasets we find that our approach dramatically improves reference allele quantification at heterozygous sites versus default mapping methods and also performs well compared to existing techniques for ASE detection, such as filtering methods and mapping to parental genomes, without the need for complex and time consuming manipulation. Finally, by sequencing the exomes and transcriptomes of 96 well-phenotyped individuals of the CARTaGENE cohort, we characterise the levels of ASE across individuals and find a significant association between the proportion of sites undergoing ASE within the genome and smoking. Conclusions The correct treatment and analysis of RNA sequencing data is vital to control for mapping biases and detect genuine ASE signals. By normalising RNA sequencing information after mapping, we show that this approach can be used to identify biologically relevant signals in personal genomes. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1238-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Alan Hodgkinson
- CHU Sainte Justine Research Centre, Department of Pediatrics, Faculty of Medicine, Universite de Montreal, 3175 Chemin de la Cote Sainte Catherine, Montreal, QC, Canada.,Department of Medical and Molecular Genetics, Guy's Hospital, King's College London, London, SE1 9RT, UK.
| | - Jean-Christophe Grenier
- CHU Sainte Justine Research Centre, Department of Pediatrics, Faculty of Medicine, Universite de Montreal, 3175 Chemin de la Cote Sainte Catherine, Montreal, QC, Canada
| | - Elias Gbeha
- CHU Sainte Justine Research Centre, Department of Pediatrics, Faculty of Medicine, Universite de Montreal, 3175 Chemin de la Cote Sainte Catherine, Montreal, QC, Canada.,Ontario Institute of Cancer Research, Toronto, ON, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Philip Awadalla
- CHU Sainte Justine Research Centre, Department of Pediatrics, Faculty of Medicine, Universite de Montreal, 3175 Chemin de la Cote Sainte Catherine, Montreal, QC, Canada.,Ontario Institute of Cancer Research, Toronto, ON, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
|
32
|
Edsgärd D, Iglesias MJ, Reilly SJ, Hamsten A, Tornvall P, Odeberg J, Emanuelsson O. GeneiASE: Detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information. Sci Rep 2016; 6:21134. [PMID: 26887787 PMCID: PMC4758070 DOI: 10.1038/srep21134] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 10/08/2015] [Accepted: 01/18/2016] [Indexed: 12/20/2022] Open
Abstract
Allele-specific expression (ASE) is the imbalance in transcription between maternal and paternal alleles at a locus and can be probed in single individuals using massively parallel DNA sequencing technology. Assessing ASE within a single sample provides a static picture of the ASE, but the magnitude of ASE for a given transcript may vary between different biological conditions in an individual. Such condition-dependent ASE could indicate a genetic variation with a functional role in the phenotypic difference. We investigated ASE through RNA-sequencing of primary white blood cells from eight human individuals before and after the controlled induction of an inflammatory response, and detected condition-dependent and static ASE at 211 and 13021 variants, respectively. We developed a method, GeneiASE, to detect genes exhibiting static or condition-dependent ASE in single individuals. GeneiASE performed consistently over a range of read depths and ASE effect sizes, and did not require phasing of variants to estimate haplotypes. We observed condition-dependent ASE related to the inflammatory response in 19 genes, and static ASE in 1389 genes. Allele-specific expression was confirmed by validation of variants through real-time quantitative RT-PCR, with RNA-seq and RT-PCR ASE effect-size correlations r = 0.67 and r = 0.94 for static and condition-dependent ASE, respectively.
Affiliation(s)
- Daniel Edsgärd
- KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, SE-171 65, Solna, Sweden
| | - Maria Jesus Iglesias
- Atherosclerosis Research Unit, Department of Medicine Solna, Karolinska Institutet, Center for Molecular Medicine, and Department of Cardiology, Karolinska University Hospital, Stockholm, Sweden.,KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Proteomics, SE-171 65, Solna, Sweden
| | - Sarah-Jayne Reilly
- Atherosclerosis Research Unit, Department of Medicine Solna, Karolinska Institutet, Center for Molecular Medicine, and Department of Cardiology, Karolinska University Hospital, Stockholm, Sweden
| | - Anders Hamsten
- Atherosclerosis Research Unit, Department of Medicine Solna, Karolinska Institutet, Center for Molecular Medicine, and Department of Cardiology, Karolinska University Hospital, Stockholm, Sweden
| | - Per Tornvall
- Department of Clinical Science and Education, Södersjukhuset, Karolinska Institutet, Stockholm, Sweden
| | - Jacob Odeberg
- Atherosclerosis Research Unit, Department of Medicine Solna, Karolinska Institutet, Center for Molecular Medicine, and Department of Cardiology, Karolinska University Hospital, Stockholm, Sweden.,KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Proteomics, SE-171 65, Solna, Sweden.,Department of Medicine, Centre for Hematology, Karolinska University Hospital and Karolinska Institutet, Solna, Sweden
| | - Olof Emanuelsson
- KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, SE-171 65, Solna, Sweden
|
33
|
Schachtschneider KM, Madsen O, Park C, Rund LA, Groenen MAM, Schook LB. Adult porcine genome-wide DNA methylation patterns support pigs as a biomedical model. BMC Genomics 2015; 16:743. [PMID: 26438392 PMCID: PMC4594891 DOI: 10.1186/s12864-015-1938-x] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 06/02/2015] [Accepted: 09/19/2015] [Indexed: 12/13/2022] Open
Abstract
Background Pigs (Sus scrofa) provide relevant biomedical models to dissect complex diseases due to their anatomical, genetic, and physiological similarities with humans. Aberrant DNA methylation has been linked to many of these diseases and is associated with gene expression; however, the functional similarities and differences between porcine and human DNA methylation patterns are largely unknown. Methods DNA and RNA was isolated from eight tissue samples (fat, heart, kidney, liver, lung, lymph node, muscle, and spleen) from the adult female Duroc utilized for the pig genome sequencing project. Reduced representation bisulfite sequencing (RRBS) and RNA-seq were performed on an Illumina HiSeq2000. RRBS reads were aligned using BSseeker2, and only sites with a minimum depth of 10 reads were used for methylation analysis. RNA-seq reads were aligned using Tophat, and expression analysis was performed using Cufflinks. In addition, SNP calling was performed using GATK for targeted control and whole genome sequencing reads for CpG site validation and allelic expression analysis, respectively. Results Analysis on the influence of DNA variation in methylation calling revealed a reduced effectiveness of WGS datasets in covering CpG rich regions, as well as the usefulness of a targeted control library for SNP detection. Analysis of over 500,000 CpG sites demonstrated genome wide methylation patterns similar to those observed in humans, including reduced methylation within CpG islands and at transcription start sites (TSS), X chromosome inactivation, and anticorrelation of TSS CpG methylation with gene expression. In addition, a positive correlation between TSS CpG density and expression, and a negative correlation between TSS TpG density and expression were demonstrated. Low but non-random non-CpG methylation (<1%) was also detected in all non-neuronal somatic tissues, with differences in tissue clustering observed based on CpG and non-CpG methylation patterns. 
Finally, allele specific expression analysis revealed enrichment of genes involved in metabolic and regulatory processes. Discussion These results provide transcriptional and DNA methylation datasets for the biomedical community that are directly relatable to current genomic resources. In addition, the correlation between TSS CpG density and expression suggests increased mutation rates at CpG sites play a significant role in adaptive evolution by reducing CpG density at TSS over time, resulting in higher methylation levels in these regions and more permanent changes to lower gene expression. This is proposed to occur predominantly through deamination of 5-methylcytosine to thymidine, resulting in the replacement of CpG with TpG sites in these regions, as indicated by the increased TSS TpG density observed in non-expressed genes, resulting in a negative correlation between expression and TSS TpG density. Conclusions This study provides baseline methylation and gene transcription profiles for a healthy adult pig, reports similar patterns to those observed in humans, and supports future porcine studies related to human disease and development. Additionally, the observed reduced CpG and increased TpG density at TSS of lowly expressed genes suggests DNA methylation plays a significant role in adaptive evolution through more permanent changes to lower gene expression. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1938-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kyle M Schachtschneider
- Department of Animal Sciences, University of Illinois, Urbana, IL, USA.,Animal Breeding and Genomics Center, Wageningen University, Wageningen, The Netherlands.
| | - Ole Madsen
- Animal Breeding and Genomics Center, Wageningen University, Wageningen, The Netherlands.
| | - Chankyu Park
- Department of Animal Biotechnology, Konkuk University, Gwangjin-gu, Seoul, South Korea.
| | - Laurie A Rund
- Department of Animal Sciences, University of Illinois, Urbana, IL, USA.
| | - Martien A M Groenen
- Animal Breeding and Genomics Center, Wageningen University, Wageningen, The Netherlands.
| | - Lawrence B Schook
- Department of Animal Sciences, University of Illinois, Urbana, IL, USA.,Institute for Genomic Biology, University of Illinois, Urbana, IL, USA.,1201 W Gregory Drive #382 ERML, Urbana, IL, 61801, USA.
|
34
|
Castel SE, Levy-Moonshine A, Mohammadi P, Banks E, Lappalainen T. Tools and best practices for data processing in allelic expression analysis. Genome Biol 2015; 16:195. [PMID: 26381377 PMCID: PMC4574606 DOI: 10.1186/s13059-015-0762-6] [Citation(s) in RCA: 226] [Impact Index Per Article: 25.1] [Received: 05/29/2015] [Accepted: 08/28/2015] [Indexed: 12/25/2022] Open
Abstract
Allelic expression analysis has become important for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. We analyze the properties of allelic expression read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, and technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting such errors, show that our quality control measures improve the detection of relevant allelic expression, and introduce tools for the high-throughput production of allelic expression data from RNA-sequencing data.
Affiliation(s)
- Stephane E Castel
- New York Genome Center, New York, NY, USA.
- Department of Systems Biology, Columbia University, New York, NY, USA.
| | | | - Pejman Mohammadi
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| | | | - Tuuli Lappalainen
- New York Genome Center, New York, NY, USA.
- Department of Systems Biology, Columbia University, New York, NY, USA.
|
35
|
Pirinen M, Lappalainen T, Zaitlen NA, Dermitzakis ET, Donnelly P, McCarthy MI, Rivas MA. Assessing allele-specific expression across multiple tissues from RNA-seq read data. Bioinformatics 2015; 31:2497-504. [PMID: 25819081 PMCID: PMC4514921 DOI: 10.1093/bioinformatics/btv074] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 09/28/2014] [Revised: 01/09/2015] [Accepted: 01/29/2015] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION RNA sequencing enables allele-specific expression (ASE) studies that complement standard genotype expression studies for common variants and, importantly, also allow measuring the regulatory impact of rare variants. The Genotype-Tissue Expression (GTEx) project is collecting RNA-seq data on multiple tissues of a same set of individuals and novel methods are required for the analysis of these data. RESULTS We present a statistical method to compare different patterns of ASE across tissues and to classify genetic variants according to their impact on the tissue-wide expression profile. We focus on strong ASE effects that we are expecting to see for protein-truncating variants, but our method can also be adjusted for other types of ASE effects. We illustrate the method with a real data example on a tissue-wide expression profile of a variant causal for lipoid proteinosis, and with a simulation study to assess our method more generally.
Affiliation(s)
- Matti Pirinen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Tuuli Lappalainen
- Department of Genetic Medicine and Development and, Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, Geneva, Switzerland, Swiss Institute of Bioinformatics, Geneva, Switzerland, Department of Genetics, Stanford University, Palo Alto, CA, USA, New York Genome Center, New York, NY, USA, Department of Systems Biology, Columbia University, New York, NY, USA
| | - Noah A Zaitlen
- Department of Medicine, University of California, San Francisco, CA, USA
| | - Emmanouil T Dermitzakis
- Department of Genetic Medicine and Development and, Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, Geneva, Switzerland, Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Peter Donnelly
- Wellcome Trust Centre for Human Genetics and Department of Statistics, University of Oxford, Oxford, UK
| | - Mark I McCarthy
- Wellcome Trust Centre for Human Genetics and Oxford Centre for Diabetes, Endocrinology and Metabolism, Oxford, UK
|
36
|
Rivas MA, Pirinen M, Conrad DF, Lek M, Tsang EK, Karczewski KJ, Maller JB, Kukurba KR, DeLuca DS, Fromer M, Ferreira PG, Smith KS, Zhang R, Zhao F, Banks E, Poplin R, Ruderfer DM, Purcell SM, Tukiainen T, Minikel EV, Stenson PD, Cooper DN, Huang KH, Sullivan TJ, Nedzel J, Bustamante CD, Li JB, Daly MJ, Guigo R, Donnelly P, Ardlie K, Sammeth M, Dermitzakis ET, McCarthy MI, Montgomery SB, Lappalainen T, MacArthur DG. Human genomics. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science 2015; 348:666-9. [PMID: 25954003 PMCID: PMC4537935 DOI: 10.1126/science.1261877] [Citation(s) in RCA: 196] [Impact Index Per Article: 21.8] [Indexed: 11/02/2022]
Abstract
Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.
Affiliation(s)
- Manuel A Rivas
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK.
| | - Matti Pirinen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | | | - Monkol Lek
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Emily K Tsang
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Pathology, Stanford University, Stanford, CA, USA. Biomedical Informatics Program, Stanford University, Stanford, CA, USA
| | - Konrad J Karczewski
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Julian B Maller
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Kimberly R Kukurba
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Pathology, Stanford University, Stanford, CA, USA
| | | | - Menachem Fromer
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA. Department of Psychiatry, Mt. Sinai Hospital, NY, USA
| | - Pedro G Ferreira
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Kevin S Smith
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Pathology, Stanford University, Stanford, CA, USA
| | - Rui Zhang
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Fengmei Zhao
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Eric Banks
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ryan Poplin
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Douglas M Ruderfer
- Department of Psychiatry, Mt. Sinai Hospital, NY, USA. Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, NY, USA
| | - Shaun M Purcell
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA. Department of Psychiatry, Mt. Sinai Hospital, NY, USA. Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, NY, USA
| | - Taru Tukiainen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Eric V Minikel
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
| | | | | | - Jared Nedzel
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Jin Billy Li
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Mark J Daly
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Roderic Guigo
- Center for Genomic Regulation (CRG), Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | - Peter Donnelly
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK. Department of Statistics, University of Oxford, Oxford, UK
| | | | - Michael Sammeth
- Center for Genomic Regulation (CRG), Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. National Institute for Scientific Computing (LNCC), Petropolis, Rio de Janeiro, Brazil
| | - Emmanouil T Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Mark I McCarthy
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK. Oxford Center for Diabetes Endocrinology and Metabolism, University of Oxford, Oxford, UK
| | - Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Pathology, Stanford University, Stanford, CA, USA
| | - Tuuli Lappalainen
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland. New York Genome Center, New York, NY, USA. Department of Systems Biology, Columbia University, New York, NY, USA.
| | - Daniel G MacArthur
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA. Department of Medicine, Harvard Medical School, Boston, MA, USA.
|
37
|
Yang HJ, Ratnapriya R, Cogliati T, Kim JW, Swaroop A. Vision from next generation sequencing: multi-dimensional genome-wide analysis for producing gene regulatory networks underlying retinal development, aging and disease. Prog Retin Eye Res 2015; 46:1-30. [PMID: 25668385 PMCID: PMC4402139 DOI: 10.1016/j.preteyeres.2015.01.005] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 11/11/2014] [Revised: 01/18/2015] [Accepted: 01/21/2015] [Indexed: 01/10/2023]
Abstract
Genomics and genetics have invaded all aspects of biology and medicine, opening uncharted territory for scientific exploration. The definition of "gene" itself has become ambiguous, and the central dogma is continuously being revised and expanded. Computational biology and computational medicine are no longer intellectual domains of the chosen few. Next generation sequencing (NGS) technology, together with novel methods of pattern recognition and network analyses, has revolutionized the way we think about fundamental biological mechanisms and cellular pathways. In this review, we discuss NGS-based genome-wide approaches that can provide deeper insights into retinal development, aging and disease pathogenesis. We first focus on gene regulatory networks (GRNs) that govern the differentiation of retinal photoreceptors and modulate adaptive response during aging. Then, we discuss NGS technology in the context of retinal disease and develop a vision for therapies based on network biology. We should emphasize that basic strategies for network construction and analyses can be transported to any tissue or cell type. We believe that specific and uniform guidelines are required for generation of genome, transcriptome and epigenome data to facilitate comparative analysis and integration of multi-dimensional data sets, and for constructing networks underlying complex biological processes. As cellular homeostasis and organismal survival are dependent on gene-gene and gene-environment interactions, we believe that network-based biology will provide the foundation for deciphering disease mechanisms and discovering novel drug targets for retinal neurodegenerative diseases.
Affiliation(s)
- Hyun-Jin Yang
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD 20892-0610, USA
| | - Rinki Ratnapriya
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD 20892-0610, USA
| | - Tiziana Cogliati
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD 20892-0610, USA
| | - Jung-Woong Kim
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD 20892-0610, USA
| | - Anand Swaroop
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD 20892-0610, USA.
|
38
|
Codina-Solà M, Rodríguez-Santiago B, Homs A, Santoyo J, Rigau M, Aznar-Laín G, Del Campo M, Gener B, Gabau E, Botella MP, Gutiérrez-Arumí A, Antiñolo G, Pérez-Jurado LA, Cuscó I. Integrated analysis of whole-exome sequencing and transcriptome profiling in males with autism spectrum disorders. Mol Autism 2015; 6:21. [PMID: 25969726 PMCID: PMC4427998 DOI: 10.1186/s13229-015-0017-0] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 12/04/2014] [Accepted: 03/19/2015] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders with high heritability. Recent findings support a highly heterogeneous and complex genetic etiology including rare de novo and inherited mutations or chromosomal rearrangements as well as double or multiple hits. METHODS We performed whole-exome sequencing (WES) and blood cell transcriptome by RNAseq in a subset of male patients with idiopathic ASD (n = 36) in order to identify causative genes, transcriptomic alterations, and susceptibility variants. RESULTS We detected likely monogenic causes in seven cases: five de novo (SCN2A, MED13L, KCNV1, CUL3, and PTEN) and two inherited X-linked variants (MAOA and CDKL5). Transcriptomic analyses allowed the identification of intronic causative mutations missed by the usual filtering of WES and revealed functional consequences of some rare mutations. These included aberrant transcripts (PTEN, POLR3C), deregulated expression in 1.7% of mutated genes (that is, SEMA6B, MECP2, ANK3, CREBBP), allele-specific expression (FUS, MTOR, TAF1C), and non-sense-mediated decay (RIT1, ALG9). The analysis of rare inherited variants showed enrichment in relevant pathways such as the PI3K-Akt signaling and the axon guidance. CONCLUSIONS Integrative analysis of WES and blood RNAseq data has proven to be an efficient strategy to identify likely monogenic forms of ASD (19% in our cohort), as well as additional rare inherited mutations that can contribute to ASD risk in a multifactorial manner. Blood transcriptomic data, besides validating 88% of expressed variants, allowed the identification of missed intronic mutations and revealed functional correlations of genetic variants, including changes in splicing, expression levels, and allelic expression.
Affiliation(s)
- Marta Codina-Solà
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Doctor Aiguader 88, 422, Barcelona, 08003 Spain; Hospital del Mar Research Institute (IMIM), C/Doctor Aiguader 88, Barcelona, 08003 Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER-ER), C/ Monforte de Lemos 3-5, Madrid, 28029 Spain
- Aïda Homs
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Doctor Aiguader 88, 422, Barcelona, 08003 Spain; Hospital del Mar Research Institute (IMIM), C/Doctor Aiguader 88, Barcelona, 08003 Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER-ER), C/ Monforte de Lemos 3-5, Madrid, 28029 Spain
- Javier Santoyo
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), C/Albert Einstein, Cartuja Scientific and Technology Park, INSUR Building, Sevilla, 41092 Spain
- Maria Rigau
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Doctor Aiguader 88, 422, Barcelona, 08003 Spain
- Gemma Aznar-Laín
- Pediatric Neurology, Hospital del Mar, Passeig Marítim 25-29, Barcelona, 08003 Spain
- Miguel Del Campo
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Doctor Aiguader 88, 422, Barcelona, 08003 Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER-ER), C/ Monforte de Lemos 3-5, Madrid, 28029 Spain; Servicio de Genética, Hospital Vall d'Hebron, Passeig Vall d'Hebron, 119-129, Barcelona, 08015 Spain
- Blanca Gener
- Genetics Service, BioCruces Health Research Institute, Hospital Universitario Cruces, Plaza de Cruces 12, Barakaldo, Bizkaia 48093 Spain
- Elisabeth Gabau
- Pediatrics Service, Corporació Sanitària Parc Taulí, Parc Taulí 1, Sabadell, 08208 Spain
- María Pilar Botella
- Pediatric Neurology, Hospital de Txagorritxu, C/José de Atxotegui s/n, Vitoria-Gasteiz, 01009 Spain
- Armand Gutiérrez-Arumí
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Doctor Aiguader 88, 422, Barcelona, 08003 Spain; Hospital del Mar Research Institute (IMIM), C/Doctor Aiguader 88, Barcelona, 08003 Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER-ER), C/ Monforte de Lemos 3-5, Madrid, 28029 Spain
- Guillermo Antiñolo
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER-ER), C/ Monforte de Lemos 3-5, Madrid, 28029 Spain; Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), C/Albert Einstein, Cartuja Scientific and Technology Park, INSUR Building, Sevilla, 41092 Spain; Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville (IBIS), University Hospital Virgen del Rocío/CSIC/University of Seville, Avda Manuel Siurot s/n, Sevilla, 41013 Spain
- Luis Alberto Pérez-Jurado
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Doctor Aiguader 88, 422, Barcelona, 08003 Spain; Hospital del Mar Research Institute (IMIM), C/Doctor Aiguader 88, Barcelona, 08003 Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER-ER), C/ Monforte de Lemos 3-5, Madrid, 28029 Spain
- Ivon Cuscó
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Doctor Aiguader 88, 422, Barcelona, 08003 Spain; Hospital del Mar Research Institute (IMIM), C/Doctor Aiguader 88, Barcelona, 08003 Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER-ER), C/ Monforte de Lemos 3-5, Madrid, 28029 Spain
39
Abstract
Whole human genome sequencing of individuals is becoming rapid and inexpensive, enabling new strategies for using personal genome information to help diagnose, treat, and even prevent human disorders for which genetic variations are causative or are known to be risk factors. Many of the exploding number of newly discovered genetic variations alter the structure, function, dynamics, stability, and/or interactions of specific proteins and RNA molecules. Accordingly, there are a host of opportunities for biochemists and biophysicists to participate in (1) developing tools to allow accurate and sometimes medically actionable assessment of the potential pathogenicity of individual variations and (2) establishing the mechanistic linkage between pathogenic variations and their physiological consequences, providing a rational basis for treatment or preventive care. In this review, we provide an overview of these opportunities and their associated challenges in light of the current status of genomic science and personalized medicine, the latter often termed precision medicine.
Affiliation(s)
- Brett M Kroncke
- †Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States; ‡Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
- Carlos G Vanoye
- §Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, United States
- Jens Meiler
- ‡Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232, United States; ∥Departments of Chemistry, Pharmacology, and Bioinformatics, Vanderbilt University, Nashville, Tennessee 37232, United States
- Alfred L George
- §Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, United States
- Charles R Sanders
- †Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States; ‡Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
40
Deelen P, Zhernakova DV, de Haan M, van der Sijde M, Bonder MJ, Karjalainen J, van der Velde KJ, Abbott KM, Fu J, Wijmenga C, Sinke RJ, Swertz MA, Franke L. Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels. Genome Med 2015; 7:30. [PMID: 25954321 PMCID: PMC4423486 DOI: 10.1186/s13073-015-0152-4] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 11/25/2014] [Accepted: 03/09/2015] [Indexed: 11/10/2022]
Abstract
Background RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given increasing numbers of RNA-seq samples in the public domain, we here studied to what extent eQTLs and ASE effects can be identified when using public RNA-seq data while deriving the genotypes from the RNA-sequencing reads themselves. Methods We downloaded the raw reads for all available human RNA-seq datasets. Using these reads we performed gene expression quantification. All samples were jointly normalized and subjected to a strict quality control. We also derived genotypes using the RNA-seq reads and used imputation to infer non-coding variants. This allowed us to perform eQTL mapping and ASE analyses jointly on all samples that passed quality control. Our results were validated using samples for which DNA-seq genotypes were available. Results 4,978 public human RNA-seq runs, representing many different tissues and cell-types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell type clustered together, suggesting that technical biases due to different sequencing protocols are limited. In a joint analysis on the 1,262 samples with high quality genotypes, we identified cis-eQTLs effects for 8,034 unique genes (at a false discovery rate ≤0.05). eQTL mapping on individual tissues revealed that a limited number of samples already suffice to identify tissue-specific eQTLs for known disease-associated genetic variants. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels. Conclusions By deriving and imputing genotypes from RNA-seq data, it is possible to identify both eQTLs and ASE effects. 
Given the exponential growth of the number of publicly available RNA-seq samples, we expect this approach will become especially relevant for studying the effects of tissue-specific and rare pathogenic genetic variants to aid clinical interpretation of exome and genome sequencing. Electronic supplementary material The online version of this article (doi:10.1186/s13073-015-0152-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Patrick Deelen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 RB Groningen, The Netherlands
- Daria V Zhernakova
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
- Mark de Haan
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 RB Groningen, The Netherlands
- Marijke van der Sijde
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
- Marc Jan Bonder
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
- Juha Karjalainen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
- K Joeri van der Velde
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 RB Groningen, The Netherlands
- Kristin M Abbott
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
- Jingyuan Fu
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
- Cisca Wijmenga
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
- Richard J Sinke
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
- Morris A Swertz
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 RB Groningen, The Netherlands
- Lude Franke
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
41
Harvey CT, Moyerbrailean GA, Davis GO, Wen X, Luca F, Pique-Regi R. QuASAR: quantitative allele-specific analysis of reads. Bioinformatics 2014; 31:1235-42. [PMID: 25480375 DOI: 10.1093/bioinformatics/btu802] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 08/02/2014] [Accepted: 11/26/2014] [Indexed: 12/30/2022]
Abstract
MOTIVATION Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no existing methods that jointly infer genotypes and conduct ASE inference, while considering uncertainty in the genotype calls. RESULTS We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available. AVAILABILITY AND IMPLEMENTATION http://github.com/piquelab/QuASAR. CONTACT fluca@wayne.edu or rpique@wayne.edu SUPPLEMENTARY INFORMATION Supplementary Material is available at Bioinformatics online.
Affiliation(s)
- Chris T Harvey
- Center for Molecular Medicine and Genetics, Department of Obstetrics and Gynecology, Wayne State University, 540 E Canfield, Scott Hall, Detroit, MI 48201, USA and Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Gregory A Moyerbrailean
- Center for Molecular Medicine and Genetics, Department of Obstetrics and Gynecology, Wayne State University, 540 E Canfield, Scott Hall, Detroit, MI 48201, USA and Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Gordon O Davis
- Center for Molecular Medicine and Genetics, Department of Obstetrics and Gynecology, Wayne State University, 540 E Canfield, Scott Hall, Detroit, MI 48201, USA and Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiaoquan Wen
- Center for Molecular Medicine and Genetics, Department of Obstetrics and Gynecology, Wayne State University, 540 E Canfield, Scott Hall, Detroit, MI 48201, USA and Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Francesca Luca
- Center for Molecular Medicine and Genetics, Department of Obstetrics and Gynecology, Wayne State University, 540 E Canfield, Scott Hall, Detroit, MI 48201, USA and Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Roger Pique-Regi
- Center for Molecular Medicine and Genetics, Department of Obstetrics and Gynecology, Wayne State University, 540 E Canfield, Scott Hall, Detroit, MI 48201, USA and Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
42
Steyaert S, Van Criekinge W, De Paepe A, Denil S, Mensaert K, Vandepitte K, Vanden Berghe W, Trooskens G, De Meyer T. SNP-guided identification of monoallelic DNA-methylation events from enrichment-based sequencing data. Nucleic Acids Res 2014; 42:e157. [PMID: 25237057 PMCID: PMC4227762 DOI: 10.1093/nar/gku847] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022]
Abstract
Monoallelic gene expression is typically initiated early in the development of an organism. Dysregulation of monoallelic gene expression has already been linked to several non-Mendelian inherited genetic disorders. In humans, DNA-methylation is deemed to be an important regulator of monoallelic gene expression, but only few examples are known. One important reason is that current, cost-affordable truly genome-wide methods to assess DNA-methylation are based on sequencing post-enrichment. Here, we present a new methodology based on classical population genetic theory, i.e. the Hardy–Weinberg theorem, that combines methylomic data from MethylCap-seq with associated SNP profiles to identify monoallelically methylated loci. Applied on 334 MethylCap-seq samples of very diverse origin, this resulted in the identification of 80 genomic regions featured by monoallelic DNA-methylation. Of these 80 loci, 49 are located in genic regions of which 25 have already been linked to imprinting. Further analysis revealed statistically significant enrichment of these loci in promoter regions, further establishing the relevance and usefulness of the method. Additional validation was done using both 14 whole-genome bisulfite sequencing data sets and 16 mRNA-seq data sets. Importantly, the developed approach can be easily applied to other enrichment-based sequencing technologies, like the ChIP-seq-based identification of monoallelic histone modifications.
Affiliation(s)
- Sandra Steyaert
- Department of Mathematical Modelling, Statistics and Bioinformatics, University of Ghent, Ghent 9000, Belgium
- Wim Van Criekinge
- Department of Mathematical Modelling, Statistics and Bioinformatics, University of Ghent, Ghent 9000, Belgium
- Ayla De Paepe
- Department of Mathematical Modelling, Statistics and Bioinformatics, University of Ghent, Ghent 9000, Belgium
- Simon Denil
- Department of Mathematical Modelling, Statistics and Bioinformatics, University of Ghent, Ghent 9000, Belgium
- Klaas Mensaert
- Department of Mathematical Modelling, Statistics and Bioinformatics, University of Ghent, Ghent 9000, Belgium
- Wim Vanden Berghe
- PPES, Department of Biomedical Sciences, University of Antwerp, Wilrijk 2610, Belgium
- Geert Trooskens
- Department of Mathematical Modelling, Statistics and Bioinformatics, University of Ghent, Ghent 9000, Belgium
- Tim De Meyer
- Department of Mathematical Modelling, Statistics and Bioinformatics, University of Ghent, Ghent 9000, Belgium
43
Analysis of stop-gain and frameshift variants in human innate immunity genes. PLoS Comput Biol 2014; 10:e1003757. [PMID: 25058640 PMCID: PMC4110073 DOI: 10.1371/journal.pcbi.1003757] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 02/22/2014] [Accepted: 06/16/2014] [Indexed: 12/03/2022]
Abstract
Loss-of-function variants in innate immunity genes are associated with Mendelian disorders in the form of primary immunodeficiencies. Recent resequencing projects report that stop-gains and frameshifts are collectively prevalent in humans and could be responsible for some of the inter-individual variability in innate immune response. Current computational approaches evaluating loss-of-function in genes carrying these variants rely on gene-level characteristics such as evolutionary conservation and functional redundancy across the genome. However, innate immunity genes represent a particular case because they are more likely to be under positive selection and duplicated. To create a ranking of severity that would be applicable to innate immunity genes we evaluated 17,764 stop-gain and 13,915 frameshift variants from the NHLBI Exome Sequencing Project and 1,000 Genomes Project. Sequence-based features such as loss of functional domains, isoform-specific truncation and nonsense-mediated decay were found to correlate with variant allele frequency and validated with gene expression data. We integrated these features in a Bayesian classification scheme and benchmarked its use in predicting pathogenic variants against Online Mendelian Inheritance in Man (OMIM) disease stop-gains and frameshifts. The classification scheme was applied in the assessment of 335 stop-gains and 236 frameshifts affecting 227 interferon-stimulated genes. The sequence-based score ranks variants in innate immunity genes according to their potential to cause disease, and complements existing gene-based pathogenicity scores. Specifically, the sequence-based score improves measurement of functional gene impairment, discriminates across different variants in a given gene and appears particularly useful for analysis of less conserved genes. There are well-characterized severe immunodeficiencies associated with loss-of-function variants in innate immunity genes. 
Genome sequencing projects identify rare stop-gain and frameshift variants in innate immunity genes whose phenotype is uncharacterized. Current methods to estimate the severity of rare stop-gains and frameshifts are based on evolutionary conservation of the gene, the likelihood for redundancy in its function or mutational burden. These parameters are not always applicable to innate immunity genes. We evaluated sequence-level characteristics of more than 30'000 stop-gains and frameshifts and prioritized variants according to their predicted functional consequences. Our scoring approach complements existing tools in the prediction of innate immunity OMIM disease variants and associates with functional readouts such as gene expression. In this framework, we show that many individuals do carry highly pathogenic variants in genes participating in antiviral defense. The clinical assessment of these variants is of significant interest.