1. Adduri A, Kim S. Ornaments for efficient allele-specific expression estimation with bias correction. Am J Hum Genet 2024:S0002-9297(24)00221-0. [PMID: 39047729 DOI: 10.1016/j.ajhg.2024.06.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 06/22/2024] [Accepted: 06/24/2024] [Indexed: 07/27/2024]
Abstract
Allele-specific expression plays a crucial role in unraveling various biological mechanisms, including genomic imprinting and gene expression controlled by cis-regulatory variants. However, existing methods for quantification from RNA-sequencing (RNA-seq) reads do not adequately and efficiently remove various allele-specific read mapping biases, such as reference bias arising from reads containing the alternative allele that do not map to the reference transcriptome or ambiguous mapping bias caused by reads containing the reference allele that map differently from reads containing the alternative allele. We present Ornaments, a computational tool for rapid and accurate estimation of allele-specific transcript expression at unphased heterozygous loci from RNA-seq reads while correcting for allele-specific read mapping biases. Ornaments removes reference bias by mapping reads to a personalized transcriptome and ambiguous mapping bias by probabilistically assigning reads to multiple transcripts and variant loci they map to. Ornaments is a lightweight extension of kallisto, a popular tool for fast RNA-seq quantification, that improves the efficiency and accuracy of WASP, a popular tool for bias correction in allele-specific read mapping. In experiments with simulated and human lymphoblastoid cell-line RNA-seq reads with the genomes of the 1000 Genomes Project, we demonstrate that Ornaments improves the accuracy of WASP and kallisto, is nearly as efficient as kallisto, and is an order of magnitude faster than WASP per sample, with the additional cost of constructing a personalized index for multiple samples. Additionally, we show that Ornaments finds imprinted transcripts with higher sensitivity than WASP, which detects imprinted signals only at gene level.
Affiliation(s)
- Abhinav Adduri
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Seyoung Kim
  - Department of Epidemiology, University of Pittsburgh, Pittsburgh, PA 15261, USA
2. Antontseva EV, Degtyareva AO, Korbolina EE, Damarov IS, Merkulova TI. Human-genome single nucleotide polymorphisms affecting transcription factor binding and their role in pathogenesis. Vavilovskii Zhurnal Genet Selektsii 2023; 27:662-675. [PMID: 37965371 PMCID: PMC10641029 DOI: 10.18699/vjgb-23-77] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 03/24/2023] [Accepted: 03/30/2023] [Indexed: 11/16/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are the most common type of variation in the human genome. The vast majority of SNPs identified in the human genome do not have any effect on the phenotype; however, some can lead to changes in the function of a gene or the level of its expression. Most SNPs associated with certain traits or pathologies are mapped to regulatory regions of the genome and affect gene expression by changing transcription factor binding sites. In recent decades, substantial effort has been invested in searching for such regulatory SNPs (rSNPs) and understanding the mechanisms by which they lead to phenotypic differences, primarily to individual differences in susceptibility to diseases and in sensitivity to drugs. The development of the NGS (next-generation sequencing) technology has contributed not only to the identification of a huge number of SNPs and to the search for their association (genome-wide association studies, GWASs) with certain diseases or phenotypic manifestations, but also to the development of more productive approaches to their functional annotation. It should be noted that the presence of an association does not allow one to identify a functional, truly disease-associated DNA sequence variant among multiple marker SNPs that are detected due to linkage disequilibrium. Moreover, determination of associations of genetic variants with a disease does not provide information about the functionality of these variants, which is necessary to elucidate the molecular mechanisms of the development of pathology and to design effective methods for its treatment and prevention. In this regard, the functional analysis of SNPs annotated in the GWAS catalog, both at the genome-wide level and at the level of individual SNPs, became especially relevant in recent years. A genome-wide search for potential rSNPs is possible without any prior knowledge of their association with a trait. 
Thus, mapping expression quantitative trait loci (eQTLs) makes it possible to identify an SNP for which - among transcriptomes of homozygotes and heterozygotes for its various alleles - there are differences in the expression level of certain genes, which can be located at various distances from the SNP. To predict rSNPs, approaches based on searches for allele-specific events in RNA-seq, ChIP-seq, DNase-seq, ATAC-seq, MPRA, and other data are also used. Nonetheless, for a more complete functional annotation of such rSNPs, it is necessary to establish their association with a trait, in particular, with a predisposition to a certain pathology or sensitivity to drugs. Thus, approaches to finding SNPs important for the development of a trait can be categorized into two groups: (1) starting from data on an association of SNPs with a certain trait, (2) starting from the determination of allele-specific changes at the molecular level (in a transcriptome or regulome). Only comprehensive use of strategically different approaches can considerably enrich our knowledge about the role of genetic determinants in the molecular mechanisms of trait formation, including predisposition to multifactorial diseases.
Affiliation(s)
- E V Antontseva
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- A O Degtyareva
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- E E Korbolina
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- I S Damarov
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- T I Merkulova
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
3. Murani E, Hadlich F. Exploration of genotype-by-environment interactions affecting gene expression responses in porcine immune cells. Front Genet 2023; 14:1157267. [PMID: 37007953 PMCID: PMC10061014 DOI: 10.3389/fgene.2023.1157267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 03/06/2023] [Indexed: 03/18/2023]
Abstract
As one of the keys to healthy performance, robustness of farm animals is gaining importance, and with this comes increasing interest in genetic dissection of genotype-by-environment interactions (G×E). Changes in gene expression are among the most sensitive responses conveying adaptation to environmental stimuli. Environmentally responsive regulatory variation thus likely plays a central role in G×E. In the present study, we set out to detect action of environmentally responsive cis-regulatory variation by the analysis of condition-dependent allele specific expression (cd-ASE) in porcine immune cells. For this, we harnessed mRNA-sequencing data of peripheral blood mononuclear cells (PBMCs) stimulated in vitro with lipopolysaccharide, dexamethasone, or their combination. These treatments mimic common challenges such as bacterial infection or stress, and induce vast transcriptome changes. About two thirds of the examined loci showed significant ASE in at least one treatment, and out of those about ten percent exhibited cd-ASE. Most of the ASE variants were not yet reported in the PigGTEx Atlas. Genes showing cd-ASE were enriched in cytokine signaling in immune system and include several key candidates for animal health. In contrast, genes showing no ASE featured cell-cycle related functions. We confirmed LPS-dependent ASE for one of the top candidates, SOD2, which ranks among the major response genes in LPS-stimulated monocytes. The results of the present study demonstrate the potential of in vitro cell models coupled with cd-ASE analysis for the investigation of G×E in farm animals. The identified loci may benefit efforts to unravel the genetic basis of robustness and improvement of health and welfare in pigs.
4. Mu W, Sarkar H, Srivastava A, Choi K, Patro R, Love MI. Airpart: interpretable statistical models for analyzing allelic imbalance in single-cell datasets. Bioinformatics 2022; 38:2773-2780. [PMID: 35561168 PMCID: PMC9113279 DOI: 10.1093/bioinformatics/btac212] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/26/2021] [Revised: 03/05/2022] [Accepted: 04/05/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation, which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial- or time-dependent AI signals may be dampened or not detected. RESULTS: We introduce a statistical method airpart for identifying differential CTS AI from single-cell RNA-sequencing data, or dynamics AI from other spatially or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. In order to account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower Root Mean Square Error (RMSE) of allelic ratio estimates than existing methods. In real data, airpart identified differential allelic imbalance patterns across cell states and could be used to define trends of AI signal over spatial or time axes. AVAILABILITY AND IMPLEMENTATION: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wancen Mu
  - To whom correspondence should be addressed.
- Hirak Sarkar
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Rob Patro
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
5. Francisco Junior RDS, Temerozo JR, Ferreira CDS, Martins Y, Souza TML, Medina-Acosta E, de Vasconcelos ATR. Differential haplotype expression in class I MHC genes during SARS-CoV-2 infection of human lung cell lines. Front Immunol 2022; 13:1101526. [PMID: 36818472 PMCID: PMC9929942 DOI: 10.3389/fimmu.2022.1101526] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/17/2022] [Accepted: 12/19/2022] [Indexed: 02/05/2023]
Abstract
Introduction: Cell entry of SARS-CoV-2 causes genome-wide disruption of the transcriptional profiles of genes and biological pathways involved in the pathogenesis of COVID-19. Expression allelic imbalance is characterized by a deviation from the Mendelian expected 1:1 expression ratio and is an important source of allele-specific heterogeneity. Expression allelic imbalance can be measured by allele-specific expression analysis (ASE) across heterozygous informative expressed single nucleotide variants (eSNVs). ASE reflects many regulatory biological phenomena that can be assessed by combining genome and transcriptome information. ASE contributes to the interindividual variability associated with the disease. We aim to estimate the transcriptome-wide impact of SARS-CoV-2 infection by analyzing eSNVs. Methods: We compared ASE profiles in the human lung cell lines Calu-3, A459, and H522 before and after infection with SARS-CoV-2 using RNA-Seq experiments. Results: We identified 34 differential ASE (DASE) sites in 13 genes (HLA-A, HLA-B, HLA-C, BRD2, EHD2, GFM2, GSPT1, HAVCR1, MAT2A, NQO2, SUPT6H, TNFRSF11A, UMPS), all of which are enriched in protein binding functions and play a role in COVID-19. Most DASE sites were assigned to the MHC class I locus and were predominantly upregulated upon infection. DASE sites in the MHC class I locus also occur in iPSC-derived airway epithelium basal cells infected with SARS-CoV-2. Using an RNA-Seq haplotype reconstruction approach, we found DASE sites and adjacent eSNVs in phase (i.e., predicted on the same DNA strand), demonstrating differential haplotype expression upon infection. We found a bias towards the expression of the HLA alleles with a higher binding affinity to SARS-CoV-2 epitopes. Discussion: Independent of gene expression compensation, SARS-CoV-2 infection of human lung cell lines induces transcriptional allelic switching at the MHC loci. This suggests a response mechanism to SARS-CoV-2 infection that swaps HLA alleles with poor epitope binding affinity, an expectation supported by publicly available proteome data.
Affiliation(s)
- Jairo R Temerozo
  - Laboratory on Thymus Research, Oswaldo Cruz Institute (Fiocruz), Rio de Janeiro, Brazil; National Institute of Science and Technology on Neuroimmunomodulation, Rio de Janeiro, Brazil
- Cristina Dos Santos Ferreira
  - Bioinformatics Laboratory (LABINFO), National Laboratory of Scientific Computation (LNCC/MCTIC), Petrópolis, Brazil
- Yasmmin Martins
  - Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires (FCEyN-UBA), Buenos Aires, Argentina
- Thiago Moreno L Souza
  - Laboratory of Immunopharmacology, Oswaldo Cruz Institute (IOC), Oswaldo Cruz Foundation (Fiocruz), Rio de Janeiro, Brazil; Center for Technological Development in Health (CDTS), National Institute for Science and Technology on Innovation on Neglected Diseases Neglected Populations (INCT/IDNP), Oswaldo Cruz Foundation (Fiocruz), Rio de Janeiro, Brazil
- Enrique Medina-Acosta
  - Molecular Identification and Diagnostics Unit (NUDIM), Laboratory of Biotechnology, Center for Biosciences and Biotechnology, Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Campos dos Goytacazes, Brazil
6. Sherbina K, León-Novelo LG, Nuzhdin SV, McIntyre LM, Marroni F. Power calculator for detecting allelic imbalance using hierarchical Bayesian model. BMC Res Notes 2021; 14:436. [PMID: 34838135 PMCID: PMC8626927 DOI: 10.1186/s13104-021-05851-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022]
Abstract
OBJECTIVE: Allelic imbalance (AI) is the differential expression of the two alleles in a diploid. AI can vary between tissues, treatments, and environments. Methods for testing AI exist, but methods are needed to estimate type I error and power for detecting AI and difference of AI between conditions. As the costs of the technology plummet, what is more important: reads or replicates? RESULTS: We find that a minimum of 2400, 480, and 240 allele specific reads divided equally among 12, 5, and 3 replicates is needed to detect a 10, 20, and 30%, respectively, deviation from allelic balance in a condition with power > 80%. A minimum of 960 and 240 allele specific reads divided equally among 8 replicates is needed to detect a 20 or 30% difference in AI between conditions with comparable power. Higher numbers of replicates increase power more than adding coverage without affecting type I error. We provide a Python package that enables simulation of AI scenarios and enables individuals to estimate type I error and power in detecting AI and differences in AI between conditions.
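The idea of power for detecting allelic imbalance mentioned in the abstract above can be illustrated with a much simpler stand-in than the hierarchical Bayesian model the authors describe. The sketch below is not their package or their method: it assumes a single sample with no replicates and swaps in an exact two-sided binomial test against a balanced 1:1 ratio. The function names (`binom_pmf`, `two_sided_p`, `power`) and the example reference-allele fraction of 0.60 are hypothetical choices made for illustration, and the numbers it produces are not expected to match the figures reported in the abstract.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k reference-allele reads out of n (requires 0 < p < 1)."""
    # Work in log space so large read counts do not overflow.
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value under the null of allelic balance (p = 0.5)."""
    # The null distribution is symmetric, so sum every count at least as far from n/2 as k.
    dev = abs(k - n / 2)
    return sum(binom_pmf(i, n, 0.5) for i in range(n + 1) if abs(i - n / 2) >= dev)


def power(n: int, p_alt: float, alpha: float = 0.05) -> float:
    """Probability of rejecting balance when the true reference fraction is p_alt."""
    # Read counts that the test would call significant at level alpha.
    rejected = [k for k in range(n + 1) if two_sided_p(k, n) <= alpha]
    return sum(binom_pmf(k, n, p_alt) for k in rejected)


if __name__ == "__main__":
    # Illustrative (assumed) settings: compare two read depths at one effect size.
    for n_reads in (60, 240):
        print(n_reads, round(power(n_reads, 0.60), 3))
```

Because this toy version treats all reads as one binomial sample, it cannot address the reads-versus-replicates question the paper poses; it only shows how power grows with read depth and effect size under the simplest possible model.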
Affiliation(s)
- Katrina Sherbina
  - Quantitative and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
- Luis G León-Novelo
  - Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston-School of Public Health, Houston, TX, 77030, USA
- Sergey V Nuzhdin
  - Molecular and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
- Lauren M McIntyre
  - Genetics Institute and Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32603, USA
- Fabio Marroni
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali, Università di Udine, 33100, Udine, Italy
7. Ye Z, Jiang X, Pfrender ME, Lynch M. Genome-Wide Allele-Specific Expression in Obligately Asexual Daphnia pulex and the Implications for the Genetic Basis of Asexuality. Genome Biol Evol 2021; 13:6415829. [PMID: 34726699 PMCID: PMC8598174 DOI: 10.1093/gbe/evab243] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 10/24/2021] [Indexed: 01/17/2023]
Abstract
Although obligately asexual lineages are thought to experience selective disadvantages associated with reduced efficiency of fixing beneficial mutations and purging deleterious mutations, such lineages are phylogenetically and geographically widespread. However, despite several genome-wide association studies, little is known about the genetic elements underlying the origin of obligate asexuality and how they spread. Because many obligately asexual lineages have hybrid origins, it has been suggested that asexuality is caused by the unbalanced expression of alleles from the hybridizing species. Here, we investigate this idea by identifying genes with allele-specific expression (ASE) in a Daphnia pulex population, in which obligate parthenogens (OP) and cyclical parthenogens (CP) coexist, with the OP clones having been originally derived from hybridization between CP D. pulex and its sister species, Daphnia pulicaria. OP D. pulex have significantly more ASE genes (ASEGs) than do CP D. pulex. Whole-genomic comparison of OP and CP clones revealed ∼15,000 OP-specific markers and 42 consistent ASEGs enriched in marker-defined regions. Ten of the 42 ASEGs have alleles coding for different protein sequences, suggesting functional differences between the products of the two parental alleles. At least three of these ten genes appear to be directly involved in meiosis-related processes, for example, RanBP2 can cause abnormal chromosome segregation in anaphase I, and the presence of Wee1 in immature oocytes leads to failure to enter meiosis II. These results provide a guide for future molecular resolution of the genetic basis of the transition to ameiotic parthenogenesis.
Affiliation(s)
- Zhiqiang Ye
  - Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona
- Michael E Pfrender
  - Department of Biological Sciences and Environmental Change Initiative, University of Notre Dame, Notre Dame, Indiana
- Michael Lynch
  - Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona
8. Alonso L, Piron A, Morán I, Guindo-Martínez M, Bonàs-Guarch S, Atla G, Miguel-Escalada I, Royo R, Puiggròs M, Garcia-Hurtado X, Suleiman M, Marselli L, Esguerra JLS, Turatsinze JV, Torres JM, Nylander V, Chen J, Eliasson L, Defrance M, Amela R, Mulder H, Gloyn AL, Groop L, Marchetti P, Eizirik DL, Ferrer J, Mercader JM, Cnop M, Torrents D. TIGER: The gene expression regulatory variation landscape of human pancreatic islets. Cell Rep 2021; 37:109807. [PMID: 34644572 PMCID: PMC8864863 DOI: 10.1016/j.celrep.2021.109807] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 05/28/2021] [Revised: 07/23/2021] [Accepted: 09/16/2021] [Indexed: 12/30/2022]
Abstract
Genome-wide association studies (GWASs) identified hundreds of signals associated with type 2 diabetes (T2D). To gain insight into their underlying molecular mechanisms, we have created the translational human pancreatic islet genotype tissue-expression resource (TIGER), aggregating >500 human islet genomic datasets from five cohorts in the Horizon 2020 consortium T2DSystems. We impute genotypes using four reference panels and meta-analyze cohorts to improve the coverage of expression quantitative trait loci (eQTL) and develop a method to combine allele-specific expression across samples (cASE). We identify >1 million islet eQTLs, 53 of which colocalize with T2D signals. Among them, a low-frequency allele that reduces T2D risk by half increases CCND2 expression. We identify eight cASE colocalizations, among which we found a T2D-associated SLC30A8 variant. We make all data available through the TIGER portal (http://tiger.bsc.es), which represents a comprehensive human islet genomic data resource to elucidate how genetic variation affects islet function and translates into therapeutic insight and precision medicine for T2D.
Affiliation(s)
- Lorena Alonso
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Anthony Piron
  - ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels 1070, Belgium; Interuniversity Institute of Bioinformatics in Brussels (IB2), Brussels 1050, Belgium
- Ignasi Morán
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Marta Guindo-Martínez
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Sílvia Bonàs-Guarch
  - Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM) Barcelona 08013, Spain
- Goutham Atla
  - Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM) Barcelona 08013, Spain
- Irene Miguel-Escalada
  - Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM) Barcelona 08013, Spain
- Romina Royo
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Montserrat Puiggròs
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Xavier Garcia-Hurtado
  - Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM) Barcelona 08013, Spain
- Mara Suleiman
  - Department of Clinical and Experimental Medicine and AOUP Cisanello University Hospital, University of Pisa, Pisa 56126, Italy
- Lorella Marselli
  - Department of Clinical and Experimental Medicine and AOUP Cisanello University Hospital, University of Pisa, Pisa 56126, Italy
- Jonathan L S Esguerra
  - Unit of Islet Cell Exocytosis, Lund University Diabetes Centre, Malmö 214 28, Sweden
- Jason M Torres
  - Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK; Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7LF, UK
- Vibe Nylander
  - Oxford Centre for Diabetes, Endocrinology, and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford OX3 7LE, UK
- Ji Chen
  - Exeter Centre of Excellence for Diabetes Research (EXCEED), University of Exeter Medical School, Exeter EX4 4PY, UK
- Lena Eliasson
  - Unit of Islet Cell Exocytosis, Lund University Diabetes Centre, Malmö 214 28, Sweden
- Matthieu Defrance
  - ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels 1070, Belgium
- Ramon Amela
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Hindrik Mulder
  - Unit of Molecular Metabolism, Lund University Diabetes Centre, Malmö 214 28, Sweden
- Anna L Gloyn
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7LF, UK; Oxford Centre for Diabetes, Endocrinology, and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford OX3 7LE, UK; Division of Endocrinology, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA; NIHR Oxford Biomedical Research Centre, Churchill Hospital, Oxford OX3 7DQ, UK; Stanford Diabetes Research Centre, Stanford University, Stanford, CA 94305, USA
- Leif Groop
  - Unit of Islet Cell Exocytosis, Lund University Diabetes Centre, Malmö 214 28, Sweden; Unit of Molecular Metabolism, Lund University Diabetes Centre, Malmö 214 28, Sweden; Finnish Institute of Molecular Medicine Finland (FIMM), Helsinki University, Helsinki 00014, Finland
- Piero Marchetti
  - Department of Clinical and Experimental Medicine and AOUP Cisanello University Hospital, University of Pisa, Pisa 56126, Italy
- Decio L Eizirik
  - ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels 1070, Belgium; WELBIO, Université Libre de Bruxelles, Brussels 1050, Belgium
- Jorge Ferrer
  - Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM) Barcelona 08013, Spain; Section of Epigenomics and Disease, Department of Medicine, Imperial College London, London SW7 2AZ, UK
- Josep M Mercader
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain; Programs in Metabolism and Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Miriam Cnop
  - ULB Center for Diabetes Research, Université Libre de Bruxelles, Brussels 1070, Belgium; Division of Endocrinology, Erasmus Hospital, Université Libre de Bruxelles, Brussels 1070, Belgium
- David Torrents
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona 08010, Spain
9. Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B. Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology. Brief Bioinform 2021; 22:6330938. [PMID: 34329375 DOI: 10.1093/bib/bbab259] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/28/2021] [Revised: 06/14/2021] [Accepted: 06/18/2021] [Indexed: 12/13/2022]
Abstract
Significant innovations in next-generation sequencing techniques and bioinformatics tools have impacted our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved in conjunction with sequence technology and bioinformatic tools advances. In most projects, bulk RNA-Seq data is used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information including details of copy number alteration, microbial contamination, transposable elements, cell type (deconvolution) and the presence of neoantigens. Recent novel and advanced bioinformatic algorithms developed the capacity to retrieve this information from bulk RNA-Seq data, thus broadening its scope. The focus of this review is to comprehend the emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq in providing biological insights.
Affiliation(s)
- Amarinder Singh Thind
  - University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
- Isha Monga
  - Columbia University, New York City, NY, USA
- Pallawi Kumari
  - Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Kiran Dindhoria
  - Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Marie Ranson
  - University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
- Bruce Ashford
  - University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
10.
Abstract
Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active or active at the same level in a given individual. Genomic imprinting, resulting in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of data generated by RNA sequencing applied to large numbers of individuals and tissue types has generated unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review the insights gained through the analysis of these data about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for gene expression imbalance, and what the results obtained reveal about the contribution of genetic variants that alter gene expression to complex human diseases and phenotypes.
Collapse
Affiliation(s)
- Siobhan Cleary
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway H91 H3CY, Ireland;
| | - Cathal Seoighe
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway H91 H3CY, Ireland;
| |
|
11
|
Degtyareva AO, Antontseva EV, Merkulova TI. Regulatory SNPs: Altered Transcription Factor Binding Sites Implicated in Complex Traits and Diseases. Int J Mol Sci 2021; 22:6454. [PMID: 34208629 PMCID: PMC8235176 DOI: 10.3390/ijms22126454] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/05/2021] [Revised: 06/15/2021] [Accepted: 06/15/2021] [Indexed: 12/19/2022] Open
Abstract
The vast majority of the genetic variants (mainly SNPs) associated with various human traits and diseases map to a noncoding part of the genome and are enriched in its regulatory compartment, suggesting that many causal variants may affect gene expression. The leading mechanism of action of these SNPs consists in the alterations in the transcription factor binding via creation or disruption of transcription factor binding sites (TFBSs) or some change in the affinity of these regulatory proteins to their cognate sites. In this review, we first focus on the history of the discovery of regulatory SNPs (rSNPs) and systematized description of the existing methodical approaches to their study. Then, we brief the recent comprehensive examples of rSNPs studied from the discovery of the changes in the TFBS sequence as a result of a nucleotide substitution to identification of its effect on the target gene expression and, eventually, to phenotype. We also describe state-of-the-art genome-wide approaches to identification of regulatory variants, including both making molecular sense of genome-wide association studies (GWAS) and the alternative approaches the primary goal of which is to determine the functionality of genetic variants. Among these approaches, special attention is paid to expression quantitative trait loci (eQTLs) analysis and the search for allele-specific events in RNA-seq (ASE events) as well as in ChIP-seq, DNase-seq, and ATAC-seq (ASB events) data.
Affiliation(s)
- Arina O. Degtyareva
- Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia; (A.O.D.); (E.V.A.)
| | - Elena V. Antontseva
- Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia; (A.O.D.); (E.V.A.)
| | - Tatiana I. Merkulova
- Department of Molecular Genetic, Institute of Cytology and Genetics, 630090 Novosibirsk, Russia; (A.O.D.); (E.V.A.)
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
| |
|
12
|
Mendelevich A, Vinogradova S, Gupta S, Mironov AA, Sunyaev SR, Gimelbrant AA. Replicate sequencing libraries are important for quantification of allelic imbalance. Nat Commun 2021; 12:3370. [PMID: 34099647 PMCID: PMC8184992 DOI: 10.1038/s41467-021-23544-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/19/2020] [Accepted: 04/30/2021] [Indexed: 12/13/2022] Open
Abstract
A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.
Affiliation(s)
- Asia Mendelevich
- Skolkovo Institute of Science and Technology, Moscow, Russia.
- Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA.
| | - Svetlana Vinogradova
- Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
| | - Saumya Gupta
- Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
- Broad Institute of Harvard and MIT, Cambridge, USA
| | - Andrey A Mironov
- Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Moscow, Russia
- Institute of Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
| | - Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, USA
- Division of Genetics, Brigham and Women's Hospital, Boston, USA
| | - Alexander A Gimelbrant
- Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA.
- Broad Institute of Harvard and MIT, Cambridge, USA.
| |
|
13
|
Fan J, Wang X, Xiao R, Li M. Detecting cell-type-specific allelic expression imbalance by integrative analysis of bulk and single-cell RNA sequencing data. PLoS Genet 2021; 17:e1009080. [PMID: 33661921 PMCID: PMC7963069 DOI: 10.1371/journal.pgen.1009080] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/22/2020] [Revised: 03/16/2021] [Accepted: 02/09/2021] [Indexed: 12/27/2022] Open
Abstract
Allelic expression imbalance (AEI), quantified by the relative expression of two alleles of a gene in a diploid organism, can help explain phenotypic variations among individuals. Traditional methods detect AEI using bulk RNA sequencing (RNA-seq) data, a data type that averages out cell-to-cell heterogeneity in gene expression across cell types. Since the patterns of AEI may vary across different cell types, it is desirable to study AEI in a cell-type-specific manner. Although this can be achieved by single-cell RNA sequencing (scRNA-seq), it requires full-length transcript to be sequenced in single cells of a large number of individuals, which are still cost prohibitive to generate. To overcome this limitation and utilize the vast amount of existing disease relevant bulk tissue RNA-seq data, we developed BSCET, which enables the characterization of cell-type-specific AEI in bulk RNA-seq data by integrating cell type composition information inferred from a small set of scRNA-seq samples, possibly obtained from an external dataset. By modeling covariate effect, BSCET can also detect genes whose cell-type-specific AEI are associated with clinical factors. Through extensive benchmark evaluations, we show that BSCET correctly detected genes with cell-type-specific AEI and differential AEI between healthy and diseased samples using bulk RNA-seq data. BSCET also uncovered cell-type-specific AEIs that were missed in bulk data analysis when the directions of AEI are opposite in different cell types. We further applied BSCET to two pancreatic islet bulk RNA-seq datasets, and detected genes showing cell-type-specific AEI that are related to the progression of type 2 diabetes. Since bulk RNA-seq data are easily accessible, BSCET provides a convenient tool to integrate information from scRNA-seq data to gain insight on AEI with cell type resolution. Results from such analysis will advance our understanding of cell type contributions in human diseases. 
Detection of allelic expression imbalance (AEI), a phenomenon where the two alleles of a gene differ in their expression magnitude, is a key step towards the understanding of phenotypic variations among individuals. Existing methods detect AEI using bulk RNA sequencing (RNA-seq) data and ignore AEI variations among different cell types. Although single-cell RNA sequencing (scRNA-seq) has enabled the characterization of cell-to-cell heterogeneity in gene expression, the high costs have limited its application in AEI analysis. To overcome this limitation, we developed BSCET to characterize cell-type-specific AEI using the widely available bulk RNA-seq data by integrating cell-type composition information inferred from scRNA-seq samples. Since the degree of AEI may vary with disease phenotypes, we further extended BSCET to detect genes whose cell-type-specific AEIs are associated with clinical factors. Through extensive benchmark evaluations and analyses of two pancreatic islet bulk RNA-seq datasets, we demonstrated BSCET’s ability to refine bulk-level AEI to cell-type resolution, and to identify genes whose cell-type-specific AEIs are associated with the progression of type 2 diabetes. With the vast amount of easily accessible bulk RNA-seq data, we believe BSCET will be a valuable tool for elucidating cell type contributions in human diseases.
Affiliation(s)
- Jiaxin Fan
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
| | - Xuran Wang
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Rui Xiao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- * E-mail: (RX); (ML)
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- * E-mail: (RX); (ML)
| |
|
14
|
Tomlinson MJ, Polson SW, Qiu J, Lake JA, Lee W, Abasht B. Investigation of allele specific expression in various tissues of broiler chickens using the detection tool VADT. Sci Rep 2021; 11:3968. [PMID: 33597613 PMCID: PMC7889858 DOI: 10.1038/s41598-021-83459-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/14/2020] [Accepted: 02/01/2021] [Indexed: 12/30/2022] Open
Abstract
Differential abundance of allelic transcripts in a diploid organism, commonly referred to as allele specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assist in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large sub-set of the samples (n = 68). ASE analysis was performed using a custom software called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average ~ 174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~ 24,000 (~ 14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~ 83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.
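The binomial test mentioned in this abstract can be illustrated with a minimal sketch. This is not VADT's code: the function name `binomial_two_sided_p`, the null allelic ratio of 0.5, and the example read counts are all assumptions chosen for illustration, and the abstract does not say how VADT filters SNPs or corrects for multiple testing.

```python
from math import comb


def binomial_two_sided_p(ref_count: int, alt_count: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic read counts at one biallelic SNP.

    Hypothetical helper, not taken from VADT. Under the null hypothesis the
    reference allele is drawn with probability p_null (0.5 assumes balanced
    expression of the two alleles).
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads, so no evidence against the null
    # Probability of every possible reference-read count 0..n under the null.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Sum the outcomes that are no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Example with made-up counts: 10 reference reads and 0 alternative reads.
p = binomial_two_sided_p(10, 0)  # 2/1024 = 0.001953125
```

A SNP would be called as showing ASE when this p-value, after whatever multiple-testing correction the analysis uses, falls below the chosen significance threshold.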
Affiliation(s)
- M Joseph Tomlinson
- Department of Animal and Food Sciences, University of Delaware, 531 South College Ave, Newark, DE, 19716, USA; Center for Bioinformatics and Computational Biology, University of Delaware, Newark, USA
| | - Shawn W Polson
- Department of Computer and Information Sciences, University of Delaware, Newark, USA; Department of Biological Sciences, University of Delaware, Newark, USA; Center for Bioinformatics and Computational Biology, University of Delaware, Newark, USA
| | - Jing Qiu
- Department of Applied Economics and Statistics, University of Delaware, Newark, USA; Center for Bioinformatics and Computational Biology, University of Delaware, Newark, USA
| | - Juniper A Lake
- Department of Animal and Food Sciences, University of Delaware, 531 South College Ave, Newark, DE, 19716, USA; Center for Bioinformatics and Computational Biology, University of Delaware, Newark, USA
| | - William Lee
- Maple Leaf Farms, Inc., Leesburg, IN, 46538, USA
| | - Behnam Abasht
- Department of Animal and Food Sciences, University of Delaware, 531 South College Ave, Newark, DE, 19716, USA; Center for Bioinformatics and Computational Biology, University of Delaware, Newark, USA
| |
|
15
|
aScan: A Novel Method for the Study of Allele Specific Expression in Single Individuals. J Mol Biol 2021; 433:166829. [PMID: 33508309 DOI: 10.1016/j.jmb.2021.166829] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2020] [Revised: 01/08/2021] [Accepted: 01/09/2021] [Indexed: 02/06/2023]
Abstract
In diploid organisms, two copies of each allele are normally inherited from parents. Paternal and maternal alleles can be regulated and expressed unequally, which is referred to as allele-specific expression (ASE). In this work, we present aScan, a novel method for the identification of ASE from the analysis of matched individual genomic and RNA sequencing data. By performing extensive analyses of both real and simulated data, we demonstrate that aScan can correctly identify ASE with high accuracy and sensitivity in different experimental settings. Additionally, by applying our method to a small cohort of individuals that are not included in publicly available databases of human genetic variation, we outline the value of possible applications of ASE analysis in single individuals for deriving a more accurate annotation of "private" low-frequency genetic variants associated with regulatory effects on transcription. All in all, we believe that aScan will represent a beneficial addition to the set of bioinformatics tools for the analysis of ASE. Finally, while our method was initially conceived for the analysis of RNA-seq data, it can in principle be applied to any quantitative NGS assay for which matched genotypic and expression data are available. AVAILABILITY: aScan is currently available in the form of an open source standalone software package at: https://github.com/Federico77z/aScan/. aScan version 1.0.3, available at https://github.com/Federico77z/aScan/releases/tag/1.0.3, has been used for all the analyses included in this manuscript. A Docker image of the tool has also been made available at https://github.com/pmandreoli/aScanDocker.
|
16
|
Cooper RD, Shaffer HB. Allele-specific expression and gene regulation help explain transgressive thermal tolerance in non-native hybrids of the endangered California tiger salamander (Ambystoma californiense). Mol Ecol 2021; 30:987-1004. [PMID: 33338297 DOI: 10.1111/mec.15779] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/20/2019] [Revised: 11/30/2020] [Accepted: 12/11/2020] [Indexed: 01/26/2023]
Abstract
Hybridization between native and non-native species is an ongoing global conservation threat. Hybrids that exhibit traits and tolerances that surpass parental values are of particular concern, given their potential to outperform native species. Effective management of hybrid populations requires an understanding of both physiological performance and the underlying mechanisms that drive transgressive hybrid traits. Here, we explore several aspects of the hybridization between the endangered California tiger salamander (Ambystoma californiense; CTS) and the introduced barred tiger salamander (Ambystoma mavortium; BTS). We assayed critical thermal maximum (CTMax) to compare the ability of CTS, BTS and F1 hybrids to tolerate acute thermal stress, and found that hybrids exhibit a wide range of CTMax values, with 33% (4/12) able to tolerate temperatures greater than either parent. We then quantified the genomic response, measured at the RNA transcript level, of each salamander, to explore the mechanisms underlying thermal tolerance strategies. We found that CTS and BTS have strikingly different values and tissue-specific patterns of overall gene expression, with hybrids expressing intermediate values. F1 hybrids display abundant and variable degrees of allele-specific expression (ASE), likely arising from extensive compensatory evolution in gene regulatory mechanisms between CTS and BTS. We found evidence that the proportion of genes with allelic imbalance in individual hybrids correlates with their CTMax, suggesting a link between ASE and expanded thermal tolerance that may contribute to the success of hybrid salamanders in California. Future climate change may further complicate management of CTS if hybrid salamanders are better equipped to deal with rising temperatures.
Affiliation(s)
- Robert D Cooper
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA; La Kretz Center for California Conservation Science, Institute of the Environment and Sustainability, University of California, Los Angeles, CA, USA
| | - H Bradley Shaffer
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA; La Kretz Center for California Conservation Science, Institute of the Environment and Sustainability, University of California, Los Angeles, CA, USA
| |
|
17
|
Chandradoss KR, Chawla B, Dhuppar S, Nayak R, Ramachandran R, Kurukuti S, Mazumder A, Sandhu KS. CTCF-Mediated Genome Architecture Regulates the Dosage of Mitotically Stable Mono-allelic Expression of Autosomal Genes. Cell Rep 2020; 33:108302. [PMID: 33113374 DOI: 10.1016/j.celrep.2020.108302] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/12/2017] [Revised: 07/31/2020] [Accepted: 09/30/2020] [Indexed: 11/30/2022] Open
Abstract
The mechanisms that guide the clonally stable random mono-allelic expression of autosomal genes remain enigmatic. We show that (1) mono-allelically expressed (MAE) genes are assorted and insulated from bi-allelically expressed (BAE) genes through CTCF-mediated chromatin loops; (2) the cell-type-specific dynamics of mono-allelic expression coincides with the gain and loss of chromatin insulator sites; (3) dosage of MAE genes is more sensitive to the loss of chromatin insulation than that of BAE genes; and (4) inactive alleles of MAE genes are significantly more insulated than active alleles and are de-repressed upon CTCF depletion. This alludes to a topology wherein the inactive alleles of MAE genes are insulated from the spatial interference of transcriptional states from the neighboring bi-allelic domains via CTCF-mediated loops. We propose that CTCF functions as a typical insulator on inactive alleles, but facilitates transcription through enhancer-linking on active allele of MAE genes, indicating widespread allele-specific regulatory roles of CTCF.
Affiliation(s)
- Keerthivasan Raanin Chandradoss
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER)-Mohali, Knowledge City, Sector 81, SAS Nagar 140306, India
| | - Bindia Chawla
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER)-Mohali, Knowledge City, Sector 81, SAS Nagar 140306, India
| | - Shivnarayan Dhuppar
- TIFR Centre for Interdisciplinary Sciences, Tata Institute of Fundamental Research (TIFR) Hyderabad, 36/P, Gopanpally Village, Serilingampally Mandal, Hyderabad 500046, India
| | - Rakhee Nayak
- Department of Animal Biology, School of Life Sciences, University of Hyderabad, Prof. C.R. Rao Road, Gachibowli, Hyderabad 500046, India
| | - Rajesh Ramachandran
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER)-Mohali, Knowledge City, Sector 81, SAS Nagar 140306, India
| | - Sreenivasulu Kurukuti
- Department of Animal Biology, School of Life Sciences, University of Hyderabad, Prof. C.R. Rao Road, Gachibowli, Hyderabad 500046, India
| | - Aprotim Mazumder
- TIFR Centre for Interdisciplinary Sciences, Tata Institute of Fundamental Research (TIFR) Hyderabad, 36/P, Gopanpally Village, Serilingampally Mandal, Hyderabad 500046, India
| | - Kuljeet Singh Sandhu
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER)-Mohali, Knowledge City, Sector 81, SAS Nagar 140306, India.
| |
|
18
|
Fan J, Hu J, Xue C, Zhang H, Susztak K, Reilly MP, Xiao R, Li M. ASEP: Gene-based detection of allele-specific expression across individuals in a population by RNA sequencing. PLoS Genet 2020; 16:e1008786. [PMID: 32392242 PMCID: PMC7241832 DOI: 10.1371/journal.pgen.1008786] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 10/31/2019] [Revised: 05/21/2020] [Accepted: 04/21/2020] [Indexed: 12/16/2022] Open
Abstract
Allele-specific expression (ASE) analysis, which quantifies the relative expression of two alleles in a diploid individual, is a powerful tool for identifying cis-regulated gene expression variations that underlie phenotypic differences among individuals. Existing methods for gene-level ASE detection analyze one individual at a time, therefore failing to account for shared information across individuals. Failure to accommodate such shared information not only reduces power, but also makes it difficult to interpret results across individuals. However, when only RNA sequencing (RNA-seq) data are available, ASE detection across individuals is challenging because the data often include individuals that are either heterozygous or homozygous for the unobserved cis-regulatory SNP, leading to sample heterogeneity as only those heterozygous individuals are informative for ASE, whereas those homozygous individuals have balanced expression. To simultaneously model multi-individual information and account for such heterogeneity, we developed ASEP, a mixture model with subject-specific random effect to account for multi-SNP correlations within the same gene. ASEP only requires RNA-seq data, and is able to detect gene-level ASE under one condition and differential ASE between two conditions (e.g., pre- versus post-treatment). Extensive simulations demonstrated the convincing performance of ASEP under a wide range of scenarios. We applied ASEP to a human kidney RNA-seq dataset, identified ASE genes and validated our results with two published eQTL studies. We further applied ASEP to a human macrophage RNA-seq dataset, identified genes showing evidence of differential ASE between M0 and M1 macrophages, and confirmed our findings by results from cardiometabolic trait-relevant genome-wide association studies. To the best of our knowledge, ASEP is the first method for gene-level ASE detection at the population level that only requires the use of RNA-seq data. 
With the growing adoption of RNA-seq, we believe ASEP will be well-suited for various ASE studies for human diseases.
Affiliation(s)
- Jiaxin Fan
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
| | - Jian Hu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
| | - Chenyi Xue
- Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York City, New York, United States of America
| | - Hanrui Zhang
- Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York City, New York, United States of America
| | - Katalin Susztak
- Departments of Medicine and Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
| | - Muredach P. Reilly
- Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York City, New York, United States of America
- The Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York City, New York, United States of America
| | - Rui Xiao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
| |
|
19
|
Muriuki C, Bush SJ, Salavati M, McCulloch ME, Lisowski ZM, Agaba M, Djikeng A, Hume DA, Clark EL. A Mini-Atlas of Gene Expression for the Domestic Goat (Capra hircus). Front Genet 2019; 10:1080. [PMID: 31749840 PMCID: PMC6844187 DOI: 10.3389/fgene.2019.01080] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/24/2019] [Accepted: 10/09/2019] [Indexed: 12/12/2022] Open
Abstract
Goats (Capra hircus) are an economically important livestock species providing meat and milk across the globe. They are of particular importance in tropical agri-systems contributing to sustainable agriculture, alleviation of poverty, social cohesion, and utilisation of marginal grazing. There are excellent genetic and genomic resources available for goats, including a highly contiguous reference genome (ARS1). However, gene expression information is limited in comparison to other ruminants. To support functional annotation of the genome and comparative transcriptomics, we created a mini-atlas of gene expression for the domestic goat. RNA-Seq analysis of 17 transcriptionally rich tissues and 3 cell-types detected the majority (90%) of predicted protein-coding transcripts and assigned informative gene names to more than 1000 previously unannotated protein-coding genes in the current reference genome for goat (ARS1). Using network-based cluster analysis, we grouped genes according to their expression patterns and assigned those groups of coexpressed genes to specific cell populations or pathways. We describe clusters of genes expressed in the gastro-intestinal tract and provide the expression profiles across tissues of a subset of genes associated with functional traits. Comparative analysis of the goat atlas with the larger sheep gene expression atlas dataset revealed transcriptional similarities between macrophage associated signatures in the sheep and goats sampled in this study. The goat transcriptomic resource complements the large gene expression dataset we have generated for sheep and contributes to the available genomic resources for interpretation of the relationship between genotype and phenotype in small ruminants.
Affiliation(s)
- Charity Muriuki
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Centre for Tropical Livestock Genetics and Health (CTLGH), Edinburgh, United Kingdom
| | - Stephen J. Bush
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Nuffield Department of Clinical Medicine, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
| | - Mazdak Salavati
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Centre for Tropical Livestock Genetics and Health (CTLGH), Edinburgh, United Kingdom
| | - Mary E.B. McCulloch
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
| | - Zofia M. Lisowski
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
| | - Morris Agaba
- Biosciences Eastern and Central Africa - International Livestock Research Institute (BecA - ILRI) Hub, Nairobi, Kenya
| | - Appolinaire Djikeng
- Centre for Tropical Livestock Genetics and Health (CTLGH), Edinburgh, United Kingdom
| | - David A. Hume
- Mater Research Institute-University of Queensland, Woolloongabba, QLD, Australia
| | - Emily L. Clark
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Centre for Tropical Livestock Genetics and Health (CTLGH), Edinburgh, United Kingdom
| |
|
20
|
Salavati M, Bush SJ, Palma-Vera S, McCulloch MEB, Hume DA, Clark EL. Elimination of Reference Mapping Bias Reveals Robust Immune Related Allele-Specific Expression in Crossbred Sheep. Front Genet 2019; 10:863. [PMID: 31608110 PMCID: PMC6761296 DOI: 10.3389/fgene.2019.00863] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/25/2019] [Accepted: 08/19/2019] [Indexed: 12/13/2022] Open
Abstract
Pervasive allelic variation at both gene and single nucleotide level (SNV) between individuals is commonly associated with complex traits in humans and animals. Allele-specific expression (ASE) analysis, using RNA-Seq, can provide a detailed annotation of allelic imbalance and infer the existence of cis-acting transcriptional regulation. However, variant detection in RNA-Seq data is compromised by biased mapping of reads to the reference DNA sequence. In this manuscript, we describe an unbiased standardized computational pipeline for allele-specific expression analysis using RNA-Seq data, which we have adapted and developed using tools available under open license. The analysis pipeline we present is designed to minimize reference bias while providing accurate profiling of allele-specific expression across tissues and cell types. Using this methodology, we were able to profile pervasive allelic imbalance across tissues and cell types, at both the gene and SNV level, in Texel×Scottish Blackface sheep, using the sheep gene expression atlas data set. ASE profiles were pervasive in each sheep and across all tissue types investigated. However, ASE profiles shared across tissues were limited, and instead, they tended to be highly tissue-specific. These tissue-specific ASE profiles may underlie the expression of economically important traits and could be utilized as weighted SNVs, for example, to improve the accuracy of genomic selection in breeding programs for sheep. An additional benefit of the pipeline is that it does not require parental genotypes and can therefore be applied to other RNA-Seq data sets for livestock, including those available on the Functional Annotation of Animal Genomes (FAANG) data portal. This study is the first global characterization of moderate to extreme ASE in tissues and cell types from sheep. 
We have applied a robust methodology for ASE profiling to provide both a novel analysis of the multi-dimensional sheep gene expression atlas data set and a foundation for identifying the regulatory and expressed elements of the genome that are driving complex traits in livestock.
Affiliation(s)
- Mazdak Salavati
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, United Kingdom
- Stephen J. Bush
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, United Kingdom
- Sergio Palma-Vera
- Leibniz Institute for Farm Animal Biology (FBN), Institute for Reproductive Biology, Dummerstorf, Germany
- Mary E. B. McCulloch
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, United Kingdom
- David A. Hume
- Mater Research Institute-University of Queensland, Translational Research Institute, Woolloongabba, QLD, Australia
- Emily L. Clark
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, United Kingdom
21
Young R, Lefevre L, Bush SJ, Joshi A, Singh SH, Jadhav SK, Dhanikachalam V, Lisowski ZM, Iamartino D, Summers KM, Williams JL, Archibald AL, Gokhale S, Kumar S, Hume DA. A Gene Expression Atlas of the Domestic Water Buffalo (Bubalus bubalis). Front Genet 2019; 10:668. [PMID: 31428126 PMCID: PMC6689995 DOI: 10.3389/fgene.2019.00668] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 04/30/2019] [Accepted: 06/26/2019] [Indexed: 12/19/2022] Open
Abstract
The domestic water buffalo (Bubalus bubalis) makes a major contribution to the global agricultural economy in the form of milk, meat, hides, and draught power. The global water buffalo population is predominantly found in Asia, and per head of population more people depend upon the buffalo than on any other livestock species. Despite its agricultural importance, there are fewer genomic and transcriptomic resources available for buffalo than for other livestock species. We have generated a large-scale gene expression atlas covering multiple tissue and cell types from all major organ systems collected from three breeds of riverine water buffalo (Mediterranean, Pandharpuri and Bhadawari) and used the network analysis tool Graphia Professional to identify clusters of genes with similar expression profiles. Alongside similar data that we and others have generated for ruminants as part of the Functional Annotation of Animal Genomes Consortium, this comprehensive transcriptome supports functional annotation and comparative analysis of the water buffalo genome.
Affiliation(s)
- Rachel Young
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Lucas Lefevre
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Stephen J. Bush
- Nuffield Department of Clinical Medicine, University of Oxford, Oxford, United Kingdom
- Akshay Joshi
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Velu Dhanikachalam
- Central Research Station, BAIF Development Research Foundation, Pune, India
- Zofia M. Lisowski
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Kim M. Summers
- Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- John L. Williams
- Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Adelaide, SA, Australia
- Alan L. Archibald
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Suresh Gokhale
- Central Research Station, BAIF Development Research Foundation, Pune, India
- Satish Kumar
- Centre for Cellular and Molecular Biology, Hyderabad, India
- School of Life Science, Central University of Haryana, Mahendergargh, India
- David A. Hume
- Mater Research Institute-University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
22
Ahn B, Choi MK, Yum J, Cho IC, Kim JH, Park C. Analysis of allele-specific expression using RNA-seq of the Korean native pig and Landrace reciprocal cross. ASIAN-AUSTRALASIAN JOURNAL OF ANIMAL SCIENCES 2019; 32:1816-1825. [PMID: 31208168 PMCID: PMC6819674 DOI: 10.5713/ajas.19.0097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/01/2019] [Accepted: 05/25/2019] [Indexed: 11/27/2022]
Abstract
Objective: We tried to analyze allele-specific expression in the pig neocortex using bioinformatic analysis of high-throughput sequencing results from the parental genomes and offspring transcriptomes from reciprocal crosses between Korean Native and Landrace pigs. Methods: We carried out sequencing of parental genomes and offspring transcriptomes using next generation sequencing. We subsequently carried out genome scale identification of single nucleotide polymorphisms (SNPs) in two different ways using either individual genome mapping or joint genome mapping of the same breed parents that were used for the reciprocal crosses. Using parent-specific SNPs, allele-specifically expressed genes were analyzed. Results: Because of the low genome coverage (~4×) of the sequencing results, most SNPs were non-informative for parental lineage determination of the expressed alleles in the offspring and were thus excluded from our analysis. Consequently, 436 SNPs covering 336 genes were applicable to measure the imbalanced expression of paternal alleles in the offspring. By calculating the read ratios of parental alleles in the offspring, we identified seven genes showing allele-biased expression (p<0.05) including three previously reported and four newly identified genes in this study. Conclusion: The newly identified allele-specifically expressing genes in the neocortex of pigs should contribute to improving our knowledge on genomic imprinting in pigs. To our knowledge, this is the first study of allelic imbalance using high throughput analysis of both parental genomes and offspring transcriptomes of the reciprocal cross in outbred animals. Our study also showed the effect of the number of informative animals on the genome level investigation of allele-specific expression using RNA-seq analysis in livestock species.
Affiliation(s)
- Byeongyong Ahn
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Korea
- Min-Kyeung Choi
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Korea
- Joori Yum
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Korea
- In-Cheol Cho
- Subtropical Livestock Research Institute, National Institute of Animal Science, Jeju 63242, Korea
- Jin-Hoi Kim
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Korea
- Chankyu Park
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Korea
23
Sandler G, Beaudry FEG, Barrett SCH, Wright SI. The effects of haploid selection on Y chromosome evolution in two closely related dioecious plants. Evol Lett 2018; 2:368-377. [PMID: 30283688 PMCID: PMC6121804 DOI: 10.1002/evl3.60] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/13/2018] [Revised: 05/08/2018] [Accepted: 05/14/2018] [Indexed: 01/21/2023] Open
Abstract
The evolution of sex chromosomes is usually considered to be driven by sexually antagonistic selection in the diploid phase. However, selection during the haploid gametic phase of the lifecycle has recently received theoretical attention as possibly playing a central role in sex chromosome evolution, especially in plants where gene expression in the haploid phase is extensive. In particular, male‐specific haploid selection might favor the linkage of pollen beneficial alleles to male sex determining regions on incipient Y chromosomes. This linkage might then allow such alleles to further specialize for the haploid phase. Purifying haploid selection is also expected to slow the degeneration of Y‐linked genes expressed in the haploid phase. Here, we examine the evolution of gene expression in flower buds and pollen of two species of Rumex to test for signatures of haploid selection acting during plant sex chromosome evolution. We find that genes with high ancestral pollen expression bias occur more often on sex chromosomes than autosomes and that genes on the Y chromosome are more likely to become enriched for pollen expression bias. We also find that genes with low expression in pollen are more likely to be lost from the Y chromosome. Our results suggest that sex‐specific haploid selection during the gametophytic stage of the lifecycle may be a major contributor to several features of plant sex chromosome evolution.
Affiliation(s)
- George Sandler
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
- Felix E G Beaudry
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
- Spencer C H Barrett
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
- Stephen I Wright
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
24
Wang M, Uebbing S, Pawitan Y, Scofield DG. RPASE: Individual-based allele-specific expression detection without prior knowledge of haplotype phase. Mol Ecol Resour 2018; 18:1247-1262. [PMID: 29858523 DOI: 10.1111/1755-0998.12909] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/21/2017] [Revised: 05/09/2018] [Accepted: 05/21/2018] [Indexed: 01/04/2023]
Abstract
Variation in gene expression is believed to make a significant contribution to phenotypic diversity and divergence. The analysis of allele-specific expression (ASE) can reveal important insights into gene expression regulation. We developed a novel method called RPASE (Read-backed Phasing-based ASE detection) to test for genes that show ASE. With mapped RNA-seq data from a single individual and a list of SNPs from the same individual as the only input, RPASE is capable of aggregating information across multiple dependent SNPs and producing individual-based gene-level tests for ASE. RPASE performs well in simulations and comparisons. We applied RPASE to multiple bird species and found a potentially rich landscape of ASE.
Affiliation(s)
- Mi Wang
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Severin Uebbing
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Douglas G Scofield
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
25
Spurr L, Li M, Alomran N, Zhang Q, Restrepo P, Movassagh M, Trenkov C, Tunnessen N, Apanasovich T, Crandall KA, Edwards N, Horvath A. Systematic pan-cancer analysis of somatic allele frequency. Sci Rep 2018; 8:7735. [PMID: 29769535 PMCID: PMC5956099 DOI: 10.1038/s41598-018-25462-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 12/29/2017] [Accepted: 04/11/2018] [Indexed: 12/31/2022] Open
Abstract
Imbalanced expression of somatic alleles in cancer can suggest functional and selective features, and can therefore indicate possible driving potential of the underlying genetic variants. To explore the correlation between allele frequency of somatic variants and total gene expression of their harboring gene, we used the unique data set of matched tumor and normal RNA and DNA sequencing data of 5523 distinct single nucleotide variants in 381 individuals across 10 cancer types obtained from The Cancer Genome Atlas (TCGA). We analyzed the allele frequency in the context of the variant and gene functional features and linked it with changes in the total gene expression. We documented higher allele frequency of somatic variants in cancer-implicated genes (Cancer Gene Census, CGC). Furthermore, somatic alleles bearing premature terminating variants (PTVs), when positioned in CGC genes, appeared to be less frequently degraded via nonsense-mediated mRNA decay, indicating possible favoring of truncated proteins by the tumor transcriptome. Among the genes with multiple PTVs with high allele frequency, ARID1, TP53 and NSD1 were known key cancer genes. All together, our analyses suggest that high allele frequency of tumor somatic variants can indicate driving functionality and can serve to identify potential cancer-implicated genes.
Affiliation(s)
- Liam Spurr
- Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Muzi Li
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; Department of Biochemistry and Molecular and Cellular Biology, Georgetown University, School of Medicine, Washington, DC, 20057, USA
- Nawaf Alomran
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; Department of Biochemistry and Molecular and Cellular Biology, Georgetown University, School of Medicine, Washington, DC, 20057, USA
- Qianqian Zhang
- Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; Department of Biochemistry and Molecular Medicine, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Paula Restrepo
- Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Mercedeh Movassagh
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, 01605, USA
- Chris Trenkov
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Nerissa Tunnessen
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Tatiyana Apanasovich
- Department of Statistics, The George Washington University, Washington, DC, 20037, USA
- Keith A Crandall
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC, 20052, USA
- Nathan Edwards
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; Department of Biochemistry and Molecular and Cellular Biology, Georgetown University, School of Medicine, Washington, DC, 20057, USA
- Anelia Horvath
- Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; Department of Biochemistry and Molecular Medicine, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC, 20052, USA
26
Direct Testing for Allele-Specific Expression Differences Between Conditions. G3-GENES GENOMES GENETICS 2018; 8:447-460. [PMID: 29167272 PMCID: PMC5919738 DOI: 10.1534/g3.117.300139] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/25/2022]
Abstract
Allelic imbalance (AI) indicates the presence of functional variation in cis regulatory regions. Detecting cis regulatory differences using AI is widespread, yet there is no formal statistical methodology that tests whether AI differs between conditions. Here, we present a novel model and formally test differences in AI across conditions using Bayesian credible intervals. The approach tests AI by environment (G×E) interactions, and can be used to test AI between environments, genotypes, sex, and any other condition. We incorporate bias into the modeling process. Bias is allowed to vary between conditions, making the formulation of the model general. As gene expression affects power for detection of AI, and, as expression may vary between conditions, the model explicitly takes coverage into account. The proposed model has low type I and II error under several scenarios, and is robust to large differences in coverage between conditions. We reanalyze RNA-seq data from a Drosophila melanogaster population panel, with F1 genotypes, to compare levels of AI between mated and virgin female flies, and we show that AI × genotype interactions can also be tested. To demonstrate the use of the model to test genetic differences and interactions, a formal test between two F1s was performed, showing the expected 20% difference in AI. The proposed model allows a formal test of G×E and G×G, and reaffirms a previous finding that cis regulation is robust between environments.
27
Esteve-Codina A. RNA-Seq Data Analysis, Applications and Challenges. COMPREHENSIVE ANALYTICAL CHEMISTRY 2018. [DOI: 10.1016/bs.coac.2018.06.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 02/08/2023]
28
Wang M, Uebbing S, Ellegren H. Bayesian Inference of Allele-Specific Gene Expression Indicates Abundant Cis-Regulatory Variation in Natural Flycatcher Populations. Genome Biol Evol 2017; 9:1266-1279. [PMID: 28453623 PMCID: PMC5434935 DOI: 10.1093/gbe/evx080] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Accepted: 04/25/2017] [Indexed: 12/13/2022] Open
Abstract
Polymorphism in cis-regulatory sequences can lead to different levels of expression for the two alleles of a gene, providing a starting point for the evolution of gene expression. Little is known about the genome-wide abundance of genetic variation in gene regulation in natural populations but analysis of allele-specific expression (ASE) provides a means for investigating such variation. We performed RNA-seq of multiple tissues from population samples of two closely related flycatcher species and developed a Bayesian algorithm that maximizes data usage by borrowing information from the whole data set and combines several SNPs per transcript to detect ASE. Of 2,576 transcripts analyzed in collared flycatcher, ASE was detected in 185 (7.2%) and a similar frequency was seen in the pied flycatcher. Transcripts with statistically significant ASE commonly showed the major allele in >90% of the reads, reflecting that power was highest when expression was heavily biased toward one of the alleles. This would suggest that the observed frequencies of ASE likely are underestimates. The proportion of ASE transcripts varied among tissues, being lowest in testis and highest in muscle. Individuals often showed ASE of particular transcripts in more than one tissue (73.4%), consistent with a genetic basis for regulation of gene expression. The results suggest that genetic variation in regulatory sequences commonly affects gene expression in natural populations and that it provides a seedbed for phenotypic evolution via divergence in gene expression.
Affiliation(s)
- Mi Wang
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Sweden
- Severin Uebbing
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Sweden
- Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Sweden
29
RNA-Seq Analyses Identify Frequent Allele Specific Expression and No Evidence of Genomic Imprinting in Specific Embryonic Tissues of Chicken. Sci Rep 2017; 7:11944. [PMID: 28931927 PMCID: PMC5607270 DOI: 10.1038/s41598-017-12179-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 03/06/2017] [Accepted: 09/05/2017] [Indexed: 12/30/2022] Open
Abstract
Epigenetic and genetic cis-regulatory elements in diploid organisms may cause allele specific expression (ASE) – unequal expression of the two chromosomal gene copies. Genomic imprinting is an intriguing type of ASE in which some genes are expressed monoallelically from either the paternal allele or maternal allele as a result of epigenetic modifications. Imprinted genes have been identified in several animal species and are frequently associated with embryonic development and growth. Whether genomic imprinting exists in chickens remains debatable, as previous studies have reported conflicting evidence. Albeit no genomic imprinting has been reported in the chicken embryo as a whole, we interrogated the existence or absence of genomic imprinting in the 12-day-old chicken embryonic brain and liver by examining ASE in F1 reciprocal crosses of two highly inbred chicken lines (Fayoumi and Leghorn). We identified 5197 and 4638 ASE SNPs, corresponding to 18.3% and 17.3% of the genes with a detectable expression in the embryonic brain and liver, respectively. There was no evidence detected of genomic imprinting in 12-day-old embryonic brain and liver. While ruling out the possibility of imprinted Z-chromosome inactivation, our results indicated that Z-linked gene expression is partially compensated between sexes in chickens.
30
Lonsdale Z, Lee K, Kiriakidu M, Amarasinghe H, Nathanael D, O’Connor CJ, Mallon EB. Allele specific expression and methylation in the bumblebee, Bombus terrestris. PeerJ 2017; 5:e3798. [PMID: 28929021 PMCID: PMC5600721 DOI: 10.7717/peerj.3798] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/23/2017] [Accepted: 08/21/2017] [Indexed: 12/29/2022] Open
Abstract
The social hymenoptera are emerging as models for epigenetics. DNA methylation, the addition of a methyl group, is a common epigenetic marker. In mammals and flowering plants methylation affects allele specific expression. There is contradictory evidence for the role of methylation on allele specific expression in social insects. The aim of this paper is to investigate allele specific expression and monoallelic methylation in the bumblebee, Bombus terrestris. We found nineteen genes that were both monoallelically methylated and monoallelically expressed in a single bee. Fourteen of these genes express the hypermethylated allele, while the other five express the hypomethylated allele. We also searched for allele specific expression in twenty-nine published RNA-seq libraries. We found 555 loci with allele-specific expression. We discuss our results with reference to the functional role of methylation in gene expression in insects and in the as yet unquantified role of genetic cis effects in insect allele specific methylation and expression.
Affiliation(s)
- Zoë Lonsdale
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Kate Lee
- Bioinformatics and Biostatistics Support Hub (B/BASH), University of Leicester, Leicester, United Kingdom
- Maria Kiriakidu
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Harindra Amarasinghe
- Academic Unit of Cancer Sciences, University of Southampton, Southampton, United Kingdom
- Despina Nathanael
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Eamonn B. Mallon
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
31
Restrepo P, Movassagh M, Alomran N, Miller C, Li M, Trenkov C, Manchev Y, Bahl S, Warnken S, Spurr L, Apanasovich T, Crandall K, Edwards N, Horvath A. Overexpressed somatic alleles are enriched in functional elements in Breast Cancer. Sci Rep 2017; 7:8287. [PMID: 28811643 PMCID: PMC5557904 DOI: 10.1038/s41598-017-08416-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/01/2017] [Accepted: 07/10/2017] [Indexed: 12/31/2022] Open
Abstract
Asymmetric allele content in the transcriptome can be indicative of functional and selective features of the underlying genetic variants. Yet, imbalanced alleles, especially from diploid genome regions, are poorly explored in cancer. Here we systematically quantify and integrate the variant allele fraction from corresponding RNA and DNA sequence data from patients with breast cancer acquired through The Cancer Genome Atlas (TCGA). We test for correlation between allele prevalence and functionality in known cancer-implicated genes from the Cancer Gene Census (CGC). We document significant allele-preferential expression of functional variants in CGC genes and across the entire dataset. Notably, we find frequent allele-specific overexpression of variants in tumor-suppressor genes. We also report a list of over-expressed variants from non-CGC genes. Overall, our analysis presents an integrated set of features of somatic allele expression and points to the vast information content of the asymmetric alleles in the cancer transcriptome.
Affiliation(s)
- Paula Restrepo
- Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Mercedeh Movassagh
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, 01605, USA
- Nawaf Alomran
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; Department of Biochemistry and Molecular and Cellular Biology, Georgetown University, School of Medicine, Washington, DC, 20057, USA
- Christian Miller
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Muzi Li
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; Department of Biochemistry and Molecular and Cellular Biology, Georgetown University, School of Medicine, Washington, DC, 20057, USA
- Chris Trenkov
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Yulian Manchev
- McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Sonali Bahl
- Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Stephanie Warnken
- Computational Biology Institute, The George Washington University, Washington, DC, 20037, USA
- Liam Spurr
- Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Tatiyana Apanasovich
- Department of Statistics, The George Washington University, Washington, DC, 20037, USA
- Keith Crandall
- Computational Biology Institute, The George Washington University, Washington, DC, 20037, USA
- Nathan Edwards
- Department of Biochemistry and Molecular and Cellular Biology, Georgetown University, School of Medicine, Washington, DC, 20057, USA
- Anelia Horvath
- Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA; Department of Statistics, The George Washington University, Washington, DC, 20037, USA; Department of Biochemistry and Molecular Medicine, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA