1
The genomes of 204 Vitis vinifera accessions reveal the origin of European wine grapes. Nat Commun 2021; 12:7240. [PMID: 34934047 PMCID: PMC8692429 DOI: 10.1038/s41467-021-27487-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 01/15/2021] [Accepted: 11/18/2021] [Indexed: 01/29/2023]
Abstract
In order to elucidate the still controversial processes that originated European wine grapes from their wild progenitor, here we analyse 204 genomes of Vitis vinifera and show that all analyses support a single domestication event that occurred in Western Asia and was followed by numerous and pervasive introgressions from European wild populations. This admixture generated the so-called international wine grapes that have diffused from Alpine countries worldwide. Across Europe, marked differences in genomic diversity are observed in local varieties that are traditionally cultivated in different wine producing countries, with Italy and France showing the largest diversity. Three genomic regions of reduced genetic diversity are observed, presumably as a consequence of artificial selection. In the lowest diversity region, two candidate genes that gained berry-specific expression in domesticated varieties may contribute to the change in berry size and morphology that makes the fruit attractive for human consumption and adapted for winemaking.
Reports on the origin of European wine grapes are controversial. Here, the authors perform population genetics analyses on a large set of representative wine-making varieties and reveal a single domestication event at the origin of the entire germplasm followed by repeated introgression from wild populations.
2
Valentini S, Marchioretti C, Bisio A, Rossi A, Zaccara S, Romanel A, Inga A. TranSNPs: A class of functional SNPs affecting mRNA translation potential revealed by fraction-based allelic imbalance. iScience 2021; 24:103531. [PMID: 34917903 PMCID: PMC8666669 DOI: 10.1016/j.isci.2021.103531] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/08/2021] [Revised: 10/27/2021] [Accepted: 11/23/2021] [Indexed: 12/23/2022]
Abstract
Few studies have explored the association between SNPs and alterations in mRNA translation potential. We developed an approach to identify SNPs that can mark allele-specific protein expression levels and could represent sources of inter-individual variation in disease risk. Using MCF7 cells under different treatments, we performed polysomal profiling followed by RNA sequencing of total or polysome-associated mRNA fractions and designed a computational approach to identify SNPs showing a significant change in the allelic balance between total and polysomal mRNA fractions. We identified 147 SNPs, 39 of which are located in UTRs. Allele-specific differences at the translation level were confirmed in transfected MCF7 cells by reporter assays. Exploiting breast cancer data from TCGA, we identified UTR SNPs demonstrating distinct prognosis features and altering binding sites of RNA-binding proteins. Our approach produced a catalog of tranSNPs, a class of functional SNPs associated with allele-specific translation and potentially endowed with prognostic value for disease risk.
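As a rough illustration of comparing allelic balance between two mRNA fractions, the sketch below applies a two-sided Fisher's exact test to reference/alternative read counts from the total and polysomal fractions at one heterozygous SNP. This is not the authors' computational approach, which the abstract does not specify; the function names and the example counts are hypothetical.

```python
import math


def hypergeom_pmf(a, row1, row2, col1):
    """Probability that the top-left cell equals a, given fixed table margins."""
    return math.comb(row1, a) * math.comb(row2, col1 - a) / math.comb(row1 + row2, col1)


def allelic_shift_p_value(ref_total, alt_total, ref_poly, alt_poly):
    """Two-sided Fisher's exact test on a 2x2 table of allele counts.

    Rows are mRNA fractions (total, polysomal); columns are alleles (ref, alt).
    """
    row1 = ref_total + alt_total      # reads in the total fraction
    row2 = ref_poly + alt_poly        # reads in the polysomal fraction
    col1 = ref_total + ref_poly       # reference-allele reads in both fractions
    lowest = max(0, col1 - row2)      # smallest feasible top-left cell
    highest = min(row1, col1)         # largest feasible top-left cell
    observed = hypergeom_pmf(ref_total, row1, row2, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = 0.0
    for a in range(lowest, highest + 1):
        prob = hypergeom_pmf(a, row1, row2, col1)
        if prob <= observed * (1 + 1e-7):
            p_value += prob
    return min(1.0, p_value)


# Hypothetical counts: balanced in total mRNA, skewed toward ref on polysomes.
p_shift = allelic_shift_p_value(ref_total=50, alt_total=50, ref_poly=80, alt_poly=20)
# Hypothetical counts: the same allelic balance in both fractions.
p_same = allelic_shift_p_value(ref_total=50, alt_total=50, ref_poly=50, alt_poly=50)
```

When many SNPs are screened at once, p-values of this kind would need a multiple-testing correction before any SNP is called significant.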
Affiliation(s)
- Samuel Valentini
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
- Caterina Marchioretti
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
  - Department of Biomedical Sciences (DBS), University of Padova, 35131 Padova, Italy
- Alessandra Bisio
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
- Annalisa Rossi
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
- Sara Zaccara
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
  - Weill Medical College, Cornell University, New York 10065, NY, USA
- Alessandro Romanel
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
- Alberto Inga
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, 38123 Trento, Italy
3
Sherbina K, León-Novelo LG, Nuzhdin SV, McIntyre LM, Marroni F. Power calculator for detecting allelic imbalance using hierarchical Bayesian model. BMC Res Notes 2021; 14:436. [PMID: 34838135 PMCID: PMC8626927 DOI: 10.1186/s13104-021-05851-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022]
Abstract
OBJECTIVE: Allelic imbalance (AI) is the differential expression of the two alleles in a diploid. AI can vary between tissues, treatments, and environments. Methods for testing AI exist, but methods are needed to estimate type I error and power for detecting AI and difference of AI between conditions. As the costs of the technology plummet, what is more important: reads or replicates? RESULTS: We find that a minimum of 2400, 480, and 240 allele specific reads divided equally among 12, 5, and 3 replicates is needed to detect a 10, 20, and 30%, respectively, deviation from allelic balance in a condition with power > 80%. A minimum of 960 and 240 allele specific reads divided equally among 8 replicates is needed to detect a 20 or 30% difference in AI between conditions with comparable power. Higher numbers of replicates increase power more than adding coverage without affecting type I error. We provide a Python package that enables simulation of AI scenarios and enables individuals to estimate type I error and power in detecting AI and differences in AI between conditions.
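To make the idea of estimating type I error and power by simulation concrete, here is a minimal sketch. It is not the hierarchical Bayesian model or the Python package described in the abstract: it assumes an exact two-sided binomial test on pooled allele-specific reads with no variation between replicates, so it cannot address the reads-versus-replicates question. The function names, the read depth, and the assumed reference-allele fraction of 0.65 are illustrative choices that do not correspond to the scenarios reported above.

```python
import math
import random


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def rejection_counts(n, alpha=0.05, p0=0.5):
    """Reference-allele counts whose exact two-sided p-value is below alpha."""
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    rejected = set()
    for k in range(n + 1):
        # Two-sided p-value: total mass of outcomes no more likely than k.
        p_value = sum(q for q in pmf if q <= pmf[k] * (1 + 1e-7))
        if p_value < alpha:
            rejected.add(k)
    return rejected


def rejection_rate(n_reads, theta, alpha=0.05, n_sim=500, seed=0):
    """Fraction of simulated experiments in which allelic balance is rejected.

    theta is the assumed true fraction of reads from the reference allele.
    """
    rng = random.Random(seed)
    rejected = rejection_counts(n_reads, alpha)
    hits = 0
    for _ in range(n_sim):
        ref_reads = sum(rng.random() < theta for _ in range(n_reads))
        hits += ref_reads in rejected
    return hits / n_sim


type_one_error = rejection_rate(240, theta=0.5)   # simulated under true balance
power = rejection_rate(240, theta=0.65)           # simulated under assumed imbalance
```

Adding between-replicate variation (for example, drawing a separate theta per replicate) would be the natural next step, but it requires a test that models that variation, which this sketch deliberately leaves out.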
Affiliation(s)
- Katrina Sherbina
  - Quantitative and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
- Luis G León-Novelo
  - Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston-School of Public Health, Houston, TX, 77030, USA
- Sergey V Nuzhdin
  - Molecular and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
- Lauren M McIntyre
  - Genetics Institute and Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32603, USA
- Fabio Marroni
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali, Università di Udine, 33100, Udine, Italy
4
Schwope R, Magris G, Miculan M, Paparelli E, Celii M, Tocci A, Marroni F, Fornasiero A, De Paoli E, Morgante M. Open chromatin in grapevine marks candidate CREs and with other chromatin features correlates with gene expression. The Plant Journal: For Cell and Molecular Biology 2021; 107:1631-1647. [PMID: 34219317 PMCID: PMC8518642 DOI: 10.1111/tpj.15404] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/07/2020] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 05/14/2023]
Abstract
Vitis vinifera is an economically important crop and a useful model in which to study chromatin dynamics. In contrast to the small and relatively simple genome of Arabidopsis thaliana, grapevine contains a complex genome of 487 Mb that exhibits extensive colonization by transposable elements. We used Hi-C, ChIP-seq and ATAC-seq to measure how chromatin features correlate to the expression of 31 845 grapevine genes. ATAC-seq revealed the presence of more than 16 000 open chromatin regions, of which we characterize nearly 5000 as possible distal enhancer candidates that occur in intergenic space > 2 kb from the nearest transcription start site (TSS). A motif search identified more than 480 transcription factor (TF) binding sites in these regions, with those for TCP family proteins in greatest abundance. These open chromatin regions are typically within 15 kb from their nearest promoter, and a gene ontology analysis indicated that their nearest genes are significantly enriched for TF activity. The presence of a candidate cis-regulatory element (cCRE) > 2 kb upstream of the TSS, location in the active nuclear compartment as determined by Hi-C, and the enrichment of H3K4me3, H3K4me1 and H3K27ac at the gene are correlated with gene expression. Taken together, these results suggest that regions of intergenic open chromatin identified by ATAC-seq can be considered potential candidates for cis-regulatory regions in V. vinifera. Our findings enhance the characterization of a valuable agricultural crop, and help to clarify the understanding of unique plant biology.
Affiliation(s)
- Rachel Schwope
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
  - Istituto di Genomica Applicata, Udine I-33100, Italy
- Gabriele Magris
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
  - Istituto di Genomica Applicata, Udine I-33100, Italy
- Mara Miculan
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
  - Istituto di Genomica Applicata, Udine I-33100, Italy
  - Present address: Institute of Life Sciences, Scuola Superiore Sant'Anna Pisa, Pisa 56127, Italy
- Eleonora Paparelli
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
  - Istituto di Genomica Applicata, Udine I-33100, Italy
  - Present address: IGA Technology Services, Udine I-33100, Italy
- Mirko Celii
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
  - Istituto di Genomica Applicata, Udine I-33100, Italy
  - Present address: Center for Desert Agriculture, Biological and Environmental Sciences & Engineering Division (BESE), KAUST, Thuwal, Makkah, Saudi Arabia
- Aldo Tocci
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
  - Istituto di Genomica Applicata, Udine I-33100, Italy
  - Scuola Internazionale Superiore di Studi Avanzati, Trieste, Friuli-Venezia Giulia, Italy
- Fabio Marroni
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
  - Istituto di Genomica Applicata, Udine I-33100, Italy
- Alice Fornasiero
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
  - Istituto di Genomica Applicata, Udine I-33100, Italy
  - Present address: Center for Desert Agriculture, Biological and Environmental Sciences & Engineering Division (BESE), KAUST, Thuwal, Makkah, Saudi Arabia
- Emanuele De Paoli
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
- Michele Morgante
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali (DI4A), Udine I-33100, Italy
  - Istituto di Genomica Applicata, Udine I-33100, Italy
5
Abstract
Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active or active at the same level in a given individual. Genomic imprinting, resulting in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of data generated by RNA sequencing applied to large numbers of individuals and tissue types has generated unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review the insights gained through the analysis of these data about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for gene expression imbalance, and what the results obtained reveal about the contribution of genetic variants that alter gene expression to complex human diseases and phenotypes.
Affiliation(s)
- Siobhan Cleary
  - School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway H91 H3CY, Ireland
- Cathal Seoighe
  - School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway H91 H3CY, Ireland
6
Groza C, Kwan T, Soranzo N, Pastinen T, Bourque G. Personalized and graph genomes reveal missing signal in epigenomic data. Genome Biol 2020; 21:124. [PMID: 32450900 PMCID: PMC7249353 DOI: 10.1186/s13059-020-02038-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 08/30/2019] [Accepted: 05/08/2020] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Epigenomic studies that use next generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results. RESULTS: We show that accounting for genetic variation using a modified reference genome or a de novo assembled genome can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls either by creating new personal peaks or by the loss of reference peaks. Using permissive cutoffs, modified reference genomes are found to alter approximately 1% of peak calls while de novo assembled genomes alter up to 5% of peaks. We also show statistically significant differences in the number of reads observed in regions associated with the new, altered, and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. We show that using a graph personalized genome represents a reasonable compromise between modified reference genomes and de novo assembled genomes. We demonstrate that altered peaks have a genomic distribution typical of other peaks. CONCLUSIONS: Analyzing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant in the study of various human phenotypes.
Affiliation(s)
- Tony Kwan
  - Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University and Genome Quebec Innovation Centre, McGill University, Montreal, QC, Canada
- Nicole Soranzo
  - Department of Human Genetics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge, UK
  - British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke's Hospital, Hills Road, Cambridge, UK
  - The National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics, University of Cambridge, Strangeways Research Laboratory, Wort's Causeway, Cambridge, UK
- Tomi Pastinen
  - Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University and Genome Quebec Innovation Centre, McGill University, Montreal, QC, Canada
  - Center for Pediatric Genomic Medicine, Kansas City, MO, USA
- Guillaume Bourque
  - Human Genetics, McGill University, Montreal, QC, Canada
  - McGill University and Genome Quebec Innovation Centre, McGill University, Montreal, QC, Canada
  - Canadian Centre for Computational Genomics, Montreal, QC, Canada
  - Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto, Japan
7
Xie J, Ji T, Ferreira MAR, Li Y, Patel BN, Rivera RM. Modeling allele-specific expression at the gene and SNP levels simultaneously by a Bayesian logistic mixed regression model. BMC Bioinformatics 2019; 20:530. [PMID: 31660858 PMCID: PMC6819473 DOI: 10.1186/s12859-019-3141-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/03/2019] [Accepted: 10/09/2019] [Indexed: 12/29/2022]
Abstract
Background: High-throughput sequencing experiments, which can determine allele origins, have been used to assess genome-wide allele-specific expression. Despite the amount of data generated from high-throughput experiments, statistical methods are often too simplistic to understand the complexity of gene expression. Specifically, existing methods do not test allele-specific expression (ASE) of a gene as a whole and variation in ASE within a gene across exons separately and simultaneously. Results: We propose a generalized linear mixed model to close these gaps, incorporating variations due to genes, single nucleotide polymorphisms (SNPs), and biological replicates. To improve reliability of statistical inferences, we assign priors on each effect in the model so that information is shared across genes in the entire genome. We utilize Bayesian model selection to test the hypothesis of ASE for each gene and variations across SNPs within a gene. We apply our method to four tissue types in a bovine study to de novo detect ASE genes in the bovine genome, and uncover intriguing predictions of regulatory ASEs across gene exons and across tissue types. We compared our method to competing approaches through simulation studies that mimicked the real datasets. The R package, BLMRM, that implements our proposed algorithm, is publicly available for download at https://github.com/JingXieMIZZOU/BLMRM. Conclusions: We will show that the proposed method exhibits improved control of the false discovery rate and improved power over existing methods when SNP variation and biological variation are present. Our method also maintains low computational requirements that allow for whole genome analysis.
8
Lee W, Plant K, Humburg P, Knight JC. AltHapAlignR: improved accuracy of RNA-seq analyses through the use of alternative haplotypes. Bioinformatics 2019. [PMID: 29514179 PMCID: PMC6041798 DOI: 10.1093/bioinformatics/bty125] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/16/2022]
Abstract
Motivation: Reliance on mapping to a single reference haplotype currently limits accurate estimation of allele or haplotype-specific expression using RNA-sequencing, notably in highly polymorphic regions such as the major histocompatibility complex. Results: We present AltHapAlignR, a method incorporating alternate reference haplotypes to generate gene- and haplotype-level estimates of transcript abundance for any genomic region where such information is available. We validate using simulated and experimental data to quantify input allelic ratios for major histocompatibility complex haplotypes, demonstrating significantly improved correlation with ground truth estimates of gene counts compared to standard single reference mapping. We apply AltHapAlignR to RNA-seq data from 462 individuals, showing how significant underestimation of expression of the majority of classical human leukocyte antigen genes using conventional mapping can be corrected using AltHapAlignR to allow more accurate quantification of gene expression for individual alleles and haplotypes. Availability and implementation: Source code freely available at https://github.com/jknightlab/AltHapAlignR.
Affiliation(s)
- Wanseon Lee
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Katharine Plant
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Peter Humburg
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Julian C Knight
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
9
Abstract
The use of the human reference genome has shaped methods and data across modern genomics. This has offered many benefits while creating a few constraints. In the following opinion, we outline the history, properties, and pitfalls of the current human reference genome. In a few illustrative analyses, we focus on its use for variant-calling, highlighting its nearness to a 'type specimen'. We suggest that switching to a consensus reference would offer important advantages over the continued use of the current reference with few disadvantages.
Affiliation(s)
- Sara Ballouz
  - Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
- Alexander Dobin
  - Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
- Jesse A Gillis
  - Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
10
Abstract
Allele-specific expression arises when transcriptional activity at the different alleles of a gene differs considerably. Although extensive research has been carried out to detect and characterize this phenomenon, the landscape of allele-specific expression in cancer is still poorly understood. In this chapter, we describe a fast and reliable analysis pipeline to study allele-specific expression in cancer using next-generation sequencing data. The pipeline provides a gene-level analysis approach that exploits paired germline DNA and tumor RNA sequencing data and benefits from parallel computation resources when available.
Affiliation(s)
- Alessandro Romanel
  - Centre for Integrative Biology (CIBIO), University of Trento, Trento, Italy
11
Ghazanfar S, Vuocolo T, Morrison JL, Nicholas LM, McMillen IC, Yang JYH, Buckley MJ, Tellam RL. Gene expression allelic imbalance in ovine brown adipose tissue impacts energy homeostasis. PLoS One 2017; 12:e0180378. [PMID: 28665992 PMCID: PMC5493397 DOI: 10.1371/journal.pone.0180378] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/12/2016] [Accepted: 06/14/2017] [Indexed: 12/22/2022]
Abstract
Heritable trait variation within a population of organisms is largely governed by DNA variations that impact gene transcription and protein function. Identifying genetic variants that affect complex functional traits is a primary aim of population genetics studies, especially in the context of human disease and agricultural production traits. The identification of alleles directly altering mRNA expression and thereby biological function is challenging due to difficulty in isolating direct effects of cis-acting genetic variations from indirect trans-acting genetic effects. Allele specific gene expression or allelic imbalance in gene expression (AI) occurring at heterozygous loci provides an opportunity to identify genes directly impacted by cis-acting genetic variants as indirect trans-acting effects equally impact the expression of both alleles. However, the identification of genes showing AI in the context of the expression of all genes remains a challenge due to a variety of technical and statistical issues. The current study focuses on the discovery of genes showing AI using single nucleotide polymorphisms as allelic reporters. By developing a computational and statistical process that addressed multiple analytical challenges, we ranked 5,809 genes for evidence of AI using RNA-Seq data derived from brown adipose tissue samples from a cohort of late gestation fetal lambs and then identified a conservative subgroup of 1,293 genes. Thus, AI was extensive, representing approximately 25% of the tested genes. Genes associated with AI were enriched for multiple Gene Ontology (GO) terms relating to lipid metabolism, mitochondrial function and the extracellular matrix. These functions suggest that cis-acting genetic variations causing AI in the population are preferentially impacting genes involved in energy homeostasis and tissue remodelling. These functions may contribute to production traits likely to be under genetic selection in the population.
Affiliation(s)
- Shila Ghazanfar
  - Data61, CSIRO, North Ryde, NSW, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
  - * E-mail: (SG); (RLT)
- Tony Vuocolo
  - CSIRO Agriculture, Queensland Biosciences Precinct, St Lucia, QLD, Australia
- Janna L. Morrison
  - Early Origins of Adult Health Research Group, School of Pharmacy and Medical Sciences, Sansom Institute for Health Research, The University of South Australia, Adelaide, SA, Australia
- Lisa M. Nicholas
  - Early Origins of Adult Health Research Group, School of Pharmacy and Medical Sciences, Sansom Institute for Health Research, The University of South Australia, Adelaide, SA, Australia
- Isabella C. McMillen
  - Early Origins of Adult Health Research Group, School of Pharmacy and Medical Sciences, Sansom Institute for Health Research, The University of South Australia, Adelaide, SA, Australia
- Jean Y. H. Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Ross L. Tellam
  - CSIRO Agriculture, Queensland Biosciences Precinct, St Lucia, QLD, Australia
  - * E-mail: (SG); (RLT)
12
Deonovic B, Wang Y, Weirather J, Wang XJ, Au KF. IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Res 2017; 45:e32. [PMID: 27899656 PMCID: PMC5952581 DOI: 10.1093/nar/gkw1076] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 06/15/2016] [Revised: 10/20/2016] [Accepted: 10/26/2016] [Indexed: 12/14/2022]
Abstract
Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated by the gold standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE is widespread, including tumorigenesis relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only.
Affiliation(s)
- Benjamin Deonovic
  - Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA
- Yunhao Wang
  - Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA
  - Key laboratory of Genetics Network Biology, Collaborative Innovation Center of Genetics and Development, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jason Weirather
  - Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA
- Xiu-Jie Wang
  - Key laboratory of Genetics Network Biology, Collaborative Innovation Center of Genetics and Development, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Kin Fai Au
  - Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA
  - Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA
13
Liu Z, Gui T, Wang Z, Li H, Fu Y, Dong X, Li Y. cisASE: a likelihood-based method for detecting putative cis-regulated allele-specific expression in RNA sequencing data. Bioinformatics 2016; 32:3291-3297. [PMID: 27412088 DOI: 10.1093/bioinformatics/btw416] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/27/2015] [Accepted: 06/24/2016] [Indexed: 12/31/2022]
Abstract
MOTIVATION: Allele-specific expression (ASE) is a useful way to identify cis-acting regulatory variation, which provides opportunities to develop new therapeutic strategies that activate beneficial alleles or silence mutated alleles at specific loci. However, multiple problems hinder the identification of ASE in next-generation sequencing (NGS) data. RESULTS: We developed cisASE, a likelihood-based method for detecting ASE on single nucleotide variant (SNV), exon and gene levels from sequencing data without requiring phasing or parental information. cisASE uses matched DNA-seq data to control technical bias and copy number variation (CNV) in putative cis-regulated ASE identification. Compared with state-of-the-art methods, cisASE exhibits significantly increased accuracy and speed. cisASE works moderately well for datasets without DNA-seq and thus is widely applicable. By applying cisASE to real datasets, we identified specific ASE characteristics in normal and cancer tissues, thus indicating that cisASE has potential for wide applications in cancer genomics. AVAILABILITY AND IMPLEMENTATION: cisASE is freely available at http://lifecenter.sgst.cn/cisASE. CONTACT: biosinodx@gmail.com or yxli@sibs.ac.cn.
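The role of matched DNA-seq counts can be illustrated with a simple likelihood-ratio test. This sketch is a stand-in built on stated assumptions, not the cisASE model: it treats the reference-allele fraction observed in DNA as the known null expectation for RNA (ignoring DNA sampling error), uses a plain binomial likelihood, and relies on the chi-square approximation with one degree of freedom. The function name `ase_likelihood_ratio_test` and the example counts are hypothetical.

```python
import math


def xlogy(x, y):
    """Return x * log(y), with the convention that 0 * log(anything) is 0."""
    return 0.0 if x == 0 else x * math.log(y)


def ase_likelihood_ratio_test(rna_ref, rna_alt, dna_ref, dna_alt):
    """Test the RNA allelic ratio against the ratio seen in matched DNA.

    Assumes a heterozygous site where both alleles are observed in DNA.
    Returns the likelihood-ratio statistic and its approximate p-value.
    """
    n_rna = rna_ref + rna_alt
    p_null = dna_ref / (dna_ref + dna_alt)   # expected ref fraction from DNA
    p_hat = rna_ref / n_rna                  # observed ref fraction in RNA
    # G statistic: twice the log-likelihood gain of p_hat over p_null.
    stat = 2 * (xlogy(rna_ref, p_hat / p_null)
                + xlogy(rna_alt, (1 - p_hat) / (1 - p_null)))
    stat = max(stat, 0.0)                    # guard against rounding below zero
    # Chi-square (1 df) tail probability expressed through erfc.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical site with balanced DNA: the 80:20 RNA skew looks like ASE.
stat_balanced_dna, p_balanced_dna = ase_likelihood_ratio_test(80, 20, 100, 100)
# Hypothetical site with the same skew in DNA (e.g. a copy number gain):
# the RNA ratio matches the DNA ratio, so there is no evidence of ASE.
stat_skewed_dna, p_skewed_dna = ase_likelihood_ratio_test(80, 20, 160, 40)
```

The second call shows why a DNA-derived null matters: an RNA imbalance that merely mirrors the DNA allelic ratio is not flagged, whereas a fixed 0.5 null would call it.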
Affiliation(s)
- Zhi Liu
  - Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Tuantuan Gui
  - Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Zhen Wang
  - Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Hong Li
  - Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yunhe Fu
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Xiao Dong
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Yixue Li
  - Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  - School of Life Science and Technology, Shanghai Jiaotong University, Shanghai 200240, China
  - Shanghai Center for Bioinformation Technology, Shanghai Industrial Technology Institute, Shanghai 201203, China
  - Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 200438, China
14
|
Oh S. How are Bayesian and Non-Parametric Methods Doing a Great Job in RNA-Seq Differential Expression Analysis?: A Review. Communications for Statistical Applications and Methods 2015. [DOI: 10.5351/csam.2015.22.2.181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/20/2022]
Affiliation(s)
- Sunghee Oh
- Department of Veterinary Medicine, Jeju National University, Korea
|
15
|
Romanel A, Lago S, Prandi D, Sboner A, Demichelis F. ASEQ: fast allele-specific studies from next-generation sequencing data. BMC Med Genomics 2015; 8:9. [PMID: 25889339 PMCID: PMC4363342 DOI: 10.1186/s12920-015-0084-2] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 11/17/2014] [Accepted: 02/12/2015] [Indexed: 11/17/2022]
Abstract
Background Single base level information from next-generation sequencing (NGS) allows for the quantitative assessment of biological phenomena such as mosaicism or allele-specific features in healthy and diseased cells. Such studies often present computationally challenging burdens that hinder genome-wide investigations across large datasets that are now becoming available through the 1,000 Genomes Project and The Cancer Genome Atlas (TCGA) initiatives. Results We present ASEQ, a tool to perform gene-level allele-specific expression (ASE) analysis from paired genomic and transcriptomic NGS data without requiring paternal and maternal genome data. ASEQ offers an easy-to-use set of modes that, transparently to the user, take full advantage of a built-in fast computational engine. We report its performance on a set of 20 individuals from the 1,000 Genomes Project and show its detection power on imprinted genes. Next, we demonstrate a high level of concordance in ASE calls when comparing it to the AlleleSeq and MBASED tools. Finally, using a prostate cancer dataset, we report a higher fraction of ASE genes than in healthy individuals and show allele-specific events nominated by ASEQ in genes that are implicated in the disease. Conclusions ASEQ can be used to rapidly and reliably screen large NGS datasets for the identification of allele-specific features. It can be integrated in any NGS pipeline and runs on computer systems with multiple CPUs, CPUs with multiple cores or across clusters of machines.
Affiliation(s)
- Alessandro Romanel
- Centre for Integrative Biology (CIBIO), University of Trento, Trento, Italy.
- Sara Lago
- Centre for Integrative Biology (CIBIO), University of Trento, Trento, Italy.
- Davide Prandi
- Centre for Integrative Biology (CIBIO), University of Trento, Trento, Italy.
- Andrea Sboner
- Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, USA; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, USA; Institute for Precision Medicine, Weill Cornell Medical College & New York Presbyterian Hospital, New York, USA.
- Francesca Demichelis
- Centre for Integrative Biology (CIBIO), University of Trento, Trento, Italy; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, USA; Institute for Precision Medicine, Weill Cornell Medical College & New York Presbyterian Hospital, New York, USA.
|
16
|
Chen J, Nolte V, Schlötterer C. Temperature stress mediates decanalization and dominance of gene expression in Drosophila melanogaster. PLoS Genet 2015; 11:e1004883. [PMID: 25719753 PMCID: PMC4342254 DOI: 10.1371/journal.pgen.1004883] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 07/10/2014] [Accepted: 11/10/2014] [Indexed: 11/18/2022]
Abstract
The regulatory architecture of gene expression remains an area of active research. Here, we studied how the interplay of genetic and environmental variation affects gene expression by exposing Drosophila melanogaster strains to four different developmental temperatures. At 18°C we observed almost complete canalization with only very few allelic effects on gene expression. In contrast, at the two temperature extremes, 13°C and 29°C, a large number of allelic differences in gene expression were detected due to both cis- and trans-regulatory effects. Allelic differences in gene expression were mainly dominant, but for up to 62% of the genes the dominance swapped between 13 and 29°C. Our results are consistent with stabilizing selection causing buffering of allelic expression variation in non-stressful environments. We propose that decanalization of gene expression in stressful environments is not only central to adaptation, but may also contribute to genetic disorders in human populations.
Affiliation(s)
- Jun Chen
- Institut für Populationsgenetik, Vienna, Austria
- Viola Nolte
- Institut für Populationsgenetik, Vienna, Austria
|
17
|
A hidden Markov approach for ascertaining cSNP genotypes from RNA sequence data in the presence of allelic imbalance by exploiting linkage disequilibrium. BMC Bioinformatics 2015; 16:61. [PMID: 25887316 PMCID: PMC4351697 DOI: 10.1186/s12859-015-0479-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/03/2014] [Accepted: 01/27/2015] [Indexed: 12/30/2022]
Abstract
BACKGROUND Allelic specific expression (ASE) increases our understanding of the genetic control of gene expression and its links to phenotypic variation. ASE testing is implemented through binomial or beta-binomial tests of sequence read counts of alternative alleles at a cSNP of interest in heterozygous individuals. This requires prior ascertainment of the cSNP genotypes for all individuals. To meet this need, we propose hidden Markov methods to call SNPs from next generation RNA sequence data when ASE possibly exists. RESULTS We propose two hidden Markov models (HMMs), HMM-ASE and HMM-NASE, that consider or do not consider ASE, respectively, in order to improve genotyping accuracy. Both HMMs call the genotypes of several SNPs simultaneously, which utilizes the dependence among SNPs, and allow for mapping error, which corrects the bias due to mapping error. In addition, HMM-ASE exploits ASE information to further improve genotype accuracy when ASE is likely to be present. Simulation results indicate that the proposed HMMs achieve very good prediction accuracy in terms of controlling both the false discovery rate (FDR) and the false negative rate (FNR). When ASE is present, HMM-ASE had a lower FNR than HMM-NASE, while both can control the FDR at a similar level. By exploiting linkage disequilibrium (LD), a real data application demonstrates that the proposed methods have better sensitivity and similar FDR in calling heterozygous SNPs compared with the VarScan method. Sensitivity and FDR are similar to those of the BCFtools and Beagle methods. The resulting genotypes show good properties for the estimation of the genetic parameters and ASE ratios. CONCLUSIONS We introduce HMMs, which are able to exploit LD and account for ASE and mapping errors, to simultaneously call SNPs from next generation RNA sequence data. The method introduced can reliably call cSNP genotypes even in the presence of ASE and under low sequencing coverage. As a byproduct, the proposed method is able to provide predictions of ASE ratios for the heterozygous genotypes, which can then be used for ASE testing.
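The binomial test of allelic read counts mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `binomial_two_sided_p` and the default null proportion of 0.5 are assumptions made here, and the beta-binomial variant and the HMM genotype caller are not shown.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for read counts at one heterozygous cSNP.

    Hypothetical helper: p_null is the expected reference-allele proportion
    when there is no allele-specific expression (0.5 assumed by default).
    """
    n = ref_reads + alt_reads
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the outcomes that are at least as unlikely as the observed count;
    # the small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    print(binomial_two_sided_p(10, 0))  # strong imbalance, small p-value
    print(binomial_two_sided_p(5, 5))   # perfectly balanced counts
```

The test is only meaningful at sites that are truly heterozygous, which is why the abstract stresses prior genotype ascertainment.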
|
18
|
Narum SR, Campbell NR. Transcriptomic response to heat stress among ecologically divergent populations of redband trout. BMC Genomics 2015; 16:103. [PMID: 25765850 PMCID: PMC4337095 DOI: 10.1186/s12864-015-1246-5] [Citation(s) in RCA: 72] [Impact Index Per Article: 8.0] [Received: 01/27/2014] [Accepted: 01/15/2015] [Indexed: 12/12/2022]
Abstract
Background As ectothermic organisms have evolved to differing aquatic climates, the molecular basis of thermal adaptation is a key area of research. In this study, we tested for differential transcriptional response of ecologically divergent populations of redband trout (Oncorhynchus mykiss gairdneri) that have evolved in desert and montane climates. Each pure strain and their F1 cross were reared in a common garden environment and exposed over four weeks to diel water temperatures that were similar to those experienced in desert climates within the species’ range. Gill tissues were collected from the three strains of fish (desert, montane, F1 crosses) at the peak of heat stress and tested for mRNA expression differences across the transcriptome with RNA-seq. Results Strong differences in transcriptomic response to heat stress were observed across strains confirming that fish from desert environments have evolved diverse mechanisms to cope with stressful environments. As expected, a large number of total transcripts (12,814) were differentially expressed in the study (FDR ≤ 0.05) with 2310 transcripts in common for all three strains, but the desert strain had a larger number of unique differentially expressed transcripts (2875) than the montane (1982) or the F1 (2355) strain. Strongly differentiated genes (>4 fold change and FDR ≤ 0.05) were particularly abundant in the desert strain (824 unique contigs) relative to the other two strains (montane = 58; F1 = 192). Conclusions This study demonstrated patterns of acclimation (i.e., phenotypic plasticity) within strains and evolutionary adaptation among strains in numerous genes throughout the transcriptome. 
Key stress response genes such as molecular chaperones (i.e., heat shock proteins) had adaptive patterns of gene expression among strains, but also a much higher number of metabolic and cellular process genes were differentially expressed in the desert strain demonstrating these biological pathways are critical for thermal adaptation to warm aquatic climates. The results of this study further elucidate the molecular basis for thermal adaptation in aquatic ecosystems and extend the potential for identifying genes that may be critical for adaptation to changing climates.
Affiliation(s)
- Shawn R Narum
- Columbia River Inter-Tribal Fish Commission, 3059-F National Fish Hatchery Road, Hagerman, ID, 83332, USA.
- Nathan R Campbell
- Columbia River Inter-Tribal Fish Commission, 3059-F National Fish Hatchery Road, Hagerman, ID, 83332, USA.
|
19
|
Soderlund CA, Nelson WM, Goff SA. Allele Workbench: transcriptome pipeline and interactive graphics for allele-specific expression. PLoS One 2014; 9:e115740. [PMID: 25541944 PMCID: PMC4277417 DOI: 10.1371/journal.pone.0115740] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/12/2014] [Accepted: 10/19/2014] [Indexed: 12/30/2022]
Abstract
Sequencing the transcriptome can answer various questions such as determining the transcripts expressed in a given species for a specific tissue or condition, evaluating differential expression, discovering variants, and evaluating allele-specific expression. Differential expression evaluates the expression differences between different strains, tissues, and conditions. Allele-specific expression evaluates expression differences between parental alleles. Both differential expression and allele-specific expression have been studied for heterosis (hybrid vigor), where the hybrid has improved performance over the parents for one or more traits. The Allele Workbench software was developed for a heterosis study that evaluated allele-specific expression for a mouse F1 hybrid using libraries from multiple tissues with biological replicates. This software has been made into a distributable package, which includes a pipeline, a Java interface to build the database, and a Java interface for query and display of the results. The required input is a reference genome, annotation file, and one or more RNA-Seq libraries with optional replicates. It evaluates allelic imbalance at the SNP and transcript level and flags transcripts with significant opposite directional allele-specific expression. The Java interface allows the user to view data from libraries, replicates, genes, transcripts, exons, and variants, including queries on allele imbalance for selected libraries. To determine the impact of allele-specific SNPs on protein folding, variants are annotated with their effect (e.g., missense), and the parental protein sequences may be exported for protein folding analysis. The Allele Workbench processing results in transcript files and read counts that can be used as input to the previously published Transcriptome Computational Workbench, which has a new algorithm for determining a trimmed set of gene ontology terms. 
The software with demo files is available from https://code.google.com/p/allele-workbench. Additionally, all software is ready for immediate use from an Atmosphere Virtual Machine Image available from the iPlant Collaborative (www.iplantcollaborative.org).
Affiliation(s)
- Carol A. Soderlund
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
- William M. Nelson
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
- Stephen A. Goff
- iPlant Collaborative, University of Arizona, Tucson, Arizona, United States of America
|
20
|
Clément-Ziza M, Marsellach FX, Codlin S, Papadakis MA, Reinhardt S, Rodríguez-López M, Martin S, Marguerat S, Schmidt A, Lee E, Workman CT, Bähler J, Beyer A. Natural genetic variation impacts expression levels of coding, non-coding, and antisense transcripts in fission yeast. Mol Syst Biol 2014; 10:764. [PMID: 25432776 PMCID: PMC4299605 DOI: 10.15252/msb.20145123] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 12/24/2022]
Abstract
Our current understanding of how natural genetic variation affects gene expression beyond well-annotated coding genes is still limited. The use of deep sequencing technologies for the study of expression quantitative trait loci (eQTLs) has the potential to close this gap. Here, we generated the first recombinant strain library for fission yeast and conducted an RNA-seq-based QTL study of the coding, non-coding, and antisense transcriptomes. We show that the frequency of distal effects (trans-eQTLs) greatly exceeds the number of local effects (cis-eQTLs) and that non-coding RNAs are as likely to be affected by eQTLs as protein-coding RNAs. We identified a genetic variation of swc5 that modifies the levels of 871 RNAs, with effects on both sense and antisense transcription, and show that this effect most likely goes through a compromised deposition of the histone variant H2A.Z. The strains, methods, and datasets generated here provide a rich resource for future studies.
Affiliation(s)
- Mathieu Clément-Ziza
- Biotechnology Centre, Technische Universität Dresden, Dresden, Germany; Cologne Cluster of Excellence in Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Francesc X Marsellach
- Department of Genetics, Evolution & Environment and UCL Genetics Institute, University College London, London, UK
- Sandra Codlin
- Department of Genetics, Evolution & Environment and UCL Genetics Institute, University College London, London, UK
- Manos A Papadakis
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Susanne Reinhardt
- Biotechnology Centre, Technische Universität Dresden, Dresden, Germany
- María Rodríguez-López
- Department of Genetics, Evolution & Environment and UCL Genetics Institute, University College London, London, UK
- Stuart Martin
- Department of Genetics, Evolution & Environment and UCL Genetics Institute, University College London, London, UK
- Samuel Marguerat
- Department of Genetics, Evolution & Environment and UCL Genetics Institute, University College London, London, UK
- Eunhye Lee
- Department of Genetics, Evolution & Environment and UCL Genetics Institute, University College London, London, UK
- Christopher T Workman
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Jürg Bähler
- Department of Genetics, Evolution & Environment and UCL Genetics Institute, University College London, London, UK
- Andreas Beyer
- Biotechnology Centre, Technische Universität Dresden, Dresden, Germany; Cologne Cluster of Excellence in Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
|
21
|
León-Novelo LG, McIntyre LM, Fear JM, Graze RM. A flexible Bayesian method for detecting allelic imbalance in RNA-seq data. BMC Genomics 2014; 15:920. [PMID: 25339465 PMCID: PMC4230747 DOI: 10.1186/1471-2164-15-920] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 05/30/2014] [Accepted: 10/09/2014] [Indexed: 01/01/2023]
Abstract
Background One method of identifying cis regulatory differences is to analyze allele-specific expression (ASE) and identify cases of allelic imbalance (AI). RNA-seq is the most common way to measure ASE and a binomial test is often applied to determine statistical significance of AI. This implicitly assumes that there is no bias in estimation of AI. However, bias has been found to result from multiple factors including: genome ambiguity, reference quality, the mapping algorithm, and biases in the sequencing process. Two alternative approaches have been developed to handle bias: adjusting for bias using a statistical model and filtering regions of the genome suspected of harboring bias. Existing statistical models which account for bias rely on information from DNA controls, which can be cost prohibitive for large intraspecific studies. In contrast, data filtering is inexpensive and straightforward, but necessarily involves sacrificing a portion of the data. Results Here we propose a flexible Bayesian model for analysis of AI, which accounts for bias and can be implemented without DNA controls. In lieu of DNA controls, this Poisson-Gamma (PG) model uses an estimate of bias from simulations. The proposed model always has a lower type I error rate compared to the binomial test. Consistent with prior studies, bias dramatically affects the type I error rate. All of the tested models are sensitive to misspecification of bias. The closer the estimate of bias is to the true underlying bias, the lower the type I error rate. Correct estimates of bias result in a level alpha test. Conclusions To improve the assessment of AI, some forms of systematic error (e.g., map bias) can be identified using simulation. The resulting estimates of bias can be used to correct for bias in the PG model, without data filtering. Other sources of bias (e.g., unidentified variant calls) can be easily captured by DNA controls, but are missed by common filtering approaches. 
Consequently, as variant identification improves, the need for DNA controls will be reduced. Filtering does not significantly improve performance and is not recommended, as information is sacrificed without a measurable gain. The PG model developed here performs well when bias is known, or slightly misspecified. The model is flexible and can accommodate differences in experimental design and bias estimation.
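The claim that unmodelled bias inflates the type I error rate of the binomial test can be illustrated with an exact calculation. The sketch below is not the Poisson-Gamma model from the paper; it only evaluates the plain binomial test, and the function names, the read depth of 100 and the bias value of 0.55 are assumptions chosen for illustration.

```python
from math import comb
from typing import List


def binomial_pmf(n: int, p: float) -> List[float]:
    """Probability of each reference-read count 0..n for proportion p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def rejection_region(n: int, p_assumed: float, alpha: float) -> List[int]:
    """Counts at which an exact two-sided binomial test rejects at level alpha."""
    null = binomial_pmf(n, p_assumed)
    region = []
    for k in range(n + 1):
        # Two-sided p-value: total mass of outcomes no more likely than k.
        p_value = sum(q for q in null if q <= null[k] * (1 + 1e-9))
        if p_value <= alpha:
            region.append(k)
    return region


def type_one_error(n: int, p_assumed: float, p_true: float, alpha: float = 0.05) -> float:
    """Rejection probability when there is no allelic imbalance, but technical
    bias makes the reference proportion p_true while the test assumes p_assumed."""
    true = binomial_pmf(n, p_true)
    return sum(true[k] for k in rejection_region(n, p_assumed, alpha))


if __name__ == "__main__":
    print(type_one_error(100, 0.50, 0.50))  # bias absent and correctly assumed
    print(type_one_error(100, 0.50, 0.55))  # bias present but ignored
    print(type_one_error(100, 0.55, 0.55))  # bias present and correctly specified
```

When the assumed proportion matches the true one, the exact test stays at or below the nominal level, which mirrors the abstract's statement that correct bias estimates give a level alpha test.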
Affiliation(s)
- Rita M Graze
- Department of Biological Sciences, Auburn University, 101 Rouse Life Science Building, 36849 Auburn, AL, USA.
|
22
|
Liu Z, Yang J, Xu H, Li C, Wang Z, Li Y, Dong X, Li Y. Comparing computational methods for identification of allele-specific expression based on next generation sequencing data. Genet Epidemiol 2014; 38:591-8. [PMID: 25183311 DOI: 10.1002/gepi.21846] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/16/2013] [Revised: 05/15/2014] [Accepted: 06/16/2014] [Indexed: 11/07/2022]
Abstract
Allele-specific expression (ASE) studies have wide-ranging implications for genome biology and medicine. Whole transcriptome RNA sequencing (RNA-Seq) has emerged as a genome-wide tool for identifying ASE, but suffers from mapping bias favoring reference alleles. Two categories of methods are currently used to reduce the effect of mapping bias on ASE identification: normalizing the RNA allelic ratio with the parallel genomic allelic ratio (pDNAar), and modifying the reference genome so that reads carrying either allele have the same chance of being mapped (mREF). We compared the sensitivity and specificity of both methods with simulated data, and demonstrated that pDNAar, though ideally practical, was lower in sensitivity because of its lower mapping rate of reads carrying nonreference (alternative) alleles, whereas mREF achieved higher sensitivity and specificity owing to its efficiency in mapping reads carrying both alleles. Application of these two methods to real sequencing data showed that mREF was able to identify more ASE loci because of its higher mapping efficiency, and was able to correct some seemingly incorrect ASE loci identified by pDNAar that resulted from its inefficiency in mapping reads carrying alternative alleles. Our study provides useful information for RNA sequencing data processing in the identification of ASE.
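One way to picture the pDNAar idea of normalizing the RNA allelic ratio by the genomic allelic ratio is a log ratio of the two allelic odds. The abstract does not give the formula, so the form below, the function name `normalized_log2_allelic_ratio` and the refusal to handle zero counts are all assumptions made for illustration.

```python
from math import log2


def normalized_log2_allelic_ratio(
    rna_ref: int, rna_alt: int, dna_ref: int, dna_alt: int
) -> float:
    """log2 of (RNA ref/alt) divided by (DNA ref/alt) at one heterozygous site.

    Hypothetical formulation: a value near 0 means the RNA imbalance is no
    larger than the imbalance already seen in matched DNA reads, so it is
    more likely mapping or technical bias than allele-specific expression.
    """
    counts = (rna_ref, rna_alt, dna_ref, dna_alt)
    if min(counts) <= 0:
        # Zero counts would need a pseudocount or another policy; that choice
        # is left out of this sketch on purpose.
        raise ValueError("all four read counts must be positive")
    rna_ratio = rna_ref / rna_alt
    dna_ratio = dna_ref / dna_alt
    return log2(rna_ratio / dna_ratio)


if __name__ == "__main__":
    # Same skew in RNA and DNA: the imbalance is explained by the DNA control.
    print(normalized_log2_allelic_ratio(60, 40, 60, 40))
    # Skew only in RNA: the imbalance survives normalization.
    print(normalized_log2_allelic_ratio(80, 20, 50, 50))
```

A site would normally still need a statistical test on top of this ratio; the sketch only shows the normalization step.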
Affiliation(s)
- Zhi Liu
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P. R. China; University of Chinese Academy of Sciences, Beijing, P. R. China
|
23
|
Quinn A, Juneja P, Jiggins FM. Estimates of allele-specific expression in Drosophila with a single genome sequence and RNA-seq data. Bioinformatics 2014; 30:2603-10. [PMID: 24845654 DOI: 10.1093/bioinformatics/btu342] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/12/2023]
Abstract
MOTIVATION Genetic variation in cis-regulatory elements is an important cause of variation in gene expression. Cis-regulatory variation can be detected by using high-throughput RNA sequencing (RNA-seq) to identify differences in the expression of the two alleles of a gene. This requires that reads from the two alleles are equally likely to map to a reference genome(s), and that single-nucleotide polymorphisms (SNPs) are accurately called, so that reads derived from the different alleles can be identified. Both of these prerequisites can be achieved by sequencing the genomes of the parents of the individual being studied, but this is often prohibitively costly. RESULTS In Drosophila, we demonstrate that biases during read mapping can be avoided by mapping reads to two alternative genomes that incorporate SNPs called from the RNA-seq data. The SNPs can be reliably called from the RNA-seq data itself, provided any variants not found in high-quality SNP databases are filtered out. Finally, we suggest a way of measuring allele-specific expression (ASE) by crossing the line of interest to a reference line with a high-quality genome sequence. Combined with our bioinformatic methods, this approach minimizes mapping biases, allows poor-quality data to be identified and removed and aids in the biological interpretation of the data as the parent of origin of each allele is known. In conclusion, our results suggest that accurate estimates of ASE do not require the parental genomes of the individual being studied to be sequenced. AVAILABILITY AND IMPLEMENTATION Scripts used to perform our analysis are available at https://github.com/d-quinn/bio_quinn2013.
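The idea of mapping reads to two alternative genomes that incorporate called SNPs can be sketched as a simple substitution step. This is an illustrative toy and not the authors' scripts: the function name `build_allele_genomes`, the 0-based positions and the in-memory string representation are assumptions, and real pipelines would work on FASTA and VCF files.

```python
from typing import Dict, Tuple


def build_allele_genomes(
    reference: str, snps: Dict[int, Tuple[str, str]]
) -> Tuple[str, str]:
    """Return two sequences, each carrying one allele at every SNP position.

    Hypothetical input format: snps maps a 0-based position to a pair
    (allele for genome 1, allele for genome 2). Only single-base
    substitutions are handled, so both outputs keep the reference length.
    """
    genome_one = list(reference)
    genome_two = list(reference)
    for position, (allele_one, allele_two) in snps.items():
        if not 0 <= position < len(reference):
            raise ValueError(f"SNP position {position} is outside the sequence")
        if len(allele_one) != 1 or len(allele_two) != 1:
            raise ValueError("only single-nucleotide alleles are supported")
        genome_one[position] = allele_one
        genome_two[position] = allele_two
    return "".join(genome_one), "".join(genome_two)


if __name__ == "__main__":
    first, second = build_allele_genomes("ACGTACGT", {2: ("G", "T"), 5: ("A", "C")})
    print(first)
    print(second)
```

Reads would then be aligned to both sequences so that neither allele is penalized by mismatches against a single reference.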
Affiliation(s)
- Andrew Quinn
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Punita Juneja
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Francis M Jiggins
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
|
24
|
Suvorov A, Nolte V, Pandey RV, Franssen SU, Futschik A, Schlötterer C. Intra-specific regulatory variation in Drosophila pseudoobscura. PLoS One 2013; 8:e83547. [PMID: 24386226 PMCID: PMC3873948 DOI: 10.1371/journal.pone.0083547] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 04/13/2013] [Accepted: 11/06/2013] [Indexed: 11/18/2022]
Abstract
It is generally accepted that gene regulation serves an important role in determining the phenotype. To shed light on the evolutionary forces operating on gene regulation, previous studies mainly focused on the expression differences between species and their inter-specific hybrids. Here, we use RNA-Seq to study the intra-specific distribution of cis- and trans-regulatory variation in Drosophila pseudoobscura. Consistent with previous results, we find almost twice as many genes (26%) with significant trans-effects as genes with significant cis-effects (18%). While this result supports the previous suggestion of a larger mutational target of trans-effects, we also show that trans-effects may be subjected to purifying selection. Our results underline the importance of intra-specific analyses for the understanding of the evolution of gene expression.
Affiliation(s)
- Anton Suvorov
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Vienna Graduate School of Population Genetics, Vienna, Austria
- Viola Nolte
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Ram Vinay Pandey
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Andreas Futschik
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Department of Applied Statistics, Johannes Kepler Universität Linz, Linz, Austria
|
25
|
Younesy H, Möller T, Heravi-Moussavi A, Cheng JB, Costello JF, Lorincz MC, Karimi MM, Jones SJM. ALEA: a toolbox for allele-specific epigenomics analysis. Bioinformatics 2013; 30:1172-1174. [PMID: 24371156 DOI: 10.1093/bioinformatics/btt744] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/21/2013] [Accepted: 12/09/2013] [Indexed: 11/13/2022]
Abstract
The assessment of expression and epigenomic status using sequencing based methods provides an unprecedented opportunity to identify and correlate allelic differences with epigenomic status. We present ALEA, a computational toolbox for allele-specific epigenomics analysis, which incorporates allelic variation data within existing resources, allowing for the identification of significant associations between epigenetic modifications and specific allelic variants in human and mouse cells. ALEA provides a customizable pipeline of command line tools for allele-specific analysis of next-generation sequencing data (ChIP-seq, RNA-seq, etc.) that takes the raw sequencing data and produces separate allelic tracks ready to be viewed on genome browsers. The pipeline has been validated using human and hybrid mouse ChIP-seq and RNA-seq data. AVAILABILITY The package, test data and usage instructions are available online at http://www.bcgsc.ca/platform/bioinfo/software/alea CONTACT: mkarimi1@interchange.ubc.ca or sjones@bcgsc.ca Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hamid Younesy
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada, Graphics Usability and Visualization Lab, School of Computing Science, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada, Visualization and Data Analysis Lab, Faculty of Computer Science, University of Vienna, A-1090 Vienna, Austria, Department of Dermatology, University of California San Francisco, San Francisco, California 94143, USA, Brain Tumor Research Center, Department of Neurosurgery, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California 94158, USA and Department of Medical Genetics, Life Sciences Institute, The University of British Columbia, Vancouver, British Columbia, V6T 1Z3, Canada
- Torsten Möller
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada, Graphics Usability and Visualization Lab, School of Computing Science, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada, Visualization and Data Analysis Lab, Faculty of Computer Science, University of Vienna, A-1090 Vienna, Austria, Department of Dermatology, University of California San Francisco, San Francisco, California 94143, USA, Brain Tumor Research Center, Department of Neurosurgery, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California 94158, USA and Department of Medical Genetics, Life Sciences Institute, The University of British Columbia, Vancouver, British Columbia, V6T 1Z3, Canada
| | - Alireza Heravi-Moussavi
| | - Jeffrey B Cheng
| | - Joseph F Costello
| | - Matthew C Lorincz
| | - Mohammad M Karimi
| | - Steven J M Jones
| |
|