1
|
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023] Open
Abstract
Sequencing technologies have rapidly evolved over the past two decades, and new technologies are being continually developed and commercialized. The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
| | - Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India;
| | - Trushar Shah
- International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya;
| | - Vivek Thakur
- Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India;
| | - Sanjay Kalia
- Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India;
| | - Sean Mayes
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
| | - Abhishek Rathore
- Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
| |
|
2
|
Boatwright JL. A Robust Methodology for Assessing Homoeolog-Specific Expression. Methods Mol Biol 2023; 2545:251-258. [PMID: 36720817 DOI: 10.1007/978-1-0716-2561-3_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
Angiosperm evolution is marked by numerous, recurring polyploidization events. While hybridization and polyploidization have greatly increased the degree of genetic and phenotypic diversity in plants, the mechanisms underlying changes in the genotype-to-phenotype relationships remain unclear. As the field of natural sciences continues to expand during the post-genomic era, large datasets are becoming increasingly common. However, the development of tools and workflows available to robustly assess these changes has lagged behind data production. A robust homoeolog-specific expression analysis strongly depends upon proper homoeolog calling, the ability to account for reference sequence biases, flexible and accurate methods for dealing with residual bias, and a reproducible workflow. To that end, this chapter aims to provide a detailed description of the potential pitfalls encountered while estimating homoeolog-specific expression as well as provide a workflow that allows for robust inferences based on precise estimates of expression changes.
Affiliation(s)
- J Lucas Boatwright
- Advanced Plant Technology, Clemson University, Clemson, SC, USA; Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, USA.
| |
|
3
|
He L, Loika Y, Kulminski AM. Allele-specific analysis reveals exon- and cell-type-specific regulatory effects of Alzheimer's disease-associated genetic variants. Transl Psychiatry 2022; 12:163. [PMID: 35436980 PMCID: PMC9016079 DOI: 10.1038/s41398-022-01913-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/12/2021] [Revised: 03/18/2022] [Accepted: 03/22/2022] [Indexed: 01/20/2023] Open
Abstract
Elucidating regulatory effects of Alzheimer's disease (AD)-associated genetic variants is critical for unraveling their causal pathways and understanding the pathology. However, their cell-type-specific regulatory mechanisms in the brain remain largely unclear. Here, we conducted an analysis of allele-specific expression quantitative trait loci (aseQTLs) for 33 AD-associated variants in four brain regions and seven cell types using ~3000 bulk RNA-seq samples and >0.25 million single nuclei. We first develop a flexible hierarchical Poisson mixed model (HPMM) and demonstrate its superior statistical power to a beta-binomial model achieved by unifying samples in both allelic and genotype-level expression data. Using the HPMM, we identified 24 (~73%) aseQTLs in at least one brain region, including three new eQTLs associated with CA12, CHRNE, and CASS4. Notably, the APOE ε4 variant reduces APOE expression across all regions, even in AD-unaffected controls. Our results reveal region-dependent and exon-specific effects of multiple aseQTLs, such as rs2093760 with CR1, rs7982 with CLU, and rs3865444 with CD33. In an attempt to pinpoint the cell types responsible for the observed tissue-level aseQTLs using the snRNA-seq data, we detected many aseQTLs in microglia or monocytes associated with immune-related genes, including HLA-DQB1, HLA-DQA2, CD33, FCER1G, MS4A6A, SPI1, and BIN1, highlighting the regulatory role of AD-associated variants in the immune response. These findings provide further insights into potential causal pathways and cell types mediating the effects of the AD-associated variants.
Affiliation(s)
- Liang He
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA.
| | - Yury Loika
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA.
| | - Alexander M. Kulminski
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA.
| |
|
4
|
Sherbina K, León-Novelo LG, Nuzhdin SV, McIntyre LM, Marroni F. Power calculator for detecting allelic imbalance using hierarchical Bayesian model. BMC Res Notes 2021; 14:436. [PMID: 34838135 PMCID: PMC8626927 DOI: 10.1186/s13104-021-05851-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022] Open
Abstract
OBJECTIVE Allelic imbalance (AI) is the differential expression of the two alleles in a diploid. AI can vary between tissues, treatments, and environments. Methods for testing AI exist, but methods are needed to estimate type I error and power for detecting AI and difference of AI between conditions. As the costs of the technology plummet, what is more important: reads or replicates? RESULTS We find that a minimum of 2400, 480, and 240 allele specific reads divided equally among 12, 5, and 3 replicates is needed to detect a 10, 20, and 30%, respectively, deviation from allelic balance in a condition with power > 80%. A minimum of 960 and 240 allele specific reads divided equally among 8 replicates is needed to detect a 20 or 30% difference in AI between conditions with comparable power. Higher numbers of replicates increase power more than adding coverage without affecting type I error. We provide a Python package that enables simulation of AI scenarios and enables individuals to estimate type I error and power in detecting AI and differences in AI between conditions.
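The read budgets quoted in this abstract can be restated as reads per replicate. The short Python sketch below only does that arithmetic; it is not the authors' Python package, and the names `within_condition`, `between_conditions`, and `reads_per_replicate` are hypothetical.

```python
# Designs quoted in the abstract, as (deviation, total allele-specific reads, replicates).
# These tuples only restate the reported minimums; nothing here is re-estimated.
within_condition = [(0.10, 2400, 12), (0.20, 480, 5), (0.30, 240, 3)]
between_conditions = [(0.20, 960, 8), (0.30, 240, 8)]


def reads_per_replicate(total_reads, replicates):
    """Split a total read budget equally among replicates."""
    if replicates <= 0:
        raise ValueError("replicates must be positive")
    return total_reads / replicates


within = {dev: reads_per_replicate(total, reps) for dev, total, reps in within_condition}
between = {dev: reads_per_replicate(total, reps) for dev, total, reps in between_conditions}

for dev, reads in within.items():
    print(f"within condition, {dev:.0%} deviation: {reads:.0f} reads per replicate")
for dev, reads in between.items():
    print(f"between conditions, {dev:.0%} difference: {reads:.0f} reads per replicate")
```

Read this way, the smallest effect needs both the most replicates and the most reads per replicate, which is in line with the abstract's point that replicates add more power than coverage.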
Affiliation(s)
- Katrina Sherbina
- Quantitative and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
| | - Luis G León-Novelo
- Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston-School of Public Health, Houston, TX, 77030, USA
| | - Sergey V Nuzhdin
- Molecular and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
| | - Lauren M McIntyre
- Genetics Institute and Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32603, USA
| | - Fabio Marroni
- Dipartimento di Scienze Agroalimentari, Ambientali e Animali, Università di Udine, 33100, Udine, Italy.
| |
|
5
|
Kuo TCY, Hatakeyama M, Tameshige T, Shimizu KK, Sese J. Homeolog expression quantification methods for allopolyploids. Brief Bioinform 2021; 21:395-407. [PMID: 30590436 PMCID: PMC7299288 DOI: 10.1093/bib/bby121] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/24/2018] [Revised: 11/06/2018] [Accepted: 11/21/2018] [Indexed: 12/19/2022] Open
Abstract
Genome duplication with hybridization, or allopolyploidization, occurs in animals, fungi and plants, and is especially common in crop plants. There is an increasing interest in the study of allopolyploids because of advances in polyploid genome assembly; however, the high level of sequence similarity in duplicated gene copies (homeologs) poses many challenges. Here we compared standard RNA-seq expression quantification approaches used currently for diploid species against subgenome-classification approaches, which map reads to each subgenome separately. We examined mapping error using our previous and new RNA-seq data in which a subgenome is experimentally added (synthetic allotetraploid Arabidopsis kamchatica) or reduced (allohexaploid wheat Triticum aestivum versus extracted allotetraploid) as ground truth. The error rates in the two species were very similar. The standard approaches showed higher error rates (>10% using pseudo-alignment with Kallisto) while subgenome-classification approaches showed much lower error rates (<1% using EAGLE-RC, <2% using HomeoRoq). Although downstream analysis may partly mitigate mapping errors, the difference in methods was substantial in hexaploid wheat, where Kallisto appeared to have systematic differences relative to other methods. Only approximately half of the differentially expressed homeologs detected using Kallisto overlapped with those by any other method in wheat. In general, disagreement in low-expression genes was responsible for most of the discordance between methods, which is consistent with known biases in Kallisto. We also observed that there exist uncertainties in genome sequences and annotation which can affect each method differently. Overall, subgenome-classification approaches tend to perform better than standard approaches with EAGLE-RC having the highest precision.
Affiliation(s)
- Tony C Y Kuo
- Artificial Intelligence Research Center, AIST, 2-3-26 Aomi, Koto-ku, Tokyo 135-0064, Japan; AIST-Tokyo Tech RWBC-OIL, 2-12-1 Okayama, Meguro-ku, Tokyo 152-8550, Japan
| | - Masaomi Hatakeyama
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich CH-8057, Switzerland; Functional Genomics Center Zurich, Winterthurerstrasse 190, Zurich CH-8057, Switzerland; Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Genopode, Lausanne 1015, Switzerland
| | - Toshiaki Tameshige
- Kihara Institute for Biological Research, Yokohama City University, 641-12, Maioka, Totsuka-ku, Yokohama 244-0813, Japan
| | - Kentaro K Shimizu
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich CH-8057, Switzerland; Kihara Institute for Biological Research, Yokohama City University, 641-12, Maioka, Totsuka-ku, Yokohama 244-0813, Japan
| | - Jun Sese
- Artificial Intelligence Research Center, AIST, 2-3-26 Aomi, Koto-ku, Tokyo 135-0064, Japan; AIST-Tokyo Tech RWBC-OIL, 2-12-1 Okayama, Meguro-ku, Tokyo 152-8550, Japan
| |
|
6
|
Boatwright JL, Yeh CT, Hu HC, Susanna A, Soltis DE, Soltis PS, Schnable PS, Barbazuk WB. Trajectories of Homoeolog-Specific Expression in Allotetraploid Tragopogon castellanus Populations of Independent Origins. Front Plant Sci 2021; 12:679047. [PMID: 34249049 PMCID: PMC8261302 DOI: 10.3389/fpls.2021.679047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/10/2021] [Accepted: 05/20/2021] [Indexed: 06/13/2023]
Abstract
Polyploidization can have a significant ecological and evolutionary impact by providing substantially more genetic material that may result in novel phenotypes upon which selection may act. While the effects of polyploidization are broadly reviewed across the plant tree of life, the reproducibility of these effects within naturally occurring, independently formed polyploids is poorly characterized. The flowering plant genus Tragopogon (Asteraceae) offers a rare glimpse into the intricacies of repeated allopolyploid formation with both nascent (< 90 years old) and more ancient (mesopolyploids) formations. Neo- and mesopolyploids in Tragopogon have formed repeatedly and have extant diploid progenitors that facilitate the comparison of genome evolution after polyploidization across a broad span of evolutionary time. Here, we examine four independently formed lineages of the mesopolyploid Tragopogon castellanus for homoeolog expression changes and fractionation after polyploidization. We show that expression changes are remarkably similar among these independently formed polyploid populations with large convergence among expressed loci, moderate convergence among loci lost, and stochastic silencing. We further compare and contrast these results for T. castellanus with two nascent Tragopogon allopolyploids. While homoeolog expression bias was balanced in both nascent polyploids and T. castellanus, the degree of additive expression was significantly different, with the mesopolyploid populations demonstrating more non-additive expression. We suggest that gene dosage and expression noise minimization may play a prominent role in regulating gene expression patterns immediately after allopolyploidization as well as deeper into time, and these patterns are conserved across independent polyploid lineages.
Affiliation(s)
- J. Lucas Boatwright
- Advanced Plant Technology Program, Clemson University, Clemson, SC, United States
| | - Cheng-Ting Yeh
- Department of Agronomy, Iowa State University, Ames, IA, United States
| | - Heng-Cheng Hu
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Covance Inc., Indianapolis, IN, United States
| | - Alfonso Susanna
- Botanic Institute of Barcelona, Consejo Superior de Investigaciones Científicas, ICUB, Barcelona, Spain
| | - Douglas E. Soltis
- Department of Biology, University of Florida, Gainesville, FL, United States
- Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
- Genetics Institute, University of Florida, Gainesville, FL, United States
- Biodiversity Institute, University of Florida, Gainesville, FL, United States
| | - Pamela S. Soltis
- Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
- Genetics Institute, University of Florida, Gainesville, FL, United States
- Biodiversity Institute, University of Florida, Gainesville, FL, United States
| | | | - William B. Barbazuk
- Department of Biology, University of Florida, Gainesville, FL, United States
| |
|
7
|
Miller BR, Morse AM, Borgert JE, Liu Z, Sinclair K, Gamble G, Zou F, Newman JRB, León-Novelo LG, Marroni F, McIntyre LM. Testcrosses are an efficient strategy for identifying cis-regulatory variation: Bayesian analysis of allele-specific expression (BayesASE). G3 (Bethesda) 2021; 11:jkab096. [PMID: 33772539 PMCID: PMC8104932 DOI: 10.1093/g3journal/jkab096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2021] [Accepted: 03/10/2021] [Indexed: 12/30/2022]
Abstract
Allelic imbalance (AI) occurs when alleles in a diploid individual are differentially expressed and indicates cis-acting regulatory variation. What is the distribution of allelic effects in a natural population? Are all alleles the same? Are all alleles distinct? The approach described applies to any technology generating allele-specific sequence counts, for example for chromatin accessibility, and can be applied generally, including to comparisons between tissues or environments for the same genotype. Tests of allelic effect are generally performed by crossing individuals and comparing expression between alleles directly in the F1. However, a crossing scheme that compares alleles pairwise is a prohibitive cost for more than a handful of alleles as the number of crosses is at least (n^2 - n)/2, where n is the number of alleles. We show here that a testcross design followed by a hypothesis test of AI between testcrosses can be used to infer differences between nontester alleles, allowing n alleles to be compared with n crosses. Using a mouse data set where both testcrosses and direct comparisons have been performed, we show that the predicted differences between nontester alleles are validated at levels of over 90% when a parent-of-origin effect is present and of 60%-80% overall. Power considerations for a testcross are similar to those in a reciprocal cross. In all applications, the testing for AI involves several complex bioinformatics steps. BayesASE is a complete bioinformatics pipeline that incorporates state-of-the-art error reduction techniques and a flexible Bayesian approach to estimating AI and formally comparing levels of AI between conditions. The modular structure of BayesASE has been packaged in Galaxy, made available in Nextflow and as a collection of scripts for the SLURM workload manager on GitHub (https://github.com/McIntyre-Lab/BayesASE).
Affiliation(s)
- Brecca R Miller
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- NYU Langone Health, New York University, New York, NY 10013, USA
| | - Alison M Morse
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| | - Jacqueline E Borgert
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
| | - Zihao Liu
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| | - Kelsey Sinclair
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| | - Gavin Gamble
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
| | - Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
| | - Jeremy R B Newman
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Pathology, University of Florida, Gainesville, FL 32608 USA
| | - Luis G León-Novelo
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston-University of Texas School of Public Health, Houston, TX 7703, USA
| | - Fabio Marroni
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Udine, 33100, Italy
| | - Lauren M McIntyre
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| |
|
8
|
Tangwancharoen S, Semmens BX, Burton RS. Allele-Specific Expression and Evolution of Gene Regulation Underlying Acute Heat Stress Response and Local Adaptation in the Copepod Tigriopus californicus. J Hered 2020; 111:539-547. [PMID: 33141173 DOI: 10.1093/jhered/esaa044] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/21/2020] [Accepted: 10/26/2020] [Indexed: 01/02/2023] Open
Abstract
Geographic variation in environmental temperature can select for local adaptation among conspecific populations. Divergence in gene expression across the transcriptome is a key mechanism for evolution of local thermal adaptation in many systems, yet the genetic mechanisms underlying this regulatory evolution remain poorly understood. Here we examine gene expression in 2 locally adapted Tigriopus californicus populations (heat tolerant San Diego, SD, and less tolerant Santa Cruz, SC) and their F1 hybrids during acute heat stress response. Allele-specific expression (ASE) in F1 hybrids was used to determine cis-regulatory divergence. We found that the number of genes showing significant allelic imbalance increased under heat stress compared to unstressed controls. This suggests that there is significant population divergence in cis-regulatory elements underlying heat stress response. Specifically, the number of genes showing an excess of transcripts from the more thermal tolerant (SD) population increased with heat stress while that number of genes with an SC excess was similar in both treatments. Inheritance patterns of gene expression also revealed that genes displaying SD-dominant expression phenotypes increase in number in response to heat stress; that is, across loci, gene expression in F1's following heat stress showed more similarity to SD than SC, a pattern that was absent in the control treatment. The observed patterns of ASE and inheritance of gene expression provide insight into the complex processes underlying local adaptation and thermal stress response.
Affiliation(s)
- Sumaetee Tangwancharoen
- Marine Biology Research Division, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA; Department of Biology, University of Vermont, Burlington, VT
| | - Brice X Semmens
- Marine Biology Research Division, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA
| | - Ronald S Burton
- Marine Biology Research Division, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA
| |
|
9
|
Cartwright EL, Lott SE. Evolved Differences in cis and trans Regulation Between the Maternal and Zygotic mRNA Complements in the Drosophila Embryo. Genetics 2020; 216:805-821. [PMID: 32928902 PMCID: PMC7648588 DOI: 10.1534/genetics.120.303626] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/11/2019] [Accepted: 08/26/2020] [Indexed: 11/18/2022] Open
Abstract
How gene expression can evolve depends on the mechanisms driving gene expression. Gene expression is controlled in different ways in different developmental stages; here we ask whether different developmental stages show different patterns of regulatory evolution. To explore the mode of regulatory evolution, we used the early stages of embryonic development controlled by two different genomes, that of the mother and that of the zygote. During embryogenesis in all animals, initial developmental processes are driven entirely by maternally provided gene products deposited into the oocyte. The zygotic genome is activated later, when developmental control is handed off from maternal gene products to the zygote during the maternal-to-zygotic transition. Using hybrid crosses between sister species of Drosophila (D. simulans, D. sechellia, and D. mauritiana) and transcriptomics, we find that the regulation of maternal transcript deposition and zygotic transcription evolve through different mechanisms. We find that patterns of transcript level inheritance in hybrids, relative to parental species, differ between maternal and zygotic transcripts, and maternal transcript levels are more likely to be conserved. Changes in transcript levels occur predominantly through differences in trans regulation for maternal genes, while changes in zygotic transcription occur through a combination of both cis and trans regulatory changes. Differences in the underlying regulatory landscape in the mother and the zygote are likely the primary determinants for how maternal and zygotic transcripts evolve.
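A widely used convention for separating cis from trans effects with hybrid data takes the log2 allelic ratio in the hybrid as the cis estimate and the remainder of the parental log2 ratio as the trans estimate. The Python sketch below assumes that convention for illustration only; the abstract does not state the exact formulas or tests the authors used, and the function name and example counts are hypothetical.

```python
import math


def cis_trans_split(parent_a, parent_b, hybrid_a, hybrid_b):
    """Return (total, cis, trans) log2 expression divergence for one gene.

    parent_a / parent_b: expression of the gene in each parental species.
    hybrid_a / hybrid_b: allele-specific expression of each allele in the hybrid.
    Assumed convention: cis = hybrid allelic ratio, trans = total - cis.
    """
    total = math.log2(parent_a / parent_b)
    cis = math.log2(hybrid_a / hybrid_b)
    trans = total - cis
    return total, cis, trans


# Made-up counts: an 8-fold parental difference, 4-fold of which persists
# between alleles in the shared hybrid environment.
total, cis, trans = cis_trans_split(80, 10, 40, 10)
print(total, cis, trans)
```

Because both alleles share the same trans environment in the hybrid, any allelic difference that remains there is attributed to cis; a real analysis would add replicate-aware statistical tests before classifying a gene.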
Affiliation(s)
- Emily L Cartwright
- Department of Evolution and Ecology, University of California, Davis, California 95616
| | - Susan E Lott
- Department of Evolution and Ecology, University of California, Davis, California 95616
| |
|
10
|
Shan S, Boatwright JL, Liu X, Chanderbali AS, Fu C, Soltis PS, Soltis DE. Transcriptome Dynamics of the Inflorescence in Reciprocally Formed Allopolyploid Tragopogon miscellus (Asteraceae). Front Genet 2020; 11:888. [PMID: 32849847 PMCID: PMC7423994 DOI: 10.3389/fgene.2020.00888] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/29/2020] [Accepted: 07/20/2020] [Indexed: 11/13/2022] Open
Abstract
Polyploidy is an important evolutionary mechanism and is prevalent among land plants. Most polyploid species examined have multiple origins, which provide genetic diversity and may enhance the success of polyploids. In some polyploids, recurrent origins can result from reciprocal crosses between the same diploid progenitors. Although great progress has been made in understanding the genetic consequences of polyploidy, the genetic implications of reciprocal polyploidization remain poorly understood, especially in natural polyploids. Tragopogon (Asteraceae) has become an evolutionary model system for studies of recent and recurrent polyploidy. Allotetraploid T. miscellus has formed reciprocally in nature with resultant distinctive floral and inflorescence morphologies (i.e., short- vs. long-liguled forms). In this study, we performed comparative inflorescence transcriptome analyses of reciprocally formed T. miscellus and its diploid parents, T. dubius and T. pratensis. In both forms of T. miscellus, homeolog expression of ∼70% of the loci showed vertical transmission of the parental expression patterns (i.e., parental legacy), and ∼20% of the loci showed biased homeolog expression, which was unbalanced toward T. pratensis. However, 17.9% of orthologous pairs showed different homeolog expression patterns between the two forms of T. miscellus. No clear effect of cytonuclear interaction on biased expression of the maternal homeolog was found. In terms of the total expression level of the homeologs studied, 22.6% and 16.2% of the loci displayed non-additive expression in short- and long-liguled T. miscellus, respectively. Unbalanced expression level dominance toward T. pratensis was observed in both forms of T. miscellus. Significantly, genes annotated as being involved in pectin catabolic processes were highly expressed in long-liguled T. miscellus relative to the short-liguled form, and the majority of these differentially expressed genes were transgressively down-regulated in short-liguled T. miscellus. Given the known role of these genes in cell expansion, they may play a role in the differing floral and inflorescence morphologies of the two forms. In summary, the overall inflorescence transcriptome profiles are highly similar between reciprocal origins of T. miscellus. However, the dynamic homeolog-specific expression and non-additive expression patterns observed in T. miscellus emphasize the importance of reciprocal origins in promoting the genetic diversity of polyploids.
Affiliation(s)
- Shengchen Shan
- Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States; Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
| | - J Lucas Boatwright
- Advanced Plant Technology Program, Clemson University, Clemson, SC, United States
| | - Xiaoxian Liu
- Department of Biology, University of Florida, Gainesville, FL, United States; Environmental Genomics and Systems Biology (EGSB), Biosciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
| | - Andre S Chanderbali
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
| | - Chaonan Fu
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
| | - Pamela S Soltis
- Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States; Florida Museum of Natural History, University of Florida, Gainesville, FL, United States; Biodiversity Institute, University of Florida, Gainesville, FL, United States; Genetics Institute, University of Florida, Gainesville, FL, United States
| | - Douglas E Soltis
- Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States; Florida Museum of Natural History, University of Florida, Gainesville, FL, United States; Department of Biology, University of Florida, Gainesville, FL, United States; Biodiversity Institute, University of Florida, Gainesville, FL, United States; Genetics Institute, University of Florida, Gainesville, FL, United States
| |
|
11
|
Abstract
Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism, and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimates for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of three different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates, and integrated it into the apeglm package. The three methods were evaluated on both simulated and real data. Apeglm consistently performed better than ML according to a variety of criteria, including mean absolute error and concordance at the top. While ash had lower error and greater concordance than ML on the simulations, it also had a tendency to over-shrink large effects, and performed worse on the real data according to error and concordance. Furthermore, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
Affiliation(s)
- Joshua P Zitovsky
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
| |
|
12
|
Abstract
Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimators for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use pseudocounts or Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of four different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, adding a pseudocount to each allele, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates and integrated it into the apeglm package. The four methods were evaluated on two simulations and one real data set. Apeglm consistently performed better than ML according to a variety of criteria, and generally outperformed use of pseudocounts as well. Ash also performed better than ML in one of the simulations, but in the other performance was more mixed. Finally, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster and more numerically reliable, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
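As a rough illustration of why a pseudocount reduces variance at low coverage, the following Python sketch contrasts the maximum likelihood estimate of an allelic proportion with a pseudocount estimate. The function names and the default pseudocount of 1 are assumptions chosen for illustration; this is not the apeglm or ash implementation, and it ignores the beta-binomial model that the package fits.

```python
def ml_proportion(alt_reads, ref_reads):
    """Maximum likelihood estimate: observed fraction of reads from one allele."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this gene")
    return alt_reads / total


def pseudocount_proportion(alt_reads, ref_reads, pseudo=1.0):
    """Add the same pseudocount (assumed value: 1) to each allele before dividing.

    This pulls the estimate toward 0.5, most strongly when counts are small.
    """
    return (alt_reads + pseudo) / (alt_reads + ref_reads + 2 * pseudo)


# With only three reads, all from one allele, the ML estimate is extreme,
# while the pseudocount estimate is pulled toward balanced expression.
low_ml = ml_proportion(3, 0)
low_pc = pseudocount_proportion(3, 0)

# With deep coverage at the same allelic ratio, the two estimates nearly agree.
high_ml = ml_proportion(300, 0)
high_pc = pseudocount_proportion(300, 0)
```

With counts this small the two estimators disagree noticeably, which is the low-coverage regime the abstract describes as problematic for unshrunken estimates.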
Affiliation(s)
- Joshua P. Zitovsky
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
| | - Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
| |
|
13
|
Miao Z, Alvarez M, Pajukanta P, Ko A. ASElux: an ultra-fast and accurate allelic reads counter. Bioinformatics 2019; 34:1313-1320. [PMID: 29186329 DOI: 10.1093/bioinformatics/btx762] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/17/2017] [Accepted: 11/22/2017] [Indexed: 11/12/2022] Open
Abstract
Motivation: Mapping bias causes preferential alignment to the reference allele, forming a major obstacle in allele-specific expression (ASE) analysis. The existing methods, such as simulation and SNP-aware alignment, are either inaccurate or relatively slow. To count allelic reads quickly and accurately for ASE analysis, we developed a novel approach, ASElux, which utilizes the personal SNP information and counts allelic reads directly from unmapped RNA-sequence (RNA-seq) data. ASElux significantly reduces runtime by disregarding reads outside single nucleotide polymorphisms (SNPs) during the alignment. Results: When compared to other tools on simulated and experimental data, ASElux achieves a higher accuracy on ASE estimation than non-SNP-aware aligners and requires a much shorter time than the benchmark SNP-aware aligner, GSNAP, with just a slight loss in performance. ASElux can process 40 million read-pairs from an RNA-seq sample and count allelic reads within 10 min, which is comparable to directly counting the allelic reads from alignments based on other tools. Furthermore, processing an RNA-seq sample using ASElux in conjunction with a general aligner, such as STAR, is more accurate and still ∼4× faster than STAR + WASP, and ∼33× faster than the lead SNP-aware aligner, GSNAP, making ASElux ideal for ASE analysis of large-scale transcriptomic studies. We applied ASElux to 273 lung RNA-seq samples from GTEx and identified a splice-QTL rs11078928 in lung which explains the mechanism underlying an asthma GWAS SNP rs11078927. Thus, our analysis demonstrated ASE as a highly powerful complementary tool to cis-expression quantitative trait locus (eQTL) analysis. Availability and implementation: The software can be downloaded from https://github.com/abl0719/ASElux. Contact: zmiao@ucla.edu or a5ko@ucla.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zong Miao
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90024, USA.,Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA 90024, USA
| | - Marcus Alvarez
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90024, USA
| | - Päivi Pajukanta
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90024, USA.,Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA 90024, USA.,Molecular Biology Institute, UCLA, Los Angeles, CA 90024, USA
| | - Arthur Ko
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90024, USA.,Molecular Biology Institute, UCLA, Los Angeles, CA 90024, USA
| |
|
14
|
Zhao C, Xie S, Wu H, Luan Y, Hu S, Ni J, Lin R, Zhao S, Zhang D, Li X. Quantification of allelic differential expression using a simple Fluorescence primer PCR-RFLP-based method. Sci Rep 2019; 9:6334. [PMID: 31004110 PMCID: PMC6474871 DOI: 10.1038/s41598-019-42815-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2018] [Accepted: 03/29/2019] [Indexed: 12/04/2022] Open
Abstract
Allelic differential expression (ADE) is common in diploid organisms, and is often the key reason for specific phenotype variations. Thus, ADE detection is important for identification of major genes and causal mutations. To date, sensitive and simple methods to detect ADE are still lacking. In this study, we have developed an accurate, simple, and sensitive method, named fluorescence primer PCR-RFLP quantitative method (fPCR-RFLP), for ADE analysis. This method involves two rounds of PCR amplification using a pair of primers, one of which is double-labeled with an overhang 6-FAM. The two alleles are then separated by RFLP and quantified by fluorescence density. fPCR-RFLP could precisely distinguish ADE across a range of 1- to 32-fold differences. Using this method, we verified PLAG1 and KIT, two candidate genes related to growth rate and immune response traits of pigs, to be ADE both at different developmental stages and in different tissues. Our data demonstrate that fPCR-RFLP is an accurate and sensitive method for detecting ADE at both the DNA and RNA levels. Therefore, this powerful tool provides a way to analyze mutations that cause ADE.
Affiliation(s)
- Changzhi Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Shengsong Xie
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.,The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Hui Wu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Yu Luan
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Suqin Hu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Juan Ni
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Ruiyi Lin
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.,The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Dingxiao Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.,The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| | - Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China.,The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
| |
|
15
|
Abstract
Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average gene expression across cells. Single-cell RNA sequencing (scRNA-seq) allows the comparison of expression distribution between the two alleles of a diploid organism, and characterization of allele-specific bursting. Here we describe SCALE, a bioinformatic and statistical framework for allele-specific gene expression analysis by scRNA-seq. SCALE estimates genome-wide bursting kinetics at the allelic level while accounting for technical bias and other complicating factors such as cell size. SCALE detects genes with significantly different bursting kinetics between the two alleles, as well as genes where the two alleles exhibit non-independent bursting processes. Here, we illustrate SCALE on a mouse blastocyst single-cell dataset with step-by-step demonstration from the upstream bioinformatic processing to the downstream biological interpretation of SCALE's output.
Affiliation(s)
- Meichen Dong
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
| | - Yuchao Jiang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA.
- Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC, USA.
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA.
| |
|
16
|
Combs PA, Fraser HB. Spatially varying cis-regulatory divergence in Drosophila embryos elucidates cis-regulatory logic. PLoS Genet 2018; 14:e1007631. [PMID: 30383747 PMCID: PMC6211617 DOI: 10.1371/journal.pgen.1007631] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/15/2018] [Accepted: 08/14/2018] [Indexed: 12/30/2022] Open
Abstract
Spatial patterning of gene expression is a key process in development, yet how it evolves is still poorly understood. Both cis- and trans-acting changes could participate in complex interactions, so to isolate the cis-regulatory component of patterning evolution, we measured allele-specific spatial gene expression patterns in D. melanogaster × simulans hybrid embryos. RNA-seq of cryo-sectioned slices revealed 66 genes with strong spatially varying allele-specific expression. We found that hunchback, a major regulator of developmental patterning, had reduced expression of the D. simulans allele specifically in the anterior tip of hybrid embryos. Mathematical modeling of hunchback cis-regulation suggested a candidate transcription factor binding site variant, which we verified as causal using CRISPR-Cas9 genome editing. In sum, even comparing morphologically near-identical species we identified surprisingly extensive spatial variation in gene expression, suggesting not only that development is robust to many such changes, but also that natural selection may have ample raw material for evolving new body plans via changes in spatial patterning.
Affiliation(s)
- Peter A. Combs
- Department of Biology, Stanford University, Stanford, California, United States of America
| | - Hunter B. Fraser
- Department of Biology, Stanford University, Stanford, California, United States of America
| |
|
17
|
A Robust Methodology for Assessing Differential Homeolog Contributions to the Transcriptomes of Allopolyploids. Genetics 2018; 210:883-894. [PMID: 30213855 DOI: 10.1534/genetics.118.301564] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/01/2018] [Accepted: 09/07/2018] [Indexed: 12/18/2022] Open
Abstract
Polyploidy has played a pivotal and recurring role in angiosperm evolution. Allotetraploids arise from hybridization between species and possess duplicated gene copies (homeologs) that serve redundant roles immediately after polyploidization. Although polyploidization is a major contributor to plant evolution, it remains poorly understood. We describe an analytical approach for assessing homeolog-specific expression that begins with de novo assembly of parental transcriptomes and effectively (i) reduces redundancy in de novo assemblies, (ii) identifies putative orthologs, (iii) isolates common regions between orthologs, and (iv) assesses homeolog-specific expression using a robust Bayesian Poisson-Gamma model to account for sequence bias when mapping polyploid reads back to parental references. Using this novel methodology, we examine differential homeolog contributions to the transcriptome in the recently formed allopolyploids Tragopogon mirus and T. miscellus (Compositae). Notably, we assess a larger Tragopogon gene set than previous studies of this system. Using carefully identified orthologous regions and filtering biased orthologs, we find in both allopolyploids largely balanced expression with no strong parental bias. These new methods can be used to examine homeolog expression in any tetrapolyploid system without requiring a reference genome.
|
18
|
Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data. G3-GENES GENOMES GENETICS 2018; 8:2923-2940. [PMID: 30021829 PMCID: PMC6118309 DOI: 10.1534/g3.118.200373] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]
Abstract
Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist, and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice-aware manner. We identify 99.8% of true transcripts while iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons while only 43% are detected by STAR. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the number of reads on average supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates. This was particularly evident when EA was applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.
|
19
|
Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits. Biophys Rev 2018; 10:1053-1060. [PMID: 29934864 PMCID: PMC6082306 DOI: 10.1007/s12551-018-0435-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/18/2018] [Accepted: 06/13/2018] [Indexed: 12/31/2022] Open
Abstract
Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait-associated locus to a candidate gene or set of candidate genes for prioritization for follow-up mechanistic studies is anything but straightforward. Genomic technologies based on next-generation sequencing now offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing, known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.
|
20
|
Wang M, Uebbing S, Pawitan Y, Scofield DG. RPASE: Individual-based allele-specific expression detection without prior knowledge of haplotype phase. Mol Ecol Resour 2018; 18:1247-1262. [PMID: 29858523 DOI: 10.1111/1755-0998.12909] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/21/2017] [Revised: 05/09/2018] [Accepted: 05/21/2018] [Indexed: 01/04/2023]
Abstract
Variation in gene expression is believed to make a significant contribution to phenotypic diversity and divergence. The analysis of allele-specific expression (ASE) can reveal important insights into gene expression regulation. We developed a novel method called RPASE (Read-backed Phasing-based ASE detection) to test for genes that show ASE. With mapped RNA-seq data from a single individual and a list of SNPs from the same individual as the only input, RPASE is capable of aggregating information across multiple dependent SNPs and producing individual-based gene-level tests for ASE. RPASE performs well in simulations and comparisons. We applied RPASE to multiple bird species and found a potentially rich landscape of ASE.
Affiliation(s)
- Mi Wang
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Severin Uebbing
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Douglas G Scofield
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| |
|
21
|
Direct Testing for Allele-Specific Expression Differences Between Conditions. G3-GENES GENOMES GENETICS 2018; 8:447-460. [PMID: 29167272 PMCID: PMC5919738 DOI: 10.1534/g3.117.300139] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/25/2022]
Abstract
Allelic imbalance (AI) indicates the presence of functional variation in cis regulatory regions. Detecting cis regulatory differences using AI is widespread, yet there is no formal statistical methodology that tests whether AI differs between conditions. Here, we present a novel model and formally test differences in AI across conditions using Bayesian credible intervals. The approach tests AI by environment (G×E) interactions, and can be used to test AI between environments, genotypes, sex, and any other condition. We incorporate bias into the modeling process. Bias is allowed to vary between conditions, making the formulation of the model general. As gene expression affects power for detection of AI, and, as expression may vary between conditions, the model explicitly takes coverage into account. The proposed model has low type I and II error under several scenarios, and is robust to large differences in coverage between conditions. We reanalyze RNA-seq data from a Drosophila melanogaster population panel, with F1 genotypes, to compare levels of AI between mated and virgin female flies, and we show that AI × genotype interactions can also be tested. To demonstrate the use of the model to test genetic differences and interactions, a formal test between two F1s was performed, showing the expected 20% difference in AI. The proposed model allows a formal test of G×E and G×G, and reaffirms a previous finding that cis regulation is robust between environments.
|
22
|
Rhoné B, Mariac C, Couderc M, Berthouly-Salazar C, Ousseini IS, Vigouroux Y. No Excess of Cis-Regulatory Variation Associated with Intraspecific Selection in Wild Pearl Millet (Cenchrus americanus). Genome Biol Evol 2017; 9:388-397. [PMID: 28137746 PMCID: PMC5381623 DOI: 10.1093/gbe/evx004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 01/25/2017] [Indexed: 12/15/2022] Open
Abstract
Several studies suggest that cis-regulatory mutations are the favorite target of evolutionary changes, one reason being that cis-regulatory mutations might have fewer deleterious pleiotropic effects than protein-coding mutations. A review of the process also suggests that this bias towards adaptive cis-regulatory variation might be less pronounced at the intraspecific level compared with the interspecific level. In this study, we assessed the contribution of cis-regulatory variation to adaptation at the intraspecific level using populations of wild pearl millet (Cenchrus americanus ssp. monodii) sampled along an environmental gradient in Niger. From RNA sequencing of hybrids to assess allele-specific expression, we identified genes with cis-regulatory divergence between two parental accessions collected in contrasting environmental conditions. This revealed that ∼15% of transcribed genes showed cis-regulatory variation. Intersecting the gene set exhibiting cis-regulatory variation with the gene set identified as targets of selection revealed no excess of cis-acting mutations among the selected genes. We additionally found no excess of cis-regulatory variation among genes associated with adaptive traits. As our approach relied on methods identifying mainly genes submitted to strong selection pressure or with high phenotypic effect, the contribution of cis-regulatory changes to soft selection or polygenic adaptive traits remains to be tested. However, our results favor the hypothesis that enrichment of adaptive cis-regulatory divergence builds up over time. For short evolutionary time-scales, cis-acting mutations are not predominantly involved in adaptive evolution associated with strong selective signal.
Affiliation(s)
- Bénédicte Rhoné
- Unité Mixte de Recherche Diversité Adaptation et Développement des Plantes (UMR DIADE), Institut de Recherche pour le Développement, Montpellier, France.,Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, Lyon, France
| | - Cédric Mariac
- Unité Mixte de Recherche Diversité Adaptation et Développement des Plantes (UMR DIADE), Institut de Recherche pour le Développement, Montpellier, France
| | - Marie Couderc
- Unité Mixte de Recherche Diversité Adaptation et Développement des Plantes (UMR DIADE), Institut de Recherche pour le Développement, Montpellier, France
| | - Cécile Berthouly-Salazar
- Unité Mixte de Recherche Diversité Adaptation et Développement des Plantes (UMR DIADE), Institut de Recherche pour le Développement, Montpellier, France.,Laboratoire Mixte International Adaptation des Plantes et Microorganismes Associés aux Stress Environnementaux (LMI LAPSE), Centre de Recherche de Bel Air, Dakar, Sénégal
| | - Issaka Salia Ousseini
- Unité Mixte de Recherche Diversité Adaptation et Développement des Plantes (UMR DIADE), Institut de Recherche pour le Développement, Montpellier, France.,Laboratoire Mixte International Adaptation des Plantes et Microorganismes Associés aux Stress Environnementaux (LMI LAPSE), Centre de Recherche de Bel Air, Dakar, Sénégal.,Biology Department, Unité Mixte de Recherche Diversité Adaptation et Développement des plantes (UMR DIADE), Université Montpellier, France.,Université Abdou Moumouni de Niamey, Niger
| | - Yves Vigouroux
- Unité Mixte de Recherche Diversité Adaptation et Développement des Plantes (UMR DIADE), Institut de Recherche pour le Développement, Montpellier, France.,Laboratoire Mixte International Adaptation des Plantes et Microorganismes Associés aux Stress Environnementaux (LMI LAPSE), Centre de Recherche de Bel Air, Dakar, Sénégal.,Biology Department, Unité Mixte de Recherche Diversité Adaptation et Développement des plantes (UMR DIADE), Université Montpellier, France
| |
|
23
|
Jiang Y, Zhang NR, Li M. SCALE: modeling allele-specific gene expression by single-cell RNA sequencing. Genome Biol 2017; 18:74. [PMID: 28446220 PMCID: PMC5407026 DOI: 10.1186/s13059-017-1200-8] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 02/17/2017] [Accepted: 03/24/2017] [Indexed: 12/13/2022] Open
Abstract
Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. Single-cell RNA sequencing allows the comparison of expression distribution between the two alleles of a diploid organism and the characterization of allele-specific bursting. Here, we propose SCALE to analyze genome-wide allele-specific bursting, with adjustment of technical variability. SCALE detects genes exhibiting allelic differences in bursting parameters and genes whose alleles burst non-independently. We apply SCALE to mouse blastocyst and human fibroblast cells and find that cis control in gene expression overwhelmingly manifests as differences in burst frequency.
Affiliation(s)
- Yuchao Jiang
- Genomics and Computational Biology Graduate Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Nancy R Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| | - Mingyao Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
24
|
Movassagh M, Alomran N, Mudvari P, Dede M, Dede C, Kowsari K, Restrepo P, Cauley E, Bahl S, Li M, Waterhouse W, Tsaneva-Atanasova K, Edwards N, Horvath A. RNA2DNAlign: nucleotide resolution allele asymmetries through quantitative assessment of RNA and DNA paired sequencing data. Nucleic Acids Res 2016; 44:e161. [PMID: 27576531 PMCID: PMC5159535 DOI: 10.1093/nar/gkw757] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/18/2016] [Revised: 08/15/2016] [Accepted: 08/19/2016] [Indexed: 12/14/2022] Open
Abstract
We introduce RNA2DNAlign, a computational framework for quantitative assessment of allele counts across paired RNA and DNA sequencing datasets. RNA2DNAlign is based on quantitation of the relative abundance of variant and reference read counts, followed by binomial tests for genotype and allelic status at SNV positions between compatible sequences. RNA2DNAlign detects positions with differential allele distribution, suggesting asymmetries due to regulatory/structural events. Based on the type of asymmetry, RNA2DNAlign outlines positions likely to be implicated in RNA editing, allele-specific expression or loss, somatic mutagenesis or loss-of-heterozygosity (the first three also in a tumor-specific setting). We applied RNA2DNAlign to 360 matching normal and tumor exomes and transcriptomes from 90 breast cancer patients from TCGA. Under high-confidence settings, RNA2DNAlign identified 2038 distinct SNV sites associated with one of the aforementioned asymmetries, the majority of which have not been linked to functionality before. The performance assessment shows very high specificity and sensitivity, due to the corroboration of signals across multiple matching datasets. RNA2DNAlign is freely available from http://github.com/HorvathLab/NGS as a self-contained binary package for 64-bit Linux systems.
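The binomial testing step mentioned above can be sketched generically in Python. This is a minimal exact two-sided binomial test on variant and reference read counts at one site, assuming a null variant proportion of 0.5; the function name, the null value, and the example counts are illustrative assumptions and do not reproduce RNA2DNAlign's actual procedure or thresholds.

```python
from math import comb


def binomial_two_sided_p(variant_reads, total_reads, null_prop=0.5):
    """Exact two-sided binomial p-value for the variant read count at one site.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count under the null proportion (assumed 0.5 here).
    """
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must lie between 0 and total_reads")
    pmf = [
        comb(total_reads, i) * null_prop**i * (1 - null_prop) ** (total_reads - i)
        for i in range(total_reads + 1)
    ]
    # Small relative tolerance so that ties are not lost to rounding.
    threshold = pmf[variant_reads] * (1 + 1e-9)
    return min(1.0, sum(p for p in pmf if p <= threshold))


# Hypothetical site: 9 of 10 reads carry the variant allele.
skewed_p = binomial_two_sided_p(9, 10)
# Hypothetical site: 5 of 10 reads carry the variant allele.
balanced_p = binomial_two_sided_p(5, 10)
```

A small p-value at a site that is heterozygous in DNA would flag a candidate allelic asymmetry in RNA; in practice such per-site tests also need multiple-testing correction, which this sketch omits.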
Affiliation(s)
- Mercedeh Movassagh
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA; University of Massachusetts Medical School, Graduate School of Biomedical Sciences, Program in Bioinformatics and Integrative Biology, Worcester, MA 01605, USA
| | - Nawaf Alomran
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA; Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, DC 20057, USA
| | - Prakriti Mudvari
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA
| | - Merve Dede
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA
| | - Cem Dede
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA
| | - Kamran Kowsari
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA; Department of Computer Science, School of Engineering and Applied Science, The George Washington University, Washington, DC 20037, USA
| | - Paula Restrepo
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA
| | - Edmund Cauley
- Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA
| | - Sonali Bahl
- Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA
| | - Muzi Li
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA; Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, DC 20057, USA
| | - Wesley Waterhouse
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA
| | - Krasimira Tsaneva-Atanasova
- Department of Mathematics, College of Engineering, Mathematics and Physical Sciences & EPSRC Centre for Predictive Modelling in Healthcare, University of Exeter, Exeter, EX4 4QJ, UK
| | - Nathan Edwards
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, DC 20057, USA
| | - Anelia Horvath
- McCormick Genomics and Proteomics Center, Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC 20037, USA; Department of Pharmacology and Physiology, The George Washington University, Washington, DC 20037, USA
| |
|
25
|
Arunkumar R, Maddison TI, Barrett SCH, Wright SI. Recent mating-system evolution in Eichhornia is accompanied by cis-regulatory divergence. The New Phytologist 2016; 211:697-707. [PMID: 26990568 DOI: 10.1111/nph.13918] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/23/2015] [Accepted: 01/30/2016] [Indexed: 06/05/2023]
Abstract
The evolution of predominant self-fertilization from cross-fertilization in plants is accompanied by diverse changes to morphology, ecology and genetics, some of which likely result from regulatory changes in gene expression. We examined changes in gene expression during early stages in the transition to selfing in populations of animal-pollinated Eichhornia paniculata with contrasting mating patterns. We crossed plants from outcrossing and selfing populations and tested for the presence of allele-specific expression (ASE) in floral buds and leaf tissue of F1 offspring, indicative of cis-regulatory changes. We identified 1365 genes exhibiting ASE in floral buds and leaf tissue. These genes preferentially expressed alleles from outcrossing parents. Moreover, we found evidence that genes exhibiting ASE had a greater nonsynonymous diversity compared to synonymous diversity in the selfing parents. Our results suggest that the transition from outcrossing to high rates of self-fertilization may have the potential to shape the cis-regulatory genomic landscape of angiosperm species, but that the changes in ASE may be moderate, particularly during the early stages of this transition.
Affiliation(s)
- Ramesh Arunkumar
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
| | - Teresa I Maddison
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
| | - Spencer C H Barrett
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
| | - Stephen I Wright
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
| |
|
26
|
Buffering of Genetic Regulatory Networks in Drosophila melanogaster. Genetics 2016; 203:1177-90. [PMID: 27194752 DOI: 10.1534/genetics.116.188797] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 03/02/2016] [Accepted: 05/17/2016] [Indexed: 01/01/2023] Open
Abstract
Regulatory variation in gene expression can be described by cis- and trans-genetic components. Here we used RNA-seq data from a population panel of Drosophila melanogaster test crosses to compare allelic imbalance (AI) in female head tissue between mated and virgin flies, an environmental change known to affect transcription. Indeed, 3048 exons (1610 genes) are differentially expressed in this study. A Bayesian model for AI, with an intersection test, controls type I error. There are ∼200 genes with AI exclusively in mated or virgin flies, indicating an environmental component of expression regulation. On average, 34% of genes within a cross and 54% of all genes show evidence for genetic regulation of transcription. Nearly all differentially regulated genes are affected in cis, with an average of 63% of expression variation explained by the cis-effects. Trans-effects explain 8% of the variance in AI on average, and the interaction between cis and trans explains an average of 11% of the total variance in AI. In both environments cis- and trans-effects are compensatory in their overall effect, with a negative association between cis- and trans-effects in 85% of the exons examined. We hypothesize that the gene expression level perturbed by cis-regulatory mutations is compensated through trans-regulatory mechanisms, e.g., trans and cis by trans-factors buffering cis-mutations. In addition, when AI is detected in both environments, cis-mated, cis-virgin, and trans-mated-trans-virgin estimates are highly concordant, with 99% of all exons positively correlated and a median correlation of 0.83 for cis and 0.95 for trans. We conclude that the gene regulatory networks (GRNs) are robust and that trans-buffering explains robustness.
|
27
|
Nariai N, Kojima K, Mimori T, Kawai Y, Nagasaki M. A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes. BMC Genomics 2016; 17 Suppl 1:2. [PMID: 26818838 PMCID: PMC4895278 DOI: 10.1186/s12864-015-2295-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022] Open
Abstract
Background: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because it contains only one allele on a mosaic haploid genome. Even with the information of diploid genome sequences, precise alignment of reads to the correct allele is difficult because of the high similarity between the corresponding allele sequences.
Results: We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In the statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through the simulation data analysis, we demonstrate the effectiveness of the proposed approach in terms of identifying ASE compared to the existing approach. We also show that our approach enables better quantification of isoform expression levels compared to the existing methods, TIGAR2, RSEM and Cufflinks. In the real data analysis of the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation in GM12878 was identified.
Conclusions: The proposed method, called ASE-TIGAR, enables accurate estimation of gene expression from RNA-Seq data in an allele-specific manner. Our results show the effectiveness of utilizing personal genomic information for accurate estimation of ASE. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar.
Affiliation(s)
- Naoki Nariai
- Present address: Institute for Genomic Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, 92093, California, USA; Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan.
| | - Kaname Kojima
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan.
| | - Takahiro Mimori
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan.
| | - Yosuke Kawai
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan.
| | - Masao Nagasaki
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan.
| |
|
28
|
Lu R, Smith RM, Seweryn M, Wang D, Hartmann K, Webb A, Sadee W, Rempala GA. Analyzing allele specific RNA expression using mixture models. BMC Genomics 2015; 16:566. [PMID: 26231172 PMCID: PMC4521363 DOI: 10.1186/s12864-015-1749-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 02/26/2015] [Accepted: 07/03/2015] [Indexed: 11/10/2022] Open
Abstract
Background: Measuring allele-specific RNA expression provides valuable insights into cis-acting genetic and epigenetic regulation of gene expression. Widespread adoption of high-throughput sequencing technologies for studying RNA expression (RNA-Seq) permits measurement of allelic RNA expression imbalance (AEI) at heterozygous single nucleotide polymorphisms (SNPs) across the entire transcriptome, and this approach has become especially popular with the emergence of large databases, such as GTEx. However, the existing binomial-type methods used to model allelic expression from RNA-seq assume a strong negative correlation between reference and variant allele reads, which may not be reasonable biologically.
Results: Here we propose a new strategy for AEI analysis using RNA-seq data. Under the null hypothesis of no AEI, a group of SNPs (possibly across multiple genes) is considered comparable if their respective total sums of the allelic reads are of similar magnitude. Within each group of “comparable” SNPs, we identify SNPs with AEI signal by fitting a mixture of folded Skellam distributions to the absolute values of read differences. By applying this methodology to RNA-Seq data from human autopsy brain tissues, we identified numerous instances of moderate to strong imbalanced allelic RNA expression at heterozygous SNPs. Findings with SLC1A3 mRNA exhibiting known expression differences are discussed as examples.
Conclusion: The folded Skellam mixture model searches for SNPs with significant difference between reference and variant allele reads (adjusted for different library sizes), using information from a group of “comparable” SNPs across multiple genes. This model is particularly suitable for performing AEI analysis on genes with few heterozygous SNPs available from RNA-seq, and it can fit over-dispersed read counts without specifying the direction of the correlation between reference and variant alleles.
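To make the folded Skellam idea concrete, here is a minimal sketch of the distribution of the absolute read difference |X − Y| when reference and variant read counts X and Y are modeled as independent Poisson variables. This is not the authors' code: the function names, the truncated-sum evaluation, and the two-component example are assumptions for illustration, and the paper's fitting procedure and library-size adjustment are not reproduced.

```python
from math import exp, lgamma, log


def poisson_pmf(k: int, mu: float) -> float:
    """Poisson probability P(N = k) for mean mu > 0, computed in log space."""
    return exp(k * log(mu) - mu - lgamma(k + 1))


def skellam_pmf(k: int, mu1: float, mu2: float) -> float:
    """P(X - Y = k) for independent X ~ Pois(mu1), Y ~ Pois(mu2).

    Evaluated as a truncated convolution sum over Y = j; the number of
    terms is a heuristic assumption that covers the bulk of both tails.
    """
    total = mu1 + mu2
    terms = int(total + 10 * total**0.5 + 20)
    start = max(0, -k)  # keep X = j + k non-negative
    return sum(
        poisson_pmf(j + k, mu1) * poisson_pmf(j, mu2)
        for j in range(start, start + terms)
    )


def folded_skellam_pmf(d: int, mu1: float, mu2: float) -> float:
    """P(|X - Y| = d): the Skellam distribution folded at zero."""
    if d < 0:
        return 0.0
    if d == 0:
        return skellam_pmf(0, mu1, mu2)
    return skellam_pmf(d, mu1, mu2) + skellam_pmf(-d, mu1, mu2)


def responsibilities(d: int, weights, params):
    """Posterior component weights for one SNP in a folded Skellam mixture.

    weights: mixing proportions; params: (mu1, mu2) per component.
    Both are hypothetical inputs, e.g. from an EM fit not shown here.
    """
    joint = [w * folded_skellam_pmf(d, m1, m2) for w, (m1, m2) in zip(weights, params)]
    norm = sum(joint)
    return [x / norm for x in joint]


if __name__ == "__main__":
    # Illustrative: a balanced component (20, 20) and an imbalanced one (30, 10).
    print(responsibilities(25, [0.9, 0.1], [(20.0, 20.0), (30.0, 10.0)]))
```

Because only |X − Y| enters the model, the sketch does not need to know which allele is over-represented, which mirrors the abstract's point about not specifying the direction of the relationship between the two alleles.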
Affiliation(s)
- Rong Lu
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, 43210, USA
| | - Ryan M Smith
- Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
| | - Michal Seweryn
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, 43210, USA; Mathematical Biosciences Institute, The Ohio State University, Columbus, OH, 43201, USA
| | - Danxin Wang
- Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
| | - Katherine Hartmann
- Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
| | - Amy Webb
- Department of Biomedical Informatics, Program in Pharmacogenomics, College of Medicine, The Ohio State University Wexner Medical Center, Columbus, OH, USA
| | - Wolfgang Sadee
- Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
| | - Grzegorz A Rempala
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, 43210, USA; Mathematical Biosciences Institute, The Ohio State University, Columbus, OH, 43201, USA.
| |
|
29
|
Buchkovich ML, Eklund K, Duan Q, Li Y, Mohlke KL, Furey TS. Removing reference mapping biases using limited or no genotype data identifies allelic differences in protein binding at disease-associated loci. BMC Med Genomics 2015. [PMID: 26210163 PMCID: PMC4515314 DOI: 10.1186/s12920-015-0117-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 12/20/2022] Open
Abstract
Background: Genetic variation can alter transcriptional regulatory activity contributing to variation in complex traits and risk of disease, but identifying individual variants that affect regulatory activity has been challenging. Quantitative sequence-based experiments such as ChIP-seq and DNase-seq can detect sites of allelic imbalance where alleles contribute disproportionately to the overall signal suggesting allelic differences in regulatory activity.
Methods: We created an allelic imbalance detection pipeline, AA-ALIGNER, to remove reference mapping biases influencing allelic imbalance detection and evaluate accuracy of allelic imbalance predictions in the absence of complete genotype data. Using the sequence aligner, GSNAP, and varying amounts of genotype information to remove mapping biases we investigated the accuracy of allelic imbalance detection (binomial test) in CREB1 ChIP-seq reads from the GM12878 cell line. Additionally we thoroughly evaluated the influence of experimental and analytical parameters on imbalance detection.
Results: Compared to imbalances identified using complete genotypes, using imputed partial sample genotypes, AA-ALIGNER detected >95% of imbalances with >90% accuracy. AA-ALIGNER performed nearly as well using common variants when genotypes were unknown. In contrast, predicting additional heterozygous sites and imbalances using the sequence data led to >50% false positive rates. We evaluated effects of experimental data characteristics and key analytical parameter settings on imbalance detection. Overall, total base coverage and signal dispersion across the genome most affected our ability to detect imbalances, while parameters such as imbalance significance, imputation quality thresholds, and alignment mismatches had little effect. To assess the biological relevance of imbalance predictions, we used electrophoretic mobility shift assays to functionally test for predicted allelic differences in CREB1 binding in the GM12878 lymphoblast cell line. Six of nine tested variants exhibited allelic differences in binding. Two of these variants, rs2382818 and rs713875, are located within inflammatory bowel disease-associated loci.
Conclusions: AA-ALIGNER accurately detects allelic imbalance in quantitative sequence data using partial genotypes or common variants, filling a critical methodological gap in these analyses, as full genotypes are rarely available. Importantly, we demonstrate how experimental and analytical features impact imbalance detection, providing guidance for similar future studies.
Affiliation(s)
- Martin L Buchkovich
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA.
| | - Karl Eklund
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA.
| | - Qing Duan
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA.
| | - Yun Li
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA; Department of Biostatistics, University of North Carolina, Chapel Hill, NC, 27599, USA; Department of Computer Science, University of North Carolina, Chapel Hill, NC, 27599, USA.
| | - Karen L Mohlke
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA.
| | - Terrence S Furey
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA; Department of Biology, University of North Carolina, Chapel Hill, NC, 27599, USA.
| |
|
30
|
Oh S. How are Bayesian and Non-Parametric Methods Doing a Great Job in RNA-Seq Differential Expression Analysis?: A Review. Communications for Statistical Applications and Methods 2015. [DOI: 10.5351/csam.2015.22.2.181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/20/2022]
Affiliation(s)
- Sunghee Oh
- Department of Veterinary Medicine, Jeju National University, Korea
| |
|