1
Sherbina K, León-Novelo LG, Nuzhdin SV, McIntyre LM, Marroni F. Power calculator for detecting allelic imbalance using hierarchical Bayesian model. BMC Res Notes 2021; 14:436. [PMID: 34838135 PMCID: PMC8626927 DOI: 10.1186/s13104-021-05851-x] [Received: 07/16/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022]
Abstract
OBJECTIVE: Allelic imbalance (AI) is the differential expression of the two alleles in a diploid. AI can vary between tissues, treatments, and environments. Methods for testing AI exist, but methods are needed to estimate type I error and power for detecting AI and differences in AI between conditions. As the costs of the technology plummet, what is more important: reads or replicates? RESULTS: We find that a minimum of 2400, 480, and 240 allele-specific reads divided equally among 12, 5, and 3 replicates is needed to detect a 10%, 20%, and 30% deviation from allelic balance, respectively, in a condition with power > 80%. A minimum of 960 and 240 allele-specific reads divided equally among 8 replicates is needed to detect a 20% or 30% difference in AI between conditions with comparable power. Adding replicates increases power more than adding coverage, without affecting type I error. We provide a Python package that simulates AI scenarios and lets users estimate type I error and power for detecting AI and differences in AI between conditions.
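The minimum designs quoted in the abstract are easier to compare as reads per replicate. The short sketch below only does that division, taking the abstract's totals at face value; the labels, variable names, and the helper `reads_per_replicate` are illustrative and are not taken from the authors' Python package.

```python
# Minimum designs quoted in the abstract:
# (effect to detect, total allele-specific reads, number of replicates).
# Labels and names are illustrative, not part of the authors' package.
designs = [
    ("10% deviation from balance", 2400, 12),
    ("20% deviation from balance", 480, 5),
    ("30% deviation from balance", 240, 3),
    ("20% difference between conditions", 960, 8),
    ("30% difference between conditions", 240, 8),
]


def reads_per_replicate(total_reads, replicates):
    """Split a total read budget equally across replicates."""
    return total_reads / replicates


per_replicate = {
    label: reads_per_replicate(total, reps) for label, total, reps in designs
}

for label, depth in per_replicate.items():
    print(f"{label}: {depth:g} allele-specific reads per replicate")
```

Under this reading, the smallest effect needs both the most replicates and the deepest coverage per replicate (200 reads in each of 12 replicates), while a 30% deviation is detectable with 80 reads in each of 3 replicates.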
Affiliation(s)
- Katrina Sherbina
  - Quantitative and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
- Luis G León-Novelo
  - Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston-School of Public Health, Houston, TX, 77030, USA
- Sergey V Nuzhdin
  - Molecular and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
- Lauren M McIntyre
  - Genetics Institute and Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32603, USA
- Fabio Marroni
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali, Università di Udine, 33100, Udine, Italy
2
Miller BR, Morse AM, Borgert JE, Liu Z, Sinclair K, Gamble G, Zou F, Newman JRB, León-Novelo LG, Marroni F, McIntyre LM. Testcrosses are an efficient strategy for identifying cis-regulatory variation: Bayesian analysis of allele-specific expression (BayesASE). G3 (Bethesda, Md.) 2021; 11:jkab096. [PMID: 33772539 PMCID: PMC8104932 DOI: 10.1093/g3journal/jkab096] [Received: 01/19/2021] [Accepted: 03/10/2021] [Indexed: 12/30/2022]
Abstract
Allelic imbalance (AI) occurs when alleles in a diploid individual are differentially expressed and indicates cis-acting regulatory variation. What is the distribution of allelic effects in a natural population? Are all alleles the same? Are all alleles distinct? The approach described applies to any technology that generates allele-specific sequence counts (for example, chromatin accessibility assays) and can be applied generally, including to comparisons between tissues or environments for the same genotype. Tests of allelic effect are generally performed by crossing individuals and comparing expression between alleles directly in the F1. However, a crossing scheme that compares alleles pairwise is prohibitively costly for more than a handful of alleles, as the number of crosses is at least (n² − n)/2, where n is the number of alleles. We show here that a testcross design followed by a hypothesis test of AI between testcrosses can be used to infer differences between nontester alleles, allowing n alleles to be compared with n crosses. Using a mouse data set where both testcrosses and direct comparisons have been performed, we show that the predicted differences between nontester alleles are validated at levels of over 90% when a parent-of-origin effect is present and of 60%-80% overall. Power considerations for a testcross are similar to those in a reciprocal cross. In all applications, testing for AI involves several complex bioinformatics steps. BayesASE is a complete bioinformatics pipeline that incorporates state-of-the-art error reduction techniques and a flexible Bayesian approach to estimating AI and formally comparing levels of AI between conditions. The modular structure of BayesASE has been packaged in Galaxy and made available in Nextflow and as a collection of scripts for the SLURM workload manager on GitHub (https://github.com/McIntyre-Lab/BayesASE).
Affiliation(s)
- Brecca R Miller
  - Genetics Institute, University of Florida, Gainesville, FL 32608, USA
  - NYU Langone Health, New York University, New York, NY 10013, USA
- Alison M Morse
  - Genetics Institute, University of Florida, Gainesville, FL 32608, USA
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
- Jacqueline E Borgert
  - Genetics Institute, University of Florida, Gainesville, FL 32608, USA
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
- Zihao Liu
  - Genetics Institute, University of Florida, Gainesville, FL 32608, USA
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
- Kelsey Sinclair
  - Genetics Institute, University of Florida, Gainesville, FL 32608, USA
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
- Gavin Gamble
  - Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Fei Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
- Jeremy R B Newman
  - Genetics Institute, University of Florida, Gainesville, FL 32608, USA
  - Department of Pathology, University of Florida, Gainesville, FL 32608, USA
- Luis G León-Novelo
  - Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston-University of Texas School of Public Health, Houston, TX 77030, USA
- Fabio Marroni
  - Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Udine, 33100, Italy
- Lauren M McIntyre
  - Genetics Institute, University of Florida, Gainesville, FL 32608, USA
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
3
Haas M, Himmelbach A, Mascher M. The contribution of cis- and trans-acting variants to gene regulation in wild and domesticated barley under cold stress and control conditions. Journal of Experimental Botany 2020; 71:2573-2584. [PMID: 31989179 PMCID: PMC7210754 DOI: 10.1093/jxb/eraa036] [Received: 06/13/2019] [Accepted: 01/27/2020] [Indexed: 05/16/2023]
Abstract
Barley, like other crops, has experienced a series of genetic changes that have impacted its architecture and growth habit to suit the needs of humans, termed the domestication syndrome. Domestication also resulted in a concomitant bottleneck that reduced sequence diversity in genes and regulatory regions. Little is known about regulatory changes resulting from domestication in barley. We used RNA sequencing to examine allele-specific expression in hybrids between wild and domesticated barley. Our results show that most genes have conserved regulation. In contrast to studies of allele-specific expression in interspecific hybrids, we find almost a complete absence of trans effects. We also find that cis regulation is largely stable in response to short-term cold stress. Our study has practical implications for crop improvement using wild relatives. Genes regulated in cis are more likely to be expressed in a new genetic background at the same level as in their native background.
Affiliation(s)
- Matthew Haas
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, D-06466 Seeland, Germany
  - Corresponding author. Present address: University of Minnesota, Department of Agronomy and Plant Genetics, Saint Paul, MN 55108, USA
- Axel Himmelbach
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, D-06466 Seeland, Germany
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, D-06466 Seeland, Germany
  - German Center for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, D-04103 Leipzig, Germany
  - Corresponding author. Present address: University of Minnesota, Department of Agronomy and Plant Genetics, Saint Paul, MN 55108, USA
4
Abstract
Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism, and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimates for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of three different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates, and integrated it into the apeglm package. The three methods were evaluated on both simulated and real data. Apeglm consistently performed better than ML according to a variety of criteria, including mean absolute error and concordance at the top. While ash had lower error and greater concordance than ML on the simulations, it also had a tendency to over-shrink large effects, and performed worse on the real data according to error and concordance. Furthermore, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
Affiliation(s)
- Joshua P Zitovsky
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Michael I Love
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
5
Abstract
Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimators for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use pseudocounts or Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of four different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, adding a pseudocount to each allele, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates and integrated it into the apeglm package. The four methods were evaluated on two simulations and one real data set. Apeglm consistently performed better than ML according to a variety of criteria, and generally outperformed use of pseudocounts as well. Ash also performed better than ML in one of the simulations, but in the other performance was more mixed. Finally, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster and more numerically reliable, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
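The two simplest estimators named in this abstract, maximum likelihood and adding a pseudocount to each allele, can be illustrated for a single gene in a single sample. The sketch below is not apeglm or ash and does not fit a beta-binomial model; the function names and the pseudocount of 1 are assumptions made for illustration, since the abstract does not state the value used.

```python
def ml_proportion(alt_count, ref_count):
    """Maximum likelihood estimate of the allelic proportion (plain ratio)."""
    total = alt_count + ref_count
    if total == 0:
        raise ValueError("no allele-specific reads for this gene")
    return alt_count / total


def pseudocount_proportion(alt_count, ref_count, pseudocount=1.0):
    """Estimate after adding the same pseudocount to each allele.

    The default of 1.0 is an illustrative assumption, not a value from the paper.
    """
    return (alt_count + pseudocount) / (alt_count + ref_count + 2 * pseudocount)


# A gene with only 3 reads, all from one allele, versus a gene with 300.
for alt, ref in [(3, 0), (300, 0), (30, 10)]:
    ml = ml_proportion(alt, ref)
    pc = pseudocount_proportion(alt, ref)
    print(f"alt={alt}, ref={ref}: ML={ml:.3f}, pseudocount={pc:.3f}")
```

With 3 reads from one allele and none from the other, the ML estimate is 1.0 while the pseudocount estimate is 4/5 = 0.8; with 300 reads the two nearly agree. This is the low-count variance problem the abstract describes, and it is why filtering on counts or shrinking estimates is considered at all.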
Affiliation(s)
- Joshua P. Zitovsky
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Michael I. Love
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
6
Wang Y, Gao S, Zhao Y, Chen WH, Shao JJ, Wang NN, Li M, Zhou GX, Wang L, Shen WJ, Xu JT, Deng WD, Wang W, Chen YL, Jiang Y. Allele-specific expression and alternative splicing in horse×donkey and cattle×yak hybrids. Zool Res 2019; 40:293-304. [PMID: 31271004 PMCID: PMC6680129 DOI: 10.24272/j.issn.2095-8137.2019.042] [Indexed: 01/05/2023]
Abstract
Divergence of gene expression and alternative splicing is a crucial driving force in the evolution of species; to date, however, the molecular mechanism remains unclear. Hybrids of closely related species provide a suitable model to analyze allele-specific expression (ASE) and allele-specific alternative splicing (ASS). Analysis of ASE and ASS can uncover the differences in cis-regulatory elements between closely related species, while eliminating interference of trans-regulatory elements. Here, we provide a detailed characterization of ASE and ASS from 19 and 10 transcriptome datasets across five tissues from reciprocal-cross hybrids of horse×donkey (mule/hinny) and cattle×yak (dzo), respectively. Results showed that 4.8%-8.7% and 10.8%-16.7% of genes exhibited ASE and ASS, respectively. Notably, lncRNAs and pseudogenes were more likely to show ASE than protein-coding genes. In addition, genes showing ASE and ASS in mule/hinny were found to be involved in the regulation of muscle strength, whereas those of dzo were involved in high-altitude adaptation. In conclusion, our study demonstrated that exploration of genes showing ASE and ASS in hybrids of closely related species is feasible for species evolution research.
Affiliation(s)
- Yu Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
- Shan Gao
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
- Yue Zhao
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
- Wei-Huang Chen
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
- Jun-Jie Shao
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
- Ni-Ni Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
- Ming Li
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
- Guang-Xian Zhou
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
- Lei Wang
  - State Key Laboratory of Plateau Ecology and Agriculture, Qinghai Academy of Animal Science and Veterinary Medicine, Qinghai University, Xining Qinghai 810016, China
- Wen-Jing Shen
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Jing-Tao Xu
  - State Key Laboratory of Plateau Ecology and Agriculture, Qinghai Academy of Animal Science and Veterinary Medicine, Qinghai University, Xining Qinghai 810016, China
- Wei-Dong Deng
  - Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming Yunnan 650223, China
- Wen Wang
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Yu-Lin Chen
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
- Yu Jiang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi 712100, China
7
Combs PA, Fraser HB. Spatially varying cis-regulatory divergence in Drosophila embryos elucidates cis-regulatory logic. PLoS Genet 2018; 14:e1007631. [PMID: 30383747 PMCID: PMC6211617 DOI: 10.1371/journal.pgen.1007631] [Received: 06/15/2018] [Accepted: 08/14/2018] [Indexed: 12/30/2022]
Abstract
Spatial patterning of gene expression is a key process in development, yet how it evolves is still poorly understood. Both cis- and trans-acting changes could participate in complex interactions, so to isolate the cis-regulatory component of patterning evolution, we measured allele-specific spatial gene expression patterns in D. melanogaster × simulans hybrid embryos. RNA-seq of cryo-sectioned slices revealed 66 genes with strong spatially varying allele-specific expression. We found that hunchback, a major regulator of developmental patterning, had reduced expression of the D. simulans allele specifically in the anterior tip of hybrid embryos. Mathematical modeling of hunchback cis-regulation suggested a candidate transcription factor binding site variant, which we verified as causal using CRISPR-Cas9 genome editing. In sum, even comparing morphologically near-identical species we identified surprisingly extensive spatial variation in gene expression, suggesting not only that development is robust to many such changes, but also that natural selection may have ample raw material for evolving new body plans via changes in spatial patterning.
Affiliation(s)
- Peter A. Combs
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Hunter B. Fraser
  - Department of Biology, Stanford University, Stanford, California, United States of America