1
Zou X, Gomez ZW, Reddy TE, Allen AS, Majoros WH. Bayesian Estimation of Allele-Specific Expression in the Presence of Phasing Uncertainty. bioRxiv: The Preprint Server for Biology 2024:2024.08.09.607371. [PMID: 39211106 PMCID: PMC11361064 DOI: 10.1101/2024.08.09.607371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
Motivation: Allele-specific expression (ASE) analyses aim to detect imbalanced expression of maternal versus paternal copies of an autosomal gene. Such allelic imbalance can result from a variety of cis-acting causes, including disruptive mutations within one copy of a gene that impact the stability of transcripts, as well as regulatory variants outside the gene that impact transcription initiation. Current methods for ASE estimation suffer from a number of shortcomings, such as relying on only one variant within a gene, assuming perfect phasing information across multiple variants within a gene, or failing to account for alignment biases and possible genotyping errors. Results: We developed BEASTIE, a Bayesian hierarchical model designed for precise ASE quantification at the gene level, based on given genotypes and RNA-Seq data. BEASTIE addresses the complexities of allelic mapping bias, genotyping error, and phasing errors by incorporating empirical phasing error rates derived from Genome-in-a-Bottle individual NA12878. BEASTIE surpasses existing methods in accuracy, especially in scenarios with high phasing errors. This improvement is critical for identifying rare genetic variants often obscured by such errors. Through rigorous validation on simulated data and application to real data from the 1000 Genomes Project, we establish the robustness of BEASTIE. These findings underscore the value of BEASTIE in revealing patterns of ASE across gene sets and pathways. Availability and Implementation: The software is freely available at https://github.com/x811zou/BEASTIE as Python source code and as a Docker image. Supplementary information: Additional information is available online.
2
Yu J, Xu F, Wei Z, Zhang X, Chen T, Pu L. Epigenomic landscape and epigenetic regulation in maize. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2020; 133:1467-1489. [PMID: 31965233 DOI: 10.1007/s00122-020-03549-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/18/2019] [Accepted: 01/14/2020] [Indexed: 05/12/2023]
Abstract
Epigenetic regulation has been implicated in the control of multiple agronomic traits in maize. Here, we review current advances in our understanding of epigenetic regulation, which has great potential for improving agronomic traits and the environmental adaptability of crops. Epigenetic regulation plays a vital role in the control of complex agronomic traits. Epigenetic variation could contribute to phenotypic diversity and can be used to improve the quality and productivity of crops. Maize (Zea mays L.), one of the most widely cultivated crops for human food, animal feed, and ethanol biofuel, is a model plant for genetic studies. Recent advances in high-throughput sequencing technology have made possible the study of epigenetic regulation in maize on a genome-wide scale. In this review, we discuss recent epigenetic studies in maize, many achieved by Chinese research groups. These studies have explored the roles of DNA methylation, posttranslational modifications of histones, chromatin remodeling, and noncoding RNAs in the regulation of gene expression in plant development and environmental response. We also provide our future prospects for manipulating epigenetic regulation to improve crops.
Affiliation(s)
- Jia Yu: Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Fan Xu: Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Ziwei Wei: School of Life Sciences, Anhui Agricultural University, Hefei, China
- Xiangxiang Zhang: Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Tao Chen: College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China
- Li Pu: Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
3
Nucleus-specific expression in the multinuclear mushroom-forming fungus Agaricus bisporus reveals different nuclear regulatory programs. Proc Natl Acad Sci U S A 2018; 115:4429-4434. [PMID: 29643074 PMCID: PMC5924919 DOI: 10.1073/pnas.1721381115] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 11/18/2022] Open
Abstract
Fungi are a broad class of organisms that play crucial roles in a wide variety of natural and industrial processes. Some are also harmful, destroying crops or infecting immunocompromised patients. Many fungi, at some point during their life cycle, contain two different nuclei, each with different genetic content. We examine the regulation of genes from these nuclei in a mushroom-forming fungus. We find that these nuclei contribute differently to the regulation of the fungal cells, and may therefore have a different impact on their environment. Furthermore, these differences change throughout the development of different tissues. This work contributes to our understanding of fungal physiology by examining this process.
Many fungi are polykaryotic, containing multiple nuclei per cell. In the case of heterokaryons, there are different nuclear types within a single cell. It is unknown what the different nuclear types contribute in terms of mRNA expression levels in fungal heterokaryons. Each cell of the mushroom Agaricus bisporus contains two to 25 nuclei of two nuclear types originating from two parental strains. Using RNA-sequencing data, we assess the differential mRNA contribution of individual nuclear types and its functional impact. We studied differential expression between genes of the two nuclear types, P1 and P2, throughout mushroom development in various tissue types. P1 and P2 produced specific mRNA profiles that changed through mushroom development. Differential regulation occurred at the gene level, rather than at the locus, chromosomal, or nuclear level. P1 dominated mRNA production throughout development, and P2 showed more differentially up-regulated genes in important functional groups. In the vegetative mycelium, P2 up-regulated almost threefold more metabolism genes and carbohydrate active enzymes (cazymes) than P1, suggesting phenotypic differences in growth. We identified widespread transcriptomic variation between the nuclear types of A. bisporus. Our method enables studying nucleus-specific expression, which likely influences the phenotype of a fungus in a polykaryotic stage. Our findings have wider implications for understanding gene regulation in fungi in a heterokaryotic state. This work provides insight into the transcriptomic variation introduced by genomic nuclear separation.
4
Baldauf JA, Marcon C, Paschold A, Hochholdinger F. Nonsyntenic Genes Drive Tissue-Specific Dynamics of Differential, Nonadditive, and Allelic Expression Patterns in Maize Hybrids. Plant Physiology 2016; 171:1144-55. [PMID: 27208302 PMCID: PMC4902609 DOI: 10.1104/pp.16.00262] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/17/2016] [Accepted: 04/18/2016] [Indexed: 05/21/2023]
Abstract
Distantly related maize (Zea mays) inbred lines display an exceptional degree of genomic diversity. F1 progeny of such inbred lines are often more vigorous than their parents, a phenomenon known as heterosis. In this study, we investigated how the genetic divergence of the maize inbred lines B73 and Mo17 and their F1 hybrid progeny is reflected in differential, nonadditive, and allelic expression patterns in primary root tissues. In pairwise comparisons of the four genotypes, the number of differentially expressed genes between the two parental inbred lines significantly exceeded those of parent versus hybrid comparisons in all four tissues under analysis. No differentially expressed genes were detected between reciprocal hybrids, which share the same nuclear genome. Moreover, hundreds of nonadditive and allelic expression ratios that were different from the expression ratios of the parents were observed in the reciprocal hybrids. The overlap of both nonadditive and allelic expression patterns in the reciprocal hybrids significantly exceeded the expected values. For all studied types of expression - differential, nonadditive, and allelic - substantial tissue-specific plasticity was observed. Significantly, nonsyntenic genes that evolved after the last whole genome duplication of a maize progenitor from genes with synteny to sorghum (Sorghum bicolor) were highly overrepresented among differential, nonadditive, and allelic expression patterns compared with the fraction of these genes among all expressed genes. This observation underscores the role of nonsyntenic genes in shaping the transcriptomic landscape of maize hybrids during the early developmental manifestation of heterosis in root tissues of maize hybrids.
Affiliation(s)
- Jutta A Baldauf: Institute of Crop Science and Resource Conservation, Crop Functional Genomics, University of Bonn, 53113 Bonn, Germany
- Caroline Marcon: Institute of Crop Science and Resource Conservation, Crop Functional Genomics, University of Bonn, 53113 Bonn, Germany
- Anja Paschold: Institute of Crop Science and Resource Conservation, Crop Functional Genomics, University of Bonn, 53113 Bonn, Germany
- Frank Hochholdinger: Institute of Crop Science and Resource Conservation, Crop Functional Genomics, University of Bonn, 53113 Bonn, Germany
5
Niemi J, Mittman E, Landau W, Nettleton D. Empirical Bayes analysis of RNA-seq data for detection of gene expression heterosis. Journal of Agricultural, Biological, and Environmental Statistics 2015; 20:614-628. [PMID: 27147815 DOI: 10.1007/s13253-015-0230-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 02/01/2023]
Abstract
An important type of heterosis, known as hybrid vigor, refers to the enhancements in the phenotype of hybrid progeny relative to their inbred parents. Although hybrid vigor is extensively utilized in agriculture, its molecular basis is still largely unknown. In an effort to understand phenotypic heterosis at the molecular level, researchers are measuring transcript abundance levels of thousands of genes in parental inbred lines and their hybrid offspring using RNA sequencing (RNA-seq) technology. The resulting data allow researchers to search for evidence of gene expression heterosis as one potential molecular mechanism underlying heterosis of agriculturally important traits. The null hypotheses of greatest interest in testing for gene expression heterosis are composite null hypotheses that are difficult to test with standard statistical approaches for RNA-seq analysis. To address these shortcomings, we develop a hierarchical negative binomial model and draw inferences using a computationally tractable empirical Bayes approach. We demonstrate improvements over alternative methods via a simulation study based on a maize experiment and then analyze that maize experiment with our newly proposed methodology. This article has supplementary material online.
Affiliation(s)
- Jarad Niemi: Department of Statistics, Iowa State University, Ames, Iowa, U.S.A
- Eric Mittman: Department of Statistics, Iowa State University, Ames, Iowa, U.S.A
- Will Landau: Department of Statistics, Iowa State University, Ames, Iowa, U.S.A
- Dan Nettleton: Department of Statistics, Iowa State University, Ames, Iowa, U.S.A
6
Abstract
RNA sequencing (RNA-Seq) uses the capabilities of high-throughput sequencing methods to provide insight into the transcriptome of a cell. Compared to previous Sanger sequencing- and microarray-based methods, RNA-Seq provides far higher coverage and greater resolution of the dynamic nature of the transcriptome. Beyond quantifying gene expression, the data generated by RNA-Seq facilitate the discovery of novel transcripts, identification of alternatively spliced genes, and detection of allele-specific expression. Recent advances in the RNA-Seq workflow, from sample preparation to library construction to data analysis, have enabled researchers to further elucidate the functional complexity of the transcriptome. In addition to polyadenylated messenger RNA (mRNA) transcripts, RNA-Seq can be applied to investigate different populations of RNA, including total RNA, pre-mRNA, and noncoding RNA, such as microRNA and long ncRNA. This article provides an introduction to RNA-Seq methods, including applications, experimental design, and technical challenges.
Affiliation(s)
- Kimberly R Kukurba: Department of Pathology, Stanford University School of Medicine, Stanford, California 94305; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305
- Stephen B Montgomery: Department of Pathology, Stanford University School of Medicine, Stanford, California 94305; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305; Department of Computer Science, Stanford University School of Medicine, Stanford, California 94305
7
Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Methods of integrating data to uncover genotype-phenotype interactions. Nat Rev Genet 2015; 16:85-97. [PMID: 25582081 DOI: 10.1038/nrg3868] [Citation(s) in RCA: 558] [Impact Index Per Article: 62.0] [Indexed: 12/11/2022]
Abstract
Recent technological advances have expanded the breadth of available omic data, from whole-genome sequencing data, to extensive transcriptomic, methylomic and metabolomic data. A key goal of analyses of these data is the identification of effective models that predict phenotypic traits and outcomes, elucidating important biomarkers and generating important insights into the genetic underpinnings of the heritability of complex traits. There is still a need for powerful and advanced analysis strategies to fully harness the utility of these comprehensive high-throughput data, identifying true associations and reducing the number of false associations. In this Review, we explore the emerging approaches for data integration - including meta-dimensional and multi-staged analyses - which aim to deepen our understanding of the role of genetics and genomics in complex outcomes. With the use and further development of these approaches, an improved understanding of the relationship between genomic variation and human phenotypes may be revealed.
Affiliation(s)
- Marylyn D Ritchie: Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Emily R Holzinger: National Human Genome Research Institute, Inherited Disease Research Branch, Baltimore, Maryland 21224, USA
- Ruowang Li: Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Sarah A Pendergrass: Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Dokyoon Kim: Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
8
Soderlund CA, Nelson WM, Goff SA. Allele Workbench: transcriptome pipeline and interactive graphics for allele-specific expression. PLoS One 2014; 9:e115740. [PMID: 25541944 PMCID: PMC4277417 DOI: 10.1371/journal.pone.0115740] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/12/2014] [Accepted: 10/19/2014] [Indexed: 12/30/2022] Open
Abstract
Sequencing the transcriptome can answer various questions such as determining the transcripts expressed in a given species for a specific tissue or condition, evaluating differential expression, discovering variants, and evaluating allele-specific expression. Differential expression evaluates the expression differences between different strains, tissues, and conditions. Allele-specific expression evaluates expression differences between parental alleles. Both differential expression and allele-specific expression have been studied for heterosis (hybrid vigor), where the hybrid has improved performance over the parents for one or more traits. The Allele Workbench software was developed for a heterosis study that evaluated allele-specific expression for a mouse F1 hybrid using libraries from multiple tissues with biological replicates. This software has been made into a distributable package, which includes a pipeline, a Java interface to build the database, and a Java interface for query and display of the results. The required input is a reference genome, annotation file, and one or more RNA-Seq libraries with optional replicates. It evaluates allelic imbalance at the SNP and transcript level and flags transcripts with significant opposite directional allele-specific expression. The Java interface allows the user to view data from libraries, replicates, genes, transcripts, exons, and variants, including queries on allele imbalance for selected libraries. To determine the impact of allele-specific SNPs on protein folding, variants are annotated with their effect (e.g., missense), and the parental protein sequences may be exported for protein folding analysis. The Allele Workbench processing results in transcript files and read counts that can be used as input to the previously published Transcriptome Computational Workbench, which has a new algorithm for determining a trimmed set of gene ontology terms. 
The software with demo files is available from https://code.google.com/p/allele-workbench. Additionally, all software is ready for immediate use from an Atmosphere Virtual Machine Image available from the iPlant Collaborative (www.iplantcollaborative.org).
Affiliation(s)
- Carol A. Soderlund: BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
- William M. Nelson: BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
- Stephen A. Goff: iPlant Collaborative, University of Arizona, Tucson, Arizona, United States of America