1
de Jong TV, Pan Y, Rastas P, Munro D, Tutaj M, Akil H, Benner C, Chen D, Chitre AS, Chow W, Colonna V, Dalgard CL, Demos WM, Doris PA, Garrison E, Geurts AM, Gunturkun HM, Guryev V, Hourlier T, Howe K, Huang J, Kalbfleisch T, Kim P, Li L, Mahaffey S, Martin FJ, Mohammadi P, Ozel AB, Polesskaya O, Pravenec M, Prins P, Sebat J, Smith JR, Solberg Woods LC, Tabakoff B, Tracey A, Uliano-Silva M, Villani F, Wang H, Sharp BM, Telese F, Jiang Z, Saba L, Wang X, Murphy TD, Palmer AA, Kwitek AE, Dwinell MR, Williams RW, Li JZ, Chen H. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. Cell Genomics 2024; 4:100527. [PMID: 38537634 PMCID: PMC11019364 DOI: 10.1016/j.xgen.2024.100527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 12/26/2023] [Accepted: 02/29/2024] [Indexed: 04/09/2024]
Abstract
The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomics datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ∼20.0 million sequence variations, of which 18,700 are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.
Affiliation(s)
- Tristan V de Jong
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Yanchao Pan
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Pasi Rastas
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Daniel Munro
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Department of Integrative Structural and Computational Biology, Scripps Research, San Diego, CA, USA
- Monika Tutaj
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Huda Akil
  - Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Chris Benner
  - Department of Medicine, University of California San Diego, San Diego, CA, USA
- Denghui Chen
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Apurva S Chitre
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- William Chow
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Vincenza Colonna
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Clifton L Dalgard
  - Department of Anatomy, Physiology & Genetics, The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- Wendy M Demos
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Peter A Doris
  - The Brown Foundation Institute of Molecular Medicine, Center for Human Genetics, University of Texas Health Science Center, Houston, TX, USA
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Aron M Geurts
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Hakan M Gunturkun
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Victor Guryev
  - Genome Structure and Ageing, University of Groningen, UMC, Groningen, the Netherlands
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Jun Huang
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Ted Kalbfleisch
  - Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Louisville, KY, USA
- Panjun Kim
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Ling Li
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Spencer Mahaffey
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Pejman Mohammadi
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA; Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Ayse Bilge Ozel
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Oksana Polesskaya
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Michal Pravenec
  - Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
- Pjotr Prins
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jonathan Sebat
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Jennifer R Smith
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Leah C Solberg Woods
  - Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Boris Tabakoff
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Alan Tracey
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Flavia Villani
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Hongyang Wang
  - Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Burt M Sharp
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Francesca Telese
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Zhihua Jiang
  - Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Laura Saba
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Xusheng Wang
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Terence D Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Abraham A Palmer
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Anne E Kwitek
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Melinda R Dwinell
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Robert W Williams
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jun Z Li
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Hao Chen
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
2
Coombes B, Lux T, Akhunov E, Hall A. Introgressions lead to reference bias in wheat RNA-seq analysis. BMC Biol 2024; 22:56. [PMID: 38454464 PMCID: PMC10921782 DOI: 10.1186/s12915-024-01853-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 02/21/2024] [Indexed: 03/09/2024] Open
Abstract
BACKGROUND: RNA-seq is a fundamental technique in genomics, yet reference bias, where transcripts derived from non-reference alleles are quantified less accurately, can undermine the accuracy of RNA-seq quantification and thus the conclusions made downstream. Reference bias in RNA-seq analysis has yet to be explored in complex polyploid genomes despite evidence that they are often a complex mosaic of wild relative introgressions, which introduce blocks of highly divergent genes. RESULTS: Here we use hexaploid wheat as a model complex polyploid, using both simulated and experimental data to show that RNA-seq alignment in wheat suffers from widespread reference bias which is largely driven by divergent introgressed genes. This leads to underestimation of gene expression and incorrect assessment of homoeologue expression balance. By incorporating gene models from ten wheat genome assemblies into a pantranscriptome reference, we present a novel method to reduce reference bias, which can be readily scaled to capture more variation as new genome and transcriptome data become available. CONCLUSIONS: This study shows that the presence of introgressions can lead to reference bias in wheat RNA-seq analysis. Caution should be exercised by researchers using non-sample reference genomes for RNA-seq alignment and novel methods, such as the one presented here, should be considered.
Affiliation(s)
- Thomas Lux
  - Plant Genome and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Eduard Akhunov
  - Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Anthony Hall
  - Earlham Institute, Norwich, Norfolk, NR4 7UZ, UK
3
Ball RL, Bogue MA, Liang H, Srivastava A, Ashbrook DG, Lamoureux A, Gerring MW, Hatoum AS, Kim MJ, He H, Emerson J, Berger AK, Walton DO, Sheppard K, El Kassaby B, Castellanos F, Kunde-Ramamoorthy G, Lu L, Bluis J, Desai S, Sundberg BA, Peltz G, Fang Z, Churchill GA, Williams RW, Agrawal A, Bult CJ, Philip VM, Chesler EJ. GenomeMUSter mouse genetic variation service enables multitrait, multipopulation data integration and analysis. Genome Res 2024; 34:145-159. [PMID: 38290977 PMCID: PMC10903950 DOI: 10.1101/gr.278157.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 01/10/2024] [Indexed: 02/01/2024]
Abstract
Hundreds of inbred mouse strains and intercross populations have been used to characterize the function of genetic variants that contribute to disease. Thousands of disease-relevant traits have been characterized in mice and made publicly available. New strains and populations including consomics, the collaborative cross, expanded BXD, and inbred wild-derived strains add to existing complex disease mouse models, mapping populations, and sensitized backgrounds for engineered mutations. The genome sequences of inbred strains, along with dense genotypes from others, enable integrated analysis of trait-variant associations across populations, but these analyses are hampered by the sparsity of genotypes available. Moreover, the data are not readily interoperable with other resources. To address these limitations, we created a uniformly dense variant resource by harmonizing multiple data sets. Missing genotypes were imputed using the Viterbi algorithm with a data-driven technique that incorporates local phylogenetic information, an approach that is extendable to other model organisms. The result is a web- and programmatically accessible data service called GenomeMUSter, comprising single-nucleotide variants covering 657 strains at 106.8 million segregating sites. Interoperation with phenotype databases, analytic tools, and other resources enables a wealth of applications, including multitrait, multipopulation meta-analysis. We show this in cross-species comparisons of type 2 diabetes and substance use disorder meta-analyses, leveraging mouse data to characterize the likely role of human variant effects in disease. Other applications include refinement of mapped loci and prioritization of strain backgrounds for disease modeling to further unlock extant mouse diversity for genetic and genomic studies in health and disease.
Affiliation(s)
- Robyn L Ball
  - The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Molly A Bogue
  - The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Anuj Srivastava
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- David G Ashbrook
  - University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Alexander S Hatoum
  - Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130, USA
  - Artificial Intelligence and the Internet of Things Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Matthew J Kim
  - University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Hao He
  - The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Jake Emerson
  - The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Lu Lu
  - University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- John Bluis
  - The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Sejal Desai
  - The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Gary Peltz
  - Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Zhuoqing Fang
  - Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Robert W Williams
  - University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Arpana Agrawal
  - Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Carol J Bult
  - The Jackson Laboratory, Bar Harbor, Maine 04609, USA
4
Meade RK, Long JE, Jinich A, Rhee KY, Ashbrook DG, Williams RW, Sassetti CM, Smith CM. Genome-wide screen identifies host loci that modulate Mycobacterium tuberculosis fitness in immunodivergent mice. G3 (Bethesda) 2023; 13:jkad147. [PMID: 37405387 PMCID: PMC10468300 DOI: 10.1093/g3journal/jkad147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 06/05/2023] [Accepted: 06/27/2023] [Indexed: 07/06/2023]
Abstract
Genetic differences among mammalian hosts and among strains of Mycobacterium tuberculosis (Mtb) are well-established determinants of tuberculosis (TB) patient outcomes. The advent of recombinant inbred mouse panels and next-generation transposon mutagenesis and sequencing approaches has enabled dissection of complex host-pathogen interactions. To identify host and pathogen genetic determinants of Mtb pathogenesis, we infected members of the highly diverse BXD family of strains with a comprehensive library of Mtb transposon mutants (TnSeq). Members of the BXD family segregate for Mtb-resistant C57BL/6J (B6 or B) and Mtb-susceptible DBA/2J (D2 or D) haplotypes. The survival of each bacterial mutant was quantified within each BXD host, and we identified those bacterial genes that were differentially required for Mtb fitness across BXD genotypes. Mutants that varied in survival among the host family of strains were leveraged as reporters of "endophenotypes," each bacterial fitness profile directly probing specific components of the infection microenvironment. We conducted quantitative trait loci (QTL) mapping of these bacterial fitness endophenotypes and identified 140 host-pathogen QTL (hpQTL). We located a QTL hotspot on chromosome 6 (75.97-88.58 Mb) associated with the genetic requirement of multiple Mtb genes: Rv0127 (mak), Rv0359 (rip2), Rv0955 (perM), and Rv3849 (espR). Together, this screen reinforces the utility of bacterial mutant libraries as precise reporters of the host immunological microenvironment during infection and highlights specific host-pathogen genetic interactions for further investigation. To enable downstream follow-up for both bacterial and mammalian genetic research communities, all bacterial fitness profiles have been deposited into GeneNetwork.org and added into the comprehensive collection of TnSeq libraries in MtbTnDB.
Affiliation(s)
- Rachel K Meade
  - Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27710, USA
  - University Program in Genetics and Genomics, Duke University, Durham, NC 27710, USA
- Jarukit E Long
  - Department of Microbiology and Physiological Systems, UMass Chan Medical School, Worcester, MA 01655, USA
  - Research Animal Diagnostic Services, Charles River Laboratories, Wilmington, MA 01887, USA
- Adrian Jinich
  - Division of Infectious Diseases, Weill Cornell Medical College, New York, NY 10021, USA
- Kyu Y Rhee
  - Division of Infectious Diseases, Weill Cornell Medical College, New York, NY 10021, USA
- David G Ashbrook
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Robert W Williams
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Christopher M Sassetti
  - Department of Microbiology and Physiological Systems, UMass Chan Medical School, Worcester, MA 01655, USA
- Clare M Smith
  - Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27710, USA
  - University Program in Genetics and Genomics, Duke University, Durham, NC 27710, USA
5
Wu EY, Singh NP, Choi K, Zakeri M, Vincent M, Churchill GA, Ackert-Bicknell CL, Patro R, Love MI. SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty. Genome Biol 2023; 24:165. [PMID: 37438847 PMCID: PMC10337143 DOI: 10.1186/s13059-023-03003-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/01/2022] [Accepted: 06/29/2023] [Indexed: 07/14/2023] Open
Abstract
Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution: gene, isoform, and groups of isoforms aggregated by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.
Affiliation(s)
- Euphy Y Wu
  - Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Noor P Singh
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Mohsen Zakeri
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Cheryl L Ackert-Bicknell
  - Department of Orthopedics, School of Medicine, University of Colorado, Anschutz Campus, Aurora, CO, USA
- Rob Patro
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Michael I Love
  - Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
6
Huynh K, Smith BR, Macdonald SJ, Long AD. Genetic variation in chromatin state across multiple tissues in Drosophila melanogaster. PLoS Genet 2023; 19:e1010439. [PMID: 37146087 PMCID: PMC10191298 DOI: 10.1371/journal.pgen.1010439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 05/17/2023] [Accepted: 04/20/2023] [Indexed: 05/07/2023] Open
Abstract
We use ATAC-seq to examine chromatin accessibility for four different tissues in Drosophila melanogaster: adult female brain, ovaries, and both wing and eye-antennal imaginal discs from males. Each tissue is assayed in eight different inbred strain genetic backgrounds, seven associated with a reference quality genome assembly. We develop a method for the quantile normalization of ATAC-seq fragments and test for differences in coverage among genotypes, tissues, and their interaction at 44099 peaks throughout the euchromatic genome. For the strains with reference quality genome assemblies, we correct ATAC-seq profiles for read mis-mapping due to nearby polymorphic structural variants (SVs). Comparing coverage among genotypes without accounting for SVs results in a highly elevated rate (55%) of identifying false positive differences in chromatin state between genotypes. After SV correction, we identify 1050, 30383, and 4508 regions whose peak heights are polymorphic among genotypes, among tissues, or exhibit genotype-by-tissue interactions, respectively. Finally, we identify 3988 candidate causative variants that explain at least 80% of the variance in chromatin state at nearby ATAC-seq peaks.
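For readers unfamiliar with the term, the sketch below illustrates generic quantile normalization across samples. It is not the authors' ATAC-seq fragment method, which the abstract does not specify; the function name `quantile_normalize`, the per-sample coverage lists, and the toy values are all assumptions made for illustration. Each value is replaced by the mean of the values holding the same rank across samples, so every sample ends up with an identical distribution.

```python
from statistics import mean


def quantile_normalize(samples):
    """Generic quantile normalization (illustrative, hypothetical helper).

    samples: dict mapping a sample name to a list of coverage values,
             one value per peak; all lists must have the same length.
    Returns a dict of the same shape with normalized values.
    """
    names = list(samples)
    n = len(samples[names[0]])
    if any(len(samples[name]) != n for name in names):
        raise ValueError("all samples must cover the same number of peaks")

    # For each sample, the peak indices ordered from lowest to highest value.
    # Ties are broken by peak index because sorted() is stable.
    order = {
        name: sorted(range(n), key=lambda i, s=samples[name]: s[i])
        for name in names
    }

    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        mean(samples[name][order[name][k]] for name in names)
        for k in range(n)
    ]

    # Give each peak the reference value that matches its within-sample rank.
    normalized = {}
    for name in names:
        out = [0.0] * n
        for rank, peak_index in enumerate(order[name]):
            out[peak_index] = reference[rank]
        normalized[name] = out
    return normalized


coverage = {"strain_A": [5.0, 2.0, 3.0], "strain_B": [4.0, 1.0, 6.0]}
result = quantile_normalize(coverage)
print(result)
```

After normalization both toy samples contain the same set of values, while the rank order of peaks within each sample is preserved.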
Affiliation(s)
- Khoi Huynh
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, California, United States of America
- Brittny R. Smith
  - Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Stuart J. Macdonald
  - Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
  - Center for Computational Biology, University of Kansas, Lawrence, Kansas, United States of America
- Anthony D. Long
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, California, United States of America
7
Perez BC, Bink MCAM, Svenson KL, Churchill GA, Calus MPL. Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence. G3 (Bethesda) 2022; 12:jkac258. [PMID: 36161485 PMCID: PMC9635642 DOI: 10.1093/g3journal/jkac258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Accepted: 09/07/2022] [Indexed: 06/16/2023]
Abstract
Recent developments allowed generating multiple high-quality 'omics' data that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we have assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using the best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model named GTCBLUP that aims to remove between-omics-layer covariance from predictors, whereas its counterpart GTBLUP does not do that. While gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed the best linear unbiased prediction models for most traits. Models leveraging gene transcripts captured higher proportions of the phenotypic variance for almost all traits when these were measured closer to the moment of measuring gene transcripts in the liver. In most cases, the combination of layers was not able to outperform the best single-omics models to predict phenotypes. Using only gene transcripts, the gradient boosting machine model was able to outperform best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model was not able to produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for prediction of phenotypes and using the GTCBLUP for prediction of breeding values.
Affiliation(s)
- Bruno C Perez
  - Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Marco C A M Bink
  - Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Mario P L Calus
  - Corresponding author: Animal Breeding and Genomics, Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
8
Gobet N, Jan M, Franken P, Xenarios I. Towards mouse genetic-specific RNA-sequencing read mapping. PLoS Comput Biol 2022; 18:e1010552. [PMID: 36155976 PMCID: PMC9536569 DOI: 10.1371/journal.pcbi.1010552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Revised: 10/06/2022] [Accepted: 09/07/2022] [Indexed: 11/18/2022] Open
Abstract
Genetic variations affect behavior and cause disease but understanding how these variants drive complex traits is still an open question. A common approach is to link the genetic variants to intermediate molecular phenotypes such as the transcriptome using RNA-sequencing (RNA-seq). Paradoxically, these variants between the samples are usually ignored at the beginning of RNA-seq analyses of many model organisms. This can skew the transcriptome estimates that are used later for downstream analyses, such as expression quantitative trait locus (eQTL) detection. Here, we assessed the impact of reference-based analysis on the transcriptome and eQTLs in a widely-used mouse genetic population: the BXD panel of recombinant inbred lines. We highlight existing reference bias in the transcriptome data analysis and propose practical solutions which combine available genetic variants, genotypes, and genome reference sequence. The use of custom BXD line references improved downstream analysis compared to classical genome reference. These insights would likely benefit genetic studies with a transcriptomic component and demonstrate that genome references need to be reassessed and improved. To understand how genetic variations affect behavior and cause disease it is common to quantify expression of transcripts by sequencing. Transcripts are extracted, fragmented, and the sequence of the fragments read. An important step for their quantification is to virtually assign the different fragments to the transcript they originate from using a reference genome. Reference genomes are costly to build, so usually only one high-quality reference per animal model species is available. When comparing genetically different individuals, using a single reference may introduce a bias because it might be more similar to some individuals than to others. Paradoxically, the variations at the core of genetic studies are thus ignored at the start of the analysis. 
We built customized references with known genetic variants for each of the mouse lines we had and quantified the impact of the reference at different levels of the bioinformatic analysis. We found that using customized references reduced the bias compared to using a single reference. Our study uses publicly available data and tools, so others can easily implement this improvement in their analyses.
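The idea of a customized reference can be shown with a minimal sketch that substitutes known single-nucleotide variants into a reference sequence. This is an illustration under stated assumptions, not the authors' pipeline: the function `apply_snvs`, the `(position, ref, alt)` tuple layout with 1-based positions, and the toy sequence are hypothetical, and indels and structural variants are deliberately left out.

```python
def apply_snvs(reference, variants):
    """Return a copy of `reference` with single-nucleotide variants applied.

    reference: str, the reference sequence for one chromosome or contig.
    variants:  iterable of (position, ref_allele, alt_allele) tuples, with
               1-based positions (an assumed convention, as in VCF files).
    Raises ValueError if a variant is not a SNV or if its reference allele
    disagrees with the sequence, which usually signals a coordinate or
    assembly-version mismatch.
    """
    sequence = list(reference)
    for position, ref_allele, alt_allele in variants:
        if len(ref_allele) != 1 or len(alt_allele) != 1:
            raise ValueError(f"only SNVs are supported (position {position})")
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside the sequence")
        if sequence[position - 1] != ref_allele:
            raise ValueError(
                f"reference allele mismatch at position {position}: "
                f"expected {ref_allele}, found {sequence[position - 1]}"
            )
        sequence[position - 1] = alt_allele
    return "".join(sequence)


# Toy example: two SNVs carried by a hypothetical strain.
strain_variants = [(2, "C", "T"), (5, "A", "G")]
custom_reference = apply_snvs("ACGTACGT", strain_variants)
print(custom_reference)
```

Checking the reference allele before substituting is a cheap safeguard: a mismatch means the variant list and the sequence do not share coordinates, and a silently wrong custom reference would add bias rather than remove it.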
Affiliation(s)
- Nastassia Gobet
  - Centre for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Vital-IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Maxime Jan
  - Centre for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Bioinformatics Competence Center, University of Lausanne, Lausanne, Switzerland
- Paul Franken
  - Centre for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Ioannis Xenarios
  - Ludwig Cancer Research/CHUV-UNIL, Lausanne, Switzerland
  - Health 2030 Genome Center, Geneva, Switzerland
9
Guo W, Coulter M, Waugh R, Zhang R. The value of genotype-specific reference for transcriptome analyses in barley. Life Sci Alliance 2022; 5:e202101255. [PMID: 35459738 PMCID: PMC9034525 DOI: 10.26508/lsa.202101255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/05/2021] [Revised: 04/10/2022] [Accepted: 04/11/2022] [Indexed: 12/31/2022] Open
Abstract
We demonstrate in this study that using a common reference genome may lead to loss of genotype-specific information in the assembled Reference Transcript Dataset (RTD) and the generation of erroneous, incomplete, or misleading transcriptomics analysis results in barley. It is increasingly apparent that although different genotypes within a species share “core” genes, they also contain variable numbers of “specific” genes and different structures of “core” genes that are only present in a subset of individuals. Using a common reference genome may thus lead to a loss of genotype-specific information in the assembled Reference Transcript Dataset (RTD) and the generation of erroneous, incomplete or misleading transcriptomics analysis results. In this study, we assembled genotype-specific RTD (sRTD) and common reference–based RTD (cRTD) from RNA-seq data of cultivated Barke and Morex barley, respectively. Our quantitative evaluation showed that the sRTD has a significantly higher diversity of transcripts and alternative splicing events, whereas the cRTD missed 40% of transcripts present in the sRTD and it only has ∼70% accurate transcript assemblies. We found that the sRTD is more accurate for transcript quantification as well as differential expression analysis. However, gene-level quantification is less affected, which may be a reasonable compromise when a high-quality genotype-specific reference is not available.
Affiliation(s)
- Wenbin Guo
- Information and Computational Sciences, James Hutton Institute, Dundee, UK
| | - Max Coulter
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Dundee, UK
| | - Robbie Waugh
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Dundee, UK.,Cell and Molecular Sciences, James Hutton Institute, Dundee, UK
| | - Runxuan Zhang
- Information and Computational Sciences, James Hutton Institute, Dundee, UK
|
10
|
Thomas SM, Ackert-Bicknell CL, Zuscik MJ, Payne KA. Understanding the Transcriptomic Landscape to Drive New Innovations in Musculoskeletal Regenerative Medicine. Curr Osteoporos Rep 2022; 20:141-152. [PMID: 35156183 DOI: 10.1007/s11914-022-00726-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/18/2022] [Indexed: 11/03/2022]
Abstract
PURPOSE OF REVIEW: RNA-sequencing (RNA-seq) is a novel and highly sought-after tool in the field of musculoskeletal regenerative medicine. The technology is being used to better understand pathological processes, as well as elucidate mechanisms governing development and regeneration. It has allowed in-depth characterization of stem cell populations and discovery of molecular mechanisms that regulate stem cell development, maintenance, and differentiation in a way that was not possible with previous technology. This review introduces RNA-seq technology and how it has paved the way for advances in musculoskeletal regenerative medicine. RECENT FINDINGS: Recent studies in regenerative medicine have utilized RNA-seq to decipher mechanisms of pathophysiology and identify novel targets for regenerative medicine. The technology has also advanced stem cell biology through in-depth characterization of stem cells, identifying differentiation trajectories and optimizing cell culture conditions. It has also provided new knowledge that has led to improved growth factor use and scaffold design for musculoskeletal regenerative medicine. This article reviews recent studies utilizing RNA-seq in the field of musculoskeletal regenerative medicine. It demonstrates how transcriptomic analysis can be used to provide insights that can aid in formulating a regenerative strategy.
Affiliation(s)
- Stacey M Thomas
- Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus, Mail Stop 8343, 12800 East 19th Avenue, Aurora, CO, 80045, USA
| | - Cheryl L Ackert-Bicknell
- Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus, Mail Stop 8343, 12800 East 19th Avenue, Aurora, CO, 80045, USA
| | - Michael J Zuscik
- Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus, Mail Stop 8343, 12800 East 19th Avenue, Aurora, CO, 80045, USA
| | - Karin A Payne
- Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus, Mail Stop 8343, 12800 East 19th Avenue, Aurora, CO, 80045, USA.
- Gates Center for Regenerative Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
|
11
|
Zhou X, Sam TW, Lee AY, Leung D. Mouse strain-specific polymorphic provirus functions as cis-regulatory element leading to epigenomic and transcriptomic variations. Nat Commun 2021; 12:6462. [PMID: 34753915 PMCID: PMC8578388 DOI: 10.1038/s41467-021-26630-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/31/2021] [Accepted: 10/14/2021] [Indexed: 12/27/2022]
Abstract
Polymorphic integrations of endogenous retroviruses (ERVs) have been previously detected in mouse and human genomes. While most are inert, a subset can influence the activity of the host genes. However, the molecular mechanism underlying how such elements affect the epigenome and transcriptome and their roles in driving intra-specific variation remain unclear. Here, by utilizing wildtype murine embryonic stem cells (mESCs) derived from distinct genetic backgrounds, we discover a polymorphic MMERGLN (GLN) element capable of regulating H3K27ac enrichment and transcription of neighboring loci. We demonstrate that this polymorphic element can enhance the neighboring Klhdc4 gene expression in cis, which alters the activity of downstream stress response genes. These results suggest that the polymorphic ERV-derived cis-regulatory element contributes to differential phenotypes from stimuli between mouse strains. Moreover, we identify thousands of potential polymorphic ERVs in mESCs, a subset of which show an association between proviral activity and nearby chromatin states and transcription. Overall, our findings elucidate the mechanism of how polymorphic ERVs can shape the epigenome and transcriptional networks that give rise to phenotypic divergence between individuals.
Affiliation(s)
- Xuemeng Zhou
- Division of Life Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, SAR, China
| | - Tsz Wing Sam
- Division of Life Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, SAR, China
| | - Ah Young Lee
- Center for Epigenomics Research, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, SAR, China
| | - Danny Leung
- Division of Life Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, SAR, China.,Center for Epigenomics Research, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, SAR, China.
|
12
|
Spaulding EL, Hines TJ, Bais P, Tadenev ALD, Schneider R, Jewett D, Pattavina B, Pratt SL, Morelli KH, Stum MG, Hill DP, Gobet C, Pipis M, Reilly MM, Jennings MJ, Horvath R, Bai Y, Shy ME, Alvarez-Castelao B, Schuman EM, Bogdanik LP, Storkebaum E, Burgess RW. The integrated stress response contributes to tRNA synthetase-associated peripheral neuropathy. Science 2021; 373:1156-1161. [PMID: 34516839 PMCID: PMC8908546 DOI: 10.1126/science.abb3414] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Indexed: 02/02/2023]
Abstract
Dominant mutations in ubiquitously expressed transfer RNA (tRNA) synthetase genes cause axonal peripheral neuropathy, accounting for at least six forms of Charcot-Marie-Tooth (CMT) disease. Genetic evidence in mouse and Drosophila models suggests a gain-of-function mechanism. In this study, we used in vivo, cell type–specific transcriptional and translational profiling to show that mutant tRNA synthetases activate the integrated stress response (ISR) through the sensor kinase GCN2 (general control nonderepressible 2). The chronic activation of the ISR contributed to the pathophysiology, and genetic deletion or pharmacological inhibition of Gcn2 alleviated the peripheral neuropathy. The activation of GCN2 suggests that the aberrant activity of the mutant tRNA synthetases is still related to translation and that inhibiting GCN2 or the ISR may represent a therapeutic strategy in CMT.
Affiliation(s)
- E. L. Spaulding
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
| | - T. J. Hines
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - P. Bais
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - A. L. D. Tadenev
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - R. Schneider
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - D. Jewett
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - B. Pattavina
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - S. L. Pratt
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Neuroscience Program, Graduate School of Biomedical Sciences, Tufts University, Boston, MA, 02111 USA
| | - K. H. Morelli
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
| | - M. G. Stum
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - D. P. Hill
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - C. Gobet
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
| | - M. Pipis
- MRC Centre for Neuromuscular Diseases, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - M. M. Reilly
- MRC Centre for Neuromuscular Diseases, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - M. J. Jennings
- Department of Clinical Neuroscience, University of Cambridge, Cambridge, UK
| | - R. Horvath
- Department of Clinical Neuroscience, University of Cambridge, Cambridge, UK
| | - Y. Bai
- Department of Neurology, Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
| | - M. E. Shy
- Department of Neurology, Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
| | | | - E. M. Schuman
- Max Planck Institute for Brain Research, Frankfurt, Germany
| | - L. P. Bogdanik
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - E. Storkebaum
- Department of Molecular Neurobiology, Donders Institute for Brain, Cognition and Behaviour and Faculty of Science, Radboud University, Nijmegen, Netherlands
| | - R. W. Burgess
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
- Neuroscience Program, Graduate School of Biomedical Sciences, Tufts University, Boston, MA, 02111 USA
|
13
|
Que E, James KL, Coffey AR, Smallwood TL, Albright J, Huda MN, Pomp D, Sethupathy P, Bennett BJ. Genetic architecture modulates diet-induced hepatic mRNA and miRNA expression profiles in Diversity Outbred mice. Genetics 2021; 218:6321522. [PMID: 34849860 PMCID: PMC8757298 DOI: 10.1093/genetics/iyab068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/22/2019] [Accepted: 07/27/2020] [Indexed: 11/30/2022]
Abstract
Genetic approaches in model organisms have consistently demonstrated that molecular traits such as gene expression are under genetic regulation, similar to clinical traits. The resulting expression quantitative trait loci (eQTL) have revolutionized our understanding of genetic regulation and identified numerous candidate genes for clinically relevant traits. More recently, these analyses have been extended to other molecular traits such as protein abundance, metabolite levels, and miRNA expression. Here, we performed global hepatic eQTL and microRNA expression quantitative trait loci (mirQTL) analysis in a population of Diversity Outbred mice fed two different diets. We identified several key features of eQTL and mirQTL, namely differences in the mode of genetic regulation (cis or trans) between mRNA and miRNA. Approximately 50% of mirQTL are regulated by a trans-acting factor, compared to ∼25% of eQTL. We note differences in the heritability of mRNA and miRNA expression and variance explained by each eQTL or mirQTL. In general, cis-acting variants affecting mRNA or miRNA expression explain more phenotypic variance than trans-acting variants. Finally, we investigated the effect of diet on the genetic architecture of eQTL and mirQTL, highlighting the critical effects of environment on both eQTL and mirQTL. Overall, these data underscore the complex genetic regulation of two well-characterized RNA classes (mRNA and miRNA) that have critical roles in the regulation of clinical traits and disease susceptibility.
Affiliation(s)
- Excel Que
- Western Human Nutrition Research Center, Agricultural Research Service, US Department of Agriculture, Davis, CA 95616, USA.,Department of Nutrition, University of California, Davis, Davis, CA 95616, USA
| | - Kristen L James
- Department of Nutrition, University of California, Davis, Davis, CA 95616, USA
| | - Alisha R Coffey
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 28081, USA
| | - Tangi L Smallwood
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 28081, USA
| | - Jody Albright
- Nutrition Research Institute, University of North Carolina at Chapel Hill, Kannapolis, NC 28081, USA
| | - M Nazmul Huda
- Western Human Nutrition Research Center, Agricultural Research Service, US Department of Agriculture, Davis, CA 95616, USA.,Department of Nutrition, University of California, Davis, Davis, CA 95616, USA
| | - Daniel Pomp
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Praveen Sethupathy
- Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY 14853, USA
| | - Brian J Bennett
- Western Human Nutrition Research Center, Agricultural Research Service, US Department of Agriculture, Davis, CA 95616, USA.,Department of Nutrition, University of California, Davis, Davis, CA 95616, USA
|
14
|
Boatwright JL, Yeh CT, Hu HC, Susanna A, Soltis DE, Soltis PS, Schnable PS, Barbazuk WB. Trajectories of Homoeolog-Specific Expression in Allotetraploid Tragopogon castellanus Populations of Independent Origins. FRONTIERS IN PLANT SCIENCE 2021; 12:679047. [PMID: 34249049 PMCID: PMC8261302 DOI: 10.3389/fpls.2021.679047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/10/2021] [Accepted: 05/20/2021] [Indexed: 06/13/2023]
Abstract
Polyploidization can have a significant ecological and evolutionary impact by providing substantially more genetic material that may result in novel phenotypes upon which selection may act. While the effects of polyploidization are broadly reviewed across the plant tree of life, the reproducibility of these effects within naturally occurring, independently formed polyploids is poorly characterized. The flowering plant genus Tragopogon (Asteraceae) offers a rare glimpse into the intricacies of repeated allopolyploid formation with both nascent (< 90 years old) and more ancient (mesopolyploids) formations. Neo- and mesopolyploids in Tragopogon have formed repeatedly and have extant diploid progenitors that facilitate the comparison of genome evolution after polyploidization across a broad span of evolutionary time. Here, we examine four independently formed lineages of the mesopolyploid Tragopogon castellanus for homoeolog expression changes and fractionation after polyploidization. We show that expression changes are remarkably similar among these independently formed polyploid populations with large convergence among expressed loci, moderate convergence among loci lost, and stochastic silencing. We further compare and contrast these results for T. castellanus with two nascent Tragopogon allopolyploids. While homoeolog expression bias was balanced in both nascent polyploids and T. castellanus, the degree of additive expression was significantly different, with the mesopolyploid populations demonstrating more non-additive expression. We suggest that gene dosage and expression noise minimization may play a prominent role in regulating gene expression patterns immediately after allopolyploidization as well as deeper into time, and these patterns are conserved across independent polyploid lineages.
Affiliation(s)
- J. Lucas Boatwright
- Advanced Plant Technology Program, Clemson University, Clemson, SC, United States
| | - Cheng-Ting Yeh
- Department of Agronomy, Iowa State University, Ames, IA, United States
| | - Heng-Cheng Hu
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Covance Inc., Indianapolis, IN, United States
| | - Alfonso Susanna
- Botanic Institute of Barcelona, Consejo Superior de Investigaciones Científicas, ICUB, Barcelona, Spain
| | - Douglas E. Soltis
- Department of Biology, University of Florida, Gainesville, FL, United States
- Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
- Genetics Institute, University of Florida, Gainesville, FL, United States
- Biodiversity Institute, University of Florida, Gainesville, FL, United States
| | - Pamela S. Soltis
- Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
- Genetics Institute, University of Florida, Gainesville, FL, United States
- Biodiversity Institute, University of Florida, Gainesville, FL, United States
| | | | - William B. Barbazuk
- Department of Biology, University of Florida, Gainesville, FL, United States
|
15
|
Miller BR, Morse AM, Borgert JE, Liu Z, Sinclair K, Gamble G, Zou F, Newman JRB, León-Novelo LG, Marroni F, McIntyre LM. Testcrosses are an efficient strategy for identifying cis-regulatory variation: Bayesian analysis of allele-specific expression (BayesASE). G3 (BETHESDA, MD.) 2021; 11:jkab096. [PMID: 33772539 PMCID: PMC8104932 DOI: 10.1093/g3journal/jkab096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2021] [Accepted: 03/10/2021] [Indexed: 12/30/2022]
Abstract
Allelic imbalance (AI) occurs when alleles in a diploid individual are differentially expressed and indicates cis-acting regulatory variation. What is the distribution of allelic effects in a natural population? Are all alleles the same? Are all alleles distinct? The approach described applies to any technology generating allele-specific sequence counts, for example for chromatin accessibility, and can be applied generally, including to comparisons between tissues or environments for the same genotype. Tests of allelic effect are generally performed by crossing individuals and comparing expression between alleles directly in the F1. However, a crossing scheme that compares alleles pairwise is a prohibitive cost for more than a handful of alleles, as the number of crosses is at least (n² − n)/2, where n is the number of alleles. We show here that a testcross design followed by a hypothesis test of AI between testcrosses can be used to infer differences between nontester alleles, allowing n alleles to be compared with n crosses. Using a mouse data set where both testcrosses and direct comparisons have been performed, we show that the predicted differences between nontester alleles are validated at levels of over 90% when a parent-of-origin effect is present and of 60%-80% overall. Power considerations for a testcross are similar to those in a reciprocal cross. In all applications, the testing for AI involves several complex bioinformatics steps. BayesASE is a complete bioinformatics pipeline that incorporates state-of-the-art error reduction techniques and a flexible Bayesian approach to estimating AI and formally comparing levels of AI between conditions. The modular structure of BayesASE has been packaged in Galaxy, made available in Nextflow and as a collection of scripts for the SLURM workload manager on github (https://github.com/McIntyre-Lab/BayesASE).
Affiliation(s)
- Brecca R Miller
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- NYU Langone Health, New York University, New York, NY 10013, USA
| | - Alison M Morse
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| | - Jacqueline E Borgert
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
| | - Zihao Liu
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| | - Kelsey Sinclair
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| | - Gavin Gamble
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
| | - Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
| | - Jeremy R B Newman
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Pathology, University of Florida, Gainesville, FL 32608 USA
| | - Luis G León-Novelo
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston-University of Texas School of Public Health, Houston, TX 7703, USA
| | - Fabio Marroni
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Udine, 33100, Italy
| | - Lauren M McIntyre
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
|
16
|
Zhan S, Griswold C, Lukens L. Zea mays RNA-seq estimated transcript abundances are strongly affected by read mapping bias. BMC Genomics 2021; 22:285. [PMID: 33874908 PMCID: PMC8056621 DOI: 10.1186/s12864-021-07577-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2020] [Accepted: 03/30/2021] [Indexed: 11/27/2022]
Abstract
Background: Genetic variation for gene expression is a source of phenotypic variation for natural and agricultural species. The common approach to map and to quantify gene expression from genetically distinct individuals is to assign their RNA-seq reads to a single reference genome. However, RNA-seq reads from alleles dissimilar to this reference genome may fail to map correctly, causing transcript levels to be underestimated. Presently, the extent of this mapping problem is not clear, particularly in highly diverse species. We investigated if mapping bias occurred and if chromosomal features associated with mapping bias. Zea mays presents a model species to assess these questions, given it has genotypically distinct and well-studied genetic lines. Results: In Zea mays, the inbred B73 genome is the standard reference genome and template for RNA-seq read assignments. In the absence of mapping bias, B73 and a second inbred line, Mo17, would each have an approximately equal number of regulatory alleles that increase gene expression. Remarkably, Mo17 had 2–4 times fewer such positively acting alleles than did B73 when RNA-seq reads were aligned to the B73 reference genome. Reciprocally, over one-half of the B73 alleles that increased gene expression were not detected when reads were aligned to the Mo17 genome template. Genes at dissimilar chromosomal ends were strongly affected by mapping bias, and genes at more similar pericentromeric regions were less affected. Biased transcript estimates were higher in untranslated regions and lower in splice junctions. Bias occurred across software and alignment parameters. Conclusions: Mapping bias very strongly affects gene transcript abundance estimates in maize, and bias varies across chromosomal features. Individual genome or transcriptome templates are likely necessary for accurate transcript estimation across genetically variable individuals in maize and other species.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07577-3.
Affiliation(s)
- Shuhua Zhan
- Department of Plant Agriculture, University of Guelph, Guelph, Ontario, Canada
| | - Cortland Griswold
- Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada
| | - Lewis Lukens
- Department of Plant Agriculture, University of Guelph, Guelph, Ontario, Canada.
|
17
|
Patro R, Salmela L. Algorithms meet sequencing technologies - 10th edition of the RECOMB-Seq workshop. iScience 2021; 24:101956. [PMID: 33437938 PMCID: PMC7788091 DOI: 10.1016/j.isci.2020.101956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
DNA and RNA sequencing is a core technology in biological and medical research. The high throughput of these technologies and the consistent development of new experimental assays and biotechnologies demand the continuous development of methods to analyze the resulting data. The RECOMB Satellite Workshop on Massively Parallel Sequencing brings together leading researchers in computational genomics to discuss emerging frontiers in algorithm development for massively parallel sequencing data. The 10th meeting in this series, RECOMB-Seq 2020, was scheduled to be held in Padua, Italy, but due to the ongoing COVID-19 pandemic, the meeting was carried out virtually instead. The online workshop featured keynote talks by Paola Bonizzoni and Zamin Iqbal, two highlight talks, ten regular talks, and three short talks. Seven of the works presented in the workshop are featured in this edition of iScience, and many of the talks are available online in the RECOMB-Seq 2020 YouTube channel.
Affiliation(s)
- Rob Patro
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
| | - Leena Salmela
- Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
|
18
|
Melia T, Waxman DJ. Genetic factors contributing to extensive variability of sex-specific hepatic gene expression in Diversity Outbred mice. PLoS One 2020; 15:e0242665. [PMID: 33264334 PMCID: PMC7710091 DOI: 10.1371/journal.pone.0242665] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/01/2020] [Accepted: 11/09/2020] [Indexed: 12/12/2022]
Abstract
Sex-specific transcription characterizes hundreds of genes in mouse liver, many implicated in sex-differential drug and lipid metabolism and disease susceptibility. While the regulation of liver sex differences by growth hormone-activated STAT5 is well established, little is known about autosomal genetic factors regulating the sex-specific liver transcriptome. Here we show, using genotyping and expression data from a large population of Diversity Outbred mice, that genetic factors work in tandem with growth hormone to control the individual variability of hundreds of sex-biased genes, including many long non-coding RNA genes. Significant associations between single nucleotide polymorphisms and sex-specific gene expression were identified as expression quantitative trait loci (eQTLs), many of which showed strong sex-dependent associations. Remarkably, autosomal genetic modifiers of sex-specific genes were found to account for more than 200 instances of gain or loss of sex-specificity across eight Diversity Outbred mouse founder strains. Sex-biased STAT5 binding sites and open chromatin regions with strain-specific variants were significantly enriched at eQTL regions regulating correspondingly sex-specific genes, supporting the proposed functional regulatory nature of the eQTL regions identified. Binding of the male-biased, growth hormone-regulated repressor BCL6 was most highly enriched at trans-eQTL regions controlling female-specific genes. Co-regulated gene clusters defined by overlapping eQTLs included sets of highly correlated genes from different chromosomes, further supporting trans-eQTL action. These findings elucidate how an unexpectedly large number of autosomal factors work in tandem with growth hormone signaling pathways to regulate the individual variability associated with sex differences in liver metabolism and disease.
Affiliation(s)
- Tisha Melia
- Department of Biology and Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - David J. Waxman
- Department of Biology and Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
|
19
|
Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, Love MI, Kingsford C, Patro R. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol 2020; 21:239. [PMID: 32894187 PMCID: PMC7487471 DOI: 10.1186/s13059-020-02151-8] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 10/31/2019] [Accepted: 08/19/2020] [Indexed: 01/23/2023]
Abstract
Background The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has been given to comparing the effect of various read alignment approaches on quantification accuracy. Results We investigate the influence of mapping and alignment on the accuracy of transcript quantification in both simulated and experimental data, as well as the effect on subsequent differential expression analysis. We observe that, even when the quantification model itself is held fixed, the effect of choosing a different alignment methodology, or aligning reads using different parameters, on quantification estimates can sometimes be large and can affect downstream differential expression analyses as well. These effects can go unnoticed when assessment is focused too heavily on simulated data, where the alignment task is often simpler than in experimentally acquired samples. We also introduce a new alignment methodology, called selective alignment, to overcome the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment. Conclusion We observe that, on experimental datasets, the performance of lightweight mapping and alignment-based approaches varies significantly, and highlight some of the underlying factors. We show this variation both in terms of quantification and downstream differential expression analysis. In all comparisons, we also show the improved performance of our proposed selective alignment method and suggest best practices for performing RNA-seq quantification.
Affiliation(s)
- Avi Srivastava
- Department of Computer Science, Stony Brook University, Stony Brook, USA
- Laraib Malik
- Department of Computer Science, Stony Brook University, Stony Brook, USA
- Hirak Sarkar
- Department of Computer Science, University of Maryland, College Park, USA
- Mohsen Zakeri
- Department of Computer Science, University of Maryland, College Park, USA
- Fatemeh Almodaresi
- Department of Computer Science, University of Maryland, College Park, USA
- Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Carl Kingsford
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
- Rob Patro
- Department of Computer Science, University of Maryland, College Park, USA
|
20
|
Liang ZS, Cimino I, Yalcin B, Raghupathy N, Vancollie VE, Ibarra-Soria X, Firth HV, Rimmington D, Farooqi IS, Lelliott CJ, Munger SC, O’Rahilly S, Ferguson-Smith AC, Coll AP, Logan DW. Trappc9 deficiency causes parent-of-origin dependent microcephaly and obesity. PLoS Genet 2020; 16:e1008916. [PMID: 32877400 PMCID: PMC7467316 DOI: 10.1371/journal.pgen.1008916] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/24/2020] [Accepted: 06/08/2020] [Indexed: 11/30/2022]
Abstract
Some imprinted genes exhibit parental origin specific expression bias rather than being transcribed exclusively from one copy. The physiological relevance of this remains poorly understood. In an analysis of brain-specific allele-biased expression, we identified that Trappc9, a cellular trafficking factor, was expressed predominantly (~70%) from the maternally inherited allele. Loss-of-function mutations in human TRAPPC9 cause a rare neurodevelopmental syndrome characterized by microcephaly and obesity. By studying Trappc9 null mice we discovered that homozygous mutant mice showed a reduction in brain size, exploratory activity and social memory, as well as a marked increase in body weight. A role for Trappc9 in energy balance was further supported by increased ad libitum food intake in a child with TRAPPC9 deficiency. Strikingly, heterozygous mice lacking the maternal allele (70% reduced expression) had pathology similar to homozygous mutants, whereas mice lacking the paternal allele (30% reduction) were phenotypically normal. Taken together, we conclude that Trappc9 deficient mice recapitulate key pathological features of TRAPPC9 mutations in humans and identify a role for Trappc9 and its imprinting in controlling brain development and metabolism.
Affiliation(s)
- Zhengzheng S. Liang
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Irene Cimino
- MRC Metabolic Diseases Unit, Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom
- Binnaz Yalcin
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Université de Strasbourg, France
- Ximena Ibarra-Soria
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, United Kingdom
- Helen V. Firth
- Department of Clinical Genetics, Addenbrooke’s Hospital, Cambridge, United Kingdom
- Debra Rimmington
- MRC Metabolic Diseases Unit, Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom
- I. Sadaf Farooqi
- University of Cambridge Metabolic Research Laboratories and NIHR Cambridge Biomedical Research Centre, Addenbrooke's Hospital, Cambridge, United Kingdom
- Steven C. Munger
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Stephen O’Rahilly
- MRC Metabolic Diseases Unit, Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom
- Anthony P. Coll
- MRC Metabolic Diseases Unit, Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom
- Darren W. Logan
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, United Kingdom
|
21
|
Que E, James KL, Coffey AR, Smallwood TL, Albright J, Huda MN, Pomp D, Sethupathy P, Bennett BJ. Genetic Architecture Modulates Diet-Induced Hepatic mRNA and miRNA Expression Profiles in Diversity Outbred Mice. Genetics 2020; 216:241-259. [PMID: 32763908 PMCID: PMC7463293 DOI: 10.1534/genetics.120.303481] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/22/2019] [Accepted: 07/27/2020] [Indexed: 02/07/2023]
Abstract
Genetic approaches in model organisms have consistently demonstrated that molecular traits such as gene expression are under genetic regulation, similar to clinical traits. The resulting expression quantitative trait loci (eQTL) have revolutionized our understanding of genetic regulation and identified numerous candidate genes for clinically relevant traits. More recently, these analyses have been extended to other molecular traits such as protein abundance, metabolite levels, and miRNA expression. Here, we performed global hepatic eQTL and microRNA expression quantitative trait loci (mirQTL) analysis in a population of Diversity Outbred mice fed two different diets. We identified several key features of eQTL and mirQTL, namely differences in the mode of genetic regulation (cis or trans) between mRNA and miRNA. Approximately 50% of mirQTL are regulated by a trans-acting factor, compared to ∼25% of eQTL. We note differences in the heritability of mRNA and miRNA expression and variance explained by each eQTL or mirQTL. In general, cis-acting variants affecting mRNA or miRNA expression explain more phenotypic variance than trans-acting variants. Lastly, we investigated the effect of diet on the genetic architecture of eQTL and mirQTL, highlighting the critical effects of environment on both eQTL and mirQTL. Overall, these data underscore the complex genetic regulation of two well-characterized RNA classes (mRNA and miRNA) that have critical roles in the regulation of clinical traits and disease susceptibility.
Affiliation(s)
- Excel Que
- Western Human Nutrition Research Center, Agricultural Research Service, US Department of Agriculture, Davis, California 95616
- Department of Nutrition, University of California, Davis, California
- Kristen L James
- Department of Nutrition, University of California, Davis, California
- Alisha R Coffey
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, North Carolina
- Tangi L Smallwood
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, North Carolina
- Jody Albright
- Nutrition Research Institute, University of North Carolina at Chapel Hill, Kannapolis, North Carolina
- M Nazmul Huda
- Western Human Nutrition Research Center, Agricultural Research Service, US Department of Agriculture, Davis, California 95616
- Department of Nutrition, University of California, Davis, California
- Daniel Pomp
- Department of Genetics, University of North Carolina at Chapel Hill, North Carolina
- Praveen Sethupathy
- Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, New York
- Brian J Bennett
- Western Human Nutrition Research Center, Agricultural Research Service, US Department of Agriculture, Davis, California 95616
- Department of Nutrition, University of California, Davis, California
|
22
|
Groza C, Kwan T, Soranzo N, Pastinen T, Bourque G. Personalized and graph genomes reveal missing signal in epigenomic data. Genome Biol 2020; 21:124. [PMID: 32450900 PMCID: PMC7249353 DOI: 10.1186/s13059-020-02038-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 08/30/2019] [Accepted: 05/08/2020] [Indexed: 12/21/2022]
Abstract
BACKGROUND Epigenomic studies that use next generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results. RESULTS We show that accounting for genetic variation using a modified reference genome or a de novo assembled genome can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls either by creating new personal peaks or by the loss of reference peaks. Using permissive cutoffs, modified reference genomes are found to alter approximately 1% of peak calls while de novo assembled genomes alter up to 5% of peaks. We also show statistically significant differences in the amount of reads observed in regions associated with the new, altered, and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. We show that using a graph personalized genome represents a reasonable compromise between modified reference genomes and de novo assembled genomes. We demonstrate that altered peaks have a genomic distribution typical of other peaks. CONCLUSIONS Analyzing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant in the study of various human phenotypes.
Affiliation(s)
- Tony Kwan
- Human Genetics, McGill University, Montreal, QC, Canada
- McGill University and Genome Quebec Innovation Centre, McGill University, Montreal, QC, Canada
- Nicole Soranzo
- Department of Human Genetics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge, UK
- British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke's Hospital, Hills Road, Cambridge, UK
- The National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics, University of Cambridge, Strangeways Research Laboratory, Wort's Causeway, Cambridge, UK
- Tomi Pastinen
- Human Genetics, McGill University, Montreal, QC, Canada
- McGill University and Genome Quebec Innovation Centre, McGill University, Montreal, QC, Canada
- Center for Pediatric Genomic Medicine, Kansas City, MO, USA
- Guillaume Bourque
- Human Genetics, McGill University, Montreal, QC, Canada
- McGill University and Genome Quebec Innovation Centre, McGill University, Montreal, QC, Canada
- Canadian Centre for Computational Genomics, Montreal, QC, Canada
- Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto, Japan
|
23
|
Raghupathy N, Choi K, Vincent MJ, Beane GL, Sheppard KS, Munger SC, Korstanje R, Pardo-Manuel de Villena F, Churchill GA. Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression. Bioinformatics 2019; 34:2177-2184. [PMID: 29444201 DOI: 10.1093/bioinformatics/bty078] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 09/01/2017] [Accepted: 02/09/2018] [Indexed: 02/06/2023]
Abstract
Motivation Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. RNA sequencing (RNA-seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. When short-read sequences are aligned to a diploid transcriptome, read-mapping ambiguities confound our ability to directly count reads. Multi-mapping reads aligning equally well to multiple genomic locations, isoforms or alleles can comprise the majority (>85%) of reads. Discarding them can result in biases and substantial loss of information. Methods have been developed that use weighted allocation of read counts but these methods treat the different types of multi-reads equivalently. We propose a hierarchical approach to allocation of read counts that first resolves ambiguities among genes, then among isoforms, and lastly between alleles. We have implemented our model in EMASE software (Expectation-Maximization for Allele Specific Expression) to estimate total gene expression, isoform usage and ASE based on this hierarchical allocation. Results Methods that align RNA-seq reads to a diploid transcriptome incorporating known genetic variants improve estimates of ASE and total gene expression compared to methods that use reference genome alignments. Weighted allocation methods outperform methods that discard multi-reads. Hierarchical allocation of reads improves estimation of ASE even when data are simulated from a non-hierarchical model. Analysis of RNA-seq data from F1 hybrid mice using EMASE reveals widespread ASE associated with cis-acting polymorphisms and a small number of parent-of-origin effects. Availability and implementation EMASE software is available at https://github.com/churchill-lab/emase. Supplementary information Supplementary data are available at Bioinformatics online.
|
24
|
Bioinformatic methods for cancer neoantigen prediction. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2019; 164:25-60. [PMID: 31383407 DOI: 10.1016/bs.pmbts.2019.06.016] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/12/2022]
Abstract
Tumor cells accumulate aberrations not present in normal cells, leading to presentation of neoantigens on MHC molecules on their surface. These non-self neoantigens distinguish tumor cells from normal cells to the immune system and are thus targets for cancer immunotherapy. The rapid development of molecular profiling platforms, such as next-generation sequencing, has enabled the generation of large datasets characterizing tumor cells. The simultaneous development of algorithms has enabled rapid and accurate processing of these data. Bioinformatic software tools encoding the algorithms can be strung together in a workflow to identify neoantigens. Here, with a focus on high-throughput sequencing, we review state-of-the art bioinformatic tools along with the steps and challenges involved in neoantigen identification and recognition.
|
25
|
Skelly DA, Raghupathy N, Robledo RF, Graber JH, Chesler EJ. Reference Trait Analysis Reveals Correlations Between Gene Expression and Quantitative Traits in Disjoint Samples. Genetics 2019; 212:919-929. [PMID: 31113812 PMCID: PMC6614885 DOI: 10.1534/genetics.118.301865] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/11/2018] [Accepted: 05/14/2019] [Indexed: 12/21/2022]
Abstract
Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript-trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative "reference" traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
Affiliation(s)
- Joel H Graber
- The Jackson Laboratory, Bar Harbor, Maine 04609
- MDI Biological Laboratory, Bar Harbor, Maine 04609
|
26
|
Melia T, Waxman DJ. Sex-Biased lncRNAs Inversely Correlate With Sex-Opposite Gene Coexpression Networks in Diversity Outbred Mouse Liver. Endocrinology 2019; 160:989-1007. [PMID: 30840070 PMCID: PMC6449536 DOI: 10.1210/en.2018-00949] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 11/07/2018] [Accepted: 02/27/2019] [Indexed: 01/05/2023]
Abstract
Sex differences in liver gene expression are determined by pituitary growth hormone secretion patterns, which regulate sex-dependent liver transcription factors and establish sex-specific chromatin states. Hypophysectomy (hypox) identifies two major classes of liver sex-biased genes, defined by their sex-dependent positive or negative responses to pituitary hormone ablation. However, the mechanisms that underlie each hypox-response class are unknown. We sought to discover candidate, regulatory, long noncoding RNAs (lncRNAs) controlling responsiveness to hypox. We characterized gene structures and expression patterns for 15,558 mouse liver-expressed lncRNAs, including many sex-specific lncRNAs regulated during postnatal development or subject to circadian regulation. Using the high natural allelic variance of Diversity Outbred (DO) mice, we discovered tightly coexpressed clusters of sex-specific protein-coding genes (gene modules) in male and female DO liver. Remarkably, many gene modules were strongly enriched for sex-specific genes within a single hypox-response class, indicating that the genetic heterogeneity of DO mice encompasses responsiveness to hypox. Moreover, several distant gene modules were enriched for gene subsets of the same hypox-response class, highlighting the complex regulation of hypox-responsiveness. Finally, we identified eight sex-specific lncRNAs with strong negative regulatory potential, as indicated by their strong negative correlation of expression across DO mouse livers with that of protein-coding gene modules enriched for genes of the opposite sex bias and inverse hypox-response class. These findings reveal an important role for genetic factors in regulating responsiveness to hypox, and present testable hypotheses for the roles of sex-biased liver lncRNAs in controlling the sex-bias of liver gene expression.
Affiliation(s)
- Tisha Melia
- Department of Biology and Bioinformatics Program, Boston University, Boston, Massachusetts
- David J Waxman
- Department of Biology and Bioinformatics Program, Boston University, Boston, Massachusetts
- Correspondence: David J. Waxman, PhD, Department of Biology, Boston University, 5 Cummington Mall, Boston, Massachusetts 02215. E-mail:
|
27
|
Zhao C, Xie S, Wu H, Luan Y, Hu S, Ni J, Lin R, Zhao S, Zhang D, Li X. Quantification of allelic differential expression using a simple Fluorescence primer PCR-RFLP-based method. Sci Rep 2019; 9:6334. [PMID: 31004110 PMCID: PMC6474871 DOI: 10.1038/s41598-019-42815-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2018] [Accepted: 03/29/2019] [Indexed: 12/04/2022]
Abstract
Allelic differential expression (ADE) is common in diploid organisms, and is often the key reason for specific phenotype variations. Thus, ADE detection is important for identification of major genes and causal mutations. To date, sensitive and simple methods to detect ADE are still lacking. In this study, we have developed an accurate, simple, and sensitive method, named fluorescence primer PCR-RFLP quantitative method (fPCR-RFLP), for ADE analysis. This method involves two rounds of PCR amplification using a pair of primers, one of which is double-labeled with an overhang 6-FAM. The two alleles are then separated by RFLP and quantified by fluorescence density. fPCR-RFLP could precisely distinguish ADE across a range of 1- to 32-fold differences. Using this method, we verified PLAG1 and KIT, two candidate genes related to growth rate and immune response traits of pigs, to be ADE both at different developmental stages and in different tissues. Our data demonstrate that fPCR-RFLP is an accurate and sensitive method for detecting ADE at both the DNA and RNA levels. Therefore, this powerful tool provides a way to analyze mutations that cause ADE.
Affiliation(s)
- Changzhi Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Shengsong Xie
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Hui Wu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Yu Luan
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Suqin Hu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Juan Ni
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Ruiyi Lin
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Dingxiao Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
- Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, P.R. China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, 430070, P.R. China
|
28
|
Qu W, Gurdziel K, Pique-Regi R, Ruden DM. Lead Modulates trans- and cis-Expression Quantitative Trait Loci (eQTLs) in Drosophila melanogaster Heads. Front Genet 2018; 9:395. [PMID: 30294342 PMCID: PMC6158337 DOI: 10.3389/fgene.2018.00395] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/01/2018] [Accepted: 08/30/2018] [Indexed: 11/13/2022]
Abstract
Lead exposure has long been one of the most important topics in global public health because it is a potent developmental neurotoxin. Here, an eQTL analysis, which is the genome-wide association analysis of genetic variants with gene expression, was performed. In this analysis, the male heads of 79 Drosophila melanogaster inbred lines from Drosophila Synthetic Population Resource (DSPR) were treated with or without developmental exposure, from hatching to adults, to 250 μM lead acetate [Pb(C2H3O2)2]. The goal was to identify genomic intervals that influence the gene-expression response to lead. After detecting 1798 cis-eQTLs and performing an initial trans-eQTL analysis, we focused our analysis on lead-sensitive "trans-eQTL hotspots," defined as genomic regions that are associated with a cluster of genes in a lead-dependent manner. We noticed that the genes associated with one of the 14 detected trans-eQTL hotspots, Chr 2L: 6,250,000 could be roughly divided into two groups based on their differential expression profile patterns and different categories of function. This trans-eQTL hotspot validates one identified in a previous study using different recombinant inbred lines. The expression of all the associated genes in the trans-eQTL hotspot was visualized with hierarchical clustering analysis. Besides the overall expression profile patterns, the heatmap displayed the segregation of differential parental genetic contributions. This suggested that trans-regulatory regions with different genetic contributions from the parental lines have significantly different expression changes after lead exposure. We believe this study confirms our earlier study, and provides important insights to unravel the genetic variation in lead susceptibility in Drosophila model.
Affiliation(s)
- Wen Qu
- Department of Pharmacology, Wayne State University, Detroit, MI, United States
- Katherine Gurdziel
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, United States
- Roger Pique-Regi
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, United States; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, United States
- Douglas M Ruden
- Department of Pharmacology, Wayne State University, Detroit, MI, United States; Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, United States; Institute of Environmental Health Sciences, Wayne State University, Detroit, MI, United States
|
29
|
A Robust Methodology for Assessing Differential Homeolog Contributions to the Transcriptomes of Allopolyploids. Genetics 2018; 210:883-894. [PMID: 30213855 DOI: 10.1534/genetics.118.301564] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/01/2018] [Accepted: 09/07/2018] [Indexed: 12/18/2022]
Abstract
Polyploidy has played a pivotal and recurring role in angiosperm evolution. Allotetraploids arise from hybridization between species and possess duplicated gene copies (homeologs) that serve redundant roles immediately after polyploidization. Although polyploidization is a major contributor to plant evolution, it remains poorly understood. We describe an analytical approach for assessing homeolog-specific expression that begins with de novo assembly of parental transcriptomes and effectively (i) reduces redundancy in de novo assemblies, (ii) identifies putative orthologs, (iii) isolates common regions between orthologs, and (iv) assesses homeolog-specific expression using a robust Bayesian Poisson-Gamma model to account for sequence bias when mapping polyploid reads back to parental references. Using this novel methodology, we examine differential homeolog contributions to the transcriptome in the recently formed allopolyploids Tragopogon mirus and T. miscellus (Compositae). Notably, we assess a larger Tragopogon gene set than previous studies of this system. Using carefully identified orthologous regions and filtering biased orthologs, we find in both allopolyploids largely balanced expression with no strong parental bias. These new methods can be used to examine homeolog expression in any tetrapolyploid system without requiring a reference genome.
|
30
|
Liu X, MacLeod JN, Liu J. iMapSplice: Alleviating reference bias through personalized RNA-seq alignment. PLoS One 2018; 13:e0201554. [PMID: 30096157 PMCID: PMC6086400 DOI: 10.1371/journal.pone.0201554] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/23/2018] [Accepted: 07/17/2018] [Indexed: 11/19/2022]
Abstract
Genomic variants in both coding and non-coding sequences can have functionally important and sometimes deleterious effects on exon splicing of gene transcripts. For transcriptome profiling using RNA-seq, the accurate alignment of reads across exon junctions is a critical step. Existing algorithms that utilize a standard reference genome as a template sometimes have difficulty in mapping reads that carry genomic variants. These problems can lead to allelic ratio biases and the failure to detect splice variants created by splice site polymorphisms. To improve RNA-seq read alignment, we have developed a novel approach called iMapSplice that enables personalized mRNA transcriptome profiling. The algorithm makes use of personal genomic information and performs an unbiased alignment towards genome indices carrying both reference and alternative bases. Importantly, this breaks the dependency on reference genome splice site dinucleotide motifs and enables iMapSplice to discover personal splice junctions created through splice site polymorphisms. We report comparative analyses using a number of simulated and real datasets. Besides general improvements in read alignment and splice junction discovery, iMapSplice greatly alleviates allelic ratio biases and unravels many previously uncharacterized splice junctions created by splice site polymorphisms, with minimal overhead in computation time and storage. Software download URL: https://github.com/LiuBioinfo/iMapSplice.
Affiliation(s)
- Xinan Liu
- Department of Computer Science, University of Kentucky, Lexington, KY, United States of America
| | - James N. MacLeod
- Department of Veterinary Science, University of Kentucky, Lexington, KY, United States of America
| | - Jinze Liu
- Department of Computer Science, University of Kentucky, Lexington, KY, United States of America
|
31
|
Winter JM, Curry NL, Gildea DM, Williams KA, Lee M, Hu Y, Crawford NPS. Modifier locus mapping of a transgenic F2 mouse population identifies CCDC115 as a novel aggressive prostate cancer modifier gene in humans. BMC Genomics 2018; 19:450. [PMID: 29890952 PMCID: PMC5996485 DOI: 10.1186/s12864-018-4827-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/01/2017] [Accepted: 05/25/2018] [Indexed: 12/16/2022]
Abstract
BACKGROUND It is well known that development of prostate cancer (PC) can be attributed to somatic mutations of the genome, acquired within proto-oncogenes or tumor-suppressor genes. What is less well understood is how germline variation contributes to disease aggressiveness in PC patients. To map germline modifiers of aggressive neuroendocrine PC, we generated a genetically diverse F2 intercross population using the transgenic TRAMP mouse model and the wild-derived WSB/EiJ (WSB) strain. The relevance of germline modifiers of aggressive PC identified in these mice was extensively correlated in human PC datasets and functionally validated in cell lines. RESULTS Aggressive PC traits were quantified in a population of 30-week-old (TRAMP x WSB) F2 mice (n = 307). Correlation of germline genotype with aggressive disease phenotype revealed seven modifier loci that were significantly associated with aggressive disease. RNA-seq data were analyzed using cis-eQTL and trait correlation analyses to identify candidate genes within each of these loci. Analysis of 92 (TRAMP x WSB) F2 prostates revealed 25 candidate genes that harbored both a significant cis-eQTL and mRNA expression correlations with an aggressive PC trait. We further delineated these candidate genes based on their clinical relevance, by interrogating human PC GWAS and PC tumor gene expression datasets. We identified four genes (CCDC115, DNAJC10, RNF149, and STYXL1), which encompassed all of the following characteristics: 1) one or more germline variants associated with aggressive PC traits; 2) differential mRNA levels associated with aggressive PC traits; and 3) differential mRNA expression between normal and tumor tissue. Functional validation studies of these four genes using the human LNCaP prostate adenocarcinoma cell line revealed ectopic overexpression of CCDC115 can significantly impede cell growth in vitro and tumor growth in vivo. Furthermore, CCDC115 human prostate tumor expression was associated with better survival outcomes. CONCLUSION We have demonstrated how modifier locus mapping in mouse models of PC, coupled with in silico analyses of human PC datasets, can reveal novel germline modifier genes of aggressive PC. We have also characterized CCDC115 as being associated with less aggressive PC in humans, placing it as a potential prognostic marker of aggressive PC.
Affiliation(s)
- Jean M Winter
- Metastasis Genetics Section, Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD, 20892, USA. Present address: Dame Roma Mitchell Cancer Research Laboratories, Adelaide Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia, 5000, Australia
| | - Natasha L Curry
- Metastasis Genetics Section, Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD, 20892, USA
| | - Derek M Gildea
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, 20892, USA
| | - Kendra A Williams
- Metastasis Genetics Section, Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD, 20892, USA
| | - Minnkyong Lee
- Metastasis Genetics Section, Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD, 20892, USA
| | - Ying Hu
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20892, USA
| | - Nigel P S Crawford
- Metastasis Genetics Section, Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD, 20892, USA. Present address: Sanofi, 55 Corporate Dr., Bridgewater, NJ, 08897, USA.
|
32
|
Abstract
The majority of gene loci that have been associated with type 2 diabetes play a role in pancreatic islet function. To evaluate the role of islet gene expression in the etiology of diabetes, we sensitized a genetically diverse mouse population with a Western diet high in fat (45% kcal) and sucrose (34%) and carried out genome-wide association mapping of diabetes-related phenotypes. We quantified mRNA abundance in the islets and identified 18,820 expression QTL. We applied mediation analysis to identify candidate causal driver genes at loci that affect the abundance of numerous transcripts. These include two genes previously associated with monogenic diabetes (PDX1 and HNF4A), as well as three genes with nominal association with diabetes-related traits in humans (FAM83E, IL6ST, and SAT2). We grouped transcripts into gene modules and mapped regulatory loci for modules enriched with transcripts specific for α-cells, and another specific for δ-cells. However, no single module enriched for β-cell-specific transcripts, suggesting heterogeneity of gene expression patterns within the β-cell population. A module enriched in transcripts associated with branched-chain amino acid metabolism was the most strongly correlated with physiological traits that reflect insulin resistance. Although the mice in this study were not overtly diabetic, the analysis of pancreatic islet gene expression under dietary-induced stress enabled us to identify correlated variation in groups of genes that are functionally linked to diabetes-associated physiological traits. Our analysis suggests an expected degree of concordance between diabetes-associated loci in the mouse and those found in human populations, and demonstrates how the mouse can provide evidence to support nominal associations found in human genome-wide association mapping.
|
33
|
Direct Testing for Allele-Specific Expression Differences Between Conditions. G3-GENES GENOMES GENETICS 2018; 8:447-460. [PMID: 29167272 PMCID: PMC5919738 DOI: 10.1534/g3.117.300139] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/25/2022]
Abstract
Allelic imbalance (AI) indicates the presence of functional variation in cis regulatory regions. Detecting cis regulatory differences using AI is widespread, yet there is no formal statistical methodology that tests whether AI differs between conditions. Here, we present a novel model and formally test differences in AI across conditions using Bayesian credible intervals. The approach tests AI by environment (G×E) interactions, and can be used to test AI between environments, genotypes, sex, and any other condition. We incorporate bias into the modeling process. Bias is allowed to vary between conditions, making the formulation of the model general. As gene expression affects power for detection of AI, and, as expression may vary between conditions, the model explicitly takes coverage into account. The proposed model has low type I and II error under several scenarios, and is robust to large differences in coverage between conditions. We reanalyze RNA-seq data from a Drosophila melanogaster population panel, with F1 genotypes, to compare levels of AI between mated and virgin female flies, and we show that AI × genotype interactions can also be tested. To demonstrate the use of the model to test genetic differences and interactions, a formal test between two F1s was performed, showing the expected 20% difference in AI. The proposed model allows a formal test of G×E and G×G, and reaffirms a previous finding that cis regulation is robust between environments.
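The idea of testing whether AI differs between two conditions with a Bayesian credible interval can be illustrated with a much simpler model than the one the abstract describes. The sketch below is an assumption-laden toy, not the authors' method: it ignores mapping bias and the explicit coverage term, places an independent uniform Beta(1, 1) prior on the reference-allele proportion in each condition, and approximates the posterior of the difference by Monte Carlo sampling. The function name and the read counts are hypothetical.

```python
import random


def ai_difference_interval(ref1, alt1, ref2, alt2,
                           draws=20000, level=0.95, seed=0):
    """Credible interval for the difference in reference-allele proportion
    between two conditions.

    Assumption: Beta(1, 1) prior per condition and binomial read counts, so
    each posterior is Beta(1 + ref, 1 + alt). Bias and coverage modeling from
    the cited work are deliberately left out of this toy example.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        p1 = rng.betavariate(1 + ref1, 1 + alt1)  # posterior draw, condition 1
        p2 = rng.betavariate(1 + ref2, 1 + alt2)  # posterior draw, condition 2
        diffs.append(p1 - p2)
    diffs.sort()
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Hypothetical read counts: (reference, alternative) in each condition.
low, high = ai_difference_interval(650, 350, 500, 500)
print("AI differs between conditions:", not (low <= 0.0 <= high))
```

Under this simplification, AI is called different between conditions when the interval for the difference excludes zero.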
|
34
|
Sunde RA. Selenium regulation of selenoprotein enzyme activity and transcripts in a pilot study with Founder strains from the Collaborative Cross. PLoS One 2018; 13:e0191449. [PMID: 29338053 PMCID: PMC5770059 DOI: 10.1371/journal.pone.0191449] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/28/2017] [Accepted: 01/04/2018] [Indexed: 12/02/2022]
Abstract
Rodents and humans have 24–25 selenoproteins, and these proteins contain the 21st amino acid, selenocysteine, incorporated co-translationally into the peptide backbone in a series of reactions dependent on at least 6 unique gene products. In selenium (Se) deficiency, there is differential regulation of selenoprotein expression, whereby levels of some selenoproteins and their transcripts decrease dramatically in Se deficiency, but other selenoprotein transcripts are spared this decrease; the underlying mechanism, however, is not fully understood. To begin to explore the genetic basis for this variation in regulation by Se status in a pilot study, we fed Se-deficient or Se-adequate diets (0.005 or 0.2 μg Se/g, respectively) for eight weeks to the eight Founder strains of the Collaborative Cross. We found rather uniform expression of selenoenzyme activity for glutathione peroxidase (Gpx) 3 in plasma, Gpx1 in red blood cells, and Gpx1, Gpx4, and thioredoxin reductase in liver. In Founder mice, Se deficiency decreased each of these activities to a similar extent. Regulation of selenoprotein transcript expression by Se status was also globally retained intact, with dramatic down-regulation of Gpx1, Selenow, and Selenoh transcripts in all 8 strains of Founder mice. These results indicate that differential regulation of selenoprotein expression by Se status is an essential aspect of Se metabolism and selenoprotein function. A few lone differences in Se regulation were observed for individual selenoproteins in this pilot study, but these differences did not single out one strain or one selenoprotein that consistently had unique Se regulation of selenoprotein expression. These differences should be affirmed in larger studies; use of the Diversity Outbred and Collaborative Cross strains may help to better define the functions of these selenoproteins.
Affiliation(s)
- Roger A. Sunde
- Department of Nutritional Sciences, University of Wisconsin, Madison, Wisconsin, United States of America
|
35
|
Chen A, Liu Y, Williams SM, Morris N, Buchner DA. Widespread epistasis regulates glucose homeostasis and gene expression. PLoS Genet 2017; 13:e1007025. [PMID: 28961251 PMCID: PMC5636166 DOI: 10.1371/journal.pgen.1007025] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 05/05/2017] [Revised: 10/11/2017] [Accepted: 09/17/2017] [Indexed: 02/07/2023]
Abstract
The relative contributions of additive versus non-additive interactions in the regulation of complex traits remain controversial. This may be in part because large-scale epistasis has traditionally been difficult to detect in complex, multi-cellular organisms. We hypothesized that it would be easier to detect interactions using mouse chromosome substitution strains that simultaneously incorporate allelic variation in many genes on a controlled genetic background. Analyzing metabolic traits and gene expression levels in the offspring of a series of crosses between mouse chromosome substitution strains demonstrated that inter-chromosomal epistasis was a dominant feature of these complex traits. Epistasis typically accounted for a larger proportion of the heritable effects than those due solely to additive effects. These epistatic interactions typically resulted in trait values returning to the levels of the parental CSS host strain. Due to the large epistatic effects, analyses that did not account for interactions consistently underestimated the true effect sizes due to allelic variation or failed to detect the loci controlling trait variation. These studies demonstrate that epistatic interactions are a common feature of complex traits and thus identifying these interactions is key to understanding their genetic regulation. Most complex traits and diseases are regulated by the combined influence of multiple genetic variants. However, it remains controversial whether these genetic variants independently influence complex traits, and therefore the impact of each variant could be simply added together (additivity), or whether the variants work together to influence trait variation, in which case the combined impact of multiple variants would differ from the summed impact of each individual variant (epistasis). 
In this study in mice, we discovered that the genetic regulation of blood sugar levels and gene expression in the liver were predominantly controlled by non-additive interactions, whereas body weight was predominantly controlled by additive interactions. Remarkably, the expression level of nearly 25% of all genes in the liver was controlled by non-additive interactions. The non-additive interactions typically acted to return trait values to the levels detected in control mice, thus contributing to a reduction in trait variation. We also demonstrated that not accounting for non-additive interactions significantly underestimated the phenotypic effect of a genetic variant on a particular genetic background, suggesting that many previously identified risk loci may have significantly larger effects on disease susceptibility in a subset of individuals. These studies highlight the importance of understanding interactions between genetic variants to better understand disease risk and personalize clinical care.
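The distinction between additivity and epistasis drawn in this summary comes down to simple arithmetic on group means. The following sketch is an illustration only: the function name and the trait values are hypothetical and not taken from the study. It computes what a doubly substituted genotype would look like if the two single-substitution effects simply added, and reports the deviation of the observed value from that expectation.

```python
def epistatic_deviation(control, single_a, single_b, double_ab):
    """Return (additive expectation, deviation from additivity).

    All arguments are mean trait values for four groups: the control
    background, each single substitution, and the double substitution.
    A deviation of zero means the two effects are purely additive.
    """
    effect_a = single_a - control
    effect_b = single_b - control
    additive_expectation = control + effect_a + effect_b
    return additive_expectation, double_ab - additive_expectation


# Hypothetical trait means (arbitrary units), chosen for illustration.
expected, deviation = epistatic_deviation(100.0, 120.0, 115.0, 105.0)
print(expected, deviation)  # 135.0 -30.0
```

In this made-up case the double substitution sits far below the additive expectation and close to the control value, which is the pattern the abstract describes as trait values returning toward control levels.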
Affiliation(s)
- Anlu Chen
- Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States of America
| | - Yang Liu
- Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States of America
| | - Scott M. Williams
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
| | - Nathan Morris
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
| | - David A. Buchner
- Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States of America
- Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH, United States of America
|
36
|
Epistatic Networks Jointly Influence Phenotypes Related to Metabolic Disease and Gene Expression in Diversity Outbred Mice. Genetics 2017; 206:621-639. [PMID: 28592500 PMCID: PMC5499176 DOI: 10.1534/genetics.116.198051] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 01/05/2017] [Accepted: 04/03/2017] [Indexed: 12/20/2022]
Abstract
In this study, Tyler et al. analyzed the complex genetic architecture of metabolic disease-related traits using the Diversity Outbred mouse population. Genetic studies of multidimensional phenotypes can potentially link genetic variation, gene expression, and physiological data to create multi-scale models of complex traits. The challenge of reducing these data to specific hypotheses has become increasingly acute with the advent of genome-scale data resources. Multi-parent populations derived from model organisms provide a resource for developing methods to understand this complexity. In this study, we simultaneously modeled body composition, serum biomarkers, and liver transcript abundances from 474 Diversity Outbred mice. This population contained both sexes and two dietary cohorts. Transcript data were reduced to functional gene modules with weighted gene coexpression network analysis (WGCNA), which were used as summary phenotypes representing enriched biological processes. These module phenotypes were jointly analyzed with body composition and serum biomarkers in a combined analysis of pleiotropy and epistasis (CAPE), which inferred networks of epistatic interactions between quantitative trait loci that affect one or more traits. This network frequently mapped interactions between alleles of different ancestries, providing evidence of both genetic synergy and redundancy between haplotypes. Furthermore, a number of loci interacted with sex and diet to yield sex-specific genetic effects and alleles that potentially protect individuals from the effects of a high-fat diet. Although the epistatic interactions explained small amounts of trait variance, the combination of directional interactions, allelic specificity, and high genomic resolution provided context to generate hypotheses for the roles of specific genes in complex traits. Our approach moves beyond the cataloging of single loci to infer genetic networks that map genetic etiology by simultaneously modeling all phenotypes.
|
37
|
Ibarra-Soria X, Nakahara TS, Lilue J, Jiang Y, Trimmer C, Souza MA, Netto PH, Ikegami K, Murphy NR, Kusma M, Kirton A, Saraiva LR, Keane TM, Matsunami H, Mainland J, Papes F, Logan DW. Variation in olfactory neuron repertoires is genetically controlled and environmentally modulated. eLife 2017; 6. [PMID: 28438259 PMCID: PMC5404925 DOI: 10.7554/elife.21476] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 09/13/2016] [Accepted: 03/21/2017] [Indexed: 12/28/2022]
Abstract
The mouse olfactory sensory neuron (OSN) repertoire is composed of 10 million cells and each expresses one olfactory receptor (OR) gene from a pool of over 1000. Thus, the nose is sub-stratified into more than a thousand OSN subtypes. Here, we employ and validate an RNA-sequencing-based method to quantify the abundance of all OSN subtypes in parallel, and investigate the genetic and environmental factors that contribute to neuronal diversity. We find that the OSN subtype distribution is stereotyped in genetically identical mice, but varies extensively between different strains. Further, we identify cis-acting genetic variation as the greatest component influencing OSN composition and demonstrate independence from OR function. However, we show that olfactory stimulation with particular odorants results in modulation of dozens of OSN subtypes in a subtle but reproducible, specific and time-dependent manner. Together, these mechanisms generate a highly individualized olfactory sensory system by promoting neuronal diversity. DOI: http://dx.doi.org/10.7554/eLife.21476.001
Smells are simply chemicals in the air that are recognized by nerves in our nose. Each nerve has a receptor that can identify a limited number of chemicals, and the nerve then relays this information to the brain. Animals have hundreds to thousands of different types of these nerves meaning that they can detect a wide array of smells. Smell receptors are proteins, and the genes that encode these proteins can be very different in two unrelated people. This could partly explain, for example, why some people find certain odors intense and unpleasant while others do not. However, having different genes for smell receptors does not by itself completely explain why some people are more sensitive than others to particular smells. The amounts of each nerve type in the nose might also differ between people and have an effect, but to date it has not been possible to accurately count them all. Ibarra-Soria et al. have now devised a new method to essentially count the number of each nerve type in the noses of mice from different breeds. The method makes use of a technique called RNA-sequencing, which can reveal which genes are active at any one time, and thus show how many nerves are producing each type of smell receptor. Ibarra-Soria et al. learned that different breeds of mice had remarkably different compositions of nerves in their noses. Further analysis revealed that this was due to changes to the DNA code near to the genes that encode the smell receptor. Next, Ibarra-Soria et al. sought to find out how the amount of each nerve type is controlled by giving mice water with different smells for weeks and looking how this affected their noses. These experiments revealed that a small number of the nerve types became more or less common after exposure to a smell. The altered nerves were directly involved in recognizing the smells, proving that the very act of smelling can change the make-up of nerves in a mouse’s nose. These results confirm that the diversity in the nose of each individual is not only dictated by the types of receptors found in there, but also by the number of each nerve type. The next challenge is to understand better how these differences change the way people perceive smells. DOI: http://dx.doi.org/10.7554/eLife.21476.002
Affiliation(s)
| | - Thiago S Nakahara
- Department of Genetics and Evolution, Institute of Biology, University of Campinas, Campinas, Brazil
| | - Jingtao Lilue
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom
| | - Yue Jiang
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, United States
| | - Casey Trimmer
- Monell Chemical Senses Center, Philadelphia, United States
| | - Mateus Aa Souza
- Department of Genetics and Evolution, Institute of Biology, University of Campinas, Campinas, Brazil
| | - Paulo Hm Netto
- Department of Genetics and Evolution, Institute of Biology, University of Campinas, Campinas, Brazil
| | - Kentaro Ikegami
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, United States
| | | | - Mairi Kusma
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom
| | - Andrea Kirton
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom
| | - Luis R Saraiva
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom
| | - Thomas M Keane
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom
| | - Hiroaki Matsunami
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, United States; Department of Neurobiology, Duke Institute for Brain Sciences, Duke University Medical Center, Durham, United States
| | - Joel Mainland
- Monell Chemical Senses Center, Philadelphia, United States; Department of Neuroscience, University of Pennsylvania, Philadelphia, United States
| | - Fabio Papes
- Department of Genetics and Evolution, Institute of Biology, University of Campinas, Campinas, Brazil
| | - Darren W Logan
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom; Monell Chemical Senses Center, Philadelphia, United States
|
38
|
Schughart K, Williams RW. The Collaborative Cross Resource for Systems Genetics Research of Infectious Diseases. Methods Mol Biol 2017; 1488:579-596. [PMID: 27933545 PMCID: PMC7120135 DOI: 10.1007/978-1-4939-6427-7_28] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/06/2023]
Abstract
An increasing body of evidence highlights the role of host genetic variation in driving susceptibility to severe disease following pathogen infection. In order to fully appreciate the importance of host genetics on infection susceptibility and resulting disease, genetically variable experimental model systems should be employed. These systems allow for the identification, characterization, and mechanistic dissection of genetic variants that cause differential disease responses. Herein we discuss application of the Collaborative Cross (CC) panel of recombinant inbred strains to study viral pathogenesis, focusing on practical considerations for experimental design, assessment and analysis of disease responses within the CC, as well as some of the resources developed for the CC. Although the focus of this chapter is on viral pathogenesis, many of the methods presented within are applicable to studies of other pathogens, as well as to case-control designs in genetically diverse populations.
Affiliation(s)
- Klaus Schughart
- Department of Infection Genetics, Helmholtz Centre for Infection Research & University of Veterinary Medicine Hannover, Braunschweig, Niedersachsen Germany
| | - Robert W. Williams
- Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Center, Memphis, Tennessee USA
|
39
|
Baud A, Mulligan MK, Casale FP, Ingels JF, Bohl CJ, Callebert J, Launay JM, Krohn J, Legarra A, Williams RW, Stegle O. Genetic Variation in the Social Environment Contributes to Health and Disease. PLoS Genet 2017; 13:e1006498. [PMID: 28121987 PMCID: PMC5266220 DOI: 10.1371/journal.pgen.1006498] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 03/02/2016] [Accepted: 11/21/2016] [Indexed: 11/29/2022]
Abstract
Assessing the impact of the social environment on health and disease is challenging. As social effects are in part determined by the genetic makeup of social partners, they can be studied from associations between genotypes of one individual and phenotype of another (social genetic effects, SGE, also called indirect genetic effects). For the first time we quantified the contribution of SGE to more than 100 organismal phenotypes and genome-wide gene expression measured in laboratory mice. We find that genetic variation in cage mates (i.e. SGE) contributes to variation in organismal and molecular measures related to anxiety, wound healing, immune function, and body weight. Social genetic effects explained up to 29% of phenotypic variance, and for several traits their contribution exceeded that of direct genetic effects (effects of an individual's genotypes on its own phenotype). Importantly, we show that ignoring SGE can severely bias estimates of direct genetic effects (heritability). Thus SGE may be an important source of "missing heritability" in studies of complex traits in human populations. In summary, our study uncovers an important contribution of the social environment to phenotypic variation, sets the basis for using SGE to dissect social effects, and identifies an opportunity to improve studies of direct genetic effects.
Affiliation(s)
- Amelie Baud
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Megan K. Mulligan
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Francesco Paolo Casale
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Jesse F. Ingels
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Casey J. Bohl
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Jacques Callebert
- AP-HP, Hôpital Lariboisière, Department of Biochemistry, INSERM U942, Paris, France
| | - Jean-Marie Launay
- AP-HP, Hôpital Lariboisière, Department of Biochemistry, INSERM U942, Paris, France
| | - Jon Krohn
- Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom
| | | | - Robert W. Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
|
40
|
Winter JM, Gildea DE, Andreas JP, Gatti DM, Williams KA, Lee M, Hu Y, Zhang S, Mullikin JC, Wolfsberg TG, McDonnell SK, Fogarty ZC, Larson MC, French AJ, Schaid DJ, Thibodeau SN, Churchill GA, Crawford NPS. Mapping Complex Traits in a Diversity Outbred F1 Mouse Population Identifies Germline Modifiers of Metastasis in Human Prostate Cancer. Cell Syst 2016; 4:31-45.e6. [PMID: 27916600 DOI: 10.1016/j.cels.2016.10.018] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 07/15/2016] [Revised: 09/08/2016] [Accepted: 10/20/2016] [Indexed: 01/02/2023]
Abstract
It is unclear how standing genetic variation affects the prognosis of prostate cancer patients. To provide one controlled answer to this problem, we crossed a dominant, penetrant mouse model of prostate cancer to Diversity Outbred mice, a collection of animals that carries over 40 million SNPs. Integration of disease phenotype and SNP variation data in 493 F1 males identified a metastasis modifier locus on Chromosome 8 (LOD = 8.42); further analysis identified the genes Rwdd4, Cenpu, and Casp3 as functional effectors of this locus. Accordingly, analysis of over 5,300 prostate cancer patient samples revealed correlations between the presence of genetic variants at these loci, their expression levels, cancer aggressiveness, and patient survival. We also observed that ectopic overexpression of RWDD4 and CENPU increased the aggressiveness of two human prostate cancer cell lines. In aggregate, our approach demonstrates how well-characterized genetic variation in mice can be harnessed in conjunction with systems genetics approaches to identify and characterize germline modifiers of human disease processes.
Affiliation(s)
- Jean M Winter
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | - Derek E Gildea
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | - Jonathan P Andreas
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | | | - Kendra A Williams
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | - Minnkyong Lee
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | - Ying Hu
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD 20892, USA
| | - Suiyuan Zhang
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | -
- NIH Intramural Sequencing Center, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | - James C Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | - Tyra G Wolfsberg
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
| | - Shannon K McDonnell
- Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905, USA
| | - Zachary C Fogarty
- Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905, USA
| | - Melissa C Larson
- Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905, USA
| | - Amy J French
- Department of Laboratory Medicine and Pathology, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905, USA
| | - Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905, USA
| | - Stephen N Thibodeau
- Department of Laboratory Medicine and Pathology, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905, USA
| | | | - Nigel P S Crawford
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA.
| |
|
41
|
Dowell R, Odell A, Richmond P, Malmer D, Halper-Stromberg E, Bennett B, Larson C, Leach S, Radcliffe RA. Genome characterization of the selected long- and short-sleep mouse lines. Mamm Genome 2016; 27:574-586. [PMID: 27651241 PMCID: PMC5110614 DOI: 10.1007/s00335-016-9663-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/30/2016] [Accepted: 08/22/2016] [Indexed: 01/29/2023]
Abstract
The Inbred Long- and Short-Sleep (ILS, ISS) mouse lines were selected for differences in acute ethanol sensitivity using the loss of righting response (LORR) as the selection trait. The lines show an over tenfold difference in LORR and, along with a recombinant inbred panel derived from them (the LXS), have been widely used to dissect the genetic underpinnings of acute ethanol sensitivity. Here we have sequenced the genomes of the ILS and ISS to investigate the DNA variants that contribute to their sensitivity difference. We identified ~2.7 million high-confidence SNPs and small indels and ~7000 structural variants between the lines; variants were found to occur in 6382 annotated genes. Using a hidden Markov model, we were able to reconstruct the genome-wide ancestry patterns of the eight inbred progenitor strains from which the ILS and ISS were derived, and found that quantitative trait loci that have been mapped for LORR were slightly enriched for DNA variants. Finally, by mapping and quantifying RNA-seq reads from the ILS and ISS to their strain-specific genomes rather than to the reference genome, we found a substantial improvement in a differential expression analysis between the lines. This work will help in identifying and characterizing the DNA sequence variants that contribute to the difference in ethanol sensitivity between the ILS and ISS and will also aid in accurate quantification of RNA-seq data generated from the LXS RIs.
Affiliation(s)
- Robin Dowell
- BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, 80309, USA; Department of Molecular, Cellular, and Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309, USA; Department of Computer Science, University of Colorado Boulder, Boulder, CO, 80309, USA.
| | - Aaron Odell
- BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, 80309, USA
| | - Phillip Richmond
- BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, 80309, USA; Department of Molecular, Cellular, and Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309, USA
| | - Daniel Malmer
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, 80309, USA
| | - Eitan Halper-Stromberg
- Center for Genes, Environment and Health, National Jewish Health, Denver, CO, 80206, USA
| | - Beth Bennett
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, Aurora, CO, 80045, USA
| | - Colin Larson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, Aurora, CO, 80045, USA
| | - Sonia Leach
- Center for Genes, Environment and Health, National Jewish Health, Denver, CO, 80206, USA
| | - Richard A Radcliffe
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, Aurora, CO, 80045, USA.
| |
|
42
|
Hodgkinson A, Grenier JC, Gbeha E, Awadalla P. A haplotype-based normalization technique for the analysis and detection of allele specific expression. BMC Bioinformatics 2016; 17:364. [PMID: 27618913 PMCID: PMC5020486 DOI: 10.1186/s12859-016-1238-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/14/2016] [Accepted: 09/02/2016] [Indexed: 12/17/2022] Open
Abstract
Background: Allele specific expression (ASE) has become an important phenotype, being utilized for the detection of cis-regulatory variation, nonsense mediated decay and imprinting in the personal genome, and has been used to both identify disease loci and consider the penetrance of damaging alleles. The detection of ASE using high throughput technologies relies on aligning short-read sequencing data, a process that has inherent biases, and there is still a need to develop fast and accurate methods to detect ASE given the unprecedented growth of sequencing information in big data projects. Results: Here, we present a new approach to normalize RNA sequencing data in order to call ASE events with high precision in a short time-frame. Using simulated datasets we find that our approach dramatically improves reference allele quantification at heterozygous sites versus default mapping methods and also performs well compared to existing techniques for ASE detection, such as filtering methods and mapping to parental genomes, without the need for complex and time consuming manipulation. Finally, by sequencing the exomes and transcriptomes of 96 well-phenotyped individuals of the CARTaGENE cohort, we characterise the levels of ASE across individuals and find a significant association between the proportion of sites undergoing ASE within the genome and smoking. Conclusions: The correct treatment and analysis of RNA sequencing data is vital to control for mapping biases and detect genuine ASE signals. By normalising RNA sequencing information after mapping, we show that this approach can be used to identify biologically relevant signals in personal genomes.
Affiliation(s)
- Alan Hodgkinson
- CHU Sainte Justine Research Centre, Department of Pediatrics, Faculty of Medicine, Universite de Montreal, 3175 Chemin de la Cote Sainte Catherine, Montreal, QC, Canada; Department of Medical and Molecular Genetics, Guy's Hospital, King's College London, London, SE1 9RT, UK.
| | - Jean-Christophe Grenier
- CHU Sainte Justine Research Centre, Department of Pediatrics, Faculty of Medicine, Universite de Montreal, 3175 Chemin de la Cote Sainte Catherine, Montreal, QC, Canada
| | - Elias Gbeha
- CHU Sainte Justine Research Centre, Department of Pediatrics, Faculty of Medicine, Universite de Montreal, 3175 Chemin de la Cote Sainte Catherine, Montreal, QC, Canada; Ontario Institute of Cancer Research, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Philip Awadalla
- CHU Sainte Justine Research Centre, Department of Pediatrics, Faculty of Medicine, Universite de Montreal, 3175 Chemin de la Cote Sainte Catherine, Montreal, QC, Canada; Ontario Institute of Cancer Research, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| |
|
43
|
Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63. Proc Natl Acad Sci U S A 2016; 113:E5163-71. [PMID: 27535938 DOI: 10.1073/pnas.1611012113] [Citation(s) in RCA: 155] [Impact Index Per Article: 19.4] [Indexed: 12/20/2022] Open
Abstract
Asian cultivated rice consists of two subspecies: Oryza sativa subsp. indica and O. sativa subsp. japonica. Despite the fact that indica rice accounts for over 70% of total rice production worldwide and is genetically much more diverse, a high-quality reference genome for indica rice has yet to be published. We conducted map-based sequencing of two indica rice lines, Zhenshan 97 (ZS97) and Minghui 63 (MH63), which represent the two major varietal groups of the indica subspecies and are the parents of an elite Chinese hybrid. The genome sequences were assembled into 237 (ZS97) and 181 (MH63) contigs, with an accuracy >99.99%, and covered 90.6% and 93.2% of their estimated genome sizes. Comparative analyses of these two indica genomes uncovered surprising structural differences, especially with respect to inversions, translocations, presence/absence variations, and segmental duplications. Approximately 42% of nontransposable element related genes were identical between the two genomes. Transcriptome analysis of three tissues showed that 1,059-2,217 more genes were expressed in the hybrid than in the parents and that the expressed genes in the hybrid were much more diverse due to their divergence between the parental genomes. The public availability of two high-quality reference genomes for the indica subspecies of rice will have large-ranging implications for plant biology and crop genetic improvement.
|
44
|
Morton NM, Beltram J, Carter RN, Michailidou Z, Gorjanc G, Fadden CM, Barrios-Llerena ME, Rodriguez-Cuenca S, Gibbins MTG, Aird RE, Moreno-Navarrete JM, Munger SC, Svenson KL, Gastaldello A, Ramage L, Naredo G, Zeyda M, Wang ZV, Howie AF, Saari A, Sipilä P, Stulnig TM, Gudnason V, Kenyon CJ, Seckl JR, Walker BR, Webster SP, Dunbar DR, Churchill GA, Vidal-Puig A, Fernandez-Real JM, Emilsson V, Horvat S. Genetic identification of thiosulfate sulfurtransferase as an adipocyte-expressed antidiabetic target in mice selected for leanness. Nat Med 2016; 22:771-9. [PMID: 27270587 PMCID: PMC5524189 DOI: 10.1038/nm.4115] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 12/19/2015] [Accepted: 04/29/2016] [Indexed: 12/13/2022]
Abstract
The discovery of genetic mechanisms for resistance to obesity and diabetes may illuminate new therapeutic strategies for the treatment of this global health challenge. We used the polygenic 'lean' mouse model, which has been selected for low adiposity over 60 generations, to identify mitochondrial thiosulfate sulfurtransferase (Tst; also known as rhodanese) as a candidate obesity-resistance gene with selectively increased expression in adipocytes. Elevated adipose Tst expression correlated with indices of metabolic health across diverse mouse strains. Transgenic overexpression of Tst in adipocytes protected mice from diet-induced obesity and insulin-resistant diabetes. Tst-deficient mice showed markedly exacerbated diabetes, whereas pharmacological activation of TST ameliorated diabetes in mice. Mechanistically, TST selectively augmented mitochondrial function combined with degradation of reactive oxygen species and sulfide. In humans, TST mRNA expression in adipose tissue correlated positively with insulin sensitivity in adipose tissue and negatively with fat mass. Thus, the genetic identification of Tst as a beneficial regulator of adipocyte mitochondrial function may have therapeutic significance for individuals with type 2 diabetes.
Affiliation(s)
- Nicholas M. Morton
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Jasmina Beltram
- Biotechnical Faculty, Animal Science Department, University of Ljubljana, Ljubljana, Slovenia
| | - Roderick N. Carter
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Zoi Michailidou
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Gregor Gorjanc
- Biotechnical Faculty, Animal Science Department, University of Ljubljana, Ljubljana, Slovenia
| | - Clare Mc Fadden
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Martin E. Barrios-Llerena
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Sergio Rodriguez-Cuenca
- Metabolic Research Laboratories, Level 4, Wellcome Trust-MRC Institute of Metabolic Science, Addenbrookes Hospital, Cambridge, UK
| | - Matthew T. G. Gibbins
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Rhona E. Aird
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - José Maria Moreno-Navarrete
- Department of Diabetes, Endocrinology and Nutrition, Institut d'Investigació Biomédica de Girona; Department of Medicine, University of Girona
- Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, Girona, Spain
| | | | | | - Annalisa Gastaldello
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Lynne Ramage
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Gregorio Naredo
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Maximilian Zeyda
- Clinical Division of Endocrinology and Metabolism, Department of Medicine III, Medical University of Vienna, Vienna, Austria
| | - Zhao V. Wang
- Department of Internal Medicine, Touchstone Diabetes Center University of Texas Southwestern Medical Center, Dallas, Texas, USA
| | - Alexander F. Howie
- The MRC Centre for Reproductive Health, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Aila Saari
- Department of Physiology, Institute of Biomedicine, University of Turku, Turku, Finland
| | - Petra Sipilä
- Central Animal Laboratory, University of Turku, Turku, Finland
| | - Thomas M. Stulnig
- Clinical Division of Endocrinology and Metabolism, Department of Medicine III, Medical University of Vienna, Vienna, Austria
| | | | - Christopher J. Kenyon
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Jonathan R. Seckl
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Brian R. Walker
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Scott P. Webster
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | - Donald R. Dunbar
- University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Queen’s Medical Research Institute, Edinburgh, UK
| | | | - Antonio Vidal-Puig
- Metabolic Research Laboratories, Level 4, Wellcome Trust-MRC Institute of Metabolic Science, Addenbrookes Hospital, Cambridge, UK
| | - José Manuel Fernandez-Real
- Department of Diabetes, Endocrinology and Nutrition, Institut d'Investigació Biomédica de Girona; Department of Medicine, University of Girona
- Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, Girona, Spain
| | - Valur Emilsson
- Icelandic Heart Association, Kopavogur, Iceland
- Faculty of Pharmaceutical Sciences, University of Iceland, Reykjavik, Iceland
| | - Simon Horvat
- Biotechnical Faculty, Animal Science Department, University of Ljubljana, Ljubljana, Slovenia
- National Institute of Chemistry, Ljubljana, Slovenia
| |
|
45
|
Chick JM, Munger SC, Simecek P, Huttlin EL, Choi K, Gatti DM, Raghupathy N, Svenson KL, Churchill GA, Gygi SP. Defining the consequences of genetic variation on a proteome-wide scale. Nature 2016; 534:500-5. [PMID: 27309819 PMCID: PMC5292866 DOI: 10.1038/nature18270] [Citation(s) in RCA: 249] [Impact Index Per Article: 31.1] [Received: 09/22/2015] [Accepted: 04/13/2016] [Indexed: 12/11/2022]
Abstract
Genetic variation modulates protein expression through both transcriptional and post-transcriptional mechanisms. To characterize the consequences of natural genetic diversity on the proteome, here we combine a multiplexed, mass spectrometry-based method for protein quantification with an emerging outbred mouse model containing extensive genetic variation from eight inbred founder strains. By measuring genome-wide transcript and protein expression in livers from 192 Diversity outbred mice, we identify 2,866 protein quantitative trait loci (pQTL) with twice as many local as distant genetic variants. These data support distinct transcriptional and post-transcriptional models underlying the observed pQTL effects. Using a sensitive approach to mediation analysis, we often identified a second protein or transcript as the causal mediator of distant pQTL. Our analysis reveals an extensive network of direct protein-protein interactions. Finally, we show that local genotype can provide accurate predictions of protein abundance in an independent cohort of collaborative cross mice.
Affiliation(s)
- Joel M Chick
- Harvard Medical School, Boston, Massachusetts 02115, USA
| | | | - Petr Simecek
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
| | | | - Kwangbom Choi
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
| | | | | | | | | | - Steven P Gygi
- Harvard Medical School, Boston, Massachusetts 02115, USA
| |
|
46
|
Chen Z, Hagen DE, Wang J, Elsik CG, Ji T, Siqueira LG, Hansen PJ, Rivera RM. Global assessment of imprinted gene expression in the bovine conceptus by next generation sequencing. Epigenetics 2016; 11:501-16. [PMID: 27245094 PMCID: PMC4939914 DOI: 10.1080/15592294.2016.1184805] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Indexed: 11/15/2022] Open
Abstract
Genomic imprinting is an epigenetic mechanism that leads to parental-allele-specific gene expression. Approximately 150 imprinted genes have been identified in humans and mice but fewer than 30 have been described as imprinted in cattle. For the purpose of de novo identification of imprinted genes in bovine, we determined global monoallelic gene expression in brain, skeletal muscle, liver, kidney and placenta of day ∼105 Bos taurus indicus × Bos taurus taurus F1 conceptuses using RNA sequencing. To accomplish this, we developed a bioinformatics pipeline to identify parent-specific single nucleotide polymorphism alleles after filtering adenosine to inosine (A-to-I) RNA editing sites. We identified 53 genes subject to monoallelic expression. Twenty-three are genes known to be imprinted in the cow and an additional 7 have previously been characterized as imprinted in human and/or mouse but have not been reported as imprinted in cattle. Of the remaining 23 genes, we found that 10 are uncharacterized or unannotated transcripts located in known imprinted clusters, whereas the other 13 genes are distributed throughout the bovine genome and are not close to any known imprinted clusters. To exclude potential cis-eQTL effects on allele expression, we corroborated the parental specificity of monoallelic expression in day 86 Bos taurus taurus × Bos taurus taurus conceptuses and identified 8 novel bovine imprinted genes. Further, we identified 671 candidate A-to-I RNA editing sites and describe random X-inactivation in day 15 bovine extraembryonic membranes. Our results expand the imprinted gene list in bovine and demonstrate that monoallelic gene expression can be the result of cis-eQTL effects.
Affiliation(s)
- Zhiyuan Chen
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
| | - Darren E Hagen
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
| | - Juanbin Wang
- Department of Statistics, University of Missouri, Columbia, MO, USA
| | - Christine G Elsik
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
| | - Tieming Ji
- Department of Statistics, University of Missouri, Columbia, MO, USA
| | - Luiz G Siqueira
- Department of Animal Sciences, University of Florida, Gainesville, FL, USA
| | - Peter J Hansen
- Department of Animal Sciences, University of Florida, Gainesville, FL, USA
| | - Rocío M Rivera
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
| |
|
47
|
Buffering of Genetic Regulatory Networks in Drosophila melanogaster. Genetics 2016; 203:1177-90. [PMID: 27194752 DOI: 10.1534/genetics.116.188797] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 03/02/2016] [Accepted: 05/17/2016] [Indexed: 01/01/2023] Open
Abstract
Regulatory variation in gene expression can be described by cis- and trans-genetic components. Here we used RNA-seq data from a population panel of Drosophila melanogaster test crosses to compare allelic imbalance (AI) in female head tissue between mated and virgin flies, an environmental change known to affect transcription. Indeed, 3048 exons (1610 genes) are differentially expressed in this study. A Bayesian model for AI, with an intersection test, controls type I error. There are ∼200 genes with AI exclusively in mated or virgin flies, indicating an environmental component of expression regulation. On average 34% of genes within a cross and 54% of all genes show evidence for genetic regulation of transcription. Nearly all differentially regulated genes are affected in cis, with an average of 63% of expression variation explained by the cis-effects. Trans-effects explain 8% of the variance in AI on average and the interaction between cis and trans explains an average of 11% of the total variance in AI. In both environments cis- and trans-effects are compensatory in their overall effect, with a negative association between cis- and trans-effects in 85% of the exons examined. We hypothesize that the gene expression level perturbed by cis-regulatory mutations is compensated through trans-regulatory mechanisms, e.g., trans and cis by trans-factors buffering cis-mutations. In addition, when AI is detected in both environments, cis-mated, cis-virgin, and trans-mated-trans-virgin estimates are highly concordant with 99% of all exons positively correlated with a median correlation of 0.83 for cis and 0.95 for trans. We conclude that the gene regulatory networks (GRNs) are robust and that trans-buffering explains robustness.
|
48
|
Genetic Architectures of Quantitative Variation in RNA Editing Pathways. Genetics 2015; 202:787-98. [PMID: 26614740 DOI: 10.1534/genetics.115.179481] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 06/15/2015] [Accepted: 11/17/2015] [Indexed: 11/18/2022] Open
Abstract
RNA editing refers to post-transcriptional processes that alter the base sequence of RNA. Recently, hundreds of new RNA editing targets have been reported. However, the mechanisms that determine the specificity and degree of editing are not well understood. We examined quantitative variation of site-specific editing in a genetically diverse multiparent population, Diversity Outbred mice, and mapped polymorphic loci that alter editing ratios globally for C-to-U editing and at specific sites for A-to-I editing. An allelic series in the C-to-U editing enzyme Apobec1 influences the editing efficiency of Apob and 58 additional C-to-U editing targets. We identified 49 A-to-I editing sites with polymorphisms in the edited transcript that alter editing efficiency. In contrast to the shared genetic control of C-to-U editing, most of the variable A-to-I editing sites were determined by local nucleotide polymorphisms in proximity to the editing site in the RNA secondary structure. Our results indicate that RNA editing is a quantitative trait subject to genetic variation and that evolutionary constraints have given rise to distinct genetic architectures in the two canonical types of RNA editing.
|
49
|
Stein S, Lu ZX, Bahrami-Samani E, Park JW, Xing Y. Discover hidden splicing variations by mapping personal transcriptomes to personal genomes. Nucleic Acids Res 2015; 43:10612-22. [PMID: 26578562 PMCID: PMC4678817 DOI: 10.1093/nar/gkv1099] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 08/27/2015] [Accepted: 10/09/2015] [Indexed: 01/27/2023] Open
Abstract
RNA-seq has become a popular technology for studying genetic variation of pre-mRNA alternative splicing. Commonly used RNA-seq aligners rely on the consensus splice site dinucleotide motifs to map reads across splice junctions. Consequently, genomic variants that create novel splice site dinucleotides may produce splice junction RNA-seq reads that cannot be mapped to the reference genome. We developed and evaluated an approach to identify ‘hidden’ splicing variations in personal transcriptomes, by mapping personal RNA-seq data to personal genomes. Computational analysis and experimental validation indicate that this approach identifies personal specific splice junctions at a low false positive rate. Applying this approach to an RNA-seq data set of 75 individuals, we identified 506 personal specific splice junctions, among which 437 were novel splice junctions not documented in current human transcript annotations. 94 splice junctions had splice site SNPs associated with GWAS signals of human traits and diseases. These involve genes whose splicing variations have been implicated in diseases (such as OAS1), as well as novel associations between alternative splicing and diseases (such as ICA1). Collectively, our work demonstrates that the personal genome approach to RNA-seq read alignment enables the discovery of a large but previously unknown catalog of splicing variations in human populations.
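The idea that a genomic variant can create a novel splice site dinucleotide can be illustrated with a small sketch. This is not the authors' pipeline: the function name `created_splice_dinucleotides`, the 0-based coordinate convention, and the restriction to the forward strand and to the canonical GT (donor) and AG (acceptor) motifs are all assumptions made for illustration. A new dinucleotide alone does not make a functional splice site; the sketch only flags candidates.

```python
from typing import List, Tuple

# Canonical intron-boundary dinucleotides on the forward strand (assumption:
# only GT donors and AG acceptors are considered in this illustration).
CANONICAL_MOTIFS = ("GT", "AG")


def created_splice_dinucleotides(
    ref_seq: str, pos: int, alt: str
) -> List[Tuple[int, str]]:
    """Return (start, motif) pairs for canonical dinucleotides that a SNP creates.

    ref_seq: reference sequence (forward strand)
    pos:     0-based position of the SNP within ref_seq
    alt:     alternate base at that position
    """
    ref_seq = ref_seq.upper()
    alt = alt.upper()
    if len(alt) != 1 or not 0 <= pos < len(ref_seq):
        raise ValueError("expected a single-base substitution inside ref_seq")

    # Build the personal (variant-carrying) sequence.
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]

    created = []
    # A SNP at pos can only change the two dinucleotides that overlap it.
    for start in (pos - 1, pos):
        if start < 0 or start + 2 > len(ref_seq):
            continue
        ref_di = ref_seq[start:start + 2]
        alt_di = alt_seq[start:start + 2]
        if alt_di in CANONICAL_MOTIFS and alt_di != ref_di:
            created.append((start, alt_di))
    return created


if __name__ == "__main__":
    # An A>G substitution at position 3 turns "AT" into "GT".
    print(created_splice_dinucleotides("ACGATCC", 3, "G"))
```

Reads spanning a junction at such a site would carry a motif that is absent from the reference sequence, which is why the abstract argues for aligning to a personal genome instead.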
Affiliation(s)
- Shayna Stein
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Zhi-Xiang Lu
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Emad Bahrami-Samani
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Juw Won Park
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Yi Xing
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| |
|
50
|
Yang C, Wu PY, Tong L, Phan JH, Wang MD. The impact of RNA-seq aligners on gene expression estimation. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2015; 2015:462-471. [PMID: 27583310 DOI: 10.1145/2808719.2808767] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
Abstract
While numerous RNA-seq data analysis pipelines are available, research has shown that the choice of pipeline influences the results of differentially expressed gene detection and gene expression estimation. Gene expression estimation is a key step in RNA-seq data analysis, since the accuracy of gene expression estimates profoundly affects the subsequent analysis. Generally, gene expression estimation involves sequence alignment and quantification, and accurate gene expression estimation requires accurate alignment. However, the impact of aligners on gene expression estimation remains unclear. We address this need by constructing nine pipelines consisting of nine spliced aligners and one quantifier. We then use simulated data to investigate the impact of aligners on gene expression estimation. To evaluate alignment, we introduce three alignment performance metrics, (1) the percentage of reads aligned, (2) the percentage of reads aligned with zero mismatch (ZeroMismatchPercentage), and (3) the percentage of reads aligned with at most one mismatch (ZeroOneMismatchPercentage). We then evaluate the impact of alignment performance on gene expression estimation using three metrics, (1) gene detection accuracy, (2) the number of genes falsely quantified (FalseExpNum), and (3) the number of genes with falsely estimated fold changes (FalseFcNum). We found that among various pipelines, FalseExpNum and FalseFcNum are correlated. Moreover, FalseExpNum is linearly correlated with the percentage of reads aligned and ZeroMismatchPercentage, and FalseFcNum is linearly correlated with ZeroMismatchPercentage. Because of this correlation, the percentage of reads aligned and ZeroMismatchPercentage may be used to assess the performance of gene expression estimation for all RNA-seq datasets.
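The three alignment performance metrics named in this abstract can be sketched in a few lines. The abstract does not state the denominators, so this sketch assumes every percentage is taken over all input reads; the function name `alignment_metrics`, the `PercentAligned` key, and the per-read input representation (`None` for an unaligned read, otherwise its mismatch count) are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Dict, Iterable, Optional


def alignment_metrics(mismatches: Iterable[Optional[int]]) -> Dict[str, float]:
    """Compute three alignment metrics from per-read mismatch counts.

    mismatches: one entry per input read; None means the read did not align,
                an integer is the number of mismatches in its alignment.
    Assumption: all percentages use the total number of input reads as the
    denominator (the abstract does not specify this).
    """
    reads = list(mismatches)
    total = len(reads)
    if total == 0:
        raise ValueError("no reads supplied")

    aligned = [m for m in reads if m is not None]
    zero = sum(1 for m in aligned if m == 0)
    zero_or_one = sum(1 for m in aligned if m <= 1)

    return {
        "PercentAligned": 100.0 * len(aligned) / total,
        "ZeroMismatchPercentage": 100.0 * zero / total,
        "ZeroOneMismatchPercentage": 100.0 * zero_or_one / total,
    }


if __name__ == "__main__":
    # Five reads: two perfect, one with 1 mismatch, one with 3, one unaligned.
    print(alignment_metrics([0, 0, 1, 3, None]))
```

With real data the mismatch counts would have to be derived from the aligner output; how that is done depends on the aligner and is left out of this sketch.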
Affiliation(s)
- Cheng Yang
- Department of Biomedical Engineering, Georgia Institute of Technology, Emory University, and Peking University, Atlanta, GA 30332, USA
| | - Po-Yen Wu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Li Tong
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - John H Phan
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - May D Wang
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| |
|