1
|
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023; 110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023]
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
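As a point of reference for the relatedness matrices discussed in this abstract, the following is a minimal sketch of a conventional variant-based GRM restricted to a genomic window. It is not the authors' local eGRM, which takes a conditional expectation given the ARG rather than using observed variants directly. The function name `local_grm`, its arguments, and the toy genotypes are illustrative assumptions.

```python
from math import sqrt


def local_grm(genotypes, positions, start, end):
    """Variant-based GRM from variants with start <= position < end.

    genotypes: one list per individual of 0/1/2 allele counts.
    positions: variant positions, in the same order as the genotype columns.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    n = len(genotypes)
    cols = [j for j, pos in enumerate(positions) if start <= pos < end]
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for j in cols:
        # Sample allele frequency at this variant.
        p = sum(row[j] for row in genotypes) / (2 * n)
        if p == 0.0 or p == 1.0:
            continue  # monomorphic variants carry no information
        sd = sqrt(2 * p * (1 - p))
        # Standardize genotypes: centre by 2p, scale by sqrt(2p(1-p)).
        z = [(row[j] - 2 * p) / sd for row in genotypes]
        for a in range(n):
            for b in range(n):
                grm[a][b] += z[a] * z[b]
        used += 1
    if used == 0:
        raise ValueError("no polymorphic variants in window")
    # Average the per-variant outer products.
    return [[v / used for v in row] for row in grm]


# Toy data: 3 individuals, 2 variants at positions 100 and 200.
toy_genotypes = [[0, 1], [2, 1], [1, 1]]
toy_positions = [100, 200]
K = local_grm(toy_genotypes, toy_positions, 0, 150)
```

In a variance-components test, a matrix of this kind would serve as the covariance structure for a random effect at the tested window; swapping in an ARG-derived expectation is the step the abstract describes.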
Affiliation(s)
- Vivian Link
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Joshua G Schraiber
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Caoqi Fan
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Bryan Dinh
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Michael D Edge
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
|
2
|
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CW, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv 2023:2023.04.07.536093. [PMID: 37066144 PMCID: PMC10104234 DOI: 10.1101/2023.04.07.536093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide Association Studies (GWAS) are a powerful way to find genetic loci associated with phenotypes. GWAS are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix given the ARG (local eGRM). Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to identify a large-effect BMI locus, the CREBRF gene, in a sample of Native Hawaiians in which it was not previously detectable by GWAS because of a lack of population-specific imputation resources. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Affiliation(s)
- Vivian Link
  - Department of Quantitative and Computational Biology, University of Southern California
- Joshua G. Schraiber
  - Department of Quantitative and Computational Biology, University of Southern California
- Caoqi Fan
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Bryan Dinh
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Nicholas Mancuso
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Charleston W.K. Chiang
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Michael D. Edge
  - Department of Quantitative and Computational Biology, University of Southern California
|
3
|
McIntyre LM. Celebrating discovery across the tree of life. G3 (Bethesda) 2023; 13:6986389. [PMID: 36634225 PMCID: PMC9836344 DOI: 10.1093/g3journal/jkac318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/14/2023]
|
4
|
Crouse WL, Kelada SNP, Valdar W. Inferring the Allelic Series at QTL in Multiparental Populations. Genetics 2020; 216:957-983. [PMID: 33082282 PMCID: PMC7768242 DOI: 10.1534/genetics.120.303393] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/30/2020] [Accepted: 10/12/2020] [Indexed: 12/25/2022]
Abstract
Multiparental populations (MPPs) are experimental populations in which the genome of every individual is a mosaic of known founder haplotypes. These populations are useful for detecting quantitative trait loci (QTL) because tests of association can leverage inferred founder haplotype descent. It is difficult, however, to determine how haplotypes at a locus group into distinct functional alleles, termed the allelic series. The allelic series is important because it provides information about the number of causal variants at a QTL and their combined effects. In this study, we introduce a fully Bayesian model selection framework for inferring the allelic series. This framework accounts for sources of uncertainty found in typical MPPs, including the number and composition of functional alleles. Our prior distribution for the allelic series is based on the Chinese restaurant process, a relative of the Dirichlet process, and we leverage its connection to the coalescent to introduce additional prior information about haplotype relatedness via a phylogenetic tree. We evaluate our approach via simulation and apply it to QTL from two MPPs: the Collaborative Cross (CC) and the Drosophila Synthetic Population Resource (DSPR). We find that, although posterior inference of the exact allelic series is often uncertain, we are able to distinguish biallelic QTL from more complex multiallelic cases. Additionally, our allele-based approach improves haplotype effect estimation when the true number of functional alleles is small. Our method, Tree-Based Inference of Multiallelism via Bayesian Regression (TIMBR), provides new insight into the genetic architecture of QTL in MPPs.
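To make the Chinese restaurant process prior mentioned in this abstract concrete, the sketch below draws one random partition of founder haplotypes into functional alleles. It illustrates the plain CRP only, not the TIMBR prior itself, which additionally uses a phylogenetic tree. The function name, the concentration value `alpha=1.0`, and the choice of eight haplotypes are assumptions made for illustration.

```python
import random


def sample_crp_partition(n_items, alpha, rng):
    """Draw one partition of items 0..n_items-1 from a Chinese restaurant process.

    Each item joins an existing block with probability proportional to the
    block's current size, or opens a new block with probability proportional
    to the concentration parameter alpha.
    (Hypothetical helper for illustration; not the TIMBR implementation.)
    """
    blocks = []
    for item in range(n_items):
        # One weight per existing block, plus one for a new block.
        weights = [len(block) for block in blocks] + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(blocks):
            blocks.append([item])
        else:
            blocks[choice].append(item)
    return blocks


# Eight founder haplotypes grouped into functional alleles (one random draw).
rng = random.Random(0)
allelic_series = sample_crp_partition(8, 1.0, rng)
```

Each block in the returned list is one candidate functional allele; a single block would correspond to no QTL effect, two blocks to a biallelic QTL, and more blocks to a multiallelic series.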
Affiliation(s)
- Wesley L Crouse
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, North Carolina 27599
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599
- Samir N P Kelada
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599
  - Marsico Lung Institute, University of North Carolina, Chapel Hill, North Carolina 27599
- William Valdar
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27599
|
5
|
A Loss-of-Function Mutation in the Integrin Alpha L (Itgal) Gene Contributes to Susceptibility to Salmonella enterica Serovar Typhimurium Infection in Collaborative Cross Strain CC042. Infect Immun 2019; 88:IAI.00656-19. [PMID: 31636138 DOI: 10.1128/iai.00656-19] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/21/2019] [Accepted: 10/05/2019] [Indexed: 12/18/2022]
Abstract
Salmonella is an intracellular bacterium found in the gastrointestinal tract of mammalian, avian, and reptilian hosts. Mouse models have been extensively used to model in vivo distinct aspects of human Salmonella infections and have led to the identification of several host susceptibility genes. We have investigated the susceptibility of Collaborative Cross strains to intravenous infection with Salmonella enterica serovar Typhimurium as a model of human systemic invasive infection. In this model, strain CC042/GeniUnc (CC042) mice displayed extreme susceptibility with very high bacterial loads and mortality. CC042 mice showed lower spleen weights and decreased splenocyte numbers before and after infection, affecting mostly CD8+ T cells, B cells, and all myeloid cell populations, compared with control C57BL/6J mice. CC042 mice also had lower thymus weights with a reduced total number of thymocytes and double-negative and double-positive (CD4+, CD8+) thymocytes compared to C57BL/6J mice. Analysis of bone marrow-resident hematopoietic progenitors showed a strong bias against lymphoid-primed multipotent progenitors. An F2 cross between CC042 and C57BL/6N mice identified two loci on chromosome 7 (Stsl6 and Stsl7) associated with differences in bacterial loads. In the Stsl7 region, CC042 carried a loss-of-function variant, unique to this strain, in the integrin alpha L (Itgal) gene, the causative role of which was confirmed by a quantitative complementation test. Notably, Itgal loss of function increased the susceptibility to S. Typhimurium in a (C57BL/6J × CC042)F1 mouse background but not in a C57BL/6J mouse inbred background. These results further emphasize the utility of the Collaborative Cross to identify new host genetic variants controlling susceptibility to infections and improve our understanding of the function of the Itgal gene.
|
6
|
Thompson KL, Linnen CR, Kubatko L. Tree-based quantitative trait mapping in the presence of external covariates. Stat Appl Genet Mol Biol 2016; 15:473-490. [PMID: 27875322 DOI: 10.1515/sagmb-2015-0107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
A central goal in biological and biomedical sciences is to identify the molecular basis of variation in morphological and behavioral traits. Over the last decade, improvements in sequencing technologies coupled with the active development of association mapping methods have made it possible to link single nucleotide polymorphisms (SNPs) and quantitative traits. However, a major limitation of existing methods is that they are often unable to consider complex, but biologically-realistic, scenarios. Previous work showed that association mapping method performance can be improved by using the evolutionary history within each SNP to estimate the covariance structure among randomly-sampled individuals. Here, we propose a method that can be used to analyze a variety of data types, such as data including external covariates, while considering the evolutionary history among SNPs, providing an advantage over existing methods. Existing methods either do so at a computational cost, or fail to model these relationships altogether. By considering the broad-scale relationships among SNPs, the proposed approach is both computationally-feasible and informed by the evolutionary history among SNPs. We show that incorporating an approximate covariance structure during analysis of complex data sets increases performance in quantitative trait mapping, and apply the proposed method to deer mice data.
|
7
|
Thompson KL, Fardo DW. Comparing performance of non-tree-based and tree-based association mapping methods. BMC Proc 2016; 10:405-410. [PMID: 27980669 PMCID: PMC5133494 DOI: 10.1186/s12919-016-0063-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
A central goal in the biomedical and biological sciences is to link variation in quantitative traits to locations along the genome (single nucleotide polymorphisms). Sequencing technology has rapidly advanced in recent decades, along with the statistical methodology to analyze genetic data. Two classes of association mapping methods exist: those that account for the evolutionary relatedness among individuals, and those that ignore the evolutionary relationships among individuals. While the former methods more fully use implicit information in the data, the latter methods are more flexible in the types of data they can handle. This study presents a comparison of the 2 types of association mapping methods when they are applied to simulated data.
Affiliation(s)
- David W. Fardo
  - Department of Biostatistics, University of Kentucky College of Public Health, Lexington, KY 40536-0003 USA
|
8
|
Abstract
A general Bayesian model, Diploffect, is described for estimating the effects of founder haplotypes at quantitative trait loci (QTL) detected in multiparental genetic populations; such populations include the Collaborative Cross (CC), Heterogeneous Stocks (HS), and many others for which local genetic variation is well described by an underlying, usually probabilistically inferred, haplotype mosaic. Our aim is to provide a framework for coherent estimation of haplotype and diplotype (haplotype pair) effects that takes into account the following: uncertainty in haplotype composition for each individual; uncertainty arising from small sample sizes and infrequently observed haplotype combinations; possible effects of dominance (for noninbred subjects); genetic background; and that provides a means to incorporate data that may be incomplete or has a hierarchical structure. Using the results of a probabilistic haplotype reconstruction as prior information, we obtain posterior distributions at the QTL for both haplotype effects and haplotype composition. Two alternative computational approaches are supplied: a Markov chain Monte Carlo sampler and a procedure based on importance sampling of integrated nested Laplace approximations. Using simulations of QTL in the incipient CC (pre-CC) and Northport HS populations, we compare the accuracy of Diploffect, approximations to it, and more commonly used approaches based on Haley–Knott regression, describing trade-offs between these methods. We also estimate effects for three QTL previously identified in those populations, obtaining posterior intervals that describe how the phenotype might be affected by diplotype substitutions at the modeled locus.
|
9
|
Sun L, Ye M, Hao H, Wang N, Wang Y, Cheng T, Zhang Q, Wu R. A model framework for identifying genes that guide the evolution of heterochrony. Mol Biol Evol 2014; 31:2238-47. [PMID: 24817546 DOI: 10.1093/molbev/msu156] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 12/21/2022]
Abstract
Heterochrony, the phylogenic change in the time of developmental events or rate of development, has been thought to play an important role in producing phenotypic novelty during evolution. Increasing evidence suggests that specific genes are implicated in heterochrony, guiding the process of developmental divergence, but no quantitative models have been instrumented to map such heterochrony genes. Here, we present a computational framework for genetic mapping by which to characterize and locate quantitative trait loci (QTLs) that govern heterochrony described by four parameters, the timing of the inflection point, the timing of maximum acceleration of growth, the timing of maximum deceleration of growth, and the length of linear growth. The framework was developed from functional mapping, a dynamic model derived to map QTLs for the overall process and pattern of development. By integrating an optimality algorithm, the framework allows the so-called heterochrony QTLs (hQTLs) to be tested and quantified. Specific pipelines are given for testing how hQTLs control the onset and offset of developmental events, the rate of development, and duration of a particular developmental stage. Computer simulation was performed to examine the statistical properties of the model and demonstrate its utility to characterize the effect of hQTLs on population diversification due to heterochrony. By analyzing a genetic mapping data in rice, the framework identified an hQTL that controls the timing of maximum growth rate and duration of linear growth stage in plant height growth. The framework provides a tool to study how genetic variation translates into phenotypic innovation, leading a lineage to evolve, through heterochrony.
Affiliation(s)
- Lidan Sun
  - Beijing Key Laboratory of Ornamental Germplasm Innovation and Molecular Breeding, National Engineering Research Center for Floriculture, College of Landscape Architecture, Beijing Forestry University, Beijing, China
- Meixia Ye
  - Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Han Hao
  - Center for Statistical Genetics, The Pennsylvania State University
- Ningtao Wang
  - Center for Statistical Genetics, The Pennsylvania State University
- Yaqun Wang
  - Center for Statistical Genetics, The Pennsylvania State University
- Tangren Cheng
  - Beijing Key Laboratory of Ornamental Germplasm Innovation and Molecular Breeding, National Engineering Research Center for Floriculture, College of Landscape Architecture, Beijing Forestry University, Beijing, China
- Qixiang Zhang
  - Beijing Key Laboratory of Ornamental Germplasm Innovation and Molecular Breeding, National Engineering Research Center for Floriculture, College of Landscape Architecture, Beijing Forestry University, Beijing, China
- Rongling Wu
  - Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
  - Center for Statistical Genetics, The Pennsylvania State University
|
10
|
Thompson KL, Kubatko LS. Using ancestral information to detect and localize quantitative trait loci in genome-wide association studies. BMC Bioinformatics 2013; 14:200. [PMID: 23786262 PMCID: PMC3706278 DOI: 10.1186/1471-2105-14-200] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/12/2013] [Accepted: 06/06/2013] [Indexed: 11/15/2022]
Abstract
Background: In mammalian genetics, many quantitative traits, such as blood pressure, are thought to be influenced by specific genes, but are also affected by environmental factors, making the associated genes difficult to identify and locate from genetic data alone. In particular, the application of classical statistical methods to single nucleotide polymorphism (SNP) data collected in genome-wide association studies has been especially challenging. We propose a coalescent approach to search for SNPs associated with quantitative traits in genome-wide association study (GWAS) data by taking into account the evolutionary history among SNPs. Results: We evaluate the performance of the new method using simulated data, and find that it performs at least as well as existing methods with an increase in performance in the case of population structure. Application of the methodology to a real data set consisting of high-density lipoprotein cholesterol measurements in mice shows the method performs well for empirical data, as well. Conclusions: By combining methods from stochastic processes and phylogenetics, this work provides an innovative avenue for the development of new statistical methodology in the analysis of GWAS data.
|
11
|
Abstract
The Collaborative Cross Consortium reports here on the development of a unique genetic resource population. The Collaborative Cross (CC) is a multiparental recombinant inbred panel derived from eight laboratory mouse inbred strains. Breeding of the CC lines was initiated at multiple international sites using mice from The Jackson Laboratory. Currently, this innovative project is breeding independent CC lines at the University of North Carolina (UNC), at Tel Aviv University (TAU), and at Geniad in Western Australia (GND). These institutions aim to make publicly available the completed CC lines and their genotypes and sequence information. We genotyped, and report here, results from 458 extant lines from UNC, TAU, and GND using a custom genotyping array with 7500 SNPs designed to be maximally informative in the CC and used a novel algorithm to infer inherited haplotypes directly from hybridization intensity patterns. We identified lines with breeding errors and cousin lines generated by splitting incipient lines into two or more cousin lines at early generations of inbreeding. We then characterized the genome architecture of 350 genetically independent CC lines. Results showed that founder haplotypes are inherited at the expected frequency, although we also consistently observed highly significant transmission ratio distortion at specific loci across all three populations. On chromosome 2, there is significant overrepresentation of WSB/EiJ alleles, and on chromosome X, there is a large deficit of CC lines with CAST/EiJ alleles. Linkage disequilibrium decays as expected and we saw no evidence of gametic disequilibrium in the CC population as a whole or in random subsets of the population. Gametic equilibrium in the CC population is in marked contrast to the gametic disequilibrium present in a large panel of classical inbred strains. Finally, we discuss access to the CC population and to the associated raw data describing the genetic structure of individual lines. 
Integration of rich phenotypic and genomic data over time and across a wide variety of fields will be vital to delivering on one of the key attributes of the CC, a common genetic reference platform for identifying causative variants and genetic networks determining traits in mammals.
|
12
|
|
13
|
Abstract
The February 2012 issues of GENETICS and G3: Genes, Genomes, Genetics present a collection of articles reporting recent advances from the international Collaborative Cross (CC) project. The goal of the CC project is to develop a new resource that will enhance quantitative trait locus (QTL) and systems genetic analyses in mice. The CC consists of hundreds of independently bred, octo-parental recombinant inbred lines (Figure 1). The work reported in these issues represents progress toward completion of the CC, proof-of-principle experiments using incipient inbred CC mice, and new research areas and complementary resources facilitated by the CC project.
|