1
Chi J, Ye J, Zhou Y. Mapping QTL controlling count traits with excess zeros and ones using a zero-and-one-inflated generalized Poisson regression model. Biom J 2024; 66:e2200342. [PMID: 38616336 DOI: 10.1002/bimj.202200342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 11/26/2023] [Accepted: 12/08/2023] [Indexed: 04/16/2024]
Abstract
The research on the quantitative trait locus (QTL) mapping of count data has aroused the wide attention of researchers. There are frequent problems in applied research that limit the application of the conventional Poisson model in the analysis of count phenotypes, which include the overdispersion and excess zeros and ones. In this article, a novel model, that is, the zero-and-one-inflated generalized Poisson (ZOIGP) model, is proposed to deal with these problems. Based on the proposed model, a score test is performed for the inflation parameter, in which the ZOIGP model with a constant proportion of excess zeros and ones is compared with a standard generalized Poisson model. To illustrate the practicability of the ZOIGP model, we extend it to the QTL interval mapping application that underpins count phenotype with excess zeros and excess ones. The genetic effects are estimated utilizing the expectation-maximization algorithm embedded with the Newton-Raphson algorithm, and the genome-wide scan and likelihood ratio test is performed to map and test the potential QTLs. The statistical properties exhibited by the proposed method are investigated through simulation. Finally, a real data analysis example is used to illustrate the utility of the proposed method for QTL mapping.
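As a rough illustration of the kind of distribution the abstract describes, the sketch below mixes point masses at 0 and 1 with a generalized Poisson baseline. The parameterization (Consul's generalized Poisson form with rate `theta` and dispersion `lam`, and constant inflation proportions `p0` and `p1`) is an assumption made for this example; the paper's own notation and regression links are not given in the abstract.

```python
import math

def gp_pmf(y, theta, lam):
    """Generalized Poisson pmf in Consul's form (assumed here).

    Requires theta > 0 and 0 <= lam < 1; lam = 0 gives the ordinary Poisson.
    """
    log_p = (math.log(theta)
             + (y - 1) * math.log(theta + lam * y)
             - (theta + lam * y)
             - math.lgamma(y + 1))
    return math.exp(log_p)

def zoigp_pmf(y, p0, p1, theta, lam):
    """Zero-and-one-inflated generalized Poisson pmf (illustrative form).

    p0 and p1 are the extra probability masses placed on 0 and 1;
    the remaining mass 1 - p0 - p1 follows the generalized Poisson.
    """
    base = (1.0 - p0 - p1) * gp_pmf(y, theta, lam)
    if y == 0:
        return p0 + base
    if y == 1:
        return p1 + base
    return base
```

Setting `p0 = p1 = 0` recovers the plain generalized Poisson model, which is the null model the abstract's score test compares against.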
Affiliation(s)
- Jinling Chi
- School of Mathematics and Statistics, Xidian University, Xi'an, China
- Jimin Ye
- School of Mathematics and Statistics, Xidian University, Xi'an, China
- Ying Zhou
- School of Mathematical Sciences, Heilongjiang University, Harbin, China
2
Ahmadi N. Genetic Bases of Complex Traits: From Quantitative Trait Loci to Prediction. Methods Mol Biol 2022; 2467:1-44. [PMID: 35451771 DOI: 10.1007/978-1-0716-2205-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Conceived as a general introduction to the book, this chapter is a reminder of the core concepts of genetic mapping and molecular marker-based prediction. It provides an overview of the principles and the evolution of methods for mapping the variation of complex traits, and methods for QTL-based prediction of human disease risk and animal and plant breeding value. The principles of linkage-based and linkage disequilibrium-based QTL mapping methods are described in the context of the simplest, single-marker, methods. Methodological evolutions are analysed in relation with their ability to account for the complexity of the genotype-phenotype relations. Main characteristics of the genetic architecture of complex traits, drawn from QTL mapping works using large populations of unrelated individuals, are presented. Methods combining marker-QTL association data into polygenic risk score that captures part of an individual's susceptibility to complex diseases are reviewed. Principles of best linear mixed model-based prediction of breeding value in animal- and plant-breeding programs using phenotypic and pedigree data, are summarized and methods for moving from BLUP to marker-QTL BLUP are presented. Factors influencing the additional genetic progress achieved by using molecular data and rules for their optimization are discussed.
Affiliation(s)
- Nourollah Ahmadi
- CIRAD, UMR AGAP Institut, Montpellier, France.
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France.
3
Chi J, Zhou Y, Chen L, Zhou Y. Bayesian interval mapping of count trait loci based on zero-inflated generalized Poisson regression model. Biom J 2020; 62:1428-1442. [PMID: 32399977 DOI: 10.1002/bimj.201900274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2019] [Revised: 02/21/2020] [Accepted: 02/25/2020] [Indexed: 11/12/2022]
Abstract
Count phenotypes with excessive zeros are often observed in the biological world. Researchers have studied many statistical methods for mapping the quantitative trait loci (QTLs) of zero-inflated count phenotypes. However, most of the existing methods consist of finding the approximate positions of the QTLs on the chromosome by genome-wide scanning. Additionally, most of the existing methods use the EM algorithm for parameter estimation. In this paper, we propose a Bayesian interval mapping scheme of QTLs for zero-inflated count data. The method takes advantage of a zero-inflated generalized Poisson (ZIGP) regression model to study the influence of QTLs on the zero-inflated count phenotype. The MCMC algorithm is used to estimate the effects and position parameters of QTLs. We use the Haldane map function to realize the conversion between recombination rate and map distance. Monte Carlo simulations are conducted to test the applicability and advantage of the proposed method. The effects of QTLs on the formation of mouse cholesterol gallstones were demonstrated by analyzing an F2 mouse data set.
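For readers unfamiliar with the Haldane map function mentioned in this abstract, a minimal sketch of the conversion in both directions follows. The function names and the choice of centimorgans as the distance unit are assumptions for illustration; the formula itself is the standard Haldane relation r = (1 - exp(-2d)) / 2 with d in Morgans, which assumes no crossover interference.

```python
import math

def haldane_recombination(distance_cm):
    """Recombination fraction r for a map distance given in centimorgans."""
    d = distance_cm / 100.0  # convert cM to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def haldane_distance(r):
    """Map distance in centimorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)
```

For short intervals the two scales nearly coincide (1 cM is close to r = 0.01), while r approaches 0.5 as the distance grows, corresponding to unlinked loci.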
Affiliation(s)
- Jinling Chi
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, P. R. China; Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Heilongjiang University, Harbin, P. R. China
- Ying Zhou
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, P. R. China; Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Heilongjiang University, Harbin, P. R. China
- Lili Chen
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, P. R. China; Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Heilongjiang University, Harbin, P. R. China
- Yajing Zhou
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, P. R. China; Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Heilongjiang University, Harbin, P. R. China
4
5
Tong L, Sun X, Zhou Y. Simultaneous estimation of QTL parameters for mapping multiple traits. J Genet 2018; 97:267-274. [PMID: 29666345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The analysis of quantitative trait loci (QTLs) aims at mapping and estimating the positions and effects of the genes that may affect the quantitative trait, and evaluating the relationship between the gene variation and the phenotype. In existing studies, most methods mainly focus on the association/linkage between multiple gene loci and one trait, in which some useful joint information of multiple traits may be ignored. In this paper, we proposed a method of simultaneously estimating all QTL parameters in the framework of multiple-trait multiple-interval mapping. Simulation results show that in accuracy aspect, the proposed method outperforms an existing method for mapping multiple traits. A real example is also provided to validate the performance of the new method.
Affiliation(s)
- Liang Tong
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, People's Republic of China.
6
Gordon D, Londono D, Patel P, Kim W, Finch SJ, Heiman GA. An Analytic Solution to the Computation of Power and Sample Size for Genetic Association Studies under a Pleiotropic Mode of Inheritance. Hum Hered 2017; 81:194-209. [PMID: 28315880 DOI: 10.1159/000457135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/01/2016] [Accepted: 01/20/2017] [Indexed: 01/14/2023]
Abstract
Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes.
Affiliation(s)
- Derek Gordon
- Department of Genetics, The State University of New Jersey, Piscataway, NJ, USA
7
Abstract
The aim of expression Quantitative Trait Locus (eQTL) mapping is the identification of DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association analyses (GWAS), eQTL mapping has become a useful tool to understand the functional context where these variants operate and eventually narrow down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate data modeling strategies, i.e., testing for association of all transcript-marker pairs. However these "one at-a-time" strategies are (1) unable to control the number of false-positives when an intricate Linkage Disequilibrium structure is present and (2) are often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances on eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to detection of polygenic effects, accuracy, and interpretability of the results.
Affiliation(s)
- Martha Imprialou
- Centre for Complement and Inflammation Research, Imperial College London, Hammersmith Hospital, Du Cane Road, London, W12 0NN, UK
- Enrico Petretto
- Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore.
- Leonardo Bottolo
- Department of Medical Genetics, University of Cambridge, Box 238, Lv 6 Addenbrooke's Treatment Centre, Addenbrooke's Hospital, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK.
- Department of Mathematics, Imperial College London, 180 Queen's Gate, London, SW7 2AZ, UK.
8
Wu X, Lund MS, Sahana G, Guldbrandtsen B, Sun D, Zhang Q, Su G. Association analysis for udder health based on SNP-panel and sequence data in Danish Holsteins. Genet Sel Evol 2015; 47:50. [PMID: 26087655 PMCID: PMC4472403 DOI: 10.1186/s12711-015-0129-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 09/11/2014] [Accepted: 05/21/2015] [Indexed: 12/20/2022]
Abstract
BACKGROUND The sensitivity of genome-wide association studies for the detection of quantitative trait loci (QTL) depends on the density of markers examined and the statistical models used. This study compares the performance of three marker densities to refine six previously detected QTL regions for mastitis traits: 54 k markers of a medium-density SNP (single nucleotide polymorphism) chip (MD), imputed 777 k markers of a high-density SNP chip (HD), and imputed whole-genome sequencing data (SEQ). Each dataset contained data for 4496 Danish Holstein cattle. Comparisons were performed using a linear mixed model (LM) and a Bayesian variable selection model (BVS). RESULTS After quality control, 587, 7825, and 78 856 SNPs in the six targeted regions remained for MD, HD, and SEQ data, respectively. In general, the association patterns between SNPs and traits were similar for the three marker densities when tested using the same statistical model. With the LM model, 120 (MD), 967 (HD), and 7209 (SEQ) SNPs were significantly associated with mastitis, whereas with the BVS model, 43 (MD), 131 (HD), and 1052 (SEQ) significant SNPs (Bayes factor > 3.2) were observed. A total of 26 (MD), 75 (HD), and 465 (SEQ) significant SNPs were identified by both models. In addition, one, 16, and 33 QTL peaks for MD, HD, and SEQ data were detected according to the QTL intensity profile of SNP bins by post-analysis of the BVS model. CONCLUSIONS The power to detect significant associations increased with increasing marker density. The BVS model resulted in clearer boundaries between linked QTL than the LM model. Using SEQ data, the six targeted regions were refined to 33 candidate QTL regions for udder health. The comparison between these candidate QTL regions and known genes suggested that NPFFR2, SLC4A4, DCK, LIFR, and EDN3 may be considered as candidate genes for mastitis susceptibility.
Affiliation(s)
- Xiaoping Wu
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark; Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
- Mogens S Lund
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark.
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark.
- Bernt Guldbrandtsen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark.
- Dongxiao Sun
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
- Qin Zhang
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
- Guosheng Su
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Tjele, Denmark.
9
Jiang J, Zhang Q, Ma L, Li J, Wang Z, Liu JF. Joint prediction of multiple quantitative traits using a Bayesian multivariate antedependence model. Heredity (Edinb) 2015; 115:29-36. [PMID: 25873147 PMCID: PMC4815501 DOI: 10.1038/hdy.2015.9] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 04/04/2014] [Revised: 12/14/2014] [Accepted: 01/23/2015] [Indexed: 02/02/2023]
Abstract
Predicting organismal phenotypes from genotype data is important for preventive and personalized medicine as well as plant and animal breeding. Although genome-wide association studies (GWAS) for complex traits have discovered a large number of trait- and disease-associated variants, phenotype prediction based on associated variants is usually in low accuracy even for a high-heritability trait because these variants can typically account for a limited fraction of total genetic variance. In comparison with GWAS, the whole-genome prediction (WGP) methods can increase prediction accuracy by making use of a huge number of variants simultaneously. Among various statistical methods for WGP, multiple-trait model and antedependence model show their respective advantages. To take advantage of both strategies within a unified framework, we proposed a novel multivariate antedependence-based method for joint prediction of multiple quantitative traits using a Bayesian algorithm via modeling a linear relationship of effect vector between each pair of adjacent markers. Through both simulation and real-data analyses, our studies demonstrated that the proposed antedependence-based multiple-trait WGP method is more accurate and robust than corresponding traditional counterparts (Bayes A and multi-trait Bayes A) under various scenarios. Our method can be readily extended to deal with missing phenotypes and resequence data with rare variants, offering a feasible way to jointly predict phenotypes for multiple complex traits in human genetic epidemiology as well as plant and livestock breeding.
Affiliation(s)
- J Jiang
- Department of Animal Genetics, Breeding and Reproduction, China Agricultural University, Beijing, China
- Q Zhang
- Department of Animal Genetics, Breeding and Reproduction, China Agricultural University, Beijing, China
- L Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
- J Li
- Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, China
- Z Wang
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada
- J-F Liu
- Department of Animal Genetics, Breeding and Reproduction, China Agricultural University, Beijing, China
10
Xu HM, Sun XW, Qi T, Lin WY, Liu N, Lou XY. Multivariate dimensionality reduction approaches to identify gene-gene and gene-environment interactions underlying multiple complex traits. PLoS One 2014; 9:e108103. [PMID: 25259584 PMCID: PMC4178067 DOI: 10.1371/journal.pone.0108103] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 06/24/2014] [Accepted: 08/18/2014] [Indexed: 11/30/2022]
Abstract
The elusive but ubiquitous multifactor interactions represent a stumbling block that urgently needs to be removed in searching for determinants involved in human complex diseases. The dimensionality reduction approaches are a promising tool for this task. Many complex diseases exhibit composite syndromes required to be measured in a cluster of clinical traits with varying correlations and/or are inherently longitudinal in nature (changing over time and measured dynamically at multiple time points). A multivariate approach for detecting interactions is thus greatly needed on the purposes of handling a multifaceted phenotype and longitudinal data, as well as improving statistical power for multiple significance testing via a two-stage testing procedure that involves a multivariate analysis for grouped phenotypes followed by univariate analysis for the phenotypes in the significant group(s). In this article, we propose a multivariate extension of generalized multifactor dimensionality reduction (GMDR) based on multivariate generalized linear, multivariate quasi-likelihood and generalized estimating equations models. Simulations and real data analysis for the cohort from the Study of Addiction: Genetics and Environment are performed to investigate the properties and performance of the proposed method, as compared with the univariate method. The results suggest that the proposed multivariate GMDR substantially boosts statistical power.
Affiliation(s)
- Hai-Ming Xu
- Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, P.R. China
- Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, P.R. China
- Xi-Wei Sun
- Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, P.R. China
- Ting Qi
- Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, P.R. China
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Nianjun Liu
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Xiang-Yang Lou
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
11
Multiple-trait genome-wide association study based on principal component analysis for residual covariance matrix. Heredity (Edinb) 2014; 113:526-32. [PMID: 24984606 DOI: 10.1038/hdy.2014.57] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/20/2013] [Revised: 04/15/2014] [Accepted: 04/22/2014] [Indexed: 02/02/2023]
Abstract
Given the drawbacks of implementing multivariate analysis for mapping multiple traits in genome-wide association study (GWAS), principal component analysis (PCA) has been widely used to generate independent 'super traits' from the original multivariate phenotypic traits for the univariate analysis. However, parameter estimates in this framework may not be the same as those from the joint analysis of all traits, leading to spurious linkage results. In this paper, we propose to perform the PCA for residual covariance matrix instead of the phenotypical covariance matrix, based on which multiple traits are transformed to a group of pseudo principal components. The PCA for residual covariance matrix allows analyzing each pseudo principal component separately. In addition, all parameter estimates are equivalent to those obtained from the joint multivariate analysis under a linear transformation. However, a fast least absolute shrinkage and selection operator (LASSO) for estimating the sparse oversaturated genetic model greatly reduces the computational costs of this procedure. Extensive simulations show statistical and computational efficiencies of the proposed method. We illustrate this method in a GWAS for 20 slaughtering traits and meat quality traits in beef cattle.
12
Malina M, Ickstadt K, Schwender H, Posch M, Bogdan M. Detection of epistatic effects with logic regression and a classical linear regression model. Stat Appl Genet Mol Biol 2014; 13:83-104. [PMID: 24413217 DOI: 10.1515/sagmb-2013-0028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, usually methods as the Cockerham's model are applied. Within this framework, interactions are understood as the part of the joined effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham's approach is often not capable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction) the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to a Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with simulation study and real data analysis.
13
Yang R, Li H, Fu L, Liu Y. An efficient approach to large-scale genotype-phenotype association analyses. Brief Bioinform 2013; 15:814-22. [PMID: 23990269 DOI: 10.1093/bib/bbt061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
Modern molecular biotechnology generates a great deal of intermediate information, such as transcriptional and metabolic products in bridging DNA and complex traits. In genome-wide linkage analysis and genome-wide association study, regression analysis for large-scale correlated phenotypes is applied to map genes for those by-products that are regarded as quantitative traits. For a single trait, least absolute shrinkage and selection operator with coordinate descent step can be employed to efficiently shrink sparse non-zero genetic effects of quantitative trait loci (QTLs). However, regression analyses in a trait-by-trait basis do not take account of the correlations among the analyzed traits. In this study, conditional phenotype of each trait is defined, given other traits. Large-scale genotype-phenotype association analyses are therefore transformed to separate genotype-conditional phenotype ones. Meanwhile, the correlation architecture between each trait and other traits can also be provided by shrinkage estimation for each conditional phenotype. Simulation demonstrates that the proposed conditional mapping method is generally identical to joint mapping method based on multivariate analysis in terms of statistical detection power and parameter estimation. Application of the method is provided to locate eQTL in yeast.
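The single-trait step this abstract refers to, LASSO fitted by coordinate descent, can be sketched generically as below. This is not the authors' implementation: the objective scaling (1/(2n) times the residual sum of squares plus `alpha` times the L1 norm), the function names, and the fixed iteration count are all assumptions chosen to keep the example short and dependency-free.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 over b.

    X is a list of n rows, each a list of p marker codes; y has length n.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X b; equals y while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual (column j's effect added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta
```

Small effects are set exactly to zero by the soft-thresholding step, which is what makes the approach suitable for sparse genetic models with many candidate loci.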
14
Da Costa E Silva L, Wang S, Zeng ZB. Multiple trait multiple interval mapping of quantitative trait loci from inbred line crosses. BMC Genet 2012; 13:67. [PMID: 22852865 PMCID: PMC3778868 DOI: 10.1186/1471-2156-13-67] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 03/02/2012] [Accepted: 06/28/2012] [Indexed: 12/02/2022]
Abstract
Background Although many experiments have measurements on multiple traits, most studies performed the analysis of mapping of quantitative trait loci (QTL) for each trait separately using single trait analysis. Single trait analysis does not take advantage of possible genetic and environmental correlations between traits. In this paper, we propose a novel statistical method for multiple trait multiple interval mapping (MTMIM) of QTL for inbred line crosses. We also develop a novel score-based method for estimating genome-wide significance level of putative QTL effects suitable for the MTMIM model. The MTMIM method is implemented in the freely available and widely used Windows QTL Cartographer software. Results Throughout the paper, we provide compelling empirical evidences that: (1) the score-based threshold maintains proper type I error rate and tends to keep false discovery rate within an acceptable level; (2) the MTMIM method can deliver better parameter estimates and power than single trait multiple interval mapping method; (3) an analysis of Drosophila dataset illustrates how the MTMIM method can better extract information from datasets with measurements in multiple traits. Conclusions The MTMIM method represents a convenient statistical framework to test hypotheses of pleiotropic QTL versus closely linked nonpleiotropic QTL, QTL by environment interaction, and to estimate the total genotypic variance-covariance matrix between traits and to decompose it in terms of QTL-specific variance-covariance matrices, therefore, providing more details on the genetic architecture of complex traits.
Affiliation(s)
- Luciano Da Costa E Silva
- Department of Statistics & Bioinformatics Research Center, North Carolina State University, Raleigh 27695-7566, USA
15
Balestre M, Von Pinho RG, de Souza CL, Bueno Filho JSDS. Bayesian mapping of multiple traits in maize: the importance of pleiotropic effects in studying the inheritance of quantitative traits. Theor Appl Genet 2012; 125:479-493. [PMID: 22437491 DOI: 10.1007/s00122-012-1847-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 10/07/2011] [Accepted: 03/05/2012] [Indexed: 05/31/2023]
Abstract
Pleiotropy has played an important role in understanding quantitative traits. However, the extensiveness of this effect in the genome and its consequences for plant improvement have not been fully elucidated. The aim of this study was to identify pleiotropic quantitative trait loci (QTLs) in maize using Bayesian multiple interval mapping. Additionally, we sought to obtain a better understanding of the inheritance, extent and distribution of pleiotropic effects of several components in maize production. The design III procedure was used from a population derived from the cross of the inbred lines L-14-04B and L-08-05F. Two hundred and fifty plants were genotyped with 177 microsatellite markers and backcrossed to both parents giving rise to 500 backcrossed progenies, which were evaluated in six environments for grain yield and its components. The results of this study suggest that mapping isolated traits limits our understanding of the genetic architecture of quantitative traits. This architecture can be better understood by using pleiotropic networks that facilitate the visualization of the complexity of quantitative inheritance, and this characterization will help to develop new selection strategies. It was also possible to confront the idea that it is feasible to identify QTLs for complex traits such as grain yield, as pleiotropy acts prominently on its subtraits and as this "trait" can be broken down and predicted almost completely by the QTLs of its components. Additionally, pleiotropic QTLs do not necessarily signify pleiotropy of allelic interactions, and this indicates that the pervasive pleiotropy does not limit the genetic adaptability of plants.
Affiliation(s)
- Marcio Balestre
- Departamento de Ciências Exatas, Universidade Federal de Lavras, CP 3037, Lavras, MG, 37200-000, Brazil.
|
16
|
Mutshinda CM, Noykova N, Sillanpää MJ. A hierarchical Bayesian approach to multi-trait clinical quantitative trait locus modeling. Front Genet 2012; 3:97. [PMID: 22685451 PMCID: PMC3368303 DOI: 10.3389/fgene.2012.00097] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/30/2011] [Accepted: 05/12/2012] [Indexed: 02/04/2023] Open
Abstract
Recent advances in high-throughput genotyping and transcript profiling technologies have enabled the inexpensive production of genome-wide dense marker maps in tandem with huge amounts of expression profiles. These large-scale data encompass valuable information about the genetic architecture of important phenotypic traits. Comprehensive models that combine molecular markers and gene transcript levels are increasingly advocated as an effective approach to dissecting the genetic architecture of complex phenotypic traits. The simultaneous utilization of marker and gene expression data to explain the variation in a clinical quantitative trait, known as clinical quantitative trait locus (cQTL) mapping, poses challenges that are both conceptual and computational. Nonetheless, the hierarchical Bayesian (HB) modeling approach, in combination with modern computational tools such as Markov chain Monte Carlo (MCMC) simulation techniques, provides much versatility for cQTL analysis. Sillanpää and Noykova (2008) developed an HB model for single-trait cQTL analysis in inbred line cross data using molecular markers, gene expressions, and marker-gene expression pairs. However, clinical traits generally relate to one another through environmental correlations and/or pleiotropy. A multi-trait approach can improve the power to detect genetic effects and the precision of their estimates. A multi-trait model also provides a framework for examining a number of biologically interesting hypotheses. In this paper we extend the HB cQTL model for inbred line crosses proposed by Sillanpää and Noykova to a multi-trait setting. We illustrate the implementation of our new model with simulated data, and evaluate the multi-trait model performance with regard to its single-trait counterpart.
The data simulation process was based on the multi-trait cQTL model, assuming three traits with uncorrelated and correlated cQTL residuals, with the simulated data under uncorrelated cQTL residuals serving as our test set for comparing the performances of the multi-trait and single-trait models. The simulated data under correlated cQTL residuals were essentially used to assess how well our new model can estimate the cQTL residual covariance structure. The model fitting to the data was carried out by MCMC simulation through OpenBUGS. The multi-trait model outperformed its single-trait counterpart in identifying cQTLs, with a consistently lower false discovery rate. Moreover, the covariance matrix of cQTL residuals was typically estimated to an appreciable degree of precision under the multi-trait cQTL model, making our new model a promising approach to addressing a wide range of issues facing the analysis of correlated clinical traits.
Affiliation(s)
- Crispin M Mutshinda
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
17
|
Melton PE, Pankratz N. Joint analyses of disease and correlated quantitative phenotypes using next-generation sequencing data. Genet Epidemiol 2012; 35 Suppl 1:S67-73. [PMID: 22128062 DOI: 10.1002/gepi.20653] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
The joint analysis of multiple disease phenotypes aims to increase statistical power and potentially identify pleiotropic genes involved in the biological development of common chronic diseases. As next-generation sequencing data become more common, it will be important to consider ways to maximize the ability to detect rare variants within the human genome. The two exome sequence data sets provided for analysis at Genetic Analysis Workshop 17 (GAW17) offered three quantitative phenotypes related to disease status in 200 simulated replicates for both families and unrelated individuals. Participants in Group 10 addressed the challenges and potential uses of next-generation sequencing data to identify causal variants through a broad range of statistical methods. These methods included investigating multiple phenotypes either through data reduction or joint methods, using family or unrelated individuals, and reducing the dimensionality inherent in these data. Most of the research teams regarded the use of multiple phenotypes as a means of increasing analytical power and as a way to clarify the biology of complex disease. Three major observations were gleaned from these Group 10 contributions. First, family and unrelated case-control samples are suited to finding different types of variants. In addition, collapsing either phenotypes or genotypes can reduce the dimensionality of the data and alleviate some of the problems of multiple testing. Finally, we were able to demonstrate in certain cases that performing a joint analysis of disease status and a quantitative trait can improve statistical power.
Affiliation(s)
- Phillip E Melton
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA
|
18
|
Shriner D. Moving toward System Genetics through Multiple Trait Analysis in Genome-Wide Association Studies. Front Genet 2012; 3:1. [PMID: 22303408 PMCID: PMC3266611 DOI: 10.3389/fgene.2012.00001] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 11/04/2011] [Accepted: 01/01/2012] [Indexed: 02/05/2023] Open
Abstract
Association studies are a staple of genotype–phenotype mapping studies, whether they are based on single markers, haplotypes, candidate genes, genome-wide genotypes, or whole genome sequences. Although genetic epidemiological studies typically contain data collected on multiple traits which themselves are often correlated, most analyses have been performed on single traits. Here, I review several methods that have been developed to perform multiple trait analysis. These methods range from traditional multivariate models for systems of equations to recently developed graphical approaches based on network theory. The application of network theory to genetics is termed systems genetics and has the potential to address long-standing questions in genetics about complex processes such as coordinate regulation, homeostasis, and pleiotropy.
Affiliation(s)
- Daniel Shriner
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, Bethesda, MD, USA
|
19
|
Costa E Silva L, Wang S, Zeng ZB. Multiple trait multiple interval mapping of quantitative trait loci from inbred line crosses. BMC Genet 2012. [DOI: 10.1186/1471-2156] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022] Open
|
20
|
Mapping quantitative trait loci for T lymphocyte subpopulations in peripheral blood in swine. BMC Genet 2011; 12:79. [PMID: 21923905 PMCID: PMC3182951 DOI: 10.1186/1471-2156-12-79] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 03/31/2011] [Accepted: 09/16/2011] [Indexed: 11/25/2022] Open
Abstract
Background: Increased disease resistance through improved general immune capacity would be beneficial for the welfare and productivity of farm animals. T lymphocyte subpopulations in peripheral blood play an important role in immune capacity and disease resistance in animals. However, very little research to date has focused on quantitative trait loci (QTL) for T lymphocyte subpopulations in peripheral blood in swine. Results: The experimental animals consisted of 446 piglets from three different breed populations. To identify QTL for T lymphocyte subpopulations in peripheral blood in swine, the proportions of CD4+, CD8+, CD4+CD8+, CD4+CD8-, CD4-CD8+, and CD4-CD8- T cells and the ratio of CD4+:CD8+ T cells were measured for all individuals before and after challenge with modified live CSF (classical swine fever) vaccine. Based on the combined data of individuals from the three breed populations, genome-wide scanning of QTL for these traits was performed using a variance component model, and the genome-wide significance level for declaring QTL was determined via permutation tests as well as FDR (false discovery rate) correction. A total of 27 QTL (two for CD4+CD8+, one for CD4+CD8-, three for CD4-CD8+, two for CD4-CD8-, nine for CD4+, two for CD8+, and eight for CD4+:CD8+ ratio) were identified at a significance level of FDR < 0.10, of which 11 were significant at FDR < 0.05, including five significant at FDR < 0.01. Conclusions: Within these QTL regions, a number of known genes having potential relationships with the studied traits may serve as candidate genes for these traits. Our findings are helpful for identifying the causal genes underlying these immune-related traits and for selecting for immune capacity in swine breeding in the future.
|
21
|
Fang M, Liu J, Sun D, Zhang Y, Zhang Q, Zhang Y, Zhang S. QTL mapping in outbred half-sib families using Bayesian model selection. Heredity (Edinb) 2011; 107:265-76. [PMID: 21487433 DOI: 10.1038/hdy.2011.15] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/26/2023] Open
Abstract
In this article, we propose a model selection method, the Bayesian composite model space approach, to map quantitative trait loci (QTL) in a half-sib population for continuous and binary traits. In our method, the identity-by-descent-based variance component model is used. To demonstrate the performance of this model, the method was applied to map QTL underlying production traits on BTA6 in a Chinese half-sib dairy cattle population. A total of four QTLs were detected, whereas only one QTL was identified using the traditional least square (LS) method. We also conducted two simulation experiments to validate the efficiency of our method. The results suggest that the proposed method based on a multiple-QTL model is efficient in mapping multiple QTL for an outbred half-sib population and is more powerful than the LS method based on a single-QTL model.
Affiliation(s)
- M Fang
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
|
22
|
Xu HM, Wei CS, Tang YT, Zhu ZH, Sima YF, Lou XY. A new mapping method for quantitative trait loci of silkworm. BMC Genet 2011; 12:19. [PMID: 21276233 PMCID: PMC3042969 DOI: 10.1186/1471-2156-12-19] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/05/2010] [Accepted: 01/28/2011] [Indexed: 02/08/2023] Open
Abstract
Background: The silkworm is the basis of the sericultural industry and a model organism in insect genetics. Mapping quantitative trait loci (QTLs) underlying economically important traits of silkworm is of high significance for promoting silkworm molecular breeding and advancing our knowledge of the genetic architecture of the Lepidoptera. Yet the currently used mapping methods are not well suited to silkworm, because they ignore the difference in meiotic recombination between the two sexes. Results: A mixed linear model including QTL main effects, epistatic effects, and QTL × sex interaction effects was proposed for mapping QTLs in an F2 population of silkworm. The number and positions of QTLs were determined by F-test and model selection. The Markov chain Monte Carlo (MCMC) algorithm was employed to estimate and test genetic effects of QTLs and QTL × sex interaction effects. The effectiveness of the model and statistical method was validated by a series of simulations. The results indicate that when markers are distributed sparsely on chromosomes, our method will substantially improve estimation accuracy as compared to the normal chiasmate F2 model. We also found that a sample size of hundreds was sufficiently large to estimate without bias all four types of epistasis (i.e., additive-additive, additive-dominance, dominance-additive, and dominance-dominance) when the paired QTLs reside on different chromosomes in silkworm. Conclusion: The proposed method could accurately estimate not only the additive, dominance and digenic epistatic effects but also their interaction effects with sex, correcting the potential bias and precision loss in the current QTL mapping practice of silkworm and thus representing an important addition to the arsenal of QTL mapping tools.
Affiliation(s)
- Hai-Ming Xu
- Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310029, China
|
23
|
Abstract
Quantitative trait locus (QTL) mapping often results in data on a number of traits that have well-established causal relationships. Many multi-trait QTL mapping methods that account for correlation among the multiple traits have been developed to improve the statistical power and the precision of QTL parameter estimation. However, none of these methods are capable of incorporating the causal structure among the traits. Consequently, genetic functions of the QTL may not be fully understood. In this paper, we developed a Bayesian multiple QTL mapping method for causally related traits using a mixture structural equation model (SEM), which allows researchers to decompose QTL effects into direct, indirect and total effects. Parameters are estimated based on their marginal posterior distribution. The posterior distributions of parameters are estimated using Markov chain Monte Carlo methods such as the Gibbs sampler and the Metropolis-Hastings algorithm. The number of QTLs affecting traits is determined by the Bayes factor. The performance of the proposed method is evaluated by simulation study and applied to data from a wheat experiment. Compared with single trait Bayesian analysis, our proposed method not only improved the statistical power of QTL detection and the accuracy and precision of parameter estimates but also provided important insight into how genes regulate traits directly and indirectly by fitting a more biologically sensible model.
|
24
|
Jiang L, Liu J, Sun D, Ma P, Ding X, Yu Y, Zhang Q. Genome wide association studies for milk production traits in Chinese Holstein population. PLoS One 2010; 5:e13661. [PMID: 21048968 PMCID: PMC2965099 DOI: 10.1371/journal.pone.0013661] [Citation(s) in RCA: 176] [Impact Index Per Article: 12.6] [Received: 07/21/2010] [Accepted: 10/04/2010] [Indexed: 11/21/2022] Open
Abstract
Genome-wide association studies (GWAS) based on high-throughput SNP genotyping technologies open a broad avenue for exploring genes associated with milk production traits in dairy cattle. Motivated by the goal of pinpointing novel quantitative trait nucleotides (QTN) across the Bos taurus genome, the present study performs a GWAS to identify genes affecting milk production traits using current state-of-the-art SNP genotyping technology, i.e., the Illumina BovineSNP50 BeadChip. The analyses involve the five most commonly evaluated milk production traits: milk yield (MY), milk fat yield (FY), milk protein yield (PY), milk fat percentage (FP) and milk protein percentage (PP). Estimated breeding values (EBVs) of 2,093 daughters from 14 paternal half-sib families are considered as phenotypes within the framework of a daughter design. Association tests between each trait and the 54K SNPs are carried out via two different analysis approaches, a paternal transmission disequilibrium test (TDT)-based approach (L1-TDT) and a mixed model based regression analysis (MMRA). In total, 105 SNPs were detected to be significantly associated genome-wise with one or multiple milk production traits. Of the 105 SNPs, 38 were commonly detected by both methods, while four and 63 were solely detected by L1-TDT and MMRA, respectively. The majority (86 out of 105) of the significant SNPs are located within the reported QTL regions and some are within or close to the reported candidate genes. In particular, two SNPs, ARS-BFGL-NGS-4939 and BFGL-NGS-118998, are located close to the DGAT1 gene (160 bp apart) and within the GHR gene, respectively. Our findings not only provide confirmatory evidence for previous findings, but also reveal a suite of novel SNPs associated with milk production traits, and thus form a solid basis for eventually unraveling the causal mutations for milk production traits in dairy cattle.
Affiliation(s)
- Li Jiang
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, People's Republic of China
- Jianfeng Liu
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, People's Republic of China
- Dongxiao Sun
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, People's Republic of China
- Peipei Ma
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, People's Republic of China
- Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, People's Republic of China
- Ying Yu
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, People's Republic of China
- Qin Zhang
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, People's Republic of China
|
25
|
Nonyane BAS, Whittaker JC. A variance components factor model for genetic association studies: a Bayesian analysis. Genet Epidemiol 2010; 34:529-36. [PMID: 20718044 DOI: 10.1002/gepi.20503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Studies of gene-trait associations for complex diseases often involve multiple traits that may vary by genotype groups or patterns. Such traits are usually manifestations of lower-dimensional latent factors or disease syndromes. We illustrate the use of a variance components factor (VCF) model to model the association between multiple traits and genotype groups as well as any other existing patient-level covariates. This model characterizes the correlations between traits as underlying latent factors that can be used in clinical decision-making. We apply it within the Bayesian framework and provide a straightforward implementation using the WinBUGS software. The VCF model is illustrated with simulated data and an example that comprises changes in plasma lipid measurements of patients who were treated with statins to lower low-density lipoprotein cholesterol, and polymorphisms from the apolipoprotein-E gene. The simulation shows that this model clearly characterizes existing multiple trait manifestations across genotype groups where individuals' group assignments are fully observed or can be deduced from the observed data. It also allows one to investigate covariate by genotype group interactions that may explain the variability in the traits. The flexibility to characterize such multiple trait manifestations makes the VCF model more desirable than the univariate variance components model, which is applied to each trait separately. The Bayesian framework offers a flexible approach that allows one to incorporate prior information.
Affiliation(s)
- B A S Nonyane
- Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK.
|
26
|
Fang M. Bayesian shrinkage mapping of quantitative trait loci in variance component models. BMC Genet 2010; 11:30. [PMID: 20429900 PMCID: PMC2874758 DOI: 10.1186/1471-2156-11-30] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/12/2009] [Accepted: 04/29/2010] [Indexed: 12/02/2022] Open
Abstract
Background: In this article, I propose a model-selection-free method to map multiple quantitative trait loci (QTL) in a variance component model, which is useful in outbred populations. The new method shrinks the estimated variance of zero-effect QTL toward zero while remaining nearly unbiased for non-zero-effect QTL. It is analogous to Xu's Bayesian shrinkage estimation method, but his method is based on the allelic substitution model, whereas the new method is based on variance component models. Results: Extensive simulation experiments were conducted to investigate the performance of the proposed method. The results showed that the proposed method was efficient in mapping multiple QTL simultaneously, and moreover that it was competitive with the reversible jump MCMC (RJMCMC) method and may even outperform it. Conclusions: The newly developed Bayesian shrinkage method is very efficient and powerful for mapping multiple QTL in outbred populations.
Affiliation(s)
- Ming Fang
- Life Science College, Heilongjiang August First Land Reclamation University, Daqing, China.
|
27
|
Petretto E, Bottolo L, Langley SR, Heinig M, McDermott-Roe C, Sarwar R, Pravenec M, Hübner N, Aitman TJ, Cook SA, Richardson S. New insights into the genetic control of gene expression using a Bayesian multi-tissue approach. PLoS Comput Biol 2010; 6:e1000737. [PMID: 20386736 PMCID: PMC2851562 DOI: 10.1371/journal.pcbi.1000737] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 08/24/2009] [Accepted: 03/03/2010] [Indexed: 01/29/2023] Open
Abstract
The majority of expression quantitative trait locus (eQTL) studies have been carried out in single tissues or cell types, using methods that ignore information shared across tissues. Although global analysis of RNA expression in multiple tissues is now feasible, few integrated statistical frameworks for joint analysis of gene expression across tissues combined with simultaneous analysis of multiple genetic variants have been developed to date. Here, we propose Sparse Bayesian Regression models for mapping eQTLs within individual tissues and simultaneously across tissues. Testing these on a set of 2,000 genes in four tissues, we demonstrate that our methods are more powerful than traditional approaches in revealing the true complexity of the eQTL landscape at the systems-level. Highlighting the power of our method, we identified a two-eQTL model (cis/trans) for the Hopx gene that was experimentally validated and was not detected by conventional approaches. We showed common genetic regulation of gene expression across four tissues for ∼27% of transcripts, providing >5 fold increase in eQTLs detection when compared with single tissue analyses at 5% FDR level. These findings provide a new opportunity to uncover complex genetic regulatory mechanisms controlling global gene expression while the generality of our modelling approach makes it adaptable to other model systems and humans, with broad application to analysis of multiple intermediate and whole-body phenotypes. Integrated analysis of genome-wide genetic polymorphisms and gene expression profiles from different tissues or cell types has been highly successful in identifying genes modulating complex phenotypes in animal models and humans. However, an important limitation of the current approaches consists in their sole application to individual tissues, thus ignoring information shared across different tissues. 
To uncover complex genetic regulatory mechanisms controlling gene expression at the whole organism's level, it is essential to develop appropriate analytical methods for the analysis of genome-wide genetic polymorphisms and gene expression profiles simultaneously in multiple tissues. This paper presents a novel, fully integrated Bayesian approach for mapping the genetic components of gene expression within and across multiple tissues. In addition to increased power and enhanced mapping resolution when compared with traditional approaches, our model directly provides information on potential systemic effects on transcriptional profiles and co-existing local (cis) and distant (trans) genetic control of gene expression. We also discuss the possibility to extend our approach for the analysis of different phenotypes, and other study designs, thus providing an integrated computational tool to explore the genetic control underlying transcriptional regulation at the systems-level, beyond the single tissue resolution.
Affiliation(s)
- Enrico Petretto
- Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, United Kingdom
- Department of Epidemiology and Biostatistics, Faculty of Medicine, Imperial College, London, United Kingdom
- Leonardo Bottolo
- Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, United Kingdom
- Department of Epidemiology and Biostatistics, Faculty of Medicine, Imperial College, London, United Kingdom
- Sarah R. Langley
- Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, United Kingdom
- Chris McDermott-Roe
- Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, United Kingdom
- Rizwan Sarwar
- Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, United Kingdom
- Michal Pravenec
- Institute of Physiology, Czech Academy of Sciences and Centre for Applied Genomics, Prague, Czech Republic
- Charles University in Prague, Institute of Biology and Medical Genetics of the First Faculty of Medicine and General Teaching Hospital, Prague, Czech Republic
- Norbert Hübner
- Max-Delbrück Center for Molecular Medicine, Berlin, Germany
- Timothy J. Aitman
- Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, United Kingdom
- Section of Molecular Genetics and Rheumatology, Division and Faculty of Medicine, Imperial College, London, United Kingdom
- Stuart A. Cook
- Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, United Kingdom
- National Heart and Lung Institute, Imperial College, London, United Kingdom
- Sylvia Richardson
- Department of Epidemiology and Biostatistics, Faculty of Medicine, Imperial College, London, United Kingdom
|
28
|
Bachlava E, Tang S, Pizarro G, Schuppert GF, Brunick RK, Draeger D, Leon A, Hahn V, Knapp SJ. Pleiotropy of the branching locus (B) masks linked and unlinked quantitative trait loci affecting seed traits in sunflower. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2010; 120:829-42. [PMID: 19921140 DOI: 10.1007/s00122-009-1212-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 04/16/2009] [Accepted: 10/27/2009] [Indexed: 05/20/2023]
Abstract
The discovery of unbranched, monocephalic natural variants was pivotal for the domestication of sunflower (Helianthus annuus L.). The branching locus (B), one of several loci apparently targeted by aboriginal selection for monocephaly, pleiotropically affects plant, seed and capitula morphology and, when segregating, confounds the discovery of favorable alleles for seed yield and other traits. The present study was undertaken to gain deeper insights into the genetics of branching and seed traits affected by branching. We produced an unbranched hybrid testcross recombinant inbred line (TC-RIL) population by crossing branched (bb) and unbranched (BB) RILs to an unbranched (BB) tester. The elimination of branching concomitantly eliminated a cluster of B-linked seed trait quantitative trait loci (QTL) identified by RIL per se testing. We identified a seed oil content QTL linked in repulsion and a 100-seed weight QTL linked in coupling to the B locus and additional unlinked QTL, previously masked by B-locus pleiotropy. Genomic segments flanking the B locus harbor multiple loci for domestication and post-domestication traits, the effects of which are masked by B-locus pleiotropy in populations segregating for branching and can only be disentangled by genetic analyses in unbranched populations. QTL analyses of NILs carrying wild B alleles substantiated the pleiotropic effects of the B locus. The effect of the B locus on branching was masked by the effects of wild alleles at independent branching loci in hybrids between monocephalic domesticated lines and polycephalic wild ecotypes; hence, the B locus appears to be necessary, but not sufficient, for monocephaly in domesticated sunflower.
Affiliation(s)
- Eleni Bachlava
- Institute of Plant Breeding, Genetics, and Genomics, The University of Georgia, 111 Riverbend Road, Athens, GA 30602, USA
|
29
|
Logsdon BA, Hoffman GE, Mezey JG. A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis. BMC Bioinformatics 2010; 11:58. [PMID: 20105321 PMCID: PMC2824680 DOI: 10.1186/1471-2105-11-58] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.4] [Received: 08/10/2009] [Accepted: 01/27/2010] [Indexed: 12/17/2022] Open
Abstract
Background: The success achieved by genome-wide association (GWA) studies in the identification of candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe the algorithm V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability. Results: V-Bay provides a novel solution to the computational scaling constraints of most multiple locus methods and can complete a simultaneous analysis of a million genetic markers in a few hours, when using a desktop. Using a range of simulated genetic and GWA experimental scenarios, we demonstrate that V-Bay is highly accurate, and reliably identifies associations that are too weak to be discovered by single-marker testing approaches. V-Bay can also outperform a multiple locus analysis method based on the lasso, which has similar scaling properties for large numbers of genetic markers. For demonstration purposes, we also use V-Bay to confirm associations with gene expression in cell lines derived from the Phase II individuals of HapMap. Conclusions: V-Bay is a versatile, fast, and accurate multiple locus GWA analysis tool for the practitioner interested in identifying weaker associations without high false positive rates.
Collapse
Affiliation(s)
- Benjamin A Logsdon
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA
|
30
|
Abstract
Recently, an effective Bayesian shrinkage estimation method has been proposed for mapping QTL in inbred line crosses. However, with regard to outbred populations, such as half-sib populations with maternal information unavailable, it is not straightforward to utilize such a shrinkage estimation for QTL mapping. The reasons are: (1) the linkage phase of markers in the outbred population is usually unknown; and (2) only paternal genotypes can be used for inferring QTL genotypes of offspring. In this article, a novel Bayesian shrinkage method was proposed for mapping QTL under the half-sib design using a mixed model. A simulation study clearly demonstrated that the proposed method was powerful for detecting multiple QTL. In addition, we applied the proposed method to map QTL for economic traits in the Chinese dairy cattle population. Two or more novel QTL harbored in the chromosomal region were detected for each trait of interest, whereas only one QTL was found using traditional maximum likelihood analyses in our earlier studies. This further validated that our shrinkage estimation method could perform well in empirical data analyses and had practical significance in the field of linkage studies for outbred populations.
|
31
|
Liu J, Pei Y, Papasian CJ, Deng HW. Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet Epidemiol 2009; 33:217-27. [PMID: 18924135 DOI: 10.1002/gepi.20372] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Indexed: 11/10/2022]
Abstract
The genome-wide association (GWA) study is becoming a powerful tool for deciphering the genetic basis of complex human diseases/traits. Currently, univariate analysis is the most commonly used method to identify genes associated with a disease/phenotype under study. A major limitation of univariate analysis is that it may not make use of the information in multiple correlated phenotypes, which are usually measured and collected in practical studies. Multivariate analysis has proven to be a powerful approach in linkage studies of complex diseases/traits, but it has received little attention in GWA. In this study, we aim to develop a bivariate analytical method for GWA studies that can be used in the complex situation in which a continuous trait and a binary trait are both measured. Based on the modified extended generalized estimating equation (EGEE) method we propose herein, we assessed the performance of our bivariate analyses through extensive simulations as well as real data analyses. To develop an EGEE approach for bivariate genetic analyses, we combined two different generalized linear models corresponding to the phenotypic variables using a seemingly unrelated regression model. The simulation results demonstrated that our EGEE-based bivariate analytical method outperforms univariate analyses in statistical power under a variety of simulation scenarios. Notably, EGEE-based bivariate analyses have consistent advantages over univariate analyses whether or not there is a phenotypic correlation between the two traits. Our study has practical importance, as one can always use multivariate analyses as a screening tool when multiple phenotypes are available, without extra cost in statistical power or false-positive rate. Analyses of empirical GWA data further affirm the advantages of our bivariate analytical method.
Affiliation(s)
- Jianfeng Liu
- Department of Orthopedic Surgery, School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, USA
|
32
|
Kim S, Sohn KA, Xing EP. A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics 2009; 25:i204-12. [PMID: 19477989 PMCID: PMC2687972 DOI: 10.1093/bioinformatics/btp218] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Many complex disease syndromes such as asthma consist of a large number of highly related, rather than independent, clinical phenotypes, raising a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. Although a causal genetic variation may influence a group of highly correlated traits jointly, most of the previous association analyses considered each phenotype separately, or combined results from a set of single-phenotype analyses. RESULTS: We propose a new statistical framework called graph-guided fused lasso to address this issue in a principled way. Our approach represents the dependency structure among the quantitative traits explicitly as a network, and leverages this trait network to encode structured regularizations in a multivariate regression model over the genotypes and traits, so that the genetic markers that jointly influence subgroups of highly correlated traits can be detected with high sensitivity and specificity. While most of the traditional methods examined each phenotype independently, our approach analyzes all of the traits jointly in a single statistical method to discover the genetic markers that perturb a subset of correlated traits jointly rather than a single trait. Using simulated datasets based on the HapMap consortium data and an asthma dataset, we compare the performance of our method with the single-marker analysis, and other sparse regression methods that do not use any structural information in the traits. Our results show that there is a significant advantage in detecting the true causal single nucleotide polymorphisms when we incorporate the correlation pattern in traits using our proposed methods. AVAILABILITY: Software for GFlasso is available at http://www.sailing.cs.cmu.edu/gflasso.html.
Affiliation(s)
- Seyoung Kim
- School of Computer Science, Carnegie Mellon University, Pittsburgh, USA.
|
33
|
Fang M, Liu S, Jiang D. Bayesian composite model space approach for mapping quantitative trait Loci in variance component model. Behav Genet 2009; 39:337-46. [PMID: 19263210 DOI: 10.1007/s10519-009-9259-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/20/2008] [Accepted: 02/13/2009] [Indexed: 12/17/2022]
Abstract
In this article, we apply a novel model selection method, the Bayesian composite model space approach, which has previously been used to map quantitative trait loci (QTL) under the allelic substitution model, to map QTL under the variance component model. The approach has three advantages over the reversible jump Markov chain Monte Carlo method. First, it mixes well because the model dimension is fixed; second, it can map multiple QTL with higher power, especially in genome-wide QTL mapping; third, it makes it easy to incorporate prior information about the variance components, which may yield more precise estimates of them. A series of simulation experiments was conducted to demonstrate the general characteristics of the proposed method. The computer program is written in FORTRAN and is also built into the software "BayesMapQTL"; both can be used for real data analysis and are available on request.
Affiliation(s)
- Ming Fang
- Life Science College, Heilongjiang August First Land Reclamation University, 163319 Daqing, People's Republic of China.
|
34
|
Guo Z, Nelson JC. Multiple-trait quantitative trait locus mapping with incomplete phenotypic data. BMC Genet 2008; 9:82. [PMID: 19061502 PMCID: PMC2639387 DOI: 10.1186/1471-2156-9-82] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/28/2008] [Accepted: 12/05/2008] [Indexed: 11/10/2022] Open
Abstract
Background: Conventional multiple-trait quantitative trait locus (QTL) mapping methods must discard cases (individuals) with incomplete phenotypic data, thereby sacrificing other phenotypic and genotypic information contained in the discarded cases. Under standard assumptions about the missing-data mechanism, it is possible to exploit these cases. Results: We present an expectation-maximization (EM) algorithm, derived for recombinant inbred and F2 genetic models but extensible to any mating design, that supports conventional hypothesis tests for QTL main effect, pleiotropy, and QTL-by-environment interaction in multiple-trait analyses with missing phenotypic data. We evaluate its performance by simulations and illustrate with a real-data example. Conclusion: The EM method affords improved QTL detection power and precision of QTL location and effect estimation in comparison with case deletion or imputation methods. It may be incorporated into any least-squares or likelihood-maximization QTL-mapping approach.
Affiliation(s)
- Zhigang Guo
- Department of Plant Pathology, Kansas State University, Manhattan, Kansas 66506, USA.
|
35
|
Keith JM, McRae A, Duffy D, Mengersen K, Visscher PM. Calculation of IBD probabilities with dense SNP or sequence data. Genet Epidemiol 2008; 32:513-9. [PMID: 18357613 DOI: 10.1002/gepi.20324] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
The probabilities that two individuals share 0, 1, or 2 alleles identical by descent (IBD) at a given genotyped marker locus are quantities of fundamental importance for disease gene and quantitative trait mapping and in family-based tests of association. Until recently, genotyped markers were sufficiently sparse that founder haplotypes could be modelled as having been drawn from a population in linkage equilibrium for the purpose of estimating IBD probabilities. However, with the advent of high-throughput single nucleotide polymorphism genotyping assays, this is no longer a reasonable assumption. Indeed, the imminent arrival of individual sequencing will enable high-density single nucleotide polymorphism genotyping on a scale for which current algorithms are not equipped. In this paper, we present a simple new model in which founder haplotypes are modelled as a Markov chain. Another important innovation is that genotyping errors are explicitly incorporated into the model. We compare results obtained using the new model to those obtained using the popular genetic linkage analysis package Merlin, with and without using the cluster model of linkage disequilibrium that is incorporated into that program. We find that the new model results in accuracy approaching that of Merlin with haplotype blocks, but achieves this with orders of magnitude faster run times. Moreover, the new algorithm scales linearly with number of markers, irrespective of density, whereas Merlin scales supralinearly. We also confirm a previous finding that ignoring linkage disequilibrium in founder haplotypes can cause errors in the calculation of IBD probabilities.
Affiliation(s)
- Jonathan M Keith
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Qld. 4001, Australia. j.keith.@qut.edu.au
|
36
|
Banerjee S, Yandell BS, Yi N. Bayesian quantitative trait loci mapping for multiple traits. Genetics 2008; 179:2275-89. [PMID: 18689903 PMCID: PMC2516097 DOI: 10.1534/genetics.108.088427] [Citation(s) in RCA: 100] [Impact Index Per Article: 6.3] [Received: 02/22/2008] [Accepted: 06/15/2008] [Indexed: 11/18/2022] Open
Abstract
Most quantitative trait loci (QTL) mapping experiments typically collect phenotypic data on multiple correlated complex traits. However, there is a lack of a comprehensive genomewide mapping strategy for correlated traits in the literature. We develop Bayesian multiple-QTL mapping methods for correlated continuous traits using two multivariate models: one that assumes the same genetic model for all traits, the traditional multivariate model, and the other known as the seemingly unrelated regression (SUR) model that allows different genetic models for different traits. We develop computationally efficient Markov chain Monte Carlo (MCMC) algorithms for performing joint analysis. We conduct extensive simulation studies to assess the performance of the proposed methods and to compare with the conventional single-trait model. Our methods have been implemented in the freely available package R/qtlbim (http://www.qtlbim.org), which greatly facilitates the general usage of the Bayesian methodology for unraveling the genetic architecture of complex traits.
Affiliation(s)
- Samprit Banerjee
- Departments of Biostatistics, Section on Statistical Genetics, University of Alabama, Birmingham, AL 35294, USA
|
37
|
Fang M, Jiang D, Gao H, Sun D, Yang R, Zhang Q. A new Bayesian automatic model selection approach for mapping quantitative trait loci under variance component model. Genetica 2008; 135:429-37. [DOI: 10.1007/s10709-008-9291-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 01/21/2008] [Accepted: 06/23/2008] [Indexed: 11/28/2022]
|
38
|
Multitrait analysis of quantitative trait loci using Bayesian composite space approach. BMC Genet 2008; 9:48. [PMID: 18637203 PMCID: PMC2515852 DOI: 10.1186/1471-2156-9-48] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/19/2007] [Accepted: 07/18/2008] [Indexed: 11/17/2022] Open
Abstract
Background: Multitrait analysis of quantitative trait loci can capture the maximum information from an experiment. The maximum-likelihood approach and the least-squares approach have been developed to jointly analyze multiple traits, but it is difficult for them to include multiple QTL simultaneously in one model. Results: In this article, we extend the Bayesian composite space approach, an efficient model selection method that can easily handle multiple QTL, to multitrait mapping of QTL. The proposed method has several statistical innovations compared with Bayesian single-trait analysis. First, the parameters for all traits are updated jointly as vectors or matrices; second, for QTL in the same interval that control different traits, the correlation between QTL genotypes is taken into account; third, the correlation of residual errors between traits is also exploited. The superiority of the new method over separate analysis was demonstrated with both simulated and real data. The computing program was written in FORTRAN and is available on request. Conclusion: The results suggest that the new method is more powerful than separate analysis.
|
39
|
Wu XL, Gianola D, Weigel K. Bayesian joint mapping of quantitative trait loci for Gaussian and categorical characters in line crosses. Genetica 2008; 135:367-77. [DOI: 10.1007/s10709-008-9283-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/08/2008] [Accepted: 06/03/2008] [Indexed: 11/29/2022]
|