1
Truong N, Tesfamariam K, Visintin L, Goessens T, De Saeger S, Lachat C, De Boevre M. Associating multiple mycotoxin exposure and health outcomes: current statistical approaches and challenges. World Mycotoxin J 2022. [DOI: 10.3920/wmj2022.2784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
Mycotoxin contamination is a global challenge to food safety and population health. A diversity of adverse effects in human health such as organ damage, immunity disorders and carcinogenesis are attributed to acute and chronic exposure to mycotoxins. While there is a high likelihood of mycotoxin co-occurrence in the daily diet, multiple mycotoxin exposures represent a considerable challenge in understanding the accumulative effects of groups of exposures on health outcomes. Nevertheless, previous studies on mycotoxin exposure-health outcome associations have focused on a single or a limited number of exposures. To guide multi-exposure assessment, careful considerations of statistical approaches available are required. In addition, the issue of multicollinearity in high-dimensional settings of multiple exposure analysis underlies the controversy surrounding the reliability and consistency of statistical conclusions about the exposure-health outcome associations. Conventional approaches such as generalised linear regressions (GLR) in conjunction with regularisation methods, including ridge regression, lasso and elastic net, offer some clear advantages in terms of results’ interpretation and model selection. However, when highly-correlated variables are observed, these methods have shown a low specificity in variable selection. Principal component analysis (PCA) that has been widely used as a dimensionality reduction technique also has the limitation to identify important predictor variables as this approach may overlook the associations between certain components and health outcomes. Recently, some alternative approaches have been introduced to address the issues of high dimensionality and highly-correlated data in the context of epidemiological and environmental research. Two of the noticeable approaches are weighted quantile sum regression (WQSR) and Bayesian kernel machine regression (BKMR). 
Combining different methods of inference allows us to interpret the role of certain exposures, their interactions and the combined effects on human health under diverse statistical perspectives, which ultimately facilitate the construction of the toxicological profile of multiple mycotoxins’ exposure.
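As a rough illustration of the weighted quantile sum idea named in the abstract, the sketch below scores each exposure by quartile and combines the scores with non-negative weights that sum to one. The exposure names, the data and the weights are hypothetical. In WQSR itself the weights are estimated from the data rather than fixed by hand, so this shows only how the index is assembled, not the authors' analysis.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_scores(values):
    """Score each value 0..3 according to the quartile of its own sample."""
    cuts = quantiles(values, n=4)  # three cut points between the quartiles
    return [bisect_right(cuts, v) for v in values]


def wqs_index(exposures, weights):
    """Weighted sum of quartile scores, one index value per subject.

    `exposures` maps an exposure name to its measurements (one per subject);
    `weights` maps the same names to non-negative weights summing to 1.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    scored = {name: quartile_scores(vals) for name, vals in exposures.items()}
    n_subjects = len(next(iter(scored.values())))
    return [
        sum(weights[name] * scored[name][i] for name in scored)
        for i in range(n_subjects)
    ]


# Hypothetical data: two exposures measured on eight subjects.
exposures = {
    "exposure_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "exposure_b": [8, 7, 6, 5, 4, 3, 2, 1],
}
weights = {"exposure_a": 0.75, "exposure_b": 0.25}  # assumed, not estimated
index = wqs_index(exposures, weights)
```

The resulting index would then enter an ordinary regression as a single covariate, which is how the approach sidesteps collinearity among the individual exposures.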
Affiliation(s)
- N.N. Truong
  - Center of Excellence in Mycotoxicology and Public Health, Faculty of Pharmaceutical Sciences, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
- K. Tesfamariam
  - Center of Excellence in Mycotoxicology and Public Health, Faculty of Pharmaceutical Sciences, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
  - Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
  - Department of Public Health, College of Medicine and Health Sciences, Ambo University, Ambo, Ethiopia
  - Department of Population and Family Health, Institute of Health, Jimma University, Jimma, Ethiopia
- L. Visintin
  - Center of Excellence in Mycotoxicology and Public Health, Faculty of Pharmaceutical Sciences, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
- T. Goessens
  - Center of Excellence in Mycotoxicology and Public Health, Faculty of Pharmaceutical Sciences, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
- S. De Saeger
  - Center of Excellence in Mycotoxicology and Public Health, Faculty of Pharmaceutical Sciences, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
  - Department of Biotechnology and Food Technology, Faculty of Science, University of Johannesburg, P.O. Box 17011, Doornfontein Campus 2028, Gauteng, South Africa
- C. Lachat
  - Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- M. De Boevre
  - Center of Excellence in Mycotoxicology and Public Health, Faculty of Pharmaceutical Sciences, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
2
Vanhatalo J, Li Z, Sillanpää MJ. A Gaussian process model and Bayesian variable selection for mapping function-valued quantitative traits with incomplete phenotypic data. Bioinformatics 2020; 35:3684-3692. [PMID: 30850830 PMCID: PMC6761969 DOI: 10.1093/bioinformatics/btz164] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/15/2018] [Revised: 12/05/2018] [Accepted: 03/06/2019] [Indexed: 12/22/2022]
Abstract
Motivation: Recent advances in high dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes the quantitative trait locus (QTL) studies of function-valued traits such as those related to growth and development. Existing approaches for analyzing functional traits utilize either parametric methods or semi-parametric approaches based on splines and wavelets. However, very limited choices of software tools are currently available for practical implementation of functional QTL mapping and variable selection. Results: We propose a Bayesian Gaussian process (GP) approach for functional QTL mapping. We use GPs to model the continuously varying coefficients which describe how the effects of molecular markers on the quantitative trait are changing over time. We use an efficient gradient based algorithm to estimate the tuning parameters of GPs. Notably, the GP approach is directly applicable to the incomplete datasets having even larger than 50% missing data rate (among phenotypes). We further develop a stepwise algorithm to search through the model space in terms of genetic variants, and use a minimal increase of Bayesian posterior probability as a stopping rule to focus on only a small set of putative QTL. We also discuss the connection between GP and penalized B-splines and wavelets. On two simulated and three real datasets, our GP approach demonstrates great flexibility for modeling different types of phenotypic trajectories with low computational cost. The proposed model selection approach finds the most likely QTL reliably in tested datasets. Availability and implementation: Software and simulated data are available as a MATLAB package ‘GPQTLmapping’, and they can be downloaded from GitHub (https://github.com/jpvanhat/GPQTLmapping). Real datasets used in case studies are publicly available at QTL Archive. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jarno Vanhatalo
  - Department of Mathematics and Statistics and Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
- Zitong Li
  - CSIRO Agriculture & Food, GPO Box 1600, Canberra, ACT 2601, Australia
- Mikko J Sillanpää
  - Department of Mathematical Sciences, Biocenter Oulu and Infotech Oulu, University of Oulu, Oulu FI-90014, Finland
3
Kontio JAJ, Sillanpää MJ. Scalable Nonparametric Prescreening Method for Searching Higher-Order Genetic Interactions Underlying Quantitative Traits. Genetics 2019; 213:1209-1224. [PMID: 31585953 PMCID: PMC6893368 DOI: 10.1534/genetics.119.302658] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/23/2019] [Accepted: 09/27/2019] [Indexed: 02/07/2023]
Abstract
Gaussian process (GP)-based automatic relevance determination (ARD) is known to be an efficient technique for identifying determinants of gene-by-gene interactions important to trait variation. However, the estimation of GP models is feasible only for low-dimensional datasets (∼200 variables), which severely limits application of the GP-based ARD method for high-throughput sequencing data. In this paper, we provide a nonparametric prescreening method that preserves virtually all the major benefits of the GP-based ARD method and extends its scalability to the typical high-dimensional datasets used in practice. In several simulated test scenarios, the proposed method compared favorably with existing nonparametric dimension reduction/prescreening methods suitable for higher-order interaction searches. As a real-data example, the proposed method was applied to a high-throughput dataset downloaded from the cancer genome atlas (TCGA) with measured expression levels of 16,976 genes (after preprocessing) from patients diagnosed with acute myeloid leukemia.
Affiliation(s)
- Juho A J Kontio
  - Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Mikko J Sillanpää
  - Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
  - Infotech Oulu, University of Oulu, 90014, Finland
4
Ansarifar J, Wang L. New algorithms for detecting multi-effect and multi-way epistatic interactions. Bioinformatics 2019; 35:5078-5085. [DOI: 10.1093/bioinformatics/btz463] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/02/2019] [Revised: 04/14/2019] [Accepted: 05/31/2019] [Indexed: 11/14/2022]
Abstract
Motivation: Epistasis, which is the phenomenon of genetic interactions, plays a central role in many scientific discoveries. However, due to the combinatorial nature of the problem, it is extremely challenging to decipher the exact combinations of genes that trigger the epistatic effects. Many existing methods only focus on two-way interactions. Some of the most effective methods used machine learning techniques, but many were designed for special case-and-control studies or suffer from overfitting. We propose three new algorithms for multi-effect and multi-way epistases detection, with one guaranteeing global optimality and the other two being local optimization oriented heuristics. Results: The computational performance of the proposed heuristic algorithm was compared with several state-of-the-art methods using a yeast dataset. Results suggested that searching for the global optimal solution could be extremely time consuming, but the proposed heuristic algorithm was much more effective and efficient than others at finding a close-to-optimal solution. Moreover, it was able to provide biological insight on the exact configurations of epistases, besides achieving a higher prediction accuracy than the state-of-the-art methods. Availability and implementation: Data source was publicly available and details are provided in the text.
5
Ko YA, Mukherjee B, Smith JA, Kardia SL, Allison M, Diez Roux AV. Classification and Clustering Methods for Multiple Environmental Factors in Gene-Environment Interaction: Application to the Multi-Ethnic Study of Atherosclerosis. Epidemiology 2016; 27:870-8. [PMID: 27479650 PMCID: PMC5039086 DOI: 10.1097/ede.0000000000000548] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 02/06/2023]
Abstract
There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.
Affiliation(s)
- Yi-An Ko
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Jennifer A. Smith
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109, USA
- Sharon L.R. Kardia
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109, USA
- Matthew Allison
  - Department of Family Medicine and Public Health, University of California San Diego, La Jolla, CA 92093, USA
- Ana V. Diez Roux
  - Department of Epidemiology and Biostatistics, Dornsife School of Public Health at Drexel University, Philadelphia, PA 19104, USA
6
Fang Z, Kim I, Schaumont P. Flexible variable selection for recovering sparsity in nonadditive nonparametric models. Biometrics 2016; 72:1155-1163. [PMID: 27077330 DOI: 10.1111/biom.12518] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/01/2014] [Revised: 02/01/2016] [Accepted: 02/01/2016] [Indexed: 11/28/2022]
Abstract
Variable selection for recovering sparsity in nonadditive and nonparametric models with high-dimensional variables has been challenging. This problem becomes even more difficult due to complications in modeling unknown interaction terms among high-dimensional variables. There is currently no variable selection method to overcome these limitations. Hence, in this article we propose a variable selection approach that is developed by connecting a kernel machine with the nonparametric regression model. The advantages of our approach are that it can: (i) recover the sparsity; (ii) automatically model unknown and complicated interactions; (iii) connect with several existing approaches including linear nonnegative garrote and multiple kernel learning; and (iv) provide flexibility for both additive and nonadditive nonparametric models. Our approach can be viewed as a nonlinear version of a nonnegative garrote method. We model the smoothing function by a Least Squares Kernel Machine (LSKM) and construct the nonnegative garrote objective function as the function of the sparse scale parameters of kernel machine to recover sparsity of input variables whose relevances to the response are measured by the scale parameters. We also provide the asymptotic properties of our approach. We show that sparsistency is satisfied with consistent initial kernel function coefficients under certain conditions. An efficient coordinate descent/backfitting algorithm is developed. A resampling procedure for our variable selection methodology is also proposed to improve the power.
Affiliation(s)
- Zaili Fang
  - Department of Statistics, Virginia Tech., Blacksburg, Virginia, U.S.A
- Inyoung Kim
  - Department of Statistics, Virginia Tech., Blacksburg, Virginia, U.S.A
- Patrick Schaumont
  - Department of Electrical and Computer Engineering, Virginia Tech., Blacksburg, Virginia, U.S.A
7
Kessler DC, Hoff PD, Dunson DB. Marginally specified priors for non-parametric Bayesian estimation. J R Stat Soc Series B Stat Methodol 2015; 77:35-58. [PMID: 25663813 DOI: 10.1111/rssb.12059] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables.
8
Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, Godleski JJ, Coull BA. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics 2014; 16:493-508. [PMID: 25532525 DOI: 10.1093/biostatistics/kxu058] [Citation(s) in RCA: 935] [Impact Index Per Article: 93.5] [Received: 05/22/2014] [Accepted: 11/07/2014] [Indexed: 12/25/2022]
Abstract
Because humans are invariably exposed to complex chemical mixtures, estimating the health effects of multi-pollutant exposures is of critical concern in environmental epidemiology, and to regulatory agencies such as the U.S. Environmental Protection Agency. However, most health effects studies focus on single agents or consider simple two-way interaction models, in part because we lack the statistical methodology to more realistically capture the complexity of mixed exposures. We introduce Bayesian kernel machine regression (BKMR) as a new approach to study mixtures, in which the health outcome is regressed on a flexible function of the mixture (e.g. air pollution or toxic waste) components that is specified using a kernel function. In high-dimensional settings, a novel hierarchical variable selection approach is incorporated to identify important mixture components and account for the correlated structure of the mixture. Simulation studies demonstrate the success of BKMR in estimating the exposure-response function and in identifying the individual components of the mixture responsible for health effects. We demonstrate the features of the method through epidemiology and toxicology applications.
Affiliation(s)
- Jennifer F Bobb
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
- Linda Valeri
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
- Birgit Claus Henn
  - Department of Environmental Health, Harvard School of Public Health, Landmark Center, 401 Park Drive, Boston, MA 02215, USA
- David C Christiani
  - Department of Environmental Health, Harvard School of Public Health, 665 Huntington Avenue, Boston, MA 02115, USA
- Robert O Wright
  - Mount Sinai Hospital, 17 East 102 Street Floor 3, West Room D3-110, New York, NY 10029, USA
- Maitreyi Mazumdar
  - Department of Environmental Health, Harvard School of Public Health, 665 Huntington Avenue, Boston, MA 02115, USA
- John J Godleski
  - Department of Environmental Health, Harvard School of Public Health, Landmark Center, 401 Park Drive, Boston, MA 02215, USA
- Brent A Coull
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
9
Bhattacharya A, Pati D, Dunson D. Anisotropic function estimation using multi-bandwidth Gaussian processes. Ann Stat 2014; 42:352-381. [PMID: 25288827 DOI: 10.1214/13-aos1192] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
Abstract
In nonparametric regression problems involving multiple predictors, there is typically interest in estimating an anisotropic multivariate regression surface in the important predictors while discarding the unimportant ones. Our focus is on defining a Bayesian procedure that leads to the minimax optimal rate of posterior contraction (up to a log factor) adapting to the unknown dimension and anisotropic smoothness of the true surface. We propose such an approach based on a Gaussian process prior with dimension-specific scalings, which are assigned carefully-chosen hyperpriors. We additionally show that using a homogenous Gaussian process with a single bandwidth leads to a sub-optimal rate in anisotropic cases.
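To make the contrast between dimension-specific scalings and a single bandwidth concrete, here is a minimal sketch of a covariance function with one scale per predictor. The squared-exponential form is an assumption chosen for illustration, since the abstract does not state the kernel, and the function name and inputs are hypothetical.

```python
import math


def anisotropic_kernel(x, y, scales):
    """Squared-exponential covariance with one scale per predictor:
    k(x, y) = exp(-sum_j scales[j] * (x[j] - y[j]) ** 2).
    """
    if not (len(x) == len(y) == len(scales)):
        raise ValueError("x, y and scales must have the same length")
    return math.exp(-sum(a * (u - v) ** 2 for a, u, v in zip(scales, x, y)))


# Two points that differ only in the second predictor.
p, q = [0.0, 0.0], [0.0, 5.0]

# A zero scale switches the second predictor off, so the points look identical.
k_anisotropic = anisotropic_kernel(p, q, scales=[1.0, 0.0])

# A single shared bandwidth treats the gap in the second predictor as real.
k_single_bandwidth = anisotropic_kernel(p, q, scales=[1.0, 1.0])
```

With a zero scale the covariance equals 1 regardless of the gap in that coordinate, which is the sense in which per-dimension scales can discard unimportant predictors. A shared bandwidth cannot express that.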
10
de Los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MPL. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 2013; 193:327-45. [PMID: 22745228 PMCID: PMC3567727 DOI: 10.1534/genetics.112.143313] [Citation(s) in RCA: 489] [Impact Index Per Article: 44.5] [Received: 05/18/2012] [Accepted: 06/11/2012] [Indexed: 11/18/2022]
Abstract
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Affiliation(s)
- Gustavo de Los Campos
  - Department of Biostatistics, School of Public Health, University of Alabama, Birmingham, AL 35294, USA.
11
Abstract
In this article, we present a selective overview of some recent developments in Bayesian model and variable selection methods for high dimensional linear models. While most of the reviews in literature are based on conventional methods, we focus on recently developed methods, which have proven to be successful in dealing with high dimensional variable selection. First, we give a brief overview of the traditional model selection methods (viz. Mallow's Cp, AIC, BIC, DIC), followed by a discussion on some recently developed methods (viz. EBIC, regularization), which have occupied the minds of many statisticians. Then, we review high dimensional Bayesian methods with a particular emphasis on Bayesian regularization methods, which have been used extensively in recent years. We conclude by briefly addressing the asymptotic behaviors of Bayesian variable selection methods for high dimensional linear models under different regularity conditions.
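For orientation, two of the traditional criteria listed in the abstract can be computed directly from a fitted model's maximised log-likelihood. The sketch below uses the standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The two candidate models and their log-likelihood values are hypothetical.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for k parameters and log-likelihood log_lik."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion; n is the number of observations."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical fits on n = 100 observations: (log-likelihood, parameter count).
candidates = {"small_model": (-120.0, 3), "large_model": (-118.0, 6)}
n_obs = 100

scores = {
    name: {"aic": aic(ll, k), "bic": bic(ll, k, n_obs)}
    for name, (ll, k) in candidates.items()
}
# Lower is better for both criteria.
best_by_bic = min(scores, key=lambda name: scores[name]["bic"])
```

Because the BIC penalty grows with ln n, it favours smaller models more strongly than AIC whenever n exceeds about 7 (ln n > 2).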
Affiliation(s)
- Himel Mallick
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Nengjun Yi
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
12
Abstract
We propose a nested Gaussian process (nGP) as a locally adaptive prior for Bayesian nonparametric regression. Specified through a set of stochastic differential equations (SDEs), the nGP imposes a Gaussian process prior for the function's mth-order derivative. The nesting comes in through including a local instantaneous mean function, which is drawn from another Gaussian process inducing adaptivity to locally-varying smoothness. We discuss the support of the nGP prior in terms of the closure of a reproducing kernel Hilbert space, and consider theoretical properties of the posterior. The posterior mean under the nGP prior is shown to be equivalent to the minimizer of a nested penalized sum-of-squares involving penalties for both the global and local roughness of the function. Using highly-efficient Markov chain Monte Carlo for posterior inference, the proposed method performs well in simulation studies compared to several alternatives, and is scalable to massive data, illustrated through a proteomics application.
Affiliation(s)
- Bin Zhu
  - Tenure-Track Principal Investigator, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20852
- David B Dunson
  - Arts & Sciences Distinguished Professor, Department of Statistical Science, Duke University, Durham, NC 27708
13
Abstract
Many common human diseases and complex traits are highly heritable and influenced by multiple genetic and environmental factors. Although genome-wide association studies (GWAS) have successfully identified many disease-associated variants, these genetic variants explain only a small proportion of the heritability of most complex diseases. Genetic interactions (gene-gene and gene-environment) substantially contribute to complex traits and diseases and could be one of the main sources of the missing heritability. This paper provides an overview of the available statistical methods and related computer software for identifying genetic interactions in animal and plant experimental crosses and human genetic association studies. The main discussion falls under the three broad issues in statistical analysis of genetic interactions: the definition, detection and interpretation of genetic interactions. Recently developed methods based on modern techniques for high-dimensional data are reviewed, including penalized likelihood approaches and hierarchical models; the relationships between these methods are also discussed. I conclude this review by highlighting some areas of future research.
14
Abstract
Genomic data provide a valuable source of information for modeling covariance structures, allowing a more accurate prediction of total genetic values (GVs). We apply the kriging concept, originally developed in the geostatistical context for predictions in the low-dimensional space, to the high-dimensional space spanned by genomic single nucleotide polymorphism (SNP) vectors and study its properties in different gene-action scenarios. Two different kriging methods [“universal kriging” (UK) and “simple kriging” (SK)] are presented. As a novelty, we suggest use of the family of Matérn covariance functions to model the covariance structure of SNP vectors. A genomic best linear unbiased prediction (GBLUP) is applied as a reference method. The three approaches are compared in a whole-genome simulation study considering additive, additive-dominance, and epistatic gene-action models. Predictive performance is measured in terms of correlation between true and predicted GVs and average true GVs of the individuals ranked best by prediction. We show that UK outperforms GBLUP in the presence of dominance and epistatic effects. In a limiting case, it is shown that the genomic covariance structure proposed by VanRaden (2008) can be considered as a covariance function with corresponding quadratic variogram. We also prove theoretically that if a specific linear relationship exists between covariance matrices for two linear mixed models, the GVs resulting from BLUP are linked by a scaling factor. Finally, the relation of kriging to other models is discussed and further options for modeling the covariance structure, which might be more appropriate in the genomic context, are suggested.
15
Xu HM, Wei CS, Tang YT, Zhu ZH, Sima YF, Lou XY. A new mapping method for quantitative trait loci of silkworm. BMC Genet 2011; 12:19. [PMID: 21276233 PMCID: PMC3042969 DOI: 10.1186/1471-2156-12-19] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/05/2010] [Accepted: 01/28/2011] [Indexed: 02/08/2023]
Abstract
BACKGROUND: Silkworm is the basis of sericultural industry and the model organism in insect genetics study. Mapping quantitative trait loci (QTLs) underlying economically important traits of silkworm is of high significance for promoting the silkworm molecular breeding and advancing our knowledge on genetic architecture of the Lepidoptera. Yet, the currently used mapping methods are not well suitable for silkworm, because of ignoring the recombination difference in meiosis between two sexes. RESULTS: A mixed linear model including QTL main effects, epistatic effects, and QTL × sex interaction effects was proposed for mapping QTLs in an F2 population of silkworm. The number and positions of QTLs were determined by F-test and model selection. The Markov chain Monte Carlo (MCMC) algorithm was employed to estimate and test genetic effects of QTLs and QTL × sex interaction effects. The effectiveness of the model and statistical method was validated by a series of simulations. The results indicate that when markers are distributed sparsely on chromosomes, our method will substantially improve estimation accuracy as compared to the normal chiasmate F2 model. We also found that a sample size of hundreds was sufficiently large to unbiasedly estimate all the four types of epistases (i.e., additive-additive, additive-dominance, dominance-additive, and dominance-dominance) when the paired QTLs reside on different chromosomes in silkworm. CONCLUSION: The proposed method could accurately estimate not only the additive, dominance and digenic epistatic effects but also their interaction effects with sex, correcting the potential bias and precision loss in the current QTL mapping practice of silkworm and thus representing an important addition to the arsenal of QTL mapping tools.
Affiliation(s)
- Hai-Ming Xu
  - Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310029, China