1
Meher PK, Rustgi S, Kumar A. Performance of Bayesian and BLUP alphabets for genomic prediction: analysis, comparison and results. Heredity (Edinb) 2022; 128:519-530. [PMID: 35508540 DOI: 10.1038/s41437-022-00539-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/31/2021] [Revised: 04/19/2022] [Accepted: 04/19/2022] [Indexed: 11/09/2022] Open
Abstract
We evaluated the performances of three BLUP and five Bayesian methods for genomic prediction by using nine actual and 54 simulated datasets. The genomic prediction accuracy was measured using Pearson's correlation coefficient between the genomic estimated breeding value (GEBV) and the observed phenotypic data using a fivefold cross-validation approach with 100 replications. The Bayesian alphabets performed better for the traits governed by a few genes/QTLs with relatively larger effects. On the contrary, the BLUP alphabets (GBLUP and CBLUP) exhibited higher genomic prediction accuracy for the traits controlled by several small-effect QTLs. Additionally, Bayesian methods performed better for the highly heritable traits and, for other traits, performed at par with the BLUP methods. Further, genomic BLUP (GBLUP) was identified as the least biased method for the GEBV estimation. Among the Bayesian methods, the Bayesian ridge regression and Bayesian LASSO were less biased than other Bayesian alphabets. Nonetheless, genomic prediction accuracy increased with an increase in trait heritability, irrespective of the sample size, marker density, and the QTL type (major/minor effect). In sum, this study provides valuable information regarding the choice of the selection method for genomic prediction in different breeding programs.
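The evaluation scheme described in this abstract (replicated fivefold cross-validation, scored by Pearson's correlation between GEBV and observed phenotype) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the ridge-regression predictor, the penalty `lam`, the simulated data, and the choice to pool predictions across folds before correlating are all assumptions made for the example.

```python
import numpy as np

def ridge_gebv(Z_train, y_train, Z_test, lam):
    """Predict breeding values with ridge regression on centered markers.

    Assumption: a simple ridge (RR-BLUP-like) model stands in for the
    BLUP and Bayesian methods compared in the paper.
    """
    col_mean = Z_train.mean(axis=0)
    Zc = Z_train - col_mean
    mu = y_train.mean()
    n = Zc.shape[0]
    # Dual form: beta = Zc' (Zc Zc' + lam I)^-1 (y - mu); cheap when n << p.
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(n), y_train - mu)
    beta = Zc.T @ alpha
    return mu + (Z_test - col_mean) @ beta

def cv_accuracy(Z, y, k=5, reps=100, lam=1.0, seed=0):
    """Mean Pearson correlation between GEBV and phenotype over `reps`
    random k-fold splits (predictions pooled across folds; an assumption)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accs = []
    for _ in range(reps):
        folds = np.array_split(rng.permutation(n), k)
        gebv = np.empty(n)
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            gebv[test_idx] = ridge_gebv(Z[train_idx], y[train_idx],
                                        Z[test_idx], lam)
        accs.append(np.corrcoef(gebv, y)[0, 1])
    return float(np.mean(accs))

if __name__ == "__main__":
    # Hypothetical toy data: 120 individuals, 300 markers coded 0/1/2,
    # 10 markers with non-zero effects plus Gaussian noise.
    rng = np.random.default_rng(42)
    Z = rng.integers(0, 3, size=(120, 300)).astype(float)
    effects = np.zeros(300)
    effects[:10] = rng.normal(size=10)
    y = Z @ effects + rng.normal(scale=2.0, size=120)
    print("mean CV accuracy:", cv_accuracy(Z, y, reps=10, lam=50.0))
```

Swapping `ridge_gebv` for another predictor with the same signature would let the same loop compare methods on identical splits.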
Affiliation(s)
- Prabina Kumar Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-12, India
- Sachin Rustgi
  - Department of Plant and Environmental Sciences, Clemson University Pee Dee Research and Education Center, Darlington, SC, USA
- Anuj Kumar
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-12, India
2
Moura EG, Pamplona AKA, Balestre M. Functional models in genome-wide selection. PLoS One 2019; 14:e0222699. [PMID: 31644532 PMCID: PMC6808424 DOI: 10.1371/journal.pone.0222699] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/14/2018] [Accepted: 09/05/2019] [Indexed: 11/29/2022] Open
Abstract
The development of sequencing technologies has enabled the discovery of markers that are abundantly distributed over the whole genome. Knowledge about the marker locations in reference genomes provides further insights in the search for causal regions and the prediction of genomic values. The present study proposes a Bayesian functional approach for incorporating the marker locations into genomic analysis using stochastic methods to search causal regions and predict genotypic values. For this, three scenarios were analyzed: F2 population with 300 individuals and three different heritability levels (0.2, 0.5, and 0.8), along with 12,150 SNP markers that were distributed through ten linkage groups; F∞ populations with 320 individuals and three different heritability levels (0.2, 0.5, and 0.8), along with 10,020 SNP markers that were distributed through ten linkage groups; and data related to Eucalyptus spp. to measure the model performance in a real LD setting, with 611 individuals whose phenotypes were simulated from QTLs distributed through a panel of 36,812 SNPs with known positions. The performance of the proposed method was compared with those of other genome selection models, namely, RR-BLUP, Bayes B and Bayesian Lasso. The Bayesian functional model presented higher or similar predictive ability when compared with those classical regression methods in simulated and real scenarios on different LD structures. In general, the Bayesian functional model also achieved higher computational efficiency, using 12 SNPs per MCMC round. The model was efficient in the identification of causal regions and showed high flexibility of analysis, as it is easily adaptable to any genomic selection model.
Affiliation(s)
- Ernandes Guedes Moura
  - Federal Institute of Maranhão - Campus São João dos Patos, São João dos Patos, Maranhão, Brazil
- Marcio Balestre
  - Department of Statistics - Federal University of Lavras, Lavras, Minas Gerais, Brazil
3
Wang J, Zhou Z, Zhang Z, Li H, Liu D, Zhang Q, Bradbury PJ, Buckler ES, Zhang Z. Expanding the BLUP alphabet for genomic prediction adaptable to the genetic architectures of complex traits. Heredity (Edinb) 2018; 121:648-662. [PMID: 29765161 PMCID: PMC6221880 DOI: 10.1038/s41437-018-0075-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 01/11/2018] [Revised: 03/16/2018] [Accepted: 03/17/2018] [Indexed: 12/05/2022] Open
Abstract
Improvement of statistical methods is crucial for realizing the potential of increasingly dense genetic markers. Bayesian methods treat all markers as random effects, exhibit an advantage on dense markers, and offer the flexibility of using different priors. In contrast, genomic best linear unbiased prediction (gBLUP) is superior in computing speed, but only superior in prediction accuracy for extremely complex traits. Currently, the existing variety in the BLUP method is insufficient for adapting to new sequencing technologies and traits with different genetic architectures. In this study, we found two ways to change the kinship derivation in the BLUP method that improve prediction accuracy while maintaining the computational advantage. First, using the settlement under progressively exclusive relationship (SUPER) algorithm, we substituted all available markers with estimated quantitative trait nucleotides (QTNs) to derive kinship. Second, we compressed individuals into groups based on kinship, and then used the groups as random effects instead of individuals. The two methods were named SUPER BLUP (sBLUP) and compressed BLUP (cBLUP). Analyses on both simulated and real data demonstrated that these two methods offer flexibility for evaluating a variety of traits, covering a broadened realm of genetic architectures. For traits controlled by small numbers of genes, sBLUP outperforms Bayesian LASSO (least absolute shrinkage and selection operator). For traits with low heritability, cBLUP outperforms both gBLUP and Bayesian LASSO methods. We implemented these new BLUP alphabet series methods in an R package, Genome Association and Prediction Integrated Tool (GAPIT), available at http://zzlab.net/GAPIT.
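Since the abstract turns on how kinship is derived inside a BLUP model, a small sketch of a marker-based kinship matrix and a gBLUP-style prediction may help make that concrete. It is only an illustration under stated assumptions: the VanRaden-style scaling, the fixed heritability `h2`, and the hypothetical `selected` marker subset are not taken from the paper, and neither the SUPER algorithm nor the cBLUP compression step is reproduced here.

```python
import numpy as np

def vanraden_kinship(M):
    """Marker-based kinship from an n x p genotype matrix coded 0/1/2.

    Assumption: VanRaden-style centering and scaling; requires at least
    one polymorphic marker so the denominator is non-zero.
    """
    p = M.mean(axis=0) / 2.0                  # allele frequency per marker
    W = M - 2.0 * p                           # center each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))
    return W @ W.T / denom

def gblup_predict(G, y, train, test, h2):
    """Predict phenotypes of `test` individuals from `train` individuals.

    Uses lam = (1 - h2) / h2 as the variance ratio, with h2 supplied by
    the caller rather than estimated (a simplification).
    """
    lam = (1.0 - h2) / h2
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]            # train-by-train kinship
    G_vt = G[np.ix_(test, train)]             # test-by-train kinship
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G_vt @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    M = rng.integers(0, 3, size=(100, 500)).astype(float)
    y = M[:, :5] @ rng.normal(size=5) + rng.normal(size=100)
    train, test = np.arange(80), np.arange(80, 100)

    G_all = vanraden_kinship(M)               # kinship from all markers
    selected = np.arange(20)                  # hypothetical marker subset
    G_sub = vanraden_kinship(M[:, selected])  # kinship from the subset only

    for name, G in (("all markers", G_all), ("subset", G_sub)):
        pred = gblup_predict(G, y, train, test, h2=0.5)
        print(name, np.corrcoef(pred, y[test])[0, 1])
```

Only the matrix passed to `gblup_predict` changes between the two runs, which is the sense in which altering the kinship derivation leaves the rest of the BLUP machinery untouched.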
Affiliation(s)
- Jiabo Wang
  - Department of Animal Science and Technology, Northeast Agricultural University, Harbin, China
  - Institute of Animal Husbandry, Heilongjiang Academy of Agricultural Science, Harbin, China
  - Department of Crop and Soil Sciences, Washington State University, Pullman, Washington, USA
- Zhengkui Zhou
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhe Zhang
  - Guangdong Provincial Key Lab of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Hui Li
  - Department of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- Di Liu
  - Institute of Animal Husbandry, Heilongjiang Academy of Agricultural Science, Harbin, China
- Qin Zhang
  - Department of Animal Breeding and Genetics, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Peter J Bradbury
  - United States Department of Agriculture - Agricultural Research Service, Ithaca, New York, USA
- Edward S Buckler
  - United States Department of Agriculture - Agricultural Research Service, Ithaca, New York, USA
- Zhiwu Zhang
  - Department of Animal Science and Technology, Northeast Agricultural University, Harbin, China
  - Department of Crop and Soil Sciences, Washington State University, Pullman, Washington, USA
4
Wittenburg D, Liebscher V. An approximate Bayesian significance test for genomic evaluations. Biom J 2018; 60:1096-1109. [PMID: 30101421 PMCID: PMC6282823 DOI: 10.1002/bimj.201700219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/01/2017] [Revised: 03/06/2018] [Accepted: 04/10/2018] [Indexed: 11/12/2022]
Abstract
Genomic information can be used to study the genetic architecture of some trait. Not only the size of the genetic effect captured by molecular markers and their position on the genome but also the mode of inheritance, which might be additive or dominant, and the presence of interactions are interesting parameters. When searching for interacting loci, estimating the effect size and determining the significant marker pairs increases the computational burden in terms of speed and memory allocation dramatically. This study revisits a rapid Bayesian approach (fastbayes). As a novel contribution, a measure of evidence is derived to select markers with effect significantly different from zero. It is based on the credibility of the highest posterior density interval next to zero in a marginalized manner. This methodology is applied to simulated data resembling a dairy cattle population in order to verify the sensitivity of testing for a given range of type-I error levels. A real data application complements this study. Sensitivity and specificity of fastbayes were similar to a variational Bayesian method, and a further reduction of computing time could be achieved. More than 50% of the simulated causative variants were identified. The most complex model containing different kinds of genetic effects and their pairwise interactions yielded the best outcome over a range of type-I error levels. The validation study showed that fastbayes is a dual-purpose tool for genomic inferences - it is applicable to predict future outcome of not-yet phenotyped individuals with high precision as well as to estimate and test single-marker effects. Furthermore, it allows the estimation of billions of interaction effects.
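The idea of judging a marker effect by the credibility of the highest posterior density (HPD) interval that just excludes zero can be illustrated with posterior samples. The sketch below is a sample-based stand-in only: fastbayes derives its measure of evidence analytically, and the function names, the use of shortest sample intervals, and the example draws are assumptions for illustration.

```python
import math

def shortest_interval(xs_sorted, m):
    """Shortest interval covering m consecutive values of a sorted list."""
    start = min(range(len(xs_sorted) - m + 1),
                key=lambda j: xs_sorted[j + m - 1] - xs_sorted[j])
    return xs_sorted[start], xs_sorted[start + m - 1]

def hpd_interval(samples, cred=0.95):
    """Sample-based HPD interval: the shortest interval holding a fraction
    `cred` of the draws (reasonable for a unimodal posterior)."""
    xs = sorted(samples)
    m = min(len(xs), max(1, math.ceil(cred * len(xs))))
    return shortest_interval(xs, m)

def evidence_against_zero(samples):
    """Largest fraction of draws whose shortest interval excludes zero.

    Values near 1 suggest the effect is clearly away from zero; 0.0 means
    no interval built from these draws excludes zero.
    """
    xs = sorted(samples)
    n = len(xs)
    for m in range(n, 0, -1):          # try the widest coverage first
        lo, hi = shortest_interval(xs, m)
        if lo > 0 or hi < 0:
            return m / n
    return 0.0

if __name__ == "__main__":
    # Hypothetical posterior draws for a single marker effect.
    draws = [0.8, 1.1, 0.9, 1.4, -0.1, 1.0, 1.2, 0.7]
    print(hpd_interval(draws, cred=0.75))
    print(evidence_against_zero(draws))
```

A marker could then be flagged when this evidence exceeds one minus a chosen type-I error level, which mirrors the thresholding described in the abstract without claiming to match its exact statistic.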
Affiliation(s)
- Dörte Wittenburg
  - Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology, Wilhelm-Stahl-Allee 2, D-18196, Dummerstorf, Germany
- Volkmar Liebscher
  - Department of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Str. 47, D-17489, Greifswald, Germany
5
Chen C, Steibel JP, Tempelman RJ. Genome-Wide Association Analyses Based on Broadly Different Specifications for Prior Distributions, Genomic Windows, and Estimation Methods. Genetics 2017; 206:1791-1806. [PMID: 28637709 PMCID: PMC5560788 DOI: 10.1534/genetics.117.202259] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/26/2017] [Accepted: 06/19/2017] [Indexed: 11/18/2022] Open
Abstract
A currently popular strategy (EMMAX) for genome-wide association (GWA) analysis infers association for the specific marker of interest by treating its effect as fixed while treating all other marker effects as classical Gaussian random effects. It may be more statistically coherent to specify all markers as sharing the same prior distribution, whether that distribution is Gaussian, heavy-tailed (BayesA), or has variable selection specifications based on a mixture of, say, two Gaussian distributions [stochastic search and variable selection (SSVS)]. Furthermore, all such GWA inference should be formally based on posterior probabilities or test statistics as we present here, rather than merely being based on point estimates. We compared these three broad categories of priors within a simulation study to investigate the effects of different degrees of skewness for quantitative trait loci (QTL) effects and numbers of QTL using 43,266 SNP marker genotypes from 922 Duroc-Pietrain F2-cross pigs. Genomic regions were based either on single SNP associations, on nonoverlapping windows of various fixed sizes (0.5-3 Mb), or on adaptively determined windows that cluster the genome into blocks based on linkage disequilibrium. We found that SSVS and BayesA lead to the best receiver operating curve properties in almost all cases. We also evaluated approximate maximum a posteriori (MAP) approaches to BayesA and SSVS as potential computationally feasible alternatives; however, MAP inferences were not promising, particularly due to their sensitivity to starting values. We determined that it is advantageous to use variable selection specifications based on adaptively constructed genomic window lengths for GWA studies.
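The nonoverlapping fixed-size windows mentioned in this abstract, combined with inference based on posterior probabilities, can be sketched as below. This is an assumption-laden illustration rather than the authors' procedure: the window-level summary (the fraction of MCMC draws in which any SNP in the window is included), the function names, and the toy positions and indicator draws are all hypothetical.

```python
from collections import defaultdict

def assign_windows(positions_bp, window_bp=1_000_000):
    """Group SNP indices on one chromosome into nonoverlapping windows.

    Returns {window_index: [snp_index, ...]} where window_index is the
    integer part of position / window size.
    """
    windows = defaultdict(list)
    for snp, pos in enumerate(positions_bp):
        windows[pos // window_bp].append(snp)
    return dict(windows)

def window_inclusion_probability(indicator_draws, windows):
    """Fraction of draws in which at least one SNP of a window is included.

    `indicator_draws` is a list of per-draw 0/1 inclusion indicators, as a
    variable selection sampler might produce (an assumption here).
    """
    n_draws = len(indicator_draws)
    return {
        w: sum(any(draw[s] for s in snps) for draw in indicator_draws) / n_draws
        for w, snps in windows.items()
    }

if __name__ == "__main__":
    positions = [100, 900_000, 1_200_000, 2_500_000]   # base pairs
    draws = [
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    windows = assign_windows(positions, window_bp=1_000_000)
    print(windows)
    print(window_inclusion_probability(draws, windows))
```

Summarizing at the window level lets signal that is split across several SNPs in linkage disequilibrium count toward one region; adaptively sized windows would replace `assign_windows` with a grouping based on linkage disequilibrium blocks.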
Affiliation(s)
- Chunyu Chen
  - Department of Animal Science, Michigan State University, East Lansing, Michigan 48824
- Juan P Steibel
  - Department of Animal Science, Michigan State University, East Lansing, Michigan 48824
- Robert J Tempelman
  - Department of Animal Science, Michigan State University, East Lansing, Michigan 48824