Reference Citation Analysis

For: van den Berg S, Calus MPL, Meuwissen THE, Wientjes YCJ. Across population genomic prediction scenarios in which Bayesian variable selection outperforms GBLUP. BMC Genet 2015;16:146. [PMID: 26698836 PMCID: PMC4690391 DOI: 10.1186/s12863-015-0305-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 08/17/2015] [Accepted: 12/10/2015] [Indexed: 12/21/2022]

Cited by Other Article(s)

Ajasa AA, Boison SA, Gjøen HM, Lillehammer M. Accuracy of genomic prediction using multiple Atlantic salmon populations. Genet Sel Evol 2024;56:38. [PMID: 38750427 PMCID: PMC11094890 DOI: 10.1186/s12711-024-00907-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 05/06/2024] [Indexed: 05/19/2024]

Abstract

BACKGROUND

The accuracy of genomic prediction is partly determined by the size of the reference population. In Atlantic salmon breeding programs, four parallel populations often exist, thus offering the opportunity to increase the size of the reference set by combining these populations. By allowing a reduction in the number of records per population, multi-population prediction can potentially reduce cost and welfare issues related to the recording of traits, particularly for diseases. In this study, we evaluated the accuracy of multi- and across-population prediction of breeding values for resistance to amoebic gill disease (AGD) using all single nucleotide polymorphisms (SNPs) on a 55K chip or a selected subset of SNPs based on the signs of allele substitution effect estimates across populations, using both linear and nonlinear genomic prediction (GP) models in Atlantic salmon populations. In addition, we investigated genetic distance, genetic correlation estimated based on genomic relationships, and persistency of linkage disequilibrium (LD) phase across these populations.

RESULTS

The genetic distance between populations ranged from 0.03 to 0.07, while the genetic correlation ranged from 0.19 to 0.99. Nonetheless, compared to within-population prediction, there was limited or no impact of combining populations for multi-population prediction across the various models used or when using the selected subset of SNPs. The estimates of across-population prediction accuracy were low and to some extent proportional to the genetic correlation estimates. The persistency of LD phase between adjacent markers across populations using all SNP data ranged from 0.51 to 0.65, indicating that LD is poorly conserved across the studied populations.

CONCLUSIONS

Our results show that a high genetic correlation and a high genetic relationship between populations do not guarantee a higher prediction accuracy from multi-population genomic prediction in Atlantic salmon.

Yin C, Shi H, Zhou P, Wang Y, Tao X, Yin Z, Zhang X, Liu Y. Genomic Prediction of Growth Traits in Yorkshire Pigs of Different Reference Group Sizes Using Different Estimated Breeding Value Models. Animals (Basel) 2024;14:1098. [PMID: 38612337 PMCID: PMC11010886 DOI: 10.3390/ani14071098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 03/31/2024] [Accepted: 04/02/2024] [Indexed: 04/14/2024]

Abstract

The need for sufficient reference population data poses a significant challenge in breeding programs aimed at improving pig farming on a small to medium scale. To overcome this hurdle, investigating the advantages of combining reference populations of varying sizes is crucial for enhancing the accuracy of the genomic estimated breeding value (GEBV). Genomic selection (GS) in populations with limited reference data can be optimized by combining populations of the same breed or related breeds. This study focused on understanding the effect of combining different reference group sizes on the accuracy of GS for determining the growth effectiveness and percentage of lean meat in Yorkshire pigs. Specifically, our study investigated two important traits: the age at 100 kg live weight (AGE100) and the backfat thickness at 100 kg live weight (BF100). This research assessed the efficiency of genomic prediction (GP) using different GEBV models across three Yorkshire populations with varying genetic backgrounds. The GeneSeek 50K GGP porcine high-density array was used for genotyping. A total of 2295 Yorkshire pigs were included, representing three Yorkshire pig populations with different genetic backgrounds: 295 from Danish (small) lines from Huaibei City, Anhui Province, 500 from Canadian (medium) lines from Lixin County, Anhui Province, and 1500 from American (large) lines from Shanghai. To evaluate the impact of different population combination scenarios on the GS accuracy, three approaches were explored: (1) combining all three populations for prediction, (2) combining two populations to predict the third, and (3) predicting each population independently. Five GEBV models, including three Bayesian models (BayesA, BayesB, and BayesC), the genomic best linear unbiased prediction (GBLUP) model, and single-step GBLUP (ssGBLUP) were implemented through 20 repetitions of five-fold cross-validation (CV). The results indicate that predicting one target population using the other two populations yielded the highest accuracy, providing a novel approach for improving the genomic selection accuracy in Yorkshire pigs. In this study, it was found that using different populations of the same breed to predict small- and medium-sized herds might be effective in improving the GEBV. This investigation highlights the significance of incorporating population combinations in genetic models for predicting the breeding value, particularly for pig farmers confronted with resource limitations.
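The "20 repetitions of five-fold cross-validation" design mentioned in this abstract can be sketched as follows. This is only an illustration of the splitting scheme, not the authors' pipeline: the function name `repeated_kfold`, the seed, and the interleaved fold assignment are assumptions, and no prediction model is fitted here.

```python
import random


def repeated_kfold(n_animals, n_folds=5, n_repeats=20, seed=0):
    """Yield (repeat, fold, training_ids, validation_ids) tuples.

    Hypothetical helper: each repetition reshuffles the animals and
    deals them into n_folds disjoint validation sets.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        order = list(range(n_animals))
        rng.shuffle(order)
        for fold in range(n_folds):
            # Every n_folds-th animal of the shuffled order forms one fold.
            validation = order[fold::n_folds]
            held_out = set(validation)
            training = [i for i in order if i not in held_out]
            yield rep, fold, training, validation


# 2295 animals, as in the abstract: 20 x 5 = 100 training/validation splits.
splits = list(repeated_kfold(2295))
print(len(splits), len(splits[0][2]), len(splits[0][3]))
```

In such a design, a GEBV model would be trained on `training` and its accuracy evaluated on `validation` for each of the 100 splits, with results averaged over folds and repetitions.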

Cai W, Hu J, Fan W, Xu Y, Tang J, Xie M, Zhang Y, Guo Z, Zhou Z, Hou S. Strategies to improve genomic predictions for 35 duck carcass traits in an F₂ population. J Anim Sci Biotechnol 2023;14:74. [PMID: 37147656 PMCID: PMC10163724 DOI: 10.1186/s40104-023-00875-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 04/02/2023] [Indexed: 05/07/2023]

Abstract

BACKGROUND

Carcass traits are crucial for broiler ducks, but carcass traits can only be measured postmortem. Genomic selection (GS) is an effective approach in animal breeding to improve selection and reduce costs. However, the performance of genomic prediction in duck carcass traits remains largely unknown.

RESULTS

In this study, we estimated the genetic parameters, performed GS using different models and marker densities, and compared the estimation performance between GS and conventional BLUP on 35 carcass traits in an F₂ population of ducks. Most of the cut weight traits and intestine length traits were estimated to have high and moderate heritabilities, respectively, while the heritabilities of percentage slaughter traits were dynamic. The reliability of genomic prediction using GBLUP increased by an average of 0.06 compared to the conventional BLUP method. The permutation studies revealed that 50K markers achieved ideal prediction reliability, while 3K markers still achieved 90.7% predictive capability, which would further reduce the cost for duck carcass traits. The genomic relationship matrix normalized by our true variance method instead of the widely used [Formula: see text] could achieve an increase in prediction reliability in most traits. We found that most of the Bayesian models performed better, especially BayesN. Compared to GBLUP, BayesN can further improve the predictive reliability by an average of 0.06 for duck carcass traits.

CONCLUSION

This study demonstrates that genomic selection for duck carcass traits is promising. Genomic prediction can be further improved by modifying the genomic relationship matrix using our proposed true variance method and several Bayesian models. The permutation study provides a theoretical basis for using low-density arrays to reduce genotyping costs in duck genomic selection.

Vu NT, Phuc TH, Nguyen NH, Van Sang N. Effects of common full-sib families on accuracy of genomic prediction for tagging weight in striped catfish Pangasianodon hypophthalmus. Front Genet 2023;13:1081246. [PMID: 36685869 PMCID: PMC9845282 DOI: 10.3389/fgene.2022.1081246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 12/06/2022] [Indexed: 01/06/2023]

Nazzicari N, Biscarini F. Stacked kinship CNN vs. GBLUP for genomic predictions of additive and complex continuous phenotypes. Sci Rep 2022;12:19889. [PMID: 36400808 PMCID: PMC9674857 DOI: 10.1038/s41598-022-24405-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/05/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022]

Vela-Avitúa S, Thorland I, Bakopoulos V, Papanna K, Dimitroglou A, Kottaras E, Leonidas P, Guinand B, Tsigenopoulos CS, Aslam ML. Genetic Basis for Resistance Against Viral Nervous Necrosis: GWAS and Potential of Genomic Prediction Explored in Farmed European Sea Bass (Dicentrarchus labrax). Front Genet 2022;13:804584. [PMID: 35401661 PMCID: PMC8992836 DOI: 10.3389/fgene.2022.804584] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/29/2021] [Accepted: 02/22/2022] [Indexed: 11/13/2022]

Marjanovic J, Calus MPL. Factors affecting accuracy of estimated effective number of chromosome segments for numerically small breeds. J Anim Breed Genet 2021;138:151-160. [PMID: 33040409 PMCID: PMC7891385 DOI: 10.1111/jbg.12512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/02/2020] [Revised: 08/25/2020] [Accepted: 09/12/2020] [Indexed: 11/28/2022]

Abstract

For numerically small breeds, obtaining a sufficiently large breed-specific reference population for genomic prediction is challenging or simply not possible, but may be overcome by adding individuals from another breed. To prioritize among available breeds, the effective number of chromosome segments (Me) can be used as an indicator of relatedness between individuals from different breeds. The Me is also an important parameter in determining the accuracy of genomic prediction. The Me can be estimated both within a population and between two populations or breeds, as the reciprocal of the variance of genomic relationships. However, the threshold for number of individuals needed to accurately estimate within or between populations Me is currently unknown. It is also unknown if a discrepancy in number of genotyped individuals in two breeds affects the estimates of Me between populations. In this study, we conducted a simulation that mimics current domestic cattle populations in order to investigate how estimated Me is affected by number of genotyped individuals, single-nucleotide polymorphism (SNP) density and pedigree availability. Our results show that a small sample of 10 genotyped individuals may result in substantial over- or underestimation of Me. While estimates of within population Me were hardly affected by SNP density, between population Me values were highly dependent on the number of available SNPs, with higher SNP densities being able to detect more independent chromosome segments. When subtracting pedigree from genomic relationships before computing Me, estimates of within population Me were three to four times higher than estimates with genotypes only; however, between Me estimates remained the same. For accurate estimation of within and between population Me, at least 50 individuals should be genotyped per population. Estimates of within Me were highly affected by whether pedigree was used or not. For within Me, even the smallest SNP density (~11k) resulted in accurate representation of family relationships in the population; however, for between Me, many more markers are needed to capture all independent segments.
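The estimator described in this abstract, Me as the reciprocal of the variance of genomic relationships (optionally after subtracting pedigree relationships), can be illustrated with a minimal sketch. The function name `effective_segments`, the use of the population variance, and the toy relationship values are assumptions made for illustration and are not taken from the paper.

```python
from statistics import pvariance


def effective_segments(g_values, a_values=None):
    """Estimate Me as 1 / Var(G), or 1 / Var(G - A) when pedigree is given.

    g_values: genomic relationships between pairs of individuals.
    a_values: matching pedigree relationships (optional).
    Hypothetical helper; the population variance is an assumed choice.
    """
    if a_values is None:
        deviations = list(g_values)
    else:
        if len(a_values) != len(g_values):
            raise ValueError("g_values and a_values must have equal length")
        deviations = [g - a for g, a in zip(g_values, a_values)]
    variance = pvariance(deviations)
    if variance == 0:
        raise ValueError("relationships have zero variance; Me is undefined")
    return 1.0 / variance


# Toy pairwise relationships (made-up numbers, two close and two distant pairs).
g = [0.30, 0.05, 0.27, 0.02]
a = [0.25, 0.00, 0.25, 0.00]
print(effective_segments(g))      # genotypes only
print(effective_segments(g, a))   # pedigree subtracted first
```

In this toy example the pedigree explains most of the spread in the genomic relationships, so subtracting it shrinks the variance and raises the Me estimate, which is the direction of the within-population effect reported in the abstract.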

Estimation of Molecular Pairwise Relatedness in Autopolyploid Crops. G3-GENES GENOMES GENETICS 2020;10:4579-4589. [PMID: 33051262 PMCID: PMC7718764 DOI: 10.1534/g3.120.401669] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]

Ren D, An L, Li B, Qiao L, Liu W. Efficient weighting methods for genomic best linear-unbiased prediction (BLUP) adapted to the genetic architectures of quantitative traits. Heredity (Edinb) 2020;126:320-334. [PMID: 32980863 DOI: 10.1038/s41437-020-00372-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/21/2020] [Revised: 09/12/2020] [Accepted: 09/13/2020] [Indexed: 11/09/2022]

Abstract

Genomic best linear-unbiased prediction (GBLUP) assumes equal variance for all marker effects, which is suitable for traits that conform to the infinitesimal model. For traits controlled by major genes, Bayesian methods with shrinkage priors or genome-wide association study (GWAS) methods can be used to identify causal variants effectively. The information from Bayesian/GWAS methods can be used to construct the weighted genomic relationship matrix (G). However, it remains unclear which methods perform best for traits varying in genetic architecture. Therefore, we developed several methods to optimize the performance of weighted GBLUP and compared them with other available methods using simulated and real data sets. First, two types of methods (marker effects with local shrinkage or normal prior) were used to obtain test statistics and estimates for each marker effect. Second, three weighted G matrices were constructed based on the marker information from the first step: (1) the genomic-feature-weighted G, (2) the estimated marker-variance-weighted G, and (3) the absolute value of the estimated marker-effect-weighted G. Following the above process, six different weighted GBLUP methods (local shrinkage/normal-prior GF/EV/AEWGBLUP) were proposed for genomic prediction. Analyses with both simulated and real data demonstrated that these options offer flexibility for optimizing the weighted GBLUP for traits with a broad spectrum of genetic architectures. The advantage of weighting methods over GBLUP in terms of accuracy was trait dependent, ranging from 14.8% to marginal for simulated traits and from 44% to marginal for real traits. Local-shrinkage prior EVWGBLUP is superior for traits mainly controlled by loci of a large effect. Normal-prior AEWGBLUP performs well for traits mainly controlled by loci of moderate effect. For traits controlled by some loci with large effects (explaining 25-50% of the genetic variance) and a range of loci with small effects, GFWGBLUP has advantages. In conclusion, the optimal weighted GBLUP method for genomic selection should take both the genetic architecture and number of QTLs of traits into consideration carefully.

Ye S, Song H, Ding X, Zhang Z, Li J. Pre-selecting markers based on fixation index scores improved the power of genomic evaluations in a combined Yorkshire pig population. Animal 2020;14:1555-1564. [PMID: 32209149 DOI: 10.1017/s1751731120000506] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022]

Abstract

Combining different swine populations in genomic prediction can be an important tool, leading to an increased accuracy of genomic prediction using single nucleotide polymorphism (SNP) chip data compared with within-population genomic prediction. However, the expected higher accuracy of multi-population genomic prediction has not been realized. This may be due to an inconsistent linkage disequilibrium (LD) between SNPs and quantitative trait loci (QTL) across populations, and the weak genetic relationships across populations. In this study, we determined the impact of different genomic relationship matrices, SNP density and pre-selected variants on prediction accuracy using a combined Yorkshire pig population. Our objective was to provide useful strategies for improving the accuracy of genomic prediction within a combined population. Results showed that the accuracy of genomic best linear unbiased prediction (GBLUP) using imputed whole-genome sequencing (WGS) data in the combined population was always higher than that within populations. Furthermore, the use of imputed WGS data always resulted in a higher accuracy of GBLUP than the use of 80K chip data for the combined population. Additionally, the accuracy of GBLUP with a non-linear genomic relationship matrix was markedly increased (0.87% to 15.17% for 80K chip data, and 0.43% to 4.01% for imputed WGS data) compared with that obtained with a linear genomic relationship matrix, except for the prediction of XD population in the combined population using imputed WGS data. More importantly, the application of pre-selected variants based on fixation index (Fst) scores improved the accuracy of multi-population genomic prediction, especially for 80K chip data. For BLUP|GA (BLUP approach given the genetic architecture), the use of a linear method with an appropriate weight to build a weight-relatedness matrix led to a higher prediction accuracy compared with the use of only pre-selected SNPs for genomic evaluations, especially for the total number of piglets born. However, for the non-linear method, BLUP|GA showed only a small increase or even a decrease in prediction accuracy compared with the use of only pre-selected SNPs. Overall, the best genomic evaluation strategy for reproduction-related traits for a combined population was found to be GBLUP performed with a non-linear genomic relationship matrix using variants pre-selected from the 80K chip data based on Fst scores.

van den Berg I, Meuwissen THE, MacLeod IM, Goddard ME. Predicting the effect of reference population on the accuracy of within, across, and multibreed genomic prediction. J Dairy Sci 2019;102:3155-3174. [PMID: 30738664 DOI: 10.3168/jds.2018-15231] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/18/2018] [Accepted: 12/08/2018] [Indexed: 01/24/2023]

Abstract

Genomic prediction is widely used to select candidates for breeding. Size and composition of the reference population are important factors influencing prediction accuracy. In Holstein dairy cattle, large reference populations are used, but this is difficult to achieve in numerically small breeds and for traits that are not routinely recorded. The prediction accuracy is usually estimated using cross-validation, requiring the full data set. It would be useful to have a method to predict the benefit of multibreed reference populations that does not require the availability of the full data set. Our objective was to study the effect of the size and breed composition of the reference population on the accuracy of genomic prediction using genomic BLUP and Bayes R. We also examined the effect of trait heritability and validation breed on prediction accuracy. Using these empirical results, we investigated the use of a formula to predict the effect of the size and composition of the reference population on the accuracy of genomic prediction. Phenotypes were simulated in a data set containing real genotypes of imputed sequence variants for 22,752 dairy bulls and cows, including Holstein, Jersey, Red Holstein, and Australian Red cattle. Different reference populations were constructed, varying in size and composition, to study within-breed, multibreed, and across-breed prediction. Phenotypes were simulated varying in heritability, number of chromosomes, and number of quantitative trait loci. Genomic prediction was carried out using genomic BLUP and Bayes R. We used either the genomic relationship matrix (GRM) to estimate the number of independent chromosomal segments and subsequently to predict accuracy, or the accuracies obtained from single-breed reference populations to predict the accuracies of larger or multibreed reference populations. 
Using the GRM overestimated the accuracy; this overestimation was likely due to close relationships among some of the reference animals. Consequently, the GRM could not be used to predict the accuracy of genomic prediction reliably. However, a method using the prediction accuracies obtained by cross-validation using a small, single-breed reference population predicted the accuracy using a multibreed reference population well and slightly overestimated the accuracy for a larger reference population of the same breed, but gave a reasonably close estimate of the accuracy for a multibreed reference population. This method could be useful for making decisions regarding the size and composition of the reference population.

Zhang C, Kemp RA, Stothard P, Wang Z, Boddicker N, Krivushin K, Dekkers J, Plastow G. Genomic evaluation of feed efficiency component traits in Duroc pigs using 80K, 650K and whole-genome sequence variants. Genet Sel Evol 2018;50:14. [PMID: 29625549 PMCID: PMC5889553 DOI: 10.1186/s12711-018-0387-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/01/2017] [Accepted: 03/27/2018] [Indexed: 12/30/2022]

Abstract

BACKGROUND

Increasing marker density was proposed to have potential to improve the accuracy of genomic prediction for quantitative traits; whole-sequence data is expected to give the best accuracy of prediction, since all causal mutations that underlie a trait are expected to be included. However, in cattle and chicken, this assumption is not supported by empirical studies. Our objective was to compare the accuracy of genomic prediction of feed efficiency component traits in Duroc pigs using single nucleotide polymorphism (SNP) panels of 80K, imputed 650K, and whole-genome sequence variants using GBLUP, BayesB and BayesRC methods, with the ultimate purpose to determine the optimal method to increase genetic gain for feed efficiency in pigs.

RESULTS

Phenotypes of average daily feed intake (ADFI), average daily gain (ADG), ultrasound backfat depth (FAT), and loin muscle depth (LMD) were available for 1363 Duroc boars from a commercial breeding program. Genotype imputation accuracies reached 92.1% from 80K to 650K and 85.6% from 650K to whole-genome sequence variants. Average accuracies across methods and marker densities of genomic prediction of ADFI, FAT, LMD and ADG were 0.40, 0.65, 0.30 and 0.15, respectively. For ADFI and FAT, BayesB outperformed GBLUP, but increasing marker density had little advantage for genomic prediction. For ADG and LMD, GBLUP outperformed BayesB, while BayesRC based on whole-genome sequence data gave the best accuracies and reached up to 0.35 for LMD and 0.25 for ADG.

CONCLUSIONS

Use of genomic information was beneficial for prediction of ADFI and FAT but not for that of ADG and LMD compared to pedigree-based estimates. BayesB based on 80K SNPs gave the best genomic prediction accuracy for ADFI and FAT, while BayesRC based on whole-genome sequence data performed best for ADG and LMD. We suggest that these differences between traits in the effect of marker density and method on accuracy of genomic prediction are mainly due to the underlying genetic architecture of the traits.

Calus MPL, Goddard ME, Wientjes YCJ, Bowman PJ, Hayes BJ. Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection. J Dairy Sci 2018;101:4279-4294. [PMID: 29550121 DOI: 10.3168/jds.2017-13366] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/20/2017] [Accepted: 01/04/2018] [Indexed: 11/19/2022]

Abstract

Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. 
The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, is likely the result of the observed admixture between both breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.

Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. I: Multivariate Gaussian priors for marker effects and derivation of the joint probability mass function of genotypes. J Theor Biol 2017;417:8-19. [PMID: 28043819 DOI: 10.1016/j.jtbi.2016.12.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/09/2016] [Revised: 11/30/2016] [Accepted: 12/28/2016] [Indexed: 11/23/2022]

Abstract

It is important to consider heterogeneity of marker effects and allelic frequencies in across population genome-wide prediction studies. Moreover, all regression models used in genome-wide prediction overlook randomness of genotypes. In this study, a family of hierarchical Bayesian models to perform across population genome-wide prediction modeling genotypes as random variables and allowing population-specific effects for each marker was developed. Models shared a common structure and differed in the priors used and the assumption about residual variances (homogeneous or heterogeneous). Randomness of genotypes was accounted for by deriving the joint probability mass function of marker genotypes conditional on allelic frequencies and pedigree information. As a consequence, these models incorporated kinship and genotypic information that made it possible not only to account for heterogeneity of allelic frequencies, but also to include individuals with missing genotypes at some or all loci without the need for previous imputation. This was possible because the non-observed fraction of the design matrix was treated as an unknown model parameter. For each model, a simpler version ignoring population structure, but still accounting for randomness of genotypes was proposed. Implementation of these models and computation of some criteria for model comparison were illustrated using two simulated datasets. Theoretical and computational issues along with possible applications, extensions and refinements were discussed. Some features of the models developed in this study make them promising for genome-wide prediction; the use of information contained in the probability distribution of genotypes is perhaps the most appealing. Further studies to assess the performance of the models proposed here and also to compare them with conventional models used in genome-wide prediction are needed.

Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. II: Multivariate spike and slab priors for marker effects and derivation of approximate Bayes and fractional Bayes factors for the complete family of models. J Theor Biol 2017;417:131-141. [PMID: 28088357 DOI: 10.1016/j.jtbi.2016.12.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/09/2016] [Revised: 11/30/2016] [Accepted: 12/28/2016] [Indexed: 11/22/2022]

Abstract

This study is the second part of a companion paper on the development of Bayesian multiple regression models that account for randomness of genotypes in across-population genome-wide prediction. This family of models considers heterogeneous and correlated marker effects and allelic frequencies across populations, and can use records from non-genotyped individuals and from individuals with missing genotypes in any subset of loci without the need for previous imputation, taking into account uncertainty about imputed genotypes. This paper extends the family by considering multivariate spike and slab conditional priors for marker allele substitution effects, and derives approximate Bayes factors and fractional Bayes factors to compare the models from part I and those developed here with their null versions. These null versions correspond to simpler models that ignore heterogeneity of populations but still account for randomness of genotypes. For each marker locus, the spike component of the prior was a point mass at 0 in R^S, where S is the number of populations, and the slab component was an S-variate Gaussian distribution; independent conditional priors were assumed. For the Gaussian components, covariance matrices were assumed to be either the same for all markers or different for each marker. For null models, the priors were simply univariate versions of these finite mixture distributions. Approximate algebraic expressions for Bayes factors and fractional Bayes factors were found using the Laplace approximation. Using the simulated datasets described in part I, these models were implemented and compared with the models derived in part I using measures of predictive performance based on squared Pearson correlations, Deviance Information Criterion, Bayes factors, and fractional Bayes factors. The extensions presented here enlarge our family of genome-wide prediction models, making it more flexible by offering more modeling options.
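
As an illustration of the prior described in this abstract, the following is a minimal sketch of drawing marker effects from a multivariate spike and slab mixture: each marker receives either the zero vector in R^S (spike) or a draw from an S-variate Gaussian (slab), independently across markers. The function names, the mixing probability `pi_slab`, and the covariance values are assumptions made for the example; the abstract does not give them, and this is not the authors' implementation.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def draw_marker_effects(n_markers, slab_cov, pi_slab, rng):
    """Draw one S-vector of effects per marker from a spike and slab prior.

    With probability 1 - pi_slab the whole vector is zero (spike);
    otherwise it is drawn from N(0, slab_cov) (slab).
    pi_slab is a hypothetical parameter, not a value from the abstract.
    """
    S = len(slab_cov)
    L = cholesky(slab_cov)
    effects = []
    for _ in range(n_markers):
        if rng.random() < pi_slab:
            # Slab: transform independent standard normals with the Cholesky factor.
            z = [rng.gauss(0.0, 1.0) for _ in range(S)]
            effects.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(S)])
        else:
            # Spike: point mass at the zero vector in R^S.
            effects.append([0.0] * S)
    return effects


rng = random.Random(1)
slab_cov = [[1.0, 0.6], [0.6, 1.0]]  # hypothetical covariance for S = 2 populations
effects = draw_marker_effects(1000, slab_cov, pi_slab=0.1, rng=rng)
n_nonzero = sum(any(e != 0.0 for e in row) for row in effects)
print(n_nonzero, "of", len(effects), "markers fall in the slab")
```

Here the same covariance matrix is used for all markers, which corresponds to one of the two options mentioned in the abstract; a marker-specific covariance would require passing one matrix per marker.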

Zhang X, Lourenco D, Aguilar I, Legarra A, Misztal I. Weighting Strategies for Single-Step Genomic BLUP: An Iterative Approach for Accurate Calculation of GEBV and GWAS. Front Genet 2016;7:151. [PMID: 27594861 PMCID: PMC4990542 DOI: 10.3389/fgene.2016.00151] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Received: 05/15/2016] [Accepted: 08/04/2016] [Indexed: 01/16/2023] Open

Abstract

Genomic Best Linear Unbiased Predictor (GBLUP) assumes equal variance for all single nucleotide polymorphisms (SNP). When traits are influenced by major SNP, Bayesian methods have the advantage of SNP selection. To overcome the limitation of GBLUP, unequal variance or weights for all SNP are applied in a method called weighted GBLUP (WGBLUP). If only a fraction of animals is genotyped, single-step WGBLUP (WssGBLUP) can be used. Default weights in WGBLUP or WssGBLUP are obtained iteratively based on single SNP effect squared (u²) and/or heterozygosity. When the weights are optimal, prediction accuracy and the ability to detect major SNP are maximized. The objective was to develop optimal weights for WGBLUP-based methods. We evaluated 5 new procedures that accounted for locus-specific or window-specific variance to maximize the accuracy of predicting genomic estimated breeding values (GEBV) and SNP effects. Simulated datasets consisted of phenotypes for 13,000 animals, including 1540 animals genotyped for 45,000 SNP. Scenarios with 5, 100, and 500 simulated quantitative trait loci (QTL) were considered. The 5 new procedures for SNP weighting were: (1) u² plus a constant equal to the weight of the top SNP; (2) from a heavy-tailed distribution (similar to BayesA); (3) for every 20 SNP in a window along the whole genome, the largest effect (u²) among them; (4) the mean effect of every 20 SNP; and (5) the summation of every 20 SNP. Those methods were compared to the default WssGBLUP, GBLUP, BayesB, and BayesC. WssGBLUP methods were evaluated over 10 iterations. The accuracy of predicting GEBV was the correlation between true and estimated genomic breeding values for 300 genotyped animals from the last generation. The ability to detect the simulated QTL was also investigated. For most of the QTL scenarios, the accuracies obtained with all WssGBLUP procedures were higher than those from BayesB and BayesC, partly due to automatic inclusion of parent average in the former. Manhattan plots had higher resolution with 5 and 100 QTL. Using a common weight for a window of 20 SNP that sums or averages the SNP variance enhances the accuracy of predicting GEBV and provides accurate estimation of marker effects.
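
Window-based procedures (3) to (5) from this abstract can be sketched as follows. This is a minimal illustration that assumes non-overlapping windows of consecutive SNP and unnormalized weights; the abstract does not state whether windows overlap or how weights are scaled, and the function name `window_weights` is hypothetical.

```python
def window_weights(snp_effects, window=20, how="mean"):
    """Assign every SNP in a window a common weight derived from squared effects (u^2).

    how="max"  -> largest u^2 in the window   (procedure 3)
    how="mean" -> mean u^2 in the window      (procedure 4)
    how="sum"  -> sum of u^2 in the window    (procedure 5)
    Windows are assumed to be non-overlapping; the last one may be shorter.
    """
    squared = [u * u for u in snp_effects]
    weights = []
    for start in range(0, len(squared), window):
        block = squared[start:start + window]
        if how == "max":
            w = max(block)
        elif how == "mean":
            w = sum(block) / len(block)
        elif how == "sum":
            w = sum(block)
        else:
            raise ValueError("how must be 'max', 'mean' or 'sum'")
        # Every SNP in the window shares the same weight.
        weights.extend([w] * len(block))
    return weights


toy_effects = [1, 2, 3, 4]  # toy SNP effects, window of 2 for readability
print(window_weights(toy_effects, window=2, how="max"))
print(window_weights(toy_effects, window=2, how="mean"))
print(window_weights(toy_effects, window=2, how="sum"))
```

In an iterative scheme such as the one the abstract describes, these weights would be recomputed from updated SNP effect estimates at each iteration; that loop is omitted here.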

Calus MPL, Bouwman AC, Schrooten C, Veerkamp RF. Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection. Genet Sel Evol 2016;48:49. [PMID: 27357580 PMCID: PMC4926307 DOI: 10.1186/s12711-016-0225-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 02/17/2016] [Accepted: 06/16/2016] [Indexed: 12/22/2022] Open

Abstract

BACKGROUND

Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable selection (BSSVS) model could overcome these issues. BSSVS is performed first on subsets of sequence-based variants and then on a merged dataset containing variants selected in the first step.

RESULTS

We used a dataset that included 4,154,064 variants after editing and de-regressed proofs for 3415 reference and 2138 validation bulls for somatic cell score, protein yield and interval first to last insemination. In the first step, BSSVS was performed on 106 subsets each containing ~39,189 variants. In the second step, between 1060 and 472,492 variants, selected from the first step, were included to estimate the accuracy of genomic prediction. Accuracies were at best equal to those achieved with the commonly used Bovine 50k-SNP chip, although the number of variants within a few well-known quantitative trait loci regions was considerably enriched. When variant selection and the final genomic prediction were performed on the same data, predictions were biased. Predictions computed as the average of the predictions computed for each subset achieved the highest accuracies, i.e. 0.5 to 1.1 % higher than the accuracies obtained with the 50k-SNP chip, and yielded the least biased predictions. Finally, the accuracy of genomic predictions obtained when all sequence-based variants were included was similar to or up to 1.4 % lower than that based on the average predictions across the subsets. By applying parallelization, the split-and-merge procedure was completed in 5 days, while the standard analysis including all sequence-based variants took more than three months.

CONCLUSIONS

The split-and-merge approach splits one large computational task into many much smaller ones, which allows the use of parallel processing and thus efficient genomic prediction based on whole-genome sequence data. The split-and-merge approach did not improve prediction accuracy, probably because we used data on a single breed for which relationships between individuals were high. Nevertheless, the split-and-merge approach may have potential for applications on data from multiple breeds.
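
The bookkeeping behind the split step and the averaged predictions can be sketched as below. This is only an illustration under stated assumptions: subsets are taken as contiguous index ranges (the abstract does not say how variants were assigned to subsets), the BSSVS model itself is not implemented, and the function names are hypothetical.

```python
def split_indices(n_variants, n_subsets):
    """Split variant indices 0..n_variants-1 into near-equal contiguous ranges.

    Returns a list of (start, stop) pairs; contiguity is an assumption of this sketch.
    """
    base, extra = divmod(n_variants, n_subsets)
    ranges = []
    start = 0
    for i in range(n_subsets):
        size = base + (1 if i < extra else 0)  # first `extra` subsets get one more variant
        ranges.append((start, start + size))
        start += size
    return ranges


def average_predictions(per_subset_predictions):
    """Average predictions across subsets, animal by animal.

    per_subset_predictions: one list of predicted values per subset,
    all in the same animal order.
    """
    n_subsets = len(per_subset_predictions)
    return [sum(values) / n_subsets for values in zip(*per_subset_predictions)]


# Figures from the abstract: 4,154,064 variants split into 106 subsets.
subsets = split_indices(4154064, 106)
sizes = [stop - start for start, stop in subsets]
print(len(subsets), min(sizes), max(sizes))

# Toy predictions from two subsets for two animals.
print(average_predictions([[1.0, 2.0], [3.0, 4.0]]))
```

Because each subset can be analysed independently, the per-subset runs are natural units for parallel processing, which is the source of the reduction in computing time reported in the abstract.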

An Equation to Predict the Accuracy of Genomic Values by Combining Data from Multiple Traits, Populations, or Environments. Genetics 2015;202:799-823. [PMID: 26637542 DOI: 10.1534/genetics.115.183269] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 09/30/2015] [Accepted: 11/27/2015] [Indexed: 11/18/2022] Open