1
Ajasa AA, Boison SA, Gjøen HM, Lillehammer M. Accuracy of genomic prediction using multiple Atlantic salmon populations. Genet Sel Evol 2024; 56:38. [PMID: 38750427 PMCID: PMC11094890 DOI: 10.1186/s12711-024-00907-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 05/06/2024] [Indexed: 05/19/2024] Open
Abstract
BACKGROUND The accuracy of genomic prediction is partly determined by the size of the reference population. In Atlantic salmon breeding programs, four parallel populations often exist, thus offering the opportunity to increase the size of the reference set by combining these populations. By allowing a reduction in the number of records per population, multi-population prediction can potentially reduce cost and welfare issues related to the recording of traits, particularly for diseases. In this study, we evaluated the accuracy of multi- and across-population prediction of breeding values for resistance to amoebic gill disease (AGD) using all single nucleotide polymorphisms (SNPs) on a 55K chip or a selected subset of SNPs based on the signs of allele substitution effect estimates across populations, using both linear and nonlinear genomic prediction (GP) models in Atlantic salmon populations. In addition, we investigated genetic distance, genetic correlation estimated based on genomic relationships, and persistency of linkage disequilibrium (LD) phase across these populations. RESULTS The genetic distance between populations ranged from 0.03 to 0.07, while the genetic correlation ranged from 0.19 to 0.99. Nonetheless, compared to within-population prediction, there was limited or no impact of combining populations for multi-population prediction across the various models used or when using the selected subset of SNPs. The estimates of across-population prediction accuracy were low and to some extent proportional to the genetic correlation estimates. The persistency of LD phase between adjacent markers across populations using all SNP data ranged from 0.51 to 0.65, indicating that LD is poorly conserved across the studied populations. CONCLUSIONS Our results show that a high genetic correlation and a high genetic relationship between populations do not guarantee a higher prediction accuracy from multi-population genomic prediction in Atlantic salmon.
Affiliation(s)
- Afees A Ajasa
- Nofima (Norwegian Institute of Food, Fisheries and Aquaculture Research), PO Box 210, 1431, Ås, Norway
- Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1430, Ås, Norway
- Hans M Gjøen
- Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1430, Ås, Norway
- Marie Lillehammer
- Nofima (Norwegian Institute of Food, Fisheries and Aquaculture Research), PO Box 210, 1431, Ås, Norway
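The abstract of entry 1 reports persistency of LD phase between adjacent markers across populations. As a rough illustration only, the statistic can be sketched as the correlation, across adjacent marker pairs, of the signed LD correlation r measured in each population. The function names, the use of genotype-dosage correlations as a stand-in for r, and the toy genotypes are all assumptions of this sketch, not the authors' pipeline.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def adjacent_r(genotypes):
    """Signed r between each pair of adjacent markers.

    genotypes: one row per individual, 0/1/2 allele counts in map order
    (hypothetical input layout). Dosage correlation is used as a proxy for LD r.
    """
    n_markers = len(genotypes[0])
    cols = [[ind[m] for ind in genotypes] for m in range(n_markers)]
    return [pearson(cols[m], cols[m + 1]) for m in range(n_markers - 1)]


def ld_phase_persistency(pop_a, pop_b):
    """Correlation of signed adjacent-marker r values between two populations."""
    return pearson(adjacent_r(pop_a), adjacent_r(pop_b))


# Toy data: 4 individuals x 4 markers (made up for illustration).
pop_a = [[0, 0, 2, 1], [1, 1, 1, 0], [2, 2, 0, 2], [1, 2, 1, 1]]
# Same animals with the allele coding flipped at the second marker,
# which reverses the LD phase of the pairs involving that marker.
pop_b = [[g if m != 1 else 2 - g for m, g in enumerate(ind)] for ind in pop_a]

same = ld_phase_persistency(pop_a, pop_a)
flipped = ld_phase_persistency(pop_a, pop_b)
```

Values near 1 indicate that LD phase is shared between the two populations; values near 0 or below indicate that marker-QTL associations learned in one population are unlikely to carry over to the other.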
2
Yin C, Shi H, Zhou P, Wang Y, Tao X, Yin Z, Zhang X, Liu Y. Genomic Prediction of Growth Traits in Yorkshire Pigs of Different Reference Group Sizes Using Different Estimated Breeding Value Models. Animals (Basel) 2024; 14:1098. [PMID: 38612337 PMCID: PMC11010886 DOI: 10.3390/ani14071098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 03/31/2024] [Accepted: 04/02/2024] [Indexed: 04/14/2024] Open
Abstract
The need for sufficient reference population data poses a significant challenge in breeding programs aimed at improving pig farming on a small to medium scale. To overcome this hurdle, investigating the advantages of combining reference populations of varying sizes is crucial for enhancing the accuracy of the genomic estimated breeding value (GEBV). Genomic selection (GS) in populations with limited reference data can be optimized by combining populations of the same breed or related breeds. This study focused on understanding the effect of combining different reference group sizes on the accuracy of GS for determining the growth effectiveness and percentage of lean meat in Yorkshire pigs. Specifically, our study investigated two important traits: the age at 100 kg live weight (AGE100) and the backfat thickness at 100 kg live weight (BF100). This research assessed the efficiency of genomic prediction (GP) using different GEBV models across three Yorkshire populations with varying genetic backgrounds. The GeneSeek 50K GGP porcine high-density array was used for genotyping. A total of 2295 Yorkshire pigs were included, representing three Yorkshire pig populations with different genetic backgrounds: 295 from Danish (small) lines from Huaibei City, Anhui Province, 500 from Canadian (medium) lines from Lixin County, Anhui Province, and 1500 from American (large) lines from Shanghai. To evaluate the impact of different population combination scenarios on the GS accuracy, three approaches were explored: (1) combining all three populations for prediction, (2) combining two populations to predict the third, and (3) predicting each population independently. Five GEBV models, including three Bayesian models (BayesA, BayesB, and BayesC), the genomic best linear unbiased prediction (GBLUP) model, and single-step GBLUP (ssGBLUP), were implemented through 20 repetitions of five-fold cross-validation (CV).
The results indicate that predicting one target population using the other two populations yielded the highest accuracy, providing a novel approach for improving the genomic selection accuracy in Yorkshire pigs. In this study, it was found that using different populations of the same breed to predict small- and medium-sized herds might be effective in improving the GEBV. This investigation highlights the significance of incorporating population combinations in genetic models for predicting the breeding value, particularly for pig farmers confronted with resource limitations.
Affiliation(s)
- Chang Yin
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
- Haoran Shi
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
- Peng Zhou
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
- Yuwei Wang
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
- Xuzhe Tao
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
- Zongjun Yin
- College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, China
- Xiaodong Zhang
- College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, China
- Yang Liu
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
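The abstract of entry 2 describes 20 repetitions of five-fold cross-validation. The following is a minimal sketch of how such validation splits could be generated, assuming simple random fold assignment; the function name, the seed, and the fold-assignment scheme are hypothetical, and the authors' actual partitioning (for example, any stratification by family or population) is not described in the abstract. The sample size of 295 matches the smallest population reported there.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, valid_idx) for repeated k-fold CV.

    Each repeat reshuffles the animals and deals them into n_folds folds;
    every fold serves once as the validation set within its repeat.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for f, valid in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield rep, f, train, valid


splits = list(repeated_kfold(n_samples=295))
```

In a full analysis, a prediction model would be fitted on `train` and scored on `valid` for each of the 100 splits, and the accuracies averaged over folds and repeats.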
3
Cai W, Hu J, Fan W, Xu Y, Tang J, Xie M, Zhang Y, Guo Z, Zhou Z, Hou S. Strategies to improve genomic predictions for 35 duck carcass traits in an F2 population. J Anim Sci Biotechnol 2023; 14:74. [PMID: 37147656 PMCID: PMC10163724 DOI: 10.1186/s40104-023-00875-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 04/02/2023] [Indexed: 05/07/2023] Open
Abstract
BACKGROUND Carcass traits are crucial for broiler ducks, but carcass traits can only be measured postmortem. Genomic selection (GS) is an effective approach in animal breeding to improve selection and reduce costs. However, the performance of genomic prediction in duck carcass traits remains largely unknown. RESULTS In this study, we estimated the genetic parameters, performed GS using different models and marker densities, and compared the estimation performance between GS and conventional BLUP on 35 carcass traits in an F2 population of ducks. Most of the cut weight traits and intestine length traits were estimated to have high and moderate heritabilities, respectively, while the heritabilities of percentage slaughter traits were dynamic. The reliability of genomic prediction using GBLUP increased by an average of 0.06 compared to the conventional BLUP method. The permutation studies revealed that 50K markers achieved ideal prediction reliability, while 3K markers still achieved 90.7% predictive capability, which would further reduce the cost for duck carcass traits. The genomic relationship matrix normalized by our true variance method instead of the widely used [Formula: see text] could achieve an increase in prediction reliability in most traits. We found that most of the Bayesian models had a better performance, especially BayesN. Compared to GBLUP, BayesN can further improve the predictive reliability by an average of 0.06 for duck carcass traits. CONCLUSION This study demonstrates that genomic selection for duck carcass traits is promising. The genomic prediction can be further improved by modifying the genomic relationship matrix using our proposed true variance method and several Bayesian models. The permutation study provides a theoretical basis for the fact that low-density arrays can be used to reduce genotyping costs in duck genomic selection.
Affiliation(s)
- Wentao Cai
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Jian Hu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Shandong New Hope Liuhe Group Co., Ltd., Qingdao, 266108, China
- Wenlei Fan
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, 266109, China
- Yaxi Xu
- College of Animal Science and Technology, Beijing University of Agriculture, Beijing, 102206, China
- Jing Tang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Ming Xie
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Yunsheng Zhang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Zhanbao Guo
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Zhengkui Zhou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Shuisheng Hou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
4
Vu NT, Phuc TH, Nguyen NH, Van Sang N. Effects of common full-sib families on accuracy of genomic prediction for tagging weight in striped catfish Pangasianodon hypophthalmus. Front Genet 2023; 13:1081246. [PMID: 36685869 PMCID: PMC9845282 DOI: 10.3389/fgene.2022.1081246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 12/06/2022] [Indexed: 01/06/2023] Open
Abstract
Common full-sib families (c²) make up a substantial proportion of total phenotypic variation in traits of commercial importance in aquaculture species, and omission or inclusion of the c² resulted in possible changes in genetic parameter estimates and re-ranking of estimated breeding values. However, the impacts of common full-sib families on accuracy of genomic prediction for commercial traits of economic importance are not well known in many species, including aquatic animals. This research explored the impacts of common full-sib families on accuracy of genomic prediction for tagging weight in a population of striped catfish comprising 11,918 fish traced back to the base population (four generations), in which 560 individuals had genotype records of 14,154 SNPs. Our single step genomic best linear unbiased prediction (ssGBLUP) showed that the accuracy of genomic prediction for tagging weight was reduced by 96.5%-130.3% when the common full-sib families were included in statistical models. The reduction in the prediction accuracy was smaller in multivariate analysis than in univariate models. Imputation of missing genotypes somewhat reduced the upward biases in the prediction accuracy for tagging weight. It is therefore suggested that genomic evaluation models for traits recorded during the early phase of growth development should account for the common full-sib families to minimise possible biases in the accuracy of genomic prediction and hence, selection response.
Affiliation(s)
- Nguyen Thanh Vu
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia
- Center for Bio-Innovation, University of the Sunshine Coast, Maroochydore, QLD, Australia
- Research Institute for Aquaculture No. 2, Ho Chi Minh City, Vietnam
- Tran Huu Phuc
- Research Institute for Aquaculture No. 2, Ho Chi Minh City, Vietnam
- Nguyen Hong Nguyen
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia
- Center for Bio-Innovation, University of the Sunshine Coast, Maroochydore, QLD, Australia
- Nguyen Van Sang
- Research Institute for Aquaculture No. 2, Ho Chi Minh City, Vietnam
*Correspondence: Nguyen Hong Nguyen; Nguyen Van Sang
5
Nazzicari N, Biscarini F. Stacked kinship CNN vs. GBLUP for genomic predictions of additive and complex continuous phenotypes. Sci Rep 2022; 12:19889. [PMID: 36400808 PMCID: PMC9674857 DOI: 10.1038/s41598-022-24405-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/05/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022] Open
Abstract
Deep learning is impacting many fields of data science with often spectacular results. However, its application to whole-genome predictions in plant and animal science or in human biology has been rather limited, with mostly underwhelming results. While most works focus on exploring alternative network architectures, in this study we propose an innovative representation of marker genotype data and tested it against the GBLUP (Genomic BLUP) benchmark with linear and nonlinear phenotypes. From publicly available cattle SNP genotype data, different types of genomic kinship matrices are stacked together in a 3D pile from where 2D grayscale slices are extracted and fed to a deep convolutional neural network (DNN). We simulated nine phenotype scenarios with combinations of additivity, dominance and epistasis, and compared the DNN to GBLUP-A (computed using only the additive kinship matrix) and GBLUP-optim (additive, dominance, and epistasis kinship matrices, as needed). Results varied depending on the accuracy metric employed, with DNN performing better in terms of root mean squared error (1-12% lower than GBLUP-A; 1-9% lower than GBLUP-optim) but worse in terms of Pearson's correlation (0.505 for DNN compared to 0.672 and 0.669 of GBLUP-A and GBLUP-optim for fully additive case; 0.274 for DNN, 0.279 for GBLUP-A, and 0.477 for GBLUP-optim for fully dominant case). The proposed approach offers a basis to explore further the application of DNN to tabular data in whole-genome predictions.
Affiliation(s)
- Nelson Nazzicari
- CREA Council for Agricultural Research and Analysis of Agricultural Economics, Research Centre for Animal Production and Aquaculture, Viale Piacenza 29, 26900 Lodi, Italy
- Filippo Biscarini
- CNR: National Research Council, Institute of Agricultural Biology and Biotechnology, Via Bassini 15, Milan, 20133 Italy
6
Vela-Avitúa S, Thorland I, Bakopoulos V, Papanna K, Dimitroglou A, Kottaras E, Leonidas P, Guinand B, Tsigenopoulos CS, Aslam ML. Genetic Basis for Resistance Against Viral Nervous Necrosis: GWAS and Potential of Genomic Prediction Explored in Farmed European Sea Bass (Dicentrarchus labrax). Front Genet 2022; 13:804584. [PMID: 35401661 PMCID: PMC8992836 DOI: 10.3389/fgene.2022.804584] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/29/2021] [Accepted: 02/22/2022] [Indexed: 11/13/2022] Open
Abstract
Viral nervous necrosis (VNN) is an infectious disease caused by the red-spotted grouper nervous necrosis virus (RGNNV) in European sea bass and is considered a serious concern for the aquaculture industry with fry and juveniles being highly susceptible. To understand the genetic basis for resistance against VNN, a survival phenotype through the challenge test against the RGNNV was recorded in populations from multiple year classes (YC2016 and YC2017). A total of 4,851 individuals from 181 families were tested, and a subset (n∼1,535) belonging to 122 families was genotyped using a ∼57K Affymetrix Axiom array. The survival against the RGNNV showed low to moderate heritability with observed scale estimates of 0.18 and 0.25 obtained using pedigree vs. genomic information, respectively. The genome-wide association analysis showed a strong signal of quantitative trait loci (QTL) at LG12 which explained ∼33% of the genetic variance. The QTL region contained multiple genes (ITPK1, PLK4, HSPA4L, REEP1, CHMP2, MRPL35, and SCUBE) with HSPA4L and/or REEP1 genes being highly relevant with a likely effect on host response in managing disease-associated symptoms. The results on the accuracy of predicting breeding values presented 20–43% advantage in accuracy using genomic over pedigree-based information which varied across model types and applied validation schemes.
Affiliation(s)
- Sergio Vela-Avitúa
- Benchmark Genetics Norway AS (formerly Akvaforsk Genetics Center AS), Sunndalsøra, Norway
- Ingunn Thorland
- Benchmark Genetics Norway AS (formerly Akvaforsk Genetics Center AS), Sunndalsøra, Norway
- Vasileios Bakopoulos
- Laboratory of Ichthyology, Aquaculture and Diseases of Aquatic Animals, Department of Marine Sciences, University of The Aegean, Mytilene, Greece
- Bruno Guinand
- CNRS, IRD, EPHE, ISEM, Université de Montpellier, Montpellier, France
- Costas S Tsigenopoulos
- Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Heraklion, Greece
7
Marjanovic J, Calus MPL. Factors affecting accuracy of estimated effective number of chromosome segments for numerically small breeds. J Anim Breed Genet 2021; 138:151-160. [PMID: 33040409 PMCID: PMC7891385 DOI: 10.1111/jbg.12512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/02/2020] [Revised: 08/25/2020] [Accepted: 09/12/2020] [Indexed: 11/28/2022]
Abstract
For numerically small breeds, obtaining a sufficiently large breed-specific reference population for genomic prediction is challenging or simply not possible, but may be overcome by adding individuals from another breed. To prioritize among available breeds, the effective number of chromosome segments (Me) can be used as an indicator of relatedness between individuals from different breeds. The Me is also an important parameter in determining the accuracy of genomic prediction. The Me can be estimated both within a population and between two populations or breeds, as the reciprocal of the variance of genomic relationships. However, the threshold for number of individuals needed to accurately estimate within or between populations Me is currently unknown. It is also unknown if a discrepancy in number of genotyped individuals in two breeds affects the estimates of Me between populations. In this study, we conducted a simulation that mimics current domestic cattle populations in order to investigate how estimated Me is affected by number of genotyped individuals, single-nucleotide polymorphism (SNP) density and pedigree availability. Our results show that a small sample of 10 genotyped individuals may result in substantial over- or underestimation of Me. While estimates of within population Me were hardly affected by SNP density, between population Me values were highly dependent on the number of available SNPs, with higher SNP densities being able to detect more independent chromosome segments. When subtracting pedigree from genomic relationships before computing Me, estimates of within population Me were three to four times higher than estimates with genotypes only; however, between Me estimates remained the same. For accurate estimation of within and between population Me, at least 50 individuals should be genotyped per population. Estimates of within Me were highly affected by whether pedigree was used or not.
For within Me, even the smallest SNP density (~11k) resulted in accurate representation of family relationships in the population; however, for between Me, many more markers are needed to capture all independent segments.
Affiliation(s)
- Jovana Marjanovic
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
- Mario P. L. Calus
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
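The abstract of entry 7 states that Me can be estimated as the reciprocal of the variance of genomic relationships, within a population or between two populations. A minimal sketch of that calculation follows. The VanRaden-style scaling of the genomic relationship matrix, the pooling of allele frequencies across both populations, the function names, and the random toy genotypes are all assumptions made here for illustration; the abstract does not specify them.

```python
import random


def genomic_relationships(genotypes):
    """G = ZZ' / (2 * sum p(1-p)); rows of genotypes are 0/1/2 allele counts.

    The scaling and the use of pooled allele frequencies are assumptions
    of this sketch.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(ind[k] for ind in genotypes) / (2 * n) for k in range(m)]
    scale = 2 * sum(pk * (1 - pk) for pk in p)
    z = [[ind[k] - 2 * p[k] for k in range(m)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[j])) / scale for j in range(n)]
            for i in range(n)]


def effective_segments(relationships):
    """Me as the reciprocal of the sample variance of the given relationships."""
    n = len(relationships)
    mean = sum(relationships) / n
    var = sum((r - mean) ** 2 for r in relationships) / (n - 1)
    return 1.0 / var


# Toy data: 8 animals x 200 markers, first 4 animals = population A, last 4 = B.
rng = random.Random(1)
genotypes = [[rng.choice([0, 1, 2]) for _ in range(200)] for _ in range(8)]
g = genomic_relationships(genotypes)
pop_a, pop_b = range(0, 4), range(4, 8)

within_a = [g[i][j] for i in pop_a for j in pop_a if i < j]  # off-diagonals of A
between = [g[i][j] for i in pop_a for j in pop_b]            # A-by-B block

me_within = effective_segments(within_a)
me_between = effective_segments(between)
```

With so few animals the estimates are very noisy, which is consistent with the abstract's recommendation of at least 50 genotyped individuals per population.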
8
Abstract
A suitable pairwise relatedness estimation is key to genetic studies. Several methods are proposed to compute relatedness in autopolyploids based on molecular data. However, unlike diploids, autopolyploids still need further studies considering scenarios with many linked molecular markers with known dosage. In this study, we provide guidelines for plant geneticists and breeders to obtain trustworthy pairwise relatedness estimates. To this end, we simulated populations considering different ploidy levels, meiotic pairing patterns, number of loci and alleles, and inbreeding levels. Analyses were performed to assess the accuracy of distinct methods and to demonstrate the usefulness of molecular markers in practical situations. Overall, our results suggest that at least 100 effective biallelic molecular markers are required to have good pairwise relatedness estimation if methods based on correlation are used. For this number of loci, current methods based on multiallelic markers show lower performance than biallelic ones. Estimating relatedness in cases of inbreeding or close relationships (such as parent-offspring, full-sibs, or half-sibs) is more challenging. Methods to estimate pairwise relatedness based on molecular markers, for different ploidy levels or pedigrees, were implemented in the AGHmatrix R package.
9
Ren D, An L, Li B, Qiao L, Liu W. Efficient weighting methods for genomic best linear-unbiased prediction (BLUP) adapted to the genetic architectures of quantitative traits. Heredity (Edinb) 2020; 126:320-334. [PMID: 32980863 DOI: 10.1038/s41437-020-00372-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/21/2020] [Revised: 09/12/2020] [Accepted: 09/13/2020] [Indexed: 11/09/2022] Open
Abstract
Genomic best linear-unbiased prediction (GBLUP) assumes equal variance for all marker effects, which is suitable for traits that conform to the infinitesimal model. For traits controlled by major genes, Bayesian methods with shrinkage priors or genome-wide association study (GWAS) methods can be used to identify causal variants effectively. The information from Bayesian/GWAS methods can be used to construct the weighted genomic relationship matrix (G). However, it remains unclear which methods perform best for traits varying in genetic architecture. Therefore, we developed several methods to optimize the performance of weighted GBLUP and compared them with other available methods using simulated and real data sets. First, two types of methods (marker effects with local shrinkage or normal prior) were used to obtain test statistics and estimates for each marker effect. Second, three weighted G matrices were constructed based on the marker information from the first step: (1) the genomic-feature-weighted G, (2) the estimated marker-variance-weighted G, and (3) the absolute value of the estimated marker-effect-weighted G. Following the above process, six different weighted GBLUP methods (local shrinkage/normal-prior GF/EV/AEWGBLUP) were proposed for genomic prediction. Analyses with both simulated and real data demonstrated that these options offer flexibility for optimizing the weighted GBLUP for traits with a broad spectrum of genetic architectures. The advantage of weighting methods over GBLUP in terms of accuracy was trait dependent, ranging from 14.8% to marginal for simulated traits and from 44% to marginal for real traits. Local-shrinkage prior EVWGBLUP is superior for traits mainly controlled by loci of a large effect. Normal-prior AEWGBLUP performs well for traits mainly controlled by loci of moderate effect. For traits controlled by some loci with large effects (explaining 25-50% of the genetic variance) and a range of loci with small effects, GFWGBLUP has advantages.
In conclusion, the optimal weighted GBLUP method for genomic selection should take both the genetic architecture and number of QTLs of traits into consideration carefully.
Affiliation(s)
- Duanyang Ren
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
- Lixia An
- College of Information, Shanxi Agricultural University, Taigu, China
- Baojun Li
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
- Liying Qiao
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
- Wenzhong Liu
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
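The abstract of entry 9 describes building a weighted genomic relationship matrix from per-marker information, for example the absolute value of estimated marker effects. As a hedged illustration of the general idea only, the sketch below uses one common form, G_w = Z D Z' / (2 Σ p(1-p)) with D a diagonal matrix of weights rescaled to average 1. That formula, the function name, the toy genotypes, and the made-up marker effects are assumptions of this sketch and not necessarily the construction used by the authors.

```python
def weighted_g(genotypes, weights):
    """Weighted genomic relationship matrix Z D Z' / (2 * sum p(1-p)).

    genotypes: rows of 0/1/2 allele counts; weights: one non-negative
    value per marker. Weights are rescaled to a mean of 1 so that equal
    weights reproduce the unweighted matrix (an assumption of this sketch).
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(ind[k] for ind in genotypes) / (2 * n) for k in range(m)]
    scale = 2 * sum(pk * (1 - pk) for pk in p)
    mean_w = sum(weights) / m
    w = [wk / mean_w for wk in weights]
    z = [[ind[k] - 2 * p[k] for k in range(m)] for ind in genotypes]
    return [[sum(w[k] * z[i][k] * z[j][k] for k in range(m)) / scale
             for j in range(n)] for i in range(n)]


# Toy data: 4 animals x 5 markers, with hypothetical estimated marker effects.
genotypes = [[0, 1, 2, 1, 0], [1, 1, 0, 2, 1], [2, 0, 1, 1, 2], [1, 2, 1, 0, 1]]
marker_effects = [0.8, -0.1, 0.05, -0.3, 0.02]

g_unweighted = weighted_g(genotypes, [1.0] * 5)
g_abs_effect = weighted_g(genotypes, [abs(a) for a in marker_effects])
```

Markers with larger absolute effects contribute more to `g_abs_effect`, so animals sharing alleles at those markers are treated as more closely related for the trait than in the unweighted matrix.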
10
Ye S, Song H, Ding X, Zhang Z, Li J. Pre-selecting markers based on fixation index scores improved the power of genomic evaluations in a combined Yorkshire pig population. Animal 2020; 14:1555-1564. [PMID: 32209149 DOI: 10.1017/s1751731120000506] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022] Open
Abstract
Combining different swine populations in genomic prediction can be an important tool, leading to an increased accuracy of genomic prediction using single nucleotide polymorphism (SNP) chip data compared with within-population genomic prediction. However, the expected higher accuracy of multi-population genomic prediction has not been realized. This may be due to an inconsistent linkage disequilibrium (LD) between SNPs and quantitative trait loci (QTL) across populations, and the weak genetic relationships across populations. In this study, we determined the impact of different genomic relationship matrices, SNP density and pre-selected variants on prediction accuracy using a combined Yorkshire pig population. Our objective was to provide useful strategies for improving the accuracy of genomic prediction within a combined population. Results showed that the accuracy of genomic best linear unbiased prediction (GBLUP) using imputed whole-genome sequencing (WGS) data in the combined population was always higher than that within populations. Furthermore, the use of imputed WGS data always resulted in a higher accuracy of GBLUP than the use of 80K chip data for the combined population. Additionally, the accuracy of GBLUP with a non-linear genomic relationship matrix was markedly increased (0.87% to 15.17% for 80K chip data, and 0.43% to 4.01% for imputed WGS data) compared with that obtained with a linear genomic relationship matrix, except for the prediction of the XD population in the combined population using imputed WGS data. More importantly, the application of pre-selected variants based on fixation index (Fst) scores improved the accuracy of multi-population genomic prediction, especially for 80K chip data.
For BLUP|GA (BLUP approach given the genetic architecture), the use of a linear method with an appropriate weight to build a weight-relatedness matrix led to a higher prediction accuracy compared with the use of only pre-selected SNPs for genomic evaluations, especially for the total number of piglets born. However, for the non-linear method, BLUP|GA showed only a small increase or even a decrease in prediction accuracy compared with the use of only pre-selected SNPs. Overall, the best genomic evaluation strategy for reproduction-related traits for a combined population was found to be GBLUP performed with a non-linear genomic relationship matrix using variants pre-selected from the 80K chip data based on Fst scores.
Affiliation(s)
- S Ye
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, No. 483, Wushan Road, Tianhe District, 510642 Guangzhou, China
- H Song
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, No. 2, Yuanmingyuan West Road, Haidian District, 100193 Beijing, China
- X Ding
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, No. 2, Yuanmingyuan West Road, Haidian District, 100193 Beijing, China
- Z Zhang
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, No. 483, Wushan Road, Tianhe District, 510642 Guangzhou, China
- J Li
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, No. 483, Wushan Road, Tianhe District, 510642 Guangzhou, China
11
van den Berg I, Meuwissen THE, MacLeod IM, Goddard ME. Predicting the effect of reference population on the accuracy of within, across, and multibreed genomic prediction. J Dairy Sci 2019; 102:3155-3174. [PMID: 30738664 DOI: 10.3168/jds.2018-15231] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/18/2018] [Accepted: 12/08/2018] [Indexed: 01/24/2023]
Abstract
Genomic prediction is widely used to select candidates for breeding. Size and composition of the reference population are important factors influencing prediction accuracy. In Holstein dairy cattle, large reference populations are used, but this is difficult to achieve in numerically small breeds and for traits that are not routinely recorded. The prediction accuracy is usually estimated using cross-validation, requiring the full data set. It would be useful to have a method to predict the benefit of multibreed reference populations that does not require the availability of the full data set. Our objective was to study the effect of the size and breed composition of the reference population on the accuracy of genomic prediction using genomic BLUP and Bayes R. We also examined the effect of trait heritability and validation breed on prediction accuracy. Using these empirical results, we investigated the use of a formula to predict the effect of the size and composition of the reference population on the accuracy of genomic prediction. Phenotypes were simulated in a data set containing real genotypes of imputed sequence variants for 22,752 dairy bulls and cows, including Holstein, Jersey, Red Holstein, and Australian Red cattle. Different reference populations were constructed, varying in size and composition, to study within-breed, multibreed, and across-breed prediction. Phenotypes were simulated varying in heritability, number of chromosomes, and number of quantitative trait loci. Genomic prediction was carried out using genomic BLUP and Bayes R. We used either the genomic relationship matrix (GRM) to estimate the number of independent chromosomal segments and subsequently to predict accuracy, or the accuracies obtained from single-breed reference populations to predict the accuracies of larger or multibreed reference populations. 
Using the GRM overestimated the accuracy; this overestimation was likely due to close relationships among some of the reference animals. Consequently, the GRM could not be used to predict the accuracy of genomic prediction reliably. However, a method using the prediction accuracies obtained by cross-validation using a small, single-breed reference population predicted the accuracy using a multibreed reference population well and slightly overestimated the accuracy for a larger reference population of the same breed, but gave a reasonably close estimate of the accuracy for a multibreed reference population. This method could be useful for making decisions regarding the size and composition of the reference population.
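The GRM-based route described in this abstract can be illustrated with a minimal sketch. It assumes two ingredients that the abstract does not spell out: the number of independent chromosomal segments (Me) is approximated as the reciprocal of the variance of the off-diagonal GRM elements, and the expected accuracy follows the widely used Daetwyler-type expression sqrt(N h2 / (N h2 + Me)). The function names are hypothetical and the authors' exact formula may differ.

```python
import numpy as np

def effective_segments_from_grm(G):
    """Approximate Me as 1 / variance of the off-diagonal GRM elements.

    Assumed estimator for illustration, not necessarily the one used in the paper.
    """
    G = np.asarray(G, dtype=float)
    off_diag = G[np.triu_indices_from(G, k=1)]  # upper triangle, diagonal excluded
    return 1.0 / np.var(off_diag)

def expected_accuracy(n_ref, h2, me):
    """Daetwyler-type expected accuracy: sqrt(N * h2 / (N * h2 + Me))."""
    return np.sqrt(n_ref * h2 / (n_ref * h2 + me))

# Toy usage with a small, made-up relationship matrix.
G_toy = np.array([[1.0, 0.1, -0.1],
                  [0.1, 1.0, 0.0],
                  [-0.1, 0.0, 1.0]])
me_toy = effective_segments_from_grm(G_toy)
print(me_toy, expected_accuracy(n_ref=1000, h2=0.3, me=me_toy))
```

Under these assumptions, close relatives in the reference set inflate the variance of the relationships, which lowers Me and raises the predicted accuracy. That is consistent with the overestimation the abstract attributes to close relationships among reference animals.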
Affiliation(s)
- I van den Berg
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, 3010 Parkville, Victoria, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, 3083 Bundoora, Victoria, Australia
- T H E Meuwissen
  - Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1432 Ås, Norway
- I M MacLeod
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, 3083 Bundoora, Victoria, Australia
- M E Goddard
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, 3010 Parkville, Victoria, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, 3083 Bundoora, Victoria, Australia
12
Zhang C, Kemp RA, Stothard P, Wang Z, Boddicker N, Krivushin K, Dekkers J, Plastow G. Genomic evaluation of feed efficiency component traits in Duroc pigs using 80K, 650K and whole-genome sequence variants. Genet Sel Evol 2018; 50:14. [PMID: 29625549 PMCID: PMC5889553 DOI: 10.1186/s12711-018-0387-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/01/2017] [Accepted: 03/27/2018] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Increasing marker density was proposed to have potential to improve the accuracy of genomic prediction for quantitative traits; whole-sequence data is expected to give the best accuracy of prediction, since all causal mutations that underlie a trait are expected to be included. However, in cattle and chicken, this assumption is not supported by empirical studies. Our objective was to compare the accuracy of genomic prediction of feed efficiency component traits in Duroc pigs using single nucleotide polymorphism (SNP) panels of 80K, imputed 650K, and whole-genome sequence variants using GBLUP, BayesB and BayesRC methods, with the ultimate purpose to determine the optimal method to increase genetic gain for feed efficiency in pigs. RESULTS Phenotypes of average daily feed intake (ADFI), average daily gain (ADG), ultrasound backfat depth (FAT), and loin muscle depth (LMD) were available for 1363 Duroc boars from a commercial breeding program. Genotype imputation accuracies reached 92.1% from 80K to 650K and 85.6% from 650K to whole-genome sequence variants. Average accuracies across methods and marker densities of genomic prediction of ADFI, FAT, LMD and ADG were 0.40, 0.65, 0.30 and 0.15, respectively. For ADFI and FAT, BayesB outperformed GBLUP, but increasing marker density had little advantage for genomic prediction. For ADG and LMD, GBLUP outperformed BayesB, while BayesRC based on whole-genome sequence data gave the best accuracies and reached up to 0.35 for LMD and 0.25 for ADG. CONCLUSIONS Use of genomic information was beneficial for prediction of ADFI and FAT but not for that of ADG and LMD compared to pedigree-based estimates. BayesB based on 80K SNPs gave the best genomic prediction accuracy for ADFI and FAT, while BayesRC based on whole-genome sequence data performed best for ADG and LMD. 
We suggest that these differences between traits in the effect of marker density and method on accuracy of genomic prediction are mainly due to the underlying genetic architecture of the traits.
Affiliation(s)
- Chunyan Zhang
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Paul Stothard
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Zhiquan Wang
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Kirill Krivushin
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Jack Dekkers
  - Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Graham Plastow
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
13
Calus MPL, Goddard ME, Wientjes YCJ, Bowman PJ, Hayes BJ. Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection. J Dairy Sci 2018; 101:4279-4294. [PMID: 29550121 DOI: 10.3168/jds.2017-13366] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/20/2017] [Accepted: 01/04/2018] [Indexed: 11/19/2022]
Abstract
Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. 
The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, are likely the result of the observed admixture between both breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.
Affiliation(s)
- M P L Calus
  - Wageningen University & Research, Animal Breeding and Genomics, PO Box 338, 6700 AH Wageningen, the Netherlands
- M E Goddard
  - Faculty of Veterinary and Agricultural Science, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Research, Department of Economic Development, Jobs, Transport and Resources, Melbourne, Victoria 3083, Australia
- Y C J Wientjes
  - Wageningen University & Research, Animal Breeding and Genomics, PO Box 338, 6700 AH Wageningen, the Netherlands
- P J Bowman
  - Agriculture Research, Department of Economic Development, Jobs, Transport and Resources, Melbourne, Victoria 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia
- B J Hayes
  - School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, St. Lucia, Queensland 4072, Australia
14
Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. I: Multivariate Gaussian priors for marker effects and derivation of the joint probability mass function of genotypes. J Theor Biol 2017; 417:8-19. [PMID: 28043819 DOI: 10.1016/j.jtbi.2016.12.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/09/2016] [Revised: 11/30/2016] [Accepted: 12/28/2016] [Indexed: 11/23/2022]
Abstract
It is important to consider heterogeneity of marker effects and allelic frequencies in across-population genome-wide prediction studies. Moreover, all regression models used in genome-wide prediction overlook randomness of genotypes. In this study, a family of hierarchical Bayesian models was developed to perform across-population genome-wide prediction, modeling genotypes as random variables and allowing population-specific effects for each marker. Models shared a common structure and differed in the priors used and the assumption about residual variances (homogeneous or heterogeneous). Randomness of genotypes was accounted for by deriving the joint probability mass function of marker genotypes conditional on allelic frequencies and pedigree information. As a consequence, these models incorporated kinship and genotypic information that made it possible not only to account for heterogeneity of allelic frequencies, but also to include individuals with missing genotypes at some or all loci without the need for previous imputation. This was possible because the non-observed fraction of the design matrix was treated as an unknown model parameter. For each model, a simpler version ignoring population structure, but still accounting for randomness of genotypes, was proposed. Implementation of these models and computation of some criteria for model comparison were illustrated using two simulated datasets. Theoretical and computational issues along with possible applications, extensions and refinements were discussed. Some features of the models developed in this study make them promising for genome-wide prediction; the use of information contained in the probability distribution of genotypes is perhaps the most appealing. Further studies to assess the performance of the models proposed here and also to compare them with conventional models used in genome-wide prediction are needed.
15
Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. II: Multivariate spike and slab priors for marker effects and derivation of approximate Bayes and fractional Bayes factors for the complete family of models. J Theor Biol 2017; 417:131-141. [PMID: 28088357 DOI: 10.1016/j.jtbi.2016.12.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/09/2016] [Revised: 11/30/2016] [Accepted: 12/28/2016] [Indexed: 11/22/2022]
Abstract
This study corresponds to the second part of a companion paper devoted to the development of Bayesian multiple regression models accounting for randomness of genotypes in across-population genome-wide prediction. This family of models considers heterogeneous and correlated marker effects and allelic frequencies across populations, and can consider records from non-genotyped individuals and individuals with missing genotypes in any subset of loci without the need for previous imputation, taking into account uncertainty about imputed genotypes. This paper extends this family of models by considering multivariate spike and slab conditional priors for marker allele substitution effects and contains derivations of approximate Bayes factors and fractional Bayes factors to compare models from part I and those developed here with their null versions. These null versions correspond to simpler models ignoring heterogeneity of populations, but still accounting for randomness of genotypes. For each marker locus, the spike component of the priors corresponded to a point mass at 0 in R^S, where S is the number of populations, and the slab component was an S-variate Gaussian distribution; independent conditional priors were assumed. For the Gaussian components, covariance matrices were assumed to be either the same for all markers or different for each marker. For null models, the priors were simply univariate versions of these finite mixture distributions. Approximate algebraic expressions for Bayes factors and fractional Bayes factors were found using the Laplace approximation. Using the simulated datasets described in part I, these models were implemented and compared with models derived in part I using measures of predictive performance based on squared Pearson correlations, Deviance Information Criterion, Bayes factors, and fractional Bayes factors. 
The extensions presented here enlarge our family of genome-wide prediction models making it more flexible in the sense that it now offers more modeling options.
16
Zhang X, Lourenco D, Aguilar I, Legarra A, Misztal I. Weighting Strategies for Single-Step Genomic BLUP: An Iterative Approach for Accurate Calculation of GEBV and GWAS. Front Genet 2016; 7:151. [PMID: 27594861 PMCID: PMC4990542 DOI: 10.3389/fgene.2016.00151] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Received: 05/15/2016] [Accepted: 08/04/2016] [Indexed: 01/16/2023] Open
Abstract
Genomic Best Linear Unbiased Predictor (GBLUP) assumes equal variance for all single nucleotide polymorphisms (SNP). When traits are influenced by major SNP, Bayesian methods have the advantage of SNP selection. To overcome the limitation of GBLUP, unequal variance or weights for all SNP are applied in a method called weighted GBLUP (WGBLUP). If only a fraction of animals is genotyped, single-step WGBLUP (WssGBLUP) can be used. Default weights in WGBLUP or WssGBLUP are obtained iteratively based on single SNP effect squared (u²) and/or heterozygosity. When the weights are optimal, prediction accuracy and ability to detect major SNP are maximized. The objective was to develop optimal weights for WGBLUP-based methods. We evaluated 5 new procedures that accounted for locus-specific or windows-specific variance to maximize accuracy of predicting genomic estimated breeding value (GEBV) and SNP effect. Simulated datasets consisted of phenotypes for 13,000 animals, including 1540 animals genotyped for 45,000 SNP. Scenarios with 5, 100, and 500 simulated quantitative trait loci (QTL) were considered. The 5 new procedures for SNP weighting were: (1) u² plus a constant equal to the weight of the top SNP; (2) from a heavy-tailed distribution (similar to BayesA); (3) for every 20 SNP in a window along the whole genome, the largest effect (u²) among them; (4) the mean effect of every 20 SNP; and (5) the summation of every 20 SNP. Those methods were compared to the default WssGBLUP, GBLUP, BayesB, and BayesC. WssGBLUP methods were evaluated over 10 iterations. The accuracy of predicting GEBV was the correlation between true and estimated genomic breeding values for 300 genotyped animals from the last generation. The ability to detect the simulated QTL was also investigated. For most of the QTL scenarios, the accuracies obtained with all WssGBLUP procedures were higher compared to those from BayesB and BayesC, partly due to automatic inclusion of parent average in the former. 
Manhattan plots had higher resolution with 5 and 100 QTL. Using a common weight for a window of 20 SNP that sums or averages the SNP variance enhances accuracy of predicting GEBV and provides accurate estimation of marker effects.
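The window-based procedures (3) to (5) in this abstract can be sketched as follows. This is an illustration only: it assumes non-overlapping windows of consecutive SNP and that every SNP in a window receives the same window-level weight, neither of which is confirmed by the abstract. The function name `window_weights` is hypothetical.

```python
import numpy as np

def window_weights(u, window=20, how="sum"):
    """Assign each SNP a common weight derived from u^2 within its window.

    how = "max"  -> largest squared effect in the window (procedure 3)
    how = "mean" -> mean squared effect in the window    (procedure 4)
    how = "sum"  -> summed squared effects in the window (procedure 5)
    Windows are assumed to be non-overlapping blocks of consecutive SNP.
    """
    reducers = {"sum": np.sum, "mean": np.mean, "max": np.max}
    reduce_fn = reducers[how]
    u2 = np.asarray(u, dtype=float) ** 2
    weights = np.empty_like(u2)
    for start in range(0, u2.size, window):
        block = slice(start, start + window)
        weights[block] = reduce_fn(u2[block])  # same weight for the whole window
    return weights

# Toy usage: five SNP effects, windows of two SNP (the last window holds one SNP).
print(window_weights([1, 2, 3, 4, 5], window=2, how="sum"))
```

Sharing one weight across a window smooths noisy single-SNP estimates, which may be one reason the abstract reports that summing or averaging within 20-SNP windows improved GEBV accuracy.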
Affiliation(s)
- Xinyue Zhang
  - Animal and Dairy Science, Animal Breeding and Genetics, University of Georgia Athens, GA, USA
- Daniela Lourenco
  - Animal and Dairy Science, Animal Breeding and Genetics, University of Georgia Athens, GA, USA
- Ignacio Aguilar
  - National Agricultural Research Institute Las Brujas, Uruguay
- Andres Legarra
  - Institut National de la Recherche Agronomique, UMR1388 GenPhySE Castanet-Tolosan, France
- Ignacy Misztal
  - Animal and Dairy Science, Animal Breeding and Genetics, University of Georgia Athens, GA, USA
17
Calus MPL, Bouwman AC, Schrooten C, Veerkamp RF. Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection. Genet Sel Evol 2016; 48:49. [PMID: 27357580 PMCID: PMC4926307 DOI: 10.1186/s12711-016-0225-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 02/17/2016] [Accepted: 06/16/2016] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable selection (BSSVS) model could overcome these issues. BSSVS is performed first on subsets of sequence-based variants and then on a merged dataset containing variants selected in the first step. RESULTS We used a dataset that included 4,154,064 variants after editing and de-regressed proofs for 3415 reference and 2138 validation bulls for somatic cell score, protein yield and interval first to last insemination. In the first step, BSSVS was performed on 106 subsets each containing ~39,189 variants. In the second step, 1060 up to 472,492 variants, selected from the first step, were included to estimate the accuracy of genomic prediction. Accuracies were at best equal to those achieved with the commonly used Bovine 50k-SNP chip, although the number of variants within a few well-known quantitative trait loci regions was considerably enriched. When variant selection and the final genomic prediction were performed on the same data, predictions were biased. Predictions computed as the average of the predictions computed for each subset achieved the highest accuracies, i.e. 0.5 to 1.1 % higher than the accuracies obtained with the 50k-SNP chip, and yielded the least biased predictions. Finally, the accuracy of genomic predictions obtained when all sequence-based variants were included was similar or up to 1.4 % lower compared to that based on the average predictions across the subsets. By applying parallelization, the split-and-merge procedure was completed in 5 days, while the standard analysis including all sequence-based variants took more than three months. 
CONCLUSIONS The split-and-merge approach splits one large computational task into many much smaller ones, which allows the use of parallel processing and thus efficient genomic prediction based on whole-genome sequence data. The split-and-merge approach did not improve prediction accuracy, probably because we used data on a single breed for which relationships between individuals were high. Nevertheless, the split-and-merge approach may have potential for applications on data from multiple breeds.
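The "split" step and the averaging of subset predictions reported in this abstract can be sketched as below. Two simplifications are assumptions made for illustration: ridge regression stands in for BSSVS, and variants are split into contiguous, near-equal blocks. The abstract does not say how variants were assigned to subsets, and the function names are hypothetical. Genotype columns and phenotypes are assumed to be centered.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge regression stand-in for BSSVS: solve (X'X + lam*I) b = X'y."""
    p = X_train.shape[1]
    beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(p),
                           X_train.T @ y_train)
    return X_test @ beta

def split_average_predict(X_train, y_train, X_test, n_subsets, lam=1.0):
    """Fit one model per variant subset and average the subset predictions."""
    # Contiguous, near-equal blocks of variant indices (assumed splitting rule).
    subsets = np.array_split(np.arange(X_train.shape[1]), n_subsets)
    preds = [ridge_predict(X_train[:, idx], y_train, X_test[:, idx], lam)
             for idx in subsets]
    return np.mean(preds, axis=0), subsets

# Toy usage on random data; each subset fit is independent and could run in parallel.
rng = np.random.default_rng(0)
X, y, X_new = rng.normal(size=(30, 12)), rng.normal(size=30), rng.normal(size=(5, 12))
pred, subsets = split_average_predict(X, y, X_new, n_subsets=3)
print(pred.shape, [len(s) for s in subsets])
```

With the figures given in the abstract, dividing 4,154,064 variants over 106 subsets yields blocks of 39,189 or 39,190 variants, in line with the reported ~39,189 per subset.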
Affiliation(s)
- Mario P L Calus
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, PO Box 338, 6700 AH, Wageningen, The Netherlands
- Aniek C Bouwman
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, PO Box 338, 6700 AH, Wageningen, The Netherlands
- Roel F Veerkamp
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, PO Box 338, 6700 AH, Wageningen, The Netherlands
18
An Equation to Predict the Accuracy of Genomic Values by Combining Data from Multiple Traits, Populations, or Environments. Genetics 2015; 202:799-823. [PMID: 26637542 DOI: 10.1534/genetics.115.183269] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 09/30/2015] [Accepted: 11/27/2015] [Indexed: 11/18/2022] Open
Abstract
Predicting the accuracy of estimated genomic values using genome-wide marker information is an important step in designing training populations. Currently, different deterministic equations are available to predict accuracy within populations, but not for multipopulation scenarios where data from multiple breeds, lines or environments are combined. Therefore, our objective was to develop and validate a deterministic equation to predict the accuracy of genomic values when different populations are combined in one training population. The input parameters of the derived prediction equation are the number of individuals and the heritability from each of the populations in the training population; the genetic correlations between the populations, i.e., the correlation between allele substitution effects of quantitative trait loci; the effective number of chromosome segments across predicted and training populations; and the proportion of the genetic variance in the predicted population captured by the markers in each of the training populations. Validation was performed based on real genotype information of 1033 Holstein-Friesian cows that were divided into three different populations by combining half-sib families in the same population. Phenotypes were simulated for multiple scenarios, differing in heritability within populations and in genetic correlations between the populations. Results showed that the derived equation can accurately predict the accuracy of estimating genomic values for different scenarios of multipopulation genomic prediction. Therefore, the derived equation can be used to investigate the potential accuracy of different multipopulation genomic prediction scenarios and to decide on the most optimal design of training populations.