1
Jighly A. Boosting genome-wide association power and genomic prediction accuracy for date palm fruit traits with advanced statistics. PLANT SCIENCE: AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2024; 344:112110. [PMID: 38704095 DOI: 10.1016/j.plantsci.2024.112110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 03/05/2024] [Accepted: 04/30/2024] [Indexed: 05/06/2024]
Abstract
The date palm is economically vital in the Middle East and North Africa, providing essential fibres, vitamins, and carbohydrates. Understanding the genetic architecture of its traits remains complex due to the tree's perennial nature and long generation times. This study aims to address these complexities by employing advanced genome-wide association (GWAS) and genomic prediction models using previously published data involving fruit acid content, sugar content, dimension, and colour traits. The multivariate GWAS model identified seven QTL, including five novel associations, that shed light on the genetic control of these traits. Furthermore, the research evaluates different genomic prediction models that considered genotype by environment and genotype by trait interactions. While colour traits demonstrate strong predictive power, other traits display moderate accuracies across different models and scenarios, in line with expectations for small reference populations. When designing the cross-validation to predict new individuals, the accuracy of the best multi-trait model was significantly higher than all single-trait models for dimension traits, but not for the remaining traits, which showed similar performances. However, the cross-validation strategy that masked random phenotypic records (i.e., mimicking the unbalanced phenotypic records) showed significantly higher accuracy for all traits except acid contents. The findings underscore the importance of understanding genetic architecture for informed breeding strategies. The research emphasises the need for larger population sizes and multivariate models to enhance gene tagging power and predictive accuracy to advance date palm breeding programs. These findings support more targeted breeding in date palm, improving productivity and resilience to various environments.
2
Wang X, Zhang Z, Du H, Pfeiffer C, Mészáros G, Ding X. Predictive ability of multi-population genomic prediction methods of phenotypes for reproduction traits in Chinese and Austrian pigs. Genet Sel Evol 2024; 56:49. [PMID: 38926647 PMCID: PMC11201905 DOI: 10.1186/s12711-024-00915-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 05/30/2024] [Indexed: 06/28/2024] Open
Abstract
BACKGROUND Multi-population genomic prediction can rapidly expand the size of the reference population and improve genomic prediction ability. Machine learning (ML) algorithms have shown advantages in single-population genomic prediction of phenotypes. However, few studies have explored the effectiveness of ML methods for multi-population genomic prediction. RESULTS In this study, 3720 Yorkshire pigs from Austria and four breeding farms in China were used, and single-trait genomic best linear unbiased prediction (ST-GBLUP), multitrait GBLUP (MT-GBLUP), Bayesian Horseshoe (BayesHE), and three ML methods (support vector regression (SVR), kernel ridge regression (KRR) and AdaBoost.R2) were compared to explore the optimal method for joint genomic prediction of phenotypes of Chinese and Austrian pigs through 10 replicates of fivefold cross-validation. We tested the performance of the different methods in two scenarios: (i) including only one Austrian population and one Chinese pig population that were genetically linked based on principal component analysis (PCA) (designated as the "two-population scenario") and (ii) adding reference populations that are unrelated based on PCA to the above two populations (designated as the "multi-population scenario"). Our results show that the use of MT-GBLUP in the two-population scenario resulted in an improvement of 7.1% in predictive ability compared to ST-GBLUP, while the use of SVR and KRR yielded improvements in predictive ability of 4.5 and 5.3%, respectively, compared to MT-GBLUP. SVR and KRR also yielded lower mean square errors (MSE) in most population and trait combinations. In the multi-population scenario, improvements in predictive ability of 29.7, 24.4 and 11.1% were obtained compared to ST-GBLUP when using, respectively, SVR, KRR, and AdaBoost.R2. However, compared to MT-GBLUP, the potential of ML methods to improve predictive ability was not demonstrated.
CONCLUSIONS Our study demonstrates that ML algorithms can achieve better prediction performance than multitrait GBLUP models in multi-population genomic prediction of phenotypes when the populations have similar genetic backgrounds; however, when reference populations that are unrelated based on PCA are added, the ML methods did not show a benefit. When the number of populations increased, only MT-GBLUP improved predictive ability in both validation populations, while the other methods showed improvement in only one population.
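The evaluation design described in this abstract (10 replicates of fivefold cross-validation, scored by predictive ability and MSE) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how predictive ability was computed, so the Pearson correlation between predicted and observed values is assumed here, and the function names `repeated_kfold`, `pearson`, and `mse` are hypothetical.

```python
import random
from statistics import mean


def repeated_kfold(n, k=5, replicates=10, seed=0):
    """Yield (replicate, fold, train_idx, test_idx) for repeated k-fold CV.

    Each replicate reshuffles the n individuals and partitions them into
    k disjoint test folds; the remaining individuals form the training set.
    """
    rng = random.Random(seed)
    for rep in range(replicates):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            test = sorted(idx[fold::k])          # every k-th shuffled index
            test_set = set(test)
            train = [i for i in range(n) if i not in test_set]
            yield rep, fold, train, test


def pearson(x, y):
    """Pearson correlation, used here as an assumed measure of predictive ability."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mse(predicted, observed):
    """Mean square error between predicted and observed values."""
    return mean((p - o) ** 2 for p, o in zip(predicted, observed))
```

In use, a model would be fitted on `train` and scored on `test` for each of the 50 splits, and the scores averaged per method; the model itself (GBLUP, SVR, KRR, and so on) is outside the scope of this sketch.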
Affiliation(s)
- Xue Wang
  - State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Zipeng Zhang
  - State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Hehe Du
  - State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Gábor Mészáros
  - University of Natural Resources and Life Sciences, Vienna, Austria
- Xiangdong Ding
  - State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
3
Ajasa AA, Boison SA, Gjøen HM, Lillehammer M. Accuracy of genomic prediction using multiple Atlantic salmon populations. Genet Sel Evol 2024; 56:38. [PMID: 38750427 PMCID: PMC11094890 DOI: 10.1186/s12711-024-00907-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 05/06/2024] [Indexed: 05/19/2024] Open
Abstract
BACKGROUND The accuracy of genomic prediction is partly determined by the size of the reference population. In Atlantic salmon breeding programs, four parallel populations often exist, thus offering the opportunity to increase the size of the reference set by combining these populations. By allowing a reduction in the number of records per population, multi-population prediction can potentially reduce cost and welfare issues related to the recording of traits, particularly for diseases. In this study, we evaluated the accuracy of multi- and across-population prediction of breeding values for resistance to amoebic gill disease (AGD) using all single nucleotide polymorphisms (SNPs) on a 55K chip or a selected subset of SNPs based on the signs of allele substitution effect estimates across populations, using both linear and nonlinear genomic prediction (GP) models in Atlantic salmon populations. In addition, we investigated genetic distance, genetic correlation estimated based on genomic relationships, and persistency of linkage disequilibrium (LD) phase across these populations. RESULTS The genetic distance between populations ranged from 0.03 to 0.07, while the genetic correlation ranged from 0.19 to 0.99. Nonetheless, compared to within-population prediction, there was limited or no impact of combining populations for multi-population prediction across the various models used or when using the selected subset of SNPs. The estimates of across-population prediction accuracy were low and to some extent proportional to the genetic correlation estimates. The persistency of LD phase between adjacent markers across populations using all SNP data ranged from 0.51 to 0.65, indicating that LD is poorly conserved across the studied populations. CONCLUSIONS Our results show that a high genetic correlation and a high genetic relationship between populations do not guarantee a higher prediction accuracy from multi-population genomic prediction in Atlantic salmon.
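As an illustration of the "persistency of LD phase" statistic mentioned in this abstract, the sketch below correlates the signed LD measure r of adjacent marker pairs between two populations. The abstract does not give the computation, so this is an assumption: r is approximated by the Pearson correlation of allele dosages (0/1/2) rather than estimated from phased haplotypes, all SNPs are assumed polymorphic in both populations, and the function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def adjacent_r(genotypes):
    """Signed r between each pair of adjacent SNPs.

    genotypes: list of individuals, each a list of allele dosages (0/1/2),
    with SNPs in map order. Dosage correlation is an approximation of
    haplotype-based r (assumption of this sketch).
    """
    n_snps = len(genotypes[0])
    cols = [[ind[j] for ind in genotypes] for j in range(n_snps)]
    return [pearson(cols[j], cols[j + 1]) for j in range(n_snps - 1)]


def ld_phase_persistency(pop_a, pop_b):
    """Correlation of signed adjacent-marker r values between two populations."""
    return pearson(adjacent_r(pop_a), adjacent_r(pop_b))
```

Values near 1 indicate that marker pairs are in the same LD phase in both populations; values such as the 0.51 to 0.65 reported above indicate weaker conservation.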
Affiliation(s)
- Afees A Ajasa
  - Nofima (Norwegian Institute of Food, Fisheries and Aquaculture Research), PO Box 210, 1431, Ås, Norway
  - Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1430, Ås, Norway
- Hans M Gjøen
  - Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1430, Ås, Norway
- Marie Lillehammer
  - Nofima (Norwegian Institute of Food, Fisheries and Aquaculture Research), PO Box 210, 1431, Ås, Norway
4
Duenk P, Wientjes YCJ, Bijma P, Iversen MW, Lopes MS, Calus MPL. Predicting the impact of genotype-by-genotype interaction on the purebred-crossbred genetic correlation from phenotype and genotype marker data of parental lines. Genet Sel Evol 2023; 55:2. [PMID: 36639760 PMCID: PMC9837999 DOI: 10.1186/s12711-022-00773-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 12/14/2022] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND The genetic correlation between purebred (PB) and crossbred (CB) performances ($r_{pc}$) partially determines the response in CB when selection is on PB performance in the parental lines. An earlier study has derived expressions for an upper and lower bound of $r_{pc}$, using the variance components of the parental purebred lines, including e.g. the additive genetic variance in the sire line for the trait expressed in one of the dam lines. How to estimate these variance components is not obvious, because animals from one parental line do not have phenotypes for the trait expressed in the other line. Thus, the aim of this study was to propose and compare three methods for approximating the required variance components. The first two methods are based on (co)variances of genomic estimated breeding values (GEBV) in the line of interest, either accounting for shrinkage (VCGEBV-S) or not (VCGEBV). The third method uses restricted maximum likelihood (REML) estimates directly from univariate and bivariate analyses (VCREML) by ignoring that the variance components should refer to the line of interest, rather than to the line in which the trait is expressed. We validated these methods by comparing the resulting predicted bounds of $r_{pc}$ with the $r_{pc}$ estimated from PB and CB data for five traits in a three-way cross in pigs. RESULTS With both VCGEBV and VCREML, the estimated $r_{pc}$ (plus or minus one standard error) was between the upper and lower bounds in 14 out of 15 cases. However, the range between the bounds was much smaller with VCREML (0.15-0.22) than with VCGEBV (0.44-0.57). With VCGEBV-S, the estimated $r_{pc}$ was between the upper and lower bounds in only six out of 15 cases, with the bounds ranging from 0.21 to 0.44.
CONCLUSIONS We conclude that using REML estimates of variance components within and between parental lines to predict the bounds of $r_{pc}$ resulted in better predictions than methods based on GEBV. Thus, we recommend that the studies that estimate $r_{pc}$ with genotype data also report estimated genetic variance components within and between the parental lines.
Affiliation(s)
- Pascal Duenk
  - Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands (GRID grid.4818.5; ISNI 0000 0001 0791 5666)
- Yvonne C. J. Wientjes
  - Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands (GRID grid.4818.5; ISNI 0000 0001 0791 5666)
- Piter Bijma
  - Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands (GRID grid.4818.5; ISNI 0000 0001 0791 5666)
- Maja W. Iversen
  - Norsvin SA, Storhamargata 44, 2317 Hamar, Norway (GRID grid.457964.d; ISNI 0000 0004 7866 857X)
- Marcos S. Lopes
  - Topigs Norsvin Research Center, P.O. Box 43, 6640 AA Beuningen, The Netherlands (GRID grid.435361.6)
  - Topigs Norsvin, Curitiba, 80420-210, Brazil
- Mario P. L. Calus
  - Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands (GRID grid.4818.5; ISNI 0000 0001 0791 5666)
5
Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:77-112. [PMID: 35451773 DOI: 10.1007/978-1-0716-2205-6_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
Abstract
The efficiency of genomic selection strongly depends on the prediction accuracy of the genetic merit of candidates. Numerous papers have shown that the composition of the calibration set is a key contributor to prediction accuracy. A poorly defined calibration set can result in low accuracies, whereas an optimized one can considerably increase accuracy compared to random sampling of the same size. Alternatively, optimizing the calibration set can be a way of decreasing the costs of phenotyping by enabling similar levels of accuracy compared to random sampling but with fewer phenotypic units. We present here the different factors that have to be considered when designing a calibration set, and review the different criteria proposed in the literature. We classified these criteria into two groups: model-free criteria based on relatedness, and criteria derived from the linear mixed model. We introduce criteria targeting specific prediction objectives including the prediction of highly diverse panels, biparental families, or hybrids. We also review different ways of updating the calibration set, and different procedures for optimizing phenotyping experimental designs.
6
Wientjes YCJ, Bijma P, Calus MPL, Zwaan BJ, Vitezica ZG, van den Heuvel J. The long-term effects of genomic selection: 1. Response to selection, additive genetic variance, and genetic architecture. Genet Sel Evol 2022; 54:19. [PMID: 35255802 PMCID: PMC8900405 DOI: 10.1186/s12711-022-00709-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/09/2021] [Accepted: 02/10/2022] [Indexed: 11/10/2022] Open
Abstract
Background
Genomic selection has revolutionized genetic improvement in animals and plants, but little is known about its long-term effects. Here, we investigated the long-term effects of genomic selection on response to selection, genetic variance, and the genetic architecture of traits using stochastic simulations. We defined the genetic architecture as the set of causal loci underlying each trait, their allele frequencies, and their statistical additive effects. We simulated a livestock population under 50 generations of phenotypic, pedigree, or genomic selection for a single trait, controlled by either only additive, additive and dominance, or additive, dominance, and epistatic effects. The simulated epistasis was based on yeast data.
Results
Short-term response was always greatest with genomic selection, while response after 50 generations was greater with phenotypic selection than with genomic selection when epistasis was present, and was always greater than with pedigree selection. This was mainly because loss of genetic variance and of segregating loci was much greater with genomic and pedigree selection than with phenotypic selection. Compared to pedigree selection, selection response was always greater with genomic selection. Pedigree and genomic selection lost a similar amount of genetic variance after 50 generations of selection, but genomic selection maintained more segregating loci, which on average had lower minor allele frequencies than with pedigree selection. Based on this result, genomic selection is expected to better maintain genetic gain after 50 generations than pedigree selection. The amount of change in the genetic architecture of traits was considerable across generations and was similar for genomic and pedigree selection, but slightly less for phenotypic selection. Presence of epistasis resulted in smaller changes in allele frequencies and less fixation of causal loci, but resulted in substantial changes in statistical additive effects across generations.
Conclusions
Our results show that genomic selection outperforms pedigree selection in terms of long-term genetic gain, but results in a similar reduction of genetic variance. The genetic architecture of traits changed considerably across generations, especially under selection and when non-additive effects were present. In conclusion, non-additive effects had a substantial impact on the accuracy of selection and long-term response to selection, especially when selection was accurate.
7
van den Berg I, Ho PN, Nguyen TV, Haile-Mariam M, MacLeod IM, Beatson PR, O'Connor E, Pryce JE. GWAS and genomic prediction of milk urea nitrogen in Australian and New Zealand dairy cattle. Genet Sel Evol 2022; 54:15. [PMID: 35183113 PMCID: PMC8858489 DOI: 10.1186/s12711-022-00707-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/30/2021] [Accepted: 01/31/2022] [Indexed: 11/24/2022] Open
Abstract
Background Urinary nitrogen leakage is an environmental concern in dairy cattle. Selection for reduced urinary nitrogen leakage may be done using indicator traits such as milk urea nitrogen (MUN). The result of a previous study indicated that the genetic correlation between MUN in Australia (AUS) and MUN in New Zealand (NZL) was only low to moderate (between 0.14 and 0.58). In this context, an alternative is to select sequence variants based on genome-wide association studies (GWAS) with a view to improve genomic prediction accuracies. A GWAS can also be used to detect quantitative trait loci (QTL) associated with MUN. Therefore, our objectives were to perform within-country GWAS and a meta-GWAS for MUN using records from up to 33,873 dairy cows and imputed whole-genome sequence data, to compare QTL detected in the GWAS for MUN in AUS and NZL, and to use sequence variants selected from the meta-GWAS to improve the prediction accuracy for MUN based on a joint AUS-NZL reference set. Results Using the meta-GWAS, we detected 14 QTL for MUN, located on chromosomes 1, 6, 11, 14, 19, 22, 26 and the X chromosome. The three most significant QTL encompassed the casein genes on chromosome 6, PAEP on chromosome 11 and DGAT1 on chromosome 14. We selected 50,000 sequence variants that had the same direction of effect for MUN in AUS and MUN in NZL and that were most significant in the meta-analysis for the GWAS. The selected sequence variants yielded a genetic correlation between MUN in AUS and MUN in NZL of 0.95 and substantially increased prediction accuracy in both countries. Conclusions Our results demonstrate how the sharing of data between two countries can increase the power of a GWAS and increase the accuracy of genomic prediction using a multi-country reference population and sequence variants selected based on a meta-GWAS. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00707-9.
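The variant-selection step described in this abstract (keep variants with the same direction of effect in both countries, then take the most significant ones from the meta-analysis) can be sketched as below. This is a hedged illustration only: the abstract does not state the meta-analysis method, so a fixed-effect inverse-variance-weighted combination is assumed, and the names `meta_z` and `select_variants` and the input layout are hypothetical.

```python
from math import sqrt


def meta_z(beta_a, se_a, beta_b, se_b):
    """Fixed-effect, inverse-variance-weighted meta-analysis z-score.

    Assumption of this sketch: two within-country GWAS provide an effect
    estimate and standard error per variant.
    """
    w_a, w_b = 1.0 / se_a ** 2, 1.0 / se_b ** 2
    beta = (w_a * beta_a + w_b * beta_b) / (w_a + w_b)
    se = sqrt(1.0 / (w_a + w_b))
    return beta / se


def select_variants(stats, n_top):
    """Return the n_top sign-concordant variants ranked by |meta z|.

    stats: dict mapping variant id -> (beta_a, se_a, beta_b, se_b).
    Variants whose effects differ in sign between populations are dropped.
    """
    concordant = {
        variant: meta_z(*s)
        for variant, s in stats.items()
        if s[0] * s[2] > 0          # same direction of effect in both GWAS
    }
    ranked = sorted(concordant, key=lambda v: abs(concordant[v]), reverse=True)
    return ranked[:n_top]
```

With real data, `n_top` would correspond to the 50,000 sequence variants mentioned above; the selected variants would then be used to build the genomic relationship information for the joint reference set.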
Affiliation(s)
- Irene van den Berg
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
- Phuong N Ho
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
- Tuan V Nguyen
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
- Mekonnen Haile-Mariam
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
- Iona M MacLeod
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
- Jennie E Pryce
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
8
Duenk P, Bijma P, Wientjes YCJ, Calus MPL. Predicting the purebred-crossbred genetic correlation from the genetic variance components in the parental lines. Genet Sel Evol 2021; 53:10. [PMID: 33541267 PMCID: PMC7860586 DOI: 10.1186/s12711-021-00601-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/17/2020] [Accepted: 01/08/2021] [Indexed: 01/24/2023] Open
Abstract
Background The genetic correlation between purebred and crossbred performance ($r_{pc}$) is an important parameter in pig and poultry breeding, because response to selection in crossbred performance depends on the value of $r_{pc}$ when selection is based on purebred (PB) performance. The value of $r_{pc}$ can be substantially lower than 1, which is partly due to differences in allele frequencies between parental lines when non-additive genetic effects are present. This relationship between $r_{pc}$ and parental allele frequencies suggests that $r_{pc}$ can be expressed as a function of genetic parameters for the trait in the parental lines. In this study, we derived expressions for $r_{pc}$ based on genetic variances within, and the genetic covariance between parental lines. It is important to note that the variance components used in our expressions are not the components that are typically estimated in empirical data. The expressions were derived for a genetic model with additive and dominance effects (D), and additive and epistatic additive-by-additive effects (EAA). We validated our expressions using simulations of purebred parental lines and their crosses, where the parental lines were either selected or not. Finally, using these simulations, we investigated the value of $r_{pc}$ for genetic models with both dominance and epistasis or with other types of epistasis, for which expressions could not be derived. Results Our simulations show that when non-additive effects are present, $r_{pc}$ decreases with increasing differences in allele frequencies between the parental lines. Genetic models that involve dominance result in lower values of $r_{pc}$ than genetic models that involve epistasis only. Using information of parental lines only, our expressions provide exact estimates of $r_{pc}$ for models D and EAA, and accurate upper and lower bounds of $r_{pc}$ for two other genetic models. Conclusion This work lays the foundation to enable estimation of $r_{pc}$ from information collected in PB parental lines only.
Affiliation(s)
- Pascal Duenk
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Piter Bijma
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Yvonne C J Wientjes
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Mario P L Calus
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
9
Raymond B, Wientjes YCJ, Bouwman AC, Schrooten C, Veerkamp RF. A deterministic equation to predict the accuracy of multi-population genomic prediction with multiple genomic relationship matrices. Genet Sel Evol 2020; 52:21. [PMID: 32345213 PMCID: PMC7189707 DOI: 10.1186/s12711-020-00540-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/11/2019] [Accepted: 04/14/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A multi-population genomic prediction (GP) model in which important pre-selected single nucleotide polymorphisms (SNPs) are differentially weighted (MPMG) has been shown to result in better prediction accuracy than a multi-population, single genomic relationship matrix ([Formula: see text]) GP model (MPSG) in which all SNPs are weighted equally. Our objective was to underpin theoretically the advantages and limits of the MPMG model over the MPSG model, by deriving and validating a deterministic prediction equation for its accuracy. METHODS Using selection index theory, we derived an equation to predict the accuracy of estimated total genomic values of selection candidates from population [Formula: see text] ([Formula: see text]), when individuals from two populations, [Formula: see text] and [Formula: see text], are combined in the training population and two [Formula: see text], made respectively from pre-selected and remaining SNPs, are fitted simultaneously in MPMG. We used simulations to validate the prediction equation in scenarios that differed in the level of genetic correlation between populations, heritability, and proportion of genetic variance explained by the pre-selected SNPs. Empirical accuracy of the MPMG model in each scenario was calculated and compared to the predicted accuracy from the equation. RESULTS In general, the derived prediction equation resulted in accurate predictions of [Formula: see text] for the scenarios evaluated. Using the prediction equation, we showed that an important advantage of the MPMG model over the MPSG model is its ability to benefit from the small number of independent chromosome segments ([Formula: see text]) due to the pre-selected SNPs, both within and across populations, whereas for the MPSG model, there is only a single value for [Formula: see text], calculated based on all SNPs, which is very large. 
However, this advantage is dependent on the pre-selected SNPs that explain some proportion of the total genetic variance for the trait. CONCLUSIONS We developed an equation that gives insight into why, and under which conditions the MPMG outperforms the MPSG model for GP. The equation can be used as a deterministic tool to assess the potential benefit of combining information from different populations, e.g., different breeds or lines for GP in livestock or plants, or different groups of people based on their ethnic background for prediction of disease risk scores.
Affiliation(s)
- Biaty Raymond
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
  - Biometris, Wageningen University and Research, 6700 AA, Wageningen, The Netherlands
- Yvonne C J Wientjes
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Aniek C Bouwman
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Roel F Veerkamp
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
10
Gemenet DC, Kitavi MN, David M, Ndege D, Ssali RT, Swanckaert J, Makunde G, Yencho GC, Gruneberg W, Carey E, Mwanga RO, Andrade MI, Heck S, Campos H. Development of diagnostic SNP markers for quality assurance and control in sweetpotato [Ipomoea batatas (L.) Lam.] breeding programs. PLoS One 2020; 15:e0232173. [PMID: 32330201 PMCID: PMC7182229 DOI: 10.1371/journal.pone.0232173] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/11/2019] [Accepted: 04/08/2020] [Indexed: 11/19/2022] Open
Abstract
Quality assurance and control (QA/QC) is an essential element of a breeding program's optimization efforts towards increased genetic gains. Due to auto-hexaploid genome complexity, a low-cost marker platform for routine QA/QC in sweetpotato breeding programs is still unavailable. We used 662 parents of the International Potato Center (CIP)'s global breeding program spanning Peru, Uganda, Mozambique and Ghana, to develop a low-density, highly informative single nucleotide polymorphism (SNP) marker set to be deployed for routine QA/QC. Segregation of the selected 30 SNPs (two SNPs per base chromosome) in a recombined breeding population was evaluated using 282 progeny from some of the parents above. The progeny were replicated from in-vitro, screenhouse and field, and the selected SNP-set was confirmed to identify relatively similar mislabeling error rates as a high density SNP-set of 10,159 markers. Six additional trait-specific markers were added to the selected SNP set from previous quantitative trait loci mapping studies. The 36-SNP set will be deployed for QA/QC in breeding pipelines and in fingerprinting of advanced clones or released varieties to monitor genetic gains in farmers' fields. The study also enabled evaluation of CIP's global breeding population structure and the effect of some of the most devastating stresses like sweetpotato virus disease on genetic variation management. These results will inform future deployment of genomic selection in sweetpotato.
Affiliation(s)
- Mercy N. Kitavi
- International Potato Center (CIP), ILRI Campus, Nairobi, Kenya
- Maria David
- International Potato Center (CIP), Apartado, Lima, Peru
- Dorcah Ndege
- International Potato Center (CIP), ILRI Campus, Nairobi, Kenya
- G. Craig Yencho
- North Carolina State University, Raleigh, North Carolina, United States of America
- Edward Carey
- International Potato Center (CIP), Kumasi, Ghana
- Simon Heck
- International Potato Center (CIP), ILRI Campus, Nairobi, Kenya
- Hugo Campos
- International Potato Center (CIP), Apartado, Lima, Peru
|
11
|
Haile-Mariam M, MacLeod IM, Bolormaa S, Schrooten C, O'Connor E, de Jong G, Daetwyler HD, Pryce JE. Value of sharing cow reference population between countries on reliability of genomic prediction for milk yield traits. J Dairy Sci 2019; 103:1711-1728. [PMID: 31864746 DOI: 10.3168/jds.2019-17170] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/26/2019] [Accepted: 10/24/2019] [Indexed: 01/08/2023]
Abstract
Increasing the reliability of genomic prediction (GP) of economic traits in the pasture-based dairy production systems of New Zealand (NZ) and Australia (AU) is important to both countries. This study assessed if sharing cow phenotype and genotype data of NZ and AU improves the reliability of GP for NZ bulls. Data from approximately 32,000 NZ genotyped cows and their contemporaries were included in the May 2018 routine genetic evaluation of the Australian Dairy cattle in an attempt to provide consistent phenotypes for both countries. After the genetic evaluation, deregressed proofs of cows were calculated for milk yield traits. The April 2018 multiple across-country evaluation of Interbull was also used to calculate deregressed proofs for bulls on the NZ scale. Approximately 1,178 Jersey (Jer) and 6,422 Holstein (Hol) bulls had genotype and phenotype data. In addition to NZ cows, phenotype data of close to 60,000 genotyped Australian (AU) cows from the same genetic evaluation run as NZ cows were used. All AU and NZ females were genotyped using low-density SNP chips (<10K SNP) and were imputed first to 50K and then to ∼600K (referred to as high density; HD). We used up to 98,000 animals in the reference populations, both by expanding the NZ reference set (cow, bull, single breed to multi-breed set) and by adding AU cows. Reliabilities of GP were calculated for 508 Jer and 1,251 Hol bulls whose sires are not included in the reference set (RS) to ensure that real differences are not masked by close relationships. The GP was tested using 50K or high-density SNP chip using genomic BLUP in bivariate (considering country as a trait) or single trait models. The RS that gave the highest reliability for each breed were also tested using a hybrid GP method that combines expectation maximization with Bayes R. 
The addition of the AU cows to an NZ RS that included either NZ cows only, or cows and bulls, improved the reliability of GP for both NZ Hol and Jer validation bulls for all traits. Using single breed reference populations also increased reliability when NZ crossbred cows were added to reference populations that included only purebred NZ bulls and cows and AU cows. The full multi-breed RS (all NZ cows and bulls and AU cows) provided similar reliabilities in NZ Hol bulls, when compared with the single breed reference with crossbred NZ cows. For Jer validation bulls, the RS that included Jer cows and bulls and crossbred cows from NZ and Jer cows from AU was marginally better than the all-breed, all-country RS. In terms of reliability, the advantage of the HD SNP chip was small but captured more of the genomic variance than the 50K, particularly for Hol. The expectation maximization Bayes R GP method was slightly (up to 3 percentage points) better than genomic BLUP. We conclude that GP of milk production traits in NZ bulls improves by up to 7 percentage points in reliability by expanding the NZ reference population to include AU cows.
Affiliation(s)
- M Haile-Mariam
- Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia.
- I M MacLeod
- Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
- S Bolormaa
- Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
- G de Jong
- CRV, 6800 AL Arnhem, the Netherlands
- H D Daetwyler
- Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- J E Pryce
- Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
|
12
|
Duenk P, Calus MPL, Wientjes YCJ, Breen VP, Henshall JM, Hawken R, Bijma P. Estimating the purebred-crossbred genetic correlation of body weight in broiler chickens with pedigree or genomic relationships. Genet Sel Evol 2019; 51:6. [PMID: 30782121 PMCID: PMC6381670 DOI: 10.1186/s12711-019-0447-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/26/2018] [Accepted: 01/24/2019] [Indexed: 11/27/2022]
Abstract
Background In pig and poultry breeding programs, the breeding goal is to improve crossbred (CB) performance, whereas selection in the purebred (PB) lines is often based on PB performance. Thus, response to selection may be suboptimal, because the genetic correlation between PB and CB performance ($r_{pc}$) is generally lower than 1. Accurate estimates of $r_{pc}$ are needed, so that breeders can decide if they should collect data from CB animals. $r_{pc}$ can be estimated either from pedigree or genomic relationships, which may produce different results. With genomic relationships, the $r_{pc}$ estimate could be improved when relationships between purebred and crossbred animals are based only on the alleles that originate from the PB line of interest. This work presents the first comparison of estimated $r_{pc}$ and variance components of body weight in broilers, using pedigree-based or genotype-based models, where the breed-of-origin of alleles was either ignored or considered. We used genotypes and body weight measurements of PB and CB animals that have a common sire line. Results Our results showed that the $r_{pc}$ estimates depended on the relationship matrix used. Estimates were 5 to 25% larger with genotype-based models than with pedigree-based models. Moreover, $r_{pc}$ estimates were similar (max. 7% difference) regardless of whether the model considered breed-of-origin of alleles or not. Standard errors of $r_{pc}$ estimates were smaller with genotype-based than with pedigree-based methods, and smaller with models that ignored breed-of-origin than with models that considered breed-of-origin. Conclusions We conclude that genotype-based models can be useful for estimating $r_{pc}$, even when the PB and CB animals that have phenotypes are closely related. Considering breed-of-origin of alleles did not yield different estimates of $r_{pc}$, probably because the parental breeds of the CB animals were distantly related. Electronic supplementary material The online version of this article (10.1186/s12711-019-0447-9) contains supplementary material, which is available to authorized users.
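As a rough illustration of what a genotype-based relationship matrix looks like, the sketch below builds one from allele counts using a common construction: genotypes are centred on twice the allele frequency and cross-products are scaled by $2\sum_j p_j(1-p_j)$. The abstract does not state which construction the authors used, and breed-of-origin of alleles is not modelled here. The function name, the toy genotypes, and the choice to estimate allele frequencies from the sample itself are assumptions.

```python
from typing import List


def genomic_relationship_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Build a genomic relationship matrix from allele counts (0, 1 or 2).

    Each row is one animal, each column one SNP. Allele frequencies are
    estimated from the supplied sample (an assumption of this sketch).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    if any(len(row) != n_snps for row in genotypes):
        raise ValueError("all animals must have the same number of SNPs")

    # Frequency of the counted allele at each SNP.
    freqs = [
        sum(row[j] for row in genotypes) / (2 * n_animals)
        for j in range(n_snps)
    ]

    # Scaling factor: expected variance of allele counts summed over SNPs.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; matrix is undefined")

    # Centre each genotype on its expected value 2p.
    centred = [
        [row[j] - 2 * freqs[j] for j in range(n_snps)]
        for row in genotypes
    ]

    # G = Z Z' / scale
    return [
        [
            sum(a * b for a, b in zip(centred[i], centred[k])) / scale
            for k in range(n_animals)
        ]
        for i in range(n_animals)
    ]


if __name__ == "__main__":
    # Hypothetical genotypes: three animals, three SNPs.
    toy = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
    for row in genomic_relationship_matrix(toy):
        print([round(value, 3) for value in row])
```

A pedigree-based model would replace this matrix with the numerator relationship matrix derived from recorded parentage, which is one reason the two approaches can give different $r_{pc}$ estimates.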
Affiliation(s)
- Pascal Duenk
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands.
- Mario P L Calus
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Yvonne C J Wientjes
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Rachel Hawken
- Cobb-Vantress Inc., Siloam Springs, AR, 72761-1030, USA
- Piter Bijma
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
|