1
Steyn Y, Lawlor TJ, Lourenco D, Misztal I. The importance of historically popular sires on the accuracy of genomic predictions of young animals in the US Holstein population. JDS Communications 2023; 4:260-264. [PMID: 37521061; PMCID: PMC10382817; DOI: 10.3168/jdsc.2022-0299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Accepted: 01/26/2023] [Indexed: 08/01/2023]
Abstract
The dairy industry is known for its extensive use of artificial insemination, which has resulted in a population where most animals can be traced back to only a few sires. Due to their relatedness to the population, old influential sires could still contribute to the accuracy of genomic predictions. The objective of the study was to identify the impact of historically influential sires on the recent population. This was tested by constructing a genomic relationship matrix using recursion with different sets of sires. Differences in prediction accuracies with different sets are indicative of how important each set is. Recursion coefficients linking young animals to those sets reveal the relative importance of specific sires to the prediction accuracy of recent animals. The data included ∼10 million scores for stature and fore udder attachment (FUA) measured from 1983. Genotypes of 569,404 animals were available. Sire sets included the 100 most popular sires born within different time periods. Computations were with single-step genomic BLUP. In general, the younger sires had higher prediction accuracies than the oldest sires, even though they generally have fewer progeny. The accuracy of evaluation for stature was increased from 0.54 with the most popular sires born before 1981 to 0.69 with sires born from 2001 to 2010, while the accuracy for FUA increased from 0.47 to 0.61. The accuracy achieved using the overall 100 most used sires was 0.66 for stature and 0.58 for FUA. All 100 sires from each period were combined in a subset to determine the importance of each sire relative to all 400 animals in the combined subset. The highest relative impact of a sire that was born within the different time sets was 1.97 for Valiant (before 1981), 1.94 for Blackstar (1981 to 1990), 4.38 for Shottle (1991 to 2000), and 3.09 for Planet (2001 to 2010). The 3 sires among the 400 with the greatest impact were Shottle, Goldwyn (3.73), and Planet. 
The relative impact of a sire was not strongly related to the number of progeny. For instance, the relative impact of Durham with 34K progeny was 2.29, whereas the impact of O Man with 15K progeny was 3.13. The impact of a sire is also influenced by whether it was used as a sire of sires. Results show that younger sires are more relevant to the accuracy of breeding value prediction in the recent population.
Affiliation(s)
- Yvette Steyn: Department of Animal and Dairy Science, University of Georgia, Athens 30602
- Daniela Lourenco: Department of Animal and Dairy Science, University of Georgia, Athens 30602
- Ignacy Misztal: Department of Animal and Dairy Science, University of Georgia, Athens 30602
2
Osawa T, Masuda Y, Saburi J, Hirumachi K. Application of single-step single nucleotide polymorphism best linear unbiased predictor model with unknown-parent groups for type traits in Japanese Holsteins. J Dairy Sci 2023:S0022-0302(23)00291-6. [PMID: 37268563; DOI: 10.3168/jds.2022-22541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2022] [Accepted: 01/30/2023] [Indexed: 06/04/2023]
Abstract
The objectives of this study were to investigate the computational performance and the predictive ability and bias of a single-step SNP BLUP model (ssSNPBLUP) in genotyped young animals with unknown-parent groups (UPG) for type traits, using national genetic evaluation data from the Japanese Holstein population. The phenotype, genotype, and pedigree data were the same as those used in a national genetic evaluation of linear type traits classified between April 1984 and December 2020. In the current study, 2 data sets were prepared: the full data set containing all entries up to December 2020 and a truncated data set ending with December 2016. Genotyped animals were classified into 3 types: sires with classified daughters (S), cows with records (C), and young animals (Y). The computing performance and prediction accuracy of ssSNPBLUP were compared for the following 3 groups of genotyped animals: sires with classified daughters and young animals (SY); cows with records and young animals (CY); and sires with classified daughters, cows with records, and young animals (SCY). In addition, we tested 3 parameters of residual polygenic variance in ssSNPBLUP (0.1, 0.2, or 0.3). Daughter yield deviations (DYD) for the validation bulls and phenotypes adjusted for all fixed effects and random effects other than animal and residual (Yadj) for the validation cows were obtained using the full data set from the pedigree-based BLUP model. The regression coefficients of DYD for bulls (or Yadj for cows) on the genomic estimated breeding value (GEBV) using the truncated data set were used to measure the inflation of the predictions of young animals. The coefficient of determination of DYD on GEBV was used to measure the predictive ability of the predictions for the validation bulls. The reliability of the predictions for the validation cows was calculated as the square of the correlation between Yadj and GEBV divided by heritability. 
The predictive ability was highest in the SCY group and lowest in the CY group. However, minimal difference was found in predictive abilities with or without UPG models using different parameters of residual polygenic variance. The regression coefficients approached 1.0 as the parameter of residual polygenic variance increased, but regression coefficients were mostly similar regardless of the use of UPG across the groups of genotyped animals. The ssSNPBLUP model, including UPG, was demonstrated as feasible for implementation in the national evaluation of type traits in Japanese Holsteins.
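The validation statistics described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `validation_stats` and the toy numbers are hypothetical, and the benchmark is whichever quantity applies (DYD for bulls, Yadj for cows). The slope measures inflation, the squared correlation measures predictive ability, and dividing it by heritability gives the reliability used for cows.

```python
def validation_stats(benchmark, gebv, heritability=None):
    """Regress a benchmark (DYD or Yadj) on GEBV from the truncated data set."""
    n = len(gebv)
    mean_b = sum(benchmark) / n
    mean_g = sum(gebv) / n
    sxy = sum((g - mean_g) * (b - mean_b) for g, b in zip(gebv, benchmark))
    sxx = sum((g - mean_g) ** 2 for g in gebv)
    syy = sum((b - mean_b) ** 2 for b in benchmark)
    stats = {
        "slope": sxy / sxx,            # values near 1.0 indicate little inflation
        "r2": sxy ** 2 / (sxx * syy),  # squared correlation (predictive ability)
    }
    if heritability is not None:
        # Reliability as described for cows: squared correlation / heritability.
        stats["reliability"] = stats["r2"] / heritability
    return stats


# Hypothetical toy values, for illustration only.
dyd = [0.8, 1.9, 3.2, 3.9]
gebv = [1.0, 2.0, 3.0, 4.0]
print(validation_stats(dyd, gebv))
```

A slope below 1.0 would point to inflated GEBV, which is the pattern the abstract reports shrinking as the residual polygenic variance parameter increases.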
Affiliation(s)
- Takefumi Osawa: National Livestock Breeding Center, Nishigo-mura, Fukushima, 961-8511, Japan
- Yutaka Masuda: Rakuno Gakuen University, Ebetsu, Hokkaido, 069-8501, Japan
- Junichi Saburi: National Livestock Breeding Center, Nishigo-mura, Fukushima, 961-8511, Japan
- Keita Hirumachi: National Livestock Breeding Center, Nishigo-mura, Fukushima, 961-8511, Japan
3
Exploring the statistical nature of independent chromosome segments. Livest Sci 2023. [DOI: 10.1016/j.livsci.2023.105207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2023]
4
Pocrnic I, Lindgren F, Tolhurst D, Herring WO, Gorjanc G. Optimisation of the core subset for the APY approximation of genomic relationships. Genet Sel Evol 2022; 54:76. [PMID: 36418945; PMCID: PMC9682752; DOI: 10.1186/s12711-022-00767-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 10/31/2022] [Indexed: 11/24/2022]
Abstract
BACKGROUND By entering the era of mega-scale genomics, we are facing many computational issues with standard genomic evaluation models due to their dense data structure and cubic computational complexity. Several scalable approaches have been proposed to address this challenge, such as the Algorithm for Proven and Young (APY). In APY, genotyped animals are partitioned into core and non-core subsets, which induces a sparser inverse of the genomic relationship matrix. This partitioning is often done at random. While APY is a good approximation of the full model, random partitioning can make results unstable, possibly affecting accuracy or even reranking animals. Here we present a stable optimisation of the core subset by choosing animals with the most informative genotype data. METHODS We derived a novel algorithm for optimising the core subset based on a conditional genomic relationship matrix or a conditional single nucleotide polymorphism (SNP) genotype matrix. We compared the accuracy of genomic predictions with different core subsets for simulated and real pig data sets. The core subsets were constructed (1) at random, (2) based on the diagonal of the genomic relationship matrix, (3) at random with weights from (2), or (4) based on the novel conditional algorithm. To understand the different core subset constructions, we visualise the population structure of the genotyped animals with linear Principal Component Analysis and non-linear Uniform Manifold Approximation and Projection. RESULTS All core subset constructions performed equally well when the number of core animals captured most of the variation in the genomic relationships, both in simulated and real data sets. When the number of core animals was not sufficiently large, there was substantial variability in the results with the random construction but no variability with the conditional construction. 
Visualisation of the population structure and chosen core animals showed that the conditional construction spreads core animals across the whole domain of genotyped animals in a repeatable manner. CONCLUSIONS Our results confirm that the size of the core subset in APY is critical. Furthermore, the results show that the core subset can be optimised with the conditional algorithm that achieves an optimal and repeatable spread of core animals across the domain of genotyped animals.
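One way to picture a conditional core selection is a greedy rule: repeatedly pick the animal with the largest remaining (conditional) variance, then condition the relationship matrix on that animal. The sketch below is an assumption about the general idea, similar to a pivoted Cholesky factorisation, and is not taken from the paper; `select_core` and the toy matrix are hypothetical.

```python
def select_core(G, n_core):
    """Greedily pick animals with the largest conditional variance."""
    n = len(G)
    C = [row[:] for row in G]  # working copy holding conditional relationships
    core = []
    for _ in range(n_core):
        candidates = [i for i in range(n) if i not in core]
        if not candidates:
            break
        k = max(candidates, key=lambda i: C[i][i])
        if C[k][k] <= 1e-12:  # remaining animals are already fully explained
            break
        core.append(k)
        pivot = C[k][k]
        col = [C[i][k] for i in range(n)]
        # Schur-complement update: relationships conditional on animal k.
        for i in range(n):
            for j in range(n):
                C[i][j] -= col[i] * col[j] / pivot
    return core


# Hypothetical toy matrix: animals 0 and 1 are close relatives, animal 2 is not.
G = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 0.8]]
print(select_core(G, 2))
```

Because the rule is deterministic, repeated runs return the same core, which is the repeatability property the abstract contrasts with random partitioning. In this toy case the second pick skips the close relative of the first animal, illustrating how conditioning spreads the core across the population.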
Affiliation(s)
- Ivan Pocrnic: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG UK
- Finn Lindgren: School of Mathematics, The University of Edinburgh, The King’s Buildings, Edinburgh, EH9 3FD UK
- Daniel Tolhurst: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG UK
- William O. Herring: Genus PIC, 100 Bluegrass Commons Blvd., Suite 2200, Hendersonville, TN 37075 USA
- Gregor Gorjanc: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG UK
5
Leite NG, Chen CY, Herring WO, Holl J, Tsuruta S, Lourenco D. Leveraging low-density crossbred genotypes to offset crossbred phenotypes and their impact on purebred predictions. J Anim Sci 2022; 100:6780296. [PMID: 36309902; PMCID: PMC9733505; DOI: 10.1093/jas/skac359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 10/27/2022] [Indexed: 12/15/2022]
Abstract
The objectives of this study were to 1) investigate the predictability and bias of genomic breeding values (GEBV) of purebred (PB) sires for crossbred (CB) performance when CB genotypes imputed from a low-density panel are available, 2) assess if the availability of those CB genotypes can be used to partially offset CB phenotypic recording, and 3) investigate the impact of including imputed CB genotypes in genomic analyses when using the algorithm for proven and young (APY). Two pig populations with up to 207,375 PB and 32,893 CB phenotypic records per trait and 138,026 PB and 32,893 CB genotypes were evaluated. PB sires were genotyped for a 50K panel, whereas CB animals were genotyped for a low-density panel of 600 SNP and imputed to 50K. The predictability and bias of GEBV of PB sires for backfat thickness (BFX) and average daily gain (ADGX) recorded on CB animals were assessed when CB genotypes were available or not in the analyses. In the first set of analyses, direct inverses of the genomic relationship matrix (G) were used with phenotypic datasets truncated at different time points. In the next step, we evaluated the APY algorithm with core compositions differing in the CB genotype contributions. After that, the performance of core compositions was compared with an analysis using a random PB core from a purely PB genomic set. The number of rounds to convergence was recorded for all APY analyses. With the direct inverse of G in the first set of analyses, adding CB genotypes imputed from a low-density panel (600 SNP) did not improve predictability or reduce the bias of PB sires' GEBV for CB performance, even for sires with fewer CB progeny phenotypes in the analysis. That indicates that the inclusion of CB genotypes primarily used for inferring pedigree in commercial farms is of no benefit to offset CB phenotyping.
When CB genotypes were incorporated into APY, a random core composition or a core with no CB genotypes reduced bias and the number of rounds to convergence but did not affect predictability. Still, a PB random core composition from a genomic set with only PB genotypes resulted in the highest predictability and the smallest number of rounds to convergence, although bias increased. Genotyping CB individuals for low-density panels is a valuable identification tool for linking CB phenotypes to pedigree; however, the inclusion of those CB genotypes imputed from a low-density panel (600 SNP) might not benefit genomic predictions for PB individuals or offset CB phenotyping for the evaluated CB performance traits. Further studies will help understand the usefulness of those imputed CB genotypes for traits with lower PB-CB genetic correlations and traits not recorded in the PB environment, such as mortality and disease traits.
Affiliation(s)
- Shogo Tsuruta: Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Daniela Lourenco: Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
6
Garcia A, Aguilar I, Legarra A, Tsuruta S, Misztal I, Lourenco D. Theoretical accuracy for indirect predictions based on SNP effects from single-step GBLUP. Genet Sel Evol 2022; 54:66. [PMID: 36162979; PMCID: PMC9513904; DOI: 10.1186/s12711-022-00752-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/22/2022] [Accepted: 08/23/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Although single-step GBLUP (ssGBLUP) is an animal model, SNP effects can be backsolved from genomic estimated breeding values (GEBV). Predicted SNP effects allow an indirect prediction (IP) to be computed for each individual as the sum of the SNP effects multiplied by its gene content, which is helpful when the number of genotyped animals is large, for genotyped animals not in the official evaluations, and when interim evaluations are needed. Typically, IP are obtained for new batches of genotyped individuals, all of them young and without phenotypes. Individual (theoretical) accuracies for IP are rarely reported, but they are nevertheless of interest. Our first objective was to present equations to compute the individual accuracy of IP based on the prediction error covariance (PEC) of SNP effects, which in turn is obtained from the PEC of GEBV in ssGBLUP. The second objective was to test the algorithm for proven and young (APY) in PEC computations. With large datasets, it is impossible to handle the full PEC matrix; thus, the third objective was to examine the minimum number of genotyped animals needed in PEC computations to achieve IP accuracies that are equivalent to GEBV accuracies. RESULTS Correlations between GEBV and IP for the validation animals using SNP effects from ssGBLUP evaluations were ≥ 0.99. When all available genotyped animals were used for PEC computations, correlations between GEBV and IP accuracy were ≥ 0.99. In addition, IP accuracies were compatible with GEBV accuracies either with direct inversion of the genomic relationship matrix (G) or using APY to obtain the inverse of G. As the number of genotyped animals included in the PEC computations decreased from around 55,000 to 15,000, correlations were still ≥ 0.96, but IP accuracies were biased downwards.
CONCLUSIONS Theoretical accuracy of indirect prediction can be successfully obtained by computing SNP PEC out of GEBV PEC from ssGBLUP equations using direct or APY G inverse. It is possible to reduce the number of genotyped animals in PEC computations, but accuracies may be underestimated. Further research is needed to approximate SNP PEC from ssGBLUP to limit the computational requirements with many genotyped animals.
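The two quantities in this abstract can be sketched as follows. The IP is the sum of SNP effects times gene content, as stated above. Centring gene content by twice the allele frequency and computing accuracy as sqrt(1 - PEV / genetic variance) are common conventions assumed here rather than details taken from the abstract, and all names and numbers are hypothetical.

```python
from math import sqrt


def indirect_prediction(gene_content, snp_effects, allele_freqs=None):
    """IP = sum over SNP of (centred gene content) * (SNP effect)."""
    if allele_freqs is None:
        allele_freqs = [0.0] * len(snp_effects)  # assumption: no centring
    return sum((z - 2.0 * p) * a
               for z, p, a in zip(gene_content, allele_freqs, snp_effects))


def ip_accuracy(gene_content, snp_pec, genetic_variance):
    """Accuracy from the prediction error variance z' PEC z of the IP."""
    m = len(gene_content)
    pev = sum(gene_content[i] * snp_pec[i][j] * gene_content[j]
              for i in range(m) for j in range(m))
    # Assumed convention: accuracy = sqrt(1 - PEV / genetic variance).
    return sqrt(max(0.0, 1.0 - pev / genetic_variance))


# Hypothetical toy values: three SNP, gene content coded 0/1/2.
z = [0, 1, 2]
effects = [0.5, -0.2, 0.1]
print(indirect_prediction(z, effects, allele_freqs=[0.5, 0.5, 0.5]))
```

In practice the PEC matrix of SNP effects is the expensive part, which is why the abstract examines how few genotyped animals can be used to compute it.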
Affiliation(s)
- Andre Garcia: Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602 USA
- Ignacio Aguilar: Instituto Nacional de Investigación Agropecuaria (INIA), 11500 Montevideo, Uruguay
- Andres Legarra: UMR GenPhySE, INRA Toulouse, BP52626, 31326 Castanet Tolosan, France
- Shogo Tsuruta: Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602 USA
- Ignacy Misztal: Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602 USA
- Daniela Lourenco: Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602 USA
7
Hollifield MK, Bermann M, Lourenco D, Misztal I. Impact of blending the genomic relationship matrix with different levels of pedigree relationships or the identity matrix on genetic evaluations. JDS Communications 2022; 3:343-347. [PMID: 36340904; PMCID: PMC9623765; DOI: 10.3168/jdsc.2022-0229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Accepted: 06/29/2022] [Indexed: 06/16/2023]
Abstract
Evaluations using single-step genomic BLUP require blending the genomic relationship matrix (G) with a positive definite matrix to ensure nonsingularity for solving the mixed model equations. Many organizations blend G with a proportion of the numerator relationship matrix for genotyped animals (A22) to improve stability and possibly add a residual polygenic effect. However, when nearly all the polygenic variance is explained by G, blending with A22 may cause inflation and add excess computing time; thus, blending with an identity matrix (I) multiplied by a small value may be a better solution. The objective of this study was to evaluate changes in reliability and inflation of genomic estimated breeding values, convergence rate, elapsed wall-clock time for blending G with different levels of A22 or I, and develop a more time-efficient blending method. A US Holstein cattle data set was used with 9.7 million animals in the pedigree, 569,404 animals with genotypes, and 10.1 million stature phenotypes. Blending G by adding a small value to the diagonal elements had comparable performance to A22 with fewer rounds to convergence required to solve the system of equations. Reliability and inflation of genomic estimated breeding values ranged from 0.63 to 0.68 and 0.86 to 0.89 for all blending scenarios tested. The current blending default in the BLUPF90 software is to replace G with (1 - β)G + βA22, where β equals 0.05. In this study, β values of 0.30, 0.20, 0.05, 0.01, 0.005, and 0.001 were evaluated with A22 and I. Negligible differences in elapsed computing time between the blending types and levels were observed. Subsequently, the current blending algorithm used in the BLUPF90 family of programs was optimized, reducing the blending time from approximately 2 h to 5 min for A22 and less than 1 s for I. The new time difference between blending with A22 or I is negligible and not computationally critical.
The results indicate that blending G with A22 does not have clear advantages over blending with a small proportion of I.
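The blending rule quoted above, (1 - β)G + βA22, is simple enough to show directly. The sketch below is illustrative only and does not reproduce BLUPF90 code; the helper names are hypothetical, and applying the same weighted form to the identity matrix is an assumption about how blending with I is parameterised.

```python
def blend(G, B, beta=0.05):
    """Return (1 - beta) * G + beta * B for square matrices as nested lists."""
    n = len(G)
    return [[(1.0 - beta) * G[i][j] + beta * B[i][j] for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def det2(M):
    """Determinant of a 2 x 2 matrix, used only to check singularity here."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Hypothetical toy case: two identical genotypes make G singular.
G = [[1.0, 1.0],
     [1.0, 1.0]]
G_blended = blend(G, identity(2), beta=0.05)
print(det2(G), det2(G_blended))
```

Passing A22 instead of the identity as `B` gives the other blending type compared in the study; with the identity, only the diagonal gains the small constant that makes the matrix invertible.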
8
Bermann M, Lourenco D, Forneris NS, Legarra A, Misztal I. On the equivalence between marker effect models and breeding value models and direct genomic values with the Algorithm for Proven and Young. Genet Sel Evol 2022; 54:52. [PMID: 35842585; PMCID: PMC9288049; DOI: 10.1186/s12711-022-00741-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/22/2021] [Accepted: 06/29/2022] [Indexed: 12/04/2022]
Abstract
Background Single-step genomic predictions obtained from a breeding value model require calculating the inverse of the genomic relationship matrix ($\mathbf{G}^{-1}$). The Algorithm for Proven and Young (APY) creates a sparse representation of $\mathbf{G}^{-1}$ with a low computational cost. APY consists of selecting a group of core animals and expressing the breeding values of the remaining animals as a linear combination of those from the core animals plus an error term. The objectives of this study were to: (1) extend APY to marker effects models; (2) derive equations for marker effect estimates when APY is used for breeding value models, and (3) show the implication of selecting a specific group of core animals in terms of a marker effects model. Results We derived a family of marker effects models called APY-SNP-BLUP. It differs from the classic marker effects model in that the row space of the genotype matrix is reduced and an error term is fitted for non-core animals. We derived formulas for marker effect estimates that take this error term into account. The prediction error variance (PEV) of the marker effect estimates depends on the PEV for core animals but not directly on the PEV of the non-core animals. We extended the APY-SNP-BLUP to include a residual polygenic effect and accommodate non-genotyped animals. We show that selecting a specific group of core animals is equivalent to selecting a subspace of the row space of the genotype matrix. As the number of core animals increases, subspaces corresponding to different sets of core animals tend to overlap, showing that random selection of core animals is algebraically justified. Conclusions The APY-(ss)GBLUP models can be expressed in terms of marker effect models. When the number of core animals is equal to the rank of the genotype matrix, APY-SNP-BLUP is identical to the classic marker effects model. If the number of core animals is less than the rank of the genotype matrix, genotypes for non-core animals are imputed as a linear combination of the genotypes of the core animals. For estimating SNP effects, only relationships and estimated breeding values for core animals are needed.
Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00741-7.
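For orientation, the recursion described in this abstract and the resulting sparse inverse can be written out as below. This is the form commonly given in the APY literature, not an equation quoted from this abstract; the subscripts c (core) and n (non-core) and the symbols P and M are notation introduced here.

```latex
% Breeding values of non-core animals as a linear combination of core animals
\mathbf{u}_n = \mathbf{P}_{nc}\,\mathbf{u}_c + \boldsymbol{\varepsilon}_n,
\qquad
\mathbf{P}_{nc} = \mathbf{G}_{nc}\,\mathbf{G}_{cc}^{-1}

% Sparse APY inverse: dense only in the core block
\mathbf{G}_{\mathrm{APY}}^{-1} =
\begin{bmatrix} \mathbf{G}_{cc}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}
+
\begin{bmatrix} -\mathbf{G}_{cc}^{-1}\mathbf{G}_{cn} \\ \mathbf{I} \end{bmatrix}
\mathbf{M}_{nn}^{-1}
\begin{bmatrix} -\mathbf{G}_{nc}\mathbf{G}_{cc}^{-1} & \mathbf{I} \end{bmatrix}

% Diagonal matrix of error variances, one element per non-core animal i
m_{ii} = g_{ii} - \mathbf{g}_{ic}\,\mathbf{G}_{cc}^{-1}\,\mathbf{g}_{ci}
```

Only the core block needs a dense inverse, and the non-core block reduces to a diagonal matrix, which is where the low computational cost comes from.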
Affiliation(s)
- Matias Bermann: Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Daniela Lourenco: Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Natalia S Forneris: Facultad de Agronomía, Universidad de Buenos Aires, C1417DSQ, Buenos Aires, Argentina; Instituto de Investigaciones en Producción Animal (INPA), CONICET - Universidad de Buenos Aires, C1427CWO, Buenos Aires, Argentina
- Ignacy Misztal: Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
9
Abdollahi-Arpanahi R, Lourenco D, Misztal I. A comprehensive study on size and definition of the core group in the proven and young algorithm for single-step GBLUP. Genet Sel Evol 2022; 54:34. [PMID: 35596130; PMCID: PMC9123737; DOI: 10.1186/s12711-022-00726-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/22/2021] [Accepted: 05/02/2022] [Indexed: 11/16/2022]
Abstract
Background The algorithm for proven and young (APY) has been suggested as a solution for recursively computing a sparse representation for the inverse of a large genomic relationship matrix (G). In APY, a subset of genotyped individuals is used as the core and the remaining genotyped individuals are used as noncore. Size and definition of the core are relevant research subjects for the application of APY, especially given the ever-increasing number of genotyped individuals. Methods The aim of this study was to investigate several core definitions, including the most popular animals (MPA) (i.e., animals with high contributions to the genetic pool), the least popular males (LPM), the least popular females (LPF), a random set (Rnd), animals evenly distributed across genealogical paths (Ped), unrelated individuals (Unrel), or based on within-family selection (Fam), or on decomposition of the gene content matrix (QR). Each definition was evaluated for six core sizes based on prediction accuracy of single-step genomic best linear unbiased prediction (ssGBLUP) with APY. Prediction accuracy of ssGBLUP with the full inverse of G was used as the baseline. The dataset consisted of 357k pedigreed Duroc pigs with 111k pigs with genotypes and ~ 220k phenotypic records. Results When the core size was equal to the number of largest eigenvalues explaining 50% of the variation of G (n = 160), MPA and Ped core definitions delivered the highest average prediction accuracies (~ 0.41−0.53). As the core size increased to the number of eigenvalues explaining 99% of the variation in G (n = 7320), prediction accuracy was nearly identical for all core types and correlations with genomic estimated breeding values (GEBV) from ssGBLUP with the full inversion of G were greater than 0.99 for all core definitions. Cores that represent all generations, such as Rnd, Ped, Fam, and Unrel, were grouped together in the hierarchical clustering of GEBV. 
Conclusions For small core sizes, the definition of the core matters; however, as the size of the core reaches an optimal value equal to the number of largest eigenvalues explaining 99% of the variation of G, the definition of the core becomes arbitrary.
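The core-size rule used here, the number of largest eigenvalues of G that explain a given share of its variation, can be sketched as a simple cumulative sum. The function below assumes the eigenvalues have already been computed elsewhere; its name and the toy values are hypothetical.

```python
def core_size(eigenvalues, fraction=0.99):
    """Count the largest eigenvalues needed to explain `fraction` of variation."""
    values = sorted((v for v in eigenvalues if v > 0.0), reverse=True)
    total = sum(values)
    running = 0.0
    for count, value in enumerate(values, start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(values)


# Hypothetical eigenvalues of a small genomic relationship matrix.
eigs = [5.0, 3.0, 1.0, 0.5, 0.5]
for share in (0.50, 0.85, 0.99):
    print(share, core_size(eigs, share))
```

With real data the eigenvalue spectrum decays quickly, which is why the abstract reports n = 160 at 50% but n = 7320 at 99% of the variation in G.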
Affiliation(s)
- Daniela Lourenco: Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Ignacy Misztal: Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
10
Junqueira VS, Lourenco D, Masuda Y, Cardoso FF, Lopes PS, Silva FFE, Misztal I. Is single-step genomic REML with the algorithm for proven and young more computationally efficient when less generations of data are present? J Anim Sci 2022; 100:skac082. [PMID: 35289906; PMCID: PMC9118993; DOI: 10.1093/jas/skac082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/15/2022] [Accepted: 03/10/2022] [Indexed: 12/04/2022]
Abstract
Efficient computing techniques allow the estimation of variance components for virtually any traditional dataset. When genomic information is available, variance components can be estimated using genomic REML (GREML). If only a portion of the animals have genotypes, single-step GREML (ssGREML) is the method of choice. The genomic relationship matrix (G) used in both cases is dense, limiting computations depending on the number of genotyped animals. The algorithm for proven and young (APY) can be used to create a sparse inverse of G ($\mathbf{G}_{\mathrm{APY}}^{-1}$) with close to linear memory and computing requirements. In ssGREML, the inverse of the realized relationship matrix ($\mathbf{H}^{-1}$) also includes the inverse of the pedigree relationship matrix, which can be dense with a long pedigree but sparser with a short one. The main purpose of this study was to investigate whether costs of ssGREML can be reduced using APY with truncated pedigree and phenotypes. We also investigated the impact of truncation on variance components estimation when different numbers of core animals are used in APY. Simulations included 150K animals from 10 generations, with selection. Phenotypes ($h^2 = 0.3$) were available for all animals in generations 1-9. A total of 30K animals in generations 8 and 9, and 15K validation animals in generation 10 were genotyped for 52,890 SNP. Average information REML and ssGREML with $\mathbf{G}^{-1}$ and $\mathbf{G}_{\mathrm{APY}}^{-1}$ using 1K, 5K, 9K, and 14K core animals were compared. Variance components are impacted when the core group in APY represents the number of eigenvalues explaining a small fraction of the total variation in G. The most time-consuming operation was the inversion of G, with more than 50% of the total time. Next, numerical factorization consumed nearly 30% of the total computing time. On average, a 7% decrease in the computing time for ordering was observed by removing each generation of data.
APY can be successfully applied to create the inverse of the genomic relationship matrix used in ssGREML for estimating variance components. To ensure reliable variance component estimation, it is important to use a core size that corresponds to the number of largest eigenvalues explaining around 98% of total variation in G. When APY is used, pedigrees can be truncated to increase the sparsity of H and slightly reduce computing time for ordering and symbolic factorization, with no impact on the estimates.
Affiliation(s)
- Vinícius Silva Junqueira: Breeding Research Department, Bayer Crop Science, Uberlândia, Minas Gerais, Brazil; Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Daniela Lourenco: Department of Dairy and Animal Science, University of Georgia, Athens, GA 30602, USA
- Yutaka Masuda: Department of Dairy and Animal Science, University of Georgia, Athens, GA 30602, USA
- Fernando Flores Cardoso: Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA) Pecuária Sul, Bagé, Rio Grande do Sul, Brasil
- Paulo Sávio Lopes: Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Ignacy Misztal: Department of Dairy and Animal Science, University of Georgia, Athens, GA 30602, USA
11
Tsuruta S, Lourenco D, Masuda Y, Lawlor T, Misztal I. Reducing computational cost of large-scale genomic evaluation by using indirect genomic prediction. JDS Communications 2021; 2:356-360. [PMID: 36337117; PMCID: PMC9623783; DOI: 10.3168/jdsc.2021-0097] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/22/2021] [Accepted: 06/27/2021] [Indexed: 06/10/2023]
Abstract
Over half a million Holsteins are being genotyped annually in the United States. The computational cost of including all genotypes in single-step genomic (ssG)BLUP is high, although it is feasible to conduct large-scale genomic prediction using an efficient algorithm such as APY (algorithm for proven and young). An effective method to further reduce the computing cost could be the use of indirect genomic predictions (IGP) for genotyped animals when they have neither progeny nor phenotypes. These young genotyped animals have no effect on the other genotyped animals and could have their genomic prediction done indirectly. The main objective of this study was to calculate IGP for various groups of genotyped animals and investigate the reduction in computing time as well as bias and accuracy of the IGP. We compared IGP with genomic (G)EBV for 18 linear type traits in US Holsteins, including 2.3 million (M) genotyped animals. The full data set consisted of 10.9M records for 18 linear type traits up to 2018 calving, 13.6M animals in the pedigree, and 2.3M animals genotyped for 79K SNP. For IGP, ssGBLUP included all genotyped animals except those with neither progeny nor phenotypes by year from 2014 to 2018 (i.e., the target animals). The SNP marker effects were computed based on GEBV for genotyped animals that had progeny, or phenotypes, or both. Further, IGP were calculated for target genotyped animals in each year group. For all genotyped animal groups from 2014 to 2018, the coefficients of determination (R2) of a linear regression of GEBV on IGP were 0.960 for males and 0.954 for females for 18 traits on average. To reduce computing costs, the SNP marker effects were calculated based on GEBV from randomly selected genotyped animals from 15K to 60K. By randomly selecting a small number of genotyped animals, the computing time was dramatically reduced. 
As more genotyped animals were randomly selected to calculate SNP effects, R2 was higher (more accurate) and the regression coefficient was lower (more inflated IGP). In a practical genomic evaluation in US Holsteins, to get sufficient contributions from GEBV, 25K to 35K is a rational number of genotyped animals that can be randomly selected to compute SNP effects and obtain accurate and unbiased IGP. Considering the computing time and both unbiasedness and accuracy of IGP, genomic evaluation can be conducted separately in GEBV for genotyped animals with phenotypes or progeny and in IGP for young genotyped animals. This can be a practical solution when conducting a large-scale genomic evaluation and would enable more frequent evaluation at lower cost, especially when many genotyped animals have neither phenotypes nor progeny.
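The indirect prediction step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes unweighted SNP effects backsolved from GEBV as u = Z'(ZZ')^-1 g with a small ridge added for numerical stability, and every function name, dimension, and simulated value below is hypothetical.

```python
import numpy as np

def snp_effects_from_gebv(Z_ref, gebv_ref, ridge=1e-3):
    """Backsolve SNP effects from GEBV of reference animals.

    Z_ref is an (n_ref, n_snp) centered genotype matrix. The ridge term is
    an assumption that keeps Z Z' invertible; it is not from the abstract.
    """
    n = Z_ref.shape[0]
    K = Z_ref @ Z_ref.T + ridge * np.eye(n)
    return Z_ref.T @ np.linalg.solve(K, gebv_ref)

def indirect_prediction(Z_target, snp_effects):
    """IGP for target animals: centered genotypes times SNP effects."""
    return Z_target @ snp_effects

def regress(y, x):
    """Slope and R^2 of the linear regression of y on x."""
    slope, _intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return slope, r2

# Simulated toy data (purely illustrative).
rng = np.random.default_rng(0)
n_ref, n_target, n_snp = 200, 50, 500
freq = rng.uniform(0.1, 0.9, n_snp)
M = rng.binomial(2, freq, size=(n_ref + n_target, n_snp)).astype(float)
Z = M - 2 * freq                       # center by allele frequency
true_u = rng.normal(0.0, 0.05, n_snp)
gebv = Z @ true_u                      # stand-in for GEBV from a full evaluation

u_hat = snp_effects_from_gebv(Z[:n_ref], gebv[:n_ref])
igp = indirect_prediction(Z[n_ref:], u_hat)
slope, r2 = regress(gebv[n_ref:], igp)  # regression of GEBV on IGP
```

Choosing a random subset of reference animals, as the abstract describes, corresponds to passing fewer rows in `Z_ref` and `gebv_ref`; the solve cost grows with the cube of that subset size.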
Affiliation(s)
- S. Tsuruta
  - Animal and Dairy Science Department, University of Georgia, Athens 30602
- D.A.L. Lourenco
  - Animal and Dairy Science Department, University of Georgia, Athens 30602
- Y. Masuda
  - Animal and Dairy Science Department, University of Georgia, Athens 30602
- T.J. Lawlor
  - Holstein Association USA Inc., Brattleboro, VT 05301
- I. Misztal
  - Animal and Dairy Science Department, University of Georgia, Athens 30602
|
12
|
Abdollahi-Arpanahi R, Lourenco D, Misztal I. Detecting effective starting point of genomic selection by divergent trends from best linear unbiased prediction and single-step genomic best linear unbiased prediction in pigs, beef cattle, and broilers. J Anim Sci 2021; 99:6352407. [PMID: 34390341 PMCID: PMC8420679 DOI: 10.1093/jas/skab243] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/18/2021] [Accepted: 08/12/2021] [Indexed: 12/12/2022]
Abstract
Genomic selection has been adopted nationally and internationally in different livestock and plant species. However, understanding whether genomic selection has been effective or not is an essential question for both industry and academia. Once genomic evaluation started being used, estimation of breeding values with pedigree best linear unbiased prediction (BLUP) became biased because this method does not consider selection using genomic information. Hence, the effective starting point of genomic selection can be detected in two possible ways: the divergence of genetic trends and the divergence of realized Mendelian sampling (RMS) trends obtained with BLUP and single-step genomic BLUP (ssGBLUP). This study aimed to find the start date of genomic selection for a set of economically important traits in three livestock species by comparing trends obtained using BLUP and ssGBLUP. Three datasets were used for this purpose: 1) a pig dataset with 117k genotypes and 1.3M animals in pedigree, 2) an Angus cattle dataset consisting of ~842k genotypes and 11.5M animals in pedigree, and 3) a purebred broiler chicken dataset including ~154k genotypes and 1.3M birds in pedigree. The genetic trends for pigs diverged for the genotyped animals born in 2014 for average daily gain (ADG) and backfat (BF). In beef cattle, the trends started diverging in 2009 for weaning weight (WW) and in 2016 for postweaning gain (PWG), with little divergence for birth weight (BTW). In broiler chickens, the genetic trends estimated by ssGBLUP and BLUP diverged at breeding cycle 6 for two out of the three production traits. The RMS trends for the genotyped pigs diverged for animals born in 2014, more for ADG than for BF. In beef cattle, the RMS trends started diverging in 2009 for WW and in 2016 for PWG, with a trivial trend for BTW. In broiler chickens, the RMS trends from ssGBLUP and BLUP diverged strongly for two production traits at breeding cycle 6, with a slight divergence for another trait. 
Divergence of the genetic trends from ssGBLUP and BLUP indicates the onset of the genomic selection. The presence of trends for RMS indicates selective genotyping, with or without the genomic selection. The onset of genomic selection and genotyping strategies agrees with industry practices across the three species. In summary, the effective start of genomic selection can be detected by the divergence between genetic and RMS trends from BLUP and ssGBLUP.
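One way to turn the idea of "divergent trends" into a concrete rule is sketched below. The abstract does not state a numerical criterion, so the threshold rule here (the first year from which the absolute difference between the two trends stays above a chosen value) is an assumption, as are the function name and the made-up trend values in the usage lines.

```python
def divergence_start(years, trend_blup, trend_ss, threshold):
    """Return the first year from which |ssGBLUP - BLUP| stays above threshold.

    Returns None if the trends never diverge persistently. The persistence
    rule is an illustrative assumption, not the criterion of the study.
    """
    start = None
    for year, blup, ss in zip(years, trend_blup, trend_ss):
        if abs(ss - blup) > threshold:
            if start is None:
                start = year
        else:
            start = None  # difference fell back below threshold; reset
    return start

# Made-up yearly mean breeding values, for illustration only.
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016]
blup = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ssgblup = [0.0, 1.0, 2.0, 3.0, 4.5, 6.0, 7.5]
print(divergence_start(years, blup, ssgblup, threshold=0.25))  # 2014
```

The same function can be applied to yearly RMS averages instead of genetic trends.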
Affiliation(s)
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
|
13
|
Steyn Y, Gonzalez-Pena D, Bernal Rubio YL, Vukasinovic N, DeNise SK, Lourenco DAL, Misztal I. Indirect genomic predictions for milk yield in crossbred Holstein-Jersey dairy cattle. J Dairy Sci 2021; 104:5728-5737. [PMID: 33685678 DOI: 10.3168/jds.2020-19451] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/11/2020] [Accepted: 01/05/2021] [Indexed: 11/19/2022]
Abstract
The objective of this study was to predict genomic breeding values for milk yield of crossbred dairy cattle under different scenarios using single-step genomic BLUP (ssGBLUP). The data set included 13,880,217 milk yield measurements on 6,830,415 cows. Genotypes of 89,558 Holstein, 40,769 Jersey, and 22,373 Holstein-Jersey crossbred animals were used, of which all Holstein, 9,313 Jersey, and 1,667 crossbred animals had phenotypic records. Genotypes were imputed to 45K SNP markers. The SNP effects were estimated from single-breed evaluations for Jersey (JE), Holstein (HO) and crossbreds (CROSS), and multibreed evaluations including all Jersey and Holstein (JE_HO) or approximately equal proportions of Jersey, Holstein, and crossbred animals (MIX). Indirect predictions (IP) of the validation animals (358 crossbred animals with phenotypes excluded from evaluations) were calculated using the resulting SNP effects. Additionally, breed proportions (BP) of crossbred animals were applied as a weight when IP were estimated based on each pure breed. The predictive ability of IP was calculated as the Pearson correlation between IP and phenotypes of the validation animals adjusted for fixed effects in the model. Regression of adjusted phenotypes on IP was used to assess the inflation of IP. The predictive ability of IP for the CROSS, JE, HO, JE_HO, and MIX scenarios was 0.50, 0.50, 0.47, 0.50, and 0.46, respectively. Using BP was the least successful, with a predictive ability of 0.32. The regression coefficients indicating inflation of the IP for crossbred animals under the CROSS, JE, HO, JE_HO, MIX, and BP scenarios were 1.17, 0.65, 0.55, 0.78, 1.00, and 0.85, respectively. The IP of crossbred animals can be predicted using single-step GBLUP under a scenario that includes purebred genotypes.
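The two validation statistics named in this abstract (predictive ability as a Pearson correlation, inflation as the slope of adjusted phenotypes regressed on IP) can be sketched as follows. The breed-proportion weighting is written as a simple weighted sum of the two purebred predictions, which is an assumption about how BP were applied; all function names and numbers are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation, used here as predictive ability."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def regression_slope(y, x):
    """Slope of y regressed on x; values below 1 suggest inflated predictions."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def breed_weighted_ip(ip_ho, ip_je, prop_ho):
    """Combine purebred-based IP using each animal's Holstein proportion.

    Assumes two breeds only, so the Jersey proportion is 1 - prop_ho.
    """
    return [p * h + (1.0 - p) * j for h, j, p in zip(ip_ho, ip_je, prop_ho)]

# Made-up values for illustration.
ip = [1.0, 2.0, 3.0, 4.0]
adjusted_phenotypes = [2.0, 4.0, 6.0, 8.0]
ability = pearson(ip, adjusted_phenotypes)
slope = regression_slope(adjusted_phenotypes, ip)
combined = breed_weighted_ip([2.0, 4.0], [0.0, 2.0], [0.5, 0.25])
```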
Affiliation(s)
- Y Steyn
  - Department of Animal and Dairy Science, University of Georgia, 425 River Road, Athens 30602
- S K DeNise
  - Zoetis, 333 Portage Street, Kalamazoo, MI 49007
- D A L Lourenco
  - Department of Animal and Dairy Science, University of Georgia, 425 River Road, Athens 30602
- I Misztal
  - Department of Animal and Dairy Science, University of Georgia, 425 River Road, Athens 30602
|
14
|
Cesarani A, Masuda Y, Tsuruta S, Nicolazzi EL, VanRaden PM, Lourenco D, Misztal I. Genomic predictions for yield traits in US Holsteins with unknown parent groups. J Dairy Sci 2021; 104:5843-5853. [PMID: 33663836 DOI: 10.3168/jds.2020-19789] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/14/2020] [Accepted: 12/18/2020] [Indexed: 11/19/2022]
Abstract
The objective of this study was to assess the reliability and bias of estimated breeding values (EBV) from traditional BLUP with unknown parent groups (UPG), genomic EBV (GEBV) from single-step genomic BLUP (ssGBLUP) with UPG for the pedigree relationship matrix (A) only (SS_UPG), and GEBV from ssGBLUP with UPG for both A and the relationship matrix among genotyped animals (A22; SS_UPG2) using 6 large phenotype-pedigree truncated Holstein data sets. The complete data included 80 million records for milk, fat, and protein yields from 31 million cows recorded since 1980. Phenotype-pedigree truncation scenarios included truncation of phenotypes for cows recorded before 1990 and 2000 combined with truncation of pedigree information after 2 or 3 ancestral generations. A total of 861,525 genotyped bulls with progeny and cows with phenotypic records were used in the analyses. Reliability and bias (inflation/deflation) of GEBV were obtained for 2,710 bulls based on deregressed proofs, and on 381,779 cows born after 2014 based on predictivity (adjusted cow phenotypes). The BLUP reliabilities for young bulls varied from 0.29 to 0.30 across traits and were unaffected by data truncation and number of generations in the pedigree. Reliabilities ranged from 0.54 to 0.69 for SS_UPG and were slightly affected by phenotype-pedigree truncation. Reliabilities ranged from 0.69 to 0.73 for SS_UPG2 and were unaffected by phenotype-pedigree truncation. The regression coefficient of bull deregressed proofs on (G)EBV (i.e., GEBV and EBV) ranged from 0.86 to 0.90 for BLUP, from 0.77 to 0.94 for SS_UPG, and was 1.00 ± 0.03 for SS_UPG2. Cow predictivity ranged from 0.22 to 0.28 for BLUP, 0.48 to 0.51 for SS_UPG, and 0.51 to 0.54 for SS_UPG2. The highest cow predictivities for BLUP were obtained with the most extreme truncation, whereas for SS_UPG2, cow predictivities were also unaffected by phenotype-pedigree truncations. 
The regression coefficient of cow predictivities on (G)EBV was 1.02 ± 0.02 for SS_UPG2 with the most extreme truncation, which indicated the least biased predictions. Computations with the complete data set took 17 h with BLUP, 58 h with SS_UPG, and 23 h with SS_UPG2. The same computations with the most extreme phenotype-pedigree truncation took 7, 36, and 15 h, respectively. The SS_UPG2 converged in fewer rounds than BLUP, whereas SS_UPG took up to twice as many rounds. Thus, the ssGBLUP with UPG assigned to both A and A22 provided accurate and unbiased evaluations, regardless of phenotype-pedigree truncation scenario. Old phenotypes (before 2000 in this data set) did not affect the reliability of predictions for young selection candidates, especially in SS_UPG2.
Affiliation(s)
- A Cesarani
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- Y Masuda
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- S Tsuruta
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- P M VanRaden
  - Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
- D Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- I Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
|
15
|
Garcia ALS, Masuda Y, Tsuruta S, Miller S, Misztal I, Lourenco D. Indirect predictions with a large number of genotyped animals using the algorithm for proven and young. J Anim Sci 2020; 98:5831156. [PMID: 32374831 PMCID: PMC7263398 DOI: 10.1093/jas/skaa154] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/11/2020] [Accepted: 04/30/2020] [Indexed: 11/21/2022]
Abstract
Reliable single-nucleotide polymorphism (SNP) effects from genomic best linear unbiased prediction (GBLUP) and single-step GBLUP (ssGBLUP) are needed to calculate indirect predictions (IP) for young genotyped animals and animals not included in official evaluations. Obtaining reliable SNP effects and IP requires a minimum number of animals, and when a large number of genotyped animals are available, the algorithm for proven and young (APY) may be needed. Thus, the objectives of this study were to evaluate IP with an increasingly larger number of genotyped animals and to determine the minimum number of animals needed to compute reliable SNP effects and IP. Genotypes and phenotypes for birth weight, weaning weight, and postweaning gain were provided by the American Angus Association. The number of animals with phenotypes was more than 3.8 million. Genotyped animals were assigned to three cumulative year-classes: born until 2013 (N = 114,937), born until 2014 (N = 183,847), and born until 2015 (N = 280,506). A three-trait model was fitted using the APY algorithm with 19,021 core animals under two scenarios: 1) core 2013 (random sample of animals born until 2013) used for all year-classes and 2) core 2014 (random sample of animals born until 2014) used for year-class 2014 and core 2015 (random sample of animals born until 2015) used for year-class 2015. GBLUP used phenotypes from genotyped animals only, whereas ssGBLUP used all available phenotypes. SNP effects were predicted using genomic estimated breeding values (GEBV) from either all genotyped animals or only core animals. The correlations between GEBV from GBLUP and IP obtained using SNP effects from core 2013 were ≥0.99 for animals born in 2013 but as low as 0.07 for animals born in 2014 and 2015. Conversely, the correlations between GEBV from ssGBLUP and IP were ≥0.99 for animals born in all years. 
IP predictive abilities computed with GEBV from ssGBLUP and SNP predictions based on only core animals were as high as those based on all genotyped animals. The correlations between GEBV and IP from ssGBLUP were ≥0.76, ≥0.90, and ≥0.98 when SNP effects were computed using 2k, 5k, and 15k core animals. Suitable IP based on GEBV from GBLUP can be obtained when SNP predictions are based on an appropriate number of core animals, but a considerable decline in IP accuracy can occur in subsequent years. Conversely, IP from ssGBLUP based on large numbers of phenotypes from non-genotyped animals have persistent accuracy over time.
Affiliation(s)
- Andre L S Garcia
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Yutaka Masuda
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Shogo Tsuruta
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
|
16
|
Lourenco D, Legarra A, Tsuruta S, Masuda Y, Aguilar I, Misztal I. Single-Step Genomic Evaluations from Theory to Practice: Using SNP Chips and Sequence Data in BLUPF90. Genes (Basel) 2020; 11:E790. [PMID: 32674271 PMCID: PMC7397237 DOI: 10.3390/genes11070790] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 06/19/2020] [Revised: 07/03/2020] [Accepted: 07/06/2020] [Indexed: 11/16/2022]
Abstract
Single-step genomic evaluation has become a standard procedure in livestock breeding, mainly because it can combine all available pedigree, phenotypes, and genotypes into a single evaluation without the need for post-analysis processing. Therefore, the incorporation of data on genotyped and non-genotyped animals in this method is straightforward. Since 2009, two main implementations of single-step have been proposed. One is called single-step genomic best linear unbiased prediction (ssGBLUP) and uses single nucleotide polymorphisms (SNP) to construct the genomic relationship matrix; the other is single-step Bayesian regression (ssBR), which is a marker effect model. Under the same assumptions, both models are equivalent. In this review, we focus solely on ssGBLUP. The implementation of ssGBLUP into the BLUPF90 software suite was done in 2009, and since then, several changes were made to make ssGBLUP flexible to any model, number of traits, number of phenotypes, and number of genotyped animals. Single-step GBLUP from the BLUPF90 software suite has been used for genomic evaluations worldwide. In this review, we will show theoretical developments and numerical examples of ssGBLUP using SNP data from regular chips to sequence data.
Affiliation(s)
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Andres Legarra
  - Institut National de la Recherche Agronomique, UMR1388 GenPhySE, 31326 Castanet Tolosan, France
- Shogo Tsuruta
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Yutaka Masuda
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacio Aguilar
  - Instituto Nacional de Investigación Agropecuaria (INIA), 11500 Montevideo, Uruguay
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
|
17
|
Aldridge MN, Vandenplas J, Bergsma R, Calus MPL. Variance estimates are similar using pedigree or genomic relationships with or without the use of metafounders or the algorithm for proven and young animals. J Anim Sci 2020; 98:5709619. [PMID: 31955195 PMCID: PMC7053865 DOI: 10.1093/jas/skaa019] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/12/2019] [Accepted: 01/17/2020] [Indexed: 01/03/2023]
Abstract
With an increase in the number of animals genotyped there has been a shift from using pedigree relationship matrices (A) to genomic ones. As the use of genomic relationship matrices (G) has increased, new methods to build or approximate G have developed. We investigated whether the way variance components are estimated should reflect these changes. We estimated variance components for maternal sow traits by solving with restricted maximum likelihood, with four methods of calculating the inverse of the relationship matrix. These methods included using just the inverse of A (A⁻¹), combining A⁻¹ and the direct inverse of G (H_DIRECT⁻¹), including metafounders (H_META⁻¹), or combining A⁻¹ with an approximated inverse of G using the algorithm for proven and young animals (H_APY⁻¹). There was a tendency for higher additive genetic variances and lower permanent environmental variances estimated with A⁻¹ compared with the three H⁻¹ methods, which supports that G⁻¹ is better than A⁻¹ at separating genetic and permanent environmental components, due to a better definition of the actual relationships between animals. There were limited or no differences in variance estimates between H_DIRECT⁻¹, H_META⁻¹, and H_APY⁻¹. Importantly, there were limited differences in variance components, repeatability, or heritability estimates between methods. Heritabilities ranged from <0.01 to 0.04 for stayability after second cycle and farrowing rate; from 0.08 to 0.15 for litter weight variation, maximum cycle number, total number born, total number still born, and prolonged interval between weaning and first insemination; and from 0.39 to 0.44 for litter birth weight and gestation length. The limited differences in heritabilities suggest that there would be very limited changes to estimated breeding values or ranking of animals across models using the different sets of variance components. 
It is suggested that variance estimates continue to be made using A−1, however including G−1 is possibly more appropriate if refining the model, for traits that fit a permanent environmental effect.
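For reference, the combined inverse that the H-based methods in this abstract build on is usually written in the standard ssGBLUP form below, where A22 is the pedigree relationship matrix among genotyped animals (the same notation appears in the Cesarani et al. entry above). The abstract itself does not print the formula, so this is given as the commonly used expression rather than as a quotation from the paper.

```latex
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix}
```

Under this reading, the direct method inverts G as is, the APY method substitutes an approximate inverse of G in the lower-right block, and the metafounder method replaces the pedigree-based matrices with their metafounder counterparts.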
Affiliation(s)
- Michael N Aldridge
  - Wageningen University and Research, Animal Breeding and Genomics, Wageningen, the Netherlands
- Jérémie Vandenplas
  - Wageningen University and Research, Animal Breeding and Genomics, Wageningen, the Netherlands
- Rob Bergsma
  - Topigs Norsvin, AA Beuningen, the Netherlands
- Mario P L Calus
  - Wageningen University and Research, Animal Breeding and Genomics, Wageningen, the Netherlands
|
18
|
Misztal I, Lourenco D, Legarra A. Current status of genomic evaluation. J Anim Sci 2020; 98:skaa101. [PMID: 32267923 PMCID: PMC7183352 DOI: 10.1093/jas/skaa101] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 01/29/2020] [Accepted: 04/07/2020] [Indexed: 12/14/2022]
Abstract
Early application of genomic selection relied on SNP estimation with phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient for an accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were composed of an index with parent average, direct genomic value, and deduction of a parental index to eliminate double counting. Use of SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets. Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in accuracy. After the implementation of genomic selection, EBV by BLUP became biased because of genomic preselection and DRP computed based on EBV required adjustments, and the creation of DRP for females is hard and subject to double counting. Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method based on combining genomic and pedigree relationships automatically creates an index with all sources of information, can use any combination of male and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that pedigree and genomic relationships are compatible. Because the inversion of the genomic relationship matrix (G) becomes costly with more than 100k genotyped animals, large data computations in ssGBLUP were solved by exploiting limited dimensionality of genomic data due to limited effective population size. With such dimensionality ranging from 4k in chickens to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at a linear cost. Due to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries. Single step can be used to derive SNP effects for indirect prediction and for genome-wide association studies, including computations of the P-values. 
Alternative single-step formulations exist that use SNP effects for genotyped or for all animals. Although genomics is the new standard in breeding and genetics, there are still some problems that need to be solved. This involves new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic data used in selection, and strategies to address reduction in genetic variances after genomic selection was implemented.
Affiliation(s)
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Andres Legarra
  - Department of Animal Genetics, Institut National de la Recherche Agronomique, Castanet-Tolosan, France
|
19
|
Abdalla EEA, Schenkel FS, Emamgholi Begli H, Willems OW, van As P, Vanderhout R, Wood BJ, Baes CF. Single-Step Methodology for Genomic Evaluation in Turkeys (Meleagris gallopavo). Front Genet 2019; 10:1248. [PMID: 31921294 PMCID: PMC6934134 DOI: 10.3389/fgene.2019.01248] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/23/2019] [Accepted: 11/13/2019] [Indexed: 11/13/2022]
Abstract
Genomic information can contribute significantly to the increase in accuracy of genetic predictions compared to using pedigree relationships alone. The main objective of this study was to compare the prediction ability of pedigree-based best linear unbiased prediction (PBLUP) and single-step genomic BLUP (ssGBLUP) models. Turkey records of feed conversion ratio (FCR), residual feed intake (RFI), body weight, breast meat yield (BMY), and walking ability were provided by Hybrid Turkeys, Kitchener, Canada. These data were analyzed using pedigree-based and single-step genomic models. The genomic relationship matrix was calculated either using observed allele frequencies, all allele frequencies equal to 0.5, or with a different scaling. To avoid potential problems with inversion, three different weighting factors were applied to combine the genomic and pedigree matrices. Across the studied traits, ssGBLUP had higher heritability estimates and significantly outperformed PBLUP in terms of accuracy. Walking ability was genetically negatively correlated to body weight and breast meat yield; however, it was not correlated to FCR or RFI. Body weight showed a moderate positive genetic correlation to feed conversion ratio, residual feed intake and breast meat yield. Feed conversion ratio was strongly correlated to residual feed intake (0.68 ± 0.06). There was almost no genetic correlation between breast meat yield and feed efficiency traits. Larger differences in accuracy between PBLUP and ssGBLUP were observed for traits with lower heritability. Results of the three weighting factors showed only slight differences and an increase in accuracy of prediction compared to PBLUP. Bias differed only slightly across the models but more among the traits; BMY was the only trait that had a regression coefficient higher than 1 (1.38 to 1.41). 
We show that incorporating genomic information increases the prediction accuracy for preselection of young candidate turkeys for the five traits investigated. Single-step genomic prediction showed substantially higher accuracy estimates than the pedigree-based model, and only slight differences in bias were observed across the alternate models.
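The two ways of building the genomic relationship matrix mentioned in this abstract (observed allele frequencies versus all frequencies fixed at 0.5) and the blending with the pedigree matrix can be sketched as below. The sketch assumes a VanRaden-type G and a blend of the form (1 - w)G + wA22; the abstract does not give the exact formulas or the weights used, and the function names are hypothetical.

```python
import numpy as np

def genomic_relationship(M, freq=None):
    """VanRaden-type G from genotypes coded 0/1/2.

    M is (n_animals, n_snp). If freq is None, observed allele frequencies
    are used; a scalar such as 0.5 applies the same frequency to every SNP.
    """
    M = np.asarray(M, dtype=float)
    if freq is None:
        p = M.mean(axis=0) / 2.0
    else:
        p = np.broadcast_to(np.asarray(freq, dtype=float), M.shape[1:])
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def blend(G, A22, w):
    """Blend G with the pedigree matrix A22; w is the weight on A22.

    A small w is commonly used to make G invertible (assumed form).
    """
    return (1.0 - w) * G + w * A22

# Tiny illustrative example with two animals and two SNP.
M = [[0, 2], [2, 0]]
G_half = genomic_relationship(M, freq=0.5)
G_obs = genomic_relationship(M)
G_blended = blend(G_half, np.eye(2), w=0.05)
```

In this toy case the unblended G is singular, while the blended matrix can be inverted, which is the motivation the abstract gives for the weighting factors.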
Affiliation(s)
- Emhimad E A Abdalla
  - Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada
- Flavio S Schenkel
  - Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada
- Owen W Willems
  - School of Veterinary Science, University of Queensland, Gatton, QLD, Australia
- Pieter van As
  - Hendrix Genetics Research Technology & Service B.V., Boxmeer, Netherlands
- Ryley Vanderhout
  - Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada
- Benjamin J Wood
  - Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada
  - School of Veterinary Science, University of Queensland, Gatton, QLD, Australia
  - Hybrid Turkeys, Kitchener, ON, Canada
- Christine F Baes
  - Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
|
20
|
Westhues M, Heuer C, Thaller G, Fernando R, Melchinger AE. Efficient genetic value prediction using incomplete omics data. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2019; 132:1211-1222. [PMID: 30656353 DOI: 10.1007/s00122-018-03273-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/24/2018] [Accepted: 12/21/2018] [Indexed: 05/05/2023]
Abstract
Covering a subset of individuals with a quantitative predictor, while imputing records for all others using pedigree or genomic data, could improve the precision of predictions while controlling for costs. Predicting genetic values with high accuracy is pivotal for effective candidate selection in animal and plant breeding. Novel 'omics'-based predictors have been shown to improve upon established genome-based predictions of important complex traits but require laborious and expensive assays. As a consequence, there are various datasets with full genetic marker coverage of all studied individuals but incomplete coverage with other 'omics' data. In animal breeding, single-step prediction was introduced to efficiently combine pedigree information, collected on a large number of animals, with genomic information, collected on a smaller subset of animals, for breeding value estimation without bias. Using two maize datasets of inbred lines and hybrids, we show that the single-step framework facilitates imputing transcriptomic data, boosting forecasts when their predictive ability exceeds that of pedigree or genomic data. Our results suggest that covering only a subset of inbred lines with 'omics' predictors and imputing all others using pedigree or genomic data could enable breeders to improve trait predictions while keeping costs under control. Employing 'omics' predictors could particularly improve candidate selection in hybrid breeding because the success of forecasts is a strongly convex function of predictive ability.
Affiliation(s)
- Matthias Westhues
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599, Stuttgart, Germany
- Claas Heuer
  - Institute of Animal Breeding and Husbandry, Christian-Albrechts-University Kiel, 24098, Kiel, Germany
  - Inguran, LLC dba STGenetics, 22575 SH6 South, Navasota, TX, 77868, USA
- Georg Thaller
  - Institute of Animal Breeding and Husbandry, Christian-Albrechts-University Kiel, 24098, Kiel, Germany
- Rohan Fernando
  - Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Albrecht E Melchinger
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599, Stuttgart, Germany
|
21
|
Gonzalez-Peña D, Vukasinovic N, Brooker J, Przybyla C, DeNise S. Genomic evaluation for calf wellness traits in Holstein cattle. J Dairy Sci 2019; 102:2319-2329. [DOI: 10.3168/jds.2018-15540] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/10/2018] [Accepted: 11/12/2018] [Indexed: 11/19/2022]
|
22
|
Nilforooshan MA, Lee M. The quality of the algorithm for proven and young with various sets of core animals in a multibreed sheep population. J Anim Sci 2019; 97:1090-1100. [PMID: 30624671 DOI: 10.1093/jas/skz010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/30/2018] [Accepted: 01/07/2019] [Indexed: 11/12/2022]
Abstract
The inverses of the pedigree and genomic relationship matrices (A, G) are required for single-step GBLUP (ssGBLUP). While inverting A is possible for millions of animals at a linear cost, inverting G has a cubic cost and is feasible for at most 150,000 animals using current conventional algorithms. The algorithm for proven and young (APY) provides approximations of the regular ssGBLUP by splitting genotyped animals into core and noncore groups, with computational costs being cubic for core and linear for noncore animals. The data consisted of 9,406,096 animals in the pedigree, 6,243,753 weaning weight phenotypes, and 46,949 genotyped animals from 5 breeds, composites, and animals with missing breed information from New Zealand. Aiming to find a core sample for a multibreed sheep population that can provide evaluations similar to those from the regular ssGBLUP, different core types and core sizes were studied. Core types random, composite, oldest, youngest, the most inbred animals in G (GINB), and in A (AINB) were studied in 5K, 10K, and 20K core sizes (K = 1,000). Romney core was studied in 5K and 10K, and Coopworth-Perendale core was studied in 5K. Correlation and regression coefficient (slope) between GEBV from the non-APY and the APY analyses, as indicators for consistency with non-APY and bias from non-APY, showed a large impact of APY on noncore and a small impact on nongenotyped animals. Breed-based 5K cores resulted in large bias from non-APY even for nongenotyped animals. Random and GINB at 20K core size resulted in the highest consistency with non-APY and the lowest bias from non-APY. However, GINB did not perform as well as Random at lower core sizes. The number of animals from a breed in the core sample was very important for the evaluation of that breed. We observed that cores without Texel or Highlander animals resulted in poor evaluations for those breeds. 
Solving the mixed model equations, within core type, the smallest core size, and within core size, Random core converged in the least number of iterations. However, APY per se did not necessarily reduce the solving time. Random cores performed the best, as they could give a good coverage on the generations and breeds, representative for the genotyped population. Core size 20K performed better than 5K and 10K, and the optimum core size was found to be 18.8K, according to the eigenvalue decomposition of G.
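The core/noncore split described in this abstract can be illustrated with a small dense sketch of the commonly published APY inverse. This is not the authors' code: the function name `apy_inverse` and the use of dense NumPy arrays are assumptions made for illustration, and production software would avoid forming the full matrix.

```python
import numpy as np

def apy_inverse(G, core):
    """Dense APY approximation of inv(G); `core` lists the indices of core animals."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)
    # Only the core block is inverted directly (cubic cost in the core size).
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    # Recursion coefficients linking each noncore animal to the core animals.
    P = Gcc_inv @ Gcn
    # Diagonal conditional variances of noncore animals given the core
    # (one value per noncore animal, hence linear cost in their number).
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)
    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + (P / m) @ P.T
    Ginv[np.ix_(core, noncore)] = -P / m
    Ginv[np.ix_(noncore, core)] = (-P / m).T
    Ginv[noncore, noncore] = 1.0 / m
    return Ginv
```

With a single noncore animal the result coincides with the exact inverse; with more noncore animals it is an approximation whose quality depends on the choice and size of the core, which is the question the study addresses.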
Affiliation(s)
- Michael Lee
- Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
|
23
|
Chang LY, Toghiani S, Aggrey SE, Rekaya R. Increasing accuracy of genomic selection in presence of high density marker panels through the prioritization of relevant polymorphisms. BMC Genet 2019; 20:21. [PMID: 30795734 PMCID: PMC6387489 DOI: 10.1186/s12863-019-0720-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/27/2018] [Accepted: 02/04/2019] [Indexed: 12/27/2022]
Abstract
BACKGROUND It has become clear that increasing the density of marker panels, and even using sequence data, did not result in any meaningful increase in the accuracy of genomic selection (GS) using either regression (RM) or variance component (VC) approaches. This is in part due to the limitations of current methods. Association models are heavily over-parameterized and suffer from severe co-linearity and lack of statistical power. Even when the variant effects are not directly estimated using VC-based approaches, the genomic relationships did not improve after the marker density exceeded a certain threshold. SNP prioritization based on fixation index (FST) scores was used to track the majority of significant QTL and to reduce the dimensionality of the association model. RESULTS Two populations with average LD between adjacent markers of 0.3 (P1) and 0.7 (P2) were simulated. In both populations, the genomic data consisted of 400 K SNP markers distributed on 10 chromosomes. The density of simulated genomic data mimics roughly 1.2 million SNP markers in the bovine genome. The genomic relationship matrix (G) was calculated for each set of selected SNPs based on their FST score and similar numbers of SNPs were selected randomly for comparison. Using all 400 K SNPs, 46% of the off-diagonal elements (OD) were between -0.01 and 0.01. The same portion was 31, 23 and 16% when 80 K, 40 K and 20 K SNPs were selected based on FST scores. For randomly selected 20 K SNP subsets, around 33% of the OD fell within the same range. Genomic similarity computed using SNPs selected based on FST scores was always higher than using the same number of SNPs selected randomly. Maximum accuracies of 0.741 and 0.828 were achieved when 20 and 10 K SNPs were selected based on FST scores in P1 and P2, respectively.
CONCLUSIONS Genomic similarity could be maximized by decreasing the number of selected SNPs, but doing so also decreases the percentage of genetic variation explained by the selected markers. Finding the balance between these two parameters could optimize the accuracy of GS in the presence of high density marker panels.
Affiliation(s)
- Ling-Yun Chang
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA; ABS Global, Inc., DeForest, WI, 53532, USA
- Sajjad Toghiani
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA; USDA Agricultural Research Service, Fort Keogh Livestock and Range Research Laboratory, Miles City, MT, 59301, USA
- Samuel E Aggrey
- Department of Poultry Science, University of Georgia, Athens, GA, 30602, USA; Institute of Bioinformatics, University of Georgia, Athens, GA, 30602, USA
- Romdhane Rekaya
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA; Institute of Bioinformatics, University of Georgia, Athens, GA, 30602, USA
|
24
|
Lee S, Dang C, Choy Y, Do C, Cho K, Kim J, Kim Y, Lee J. Comparison of genome-wide association and genomic prediction methods for milk production traits in Korean Holstein cattle. ASIAN-AUSTRALASIAN JOURNAL OF ANIMAL SCIENCES 2019; 32:913-921. [PMID: 30744323 PMCID: PMC6601072 DOI: 10.5713/ajas.18.0847] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/12/2018] [Accepted: 01/11/2019] [Indexed: 11/27/2022]
Abstract
OBJECTIVE The objectives of this study were to compare identified informative regions through two genome-wide association study (GWAS) approaches and determine the accuracy and bias of the direct genomic value (DGV) for milk production traits in Korean Holstein cattle, using two genomic prediction approaches: single-step genomic best linear unbiased prediction (ss-GBLUP) and Bayesian Bayes-B. METHODS Records on production traits such as adjusted 305-day milk (MY305), fat (FY305), and protein (PY305) yields were collected from 265,271 first parity cows. After quality control, 50,765 single-nucleotide polymorphic genotypes were available for analysis. In GWAS for ss-GBLUP (ssGWAS) and Bayes-B (BayesGWAS), the proportion of genetic variance for each 1-Mb genomic window was calculated and used to identify informative genomic regions. Accuracy of the DGV was estimated by a five-fold cross-validation with random clustering. As a measure of accuracy for DGV, we also assessed the correlation between DGV and deregressed-estimated breeding value (DEBV). The bias of DGV for each method was obtained by determining regression coefficients. RESULTS A total of nine and five significant windows (1 Mb) were identified for MY305 using ssGWAS and BayesGWAS, respectively. Using ssGWAS and BayesGWAS, we also detected multiple significant regions for FY305 (12 and 7) and PY305 (14 and 2), respectively. Both single-step DGV and Bayes DGV also showed somewhat moderate accuracy ranges for MY305 (0.32 to 0.34), FY305 (0.37 to 0.39), and PY305 (0.35 to 0.36) traits, respectively. The mean biases of DGVs determined using the single-step and Bayesian methods were 1.50±0.21 and 1.18±0.26 for MY305, 1.75±0.33 and 1.14±0.20 for FY305, and 1.59±0.20 and 1.14±0.15 for PY305, respectively. CONCLUSION From the bias perspective, we believe that genomic selection based on the application of Bayesian approaches would be more suitable than application of ss-GBLUP in Korean Holstein populations.
Affiliation(s)
- SeokHyun Lee
- Animal Breeding and Genetics Division, National Institute of Animal Science, RDA, Cheonan 31000, Korea
- ChangGwon Dang
- Animal Breeding and Genetics Division, National Institute of Animal Science, RDA, Cheonan 31000, Korea
- YunHo Choy
- Animal Breeding and Genetics Division, National Institute of Animal Science, RDA, Cheonan 31000, Korea
- ChangHee Do
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Kwanghyun Cho
- Department of Dairy Science, Korea National College of Agriculture and Fisheries, Jeonju 54874, Korea
- Jongjoo Kim
- Division of Applied Life Science, Yeungnam University, Gyeongsan 38541, Korea
- Yousam Kim
- Division of Applied Life Science, Yeungnam University, Gyeongsan 38541, Korea
- Jungjae Lee
- Jun P&C Institute, INC., Yongin 16950, Korea
|
25
|
Fragomeni B, Masuda Y, Bradford HL, Lourenco DAL, Misztal I. International bull evaluations by genomic BLUP with a prediction population. J Dairy Sci 2019; 102:2330-2335. [PMID: 30639016 DOI: 10.3168/jds.2018-15554] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/14/2018] [Accepted: 11/08/2018] [Indexed: 11/19/2022]
Abstract
The purpose of this study was to determine whether multi-country genomic evaluation can be accomplished by multiple-trait genomic best linear unbiased predictor (GBLUP) without sharing genotypes of important animals. Phenotypes and genotypes with 40k SNP were simulated for 25,000 animals, each with 4 traits assuming the same genetic variance and 0.8 genetic correlations. The population was split into 4 subpopulations corresponding to 4 countries, one for each trait. Additionally, a prediction population was created from genotyped animals that were not present in the individual countries but were related to each country's population. Genomic estimated breeding values were computed for each country and subsequently converted to SNP effects. Phenotypes were reconstructed for the prediction population based on the SNP effects of a country and the prediction animals' genotypes. The prediction population was used as the basis for the international evaluation, enabling bull comparisons without sharing genotypes and only sharing SNP effects. The computations were such that SNP effects computed within-country or in the prediction population were the same. Genomic estimated breeding values were calculated by single-trait GBLUP for within-country and multiple-trait GBLUP for multi-country predictions. The true accuracy for the prediction population with reconstructed phenotypes was at most 0.02 less than the accuracy with the original data. The differences increased when countries were assumed unequally sized. However, accuracies by multiple-trait GBLUP with the prediction population were always greater than accuracies from any single within-country prediction. Multi-country genomic evaluations by multiple-trait GBLUP are possible without using original genotypes at a cost of lower accuracy compared with explicitly combining countries' data.
Affiliation(s)
- B Fragomeni
- Department of Animal Science, University of Connecticut, Storrs 06269; Department of Animal and Dairy Science, University of Georgia, Athens 30602
- Y Masuda
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- H L Bradford
- Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg 24061
- D A L Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- I Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
|
26
|
Gao H, Koivula M, Jensen J, Strandén I, Madsen P, Pitkänen T, Aamand G, Mäntysaari E. Short communication: Genomic prediction using different single-step methods in the Finnish red dairy cattle population. J Dairy Sci 2018; 101:10082-10088. [DOI: 10.3168/jds.2018-14913] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/12/2018] [Accepted: 07/09/2018] [Indexed: 12/21/2022]
|
27
|
Howard JT, Rathje TA, Bruns CE, Wilson-Wells DF, Kachman SD, Spangler ML. The impact of truncating data on the predictive ability for single-step genomic best linear unbiased prediction. J Anim Breed Genet 2018; 135:251-262. [PMID: 29882604 DOI: 10.1111/jbg.12334] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/12/2018] [Revised: 04/08/2018] [Accepted: 04/25/2018] [Indexed: 11/29/2022]
Abstract
Simulated and swine industry data sets were utilized to assess the impact of removing older data on the predictive ability of selection candidate estimated breeding values (EBV) when using single-step genomic best linear unbiased prediction (ssGBLUP). Simulated data included thirty replicates designed to mimic the structure of swine data sets. For the simulated data, varying amounts of data were truncated based on the number of ancestral generations back from the selection candidates. The swine data sets consisted of phenotypic and genotypic records for three traits across two breeds on animals born from 2003 to 2017. Phenotypes and genotypes were iteratively removed 1 year at a time based on the year an animal was born. For the swine data sets, correlations between corrected phenotypes (Cp) and EBV were used to evaluate the predictive ability on young animals born in 2016-2017. In the simulated data set, keeping data two generations back or greater resulted in no statistical difference (p-value > 0.05) in the reduction in the true breeding value at generation 15 compared to utilizing all available data. Across swine data sets, removing phenotypes from animals born prior to 2011 resulted in a negligible or a slight numerical increase in the correlation between Cp and EBV. Truncating data is a method to alleviate computational issues without negatively impacting the predictive ability of selection candidate EBV.
Affiliation(s)
- Jeremy T Howard
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska
- Stephen D Kachman
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, Nebraska
- Matthew L Spangler
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska
|
28
|
Vandenplas J, Calus MPL, Ten Napel J. Sparse single-step genomic BLUP in crossbreeding schemes. J Anim Sci 2018; 96:2060-2073. [PMID: 29873759 DOI: 10.1093/jas/sky136] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/01/2018] [Accepted: 04/16/2018] [Indexed: 12/20/2022]
Abstract
The algorithm for proven and young animals (APY) efficiently computes an approximated inverse of the genomic relationship matrix by dividing genotyped animals into so-called core and noncore animals. The APY leads to computationally feasible single-step genomic Best Linear Unbiased Prediction (ssGBLUP) with a large number of genotyped animals and was successfully applied to real single-breed or line datasets. This study aimed to assess the quality of genomic estimated breeding values (GEBV) when using the APY (GEBV_APY), in comparison to GEBV when using the directly inverted genomic relationship matrix (GEBV_DIRECT), for situations based on crossbreeding schemes, including F1 and F2 crosses, such as the ones for pigs and chickens. Based on simulations of a 3-way crossbreeding program, we compared different approximated inverses of a genomic relationship matrix, by varying the size and the composition of the core group. We showed that GEBV_APY were accurate approximations of GEBV_DIRECT for multivariate ssGBLUP involving different breeds and their crosses. GEBV_APY as accurate as GEBV_DIRECT were obtained when the core groups included animals from different breed compositions and when the core groups had a size between the numbers of the largest eigenvalues explaining 98% and 99% of the variation in the raw genomic relationship matrix.
Affiliation(s)
- Jérémie Vandenplas
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, AH Wageningen, The Netherlands
- Mario P L Calus
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, AH Wageningen, The Netherlands
- Jan Ten Napel
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, AH Wageningen, The Netherlands
|
29
|
Masuda Y, VanRaden P, Misztal I, Lawlor T. Differing genetic trend estimates from traditional and genomic evaluations of genotyped animals as evidence of preselection bias in US Holsteins. J Dairy Sci 2018; 101:5194-5206. [DOI: 10.3168/jds.2017-13310] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 06/08/2017] [Accepted: 12/10/2017] [Indexed: 11/19/2022]
|
30
|
Koivula M, Strandén I, Aamand G, Mäntysaari E. Reducing bias in the dairy cattle single-step genomic evaluation by ignoring bulls without progeny. J Anim Breed Genet 2018; 135:107-115. [DOI: 10.1111/jbg.12318] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/13/2017] [Accepted: 01/22/2018] [Indexed: 12/26/2022]
Affiliation(s)
- M. Koivula
- Natural Resources Institute Finland (Luke); Green Technology; Jokioinen Finland
- I. Strandén
- Natural Resources Institute Finland (Luke); Green Technology; Jokioinen Finland
- G.P. Aamand
- NAV Nordic Cattle Genetic Evaluation; Aarhus N Denmark
- E.A. Mäntysaari
- Natural Resources Institute Finland (Luke); Green Technology; Jokioinen Finland
|
31
|
Pocrnic I, Lourenco DAL, Bradford HL, Chen CY, Misztal I. Technical note: Impact of pedigree depth on convergence of single-step genomic BLUP in a purebred swine population. J Anim Sci 2018; 95:3391-3395. [PMID: 28805917 DOI: 10.2527/jas.2017.1581] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
In genomic evaluations, it is desirable to have low computing cost while retaining high accuracy of evaluation for young animals. When the population is large but only few animals have phenotypes, especially for low heritability traits, the convergence rate of BLUP or single-step genomic BLUP (ssGBLUP) can be very slow. This study investigates the effect of pedigree truncation on convergence rate and solutions of ssGBLUP for data exhibiting slow convergence. The data consisted of 216,000, 221,000, 732,000, and 579,000 phenotypes on 4 traits. Heritabilities were less than 0.1 for 2 traits and greater than 0.2 for the other 2 traits. The full pedigree consisted of 2.4 million animals. Genotypes were available for 33,000 animals and consisted of 60,000 SNP. Two bivariate animal models were fit using pedigree-based BLUP or ssGBLUP. Either a regular or the algorithm for proven and young (APY) inverse was used for the genomic relationship matrix. Different pedigree depths were analyzed including full pedigree and 1 to 5 ancestral generations. Pedigree depths were defined as n ancestral generations for animals with phenotypes. The number of animals in the reduced pedigrees varied from 226,000 and 760,000 for 1 generation to 228,000 and 767,000 for 5 generations. Genomic EBV (GEBV) for genotyped animals had correlations greater than 0.99 between runs with the full and reduced pedigrees with 2 to 5 generations. A single generation of pedigree was not sufficient to obtain the same GEBV as full pedigree. The convergence rate was the worst with the full pedigree and generally improved with reduced pedigrees. Using ssGBLUP with the APY inverse improved convergence without affecting accuracy. Reducing pedigrees and the APY are important tools to reduce the computational cost in the implementation of ssGBLUP.
|
32
|
Mäntysaari EA, Evans RD, Strandén I. Efficient single-step genomic evaluation for a multibreed beef cattle population having many genotyped animals. J Anim Sci 2017; 95:4728-4737. [PMID: 29293736 PMCID: PMC6292282 DOI: 10.2527/jas2017.1912] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 07/12/2017] [Accepted: 09/10/2017] [Indexed: 01/04/2023]
Abstract
An equivalent computational approach called ssGTBLUP was formulated for the original single-step GBLUP (ssGBLUP). In ssGTBLUP, the genomic relationship matrix has the form $\mathbf{G} = \mathbf{Z}\mathbf{Z}' + \mathbf{C}$, where the (centered and scaled) marker matrix $\mathbf{Z}$ has size $n \times m$ (numbers of genotypes and markers), and the matrix $\mathbf{C}$ can be easily inverted. The inverse can be written as $\mathbf{G}^{-1} = \mathbf{C}^{-1} - \mathbf{T}'\mathbf{T}$, where $\mathbf{T}$ is an $m$ by $n$ matrix. When the preconditioned conjugate gradient (PCG) method is used to solve the mixed model equations, a matrix-vector product with $\mathbf{G}^{-1}$ needs to be computed. In ssGBLUP, this requires $n^2$ multiplications, but in ssGTBLUP, the product with $\mathbf{T}'\mathbf{T}$ has $2nm$ multiplications and the product with $\mathbf{C}^{-1}$ has a number of multiplications proportional to $n$, with the constant independent of $n$ or $m$. In an approximate approach called ssGTBLUP(p), an eigendecomposition is used to reduce the number of rows in the $\mathbf{T}$ matrix. Here, p is the percentage of total variance explained by the accepted eigenvalues. The objective of this study was to compare the performance of ssGBLUP, ssGTBLUP, ssGTBLUP(p), and the APY (algorithm for proven and young) method. In APY, the core had 50,000 (APY50K), 30,000 (APY30K), or 10,000 (APY10K) animals. The approaches were tested on the Irish beef carcass conformation genetic evaluation which has a heterogeneous multibreed population. The pedigree had 13.3 million animals. There were $m$ = 54,620 markers available from $n$ = 163,277 genotyped animals. For genotyped animals, the correlations of breeding values between ssGBLUP and ssGTBLUP(p) for the 11 traits in the model ranged from 0.999-1.000 for p = 99, 0.998-1.000 for p = 98, and 0.992-0.998 for p = 95 but were 0.994-1.000 for APY50K, 0.969-0.997 for APY30K, and 0.899-0.967 for APY10K. Computing times per iteration were 4.43, 3.30, 2.69, 2.29, 1.55, 1.76, 1.27, and 0.55 min for ssGBLUP, ssGTBLUP, ssGTBLUP(99), ssGTBLUP(98), ssGTBLUP(95), APY50K, APY30K, and APY10K, respectively. The ssGTBLUP(p) approach allowed a well-defined approximation to ssGBLUP and fast computations.
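The low-rank-plus-easily-inverted structure described in this abstract can be illustrated with the Woodbury identity. The sketch below is one way to obtain a factor with the stated property, not necessarily the authors' construction: it assumes a genomic relationship matrix of the form Z Z' + C with a diagonal C, and the names `build_T`, `ginv_times`, and `c_diag` are hypothetical.

```python
import numpy as np

def build_T(Z, c_diag):
    """Return T (m x n) with inv(Z Z' + C) = inv(C) - T' T, assuming diagonal C."""
    ZtCinv = Z.T / c_diag                   # Z' C^{-1}, shape m x n
    K = ZtCinv @ Z + np.eye(Z.shape[1])     # Z' C^{-1} Z + I, shape m x m
    L = np.linalg.cholesky(K)               # K = L L'
    return np.linalg.solve(L, ZtCinv)       # T = L^{-1} Z' C^{-1}

def ginv_times(v, T, c_diag):
    """Compute inv(G) v without forming G.

    The two products with T cost about 2nm multiplications in total,
    and the diagonal C contributes n more.
    """
    return v / c_diag - T.T @ (T @ v)
```

Because only `T` and the diagonal of `C` are stored, each PCG iteration avoids the dense n by n product that a stored inverse of G would need.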
Affiliation(s)
- E. A. Mäntysaari
- Natural Resources Institute Finland (Luke), Green Technology, FI-31600 Jokioinen, Finland
- R. D. Evans
- Irish Cattle Breeding Federation, Highfield House, Newcestown Road, Bandon, Cork, Ireland
- I. Strandén
- Natural Resources Institute Finland (Luke), Green Technology, FI-31600 Jokioinen, Finland
|
33
|
Fragomeni BO, Lourenco DAL, Masuda Y, Legarra A, Misztal I. Incorporation of causative quantitative trait nucleotides in single-step GBLUP. Genet Sel Evol 2017; 49:59. [PMID: 28747171 PMCID: PMC5530494 DOI: 10.1186/s12711-017-0335-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 03/28/2017] [Accepted: 07/17/2017] [Indexed: 11/23/2022]
Abstract
Background Much effort is put into identifying causative quantitative trait nucleotides (QTN) in animal breeding, empowered by the availability of dense single nucleotide polymorphism (SNP) information. Genomic selection using traditional SNP information is easily implemented for any number of genotyped individuals using single-step genomic best linear unbiased predictor (ssGBLUP) with the algorithm for proven and young (APY). Our aim was to investigate whether ssGBLUP is useful for genomic prediction when some or all QTN are known. Methods Simulations included 180,000 animals across 11 generations. Phenotypes were available for all animals in generations 6 to 10. Genotypes for 60,000 SNPs across 10 chromosomes were available for 29,000 individuals. The genetic variance was fully accounted for by 100 or 1000 biallelic QTN. Raw genomic relationship matrices (GRM) were computed from (a) unweighted SNPs, (b) unweighted SNPs and causative QTN, (c) SNPs and causative QTN weighted with results obtained with genome-wide association studies, (d) unweighted SNPs and causative QTN with simulated weights, (e) only unweighted causative QTN, (f–h) as in (b–d) but using only the top 10% causative QTN, and (i) using only causative QTN with simulated weight. Predictions were computed by pedigree-based BLUP (PBLUP) and ssGBLUP. Raw GRM were blended with 1 or 5% of the numerator relationship matrix, or 1% of the identity matrix. Inverses of GRM were obtained directly or with APY. Results Accuracy of breeding values for 5000 genotyped animals in the last generation with PBLUP was 0.32, and for ssGBLUP it increased to 0.49 with an unweighted GRM, 0.53 after adding unweighted QTN, 0.63 when QTN weights were estimated, and 0.89 when QTN weights were based on true effects known from the simulation. When the GRM was constructed from causative QTN only, accuracy was 0.95 and 0.99 with blending at 5 and 1%, respectively. 
Accuracies simulating 1000 QTN were generally lower, with a similar trend. Accuracies using the APY inverse were equal or higher than those with a regular inverse. Conclusions Single-step GBLUP can account for causative QTN via a weighted GRM. Accuracy gains are maximum when variances of causative QTN are known and blending is at 1%.
Affiliation(s)
- Breno O Fragomeni
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Daniela A L Lourenco
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Yutaka Masuda
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Andres Legarra
- GenPhySE, INRA, INPT, INP-ENVT, Université de Toulouse, 31326, Castanet-Tolosan, France
- Ignacy Misztal
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
|
34
|
Strandén I, Matilainen K, Aamand G, Mäntysaari E. Solving efficiently large single-step genomic best linear unbiased prediction models. J Anim Breed Genet 2017; 134:264-274. [DOI: 10.1111/jbg.12257] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 11/07/2016] [Accepted: 01/15/2017] [Indexed: 12/24/2022]
Affiliation(s)
- I. Strandén
- Natural Resources Institute Finland (Luke); Green Technology; Biometrical Genetics; Jokioinen Finland
- K. Matilainen
- Natural Resources Institute Finland (Luke); Green Technology; Biometrical Genetics; Jokioinen Finland
- G.P. Aamand
- NAV Nordic Cattle Genetic Evaluation; Aarhus Denmark
- E.A. Mäntysaari
- Natural Resources Institute Finland (Luke); Green Technology; Biometrical Genetics; Jokioinen Finland
|
35
|
Bradford HL, Pocrnić I, Fragomeni BO, Lourenco DAL, Misztal I. Selection of core animals in the Algorithm for Proven and Young using a simulation model. J Anim Breed Genet 2017; 134:545-552. [PMID: 28464315 DOI: 10.1111/jbg.12276] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 01/05/2017] [Accepted: 03/25/2017] [Indexed: 12/01/2022]
Abstract
The Algorithm for Proven and Young (APY) enables the implementation of single-step genomic BLUP (ssGBLUP) in large, genotyped populations by separating genotyped animals into core and non-core subsets and creating a computationally efficient inverse for the genomic relationship matrix (G). As APY became the choice for large-scale genomic evaluations in BLUP-based methods, a common question is how to choose the animals in the core subset. We compared several core definitions to answer this question. Simulations comprised a moderately heritable trait for 95,010 animals and 50,000 genotypes for animals across five generations. Genotypes consisted of 25,500 SNP distributed across 15 chromosomes. Genotyping errors and missing pedigree were also mimicked. Core animals were defined based on individual generations, equal representation across generations, and at random. For a sufficiently large core size, core definitions had the same accuracies and biases, even if the core animals had imperfect genotypes. When genotyped animals had unknown parents, accuracy and bias were significantly better (p ≤ .05) for random and across generation core definitions.
Affiliation(s)
- H L Bradford
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- I Pocrnić
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- B O Fragomeni
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- D A L Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- I Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
|
36
|
Masuda Y, Misztal I, Legarra A, Tsuruta S, Lourenco DAL, Fragomeni BO, Aguilar I. Technical note: Avoiding the direct inversion of the numerator relationship matrix for genotyped animals in single-step genomic best linear unbiased prediction solved with the preconditioned conjugate gradient. J Anim Sci 2017; 95:49-52. [PMID: 28177357 DOI: 10.2527/jas.2016.0699] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
This paper evaluates an efficient implementation to multiply the inverse of a numerator relationship matrix for genotyped animals ($\mathbf{A}_{22}^{-1}$) by a vector. The computation is required for solving mixed model equations in single-step genomic BLUP (ssGBLUP) with the preconditioned conjugate gradient (PCG). The inverse $\mathbf{A}_{22}^{-1}$ can be decomposed into sparse matrices that are blocks of the sparse inverse of a numerator relationship matrix ($\mathbf{A}^{-1}$) including genotyped animals and their ancestors. The elements of $\mathbf{A}^{-1}$ were rapidly calculated with Henderson's rule and stored as sparse matrices in memory. Implementation of the product was by a series of sparse matrix-vector multiplications. Diagonal elements of $\mathbf{A}_{22}^{-1}$, which were required as preconditioners in PCG, were approximated with a Monte Carlo method using 1,000 samples. The efficient implementation was compared with explicit inversion of $\mathbf{A}_{22}$ with 3 data sets including about 15,000, 81,000, and 570,000 genotyped animals selected from populations with 213,000, 8.2 million, and 10.7 million pedigree animals, respectively. The explicit inversion required 1.8 GB, 49 GB, and 2,415 GB (estimated) of memory, respectively, and 42 s, 56 min, and 13.5 d (estimated), respectively, for the computations. The efficient implementation required <1 MB, 2.9 GB, and 2.3 GB of memory, respectively, and <1 sec, 3 min, and 5 min, respectively, for setting up. Only <1 sec was required for the multiplication in each PCG iteration for any data sets. When the equations in ssGBLUP are solved with the PCG algorithm, $\mathbf{A}_{22}^{-1}$ is no longer a limiting factor in the computations.
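The decomposition mentioned in this abstract rests on a standard block-inverse identity: with superscripts denoting blocks of the inverse of the full relationship matrix (1 for nongenotyped, 2 for genotyped animals), inv(A22) = A^22 - A^21 inv(A^11) A^12. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it uses dense NumPy arrays for brevity (the paper stores the blocks as sparse matrices), applies Henderson's rules while ignoring inbreeding, and the function names are hypothetical.

```python
import numpy as np

def henderson_ainv(pedigree):
    """Inverse numerator relationship matrix by Henderson's rules, ignoring inbreeding.

    `pedigree` is a list of (sire, dam) index pairs with parents listed before
    progeny and None for an unknown parent.
    """
    n = len(pedigree)
    Ainv = np.zeros((n, n))
    for i, parents in enumerate(pedigree):
        known = [p for p in parents if p is not None]
        alpha = {0: 1.0, 1: 4.0 / 3.0, 2: 2.0}[len(known)]
        Ainv[i, i] += alpha
        for p in known:
            Ainv[i, p] -= alpha / 2.0
            Ainv[p, i] -= alpha / 2.0
            for q in known:
                Ainv[p, q] += alpha / 4.0
    return Ainv

def a22inv_times(Ainv, genotyped, q):
    """Multiply the inverse of the genotyped block of A by q using blocks of inv(A)."""
    g = np.asarray(genotyped)
    r = np.setdiff1d(np.arange(Ainv.shape[0]), g)   # nongenotyped animals
    Ai11 = Ainv[np.ix_(r, r)]
    Ai12 = Ainv[np.ix_(r, g)]
    Ai22 = Ainv[np.ix_(g, g)]
    # A^22 q - A^21 inv(A^11) (A^12 q): only matrix-vector products and one solve.
    return Ai22 @ q - Ai12.T @ np.linalg.solve(Ai11, Ai12 @ q)
```

The dense relationship matrix for genotyped animals is never formed or inverted; with sparse storage, the solve against the nongenotyped block would typically be handled by a sparse factorization.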
|
37
|
Vukasinovic N, Bacciu N, Przybyla C, Boddhireddy P, DeNise S. Development of genetic and genomic evaluation for wellness traits in US Holstein cows. J Dairy Sci 2017; 100:428-438. [DOI: 10.3168/jds.2016-11520] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 05/27/2016] [Accepted: 08/28/2016] [Indexed: 11/19/2022]
|
38
|
|
39
|
Pocrnic I, Lourenco DAL, Masuda Y, Misztal I. Dimensionality of genomic information and performance of the Algorithm for Proven and Young for different livestock species. Genet Sel Evol 2016; 48:82. [PMID: 27799053 PMCID: PMC5088690 DOI: 10.1186/s12711-016-0261-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 05/11/2016] [Accepted: 10/25/2016] [Indexed: 12/19/2022]
Abstract
Background: A genomic relationship matrix (GRM) can be inverted efficiently with the Algorithm for Proven and Young (APY) through recursion on a small number of core animals. The number of core animals is theoretically linked to effective population size (Ne). In a simulation study, the optimal number of core animals was equal to the number of largest eigenvalues of GRM that explained 98% of its variation. The purpose of this study was to find the optimal number of core animals and estimate Ne for different species.
Methods: Datasets included phenotypes, pedigrees, and genotypes for populations of Holstein, Jersey, and Angus cattle, pigs, and broiler chickens. The number of genotyped animals varied from 15,000 for broiler chickens to 77,000 for Holsteins, and the number of single-nucleotide polymorphisms used for genomic prediction varied from 37,000 to 61,000. Eigenvalue decomposition of the GRM for each population determined numbers of largest eigenvalues corresponding to 90, 95, 98, and 99% of variation.
Results: The number of eigenvalues corresponding to 90% (98%) of variation was 4527 (14,026) for Holstein, 3325 (11,500) for Jersey, 3654 (10,605) for Angus, 1239 (4103) for pig, and 1655 (4171) for broiler chicken. Each trait in each species was analyzed using the APY inverse of the GRM with randomly selected core animals, and their number was equal to the number of largest eigenvalues. Realized accuracies peaked with the number of core animals corresponding to 98% of variation for Holstein and Jersey and closer to 99% for other breed/species. Ne was estimated based on comparisons of eigenvalue decomposition in a simulation study. Assuming a genome length of 30 Morgan, Ne was equal to 149 for Holsteins, 101 for Jerseys, 113 for Angus, 32 for pigs, and 44 for broilers.
Conclusions: Eigenvalue profiles of GRM for common species are similar to those in simulation studies although they are affected by number of genotyped animals and genotyping quality. For all investigated species, the APY required less than 15,000 core animals. Realized accuracies were equal or greater with the APY inverse than with regular inversion. Eigenvalue analysis of GRM can provide a realistic estimate of Ne.
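The eigenvalue criterion described in this abstract can be sketched in a few lines of NumPy. This is only an illustration on toy data: the helper names `simulate_grm` and `n_eigenvalues_for_variance` are hypothetical, the genotypes are random rather than real, and the GRM scaling (centered genotypes divided by 2Σp(1−p)) is an assumed, commonly used construction that the abstract does not specify.

```python
import numpy as np


def simulate_grm(n_animals, n_markers, seed=0):
    """Build a toy GRM from random 0/1/2 genotypes (assumed scaling, not the paper's data)."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.1, 0.9, size=n_markers)            # allele frequencies
    genotypes = rng.binomial(2, p, size=(n_animals, n_markers)).astype(float)
    Z = genotypes - 2.0 * p                               # center by expected allele count
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def n_eigenvalues_for_variance(G, fraction):
    """Smallest number of largest eigenvalues whose sum reaches `fraction` of the total."""
    eigvals = np.linalg.eigvalsh(G)[::-1]                 # sorted in descending order
    eigvals = np.clip(eigvals, 0.0, None)                 # remove tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    count = int(np.searchsorted(cumulative, fraction)) + 1
    return min(count, len(eigvals))


if __name__ == "__main__":
    G = simulate_grm(n_animals=200, n_markers=1000)
    for pct in (0.90, 0.95, 0.98, 0.99):
        print(pct, n_eigenvalues_for_variance(G, pct))
```

On real data the count returned for 98% would be a candidate number of core animals; the toy matrix here is far too small to reproduce the figures reported above.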
Affiliation(s)
- Ivan Pocrnic, Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Daniela A L Lourenco, Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Yutaka Masuda, Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Ignacy Misztal, Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
|
40
|
Fernando RL, Cheng H, Garrick DJ. An efficient exact method to obtain GBLUP and single-step GBLUP when the genomic relationship matrix is singular. Genet Sel Evol 2016; 48:80. [PMID: 27788669 PMCID: PMC5082134 DOI: 10.1186/s12711-016-0260-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/23/2016] [Accepted: 10/20/2016] [Indexed: 01/08/2023] Open
Abstract
Background: The mixed linear model employed for genomic best linear unbiased prediction (GBLUP) includes the breeding value for each animal as a random effect that has a mean of zero and a covariance matrix proportional to the genomic relationship matrix ($\mathbf{G}_{gg}$), where the inverse of $\mathbf{G}_{gg}$ is required to set up the usual mixed model equations (MME). When only some animals have genomic information, genomic predictions can be obtained by an extension known as single-step GBLUP, where the covariance matrix of breeding values is constructed by combining the pedigree-based additive relationship matrix with $\mathbf{G}_{gg}$. The inverse of the combined relationship matrix can be obtained efficiently, provided $\mathbf{G}_{gg}$ can be inverted. In some livestock species, however, the number $N_{g}$ of animals with genomic information exceeds the number of marker covariates used to compute $\mathbf{G}_{gg}$, and this results in a singular $\mathbf{G}_{gg}$. For such a case, an efficient and exact method to obtain GBLUP and single-step GBLUP is presented here.
Results: Exact methods are already available to obtain GBLUP when $\mathbf{G}_{gg}$ is singular, but these require working with large dense matrices. Another approach is to modify $\mathbf{G}_{gg}$ to make it nonsingular by adding a small value to all its diagonals or regressing it towards the pedigree-based relationship matrix. This, however, results in the inverse of $\mathbf{G}_{gg}$ being dense and difficult to compute as $N_{g}$ grows. The approach presented here recognizes that the number r of linearly independent genomic breeding values cannot exceed the number of marker covariates, and the mixed linear model used here for genomic prediction only fits these r linearly independent breeding values as random effects.
Conclusions: The exact method presented here was compared to Apy-GBLUP and to Apy single-step GBLUP, both of which are approximate methods that use a modified $\mathbf{G}_{gg}$ that has a sparse inverse which can be computed efficiently. In a small numerical example, predictions from the exact approach and Apy were almost identical, but the MME from Apy had a condition number about 1000 times larger than that from the exact approach, indicating ill-conditioning of the MME from Apy. The practical application of exact SSGBLUP is not more difficult than implementation of Apy.
Electronic supplementary material: The online version of this article (doi:10.1186/s12711-016-0260-7) contains supplementary material, which is available to authorized users.
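The idea of fitting only r linearly independent breeding values when the genomic relationship matrix is singular can be illustrated with a small NumPy sketch. It is not the authors' construction: it uses an eigen-decomposition to obtain the r independent effects, assumes a toy model y = u + e with no fixed effects and a known variance ratio `lam`, and all function names are hypothetical. The point is only that the reduced-rank fit gives the same predictions as the dense formula without ever inverting the singular matrix.

```python
import numpy as np


def make_grm(n_animals, n_markers, seed=0):
    """Toy GRM with more animals than markers, so that it is singular."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.1, 0.9, size=n_markers)
    M = rng.binomial(2, p, size=(n_animals, n_markers)).astype(float)
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_dense(G, y, lam):
    """BLUP of u for y = u + e using u_hat = G (G + lam I)^-1 y (no inverse of G needed)."""
    n = G.shape[0]
    return G @ np.linalg.solve(G + lam * np.eye(n), y)


def gblup_reduced_rank(G, y, lam, tol=1e-8):
    """Fit only the r independent effects s, with u = U_r s and var(s) proportional to D_r."""
    d, U = np.linalg.eigh(G)
    keep = d > tol * d.max()              # eigenvalues that are numerically nonzero
    d_r, U_r = d[keep], U[:, keep]
    # Mixed model equations for s are diagonal: (I + lam * D_r^-1) s = U_r' y
    s_hat = (d_r / (d_r + lam)) * (U_r.T @ y)
    return U_r @ s_hat, int(keep.sum())


if __name__ == "__main__":
    G = make_grm(n_animals=60, n_markers=25)
    y = np.random.default_rng(1).normal(size=60)
    u_dense = gblup_dense(G, y, lam=1.0)
    u_reduced, r = gblup_reduced_rank(G, y, lam=1.0)
    print("independent effects fitted:", r)
    print("max difference:", np.max(np.abs(u_dense - u_reduced)))
```

In this toy setting r cannot exceed the number of markers (25), even though 60 animals are genotyped, which mirrors the rank argument made in the abstract.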
Affiliation(s)
- Rohan L Fernando, Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Hao Cheng, Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Dorian J Garrick, Department of Animal Science, Iowa State University, Ames, IA, 50011, USA; Institute of Veterinary, Animal and Biomedical Sciences, Massey University, Palmerston North, New Zealand
|
41
|
Ostersen T, Christensen OF, Madsen P, Henryon M. Sparse single-step method for genomic evaluation in pigs. Genet Sel Evol 2016; 48:48. [PMID: 27357825 PMCID: PMC4926299 DOI: 10.1186/s12711-016-0227-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/03/2015] [Accepted: 06/17/2016] [Indexed: 11/10/2022] Open
Abstract
Background: In many animal breeding programs, with the increasing number of genotyped animals, estimation of genomic breeding values by the single-step method is becoming limited by excessive computing requirements. A recently proposed algorithm for proven and young animals (APY) is an approximation that reduces computing time drastically by dividing genotyped animals into core and non-core animals, with only computations for core animals being time-consuming. We hypothesized that choosing core animals based on representing all generations, minimizing the relatedness within the core group, or maximizing the number of genotyped offspring, would result in greater accuracies of estimated breeding values (EBV).
Methods: We compared eight different core groups for the three pig breeds DanAvl Duroc, DanAvl Landrace and DanAvl Yorkshire. These eight sparse approximations of the single-step method were evaluated based on correlations of EBV for genotyped animals obtained from the sparse methods with those obtained from the usual version of the single-step method. We used a single-trait model with daily gain as trait.
Results: For core groups that distributed animals across generations, correlations for genotyped animals (from 0.977 to 0.989) were higher than for those that did not distribute core animals across generations (from 0.934 to 0.956). For core groups that maximized the number of genotyped offspring, correlations for genotyped animals (from 0.983 to 0.989) were higher than for other core groups (from 0.934 to 0.981). There was no clear association between low relatedness within the core group and accuracy of approximations.
Conclusions: We found that for core groups that represent all generations and that maximize the number of genotyped offspring, accurate approximations of EBV were obtained. However, we did not find a clear association between accuracy and relatedness within the core group. For the APY method, this is the first study that reports systematic criteria for the creation of core groups that result in more accurate EBV than a similar-sized random core group. Random core groups only ensure across-generation representation. Therefore, we recommend choosing a core group that represents all generations and that maximizes the number of genotyped offspring for single-step genomic evaluation using the APY method.
Affiliation(s)
- Tage Ostersen, SEGES Pig Research Centre, Axeltorv 3, 1609, Copenhagen V, Denmark
- Ole F Christensen, Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, P.O. Box 50, 8830, Tjele, Denmark
- Per Madsen, Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, P.O. Box 50, 8830, Tjele, Denmark
- Mark Henryon, SEGES Pig Research Centre, Axeltorv 3, 1609, Copenhagen V, Denmark; School of Animal Biology, University of Western Australia, 35 Stirling Highway, Crawley, 6009, WA, Australia
|
42
|
Lourenco DAL, Tsuruta S, Fragomeni BO, Masuda Y, Aguilar I, Legarra A, Bertrand JK, Amen TS, Wang L, Moser DW, Misztal I. Genetic evaluation using single-step genomic best linear unbiased predictor in American Angus. J Anim Sci 2015; 93:2653-62. [PMID: 26115253 DOI: 10.2527/jas.2014-8836] [Citation(s) in RCA: 109] [Impact Index Per Article: 13.6] [MESH Headings] [Indexed: 12/30/2022] Open
Abstract
Predictive ability of genomic EBV when using single-step genomic BLUP (ssGBLUP) in Angus cattle was investigated. Over 6 million records were available on birth weight (BiW) and weaning weight (WW), almost 3.4 million on postweaning gain (PWG), and over 1.3 million on calving ease (CE). Genomic information was available on, at most, 51,883 animals, which included high and low EBV accuracy animals. Traditional EBV was computed by BLUP and genomic EBV by ssGBLUP and indirect prediction based on SNP effects was derived from ssGBLUP; SNP effects were calculated based on the following reference populations: ref_2k (contains top bulls and top cows that had an EBV accuracy for BiW ≥0.85), ref_8k (contains all parents that were genotyped), and ref_33k (contains all genotyped animals born up to 2012). Indirect prediction was obtained as direct genomic value (DGV) or as an index of DGV and parent average (PA). Additionally, runs with ssGBLUP used the inverse of the genomic relationship matrix calculated by an algorithm for proven and young animals (APY) that uses recursions on a small subset of reference animals. An extra reference subset included 3,872 genotyped parents of genotyped animals (ref_4k). Cross-validation was used to assess predictive ability on a validation population of 18,721 animals born in 2013. Computations for growth traits used multiple-trait linear model and, for CE, a bivariate CE-BiW threshold-linear model. With BLUP, predictivities were 0.29, 0.34, 0.23, and 0.12 for BiW, WW, PWG, and CE, respectively. With ssGBLUP and ref_2k, predictivities were 0.34, 0.35, 0.27, and 0.13 for BiW, WW, PWG, and CE, respectively, and with ssGBLUP and ref_33k, predictivities were 0.39, 0.38, 0.29, and 0.13 for BiW, WW, PWG, and CE, respectively. Low predictivity for CE was due to low incidence rate of difficult calving. Indirect predictions with ref_33k were as accurate as with full ssGBLUP. 
Using the APY with recursions on ref_4k achieved 88% of the gain of full ssGBLUP, and using the APY with recursions on ref_8k achieved 97% of that gain. Genomic evaluation in beef cattle with ssGBLUP is feasible while keeping the models (maternal, multiple trait, and threshold) already used in regular BLUP. Gains in predictivity are dependent on the composition of the reference population. Indirect predictions via SNP effects derived from ssGBLUP allow for accurate genomic predictions on young animals, with no advantage of including PA in the index if the reference population is large. With the APY conditioning on about 10,000 reference animals, ssGBLUP is potentially applicable to a large number of genotyped animals without compromising predictive ability.
|
43
|
The Dimensionality of Genomic Information and Its Effect on Genomic Prediction. Genetics 2016; 203:573-81. [PMID: 26944916 PMCID: PMC4858800 DOI: 10.1534/genetics.116.187013] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 01/19/2016] [Accepted: 02/29/2016] [Indexed: 01/19/2023] Open
Abstract
The genomic relationship matrix (GRM) can be inverted by the algorithm for proven and young (APY) based on recursion on a random subset of animals. While a regular inverse has a cubic cost, the cost of the APY inverse can be close to linear. Theory for the APY assumes that the optimal size of the subset (maximizing accuracy of genomic predictions) is due to a limited dimensionality of the GRM, which is a function of the effective population size (Ne). The objective of this study was to evaluate these assumptions by simulation. Six populations were simulated with approximate effective population size (Ne) from 20 to 200. Each population consisted of 10 nonoverlapping generations, with 25,000 animals per generation and phenotypes available for generations 1–9. The last 3 generations were fully genotyped assuming genome length L = 30. The GRM was constructed for each population and analyzed for distribution of eigenvalues. Genomic estimated breeding values (GEBV) were computed by single-step GBLUP, using either a direct or an APY inverse of GRM. The sizes of the subset in APY were set to the number of the largest eigenvalues explaining x% of variation (EIGx, x = 90, 95, 98, 99) in GRM. Accuracies of GEBV for the last generation with the APY inverse peaked at EIG98 and were slightly lower with EIG95, EIG99, or the direct inverse. Most information in the GRM is contained in ∼NeL largest eigenvalues, with no information beyond 4NeL. Genomic predictions with the APY inverse of the GRM are more accurate than by the regular inverse.
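As a rough worked example of the bounds stated at the end of this abstract, the simulated extremes (Ne = 20 and Ne = 200) with L = 30 give the following ranges. Only numbers from the abstract are used; the arithmetic is an illustration, not a result reported by the authors.

```latex
\begin{align*}
N_e = 20:  &\quad N_e L = 20 \times 30 = 600,     &\quad 4 N_e L &= 2{,}400,\\
N_e = 200: &\quad N_e L = 200 \times 30 = 6{,}000, &\quad 4 N_e L &= 24{,}000.
\end{align*}
```

Read this way, most of the information would sit in roughly 600 to 6,000 of the largest eigenvalues across the simulated populations, with nothing further expected beyond about 2,400 to 24,000.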
|
44
|
Masuda Y, Misztal I, Tsuruta S, Legarra A, Aguilar I, Lourenco DAL, Fragomeni BO, Lawlor TJ. Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals. J Dairy Sci 2016; 99:1968-1974. [PMID: 26805987 DOI: 10.3168/jds.2015-10540] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 10/19/2015] [Accepted: 12/01/2015] [Indexed: 11/19/2022]
Abstract
The objectives of this study were to develop and evaluate an efficient implementation for computing the inverse of the genomic relationship matrix with the recursion algorithm, called the algorithm for proven and young (APY), in single-step genomic BLUP. We validated genomic predictions for young bulls with more than 500,000 genotyped animals in final score for US Holsteins. Phenotypic data included 11,626,576 final scores on 7,093,380 US Holstein cows, and genotypes were available for 569,404 animals. Daughter deviations for young bulls with no classified daughters in 2009, but at least 30 classified daughters in 2014, were computed using all the phenotypic data. Genomic predictions for the same bulls were calculated with single-step genomic BLUP using phenotypes up to 2009. We calculated the inverse of the genomic relationship matrix, $G_{APY}^{-1}$, based on a direct inversion of the genomic relationship matrix on a small subset of genotyped animals (core animals) and extended that information to noncore animals by recursion. We tested several sets of core animals including 9,406 bulls with at least 1 classified daughter, 9,406 bulls and 1,052 classified dams of bulls, 9,406 bulls and 7,422 classified cows, and random samples of 5,000 to 30,000 animals. Validation reliability was assessed by the coefficient of determination from regression of daughter deviation on genomic predictions for the predicted young bulls. The reliabilities were 0.39 with 5,000 randomly chosen core animals, 0.45 with the 9,406 bulls and 7,422 cows as core animals, and 0.44 with the remaining sets. With phenotypes truncated in 2009 and the preconditioned conjugate gradient to solve mixed model equations, the number of rounds to convergence for core animals defined by bulls was 1,343; defined by bulls and cows, 2,066; and defined by 10,000 random animals, at most 1,629. With complete phenotype data, the number of rounds decreased to 858, 1,299, and at most 1,092, respectively.
Setting up $G_{APY}^{-1}$ for 569,404 genotyped animals with 10,000 core animals took 1.3 h and 57 GB of memory. The validation reliability with APY reaches a plateau when the number of core animals is at least 10,000. Predictions with APY differ little in reliability among definitions of core animals. Single-step genomic BLUP with APY is applicable to millions of genotyped animals.
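The construction described here (direct inversion for core animals, recursion for noncore animals) is commonly written as a block formula, which the dense NumPy sketch below follows. It is a toy illustration under stated assumptions: the function name `apy_inverse` is hypothetical, the input matrix is assumed to be positive definite, and the result is stored as a full dense matrix, whereas a production implementation would exploit the diagonal noncore block to save memory.

```python
import numpy as np


def apy_inverse(G, core):
    """Dense sketch of the APY inverse of G for a given index array of core animals.

    Blocks: core-core     = Gcc^-1 + P M^-1 P'
            core-noncore  = -P M^-1
            noncore block = M^-1 (diagonal)
    with P = Gcc^-1 Gcn and M = diag(g_ii - g_ic Gcc^-1 g_ci).
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc = G[np.ix_(core, core)]
    Gcn = G[np.ix_(core, noncore)]
    Gcc_inv = np.linalg.inv(Gcc)               # the only dense inversion, core x core

    P = Gcc_inv @ Gcn                          # recursion coefficients, one column per noncore animal
    m = np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P)
    m_inv = 1.0 / m                            # diagonal noncore block, inverted elementwise

    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + (P * m_inv) @ P.T
    Ginv[np.ix_(core, noncore)] = -P * m_inv
    Ginv[np.ix_(noncore, core)] = (-P * m_inv).T
    Ginv[noncore, noncore] = m_inv             # paired indices set only the diagonal
    return Ginv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(12, 30))
    G = A @ A.T / 30 + 0.05 * np.eye(12)       # toy positive definite matrix
    print(np.round(apy_inverse(G, core=np.arange(6)), 2))
```

With a single noncore animal the formula reduces to the ordinary partitioned inverse, so the result matches `np.linalg.inv(G)`; with more noncore animals it is the exact inverse of an approximated matrix in which noncore animals are conditionally independent given the core.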
Affiliation(s)
- Y Masuda, Department of Animal and Dairy Science, University of Georgia, Athens 30602
- I Misztal, Department of Animal and Dairy Science, University of Georgia, Athens 30602
- S Tsuruta, Department of Animal and Dairy Science, University of Georgia, Athens 30602
- A Legarra, Institut National de la Recherche Agronomique, UMR1388 GenPhySE, 31326 Castanet Tolosan, France
- I Aguilar, Instituto Nacional de Investigación Agropecuaria, Canelones, Uruguay 90200
- D A L Lourenco, Department of Animal and Dairy Science, University of Georgia, Athens 30602
- B O Fragomeni, Department of Animal and Dairy Science, University of Georgia, Athens 30602
- T J Lawlor, Holstein Association USA Inc., Brattleboro, VT 05301
|
45
|
VanRaden PM. Practical implications for genetic modeling in the genomics era. J Dairy Sci 2016; 99:2405-2412. [PMID: 26778313 DOI: 10.3168/jds.2015-10038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/30/2015] [Accepted: 11/16/2015] [Indexed: 11/19/2022]
Abstract
Genetic models convert data into estimated breeding values and other information useful to breeders. The goal is to provide accurate and timely predictions of the future performance for each animal (or embryo). Modeling involves defining traits, editing raw data, removing environmental effects, including genetic by environmental interactions and correlations among traits, and accounting for nonadditive inheritance or nonnormal distributions. Data include phenotypes and pedigrees during the last century and genotypes within the last decade. The genomic data can include single nucleotide polymorphisms, quantitative trait loci, insertions, deletions, and haplotypes. Subsets must be selected to reduce computation because total numbers of variants that can be imputed have increased rapidly from thousands to millions. Current computation using 60,671 markers takes just a few days. Nonlinear models can account for the nonnormal distribution of genomic effects, but reliability is usually better than that of linear models only for traits influenced by major genes. Numbers of genotyped animals have also increased rapidly in the joint North American database from a few thousand in 2009 to over 1 million in 2015. Most are young females and will contribute to estimating allele effects in the future, but only about 150,000 have phenotypes so far. Genomic preselection can bias traditional animal models because Mendelian sampling of phenotyped progeny and mates is no longer expected to average zero; however, estimates of bias are small in current US data. Single-step models that combine pedigree and genomic relationships can account for preselection, but approximations are required for affordable computation. Traditional animal models may include all breeds and crossbreds, but most genomic evaluations are still computed within breed. Models that include inbreeding, heterosis, dominance, and interactions can improve predictions for individual matings. 
Multitrait genomic models may be preferred for traits with many missing records or when foreign records are included as pseudo-observations, but most countries use multitrait traditional evaluations followed by single-trait genomic evaluations. Genomic reliabilities are about 70% for the more heritable traits. Researchers must choose from many available models and explain how the models work so that breeders can more confidently apply the predictions in their selection programs.
Affiliation(s)
- P M VanRaden, Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
|
46
|
Meuwissen T, Hayes B, Goddard M. Genomic selection: A paradigm shift in animal breeding. Anim Front 2016. [DOI: 10.2527/af.2016-0002] [Citation(s) in RCA: 223] [Impact Index Per Article: 27.9] [Indexed: 01/20/2023] Open
Affiliation(s)
- Ben Hayes, Department of Economic Development, Jobs, Transport and Resources and Dairy Futures Cooperative Research Centre, Agribio, 5 Ring Road, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Mike Goddard, Department of Economic Development, Jobs, Transport and Resources and Dairy Futures Cooperative Research Centre, Agribio, 5 Ring Road, Bundoora, VIC 3083, Australia; Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Parkville, Australia
|
47
|
Inexpensive Computation of the Inverse of the Genomic Relationship Matrix in Populations with Small Effective Population Size. Genetics 2015; 202:401-9. [PMID: 26584903 PMCID: PMC4788224 DOI: 10.1534/genetics.115.182089] [Citation(s) in RCA: 89] [Impact Index Per Article: 9.9] [Received: 08/18/2015] [Accepted: 11/15/2015] [Indexed: 11/18/2022] Open
Abstract
Many computations with SNP data including genomic evaluation, parameter estimation, and genome-wide association studies use an inverse of the genomic relationship matrix. The cost of a regular inversion is cubic and is prohibitively expensive for large matrices. Recent studies in cattle demonstrated that the inverse can be computed in almost linear time by recursion on any subset of ∼10,000 individuals. The purpose of this study is to present a theory of why such a recursion works and its implication for other populations. Assume that, because of a small effective population size, the additive information in a genotyped population has a small dimensionality, even with a very large number of SNP markers. That dimensionality is visible as a limited number of effective SNP effects, independent chromosome segments, or the rank of the genomic relationship matrix. Decompose a population arbitrarily into core and noncore individuals, with the number of core individuals equal to that dimensionality. Then, breeding values of noncore individuals can be derived by recursions on breeding values of core individuals, with coefficients of the recursion computed from the genomic relationship matrix. A resulting algorithm for the inversion called "algorithm for proven and young" (APY) has a linear computing and memory cost for noncore animals. Noninfinitesimal genetic architecture can be accommodated through a trait-specific genomic relationship matrix, possibly derived from Bayesian regressions. For populations with small effective population size, the inverse of the genomic relationship matrix can be computed inexpensively for a very large number of genotyped individuals.
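The central claim of this abstract, that noncore breeding values follow from core breeding values by recursion once the core is as large as the dimensionality of the genomic information, can be checked on a toy example. The sketch below is an assumption-laden illustration: the function name `recursion_error` is hypothetical, "independent chromosome segments" are represented by random normal covariates rather than real genotypes, and the core is simply the first `n_core` animals.

```python
import numpy as np


def recursion_error(n_animals, n_segments, n_core, seed=0):
    """Largest error when predicting noncore breeding values from core ones by recursion.

    Breeding values are u = B s, where B holds covariates for `n_segments`
    independent segments, so the relationship matrix G = B B' / n_segments has
    rank `n_segments`. Recursion coefficients are Gnc Gcc^-1.
    """
    rng = np.random.default_rng(seed)
    B = rng.normal(size=(n_animals, n_segments))
    G = B @ B.T / n_segments
    u = B @ rng.normal(size=n_segments)        # true breeding values

    core = np.arange(n_core)
    noncore = np.arange(n_core, n_animals)
    Gcc = G[np.ix_(core, core)]
    Gnc = G[np.ix_(noncore, core)]

    coeff = np.linalg.solve(Gcc, Gnc.T).T      # Gnc Gcc^-1 (Gcc is symmetric)
    u_pred = coeff @ u[core]
    return float(np.max(np.abs(u_pred - u[noncore])))


if __name__ == "__main__":
    for n_core in (5, 10):
        print(n_core, recursion_error(n_animals=200, n_segments=10, n_core=n_core))
```

When the number of core animals equals the number of segments, the recursion reproduces the noncore breeding values up to round-off; with fewer core animals a visible error remains, which is the intuition behind linking core size to dimensionality.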
|
48
|
Tempelman RJ. Statistical and Computational Challenges in Whole Genome Prediction and Genome-Wide Association Analyses for Plant and Animal Breeding. Journal of Agricultural, Biological, and Environmental Statistics 2015. [DOI: 10.1007/s13253-015-0225-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
|