1
Baller JL, Kachman SD, Kuehn LA, Spangler ML. Using pooled data for genomic prediction in a bivariate framework with missing data. J Anim Breed Genet 2022; 139:489-501. [PMID: 35698863 PMCID: PMC9544112 DOI: 10.1111/jbg.12727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 05/21/2022] [Indexed: 11/29/2022]
Abstract
Pooling samples to derive group genotypes can enable the economically efficient use of commercial animals within genetic evaluations. To test a multivariate framework for genetic evaluations using pooled data, simulation was used to mimic a beef cattle population, including genotypes, pedigree data and two moderately heritable traits with varying genetic correlations. There were 15 generations (n = 32,000; random selection and mating), and the last generation was genotyped through pooling. Missing records were induced in two ways: (a) sequential culling and (b) random missing records. Gaps in genotyping were also explored, whereby genotyping occurred through generation 13 or 14. Pools of 1, 20, 50 and 100 animals were constructed randomly or by minimizing phenotypic variation. EBV were estimated using a bivariate single-step genomic best linear unbiased prediction model. Pools of 20 animals constructed by minimizing phenotypic variation generally led to accuracies that were not different from those obtained using individual progeny data. Gaps in genotyping led to significantly different EBV accuracies (p < .05) for sires and dams born in the generation nearest the pools. Pooling of any size generally led to greater accuracies than having no information from generation 15, regardless of the way missing records arose, the percentage of records available or the genetic correlation. Pooling to aid in the use of commercial data in genetic evaluations can be applied in multivariate cases with varying relationships between the traits and in the presence of systematic and randomly missing phenotypes.
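The two pool-construction strategies compared above (random versus minimizing phenotypic variation) can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes that minimizing phenotypic variation is approximated by sorting animals on phenotype and grouping neighbours, and that a pool's phenotype and genotype are the means of its members (allele dosages coded 0/1/2). All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def form_pools(phenotypes: Sequence[float], pool_size: int,
               strategy: str = "min_var", seed: int = 0) -> List[List[int]]:
    """Partition animal indices 0..n-1 into pools of `pool_size` animals.

    "random":  shuffle the indices, then cut them into consecutive chunks.
    "min_var": sort the indices by phenotype, then cut them into consecutive
               chunks so that animals with similar phenotypes share a pool
               (an assumed approximation of "minimizing phenotypic variation").
    A leftover group smaller than `pool_size` becomes a final, smaller pool.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be >= 1")
    idx = list(range(len(phenotypes)))
    if strategy == "random":
        random.Random(seed).shuffle(idx)
    elif strategy == "min_var":
        idx.sort(key=lambda i: phenotypes[i])
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [idx[k:k + pool_size] for k in range(0, len(idx), pool_size)]


def pooled_record(pool: Sequence[int], phenotypes: Sequence[float],
                  dosages: Sequence[Sequence[int]]) -> Tuple[float, List[float]]:
    """Pool phenotype = mean phenotype; pool genotype = mean allele dosage per SNP."""
    n = len(pool)
    y = sum(phenotypes[i] for i in pool) / n
    n_snp = len(dosages[pool[0]])
    g = [sum(dosages[i][j] for i in pool) / n for j in range(n_snp)]
    return y, g


def within_pool_ss(pools: Sequence[Sequence[int]], phenotypes: Sequence[float]) -> float:
    """Total within-pool sum of squared phenotypic deviations from the pool mean."""
    total = 0.0
    for pool in pools:
        m = sum(phenotypes[i] for i in pool) / len(pool)
        total += sum((phenotypes[i] - m) ** 2 for i in pool)
    return total
```

For example, with phenotypes `[5.0, 1.0, 4.0, 2.0, 3.0, 6.0]` and pools of 2, the sorted strategy groups animals 1 and 3, 4 and 2, and 0 and 5, giving a smaller `within_pool_ss` than a random split of the same animals.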
Affiliation(s)
- Johnna L Baller
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Stephen D Kachman
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Larry A Kuehn
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, Nebraska, USA
- Matthew L Spangler
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
2
Baller JL, Kachman SD, Kuehn LA, Spangler ML. Genomic prediction using pooled data in a single-step genomic best linear unbiased prediction framework. J Anim Sci 2020; 98:5851497. [PMID: 32497209 DOI: 10.1093/jas/skaa184] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/02/2020] [Accepted: 06/01/2020] [Indexed: 01/16/2023]
Abstract
Economically relevant traits are routinely collected within the commercial segments of the beef industry but are rarely included in genetic evaluations because of unknown pedigrees. Individual relationships could be recovered with genomics, but this would be costly; therefore, pooling DNA and phenotypic data provides a cost-effective solution. Pedigree, phenotypic, and genomic data were simulated for a beef cattle population consisting of 15 generations. Genotypes mimicked a 50k marker panel (841 quantitative trait loci were located across the genome, approximately one per 3 Mb), and the phenotype was moderately heritable. Individuals from generation 15 were included in pools (the observed genotype and phenotype of each pool were the mean values of the group). Estimated breeding values (EBV) were generated from a single-step genomic best linear unbiased prediction model. The effects of pooling strategy (random, minimizing phenotypic variation within pools, or uniformly maximizing phenotypic variation within pools), pool size (1, 2, 10, 20, 50, 100, or no data from generation 15), and generational gaps in genotyping on EBV accuracy (correlation of EBV with true breeding values) were quantified. The greatest EBV accuracies of sires and dams were observed when there was no gap between genotyped parents and pooled offspring. The EBV accuracies resulting from pools were usually greater than those obtained with no data from generation 15, regardless of sire or dam genotyping. Minimizing phenotypic variation increased EBV accuracy by 8% and 9% over random pooling and uniformly maximizing phenotypic variation, respectively. A pool size of 2 was the only scenario that did not significantly decrease EBV accuracy compared with individual data when pools were formed randomly or by uniformly maximizing phenotypic variation (P > 0.05). When pools were constructed to minimize phenotypic variation, pool sizes of 2, 10, 20, or 50 generally did not lead to statistically different EBV accuracies compared with individual data (P > 0.05).
The largest numerical increases in EBV accuracy from pooling, compared with no data from generation 15, were seen for sires with low prior EBV accuracy (those born in generation 14). When phenotypic variation was minimized, pooling of any size led to larger EBV accuracies of the pools than individual data. The resulting EBV for the pools could be used to inform management decisions for those pools. Pooled genotyping to garner commercial-level phenotypes for genetic evaluations seems plausible, although differences exist depending on pool size and pool formation strategy.
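EBV accuracy in this study is defined as the correlation of EBV with true breeding values, reported separately for groups such as sires and dams born in a given generation. A minimal sketch of that calculation follows, assuming per-animal EBV, TBV and a group label are already available; the helper names and the labels used in the example are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def accuracy_by_group(ebv: Sequence[float], tbv: Sequence[float],
                      groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """EBV accuracy, corr(EBV, TBV), computed separately within each group label."""
    buckets: Dict[Hashable, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for e, t, g in zip(ebv, tbv, groups):
        buckets[g][0].append(e)
        buckets[g][1].append(t)
    return {g: pearson(e, t) for g, (e, t) in buckets.items()}
```

Computing accuracy within groups rather than over all animals matters here because the abstract's comparisons (e.g. sires born in generation 14) are group-specific.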
Affiliation(s)
- Johnna L Baller
- Department of Animal Science, University of Nebraska, Lincoln, NE
- Larry A Kuehn
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE
3
Maltecca C, Tiezzi F, Cole JB, Baes C. Symposium review: Exploiting homozygosity in the era of genomics-Selection, inbreeding, and mating programs. J Dairy Sci 2020; 103:5302-5313. [PMID: 32331889 DOI: 10.3168/jds.2019-17846] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/04/2019] [Accepted: 02/25/2020] [Indexed: 01/06/2023]
Abstract
The advent of genomic selection paved the way for an unprecedented acceleration in genetic progress. The increased ability to select superior individuals has been coupled with a drastic reduction in the generation interval for most dairy populations, representing both an opportunity and a challenge. Homozygosity is now rapidly accumulating in dairy populations. Currently, inbreeding depression is managed mostly by culling at the farm level and by controlling the overall accumulation of homozygosity at the population level. A better understanding of how homozygosity and recessive load are related will guarantee continued genetic improvement while curtailing the accumulation of harmful recessives and maintaining enough genetic variability to ensure the possibility of selection in the face of changing environmental conditions. In this review, we present a snapshot of the current dairy selection structure as it relates to response to selection and accumulation of homozygosity, briefly outline the main approaches currently used to manage inbreeding and overall variability, and present some approaches that can be used in the short term to control accumulation of harmful recessives while maintaining sustained selection pressure.
Affiliation(s)
- C Maltecca
- Animal Science Department, North Carolina State University, Raleigh 27695
- F Tiezzi
- Animal Science Department, North Carolina State University, Raleigh 27695
- J B Cole
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705
- C Baes
- Centre for Genomic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, N1G 2W1 Guelph, Ontario, Canada; Institute of Genetics, Vetsuisse Faculty, University of Bern, 3012 Bern, Switzerland
4
Sollero BP, Howard JT, Spangler ML. The impact of reducing the frequency of animals genotyped at higher density on imputation and prediction accuracies using ssGBLUP. J Anim Sci 2019; 97:2780-2792. [PMID: 31115442 DOI: 10.1093/jas/skz147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2019] [Accepted: 04/25/2019] [Indexed: 11/12/2022]
Abstract
The largest gains in accuracy in a genomic selection program come from genotyping young selection candidates who have not yet produced progeny and who may or may not have a phenotypic record. To reduce genotyping costs and to make more genomic data available in a population, young selection candidates may be genotyped with low-density (LD) panels and imputed to a higher density. However, to ensure that reasonable imputation accuracy persists over time, some parent animals originally genotyped at LD must be re-genotyped at a higher density. This study investigated the long-term impact of selectively re-genotyping parents with a medium-density (MD) SNP panel on the accuracy of imputation and of genetic predictions using ssGBLUP in a simulated beef cattle population. Assuming a moderately heritable trait (0.25) and a population undergoing selection, the simulation generated sequence data for a founder population (100 male and 500 female individuals) and 9,000 neutral markers, which were treated as the MD panel. All selection candidates from generations 8 to 15 were genotyped with LD panels corresponding to a density of 0.5% (LD_0.5), 2% (LD_2), and 5% (LD_5) of the MD panel. Re-genotyping scenarios chose parents at random or based on EBV and ranged from re-genotyping 10% of male parents to re-genotyping all male and female parents with the MD panel. Average imputation accuracy at generation 15 ranged from 0.567 to 0.936, 0.795 to 0.985, and 0.931 to 0.995 for LD_0.5, LD_2, and LD_5, respectively, and average EBV accuracy ranged from 0.453 to 0.735, 0.631 to 0.784, and 0.748 to 0.807 for LD_0.5, LD_2, and LD_5, respectively. Re-genotyping parents based on their EBV resulted in higher imputation and EBV accuracies than selecting parents at random, and these values increased with the density of the LD panel.
Differences between re-genotyping scenarios decreased when the density of the LD panel increased, suggesting fewer animals needed to be re-genotyped to achieve higher accuracies. In general, imputation and EBV accuracies were greater when more parents were re-genotyped, independent of the proportion of males and females. In practice, the relationship between the density of the LD panel used and the target panel must be considered to determine the number (proportion) of animals that would need to be re-genotyped to enable sufficient imputation accuracy.
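Because the LD panels are defined as fixed fractions of the 9,000-marker MD panel, their sizes follow from simple arithmetic: 0.5%, 2% and 5% correspond to 45, 180 and 450 markers. The sketch below computes those sizes, spaces LD markers evenly along the MD panel and scores imputation with genotype concordance. Even spacing and the concordance metric are assumptions for illustration only; the study may have selected markers and measured imputation accuracy differently (the correlation between true and imputed genotypes is another common choice). Function names are hypothetical.

```python
from typing import Dict, List, Sequence

MD_PANEL_SIZE = 9000  # neutral markers treated as the MD panel in the abstract


def ld_panel_sizes(md_size: int, fractions: Sequence[float]) -> Dict[float, int]:
    """Number of markers in each LD panel, given as a fraction of the MD panel."""
    return {f: int(round(md_size * f)) for f in fractions}


def ld_panel_indices(md_size: int, n_ld: int) -> List[int]:
    """Pick n_ld MD marker indices spread evenly along the panel (assumed scheme)."""
    if not 0 < n_ld <= md_size:
        raise ValueError("need 0 < n_ld <= md_size")
    return [i * md_size // n_ld for i in range(n_ld)]


def concordance(true_geno: Sequence[int], imputed_geno: Sequence[int]) -> float:
    """Proportion of imputed genotypes (0/1/2) that match the true genotypes."""
    if len(true_geno) != len(imputed_geno) or not true_geno:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(t == i for t, i in zip(true_geno, imputed_geno))
    return hits / len(true_geno)
```

`ld_panel_sizes(MD_PANEL_SIZE, (0.005, 0.02, 0.05))` returns the three LD panel sizes used for LD_0.5, LD_2 and LD_5.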
5
Baller JL, Howard JT, Kachman SD, Spangler ML. The impact of clustering methods for cross-validation, choice of phenotypes, and genotyping strategies on the accuracy of genomic predictions. J Anim Sci 2019; 97:1534-1549. [PMID: 30721970 DOI: 10.1093/jas/skz055] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/23/2018] [Accepted: 02/04/2019] [Indexed: 01/22/2023]
Abstract
For genomic predictors to be of use in genetic evaluation, their predicted accuracy must be a reliable indicator of their utility, and thus unbiased. The objective of this paper was to evaluate the accuracy of prediction of genomic breeding values (GBV) using different clustering strategies and response variables. Red Angus genotypes (n = 9,763) were imputed to a reference 50K panel. The influence of the clustering method [k-means, k-medoids, principal component (PC) analysis on the numerator relationship matrix (A) and the identical-by-state genomic relationship matrix (G) as both data and covariance matrices, and random] and of the response variable [deregressed estimated breeding values (DEBV) and adjusted phenotypes] was evaluated for cross-validation. The GBV were estimated using a Bayes C model for all traits. Traits for DEBV included birth weight (BWT), marbling (MARB), rib-eye area (REA), and yearling weight (YWT). Adjusted phenotypes included BWT, YWT, and ultrasonically measured intramuscular fat percentage and REA. Prediction accuracies were estimated as the genetic correlation between GBV and the associated response variable using a bivariate animal model. A simulation mimicking a cattle population, replicated 5 times, was conducted to quantify differences between true and estimated accuracies. The simulation used the same clustering methods and response variables, with the addition of 2 genotyping strategies (random and top 25% of individuals) and forward validation. The prediction accuracies were estimated similarly, and true accuracies were estimated as the correlation between the residuals of a bivariate model including true breeding value (TBV) and GBV. Using the adjusted Rand index, random clusters were clearly different from relationship-based clustering methods. In both real and simulated data, random clustering consistently led to the largest estimates of accuracy, while no method was consistently associated with more or less bias than other methods.
In simulation, random genotyping led to higher estimated accuracies than selection of the top 25% of individuals. Interestingly, random genotyping seemed to overpredict true accuracy, while selective genotyping tended to underpredict accuracy. When forward-in-time validation was used, DEBV led to less biased estimates of GBV accuracy. Results suggest that the highest, least biased GBV accuracies are associated with random genotyping and DEBV.
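The adjusted Rand index used above to compare clusterings can be computed directly from the contingency table of two label vectors. The following is a minimal stdlib sketch of the standard Hubert and Arabie formula; the label vectors in the usage note are illustrative, standing in for, e.g., k-means and random cluster assignments of the same animals.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two clusterings of the same animals.

    1.0 means identical partitions (up to relabelling); values near 0 mean
    agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))  # contingency-table cells n_ij
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())  # row sums a_i
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())  # column sums b_j
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both clusterings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

For instance, `adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])` gives about 0.24, while relabelled but identical partitions give exactly 1.0.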
Affiliation(s)
- Johnna L Baller
- Department of Animal Science, University of Nebraska, Lincoln, NE
- Jeremy T Howard
- Department of Animal Science, University of Nebraska, Lincoln, NE
6
Howard JT, Rathje TA, Bruns CE, Wilson-Wells DF, Kachman SD, Spangler ML. The impact of selective genotyping on the response to selection using single-step genomic best linear unbiased prediction. J Anim Sci 2019; 96:4532-4542. [PMID: 30107560 DOI: 10.1093/jas/sky330] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/16/2018] [Accepted: 08/09/2018] [Indexed: 11/14/2022]
Abstract
Across the majority of livestock species, routinely collected genomic and pedigree information has been incorporated into evaluations using single-step methods. As a result, strategies that reduce genotyping costs without reducing the response to selection are important, as they could have substantial economic impacts on breeding programs. Therefore, the objective of the current study was to investigate, using simulation, the impact of selectively genotyping selection candidates on the selection response. Populations were simulated to mimic the genome and population structure of a swine and a cattle population undergoing selection on an index composed of the estimated breeding values (EBV) for 2 genetically correlated quantitative traits. Ten generations were simulated, and genotyping began in generation 7. Two phenotyping scenarios were simulated: both assumed the first trait was recorded early in life on all individuals, and the second trait was recorded either on all individuals or on a random subset of them. The EBV were generated from a bivariate animal model. Multiple genotyping scenarios were generated, ranging from genotyping no selection candidates, to genotyping a proportion of the selection candidates chosen either on their index value or at random, to genotyping all selection candidates. An interim index value was used to decide which animals to genotype under the selective genotyping strategy. The interim value assumed that only the first trait was observed and that the only genotypic information available was on animals in previous generations. Within each genotyping scenario, 25 replicates were generated, and the mean response per generation and the degree to which EBV were inflated or deflated were calculated. Across both species and phenotyping strategies, the plateau of diminishing returns was observed when the 60% of selection candidates with the largest index values were genotyped.
When selection candidates were genotyped at random, either 80 or 100% of them needed to be genotyped to avoid a reduction in the index response. Across both populations, no differences in the degree to which EBV were inflated or deflated for either trait 1 or 2 were observed between nongenotyped and genotyped animals. The current study has shown that animals can be selectively genotyped in order to optimize the response to selection as a function of the cost of conducting a breeding program using single-step genomic best linear unbiased prediction.
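The two genotyping rules compared in this study (genotype the candidates with the largest interim index, or a random subset of the same size) reduce to a ranking step. A minimal sketch follows, assuming the interim index has already been computed from trait-1 information only. It also includes the slope of the regression of TBV on EBV, a common way to quantify EBV inflation or deflation; that this is the exact statistic used in the study is an assumption. Function names are hypothetical.

```python
import random
from typing import List, Sequence


def choose_to_genotype(interim_index: Sequence[float], proportion: float,
                       strategy: str = "top", seed: int = 0) -> List[int]:
    """Sorted indices of the selection candidates to genotype.

    "top":    the `proportion` of candidates with the largest interim index.
    "random": a random subset of the same size.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    n = len(interim_index)
    k = int(round(proportion * n))
    if strategy == "top":
        ranked = sorted(range(n), key=lambda i: interim_index[i], reverse=True)
        chosen = ranked[:k]
    elif strategy == "random":
        chosen = random.Random(seed).sample(range(n), k)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return sorted(chosen)


def dispersion_slope(tbv: Sequence[float], ebv: Sequence[float]) -> float:
    """Slope of TBV regressed on EBV: < 1 suggests inflated EBV, > 1 deflated EBV."""
    n = len(ebv)
    me, mt = sum(ebv) / n, sum(tbv) / n
    cov = sum((e - me) * (t - mt) for e, t in zip(ebv, tbv))
    var = sum((e - me) ** 2 for e in ebv)
    return cov / var
```

With `proportion=0.6` and `strategy="top"`, this mirrors the scenario in which the plateau of diminishing returns was reached.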
Affiliation(s)
- Jeremy T Howard
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE
- Stephen D Kachman
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE
- Matthew L Spangler
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE
7
Howard JT, Rathje TA, Bruns CE, Wilson-Wells DF, Kachman SD, Spangler ML. The impact of truncating data on the predictive ability for single-step genomic best linear unbiased prediction. J Anim Breed Genet 2018; 135:251-262. [PMID: 29882604 DOI: 10.1111/jbg.12334] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/12/2018] [Revised: 04/08/2018] [Accepted: 04/25/2018] [Indexed: 11/29/2022]
Abstract
Simulated and swine industry data sets were used to assess the impact of removing older data on the predictive ability of selection candidate estimated breeding values (EBV) when using single-step genomic best linear unbiased prediction (ssGBLUP). Simulated data included thirty replicates designed to mimic the structure of swine data sets. For the simulated data, varying amounts of data were truncated based on the number of ancestral generations back from the selection candidates. The swine data sets consisted of phenotypic and genotypic records for three traits across two breeds on animals born from 2003 to 2017. Phenotypes and genotypes were iteratively removed 1 year at a time based on the year an animal was born. For the swine data sets, correlations between corrected phenotypes (Cp) and EBV were used to evaluate the predictive ability on young animals born in 2016-2017. In the simulated data set, keeping data from two or more generations back resulted in no statistical difference (p-value > 0.05) in the reduction in the true breeding value at generation 15 compared with using all available data. Across swine data sets, removing phenotypes from animals born prior to 2011 resulted in a negligible change or a slight numerical increase in the correlation between Cp and EBV. Truncating data can alleviate computational issues without negatively impacting the predictive ability of selection candidate EBV.
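The truncation scheme removes records one birth year at a time and re-evaluates predictive ability, the correlation between Cp and EBV, on young animals. The sketch below covers only the bookkeeping around that loop, assuming a hypothetical `Record` type; refitting the ssGBLUP model on the truncated data is outside its scope.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Record:
    animal_id: str
    birth_year: int
    phenotype: float


def truncate_by_year(records: Sequence[Record], first_year: int) -> List[Record]:
    """Keep only records from animals born in `first_year` or later."""
    return [r for r in records if r.birth_year >= first_year]


def predictive_ability(cp: Sequence[float], ebv: Sequence[float]) -> float:
    """Pearson correlation between corrected phenotypes (Cp) and EBV."""
    n = len(cp)
    if n != len(ebv) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mc, me = sum(cp) / n, sum(ebv) / n
    sce = sum((c - mc) * (e - me) for c, e in zip(cp, ebv))
    scc = sum((c - mc) ** 2 for c in cp)
    see = sum((e - me) ** 2 for e in ebv)
    return sce / math.sqrt(scc * see)
```

In use, one would loop `first_year` over 2003, 2004, and so on, refit the model on `truncate_by_year(records, first_year)`, and record `predictive_ability` for the animals born in 2016-2017.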
Affiliation(s)
- Jeremy T Howard
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska
- Stephen D Kachman
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, Nebraska
- Matthew L Spangler
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska
8
Howard JT, Tiezzi F, Huang Y, Gray KA, Maltecca C. A heuristic method to identify runs of homozygosity associated with reduced performance in livestock. J Anim Sci 2018; 95:4318-4332. [PMID: 29108032 DOI: 10.2527/jas2017.1664] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 01/19/2023]
Abstract
Although genome-wide metrics are currently used, for the most part, to manage livestock inbreeding, genomic data offer, in principle, the ability to identify functional inbreeding. Here, we present a heuristic method to identify haplotypes contained within a run of homozygosity (ROH) associated with reduced performance. Results are presented for simulated and swine data. The algorithm comprises 3 steps. Step 1 scans the genome based on marker windows of decreasing size and identifies ROH genotypes associated with an unfavorable phenotype. Within this stage, multiple aggregation steps reduce the haplotype to the smallest possible length. In step 2, the resulting regions are formally tested for significance with the use of a linear mixed model. Lastly, step 3 removes nested windows. The effects of the unfavorable haplotypes identified and their associated haplotype probabilities for the progeny of a given mating pair or for an individual can be used to generate an inbreeding load matrix (ILM). The diagonal elements of the ILM characterize the functional individual inbreeding load (IIL). We estimated the accuracy of predicting the phenotype based on IIL. We further compared the significance of the regression coefficient for IIL on phenotypes with that of genome-wide inbreeding metrics. We tested the algorithm using 12 simulated scenarios combining different levels of linkage disequilibrium (LD) and numbers of loci affecting a quantitative trait. Additionally, we investigated 9 traits from 2 maternal purebred swine lines. In simulated data, as the LD in the population increased, the algorithm identified a greater proportion of the true unfavorable ROH effects. For example, the proportion of highly unfavorable true ROH effects identified rose from 32 to 41% from the low- to the high-LD scenario. In both simulated and real data, the haplotypes identified were contained within much larger ROH (9.12-12.1 Mb).
The IIL prediction accuracy was greater than 0 across all scenarios for simulated data (mean of 0.49 [95% confidence interval 0.47-0.52] for the high-LD scenario) and for nearly all swine traits (mean of 0.17 [SD 0.10]). On average, across simulated and swine data sets, the IIL regression coefficient was more closely related to progeny performance than any genome-wide inbreeding metric. A heuristic method was developed that identified ROH genotypes associated with reduced performance and characterized the combined effects of ROH genotypes within and across individuals.
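Two ingredients of this method can be sketched in simplified form: locating runs of homozygosity in a SNP dosage vector, and weighting haplotype effects by the probability that a progeny inherits an unfavorable haplotype from both parents, in the spirit of a diagonal element of the ILM. This is a minimal illustration under stated assumptions, not the authors' three-step algorithm: ROH are defined by a hypothetical minimum run length in SNPs with no heterozygote allowance, F_ROH is approximated by SNP counts rather than base pairs (reasonable only for evenly spaced markers), and each parent is assumed to transmit a haplotype with probability copies / 2. Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def find_roh(genotypes: Sequence[int], min_snps: int) -> List[Tuple[int, int]]:
    """Runs of >= min_snps consecutive homozygous SNPs (dosage 0 or 2).

    Returns half-open (start, end) index pairs.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    # A trailing sentinel heterozygote (dosage 1) closes a run that reaches the end.
    for i, g in enumerate(list(genotypes) + [1]):
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_snps:
                runs.append((start, i))
            start = None
    return runs


def f_roh(genotypes: Sequence[int], min_snps: int) -> float:
    """Fraction of SNPs inside ROH (approximates F_ROH for evenly spaced SNPs)."""
    covered = sum(end - start for start, end in find_roh(genotypes, min_snps))
    return covered / len(genotypes)


def progeny_inbreeding_load(effects: Dict[str, float],
                            sire_copies: Dict[str, int],
                            dam_copies: Dict[str, int]) -> float:
    """Expected load for a progeny: sum of effect * P(homozygous for haplotype).

    Assumes each parent transmits a haplotype with probability copies / 2.
    """
    load = 0.0
    for hap, effect in effects.items():
        p = (sire_copies.get(hap, 0) / 2) * (dam_copies.get(hap, 0) / 2)
        load += effect * p
    return load
```

For a mating between a sire carrying one copy and a dam carrying two copies of a haplotype with effect -2.0, the progeny has a 0.5 probability of being homozygous, so the expected load is -1.0.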