1
|
Meyer K. Reducing computational demands of restricted maximum likelihood estimation with genomic relationship matrices. Genet Sel Evol 2023; 55:7. [PMID: 36698054 PMCID: PMC9875494 DOI: 10.1186/s12711-023-00781-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 01/12/2023] [Indexed: 01/26/2023]
Abstract
Restricted maximum likelihood estimation of genetic parameters accounting for genomic relationships has been reported to impose computational burdens which typically are many times higher than those of corresponding analyses considering pedigree based relationships only. This can be attributed to the dense nature of genomic relationship matrices and their inverses. We outline a reparameterisation of the multivariate linear mixed model to principal components and its effects on the sparsity pattern of the pertaining coefficient matrix in the mixed model equations. Using two data sets we demonstrate that this can dramatically reduce the computing time per iterate of the widely used 'average information' algorithm for restricted maximum likelihood. This is primarily due to the fact that on the principal component scale, the first derivatives of the coefficient matrix with respect to the parameters modelling genetic covariances between traits are independent of the relationship matrix between individuals, i.e. are not afflicted by a multitude of genomic relationships.
Affiliation(s)
- Karin Meyer
  - AGBU, A Joint Venture of NSW Department of Primary Industries and University of New England, Armidale, NSW 2351 Australia
|
2
|
Junqueira VS, Lourenco D, Masuda Y, Cardoso FF, Lopes PS, Silva FFE, Misztal I. Is single-step genomic REML with the algorithm for proven and young more computationally efficient when less generations of data are present? J Anim Sci 2022; 100:skac082. [PMID: 35289906 PMCID: PMC9118993 DOI: 10.1093/jas/skac082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/15/2022] [Accepted: 03/10/2022] [Indexed: 12/04/2022]
Abstract
Efficient computing techniques allow the estimation of variance components for virtually any traditional dataset. When genomic information is available, variance components can be estimated using genomic REML (GREML). If only a portion of the animals have genotypes, single-step GREML (ssGREML) is the method of choice. The genomic relationship matrix (G) used in both cases is dense, limiting computations depending on the number of genotyped animals. The algorithm for proven and young (APY) can be used to create a sparse inverse of G (G_APY⁻¹) with close to linear memory and computing requirements. In ssGREML, the inverse of the realized relationship matrix (H⁻¹) also includes the inverse of the pedigree relationship matrix, which can be dense with a long pedigree, but sparser with a short one. The main purpose of this study was to investigate whether costs of ssGREML can be reduced using APY with truncated pedigree and phenotypes. We also investigated the impact of truncation on variance components estimation when different numbers of core animals are used in APY. Simulations included 150K animals from 10 generations, with selection. Phenotypes (h² = 0.3) were available for all animals in generations 1-9. A total of 30K animals in generations 8 and 9, and 15K validation animals in generation 10 were genotyped for 52,890 SNP. Average information REML and ssGREML with G⁻¹ and G_APY⁻¹ using 1K, 5K, 9K, and 14K core animals were compared. Variance components are impacted when the core group in APY represents the number of eigenvalues explaining a small fraction of the total variation in G. The most time-consuming operation was the inversion of G, with more than 50% of the total time. Next, numerical factorization consumed nearly 30% of the total computing time. On average, a 7% decrease in the computing time for ordering was observed by removing each generation of data.
APY can be successfully applied to create the inverse of the genomic relationship matrix used in ssGREML for estimating variance components. To ensure reliable variance component estimation, it is important to use a core size that corresponds to the number of largest eigenvalues explaining around 98% of total variation in G. When APY is used, pedigrees can be truncated to increase the sparsity of H and slightly reduce computing time for ordering and symbolic factorization, with no impact on the estimates.
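The core-size recommendation in this abstract can be illustrated with a small sketch. It assumes the eigenvalues of G have already been computed; the toy values and the helper name `core_size` are invented for illustration and are not taken from the study or from any software it used. The core size is taken as the smallest number of largest eigenvalues whose sum reaches the target share of the total variation.

```python
def core_size(eigenvalues, fraction=0.98):
    """Smallest number of largest eigenvalues whose sum reaches `fraction` of the total.

    Hypothetical helper: `eigenvalues` are assumed to be the (non-negative)
    eigenvalues of a genomic relationship matrix G.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest first
    target = fraction * sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target:
            return count
    return len(ordered)


# Invented eigenvalues, for illustration only.
toy_eigenvalues = [50, 30, 10, 6, 3, 1]
n_core = core_size(toy_eigenvalues)  # number of eigenvalues covering 98% of the total
```

With real data the eigenvalues would come from an eigendecomposition of G, which is itself costly for many genotyped animals; this sketch only covers the counting step.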
Affiliation(s)
- Vinícius Silva Junqueira
  - Breeding Research Department, Bayer Crop Science, Uberlândia, Minas Gerais, Brazil
  - Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Daniela Lourenco
  - Department of Dairy and Animal Science, University of Georgia, Athens, GA 30602, USA
- Yutaka Masuda
  - Department of Dairy and Animal Science, University of Georgia, Athens, GA 30602, USA
- Fernando Flores Cardoso
  - Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA) Pecuária Sul, Bagé, Rio Grande do Sul, Brazil
- Paulo Sávio Lopes
  - Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Ignacy Misztal
  - Department of Dairy and Animal Science, University of Georgia, Athens, GA 30602, USA
|
3
|
Jang S, Lourenco D, Miller S. Inclusion of Sire by Herd interaction effect in the genomic evaluation for weaning weight of American Angus. J Anim Sci 2022; 100:6537149. [PMID: 35213718 PMCID: PMC9030219 DOI: 10.1093/jas/skac057] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/07/2021] [Accepted: 02/23/2022] [Indexed: 11/12/2022]
Abstract
A spurious negative genetic correlation between direct and maternal effects of weaning weight (WW) in beef cattle has historically been problematic for researchers and industry. Previous research has suggested the covariance between sires and herds may be contributing to this relationship. The objective of this study was to estimate the variance components (VC) for WW in American Angus with and without sire by herd (S×H) interaction effect when genomic information is used or not. Five subsets of ~100k animals for each subset were used. When genomic information was included, genotypes were added for 15,637 animals. Five replicates were performed. Four different models were tested, namely, M1: without S×H interaction effect and with covariance between direct and maternal effect (σam) ≠ 0; M2: with S×H interaction effect and σam ≠ 0; M3: without S×H interaction effect and with σam = 0; M4: with S×H interaction effect and σam = 0. VC were estimated using the restricted maximum likelihood (REML) and single-step genomic REML (ssGREML) with the average information algorithm. Breeding values were computed using single-step genomic BLUP for the models above and one additional model, which had the covariance zeroed after the estimation of VC (M5). The ability of each model to predict future breeding values was investigated with the linear regression method. Under REML, when the S×H interaction effect was added to the model, both direct and maternal genetic variances were greatly reduced, and the negative covariance became positive (i.e., when moving from M1 to M2). Similar patterns were observed under ssGREML, but with less reduction in the direct and maternal genetic variances and still a negative covariance. Models with the S×H interaction effect (M2 and M4) had a better fit according to the Akaike information criteria. Breeding values from those models were more accurate and had less bias than the other three models. 
The rankings and breeding values of artificial insemination sires (N = 1,977) greatly changed when the S×H interaction effect was fit in the model. Although the S×H interaction effect accounted for 3% to 5% of the total phenotypic variance and improved the model fit, this change in the evaluation model will cause severe reranking among animals.
A spurious negative genetic correlation between direct and maternal effects of weaning weight (WW) in beef cattle has been problematic for researchers and industry. Previous research suggested the covariance between sires and herds may contribute to this relationship. The objective of this study was to estimate the variance components (VC) for WW in American Angus with and without sire by herd (S×H) interaction effect when genomic information is used or not. Four models were designed to investigate the S×H effect. The restricted maximum likelihood (REML) and single-step genomic REML (ssGREML) were used to estimate VC. Breeding values were computed using single-step genomic BLUP and the validation was done through the linear regression method. Under REML, when the S×H was added to the model, both direct and maternal genetic variances were greatly reduced, and the negative covariance became positive. Similar patterns were observed under ssGREML, but with less reduction in the direct and maternal genetic variances and still a negative covariance. Breeding values from models with S×H were more accurate and had less bias than the other models. Although the S×H improved the model, this change in the evaluation model will cause severe reranking among key animals.
Affiliation(s)
- Sungbong Jang
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
|
4
|
Tsuruta S, Lourenco D, Masuda Y, Lawlor T, Misztal I. Reducing computational cost of large-scale genomic evaluation by using indirect genomic prediction. JDS Communications 2021; 2:356-360. [PMID: 36337117 PMCID: PMC9623783 DOI: 10.3168/jdsc.2021-0097] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/22/2021] [Accepted: 06/27/2021] [Indexed: 06/10/2023]
Abstract
Over half a million Holsteins are being genotyped annually in the United States. The computational cost of including all genotypes in single-step genomic (ssG)BLUP is high, although it is feasible to conduct large-scale genomic prediction using an efficient algorithm such as APY (algorithm for proven and young). An effective method to further reduce the computing cost could be the use of indirect genomic predictions (IGP) for genotyped animals when they have neither progeny nor phenotypes. These young genotyped animals have no effect on the other genotyped animals and could have their genomic prediction done indirectly. The main objective of this study was to calculate IGP for various groups of genotyped animals and investigate the reduction in computing time as well as bias and accuracy of the IGP. We compared IGP with genomic (G)EBV for 18 linear type traits in US Holsteins, including 2.3 million (M) genotyped animals. The full data set consisted of 10.9M records for 18 linear type traits up to 2018 calving, 13.6M animals in the pedigree, and 2.3M animals genotyped for 79K SNP. For IGP, ssGBLUP included all genotyped animals except those with neither progeny nor phenotypes by year from 2014 to 2018 (i.e., the target animals). The SNP marker effects were computed based on GEBV for genotyped animals that had progeny, or phenotypes, or both. Further, IGP were calculated for target genotyped animals in each year group. For all genotyped animal groups from 2014 to 2018, the coefficients of determination (R²) of a linear regression of GEBV on IGP were 0.960 for males and 0.954 for females for 18 traits on average. To reduce computing costs, the SNP marker effects were calculated based on GEBV from randomly selected genotyped animals from 15K to 60K. By randomly selecting a small number of genotyped animals, the computing time was dramatically reduced.
As more genotyped animals were randomly selected to calculate SNP effects, R² was higher (more accurate) and the regression coefficient was lower (more inflated IGP). In a practical genomic evaluation in US Holsteins, to get sufficient contributions from GEBV, 25K to 35K is a rational number of genotyped animals that can be randomly selected to compute SNP effects and obtain accurate and unbiased IGP. Considering the computing time and both unbiasedness and accuracy of IGP, genomic evaluation can be conducted separately in GEBV for genotyped animals with phenotypes or progeny and in IGP for young genotyped animals. This can be a practical solution when conducting a large-scale genomic evaluation and would enable more frequent evaluation at lower cost, especially when many genotyped animals have neither phenotypes nor progeny.
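As a rough illustration of the indirect prediction step described in this abstract, the sketch below computes an IGP for one young animal as the sum of its genotype codes multiplied by previously estimated SNP effects. Centring the genotypes by twice the allele frequency is an assumption made here, as are the function name and the toy numbers; the abstract does not specify these details, and an operational evaluation may add further terms.

```python
def indirect_genomic_prediction(genotypes, allele_freqs, snp_effects):
    """Sum of centred genotype codes times SNP effects for one animal.

    Assumptions (not from the abstract): genotypes are coded 0/1/2 and
    centred by 2 * allele frequency before being multiplied by the effects.
    """
    if not (len(genotypes) == len(allele_freqs) == len(snp_effects)):
        raise ValueError("all inputs must have one entry per SNP")
    return sum(
        (g - 2.0 * p) * a
        for g, p, a in zip(genotypes, allele_freqs, snp_effects)
    )


# Toy values for three SNP, for illustration only.
igp = indirect_genomic_prediction(
    genotypes=[0, 1, 2],
    allele_freqs=[0.5, 0.5, 0.5],
    snp_effects=[1.0, -2.0, 0.5],
)
```

Because the SNP effects are estimated once from animals with phenotypes or progeny, this step costs only one pass over the markers per young animal, which is where the saving in computing time would come from.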
Affiliation(s)
- S. Tsuruta
  - Animal and Dairy Science Department, University of Georgia, Athens 30602
- D.A.L. Lourenco
  - Animal and Dairy Science Department, University of Georgia, Athens 30602
- Y. Masuda
  - Animal and Dairy Science Department, University of Georgia, Athens 30602
- T.J. Lawlor
  - Holstein Association USA Inc., Brattleboro, VT 05301
- I. Misztal
  - Animal and Dairy Science Department, University of Georgia, Athens 30602
|
5
|
Misztal I, Aguilar I, Lourenco D, Ma L, Steibel JP, Toro M. Emerging issues in genomic selection. J Anim Sci 2021; 99:skab092. [PMID: 33773494 PMCID: PMC8186541 DOI: 10.1093/jas/skab092] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/23/2021] [Accepted: 03/26/2021] [Indexed: 12/22/2022]
Abstract
Genomic selection (GS) is now practiced successfully across many species. However, many questions remain, such as long-term effects, estimations of genomic parameters, robustness of genome-wide association study (GWAS) with small and large datasets, and stability of genomic predictions. This study summarizes presentations from the authors at the 2020 American Society of Animal Science (ASAS) symposium. The focus of many studies until now is on linkage disequilibrium between two loci. Ignoring higher-level equilibrium may lead to phantom dominance and epistasis. The Bulmer effect leads to a reduction of the additive variance; however, the selection for increased recombination rate can release anew genetic variance. With genomic information, estimates of genetic parameters may be biased by genomic preselection, but costs of estimation can increase drastically due to the dense form of the genomic information. To make the computation of estimates feasible, genotypes could be retained only for the most important animals, and methods of estimation should use algorithms that can recognize dense blocks in sparse matrices. GWASs using small genomic datasets frequently find many marker-trait associations, whereas studies using much bigger datasets find only a few. Most of the current tools use very simple models for GWAS, possibly causing artifacts. These models are adequate for large datasets where pseudo-phenotypes such as deregressed proofs indirectly account for important effects for traits of interest. Artifacts arising in GWAS with small datasets can be minimized by using data from all animals (whether genotyped or not), realistic models, and methods that account for population structure. Recent developments permit the computation of P-values from genomic best linear unbiased prediction (GBLUP), where models can be arbitrarily complex but restricted to genotyped animals only, and single-step GBLUP that also uses phenotypes from ungenotyped animals. 
Stability was an important part of nongenomic evaluations, where genetic predictions were stable in the absence of new data even with low prediction accuracies. Unfortunately, genomic evaluations for such animals change because all animals with genotypes are connected. A top-ranked animal can easily drop in the next evaluation, causing a crisis of confidence in genomic evaluations. While correlations between consecutive genomic evaluations are high, outliers can have differences as high as 1 SD. A solution to fluctuating genomic evaluations is to base selection decisions on groups of animals. Although many issues in GS have been solved, many new issues that require additional research continue to surface.
Affiliation(s)
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacio Aguilar
  - Instituto Nacional de Investigación Agropecuaria (INIA), 90200 Canelones, Uruguay
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Li Ma
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- Juan Pedro Steibel
  - Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Miguel Toro
  - Departamento de Producción Agraria, Universidad Politécnica de Madrid, Madrid, Spain
|
6
|
Cesarani A, Hidalgo J, Garcia A, Degano L, Vicario D, Masuda Y, Misztal I, Lourenco D. Beef trait genetic parameters based on old and recent data and its implications for genomic predictions in Italian Simmental cattle. J Anim Sci 2020; 98:5879002. [PMID: 32730571 DOI: 10.1093/jas/skaa242] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/26/2020] [Accepted: 07/21/2020] [Indexed: 01/24/2023]
Abstract
This study aimed to evaluate the changes in variance components over time to identify a subset of data from the Italian Simmental (IS) population that would yield the most appropriate estimates of genetic parameters and breeding values for beef traits to select young bulls. Data from bulls raised between 1986 and 2017 were used to estimate genetic parameters and breeding values for four beef traits (average daily gain [ADG], body size [BS], muscularity [MUS], and feet and legs [FL]). The phenotypic mean increased during the years of the study for ADG, but it decreased for BS, MUS, and FL. The complete dataset (ALL) was divided into four generational subsets (Gen1, Gen2, Gen3, and Gen4). Additionally, ALL was divided into two larger subsets: the first one (OLD) combined data from Gen1 and Gen2 to represent the starting population, and the second one (CUR) combined data from Gen3 and Gen4 to represent a subpopulation with stronger ties to the current population. Genetic parameters were estimated with a four-trait genomic animal model using a single-step genomic average information restricted maximum likelihood algorithm. Heritability estimates from ALL were 0.26 ± 0.03 for ADG, 0.33 ± 0.04 for BS, 0.55 ± 0.03 for MUS, and 0.23 ± 0.03 for FL. Higher heritability estimates were obtained with OLD and ALL than with CUR. Considerable changes in heritability existed between Gen1 and Gen4 due to fluctuations in both additive genetic and residual variances. Genetic correlations also changed over time, with some values moving from positive to negative or even to zero. Genetic correlations from OLD were stronger than those from CUR. Changes in genetic parameters over time indicated that they should be updated regularly to avoid biases in genomic estimated breeding values (GEBV) and low selection accuracies. GEBV estimated using CUR variance components were less biased and more consistent than those estimated with OLD and ALL variance components. 
Validation results indicated that data from recent generations produced genetic parameters that more appropriately represent the structure of the current population, yielding accurate GEBV to select young animals and increasing the likelihood of higher genetic gains.
Affiliation(s)
- Alberto Cesarani
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
  - Associazione Nazionale Allevatori Bovini di Razza Pezzata Rossa Italiana, Udine, Italy
- Jorge Hidalgo
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Andre Garcia
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Lorenzo Degano
  - Associazione Nazionale Allevatori Bovini di Razza Pezzata Rossa Italiana, Udine, Italy
- Daniele Vicario
  - Associazione Nazionale Allevatori Bovini di Razza Pezzata Rossa Italiana, Udine, Italy
- Yutaka Masuda
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
|
7
|
Selle ML, Steinsland I, Powell O, Hickey JM, Gorjanc G. Spatial modelling improves genetic evaluation in smallholder breeding programs. Genet Sel Evol 2020; 52:69. [PMID: 33198636 PMCID: PMC7670695 DOI: 10.1186/s12711-020-00588-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/01/2020] [Accepted: 11/03/2020] [Indexed: 01/13/2023]
Abstract
BACKGROUND Breeders and geneticists use statistical models to separate genetic and environmental effects on phenotype. A common way to separate these effects is to model a descriptor of an environment, a contemporary group or herd, and account for genetic relationship between animals across environments. However, separating the genetic and environmental effects in smallholder systems is challenging due to small herd sizes and weak genetic connectedness across herds. We hypothesised that accounting for spatial relationships between nearby herds can improve genetic evaluation in smallholder systems. Furthermore, geographically referenced environmental covariates are increasingly available and could model underlying sources of spatial relationships. The objective of this study was therefore, to evaluate the potential of spatial modelling to improve genetic evaluation in dairy cattle smallholder systems. METHODS We performed simulations and real dairy cattle data analysis to test our hypothesis. We modelled environmental variation by estimating herd and spatial effects. Herd effects were considered independent, whereas spatial effects had distance-based covariance between herds. We compared these models using pedigree or genomic data. RESULTS The results show that in smallholder systems (i) standard models do not separate genetic and environmental effects accurately, (ii) spatial modelling increases the accuracy of genetic evaluation for phenotyped and non-phenotyped animals, (iii) environmental covariates do not substantially improve the accuracy of genetic evaluation beyond simple distance-based relationships between herds, (iv) the benefit of spatial modelling was largest when separating the genetic and environmental effects was challenging, and (v) spatial modelling was beneficial when using either pedigree or genomic data. CONCLUSIONS We have demonstrated the potential of spatial modelling to improve genetic evaluation in smallholder systems. 
This improvement is driven by establishing environmental connectedness between herds, which enhances separation of genetic and environmental effects. We suggest routine spatial modelling in genetic evaluations, particularly for smallholder systems. Spatial modelling could also have a major impact in studies of human and wild populations.
Affiliation(s)
- Maria L Selle
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Ingelin Steinsland
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Owen Powell
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- John M Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
|
8
|
Stolpovsky YA, Svishcheva GR, Piskunov AK. Genomic Selection. II. Latest Trends and Future Trajectories. RUSS J GENET+ 2020. [DOI: 10.1134/s1022795420100129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
|
9
|
Misztal I, Lourenco D, Legarra A. Current status of genomic evaluation. J Anim Sci 2020; 98:skaa101. [PMID: 32267923 PMCID: PMC7183352 DOI: 10.1093/jas/skaa101] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 01/29/2020] [Accepted: 04/07/2020] [Indexed: 12/14/2022]
Abstract
Early application of genomic selection relied on SNP estimation with phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient for an accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were composed of an index with parent average, direct genomic value, and deduction of a parental index to eliminate double counting. Use of SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets. Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in accuracy. After the implementation of genomic selection, EBV by BLUP became biased because of genomic preselection and DRP computed based on EBV required adjustments, and the creation of DRP for females is hard and subject to double counting. Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method based on combining genomic and pedigree relationships automatically creates an index with all sources of information, can use any combination of male and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that pedigree and genomic relationships are compatible. Because the inversion of the genomic relationship matrix (G) becomes costly with more than 100k genotyped animals, large data computations in ssGBLUP were solved by exploiting limited dimensionality of genomic data due to limited effective population size. With such dimensionality ranging from 4k in chickens to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at a linear cost. Due to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries. Single step can be used to derive SNP effects for indirect prediction and for genome-wide association studies, including computations of the P-values. 
Alternative single-step formulations exist that use SNP effects for genotyped or for all animals. Although genomics is the new standard in breeding and genetics, there are still some problems that need to be solved. This involves new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic data used in selection, and strategies to address reduction in genetic variances after genomic selection was implemented.
Affiliation(s)
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Andres Legarra
  - Department of Animal Genetics, Institut National de la Recherche Agronomique, Castanet-Tolosan, France
|
10
|
Aguilar I, Fernandez EN, Blasco A, Ravagnolo O, Legarra A. Effects of ignoring inbreeding in model-based accuracy for BLUP and SSGBLUP. J Anim Breed Genet 2020; 137:356-364. [PMID: 32080913 DOI: 10.1111/jbg.12470] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 08/22/2019] [Revised: 12/10/2019] [Accepted: 01/11/2020] [Indexed: 11/29/2022]
Abstract
Model-based accuracy, defined as the theoretical correlation between true and estimated breeding value, can be obtained for each individual as a function of its prediction error variance (PEV) and inbreeding coefficient F, in BLUP, GBLUP and SSGBLUP genetic evaluations. However, for computational convenience, inbreeding is often ignored in two places. First, in the computation of reliability = 1 - PEV/(1 + F). Second, in the set-up, using Henderson's rules, of the inverse of the pedigree-based relationship matrix A. Both approximations affect the computation of model-based accuracy and result in wrong values. In this work, first we present a reminder of the theory and extend it to SSGBLUP. Second, we quantify the error of ignoring inbreeding with real data in three scenarios: BLUP evaluation and SSGBLUP in Uruguayan dairy cattle, and BLUP evaluations in a rabbit line closed for >40 generations with a steady increase of inbreeding up to an average of 0.30. We show that ignoring inbreeding in the set-up of the A-inverse is equivalent to assuming that non-inbred animals are actually inbred. This results in an increase of apparent PEV that is negligible for dairy cattle but considerable for rabbits. Ignoring inbreeding in reliability = 1 - PEV/(1 + F) leads to underestimation of reliability for BLUP evaluations, and this underestimation is very large for rabbits. For SSGBLUP in dairy cattle, it leads to both underestimation and overestimation of reliability, both for genotyped and non-genotyped animals. We strongly recommend including inbreeding both in the set-up of A-inverse and in the computation of reliability from PEVs.
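The reliability formula quoted in this abstract can be worked through numerically. The sketch below assumes PEV is expressed in units of the additive genetic variance, so that reliability = 1 - PEV/(1 + F); the function names and the example values (PEV = 0.5, F = 0.30) are chosen here for illustration and are not results from the study.

```python
import math


def reliability(pev, inbreeding=0.0):
    """Reliability = 1 - PEV / (1 + F).

    Assumes PEV is scaled by the additive genetic variance; setting
    `inbreeding` to 0 reproduces the approximation that ignores F.
    """
    if pev < 0 or inbreeding < 0:
        raise ValueError("pev and inbreeding must be non-negative")
    return 1.0 - pev / (1.0 + inbreeding)


def accuracy(pev, inbreeding=0.0):
    """Model-based accuracy as the square root of reliability (floored at 0)."""
    return math.sqrt(max(reliability(pev, inbreeding), 0.0))


# Illustrative values only: an animal with scaled PEV 0.5 and F = 0.30.
rel_with_f = reliability(0.5, inbreeding=0.30)
rel_ignoring_f = reliability(0.5)
```

For the same PEV, leaving out F gives the lower value, which matches the direction of the underestimation reported for BLUP evaluations.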
Affiliation(s)
- Ignacio Aguilar
  - Instituto Nacional de Investigación Agropecuaria (INIA), Montevideo, Uruguay
- Eduardo N Fernandez
  - Cátedra de Mejora y Conservación de Recursos Genéticos e Instituto de Investigación sobre Producción Agropecuaria, Ambiente y Salud, Facultad de Ciencias Agrarias, UNLZ, Buenos Aires, Argentina
- Agustin Blasco
  - Institute for Animal Science and Technology, Universitat Politècnica de València, València, Spain
- Olga Ravagnolo
  - Instituto Nacional de Investigación Agropecuaria (INIA), Montevideo, Uruguay
|
11
|
Matilainen K, Mäntysaari EA, Strandén I. Efficient Monte Carlo algorithm for restricted maximum likelihood estimation of genetic parameters. J Anim Breed Genet 2019; 136:252-261. [PMID: 31247679 DOI: 10.1111/jbg.12375] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/10/2018] [Revised: 11/21/2018] [Accepted: 11/26/2018] [Indexed: 11/30/2022]
Abstract
Monte Carlo (MC) methods have been found useful in estimation of variance parameters for large data and complex models with many variance components (VC), with respect to both computer memory and computing time. A disadvantage has been a fluctuation in round-to-round values of estimates that makes the assessment of convergence challenging. Furthermore, with Newton-type algorithms, the approximate Hessian matrix might have sufficient accuracy, but the inaccuracy in the gradient vector exaggerates the round-to-round fluctuation to an intolerable level. In this study, the reuse of the same random numbers within each MC sample was used to remove the MC fluctuation. Simulated data with six VC parameters were analysed by four different MC REML methods: expectation-maximization (EM), Newton-Raphson (NR), average information (AI) and Broyden's method (BM). In addition, field data with 96 VC parameters were analysed by MC EM REML. In all the analyses with reused samples, the MC fluctuations disappeared, but the final estimates by the MC REML methods differed from the analytically calculated values more than expected, especially when the number of MC samples was small. The difference depended on the random numbers generated, and based on repeated MC AI REML analyses, the VC estimates were on average non-biased. The advantage of reusing MC samples is more apparent in the NR-type algorithms. Smooth convergence opens the possibility to use the fast converging Newton-type algorithms. However, a disadvantage from reusing MC samples is a possible "bias" in the estimates. To attain acceptable accuracy, a sufficient number of MC samples needs to be generated.
Affiliation(s)
- Ismo Strandén
  - Natural Resources Institute Finland (Luke), Jokioinen, Finland
|
12
|
Aguilar I, Legarra A, Cardoso F, Masuda Y, Lourenco D, Misztal I. Frequentist p-values for large-scale-single step genome-wide association, with an application to birth weight in American Angus cattle. Genet Sel Evol 2019; 51:28. [PMID: 31221101 PMCID: PMC6584984 DOI: 10.1186/s12711-019-0469-3] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 01/03/2019] [Accepted: 05/27/2019] [Indexed: 11/14/2022] Open
Abstract
Background: Single-step genomic best linear unbiased prediction (SSGBLUP) is a comprehensive method for genomic prediction. Point estimates of marker effects from SSGBLUP are often used for genome-wide association studies (GWAS) without a formal framework of hypothesis testing. Our objective was to implement p-values for single-marker GWAS studies within the single-step GWAS (SSGWAS) framework by deriving computational algorithms and procedures, and by applying these to a large beef cattle population. Methods: P-values were obtained based on the prediction error (co)variances for single nucleotide polymorphisms (SNPs), which were obtained from the prediction error (co)variances of genomic predictions based on the inverse of the coefficient matrix and formulas to estimate SNP effects. Results: Computation of p-values took a negligible time for a dataset with almost 2 million animals in the pedigree and 1424 genotyped sires, and no inflation of statistics was observed. The SNPs that passed the Bonferroni threshold of $10^{-5.9}$ were the same as those that explained the highest proportion of additive genetic variance, but even at the same significance levels and effects, some of them explained less genetic variance due to lower allele frequency. Conclusions: The use of a p-value for SSGWAS is a very general and efficient strategy to identify quantitative trait loci (QTL). It can be used for complex datasets such as those used in animal breeding, where only a proportion of the pedigreed animals are genotyped.
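As a rough illustration of how a p-value can be derived from a SNP effect estimate and its prediction error variance, and then compared with a Bonferroni threshold, consider the sketch below. It is not the authors' implementation. It assumes that the standardised effect is approximately standard normal, and the function names, the SNP count of 40,000, the significance level of 0.05 and the example effect are all hypothetical values chosen for illustration; none of them is taken from the abstract.

```python
import math

def two_sided_p(effect, pev):
    """Two-sided p-value, assuming effect / sqrt(pev) is approximately N(0, 1).

    `pev` is the prediction error variance of the estimated SNP effect.
    """
    z = abs(effect) / math.sqrt(pev)
    # 2 * (1 - Phi(z)) written with the complementary error function.
    return math.erfc(z / math.sqrt(2.0))

def bonferroni_neg_log10(alpha, n_tests):
    """Bonferroni threshold expressed on the -log10(p) scale."""
    return -math.log10(alpha / n_tests)

n_snps = 40_000  # hypothetical number of SNPs tested
threshold = bonferroni_neg_log10(0.05, n_snps)

p = two_sided_p(effect=0.5, pev=0.01)  # hypothetical SNP with z = 5
significant = -math.log10(p) > threshold

print(f"-log10 threshold = {threshold:.2f}, -log10 p = {-math.log10(p):.2f}")
```

Expressing the threshold on the -log10 scale matches the way it is reported in the abstract, where a threshold of $10^{-5.9}$ corresponds to a -log10 value of 5.9.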
Affiliation(s)
- Ignacio Aguilar
  - Instituto Nacional de Investigación Agropecuaria (INIA), 90200, Canelones, Uruguay
- Andres Legarra
  - UMR GenPhySE, INRA Toulouse, BP52626, 31326, Castanet Tolosan, France
- Fernando Cardoso
  - Department of Animal Science, Federal University of Pelotas, Rio Grande do Sul, Brazil
  - Embrapa Pecuária Sul, Bagé, RS, 96400-031, Brazil
- Yutaka Masuda
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
|
13
|
Shor T, Kalka I, Geiger D, Erlich Y, Weissbrod O. Estimating variance components in population scale family trees. PLoS Genet 2019; 15:e1008124. [PMID: 31071088 PMCID: PMC6529016 DOI: 10.1371/journal.pgen.1008124] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/24/2018] [Revised: 05/21/2019] [Accepted: 04/03/2019] [Indexed: 12/14/2022] Open
Abstract
The rapid digitization of genealogical and medical records enables the assembly of extremely large pedigree records spanning millions of individuals and trillions of pairs of relatives. Such pedigrees provide the opportunity to investigate the sociological and epidemiological history of human populations in scales much larger than previously possible. Linear mixed models (LMMs) are routinely used to analyze extremely large animal and plant pedigrees for the purposes of selective breeding. However, LMMs have not been previously applied to analyze population-scale human family trees. Here, we present Sparse Cholesky factorIzation LMM (Sci-LMM), a modeling framework for studying population-scale family trees that combines techniques from the animal and plant breeding literature and from human genetics literature. The proposed framework can construct a matrix of relationships between trillions of pairs of individuals and fit the corresponding LMM in several hours. We demonstrate the capabilities of Sci-LMM via simulation studies and by estimating the heritability of longevity and of reproductive fitness (quantified via number of children) in a large pedigree spanning millions of individuals and over five centuries of human history. Sci-LMM provides a unified framework for investigating the epidemiological history of human populations via genealogical records.
Affiliation(s)
- Tal Shor
  - Computer Science Department, Technion – Israel Institute of Technology, Haifa, Israel
  - MyHeritage Ltd., Or Yehuda, Israel
- Iris Kalka
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Dan Geiger
  - Computer Science Department, Technion – Israel Institute of Technology, Haifa, Israel
- Yaniv Erlich
  - MyHeritage Ltd., Or Yehuda, Israel
  - The New York Genome Center, New York, NY, United States of America
  - Department of Computer Science, Fu School of Engineering, Columbia University, NY, United States of America
- Omer Weissbrod
  - Computer Science Department, Technion – Israel Institute of Technology, Haifa, Israel
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States of America
|
14
|
Cesarani A, Pocrnic I, Macciotta NPP, Fragomeni BO, Misztal I, Lourenco DAL. Bias in heritability estimates from genomic restricted maximum likelihood methods under different genotyping strategies. J Anim Breed Genet 2018; 136:40-50. [DOI: 10.1111/jbg.12367] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/11/2018] [Revised: 10/08/2018] [Accepted: 10/09/2018] [Indexed: 01/11/2023]
Affiliation(s)
- Alberto Cesarani
  - Department of Animal and Dairy Science; University of Georgia; Athens Georgia
  - Department of Agricultural Sciences; University of Sassari; Sassari Italy
- Ivan Pocrnic
  - Department of Animal and Dairy Science; University of Georgia; Athens Georgia
- Breno O. Fragomeni
  - Department of Animal Science; University of Connecticut; Storrs Connecticut
- Ignacy Misztal
  - Department of Animal and Dairy Science; University of Georgia; Athens Georgia
|
15
|
|
16
|
Masuda Y, Misztal I, Tsuruta S, Legarra A, Aguilar I, Lourenco DAL, Fragomeni BO, Lawlor TJ. Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals. J Dairy Sci 2016; 99:1968-1974. [PMID: 26805987 DOI: 10.3168/jds.2015-10540] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 10/19/2015] [Accepted: 12/01/2015] [Indexed: 11/19/2022]
Abstract
The objectives of this study were to develop and evaluate an efficient implementation of the computation of the inverse of the genomic relationship matrix with the recursion algorithm, called the algorithm for proven and young (APY), in single-step genomic BLUP. We validated genomic predictions for young bulls with more than 500,000 genotyped animals in final score for US Holsteins. Phenotypic data included 11,626,576 final scores on 7,093,380 US Holstein cows, and genotypes were available for 569,404 animals. Daughter deviations for young bulls with no classified daughters in 2009, but at least 30 classified daughters in 2014, were computed using all the phenotypic data. Genomic predictions for the same bulls were calculated with single-step genomic BLUP using phenotypes up to 2009. We calculated the inverse of the genomic relationship matrix, $\mathbf{G}_{APY}^{-1}$, based on a direct inversion of the genomic relationship matrix on a small subset of genotyped animals (core animals) and extended that information to noncore animals by recursion. We tested several sets of core animals including 9,406 bulls with at least 1 classified daughter, 9,406 bulls and 1,052 classified dams of bulls, 9,406 bulls and 7,422 classified cows, and random samples of 5,000 to 30,000 animals. Validation reliability was assessed by the coefficient of determination from regression of daughter deviation on genomic predictions for the predicted young bulls. The reliabilities were 0.39 with 5,000 randomly chosen core animals, 0.45 with the 9,406 bulls and 7,422 cows as core animals, and 0.44 with the remaining sets. With phenotypes truncated in 2009 and the preconditioned conjugate gradient to solve mixed model equations, the number of rounds to convergence for core animals defined by bulls was 1,343; defined by bulls and cows, 2,066; and defined by 10,000 random animals, at most 1,629. With complete phenotype data, the number of rounds decreased to 858, 1,299, and at most 1,092, respectively. Setting up $\mathbf{G}_{APY}^{-1}$ for 569,404 genotyped animals with 10,000 core animals took 1.3 h and 57 GB of memory. The validation reliability with APY reaches a plateau when the number of core animals is at least 10,000. Predictions with APY differ little in reliability among definitions of core animals. Single-step genomic BLUP with APY is applicable to millions of genotyped animals.
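The core/noncore construction described in this abstract can be sketched in a few lines of NumPy. This is a small dense illustration of the commonly published APY formula, in which only the core block is inverted directly and each noncore animal contributes a single diagonal element; it is not the authors' implementation and makes no attempt at the memory-efficient layout needed for hundreds of thousands of animals. The function name `apy_inverse`, the simulated matrix `G` and the choice of core animals are assumptions made for the example.

```python
import numpy as np

def apy_inverse(G, core):
    """Dense sketch of the APY inverse of a genomic relationship matrix G.

    `core` holds the indices of core animals; all other animals are noncore.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc = G[np.ix_(core, core)]
    Gcn = G[np.ix_(core, noncore)]

    # P = Gcc^{-1} Gcn: regression of noncore animals on core animals.
    P = np.linalg.solve(Gcc, Gcn)
    # Diagonal conditional variances m_i = g_ii - g_ic Gcc^{-1} g_ci.
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)

    PMinv = P / m  # P M^{-1}, where M = diag(m)
    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = np.linalg.inv(Gcc) + PMinv @ P.T
    Ginv[np.ix_(core, noncore)] = -PMinv
    Ginv[np.ix_(noncore, core)] = -PMinv.T
    Ginv[noncore, noncore] = 1.0 / m  # diagonal block for noncore animals
    return Ginv

# Simulated example: 8 animals, 50 hypothetical markers, 3 core animals.
rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 50))
G = Z @ Z.T / 50 + 0.01 * np.eye(8)
core = [0, 1, 2]
Ginv = apy_inverse(G, core)
```

Inverting `Ginv` gives back a matrix that agrees with `G` in the core block, in the core-by-noncore block and on the noncore diagonal; only the off-diagonal relationships among noncore animals are approximated, which is what keeps the noncore part of the inverse diagonal.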
Affiliation(s)
- Y Masuda
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- I Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- S Tsuruta
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- A Legarra
  - Institut National de la Recherche Agronomique, UMR1388 GenPhySE, 31326 Castanet Tolosan, France
- I Aguilar
  - Instituto Nacional de Investigación Agropecuaria, Canelones, Uruguay 90200
- D A L Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- B O Fragomeni
  - Department of Animal and Dairy Science, University of Georgia, Athens 30602
- T J Lawlor
  - Holstein Association USA Inc., Brattleboro, VT 05301
|