1
Baba T, Morota G, Kawakami J, Gotoh Y, Oka T, Masuda Y, Brito LF, Cockrum RR, Kawahara T. Longitudinal genome-wide association analysis using a single-step random regression model for height in Japanese Holstein cattle. JDS Commun 2023; 4:363-368. [PMID: 37727246 PMCID: PMC10505781 DOI: 10.3168/jdsc.2022-0347] [Received: 10/22/2022] [Accepted: 03/22/2023] [Indexed: 09/21/2023]
Abstract
Growth traits, such as body weight and height, are essential in the design of genetic improvement programs for dairy cattle because of their relationship with feed efficiency, longevity, and health. We investigated genomic regions influencing height across growth stages in Japanese Holstein cattle using a single-step random regression model. We used 72,921 records from birth to 60 mo of age on 4,111 animals born between 2000 and 2016. The analysis included 1,410 genotyped animals with 35,319 single nucleotide polymorphisms, consisting of 883 females with records and 527 bulls, and 30,745 animals with pedigree information. A single genomic region at 58.4 Mbp on chromosome 18 was consistently identified at 6 age points (10, 20, 30, 40, 50, and 60 mo) after multiple-testing correction of the significance threshold. Twelve candidate genes, previously reported for longevity and gestation length, were found near the identified genomic region. Another location near the identified region was also previously associated with body conformation, fertility, and calving difficulty. Functional Gene Ontology enrichment analysis suggested that the candidate genes regulate dephosphorylation and phosphatase activity. Our findings suggest that further study of the identified candidate genes could contribute to a better understanding of the genetic basis of height in Japanese Holstein cattle.
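The abstract does not state which basis functions the random regression model uses. Random regression models for growth are commonly fitted with normalized Legendre polynomials of standardized age, and an age-specific effect is then the covariate vector multiplied by the random regression coefficients. A minimal sketch of that mapping, assuming a second-order basis, a 0 to 60 mo age range, and made-up coefficients (`legendre_covariates` and `coef` are hypothetical, not taken from the study):

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_covariates(ages, order, age_min, age_max):
    """Normalized Legendre covariates for a random regression model.

    Ages are rescaled to t in [-1, 1]; column j holds sqrt((2j + 1) / 2) * P_j(t).
    """
    ages = np.asarray(ages, dtype=float)
    t = 2.0 * (ages - age_min) / (age_max - age_min) - 1.0
    phi = legendre.legvander(t, order)                    # (n_ages, order + 1)
    norm = np.sqrt((2.0 * np.arange(order + 1) + 1.0) / 2.0)
    return phi * norm                                     # scale each column


# Hypothetical setting: second-order curve over 0 to 60 mo of age
ages = [10, 20, 30, 40, 50, 60]
phi = legendre_covariates(ages, order=2, age_min=0, age_max=60)

# Hypothetical random regression coefficients for one SNP
coef = np.array([1.0, 0.5, -0.2])
effect_by_age = phi @ coef        # SNP effect at each of the six ages
print(np.round(effect_by_age, 3))
```

The same covariate matrix maps any set of random regression coefficients, whether for an animal or a SNP, onto the age points at which associations are tested.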
Affiliation(s)
- Toshimi Baba
- Holstein Cattle Association of Japan, Hokkaido Branch, Sapporo, Hokkaido, Japan 001-8555
- School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061
- Gota Morota
- School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061
- Junpei Kawakami
- Holstein Cattle Association of Japan, Hokkaido Branch, Sapporo, Hokkaido, Japan 001-8555
- Yusaku Gotoh
- Holstein Cattle Association of Japan, Hokkaido Branch, Sapporo, Hokkaido, Japan 001-8555
- Taro Oka
- Holstein Cattle Association of Japan, Tokyo, Japan 164-0012
- Yutaka Masuda
- Department of Sustainable Agriculture, Rakuno Gakuen University, Ebetsu, Hokkaido, Japan 069-8501
- Luiz F. Brito
- Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
- Rebbeca R. Cockrum
- School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061
- Takayoshi Kawahara
- Holstein Cattle Association of Japan, Hokkaido Branch, Sapporo, Hokkaido, Japan 001-8555
2
Meyer K. Reducing computational demands of restricted maximum likelihood estimation with genomic relationship matrices. Genet Sel Evol 2023; 55:7. [PMID: 36698054 PMCID: PMC9875494 DOI: 10.1186/s12711-023-00781-7] [Received: 09/28/2022] [Accepted: 01/12/2023] [Indexed: 01/26/2023]
Abstract
Restricted maximum likelihood estimation of genetic parameters accounting for genomic relationships has been reported to impose computational burdens that are typically many times higher than those of corresponding analyses considering pedigree-based relationships only. This can be attributed to the dense nature of genomic relationship matrices and their inverses. We outline a reparameterisation of the multivariate linear mixed model to principal components and its effects on the sparsity pattern of the corresponding coefficient matrix in the mixed model equations. Using two data sets, we demonstrate that this can dramatically reduce the computing time per iterate of the widely used 'average information' algorithm for restricted maximum likelihood. This is primarily because, on the principal component scale, the first derivatives of the coefficient matrix with respect to the parameters modelling genetic covariances between traits are independent of the relationship matrix between individuals, i.e., they are not afflicted by a multitude of genomic relationships.
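The sparsity gain rests on a standard Kronecker identity. Assuming the genetic effects for $q$ traits have covariance $\Sigma_G \otimes \mathbf{G}$ and $\Sigma_G = \mathbf{E}\Lambda\mathbf{E}'$, rotating the effects with $\mathbf{E}'$ gives covariance $\Lambda \otimes \mathbf{G}$, so the blocks linking different traits vanish. The sketch below only checks that identity numerically with made-up matrices; it is an illustration of the principle, not the average information REML algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical genetic covariance matrix between q = 3 traits
sigma_g = np.array([[1.0, 0.6, 0.3],
                    [0.6, 2.0, 0.5],
                    [0.3, 0.5, 1.5]])

# Hypothetical dense genomic relationship matrix for n = 4 animals
z = rng.standard_normal((4, 50))
g = z @ z.T / 50 + 0.01 * np.eye(4)

# Covariance of genetic effects ordered trait by trait: Sigma_G kron G
v = np.kron(sigma_g, g)

# Principal components of Sigma_G: Sigma_G = E diag(lam) E'
lam, e = np.linalg.eigh(sigma_g)

# Transform effects to the PC scale: u_pc = (E' kron I_n) u
t = np.kron(e.T, np.eye(4))
v_pc = t @ v @ t.T

# On the PC scale the covariance is diag(lam) kron G: the n x n blocks
# linking different components are zero (up to rounding).
print(np.allclose(v_pc, np.kron(np.diag(lam), g)))
```

Only the within-component blocks still carry the dense $\mathbf{G}$; the cross-trait structure has moved entirely into the diagonal $\Lambda$.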
Affiliation(s)
- Karin Meyer
- AGBU, A Joint Venture of NSW Department of Primary Industries and University of New England, Armidale, NSW 2351 Australia
3
Xavier A, Habier D. A new approach fits multivariate genomic prediction models efficiently. Genet Sel Evol 2022; 54:45. [PMID: 35715755 PMCID: PMC9204867 DOI: 10.1186/s12711-022-00730-w] [Received: 11/16/2021] [Accepted: 05/13/2022] [Indexed: 12/03/2022]
Abstract
Background Fast, memory-efficient, and reliable algorithms for computing genomic estimated breeding values (GEBV) for multiple traits and environments are needed to make timely decisions in breeding. Multivariate genomic prediction exploits genetic correlations between traits and environments to increase the accuracy of GEBV compared to univariate methods. These genetic correlations are estimated simultaneously with GEBV because they are specific to year, environment, and management. However, estimating genetic parameters is computationally demanding with restricted maximum likelihood (REML) and Bayesian samplers, and canonical transformations or orthogonalizations cannot be used for unbalanced experimental designs. Methods We propose a multivariate randomized Gauss–Seidel algorithm for simultaneous estimation of model effects and genetic parameters. Two previously proposed methods for estimating genetic parameters were combined with a Gauss–Seidel (GS) solver; we call the resulting methods Tilde-Hat-GS (THGS) and Pseudo-Expectation-GS (PEGS). Balanced and unbalanced experimental designs were simulated to compare runtime, bias and accuracy of GEBV, and bias and standard errors of estimates of heritabilities and genetic correlations from THGS, PEGS, and REML. Models with 10 to 400 response variables, 1279 to 42,034 genetic markers, and 5990 to 1.85 million observations were fitted. Results The runtime of PEGS and THGS was a fraction of that of REML. Accuracies of GEBV were slightly lower than those from REML but higher than those from the univariate approach; hence, THGS and PEGS exploited genetic correlations. For 500 to 600 observations per response variable, biases of estimates of genetic parameters from THGS and PEGS were small, but standard errors of estimates of genetic correlations were higher than for REML. Bias and standard errors decreased as sample size increased.
For balanced designs, GEBV and estimates of genetic correlations from THGS were unbiased when only an intercept and eigenvectors of genotype scores were fitted. Conclusions THGS and PEGS are fast and memory-efficient algorithms for multivariate genomic prediction for balanced and unbalanced experimental designs. They are scalable to increasing numbers of environments and genetic markers. Accuracy of GEBV was comparable to that of REML. Estimates of genetic parameters had little bias, but their standard errors were larger than those from REML. More studies are needed to evaluate the proposed methods on datasets subject to selection. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00730-w.
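The Gauss–Seidel part of these methods can be illustrated for a single trait. A minimal sketch, assuming a ridge-regression (SNP-BLUP-type) model with a fixed variance ratio `lam`: each coefficient is solved in turn from its own equation, visiting coefficients in a fresh random order each sweep, while residuals are updated in place. The THGS and PEGS variance-component updates and the multivariate extension are not shown, and `randomized_gauss_seidel` is a hypothetical name.

```python
import numpy as np


def randomized_gauss_seidel(x, y, lam, n_iter=200, seed=0):
    """Solve (X'X + lam*I) b = X'y by Gauss-Seidel with a random update order.

    Residuals e = y - X b are kept up to date so that each coordinate
    update costs O(n) instead of O(n * p).
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape
    b = np.zeros(p)
    e = y.astype(float).copy()            # residuals for b = 0
    xtx = np.einsum("ij,ij->j", x, x)     # diagonal of X'X
    for _ in range(n_iter):
        for j in rng.permutation(p):      # randomized visiting order
            b_old = b[j]
            # Exact solution of equation j given all other coefficients
            b[j] = (x[:, j] @ e + xtx[j] * b_old) / (xtx[j] + lam)
            e -= x[:, j] * (b[j] - b_old)  # keep residuals consistent
    return b


# Hypothetical data: 100 records, 20 marker covariates
rng = np.random.default_rng(42)
x = rng.standard_normal((100, 20))
y = x @ rng.standard_normal(20) + rng.standard_normal(100)

b_gs = randomized_gauss_seidel(x, y, lam=5.0)
b_direct = np.linalg.solve(x.T @ x + 5.0 * np.eye(20), x.T @ y)
print(np.max(np.abs(b_gs - b_direct)))
```

Because only one column of X and the residual vector are touched per update, memory stays proportional to the data rather than to a dense $p \times p$ coefficient matrix.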
Affiliation(s)
- Alencar Xavier
- Biostatistics, Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA; Department of Agronomy, Purdue University, 915 W State St, West Lafayette, IN, 47907, USA.
- David Habier
- Biostatistics, Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA.
4
Junqueira VS, Lourenco D, Masuda Y, Cardoso FF, Lopes PS, Silva FFE, Misztal I. Is single-step genomic REML with the algorithm for proven and young more computationally efficient when less generations of data are present? J Anim Sci 2022; 100:skac082. [PMID: 35289906 PMCID: PMC9118993 DOI: 10.1093/jas/skac082] [Received: 01/15/2022] [Accepted: 03/10/2022] [Indexed: 12/04/2022]
Abstract
Efficient computing techniques allow the estimation of variance components for virtually any traditional dataset. When genomic information is available, variance components can be estimated using genomic REML (GREML). If only a portion of the animals have genotypes, single-step GREML (ssGREML) is the method of choice. The genomic relationship matrix (G) used in both cases is dense, which limits computations as the number of genotyped animals grows. The algorithm for proven and young (APY) can be used to create a sparse inverse of G ($\mathbf{G}^{-1}_{\mathrm{APY}}$) with close to linear memory and computing requirements. In ssGREML, the inverse of the realized relationship matrix ($\mathbf{H}^{-1}$) also includes the inverse of the pedigree relationship matrix, which can be dense with a long pedigree but sparser with a short one. The main purpose of this study was to investigate whether the costs of ssGREML can be reduced by using APY with truncated pedigree and phenotypes. We also investigated the impact of truncation on variance component estimation when different numbers of core animals are used in APY. Simulations included 150K animals from 10 generations, with selection. Phenotypes ($h^2 = 0.3$) were available for all animals in generations 1 to 9. A total of 30K animals in generations 8 and 9, and 15K validation animals in generation 10, were genotyped for 52,890 SNP. Average information REML and ssGREML with $\mathbf{G}^{-1}$ and $\mathbf{G}^{-1}_{\mathrm{APY}}$ using 1K, 5K, 9K, and 14K core animals were compared. Variance components are affected when the core group in APY corresponds to a number of eigenvalues explaining only a small fraction of the total variation in G. The most time-consuming operation was the inversion of G, with more than 50% of the total time. Next, numerical factorization consumed nearly 30% of the total computing time. On average, a 7% decrease in the computing time for ordering was observed by removing each generation of data.
APY can be successfully applied to create the inverse of the genomic relationship matrix used in ssGREML for estimating variance components. To ensure reliable variance component estimation, it is important to use a core size that corresponds to the number of largest eigenvalues explaining around 98% of total variation in G. When APY is used, pedigrees can be truncated to increase the sparsity of H and slightly reduce computing time for ordering and symbolic factorization, with no impact on the estimates.
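The core-size rule above can be computed directly from the eigenvalues of G. A minimal sketch, assuming "variation explained" is measured as the cumulative sum of the largest eigenvalues relative to their total (the trace of G); `core_size_for_variance` is a hypothetical helper and the simulated genotypes are made up.

```python
import numpy as np


def core_size_for_variance(g, fraction=0.98):
    """Smallest number of largest eigenvalues of G whose sum reaches
    `fraction` of the total sum of eigenvalues."""
    eigvals = np.linalg.eigvalsh(g)[::-1]      # descending order
    eigvals = np.clip(eigvals, 0.0, None)      # guard tiny negative values
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First position where the cumulative share reaches the target
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, len(eigvals))


# Hypothetical 0/1/2 genotypes for 300 animals and 1000 SNP
rng = np.random.default_rng(0)
m = rng.binomial(2, 0.5, size=(300, 1000)).astype(float)
p = m.mean(axis=0) / 2
z = m - 2 * p                                   # centred genotypes
g = z @ z.T / (2 * np.sum(p * (1 - p)))         # VanRaden-type G

k98 = core_size_for_variance(g, 0.98)
print(k98)
```

With real data, the eigendecomposition would be done once on a representative set of genotyped animals, since it is itself a cubic-cost operation.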
Affiliation(s)
- Vinícius Silva Junqueira
- Breeding Research Department, Bayer Crop Science, Uberlândia, Minas Gerais, Brazil
- Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Daniela Lourenco
- Department of Dairy and Animal Science, University of Georgia, Athens, GA 30602, USA
- Yutaka Masuda
- Department of Dairy and Animal Science, University of Georgia, Athens, GA 30602, USA
- Fernando Flores Cardoso
- Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA) Pecuária Sul, Bagé, Rio Grande do Sul, Brasil
- Paulo Sávio Lopes
- Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Ignacy Misztal
- Department of Dairy and Animal Science, University of Georgia, Athens, GA 30602, USA
5
Misztal I, Aguilar I, Lourenco D, Ma L, Steibel JP, Toro M. Emerging issues in genomic selection. J Anim Sci 2021; 99:skab092. [PMID: 33773494 PMCID: PMC8186541 DOI: 10.1093/jas/skab092] [Received: 01/23/2021] [Accepted: 03/26/2021] [Indexed: 12/22/2022]
Abstract
Genomic selection (GS) is now practiced successfully across many species. However, many questions remain, such as long-term effects, estimation of genomic parameters, robustness of genome-wide association studies (GWAS) with small and large datasets, and stability of genomic predictions. This study summarizes presentations from the authors at the 2020 American Society of Animal Science (ASAS) symposium. The focus of many studies until now has been on linkage disequilibrium between two loci. Ignoring higher-level equilibrium may lead to phantom dominance and epistasis. The Bulmer effect leads to a reduction of the additive variance; however, selection for increased recombination rate can release genetic variance anew. With genomic information, estimates of genetic parameters may be biased by genomic preselection, and costs of estimation can increase drastically due to the dense form of the genomic information. To make the computation of estimates feasible, genotypes could be retained only for the most important animals, and methods of estimation should use algorithms that can recognize dense blocks in sparse matrices. GWAS using small genomic datasets frequently find many marker-trait associations, whereas studies using much bigger datasets find only a few. Most of the current tools use very simple models for GWAS, possibly causing artifacts. These models are adequate for large datasets where pseudo-phenotypes such as deregressed proofs indirectly account for important effects for traits of interest. Artifacts arising in GWAS with small datasets can be minimized by using data from all animals (whether genotyped or not), realistic models, and methods that account for population structure. Recent developments permit the computation of P-values from genomic best linear unbiased prediction (GBLUP), where models can be arbitrarily complex but restricted to genotyped animals only, and from single-step GBLUP, which also uses phenotypes from ungenotyped animals.
Stability was an important part of nongenomic evaluations, where genetic predictions were stable in the absence of new data even with low prediction accuracies. Unfortunately, genomic evaluations for such animals change because all animals with genotypes are connected. A top-ranked animal can easily drop in the next evaluation, causing a crisis of confidence in genomic evaluations. While correlations between consecutive genomic evaluations are high, outliers can have differences as high as 1 SD. A solution to fluctuating genomic evaluations is to base selection decisions on groups of animals. Although many issues in GS have been solved, many new issues that require additional research continue to surface.
Affiliation(s)
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacio Aguilar
- Instituto Nacional de Investigación Agropecuaria (INIA), 90200 Canelones, Uruguay
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- Juan Pedro Steibel
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Miguel Toro
- Departamento de Producción Agraria, Universidad Politécnica de Madrid, Madrid, Spain
6
Junqueira VS, Lopes PS, Lourenco D, Silva FFE, Cardoso FF. Applying the Metafounders Approach for Genomic Evaluation in a Multibreed Beef Cattle Population. Front Genet 2021; 11:556399. [PMID: 33424914 PMCID: PMC7793833 DOI: 10.3389/fgene.2020.556399] [Received: 04/28/2020] [Accepted: 10/29/2020] [Indexed: 11/23/2022]
Abstract
Pedigree information is incomplete by nature and commonly not well established because many of the genetic ties are not known a priori or can be wrong. The genomic era brought new opportunities to assess relationships between individuals. However, when pedigree and genomic information are used simultaneously, as in single-step genomic BLUP (ssGBLUP), defining the genetic base is still a challenge. One alternative to overcome this challenge is to use metafounders, which are pseudo-individuals that describe the genetic relationship between the base population individuals. The purpose of this study was to evaluate the impact of metafounders on the estimation of breeding values for tick resistance under ssGBLUP for a multibreed population composed of Hereford, Braford, and Zebu animals. Three scenarios were studied: pedigree-based model (BLUP), ssGBLUP, and ssGBLUP with metafounders (ssGBLUPm). In ssGBLUPm, four metafounders based on breed of origin (i.e., Hereford, Braford, Zebu, and unknown) were included for the animals with missing parents. The relationship coefficient between metafounders averaged 0.54 (ranging from 0.34 to 0.96), suggesting an overlap between ancestor populations. The estimates of metafounder relationships indicate that the Hereford and Zebu breeds may share a common ancestral relationship. Inbreeding coefficients calculated following the metafounder approach had less negative values, suggesting that ancestral populations were large enough and that gametes inherited from the historical population were not identical. Variance components were estimated based on ssGBLUPm, ssGBLUP, and BLUP, but the values from ssGBLUPm were scaled to provide a fair comparison with estimates from the other two models. In general, additive, residual, and phenotypic variance components in the Hereford population were smaller than those in Braford across models.
The addition of genomic information increased heritability for Hereford, possibly because of improved genetic relationships. As expected, genomic models had greater predictive ability, with an additional gain for ssGBLUPm over ssGBLUP. The increase in predictive ability was greater for Herefords. Our results show the potential of using metafounders to increase accuracy of GEBV, and therefore, the rate of genetic gain in beef cattle populations with partial levels of missing pedigree information.
Affiliation(s)
- Vinícius Silva Junqueira
- Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Brazil; Breeding Research Department, Bayer Crop Science, Uberlândia, Brazil
- Paulo Sávio Lopes
- Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, Brazil
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
7
Selle ML, Steinsland I, Powell O, Hickey JM, Gorjanc G. Spatial modelling improves genetic evaluation in smallholder breeding programs. Genet Sel Evol 2020; 52:69. [PMID: 33198636 PMCID: PMC7670695 DOI: 10.1186/s12711-020-00588-w] [Received: 06/01/2020] [Accepted: 11/03/2020] [Indexed: 01/13/2023]
Abstract
BACKGROUND Breeders and geneticists use statistical models to separate genetic and environmental effects on phenotype. A common way to separate these effects is to model a descriptor of an environment, a contemporary group or herd, and account for genetic relationships between animals across environments. However, separating the genetic and environmental effects in smallholder systems is challenging due to small herd sizes and weak genetic connectedness across herds. We hypothesised that accounting for spatial relationships between nearby herds can improve genetic evaluation in smallholder systems. Furthermore, geographically referenced environmental covariates are increasingly available and could model underlying sources of spatial relationships. The objective of this study was therefore to evaluate the potential of spatial modelling to improve genetic evaluation in dairy cattle smallholder systems. METHODS We performed simulations and real dairy cattle data analysis to test our hypothesis. We modelled environmental variation by estimating herd and spatial effects. Herd effects were considered independent, whereas spatial effects had distance-based covariance between herds. We compared these models using pedigree or genomic data. RESULTS The results show that in smallholder systems (i) standard models do not separate genetic and environmental effects accurately, (ii) spatial modelling increases the accuracy of genetic evaluation for phenotyped and non-phenotyped animals, (iii) environmental covariates do not substantially improve the accuracy of genetic evaluation beyond simple distance-based relationships between herds, (iv) the benefit of spatial modelling was largest when separating the genetic and environmental effects was challenging, and (v) spatial modelling was beneficial when using either pedigree or genomic data. CONCLUSIONS We have demonstrated the potential of spatial modelling to improve genetic evaluation in smallholder systems.
This improvement is driven by establishing environmental connectedness between herds, which enhances separation of genetic and environmental effects. We suggest routine spatial modelling in genetic evaluations, particularly for smallholder systems. Spatial modelling could also have a major impact in studies of human and wild populations.
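A distance-based covariance between herds can be sketched as follows. The abstract does not give the covariance function, so this sketch assumes an exponential function (the Matérn covariance with smoothness 1/2) with a made-up variance and range, and uses hypothetical herd coordinates in km. Nearby herds receive strongly correlated spatial effects, which is what connects small herds environmentally.

```python
import numpy as np


def exponential_covariance(coords, variance, range_km):
    """Distance-based covariance between herds: variance * exp(-d / range).

    coords: (n_herds, 2) array of projected x/y positions in km.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    return variance * np.exp(-dist / range_km)


# Hypothetical herds: two neighbours 1 km apart and one herd 50 km away
coords = np.array([[0.0, 0.0], [1.0, 0.0], [50.0, 0.0]])
cov = exponential_covariance(coords, variance=1.0, range_km=10.0)
print(np.round(cov, 3))
```

In contrast, independent herd effects correspond to a diagonal covariance, which gives a small herd no information borrowed from its neighbours.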
Affiliation(s)
- Maria L Selle
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway.
- Ingelin Steinsland
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Owen Powell
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- John M Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
8
Misztal I, Lourenco D, Legarra A. Current status of genomic evaluation. J Anim Sci 2020; 98:skaa101. [PMID: 32267923 PMCID: PMC7183352 DOI: 10.1093/jas/skaa101] [Received: 01/29/2020] [Accepted: 04/07/2020] [Indexed: 12/14/2022]
Abstract
Early applications of genomic selection relied on estimating SNP effects with phenotypes or de-regressed proofs (DRP). Chips of 50k SNP seemed sufficient for an accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were computed as an index combining parent average and direct genomic value, with a parental index deducted to eliminate double counting. Use of SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets. Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in accuracy. After the implementation of genomic selection, EBV by BLUP became biased because of genomic preselection, DRP computed from EBV required adjustments, and the creation of DRP for females is hard and subject to double counting. Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method, based on combining genomic and pedigree relationships, automatically creates an index with all sources of information, can use any combination of male and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that pedigree and genomic relationships are compatible. Because the inversion of the genomic relationship matrix (G) becomes costly with more than 100k genotyped animals, large-data computations in ssGBLUP were solved by exploiting the limited dimensionality of genomic data due to limited effective population size. With such dimensionality ranging from 4k in chickens to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at a linear cost. Due to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries. Single step can be used to derive SNP effects for indirect prediction and for genome-wide association studies, including computations of the P-values.
Alternative single-step formulations exist that use SNP effects for genotyped or for all animals. Although genomics is the new standard in breeding and genetics, there are still some problems that need to be solved. This involves new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic data used in selection, and strategies to address reduction in genetic variances after genomic selection was implemented.
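For reference, the APY inverse is usually written as $\mathbf{G}^{-1}_{\mathrm{APY}} = \begin{bmatrix}\mathbf{G}_{cc}^{-1} & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix} + \begin{bmatrix}-\mathbf{G}_{cc}^{-1}\mathbf{G}_{cn}\\ \mathbf{I}\end{bmatrix}\mathbf{M}_{nn}^{-1}\begin{bmatrix}-\mathbf{G}_{nc}\mathbf{G}_{cc}^{-1} & \mathbf{I}\end{bmatrix}$, where $c$ indexes core and $n$ non-core animals and $\mathbf{M}_{nn}$ is diagonal with $m_i = g_{ii} - \mathbf{g}_{ic}\mathbf{G}_{cc}^{-1}\mathbf{g}_{ci}$. A minimal dense sketch of that formula, assuming the first `n_core` animals form the core (`apy_inverse` is a hypothetical helper). The linear cost mentioned above comes from storing only $\mathbf{G}_{cc}^{-1}$, the core-by-noncore block, and the diagonal of $\mathbf{M}_{nn}$, which this dense illustration does not do.

```python
import numpy as np


def apy_inverse(g, n_core):
    """Dense APY inverse of G with the first `n_core` animals as core.

    Each non-core animal is conditioned on core animals only, so the
    non-core block of the inverse is the inverse of a diagonal matrix M.
    """
    g_cc = g[:n_core, :n_core]
    g_cn = g[:n_core, n_core:]
    g_cc_inv = np.linalg.inv(g_cc)
    p = g_cc_inv @ g_cn                          # G_cc^-1 G_cn
    # m_i = g_ii - g_ic G_cc^-1 g_ci for each non-core animal i
    m = np.diag(g)[n_core:] - np.einsum("ij,ij->j", g_cn, p)
    m_inv = np.diag(1.0 / m)
    n = g.shape[0]
    ginv = np.zeros((n, n))
    ginv[:n_core, :n_core] = g_cc_inv + p @ m_inv @ p.T
    ginv[:n_core, n_core:] = -p @ m_inv
    ginv[n_core:, :n_core] = -m_inv @ p.T
    ginv[n_core:, n_core:] = m_inv
    return ginv


# Hypothetical genotypes for 200 animals and 500 SNP; 50 core animals
rng = np.random.default_rng(3)
m = rng.binomial(2, 0.3, size=(200, 500)).astype(float)
freq = m.mean(axis=0) / 2
z = m - 2 * freq
g = z @ z.T / (2 * np.sum(freq * (1 - freq))) + 0.01 * np.eye(200)

ginv = apy_inverse(g, n_core=50)
g_apy = np.linalg.inv(ginv)   # implied APY approximation of G
```

Inverting the result recovers the core block, the core-by-noncore block, and the non-core diagonal of G exactly; only the off-diagonal non-core relationships are approximated through the core animals.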
Affiliation(s)
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Andres Legarra
- Department of Animal Genetics, Institut National de la Recherche Agronomique, Castanet-Tolosan, France
9
Aguilar I, Legarra A, Cardoso F, Masuda Y, Lourenco D, Misztal I. Frequentist p-values for large-scale-single step genome-wide association, with an application to birth weight in American Angus cattle. Genet Sel Evol 2019; 51:28. [PMID: 31221101 PMCID: PMC6584984 DOI: 10.1186/s12711-019-0469-3] [Received: 01/03/2019] [Accepted: 05/27/2019] [Indexed: 11/14/2022]
Abstract
Background Single-step genomic best linear unbiased prediction (SSGBLUP) is a comprehensive method for genomic prediction. Point estimates of marker effects from SSGBLUP are often used for genome-wide association studies (GWAS) without a formal framework of hypothesis testing. Our objective was to implement p-values for single-marker GWAS within the single-step GWAS (SSGWAS) framework by deriving computational algorithms and procedures, and by applying these to a large beef cattle population. Methods P-values were obtained based on the prediction error (co)variances for single nucleotide polymorphisms (SNPs), which were obtained from the prediction error (co)variances of genomic predictions based on the inverse of the coefficient matrix and formulas to estimate SNP effects. Results Computation of p-values took negligible time for a dataset with almost 2 million animals in the pedigree and 1424 genotyped sires, and no inflation of statistics was observed. The SNPs that passed the Bonferroni threshold of $10^{-5.9}$ were the same as those that explained the highest proportion of additive genetic variance, but even at the same significance levels and effects, some of them explained less genetic variance due to lower allele frequency. Conclusions The use of a p-value for SSGWAS is a very general and efficient strategy to identify quantitative trait loci (QTL). It can be used for complex datasets such as those used in animal breeding, where only a proportion of the pedigreed animals are genotyped.
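Once the standard deviation of each SNP effect estimate has been derived from the prediction error (co)variances, the final testing step is a normal z-test. A minimal sketch, assuming that standard deviation is already available and that a two-sided test is used; `snp_p_value` and `bonferroni_threshold` are hypothetical helpers, and the effect and standard deviation values are made up.

```python
import math


def snp_p_value(effect, sd):
    """Two-sided p-value for H0: SNP effect = 0, from a normal z-statistic."""
    z = effect / sd
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level controlling the family-wise error at alpha."""
    return alpha / n_tests


# Hypothetical SNP effect estimate and its standard deviation
p = snp_p_value(effect=0.12, sd=0.02)
threshold = 10 ** -5.9          # threshold quoted in the abstract
print(p, p < threshold, -math.log10(p))
```

Using `math.erfc` avoids the loss of precision that `1 - cdf` would suffer for the very small p-values typical of strong associations.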
Affiliation(s)
- Ignacio Aguilar
- Instituto Nacional de Investigación Agropecuaria (INIA), 90200, Canelones, Uruguay
- Andres Legarra
- UMR GenPhySE, INRA Toulouse, BP52626, 31326, Castanet Tolosan, France.
- Fernando Cardoso
- Department of Animal Science, Federal University of Pelotas, Rio Grande do Sul, Brazil; Embrapa Pecuária Sul, Bagé, RS, 96400-031, Brazil
- Yutaka Masuda
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
10
Colleau JJ, Palhière I, Rodríguez-Ramilo ST, Legarra A. A fast indirect method to compute functions of genomic relationships concerning genotyped and ungenotyped individuals, for diversity management. Genet Sel Evol 2017; 49:87. [PMID: 29191178 PMCID: PMC5709854 DOI: 10.1186/s12711-017-0363-9] [Received: 06/06/2017] [Accepted: 11/24/2017] [Indexed: 12/01/2022]
Abstract
Background Pedigree-based management of genetic diversity in populations, e.g., using optimal contributions, involves computations of the type $\mathbf{Ax}$, yielding elements (relationships) or functions (usually averages) of relationship matrices. For pedigree-based relationships $\mathbf{A}$, a very efficient method exists. When all the individuals of interest are genotyped, genomic management can be addressed using the genomic relationship matrix $\mathbf{G}$; however, to date, the computational problem of efficiently computing $\mathbf{Gx}$ has not been well studied. When some individuals of interest are not genotyped, genomic management should consider the relationship matrix $\mathbf{H}$ that combines genotyped and ungenotyped individuals; however, direct computation of $\mathbf{Hx}$ is computationally very demanding, because construction of a possibly huge matrix is required. Our work presents efficient ways of computing $\mathbf{Gx}$ and $\mathbf{Hx}$, with applications on real data from dairy sheep and dairy goat breeding schemes. Results For genomic relationships, an efficient indirect computation with quadratic instead of cubic cost is $\mathbf{Gx} = \mathbf{Z}(\mathbf{Z}'\mathbf{x})/k$, where $\mathbf{Z}$ is a matrix relating animals to genotypes. For the relationship matrix $\mathbf{H}$, we propose an indirect method based on the difference between vectors $\mathbf{Hx} - \mathbf{Ax}$, which involves computation of $\mathbf{Ax}$ and of products such as
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$${\mathbf{Gw}}$$\end{document}Gw and \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$${\mathbf{A}}_{22}^{ - 1} {\mathbf{w}}$$\end{document}A22-1w, where \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$${\mathbf{w}}$$\end{document}w is a working vector derived from \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$${\mathbf{x}}$$\end{document}x. The latter computation is the most demanding but can be done using sparse Cholesky decompositions of matrix \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$${\mathbf{A}}^{ - 1}$$\end{document}A-1, which allows handling very large genomic and pedigree data files. Studies based on simulations reported in the literature show that the trends of average relationships in \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$${\mathbf{H}}$$\end{document}H and \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$${\mathbf{A}}$$\end{document}A differ as genomic selection proceeds. When selection is based on genomic relationships but management is based on pedigree data, the true genetic diversity is overestimated. However, our tests on real data from sheep and goat obtained before genomic selection started do not show this. Conclusions We present efficient methods to compute elements and statistics of the genomic relationships \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$${\mathbf{G}}$$\end{document}G and of matrix \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$${\mathbf{H}}$$\end{document}H that combines ungenotyped and genotyped individuals. These methods should be useful to monitor and handle genomic diversity.
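The identity $\mathbf{Gx} = \mathbf{Z}(\mathbf{Z}'\mathbf{x})/k$ can be illustrated with a minimal pure-Python sketch. Beyond what the abstract states, it assumes that $\mathbf{Z}$ holds 0/1/2 allele counts centred by twice the observed allele frequency and that $k = 2\sum_j p_j(1-p_j)$, as in VanRaden's first method; the helper names are hypothetical. Each product costs on the order of $nm$ operations for $n$ animals and $m$ markers, whereas building $\mathbf{G}$ explicitly costs on the order of $n^2 m$.

```python
# Sketch: G x = Z (Z' x) / k without forming the n x n matrix G.
# Assumptions (not stated in the abstract): genotypes coded 0/1/2, Z centred by
# 2p_j, and k = 2 * sum_j p_j (1 - p_j). All helper names are hypothetical.


def center_genotypes(M):
    """Return (Z, k) from an n x m list of 0/1/2 allele counts."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]
    k = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return Z, k


def g_times_x(Z, k, x):
    """G x computed as Z (Z' x) / k: about 2nm operations, G never stored."""
    n, m = len(Z), len(Z[0])
    t = [sum(Z[i][j] * x[i] for i in range(n)) for j in range(m)]  # t = Z' x
    return [sum(Z[i][j] * t[j] for j in range(m)) / k for i in range(n)]


def g_explicit(Z, k):
    """Dense G = Z Z' / k (about n^2 m operations); for checking only."""
    return [[sum(a * b for a, b in zip(zi, zl)) / k for zl in Z] for zi in Z]


def mean_relationship_within(Z, k, members):
    """Average G relationship in a group S: x' G x / |S|^2, x = indicator of S."""
    members = set(members)
    x = [1.0 if i in members else 0.0 for i in range(len(Z))]
    gx = g_times_x(Z, k, x)
    return sum(xi * gi for xi, gi in zip(x, gx)) / len(members) ** 2
```

The last helper shows a "function of relationships" in the abstract's sense: the average relationship within a group $S$ (self-relationships included) is $\mathbf{x}'\mathbf{Gx}/|S|^2$ with $\mathbf{x}$ the indicator vector of $S$, so it needs only one product.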
Affiliation(s)
- Jean-Jacques Colleau
- GABI, INRA, AgroParisTech, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Isabelle Palhière
- GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet Tolosan, France
- Andres Legarra
- GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet Tolosan, France
11
Masuda Y, Misztal I, Legarra A, Tsuruta S, Lourenco DAL, Fragomeni BO, Aguilar I. Technical note: Avoiding the direct inversion of the numerator relationship matrix for genotyped animals in single-step genomic best linear unbiased prediction solved with the preconditioned conjugate gradient. J Anim Sci 2017; 95:49-52. [PMID: 28177357 DOI: 10.2527/jas.2016.0699] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
This paper evaluates an efficient implementation to multiply the inverse of a numerator relationship matrix for genotyped animals ($\mathbf{A}_{22}^{-1}$) by a vector. The computation is required for solving mixed model equations in single-step genomic BLUP (ssGBLUP) with the preconditioned conjugate gradient (PCG). The inverse can be decomposed into sparse matrices that are blocks of the sparse inverse of a numerator relationship matrix ($\mathbf{A}^{-1}$) including genotyped animals and their ancestors. The elements of $\mathbf{A}^{-1}$ were rapidly calculated with Henderson's rules and stored as sparse matrices in memory. The product of $\mathbf{A}_{22}^{-1}$ and the vector was implemented as a series of sparse matrix-vector multiplications. Diagonal elements of $\mathbf{A}_{22}^{-1}$, which were required as preconditioners in PCG, were approximated with a Monte Carlo method using 1,000 samples. The efficient implementation was compared with explicit inversion of $\mathbf{A}_{22}$ with 3 data sets including about 15,000, 81,000, and 570,000 genotyped animals selected from populations with 213,000, 8.2 million, and 10.7 million pedigree animals, respectively. The explicit inversion required 1.8 GB, 49 GB, and 2,415 GB (estimated) of memory, respectively, and 42 s, 56 min, and 13.5 d (estimated), respectively, for the computations. The efficient implementation required <1 MB, 2.9 GB, and 2.3 GB of memory, respectively, and <1 s, 3 min, and 5 min, respectively, for setting up. Less than 1 s was required for the multiplication in each PCG iteration for any of the data sets. When the equations in ssGBLUP are solved with the PCG algorithm, $\mathbf{A}_{22}^{-1}$ is no longer a limiting factor in the computations.
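The decomposition described above rests on the block-inverse identity $\mathbf{A}_{22}^{-1} = \mathbf{A}^{22} - \mathbf{A}^{21}(\mathbf{A}^{11})^{-1}\mathbf{A}^{12}$, where $\mathbf{A}^{ij}$ are blocks of $\mathbf{A}^{-1}$ for ungenotyped (1) and genotyped (2) animals. The sketch below is a hedged illustration, not the authors' code: it assumes a pedigree listed parents-first with no inbreeding (so Henderson's simple rules give $\mathbf{A}^{-1}$ exactly), uses an unpreconditioned conjugate gradient for the $\mathbf{A}^{11}$ solve where a production implementation might use a sparse factorization, and omits the Monte Carlo diagonal approximation. All function names are hypothetical.

```python
# Sketch: A22^{-1} v = A^{22} v - A^{21} (A^{11})^{-1} A^{12} v, using blocks of the
# sparse A^{-1}. Block 1 = ungenotyped animals, block 2 = genotyped animals.
# Assumptions: parents precede offspring, no inbreeding (simple Henderson rules),
# plain CG for the A^{11} solve. Function names are hypothetical.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def a_inverse(ped):
    """Sparse A^{-1} as {(i, j): value}; ped[i] = (sire, dam), None if unknown."""
    ainv = {}

    def add(i, j, v):
        ainv[(i, j)] = ainv.get((i, j), 0.0) + v

    for i, (s, d) in enumerate(ped):
        parents = [p for p in (s, d) if p is not None]
        # Mendelian sampling variance is 1, 3/4 or 1/2 for 0, 1 or 2 known parents
        b = 1.0 / (1.0 - 0.25 * len(parents))
        add(i, i, b)
        for p in parents:
            add(i, p, -0.5 * b)
            add(p, i, -0.5 * b)
            for q in parents:
                add(p, q, 0.25 * b)
    return ainv


def block_matvec(ainv, rows, cols, v):
    """Return A^{-1}[rows, cols] times v, touching only stored nonzeros."""
    rpos = {r: k for k, r in enumerate(rows)}
    cpos = {c: k for k, c in enumerate(cols)}
    y = [0.0] * len(rows)
    for (i, j), a in ainv.items():
        if i in rpos and j in cpos:
            y[rpos[i]] += a * v[cpos[j]]
    return y


def cg_solve(ainv, idx, b, tol=1e-20, max_iter=200):
    """Conjugate gradient for the SPD block A^{-1}[idx, idx] x = b."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs < tol:
            break
        ap = block_matvec(ainv, idx, idx, p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def a22inv_times(ped, genotyped, v):
    """A22^{-1} v without forming or inverting A22."""
    ainv = a_inverse(ped)
    gset = set(genotyped)
    ungen = [i for i in range(len(ped)) if i not in gset]
    t = block_matvec(ainv, ungen, genotyped, v)      # A^{12} v
    s = cg_solve(ainv, ungen, t)                     # (A^{11})^{-1} A^{12} v
    u = block_matvec(ainv, genotyped, ungen, s)      # A^{21} s
    w = block_matvec(ainv, genotyped, genotyped, v)  # A^{22} v
    return [wi - ui for wi, ui in zip(w, u)]


def tabular_a(ped):
    """Dense A by the tabular method; only for checking small examples."""
    n = len(ped)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(ped):
        for j in range(i):
            v = 0.5 * (A[j][s] if s is not None else 0.0)
            v += 0.5 * (A[j][d] if d is not None else 0.0)
            A[i][j] = A[j][i] = v
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
    return A
```

On a tiny pedigree, `tabular_a` gives the dense $\mathbf{A}$, so multiplying its genotyped block by the returned vector should reproduce the input vector.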
12
Masuda Y, Aguilar I, Tsuruta S, Misztal I. Technical note: Acceleration of sparse operations for average-information REML analyses with supernodal methods and sparse-storage refinements. J Anim Sci 2016; 93:4670-4. [PMID: 26523559 DOI: 10.2527/jas.2015-9395] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022]
Abstract
The objective of this study was to remove bottlenecks generally found in a computer program for average-information REML. The refinements included improvements to setting up mixed-model equations, using a hash table with a faster hash function as sparse matrix storage; changing sparse structures in the calculation of traces; and replacing a sparse matrix package using traditional methods (FSPAK) with a new package using supernodal methods (YAMS), which quickly processed sparse matrices containing large, dense blocks. Comparisons included 23 models with data sets from broiler, swine, beef, and dairy cattle. Models included single-trait, multiple-trait, maternal, and random regression models with phenotypic data; selected models used genomic information in a single-step approach. Setting up mixed-model equations was completed without abnormal termination in all analyses. Calculations of traces were accelerated with a hash format, especially for models with a genomic relationship matrix, with a maximum speedup of 67 times. Computations with YAMS were, on average, more than 10 times faster than with FSPAK and had greater advantages for large data and more complicated models including multiple traits, random regressions, and genomic effects. These refinements can be applied to general average-information REML programs.
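As a minimal illustration of hash-table storage while setting up mixed-model equations, the sketch below accumulates the upper triangle of the coefficient matrix in a Python `dict` keyed by `(row, col)` and then compresses it to row-oriented arrays for a solver. The model (one fixed effect and one animal effect per record, with no relationship matrix or variance ratio added) and the function names are hypothetical; Python's built-in hashing stands in for the faster hash function described in the paper, and the supernodal factorization in YAMS is not shown.

```python
# Sketch: hash-table accumulation of mixed-model equations (upper triangle only).
# Hypothetical model y = X b + Z u + e with one fixed level and one animal per record.
from collections import defaultdict


def setup_mme_hash(records, n_fixed):
    """records: (fixed_level, animal_id, y). Equation of animal a is n_fixed + a."""
    lhs = defaultdict(float)  # {(row, col): coefficient}, row <= col
    rhs = defaultdict(float)  # {row: right-hand side}
    for level, animal, y in records:
        eqs = (level, n_fixed + animal)
        for r in eqs:
            rhs[r] += y
            for c in eqs:
                if r <= c:  # store each symmetric pair once
                    lhs[(r, c)] += 1.0
    return dict(lhs), dict(rhs)


def to_csr(lhs, n_eq):
    """Compress the hash table to CSR arrays (row_ptr, col_idx, values)."""
    row_ptr = [0] * (n_eq + 1)
    cols, vals = [], []
    for (r, c), v in sorted(lhs.items()):  # sorted by row, then column
        row_ptr[r + 1] += 1
        cols.append(c)
        vals.append(v)
    for r in range(n_eq):
        row_ptr[r + 1] += row_ptr[r]
    return row_ptr, cols, vals
```

A hash table suits set-up because records arrive in arbitrary order and each insertion or update costs roughly constant time, whereas compressed formats are better suited to the later factorization and trace computations.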
13
Masuda Y, Misztal I, Tsuruta S, Legarra A, Aguilar I, Lourenco DAL, Fragomeni BO, Lawlor TJ. Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals. J Dairy Sci 2016; 99:1968-1974. [PMID: 26805987 DOI: 10.3168/jds.2015-10540] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 10/19/2015] [Accepted: 12/01/2015] [Indexed: 11/19/2022]
Abstract
The objectives of this study were to develop and evaluate an efficient implementation of the computation of the inverse of the genomic relationship matrix with the recursion algorithm called the algorithm for proven and young (APY) in single-step genomic BLUP. We validated genomic predictions for young bulls with more than 500,000 genotyped animals for final score in US Holsteins. Phenotypic data included 11,626,576 final scores on 7,093,380 US Holstein cows, and genotypes were available for 569,404 animals. Daughter deviations for young bulls with no classified daughters in 2009 but at least 30 classified daughters in 2014 were computed using all the phenotypic data. Genomic predictions for the same bulls were calculated with single-step genomic BLUP using phenotypes up to 2009. We calculated the inverse of the genomic relationship matrix, $\mathbf{G}_{\mathrm{APY}}^{-1}$, based on a direct inversion of the genomic relationship matrix on a small subset of genotyped animals (core animals) and extended that information to noncore animals by recursion. We tested several sets of core animals, including 9,406 bulls with at least 1 classified daughter; 9,406 bulls and 1,052 classified dams of bulls; 9,406 bulls and 7,422 classified cows; and random samples of 5,000 to 30,000 animals. Validation reliability was assessed by the coefficient of determination from regression of daughter deviation on genomic predictions for the predicted young bulls. The reliabilities were 0.39 with 5,000 randomly chosen core animals, 0.45 with the 9,406 bulls and 7,422 cows as core animals, and 0.44 with the remaining sets. With phenotypes truncated in 2009 and the preconditioned conjugate gradient used to solve the mixed model equations, the number of rounds to convergence was 1,343 for core animals defined by bulls, 2,066 for core animals defined by bulls and cows, and at most 1,629 for 10,000 random core animals. With complete phenotype data, the number of rounds decreased to 858, 1,299, and at most 1,092, respectively. Setting up $\mathbf{G}_{\mathrm{APY}}^{-1}$ for 569,404 genotyped animals with 10,000 core animals took 1.3 h and 57 GB of memory. The validation reliability with APY reaches a plateau when the number of core animals is at least 10,000. Predictions with APY show little difference in reliability among definitions of core animals. Single-step genomic BLUP with APY is applicable to millions of genotyped animals.
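The recursion is usually written, in the wider APY literature rather than in this abstract, as $\mathbf{G}_{\mathrm{APY}}^{-1} = \begin{bmatrix}\mathbf{G}_{cc}^{-1} & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix} + \begin{bmatrix}-\mathbf{G}_{cc}^{-1}\mathbf{G}_{cn}\\ \mathbf{I}\end{bmatrix}\mathbf{M}_{nn}^{-1}\begin{bmatrix}-\mathbf{G}_{nc}\mathbf{G}_{cc}^{-1} & \mathbf{I}\end{bmatrix}$, where $c$ and $n$ index core and noncore animals and $\mathbf{M}_{nn}$ is diagonal with $m_i = g_{ii} - \mathbf{g}_{ic}\mathbf{G}_{cc}^{-1}\mathbf{g}_{ci}$. Only $\mathbf{G}_{cc}$ is inverted directly. The small dense sketch below assumes that formula; the function names and the Gauss-Jordan inversion are illustrative choices, not the implementation that produced the timings above.

```python
# Sketch of the APY inverse for small dense matrices (hypothetical helper names).
# Only the core block G_cc is inverted; noncore animals enter through the
# diagonal M_nn of conditional variances given the core.


def invert(M):
    """Gauss-Jordan inverse with partial pivoting; small dense matrices only."""
    n = len(M)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def apy_inverse(G, core):
    """Return G_APY^{-1} (full n x n, original animal order) for the given core."""
    n_all, nc = len(G), len(core)
    cset = set(core)
    noncore = [i for i in range(n_all) if i not in cset]
    gcc_inv = invert([[G[i][j] for j in core] for i in core])
    # P = G_cc^{-1} G_cn: column k holds the regression of noncore animal k on the core
    P = [[sum(gcc_inv[a][b] * G[core[b]][j] for b in range(nc)) for j in noncore]
         for a in range(nc)]
    # m_k = g_kk - g_kc G_cc^{-1} g_ck (variance of noncore animal k given the core)
    m = [G[j][j] - sum(G[j][core[a]] * P[a][k] for a in range(nc))
         for k, j in enumerate(noncore)]
    out = [[0.0] * n_all for _ in range(n_all)]
    for a, i in enumerate(core):  # core block: G_cc^{-1} + P M^{-1} P'
        for b, j in enumerate(core):
            out[i][j] = gcc_inv[a][b] + sum(P[a][k] * P[b][k] / m[k]
                                            for k in range(len(noncore)))
    for k, j in enumerate(noncore):  # noncore blocks: -P M^{-1} and M^{-1}
        out[j][j] = 1.0 / m[k]
        for a, i in enumerate(core):
            out[i][j] = out[j][i] = -P[a][k] / m[k]
    return out
```

With a single noncore animal the result coincides with the exact $\mathbf{G}^{-1}$, which gives a simple check; with more noncore animals it is an approximation, because noncore animals are treated as conditionally independent given the core.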
Affiliation(s)
- Y Masuda
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- I Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- S Tsuruta
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- A Legarra
- Institut National de la Recherche Agronomique, UMR1388 GenPhySE, 31326 Castanet Tolosan, France
- I Aguilar
- Instituto Nacional de Investigación Agropecuaria, Canelones, Uruguay 90200
- D A L Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- B O Fragomeni
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- T J Lawlor
- Holstein Association USA Inc., Brattleboro, VT 05301
14
Fragomeni BO, Lourenco DAL, Tsuruta S, Masuda Y, Aguilar I, Misztal I. Use of genomic recursions and algorithm for proven and young animals for single-step genomic BLUP analyses--a simulation study. J Anim Breed Genet 2015; 132:340-5. [PMID: 25857518 DOI: 10.1111/jbg.12161] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/12/2014] [Accepted: 03/13/2015] [Indexed: 11/29/2022]
Abstract
The purpose of this study was to examine the accuracy of genomic selection via single-step genomic BLUP (ssGBLUP) when the direct inverse of the genomic relationship matrix ($\mathbf{G}$) is replaced by an approximation of $\mathbf{G}^{-1}$ based on recursions for young genotyped animals conditioned on a subset of proven animals, termed the algorithm for proven and young animals (APY). With an efficient implementation, this algorithm has a cost that is cubic in the number of proven animals and linear in the number of young animals. Ten replicate data sets mimicking a dairy cattle population were simulated. In a first scenario, genomic information for 20k genotyped bulls, divided into 7k proven and 13k young bulls, was generated for each replicate. In a second scenario, 5k genotyped cows with phenotypes were included in the analysis as young animals. Accuracies (averaged over the 10 replicates) of regular EBV were 0.72 and 0.34 for proven and young animals, respectively. When genomic information was included, they increased to 0.75 and 0.50. No differences were observed between genomic EBV (GEBV) obtained with the regular $\mathbf{G}^{-1}$ and with the approximated $\mathbf{G}^{-1}$ via the recursive method. In the second scenario, accuracies of GEBV (0.76, 0.51, and 0.59 for proven bulls, young males, and young females, respectively) were also higher than those of EBV (0.72, 0.35, and 0.49). Again, no differences between GEBV with the regular $\mathbf{G}^{-1}$ and with recursions were observed. With the recursive algorithm, the number of iterations to achieve convergence was reduced from 227 to 206 in the first scenario and from 232 to 209 in the second scenario. Cows can be treated as young animals in APY without reducing the accuracy. The proposed algorithm can be implemented to reduce computing costs and to overcome current limitations on the number of genotyped animals in the ssGBLUP method.
Affiliation(s)
- B O Fragomeni
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- D A L Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- S Tsuruta
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Y Masuda
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- I Aguilar
- Instituto Nacional de Investigación Agropecuaria, Canelones, Uruguay
- I Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
15
Abstract
In genomic prediction, common analysis methods rely on a linear mixed-model framework to estimate SNP marker effects and breeding values of animals or plants. Ridge regression best linear unbiased prediction (RR-BLUP) is based on the assumptions that SNP marker effects are normally distributed, uncorrelated, and of equal variance. We propose DAIRRy-BLUP, a parallel, Distributed-memory RR-BLUP implementation based on single-trait observations that uses the Average Information algorithm for restricted maximum-likelihood estimation of the variance components. The goal of DAIRRy-BLUP is to enable the analysis of large-scale data sets to provide more accurate estimates of marker effects and breeding values. A distributed-memory framework is required because the dimensionality of the problem, determined by the number of SNP markers, can become too large to be analyzed by a single computing node. Initial results show that DAIRRy-BLUP enables the analysis of very large-scale data sets (up to 1,000,000 individuals and 360,000 SNPs) and indicate that increasing the number of phenotypic and genotypic records has a larger effect on prediction accuracy than increasing the density of SNP arrays.
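For a single trait, assuming $\mathbf{y}$ is centred and there are no other fixed effects, RR-BLUP marker effects solve $(\mathbf{Z}'\mathbf{Z} + \lambda\mathbf{I})\hat{\mathbf{a}} = \mathbf{Z}'\mathbf{y}$ with $\lambda = \sigma_e^2/\sigma_a^2$, where $\sigma_a^2$ is the common per-marker variance. The sketch below assumes the variance components are already known (DAIRRy-BLUP estimates them with average-information REML, which is not shown) and uses dense Gaussian elimination on one node; the function names are hypothetical. The system is $m \times m$ for $m$ markers, which is the dimension the abstract identifies as the reason for a distributed-memory design.

```python
# Sketch: single-trait RR-BLUP with a known variance ratio lam = sigma_e^2 / sigma_a^2.
# Assumptions: y already centred, no other fixed effects, dense solve on one node.


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def rr_blup(Z, y, lam):
    """Marker effects from (Z'Z + lam I) a = Z'y; Z is n x m (individuals x SNPs)."""
    n, m = len(Z), len(Z[0])
    lhs = [[sum(Z[i][j] * Z[i][l] for i in range(n)) + (lam if j == l else 0.0)
            for l in range(m)] for j in range(m)]
    rhs = [sum(Z[i][j] * y[i] for i in range(n)) for j in range(m)]
    return solve(lhs, rhs)


def breeding_values(Z, a):
    """Genomic breeding values as Z a."""
    return [sum(zij * aj for zij, aj in zip(row, a)) for row in Z]
```

For example, with $\mathbf{Z} = [[1,0],[1,1],[0,1]]$, $\mathbf{y} = (1,2,3)'$ and $\lambda = 1$, the system is $[[3,1],[1,3]]\hat{\mathbf{a}} = (3,5)'$, giving $\hat{\mathbf{a}} = (0.5, 1.5)'$.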