51
|
Su G, Brøndum RF, Ma P, Guldbrandtsen B, Aamand GP, Lund MS. Comparison of genomic predictions using medium-density (∼54,000) and high-density (∼777,000) single nucleotide polymorphism marker panels in Nordic Holstein and Red Dairy Cattle populations. J Dairy Sci 2012; 95:4657-65. [PMID: 22818480 DOI: 10.3168/jds.2012-5379] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2012] [Accepted: 04/17/2012] [Indexed: 12/22/2022]
Abstract
This study investigated genomic prediction using medium-density (∼54,000; 54K) and high-density marker panels (∼777,000; 777K), based on data from Nordic Holstein and Red Dairy Cattle (RDC). The Holstein data comprised 4,539 progeny-tested bulls, and the RDC data 4,403 progeny-tested bulls. The data were divided into reference data and test data using October 1, 2001, as a cut-off date (birth date of the bulls). This resulted in about 25% genotyped bulls in the Holstein test data and 20% in the RDC test data. For each breed, 3 data sets of markers were used to predict breeding values: (1) 54K data set with missing genotypes, (2) 54K data set where missing genotypes were imputed, and (3) imputed high-density (HD) marker data set created by imputing the 54K data to the HD data based on 557 bulls genotyped using a 777K single nucleotide polymorphism chip in Holstein, and 706 bulls in RDC. Based on the 3 marker data sets, direct genomic breeding values (DGV) for protein, fertility, and udder health were predicted using a genomic BLUP model (GBLUP) and a Bayesian mixture model with 2 normal distributions. Reliability of DGV was measured as squared correlations between deregressed proofs (DRP) and DGV corrected for reliability of DRP. Unbiasedness was assessed by regression of DRP on DGV, based on the bulls in the test data sets. Averaged over the 3 traits, reliability of DGV based on the HD markers was 0.5% higher than that based on the 54K data in Holstein, and 1.0% higher than that in RDC. In addition, the HD markers led to an improvement of unbiasedness of DGV. The Bayesian mixture model led to 0.5% higher reliability than the GBLUP model in Holstein, but not in RDC. Imputing missing genotypes in the 54K marker data did not improve genomic predictions for most of the traits.
Collapse
Affiliation(s)
- G Su
- Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark.
| | | | | | | | | | | |
Collapse
|
52
|
Abraham G, Kowalczyk A, Zobel J, Inouye M. Performance and robustness of penalized and unpenalized methods for genetic prediction of complex human disease. Genet Epidemiol 2012. [PMID: 23203348 DOI: 10.1002/gepi.21698] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
A central goal of medical genetics is to accurately predict complex disease from genotypes. Here, we present a comprehensive analysis of simulated and real data using lasso and elastic-net penalized support-vector machine models, a mixed-effects linear model, a polygenic score, and unpenalized logistic regression. In simulation, the sparse penalized models achieved lower false-positive rates and higher precision than the other methods for detecting causal SNPs. The common practice of prefiltering SNP lists for subsequent penalized modeling was examined and shown to substantially reduce the ability to recover the causal SNPs. Using genome-wide SNP profiles across eight complex diseases within cross-validation, lasso and elastic-net models achieved substantially better predictive ability in celiac disease, type 1 diabetes, and Crohn's disease, and had equivalent predictive ability in the rest, with the results in celiac disease strongly replicating between independent datasets. We investigated the effect of linkage disequilibrium on the predictive models, showing that the penalized methods leverage this information to their advantage, compared with methods that assume SNP independence. Our findings show that sparse penalized approaches are robust across different disease architectures, producing as good as or better phenotype predictions and variance explained. This has fundamental ramifications for the selection and future development of methods to genetically predict human disease.
Collapse
Affiliation(s)
- Gad Abraham
- Medical Systems Biology, Departments of Pathology and of Microbiology & Immunology, The University of Melbourne, Parkville, VIC, Australia
| | | | | | | |
Collapse
|
53
|
Colombani C, Legarra A, Fritz S, Guillaume F, Croiseau P, Ducrocq V, Robert-Granié C. Application of Bayesian least absolute shrinkage and selection operator (LASSO) and BayesCπ methods for genomic selection in French Holstein and Montbéliarde breeds. J Dairy Sci 2012; 96:575-91. [PMID: 23127905 DOI: 10.3168/jds.2011-5225] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2011] [Accepted: 09/14/2012] [Indexed: 11/19/2022]
Abstract
Recently, the amount of available single nucleotide polymorphism (SNP) marker data has considerably increased in dairy cattle breeds, both for research purposes and for application in commercial breeding and selection programs. Bayesian methods are currently used in the genomic evaluation of dairy cattle to handle very large sets of explanatory variables with a limited number of observations. In this study, we applied 2 bayesian methods, BayesCπ and bayesian least absolute shrinkage and selection operator (LASSO), to 2 genotyped and phenotyped reference populations consisting of 3,940 Holstein bulls and 1,172 Montbéliarde bulls with approximately 40,000 polymorphic SNP. We compared the accuracy of the bayesian methods for the prediction of 3 traits (milk yield, fat content, and conception rate) with pedigree-based BLUP, genomic BLUP, partial least squares (PLS) regression, and sparse PLS regression, a variable selection PLS variant. The results showed that the correlations between observed and predicted phenotypes were similar in BayesCπ (including or not pedigree information) and bayesian LASSO for most of the traits and whatever the breed. In the Holstein breed, bayesian methods led to higher correlations than other approaches for fat content and were similar to genomic BLUP for milk yield and to genomic BLUP and PLS regression for the conception rate. In the Montbéliarde breed, no method dominated the others, except BayesCπ for fat content. The better performances of the bayesian methods for fat content in Holstein and Montbéliarde breeds are probably due to the effect of the DGAT1 gene. The SNP identified by the BayesCπ, bayesian LASSO, and sparse PLS regression methods, based on their effect on the different traits of interest, were located at almost the same position on the genome. As the bayesian methods resulted in regressions of direct genomic values on daughter trait deviations closer to 1 than for the other methods tested in this study, bayesian methods are suggested for genomic evaluations of French dairy cattle.
Collapse
Affiliation(s)
- C Colombani
- INRA, UR631-SAGA, BP 52627, 31326 Castanet-Tolosan Cedex, France
| | | | | | | | | | | | | |
Collapse
|
54
|
Khatkar MS, Moser G, Hayes BJ, Raadsma HW. Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle. BMC Genomics 2012; 13:538. [PMID: 23043356 PMCID: PMC3531262 DOI: 10.1186/1471-2164-13-538] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2012] [Accepted: 10/06/2012] [Indexed: 12/21/2022] Open
Abstract
Background We investigated strategies and factors affecting accuracy of imputing genotypes from lower-density SNP panels (Illumina 3K, 7K, Affymetrix 15K and 25K, and evenly spaced subsets) up to one medium (Illumina 50K) and one high-density (Illumina 800K) SNP panel. We also evaluated the utility of imputed genotypes on the accuracy of genomic selection using Australian Holstein-Friesian cattle data from 2727 and 845 animals genotyped with 50K and 800K SNP chip, respectively. Animals were divided into reference and test sets (genotyped with higher and lower density SNP panels, respectively) for evaluating the accuracies of imputation. For the accuracy of genomic selection, a comparison of direct genetic values (DGV) was made by dividing the data into training and validation sets under a range of imputation scenarios. Results Of the three methods compared for imputation, IMPUTE2 outperformed Beagle and fastPhase for almost all scenarios. Higher SNP densities in the test animals, larger reference sets and higher relatedness between test and reference animals increased the accuracy of imputation. 50K specific genotypes were imputed with moderate allelic error rates from 15K (2.85%) and 25K (2.75%) genotypes. Using IMPUTE2, SNP genotypes up to 800K were imputed with low allelic error rate (0.79% genome-wide) from 50K genotypes, and with moderate error rate from 3K (4.78%) and 7K (2.00%) genotypes. The error rate of imputing up to 800K from 3K or 7K was further reduced when an additional middle tier of 50K genotypes was incorporated in a 3-tiered framework. Accuracies of DGV for five production traits using imputed 50K genotypes were close to those obtained with the actual 50K genotypes and higher compared to using 3K or 7K genotypes. The loss in accuracy of DGV was small when most of the training animals also had imputed (50K) genotypes. Additional gains in DGV accuracies were small when SNP densities increased from 50K to imputed 800K. Conclusion Population-based genotype imputation can be used to predict and combine genotypes from different low, medium and high-density SNP chips with a high level of accuracy. Imputing genotypes from low-density SNP panels to at least 50K SNP density increases the accuracy of genomic selection.
Collapse
Affiliation(s)
- Mehar S Khatkar
- Reprogen - Animal Bioscience, Faculty of Veterinary Science, University of Sydney, 425 Werombi Road, Camden, NSW 2570, Australia.
| | | | | | | |
Collapse
|
55
|
Segelke D, Chen J, Liu Z, Reinhardt F, Thaller G, Reents R. Reliability of genomic prediction for German Holsteins using imputed genotypes from low-density chips. J Dairy Sci 2012; 95:5403-5411. [DOI: 10.3168/jds.2012-5466] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2012] [Accepted: 05/23/2012] [Indexed: 02/05/2023]
|
56
|
de los Campos G, Klimentidis YC, Vazquez AI, Allison DB. Prediction of expected years of life using whole-genome markers. PLoS One 2012; 7:e40964. [PMID: 22848416 PMCID: PMC3405107 DOI: 10.1371/journal.pone.0040964] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2011] [Accepted: 06/15/2012] [Indexed: 01/27/2023] Open
Abstract
Genetic factors are believed to account for 25% of the interindividual differences in Years of Life (YL) among humans. However, the genetic loci that have thus far been found to be associated with YL explain a very small proportion of the expected genetic variation in this trait, perhaps reflecting the complexity of the trait and the limitations of traditional association studies when applied to traits affected by a large number of small-effect genes. Using data from the Framingham Heart Study and statistical methods borrowed largely from the field of animal genetics (whole-genome prediction, WGP), we developed a WGP model for the study of YL and evaluated the extent to which thousands of genetic variants across the genome examined simultaneously can be used to predict interindividual differences in YL. We find that a sizable proportion of differences in YL--which were unexplained by age at entry, sex, smoking and BMI--can be accounted for and predicted using WGP methods. The contribution of genomic information to prediction accuracy was even higher than that of smoking and body mass index (BMI) combined; two predictors that are considered among the most important life-shortening factors. We evaluated the impacts of familial relationships and population structure (as described by the first two marker-derived principal components) and concluded that in our dataset population structure explained partially, but not fully the gains in prediction accuracy obtained with WGP. Further inspection of prediction accuracies by age at death indicated that most of the gains in predictive ability achieved with WGP were due to the increased accuracy of prediction of early mortality, perhaps reflecting the ability of WGP to capture differences in genetic risk to deadly diseases such as cancer, which are most often responsible for early mortality in our sample.
Collapse
Affiliation(s)
- Gustavo de los Campos
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America.
| | | | | | | |
Collapse
|
57
|
An ensemble-based approach to imputation of moderate-density genotypes for genomic selection with application to Angus cattle. Genet Res (Camb) 2012; 94:133-50. [PMID: 22809677 DOI: 10.1017/s001667231200033x] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open
Abstract
Summary Imputation of moderate-density genotypes from low-density panels is of increasing interest in genomic selection, because it can dramatically reduce genotyping costs. Several imputation software packages have been developed, but they vary in imputation accuracy, and imputed genotypes may be inconsistent among methods. An AdaBoost-like approach is proposed to combine imputation results from several independent software packages, i.e. Beagle(v3.3), IMPUTE(v2.0), fastPHASE(v1.4), AlphaImpute, findhap(v2) and Fimpute(v2), with each package serving as a basic classifier in an ensemble-based system. The ensemble-based method computes weights sequentially for all classifiers, and combines results from component methods via weighted majority 'voting' to determine unknown genotypes. The data included 3078 registered Angus cattle, each genotyped with the Illumina BovineSNP50 BeadChip. SNP genotypes on three chromosomes (BTA1, BTA16 and BTA28) were used to compare imputation accuracy among methods, and the application involved the imputation of 50K genotypes covering 29 chromosomes based on a set of 5K genotypes. Beagle and Fimpute had the greatest accuracy among the six imputation packages, which ranged from 0·8677 to 0·9858. The proposed ensemble method was better than any of these packages, but the sequence of independent classifiers in the voting scheme affected imputation accuracy. The ensemble systems yielding the best imputation accuracies were those that had Beagle as first classifier, followed by one or two methods that utilized pedigree information. A salient feature of the proposed ensemble method is that it can solve imputation inconsistencies among different imputation methods, hence leading to a more reliable system for imputing genotypes relative to independent methods.
Collapse
|
58
|
Mulder HA, Calus MPL, Druet T, Schrooten C. Imputation of genotypes with low-density chips and its effect on reliability of direct genomic values in Dutch Holstein cattle. J Dairy Sci 2012; 95:876-89. [PMID: 22281352 DOI: 10.3168/jds.2011-4490] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2011] [Accepted: 10/25/2011] [Indexed: 11/19/2022]
Abstract
Genomic selection using 50,000 single nucleotide polymorphism (50k SNP) chips has been implemented in many dairy cattle breeding programs. Cheap, low-density chips make genotyping of a larger number of animals cost effective. A commonly proposed strategy is to impute low-density genotypes up to 50,000 genotypes before predicting direct genomic values (DGV). The objectives of this study were to investigate the accuracy of imputation for animals genotyped with a low-density chip and to investigate the effect of imputation on reliability of DGV. Low-density chips contained 384, 3,000, or 6,000 SNP. The SNP were selected based either on the highest minor allele frequency in a bin or the middle SNP in a bin, and DAGPHASE, CHROMIBD, and multivariate BLUP were used for imputation. Genotypes of 9,378 animals were used, from which approximately 2,350 animals had deregressed proofs. Bayesian stochastic search variable selection was used for estimating SNP effects of the 50k chip. Imputation accuracies and imputation error rates were poor for low-density chips with 384 SNP. Imputation accuracies were higher with 3,000 and 6,000 SNP. Performance of DAGPHASE and CHROMIBD was very similar and much better than that of multivariate BLUP for both imputation accuracy and reliability of DGV. With 3,000 SNP and using CHROMIBD or DAGPHASE for imputation, 84 to 90% of the increase in DGV reliability using the 50k chip, compared with a pedigree index, was obtained. With multivariate BLUP, the increase in reliability was only 40%. With 384 SNP, the reliability of DGV was lower than for a pedigree index, whereas with 6,000 SNP, about 93% of the increase in reliability of DGV based on the 50k chip was obtained when using DAGPHASE for imputation. Using genotype probabilities to predict gene content increased imputation accuracy and the reliability of DGV and is therefore recommended for applications of imputation for genomic prediction. A deterministic equation was derived to predict accuracy of DGV based on imputation accuracy, which fitted closely with the observed relationship. The deterministic equation can be used to evaluate the effect of differences in imputation accuracy on accuracy and reliability of DGV.
Collapse
Affiliation(s)
- H A Mulder
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, PO Box 65, 8200 AB Lelystad, The Netherlands.
| | | | | | | |
Collapse
|
59
|
Maltecca C, Parker KL, Cassady JP. Application of multiple shrinkage methods to genomic predictions1. J Anim Sci 2012; 90:1777-87. [DOI: 10.2527/jas.2011-4350] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Affiliation(s)
- Christian Maltecca
- Department of Animal Science, North Carolina State University, Raleigh, 27695
| | - Kristen L. Parker
- Department of Animal Science, North Carolina State University, Raleigh, 27695
| | - Joseph P. Cassady
- Department of Animal Science, North Carolina State University, Raleigh, 27695
| |
Collapse
|
60
|
Imputation of genotypes from low- to high-density genotyping platforms and implications for genomic selection. Animal 2012; 5:1162-9. [PMID: 22440168 DOI: 10.1017/s1751731111000309] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open
Abstract
The objective of this study was to quantify the accuracy achievable from imputing genotypes from a commercially available low-density marker panel (2730 single nucleotide polymorphisms (SNPs) following edits) to a commercially available higher density marker panel (51 602 SNPs following edits) in Holstein-Friesian cattle using Beagle, a freely available software package. A population of 764 Holstein-Friesian animals born since 2006 were used as the test group to quantify the accuracy of imputation, all of which had genotypes for the high-density panel; only SNPs on the low-density panel were retained with the remaining SNPs to be imputed. The reference population for imputation consisted of 4732 animals born before 2006 also with genotypes on the higher density marker panel. The concordance between the actual and imputed genotypes in the test group of animals did not vary across chromosomes and was on average 95%; the concordance between actual and imputed alleles was, on average, 97% across all SNPs. Genomic predictions were undertaken across a range of production and functional traits for the 764 test group animals using either their real or imputed genotypes. Little or no mean difference in the genomic predictions was evident when comparing direct genomic values (DGVs) using real or imputed genotypes. The average correlation between the DGVs estimated using the real or imputed genotypes for the 15 traits included in the Irish total merit index was 0.97 (range of 0.92 to 0.99), indicating good concordance between proofs from real or imputed genotypes. Results show that a commercially available high-density marker panel can be imputed from a commercially available lower density marker panel, which will also have a lower cost, thereby facilitating a reduction in the cost of genomic selection. Increased available numbers of genotyped and phenotyped animals also has implications for increasing the accuracy of genomic prediction in the entire population and thus genetic gain using genomic selection.
Collapse
|
61
|
Use of partial least squares regression to predict single nucleotide polymorphism marker genotypes when some animals are genotyped with a low-density panel. Animal 2012; 5:833-7. [PMID: 22440021 DOI: 10.1017/s1751731110002600] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022] Open
Abstract
High-density single nucleotide polymorphism (SNP) platforms are currently used in genomic selection (GS) programs to enhance the selection response. However, the genotyping of a large number of animals with high-throughput platforms is rather expensive and may represent a constraint for a large-scale implementation of GS. The use of low-density marker (LDM) platforms could overcome this problem, but different SNP chips may be required for each trait and/or breed. In this study, a strategy of imputation independent from trait and breed is proposed. A simulated population of 5865 individuals with a genome of 6000 SNP equally distributed on six chromosomes was considered. First, reference and prediction populations were generated by mimicking high- and low-density SNP platforms, respectively. Then, the partial least squares regression (PLSR) technique was applied to reconstruct the missing SNP in the low-density chip. The proportion of SNP correctly reconstructed by the PLSR method ranged from 0.78 to 0.97 when 90% and 50%, respectively, of genotypes were predicted. Moreover, data sets consisting of a mixture of actual and PLSR-predicted SNP or only actual SNP were used to predict genomic breeding values (GEBVs). Correlations between GEBV and true breeding values varied from 0.74 to 0.76, respectively. The results of the study indicate that the PLSR technique can be considered a reliable computational strategy for predicting SNP genotypes in an LDM platform with reasonable accuracy.
Collapse
|
62
|
Pérez-Cabal MA, Vazquez AI, Gianola D, Rosa GJM, Weigel KA. Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts. Front Genet 2012; 3:27. [PMID: 22403583 PMCID: PMC3288819 DOI: 10.3389/fgene.2012.00027] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2011] [Accepted: 02/13/2012] [Indexed: 11/26/2022] Open
Abstract
The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individuals (RAN) or from a kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced similar accuracies to the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may depend also on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, which can be compensated by half-sibs or grandsires in the case of lack of close relatives. However, for the low heritability traits the inclusion of close relatives is crucial and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models, e.g., within or between family predictions, or within or across generation predictions, such that estimation of predictive ability is consistent with the actual application to be considered.
Collapse
|
63
|
A simple method for genomic selection of moderately sized dairy cattle populations. Animal 2012; 6:193-202. [DOI: 10.1017/s1751731111001704] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
|
64
|
Long N, Gianola D, Rosa GJM, Weigel KA. Application of support vector regression to genome-assisted prediction of quantitative traits. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2011; 123:1065-1074. [PMID: 21739137 DOI: 10.1007/s00122-011-1648-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/14/2011] [Accepted: 06/22/2011] [Indexed: 05/31/2023]
Abstract
A byproduct of genome-wide association studies is the possibility of carrying out genome-enabled prediction of disease risk or of quantitative traits. This study is concerned with predicting two quantitative traits, milk yield in dairy cattle and grain yield in wheat, using dense molecular markers as predictors. Two support vector regression (SVR) models, ε-SVR and least-squares SVR, were explored and compared to a widely applied linear regression model, the Bayesian Lasso, the latter assuming additive marker effects. Predictive performance was measured using predictive correlation and mean squared error of prediction. Depending on the kernel function chosen, SVR can model either linear or nonlinear relationships between phenotypes and marker genotypes. For milk yield, where phenotypes were estimated breeding values of bulls (a linear combination of the data), SVR with a Gaussian radial basis function (RBF) kernel had a slightly better performance than with a linear kernel, and was similar to the Bayesian Lasso. For the wheat data, where phenotype was raw grain yield, the RBF kernel provided clear advantages over the linear kernel, e.g., a 17.5% increase in correlation when using the ε-SVR. SVR with a RBF kernel also compared favorably to the Bayesian Lasso in this case. It is concluded that a nonlinear RBF kernel may be an optimal choice for SVR, especially when phenotypes to be predicted have a nonlinear dependency on genotypes, as it might have been the case in the wheat data.
Collapse
Affiliation(s)
- Nanye Long
- Department of Animal Sciences, University of Wisconsin, 1675 Observatory Dr., Animal Science Bldg, Madison, WI 53706, USA.
| | | | | | | |
Collapse
|
65
|
Zhang Z, Ding X, Liu J, Zhang Q, de Koning DJ. Accuracy of genomic prediction using low-density marker panels. J Dairy Sci 2011; 94:3642-50. [PMID: 21700054 DOI: 10.3168/jds.2010-3917] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2010] [Accepted: 03/15/2011] [Indexed: 12/26/2022]
Abstract
Genomic selection has been widely implemented in national and international genetic evaluation in the dairy cattle industry, because of its potential advantages over traditional selection methods and the availability of commercial high-density (HD) single nucleotide polymorphism (SNP) panels. However, this method may not be cost-effective for cow selection and for other livestock species, because the cost of HD SNP panels is still relatively high. One possible solution that can enable other species to benefit from this promising method is genomic selection with low-density (LD) SNP panels. In this simulation study, LD SNP panels designed with different strategies and different SNP densities were compared. The effects of number of quantitative trait loci, heritability, and effective population size were evaluated in the framework of genomic selection with LD SNP panels. Methodologies of Bayesian variable selection; BLUP with a trait-specific, marker-derived relationship matrix; and BLUP with a realized relationship matrix were employed to predict genomic estimated breeding values with both HD and LD SNP panels. Up to 95% of accuracy obtained by using an HD panel can be obtained by using only a small proportion of markers. The LD panel with markers selected on the basis of their effects always performs better than the LD panel with evenly spaced markers. Both the genetic architecture of the trait and the effective population size have a significant effect on the performance of the LD panels. We concluded that, to implement genomic selection with LD panels, a training population of sufficient size and genotyped with an HD panel is necessary. The trade-off between the LD panels with evenly spaced markers and selected markers must be considered, which depends on the number of target traits in a breeding program and the genetic architecture of these traits. Genomic selection with LD panels could be feasible and cost-effective, though before implementation, a further detailed genetic and economic analysis is recommended.
Collapse
Affiliation(s)
- Z Zhang
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
| | | | | | | | | |
Collapse
|
66
|
|
67
|
Abstract
Background Accurate prediction of genomic breeding values (GEBVs) requires numerous markers. However, predictive accuracy can be enhanced by excluding markers with no effects or with inconsistent effects among crosses that can adversely affect the prediction of GEBVs. Methods We present three different approaches for pre-selecting markers prior to predicting GEBVs using four different BLUP methods, including ridge regression and three spatial models. Performances of the models were evaluated using 5-fold cross-validation. Results and conclusions Ridge regression and the spatial models gave essentially similar fits. Pre-selecting markers was evidently beneficial since excluding markers with inconsistent effects among crosses increased the correlation between GEBVs and true breeding values of the non-phenotyped individuals from 0.607 (using all markers) to 0.625 (using pre-selected markers). Moreover, extension of the ridge regression model to allow for heterogeneous variances between the most significant subset and the complementary subset of pre-selected markers increased predictive accuracy (from 0.625 to 0.648) for the simulated dataset for the QTL-MAS 2010 workshop.
Collapse
|
68
|
Weller JI, Ron M. Invited review: quantitative trait nucleotide determination in the era of genomic selection. J Dairy Sci 2011; 94:1082-90. [PMID: 21338774 DOI: 10.3168/jds.2010-3793] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2010] [Accepted: 11/09/2010] [Indexed: 12/24/2022]
Abstract
Genome-wide association studies based on tens of thousands of single nucleotide polymorphisms have been completed for several dairy cattle populations. Methods have been proposed to directly incorporate genome scan data into breeding programs, chiefly by selection of young sires based on their genotypes for the genetic markers and pedigree without progeny test. Thus, the rate of genetic gain is increased by reduction of the mean generation interval. The methods developed so far for application of genomic selection do not require identification of the actual quantitative trait nucleotides (QTN) responsible for the observed variation of quantitative trait loci (QTL). To date, 2 QTN affecting milk production traits have been detected in dairy cattle: DGAT1 and ABCG2. This review will attempt to address the following questions based on the current state of bovine genomics and statistics. What are the pros and cons for QTN determination? How can data obtained from high-density, genome-wide scans be used most efficiently for QTN determination? Can the genome scan results already available and next-generation sequencing data be used to determine QTN? Should QTN be treated differently than markers at linkage disequilibrium with QTL in genetic evaluation programs? Data obtained by genome-wide association studies can be used to deduce QTL genotypes of sires via application of the a posteriori granddaughter design for concordance testing of putative QTN. This, together with next-generation sequencing technology, will dramatically reduce costs for QTN determination. By complete genome sequencing of 21 sires with many artificial insemination sons, it should be possible to determine concordance for all potential QTN, thus establishing the field of QTNomics.
Collapse
Affiliation(s)
- J I Weller
- Institute of Animal Sciences, A.R.O., The Volcani Center, Bet Dagan 50250, Israel.
| | | |
Collapse
|
69
|
Makowsky R, Pajewski NM, Klimentidis YC, Vazquez AI, Duarte CW, Allison DB, de los Campos G. Beyond missing heritability: prediction of complex traits. PLoS Genet 2011; 7:e1002051. [PMID: 21552331 PMCID: PMC3084207 DOI: 10.1371/journal.pgen.1002051] [Citation(s) in RCA: 210] [Impact Index Per Article: 16.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2010] [Accepted: 03/02/2011] [Indexed: 01/25/2023] Open
Abstract
Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has unfortunately resulted in limited application of genetic data towards preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the "missing heritability" for human height was statistically explained by modeling thousands of single nucleotide polymorphisms concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h(2) up to 0.83, R(2) up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36 depending on the degree of familial information used in the training dataset). While such R(2) values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (∼ 0.80), substantial room for improvement remains.
Collapse
Affiliation(s)
- Robert Makowsky
- Department of Biostatistics, University of Alabama at Birmingham, Alabama, United States of America.
| | | | | | | | | | | | | |
Collapse
|
70
|
Long N, Gianola D, Rosa G, Weigel K. Dimension reduction and variable selection for genomic selection: application to predicting milk yield in Holsteins. J Anim Breed Genet 2011; 128:247-57. [DOI: 10.1111/j.1439-0388.2011.00917.x] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
|
71
|
VanRaden PM, O'Connell JR, Wiggans GR, Weigel KA. Genomic evaluations with many more genotypes. Genet Sel Evol 2011; 43:10. [PMID: 21366914 PMCID: PMC3056758 DOI: 10.1186/1297-9686-43-10] [Citation(s) in RCA: 180] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2010] [Accepted: 03/02/2011] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Genomic evaluations in Holstein dairy cattle have quickly become more reliable over the last two years in many countries as more animals have been genotyped for 50,000 markers. Evaluations can also include animals genotyped with more or fewer markers using new tools such as the 777,000 or 2,900 marker chips recently introduced for cattle. Gains from more markers can be predicted using simulation, whereas strategies to use fewer markers have been compared using subsets of actual genotypes. The overall cost of selection is reduced by genotyping most animals at less than the highest density and imputing their missing genotypes using haplotypes. Algorithms to combine different densities need to be efficient because numbers of genotyped animals and markers may continue to grow quickly. METHODS Genotypes for 500,000 markers were simulated for the 33,414 Holsteins that had 50,000 marker genotypes in the North American database. Another 86,465 non-genotyped ancestors were included in the pedigree file, and linkage disequilibrium was generated directly in the base population. Mixed density datasets were created by keeping 50,000 (every tenth) of the markers for most animals. Missing genotypes were imputed using a combination of population haplotyping and pedigree haplotyping. Reliabilities of genomic evaluations using linear and nonlinear methods were compared. RESULTS Differing marker sets for a large population were combined with just a few hours of computation. About 95% of paternal alleles were determined correctly, and > 95% of missing genotypes were called correctly. Reliability of breeding values was already high (84.4%) with 50,000 simulated markers. The gain in reliability from increasing the number of markers to 500,000 was only 1.6%, but more than half of that gain resulted from genotyping just 1,406 young bulls at higher density. Linear genomic evaluations had reliabilities 1.5% lower than the nonlinear evaluations with 50,000 markers and 1.6% lower with 500,000 markers. CONCLUSIONS Methods to impute genotypes and compute genomic evaluations were affordable with many more markers. Reliabilities for individual animals can be modified to reflect success of imputation. Breeders can improve reliability at lower cost by combining marker densities to increase both the numbers of markers and animals included in genomic evaluation. Larger gains are expected from increasing the number of animals than the number of markers.
Collapse
Affiliation(s)
- Paul M VanRaden
- Animal Improvement Programs Laboratory, USDA, Building 5 BARC-West, Beltsville, MD 20705-2350, USA.
| | | | | | | |
Collapse
|
72
|
Vazquez AI, Rosa GJM, Weigel KA, de los Campos G, Gianola D, Allison DB. Predictive ability of subsets of single nucleotide polymorphisms with and without parent average in US Holsteins. J Dairy Sci 2011; 93:5942-9. [PMID: 21094768 DOI: 10.3168/jds.2010-3335] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2010] [Accepted: 08/12/2010] [Indexed: 01/15/2023]
Abstract
Genome-enabled prediction of breeding values using high-density panels (HDP) can be highly accurate, even for young sires. However, the cost of the assay may limit its use to elite animals only. Low-density panels (LDP) containing a subset of single nucleotide polymorphisms (SNP) may give reasonably accurate predictions and could be used cost-effectively with young males and females. This study evaluates strategies for selecting subsets of SNP for several traits, compares predictive ability of LDP with that of HDP, and assesses the benefits of including parent average (PA) as a predictor in models using LDP. Data consisting of progeny-test predicted transmitting ability (PTA) for net merit and 6 other traits of economic interest from 4,783 Holstein sires were evaluated using testing and training sets with regressions on their high-density genotypes and parent averages for net merit index. Additionally, SNP subsets of different sizes were selected using different strategies, including the "best" SNP based on the absolute values of their estimated effects from HDP models for either the trait itself or lifetime net merit, and evenly spaced (ES) SNP across the genome. Overall, HDP models had the best predictive ability, setting an upper bound for the predictive ability of LDP sets. Low-density panels targeting the SNP with strongest effects (for either a single trait or lifetime net merit) provided reasonably accurate predictions and generally outperformed predictions based on evenly spaced SNP. For example, evenly spaced sets would require at least 5,000 to 7,500 SNP to reach 95% of the predictive ability provided by HDP. On the other hand, this level of predictive ability can be achieved with sets of 2,000 SNP when SNP are selected based on magnitude of estimated effects for the trait. Accuracy of predictions based on LDP can be improved markedly by including parent average as a fixed effect in the model; for example, a set with the 1,000 best SNP using the parent average achieved the 95% of the accuracy of a HDP model.
Collapse
Affiliation(s)
- A I Vazquez
- Department of Dairy Science, University of Wisconsin, Madison 53706, USA.
| | | | | | | | | | | |
Collapse
|
73
|
Zhang Z, Druet T. Marker imputation with low-density marker panels in Dutch Holstein cattle. J Dairy Sci 2011; 93:5487-94. [PMID: 20965364 DOI: 10.3168/jds.2010-3501] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2010] [Accepted: 08/13/2010] [Indexed: 11/19/2022]
Abstract
The availability of high-density bovine genotyping arrays made implementation of genomic selection possible in dairy cattle. Development of low-density single nucleotide polymorphism (SNP) panels will allow the extension of genomic selection to a larger portion of the population. Prediction of ungenotyped markers, called imputation, is a strategy that allows using the same low-density chips for all traits (and for different breeds). In the present study, we evaluated the accuracy of imputation with low-density genotyping arrays in the Dutch Holstein population. Five different sizes of genotyping arrays were tested, from 384 to 6,000 SNP. According to marker density, the overall allelic imputation error rate obtained with the program DAGPHASE, which relies on linkage disequilibrium and linkage, ranged from 11.7 to 2.0%, and that obtained with the program CHROMIBD, which relies on linkage and the set of all genotyped ancestors, ranged from 10.7 to 3.3%. However, imputation efficiency was influenced by the relationship between low-density and high-density genotyped animals. Animals with both parents genotyped had particularly low imputation error rates: <1% with 1,500 SNP or more. In summary, missing marker alleles can be predicted with 3 to 4% errors with approximately 1 SNP/Mb (approximately 3,000 markers). The CHROMIBD program proved more efficient than DAGPHASE only at lower marker densities or when several genotyped ancestors were available. Future studies are required to measure the effect of these imputation error rates on accuracy of genomic selection with low-density SNP panels.
Collapse
Affiliation(s)
- Z Zhang
- Faculty of Veterinary Medicine, University of Liège, Liège, Belgium
| | | |
Collapse
|
74
|
Abstract
Empirical experience with genomic selection in dairy cattle suggests that the distribution of the effects of single nucleotide polymorphisms (SNPs) might be far from normality for some traits. An alternative, avoiding the use of arbitrary prior information, is the Bayesian Lasso (BL). Regular BL uses a common variance parameter for residual and SNP effects (BL1Var). We propose here a BL with different residual and SNP effect variances (BL2Var), equivalent to the original Lasso formulation. The λ parameter in Lasso is related to genetic variation in the population. We also suggest precomputing individual variances of SNP effects by BL2Var, to be later used in a linear mixed model (HetVar-GBLUP). Models were tested in a cross-validation design including 1756 Holstein and 678 Montbéliarde French bulls, with 1216 and 451 bulls used as training data; 51 325 and 49 625 polymorphic SNP were used. Milk production traits were tested. Other methods tested included linear mixed models using variances inferred from pedigree estimates or integrated out from the data. Estimates of genetic variation in the population were close to pedigree estimates in BL2Var but not in BL1Var. BL1Var shrank breeding values too little because of the common variance. BL2Var was the most accurate method for prediction and accommodated well major genes, in particular for fat percentage. BL1Var was the least accurate. HetVar-GBLUP was almost as accurate as BL2Var and allows for simple computations and extensions.
Collapse
|
75
|
L2-Boosting algorithm applied to high-dimensional problems in genomic selection. Genet Res (Camb) 2010; 92:227-37. [PMID: 20667166 DOI: 10.1017/s0016672310000261] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
Abstract
The L(2)-Boosting algorithm is one of the most promising machine-learning techniques that has appeared in recent decades. It may be applied to high-dimensional problems such as whole-genome studies, and it is relatively simple from a computational point of view. In this study, we used this algorithm in a genomic selection context to make predictions of yet to be observed outcomes. Two data sets were used: (1) productive lifetime predicted transmitting abilities from 4702 Holstein sires genotyped for 32 611 single nucleotide polymorphisms (SNPs) derived from the Illumina BovineSNP50 BeadChip, and (2) progeny averages of food conversion rate, pre-corrected by environmental and mate effects, in 394 broilers genotyped for 3481 SNPs. Each of these data sets was split into training and testing sets, the latter comprising dairy or broiler sires whose ancestors were in the training set. Two weak learners, ordinary least squares (OLS) and non-parametric (NP) regression were used for the L2-Boosting algorithm, to provide a stringent evaluation of the procedure. This algorithm was compared with BL [Bayesian LASSO (least absolute shrinkage and selection operator)] and BayesA regression. Learning tasks were carried out in the training set, whereas validation of the models was performed in the testing set. Pearson correlations between predicted and observed responses in the dairy cattle (broiler) data set were 0.65 (0.33), 0.53 (0.37), 0.66 (0.26) and 0.63 (0.27) for OLS-Boosting, NP-Boosting, BL and BayesA, respectively. The smallest bias and mean-squared errors (MSEs) were obtained with OLS-Boosting in both the dairy cattle (0.08 and 1.08, respectively) and broiler (-0.011 and 0.006) data sets, respectively. In the dairy cattle data set, the BL was more accurate (bias=0.10 and MSE=1.10) than BayesA (bias=1.26 and MSE=2.81), whereas no differences between these two methods were found in the broiler data set. L2-Boosting with a suitable learner was found to be a competitive alternative for genomic selection applications, providing high accuracy and low bias in genomic-assisted evaluations with a relatively short computational time.
Collapse
|
76
|
de los Campos G, Gianola D, Allison DB. Predicting genetic predisposition in humans: the promise of whole-genome markers. Nat Rev Genet 2010; 11:880-6. [DOI: 10.1038/nrg2898] [Citation(s) in RCA: 211] [Impact Index Per Article: 15.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
|
77
|
Weigel K, de los Campos G, Vazquez A, Rosa G, Gianola D, Van Tassell C. Accuracy of direct genomic values derived from imputed single nucleotide polymorphism genotypes in Jersey cattle. J Dairy Sci 2010; 93:5423-35. [DOI: 10.3168/jds.2010-3149] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2010] [Accepted: 07/05/2010] [Indexed: 01/22/2023]
|
78
|
Weigel KA, Van Tassell CP, O'Connell JR, VanRaden PM, Wiggans GR. Prediction of unobserved single nucleotide polymorphism genotypes of Jersey cattle using reference panels and population-based imputation algorithms. J Dairy Sci 2010; 93:2229-38. [PMID: 20412938 DOI: 10.3168/jds.2009-2849] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2009] [Accepted: 01/04/2010] [Indexed: 12/30/2022]
Abstract
The availability of dense single nucleotide polymorphism (SNP) genotypes for dairy cattle has created exciting research opportunities and revolutionized practical breeding programs. Broader application of this technology will lead to situations in which genotypes from different low-, medium-, or high-density platforms must be combined. In this case, missing SNP genotypes can be imputed using family- or population-based algorithms. Our objective was to evaluate the accuracy of imputation in Jersey cattle, using reference panels comprising 2,542 animals with 43,385 SNP genotypes and study samples of 604 animals for which genotypes were available for 1, 2, 5, 10, 20, 40, or 80% of loci. Two population-based algorithms, fastPHASE 1.2 (P. Scheet and M. Stevens; University of Washington TechTransfer Digital Ventures Program, Seattle, WA) and IMPUTE 2.0 (B. Howie and J. Marchini; Department of Statistics, University of Oxford, UK), were used to impute genotypes on Bos taurus autosomes 1, 15, and 28. The mean proportion of genotypes imputed correctly ranged from 0.659 to 0.801 when 1 to 2% of genotypes were available in the study samples, from 0.733 to 0.964 when 5 to 20% of genotypes were available, and from 0.896 to 0.995 when 40 to 80% of genotypes were available. In the absence of pedigrees or genotypes of close relatives, the accuracy of imputation may be modest (generally <0.80) when low-density platforms with fewer than 1,000 SNP are used, but population-based algorithms can provide reasonably good accuracy (0.80 to 0.95) when medium-density platforms of 2,000 to 4,000 SNP are used in conjunction with high-density genotypes (e.g., >40,000 SNP) from a reference population. Accurate imputation of high-density genotypes from inexpensive low- or medium-density platforms could greatly enhance the efficiency of whole-genome selection programs in dairy cattle.
Collapse
Affiliation(s)
- K A Weigel
- Department of Dairy Science, University of Wisconsin, Madison, Wisconsin 53706, USA.
| | | | | | | | | |
Collapse
|
79
|
Moser G, Khatkar MS, Hayes BJ, Raadsma HW. Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers. Genet Sel Evol 2010; 42:37. [PMID: 20950478 PMCID: PMC2964565 DOI: 10.1186/1297-9686-42-37] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2010] [Accepted: 10/16/2010] [Indexed: 08/26/2023] Open
Abstract
Background At the current price, the use of high-density single nucleotide polymorphisms (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI). Methods Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subset of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length. Results RR and PLSR performed very similarly to predict DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls. Conclusions Accurate genomic evaluation of the broader bull and cow population can be achieved with a single genotyping assays containing ~ 3,000 to 5,000 evenly spaced SNP.
Collapse
Affiliation(s)
- Gerhard Moser
- Dairy Futures Cooperative Research Centre, Australia.
| | | | | | | |
Collapse
|
80
|
Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 2010; 186:713-24. [PMID: 20813882 DOI: 10.1534/genetics.110.118521] [Citation(s) in RCA: 402] [Impact Index Per Article: 28.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
The availability of dense molecular markers has made possible the use of genomic selection (GS) for plant breeding. However, the evaluation of models for GS in real plant populations is very limited. This article evaluates the performance of parametric and semiparametric models for GS using wheat (Triticum aestivum L.) and maize (Zea mays) data in which different traits were measured in several environmental conditions. The findings, based on extensive cross-validations, indicate that models including marker information had higher predictive ability than pedigree-based models. In the wheat data set, and relative to a pedigree model, gains in predictive ability due to inclusion of markers ranged from 7.7 to 35.7%. Correlation between observed and predictive values in the maize data set achieved values up to 0.79. Estimates of marker effects were different across environmental conditions, indicating that genotype × environment interaction is an important component of genetic variability. These results indicate that GS in plant breeding can be an effective strategy for selecting among lines whose phenotypes have yet to be observed.
Collapse
|
81
|
Bayesian inference of genetic parameters based on conditional decompositions of multivariate normal distributions. Genetics 2010; 185:645-54. [PMID: 20351218 DOI: 10.1534/genetics.110.114249] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
It is widely recognized that the mixed linear model is an important tool for parameter estimation in the analysis of complex pedigrees, which includes both pedigree and genomic information, and where mutually dependent genetic factors are often assumed to follow multivariate normal distributions of high dimension. We have developed a Bayesian statistical method based on the decomposition of the multivariate normal prior distribution into products of conditional univariate distributions. This procedure permits computationally demanding genetic evaluations of complex pedigrees, within the user-friendly computer package WinBUGS. To demonstrate and evaluate the flexibility of the method, we analyzed two example pedigrees: a large noninbred pedigree of Scots pine (Pinus sylvestris L.) that includes additive and dominance polygenic relationships and a simulated pedigree where genomic relationships have been calculated on the basis of a dense marker map. The analysis showed that our method was fast and provided accurate estimates and that it should therefore be a helpful tool for estimating genetic parameters of complex pedigrees quickly and reliably.
Collapse
|
82
|
Garrick DJ, Taylor JF, Fernando RL. Deregressing estimated breeding values and weighting information for genomic regression analyses. Genet Sel Evol 2009; 41:55. [PMID: 20043827 PMCID: PMC2817680 DOI: 10.1186/1297-9686-41-55] [Citation(s) in RCA: 417] [Impact Index Per Article: 27.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2009] [Accepted: 12/31/2009] [Indexed: 11/15/2022] Open
Abstract
Background Genomic prediction of breeding values involves a so-called training analysis that predicts the influence of small genomic regions by regression of observed information on marker genotypes for a given population of individuals. Available observations may take the form of individual phenotypes, repeated observations, records on close family members such as progeny, estimated breeding values (EBV) or their deregressed counterparts from genetic evaluations. The literature indicates that researchers are inconsistent in their approach to using EBV or deregressed data, and as to using the appropriate methods for weighting some data sources to account for heterogeneous variance. Methods A logical approach to using information for genomic prediction is introduced, which demonstrates the appropriate weights for analyzing observations with heterogeneous variance and explains the need for and the manner in which EBV should have parent average effects removed, be deregressed and weighted. Results An appropriate deregression for genomic regression analyses is EBV/r2 where EBV excludes parent information and r2 is the reliability of that EBV. The appropriate weights for deregressed breeding values are neither the reliability nor the prediction error variance, two alternatives that have been used in published studies, but the ratio (1 - h2)/[(c + (1 - r2)/r2)h2] where c > 0 is the fraction of genetic variance not explained by markers. Conclusions Phenotypic information on some individuals and deregressed data on others can be combined in genomic analyses using appropriate weighting.
Collapse
Affiliation(s)
- Dorian J Garrick
- Department of Animal Science, Iowa State University, Ames, IA 50011, USA.
| | | | | |
Collapse
|