1
|
Li C, Yang Q, Liu B, Shi X, Liu Z, Yang C, Wang T, Xiao F, Zhang M, Shi A, Yan L. Ability of Genomic Prediction to Bi-Parent-Derived Breeding Population Using Public Data for Soybean Oil and Protein Content. PLANTS (BASEL, SWITZERLAND) 2024; 13:1260. [PMID: 38732474 PMCID: PMC11085238 DOI: 10.3390/plants13091260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 04/21/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]
Abstract
Genomic selection (GS) is a marker-based selection method used to improve the genetic gain of quantitative traits in plant breeding. A large number of breeding datasets are available in the soybean database, and the application of these public datasets in GS will improve breeding efficiency and reduce time and cost. However, the most important problem to be solved is how to improve the ability of across-population prediction. The objectives of this study were to perform genomic prediction (GP) and estimate the prediction ability (PA) for seed oil and protein contents in soybean using available public datasets to predict breeding populations in current, ongoing breeding programs. In this study, six public datasets of USDA GRIN soybean germplasm accessions with available phenotypic data of seed oil and protein contents from different experimental populations and their genotypic data of single-nucleotide polymorphisms (SNPs) were used to perform GP and to predict a bi-parent-derived breeding population in our experiment. The average PA was 0.55 and 0.50 for seed oil and protein contents within the bi-parent population according to within-population prediction, and 0.45 for oil and 0.39 for protein content when the six USDA populations were combined and employed as training sets to predict the bi-parent-derived population. The results showed that four USDA-cultivated populations can be used as a training set individually or combined to predict oil and protein contents in GS when using 800 or more USDA germplasm accessions as a training set. The smaller the genetic distance between training population and testing population, the higher the PA. The PA increased as the population size increased. In across-population prediction, no significant difference was observed in PA for oil and protein content among different models. The PA increased as the SNP number increased until the marker set reached 10,000 SNPs. 
This study provides reasonable suggestions and methods for breeders to utilize public datasets for GS. It will aid breeders in developing GS-assisted breeding strategies to develop elite soybean cultivars with high oil and protein contents.
Affiliation(s)
- Chenhui Li
- College of Life Sciences, Hebei Agricultural University, Baoding 071001, China
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
| | - Qing Yang
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
| | - Bingqiang Liu
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
| | - Xiaolei Shi
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
| | - Zhi Liu
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
| | - Chunyan Yang
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
| | - Tao Wang
- Handan Academy of Agricultural Science, Handan 056001, China
| | - Fuming Xiao
- Handan Academy of Agricultural Science, Handan 056001, China
| | - Mengchen Zhang
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
| | - Ainong Shi
- Department of Horticulture, University of Arkansas, Fayetteville, AR 72701, USA
| | - Long Yan
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
| |
|
2
|
Bermann M, Legarra A, Munera AA, Misztal I, Lourenco D. Confidence intervals for validation statistics with data truncation in genomic prediction. Genet Sel Evol 2024; 56:18. [PMID: 38459504 PMCID: PMC11234739 DOI: 10.1186/s12711-024-00883-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 01/31/2024] [Indexed: 03/10/2024] Open
Abstract
BACKGROUND Validation by data truncation is a common practice in genetic evaluations because of the interest in predicting the genetic merit of a set of young selection candidates. Two of the most used validation methods in genetic evaluations use a single data partition: predictivity or predictive ability (correlation between pre-adjusted phenotypes and estimated breeding values (EBV) divided by the square root of the heritability) and the linear regression (LR) method (comparison of "early" and "late" EBV). Both methods compare predictions with the whole dataset and a partial dataset that is obtained by removing the information related to a set of validation individuals. EBV obtained with the partial dataset are compared against adjusted phenotypes for the predictivity or EBV obtained with the whole dataset in the LR method. Confidence intervals for predictivity and the LR method can be obtained by replicating the validation for different samples (or folds), or bootstrapping. Analytical confidence intervals would be beneficial to avoid running several validations and to test the quality of the bootstrap intervals. However, analytical confidence intervals are unavailable for predictivity and the LR method. RESULTS We derived standard errors and Wald confidence intervals for the predictivity and statistics included in the LR method (bias, dispersion, ratio of accuracies, and reliability). The confidence intervals for the bias, dispersion, and reliability depend on the relationships and prediction error variances and covariances across the individuals in the validation set. We developed approximations for large datasets that only need the reliabilities of the individuals in the validation set. The confidence intervals for the ratio of accuracies and predictivity were obtained through the Fisher transformation. 
We show the adequacy of both the analytical and approximated analytical confidence intervals and compare them with bootstrap confidence intervals using two simulated examples. The analytical confidence intervals were closer to the simulated ones for both examples. Bootstrap confidence intervals tend to be narrower than the simulated ones. The approximated analytical confidence intervals were similar to those obtained by bootstrapping. CONCLUSIONS Estimating the sampling variation of predictivity and the statistics in the LR method without replication or bootstrap is possible for any dataset with the formulas presented in this study.
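The statistics named in this abstract can be illustrated with a minimal Python sketch. It is not the set of formulas derived in the paper. `fisher_ci` is the textbook Fisher-transformation interval for a correlation, which assumes independent pairs and therefore ignores the relationships among validation individuals that the analytical intervals account for. `predictivity` follows the definition given in the abstract, and `lr_statistics` uses the definitions of bias, dispersion, and ratio of accuracies as they are usually stated for the LR method. All function and variable names are hypothetical.

```python
import math
from statistics import NormalDist, mean


def _cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    return _cov(x, y) / math.sqrt(_cov(x, x) * _cov(y, y))


def predictivity(adjusted_phenotypes, ebv, h2):
    """Correlation of pre-adjusted phenotypes with EBV, divided by sqrt(h2)."""
    return pearson(adjusted_phenotypes, ebv) / math.sqrt(h2)


def fisher_ci(r, n, level=0.95):
    """Textbook Fisher-z interval for a correlation r from n independent pairs."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r)                      # Fisher transformation
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - q * se), math.tanh(z + q * se)


def lr_statistics(ebv_partial, ebv_whole):
    """Point estimates comparing 'early' (partial) and 'late' (whole) EBV."""
    return {
        # Difference in mean EBV; expected value 0 when unbiased.
        "bias": mean(ebv_partial) - mean(ebv_whole),
        # Slope of the regression of whole on partial EBV; expected value 1.
        "dispersion": _cov(ebv_whole, ebv_partial) / _cov(ebv_partial, ebv_partial),
        # Correlation between partial and whole EBV.
        "ratio_of_accuracies": pearson(ebv_whole, ebv_partial),
    }
```

For example, `fisher_ci(0.5, 103)` gives an interval of roughly 0.34 to 0.63 under the independence assumption stated above.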
Affiliation(s)
- Matias Bermann
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA.
| | - Andres Legarra
- Council on Dairy Cattle Breeding (CDCB), Bowie, MD, 20716, USA
| | | | - Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
| | - Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
| |
|
3
|
Yan Q, Fruzangohar M, Taylor J, Gong D, Walter J, Norman A, Shi JQ, Coram T. Improved genomic prediction using machine learning with Variational Bayesian sparsity. PLANT METHODS 2023; 19:96. [PMID: 37660084 PMCID: PMC10474716 DOI: 10.1186/s13007-023-01073-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 08/22/2023] [Indexed: 09/04/2023]
Abstract
BACKGROUND Genomic prediction has become a powerful modelling tool for assessing line performance in plant and livestock breeding programmes. Among the genomic prediction modelling approaches, linear based models have proven to provide accurate predictions even when the number of genetic markers exceeds the number of data samples. However, breeding programmes are now compiling data from large numbers of lines and test environments for analyses, rendering these approaches computationally prohibitive. Machine learning (ML) now offers a solution to this problem through the construction of fully connected deep learning architectures and high parallelisation of the predictive task. However, the fully connected nature of these architectures immediately generates an over-parameterisation of the network that needs addressing for efficient and accurate predictions. RESULTS In this research we explore the use of an ML architecture governed by variational Bayesian sparsity in its initial layers that we have called VBS-ML. The use of VBS-ML provides a mechanism for feature selection of important markers linked to the trait, immediately reducing the network over-parameterisation. Selected markers then propagate to the remaining fully connected feed-forward components of the ML network to form the final genomic prediction. We illustrated the approach with four large Australian wheat breeding data sets that range from 2665 lines to 10375 lines genotyped across a large set of markers. For all data sets, the use of the VBS-ML architecture improved genomic prediction accuracy over legacy linear based modelling approaches. CONCLUSIONS An ML architecture governed under a variational Bayesian paradigm was shown to improve genomic prediction accuracy over legacy modelling approaches. 
This VBS-ML approach can be used to dramatically decrease the parameter burden on the network and provide a computationally feasible approach for improving genomic prediction conducted with large breeding population numbers and genetic markers.
Affiliation(s)
- Qingsen Yan
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
| | - Mario Fruzangohar
- School of Food, Agriculture and Wine, University of Adelaide, Adelaide, Australia
| | - Julian Taylor
- School of Food, Agriculture and Wine, University of Adelaide, Adelaide, Australia
| | - Dong Gong
- School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
| | - James Walter
- Australian Grains Technologies, Roseworthy, Australia
| | - Adam Norman
- Australian Grains Technologies, Roseworthy, Australia
| | - Javen Qinfeng Shi
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
| | - Tristan Coram
- Australian Grains Technologies, Roseworthy, Australia
| |
|
4
|
Pravia MI, Navajas EA, Aguilar I, Ravagnolo O. Prediction ability of an alternative multi-trait genomic evaluation for residual feed intake. J Anim Breed Genet 2023; 140:508-518. [PMID: 37186475 DOI: 10.1111/jbg.12775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/04/2023] [Accepted: 04/06/2023] [Indexed: 05/17/2023]
Abstract
Selection for feed efficiency is the goal for many genetic breeding programs in beef cattle. Residual feed intake (RFI) has been included in genetic evaluations to reduce feed intake without compromising performance traits such as liveweight, body gain or carcass traits. However, measuring feed intake is expensive, and only a small percentage of selection candidates are phenotyped. Genomic selection has become a very important tool to achieve effective genetic progress in these traits. Another effective strategy has been the implementation of multi-trait prediction using easily recordable predictor traits on both reference animals and candidates without phenotypes, and this could be another inexpensive way to increase accuracy. The objective of this work was to analyse and compare the prediction ability of two alternative approaches to predict GEBVs for RFI. The population of inference was Hereford bulls in Uruguay that were genotyped candidates for selection. The first model was the conventional univariate model for RFI and the second model was a multi-trait model which included a predictor trait (weaning weight, WW), in addition to the traits used in the first one (dry matter intake, DMI; metabolic mid-test weight, MWT; average daily gain, ADG; and ultrasound back fat, UBF). GEBVs from the multi-trait model were combined using selection index theory to derive RFI values. All analyses were performed using the ssGBLUP procedure. The prediction ability of both models was tested using two validation strategies (30 different replicates of random groups of animals and validation across 9 different feed intake tests). The prediction quality was assessed by the following parameters: bias, dispersion, ratio of accuracies and the relative increase in accuracy by adding phenotypic information. All parameters showed that the univariate model outperforms the multi-trait model, regardless of the validation strategy considered. 
These results indicate that including WW as a proxy trait in a multi-trait analysis does not improve the prediction ability when all animals to be predicted are genotyped.
Affiliation(s)
- Maria Isabel Pravia
- Instituto Nacional de Investigación Agropecuaria, INIA Uruguay, Canelones, Uruguay
| | - Elly Ana Navajas
- Instituto Nacional de Investigación Agropecuaria, INIA Uruguay, Canelones, Uruguay
| | - Ignacio Aguilar
- Instituto Nacional de Investigación Agropecuaria, INIA Uruguay, Canelones, Uruguay
| | - Olga Ravagnolo
- Instituto Nacional de Investigación Agropecuaria, INIA Uruguay, Canelones, Uruguay
| |
|
5
|
Wang K, Abid MA, Rasheed A, Crossa J, Hearne S, Li H. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. MOLECULAR PLANT 2023; 16:279-293. [PMID: 36366781 DOI: 10.1016/j.molp.2022.11.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/06/2022] [Revised: 09/28/2022] [Accepted: 11/08/2022] [Indexed: 06/16/2023]
Abstract
Genomic prediction is an effective way to accelerate the rate of agronomic trait improvement in plants. Traditional methods typically use linear regression models with clear assumptions; such methods are unable to capture the complex relationships between genotypes and phenotypes. Non-linear models (e.g., deep neural networks) have been proposed as a superior alternative to linear models because they can capture complex non-additive effects. Here we introduce a deep learning (DL) method, deep neural network genomic prediction (DNNGP), for integration of multi-omics data in plants. We trained DNNGP on four datasets and compared its performance with methods built with five classic models: genomic best linear unbiased prediction (GBLUP); two methods based on a machine learning (ML) framework, light gradient boosting machine (LightGBM) and support vector regression (SVR); and two methods based on a DL framework, deep learning genomic selection (DeepGS) and deep learning genome-wide association study (DLGWAS). DNNGP is novel in five ways. First, it can be applied to a variety of omics data to predict phenotypes. Second, the multilayered hierarchical structure of DNNGP dynamically learns features from raw data, avoiding overfitting and improving the convergence rate using a batch normalization layer and early stopping and rectified linear activation (rectified linear unit) functions. Third, when small datasets were used, DNNGP produced results that are competitive with results from the other five methods, showing greater prediction accuracy than the other methods when large-scale breeding data were used. Fourth, the computation time required by DNNGP was comparable with that of commonly used methods, up to 10 times faster than DeepGS. Fifth, hyperparameters can easily be batch tuned on a local machine. Compared with GBLUP, LightGBM, SVR, DeepGS and DLGWAS, DNNGP is superior to these existing widely used genomic selection (GS) methods. 
Moreover, DNNGP can generate robust assessments from diverse datasets, including omics data, and quickly incorporate complex and large datasets into usable models, making it a promising and practical approach for straightforward integration into existing GS platforms.
Affiliation(s)
- Kelin Wang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
| | | | - Awais Rasheed
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Department of Plant Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
| | - Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, Texcoco, D.F. 06600, Mexico
| | - Sarah Hearne
- International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, Texcoco, D.F. 06600, Mexico
| | - Huihui Li
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China.
| |
|
6
|
Gianola D, Fernando RL, Schön CC. Inference about quantitative traits under selection: a Bayesian revisitation for the post-genomic era. Genet Sel Evol 2022; 54:78. [PMID: 36460973 PMCID: PMC9716705 DOI: 10.1186/s12711-022-00765-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 10/26/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Selection schemes distort inference when estimating differences between treatments or genetic associations between traits, and may degrade prediction of outcomes, e.g., the expected performance of the progeny of an individual with a certain genotype. If input and output measurements are not collected on random samples, inferences and predictions must be biased to some degree. Our paper revisits inference in quantitative genetics when using samples stemming from some selection process. The approach used integrates the classical notion of fitness with that of missing data. Treatment is fully Bayesian, with inference and prediction dealt with in a unified manner. While focus is on animal and plant breeding, concepts apply to natural selection as well. Examples based on real data and stylized models illustrate how selection can be accounted for in four different situations, and sometimes without success. RESULTS Our flexible "soft selection" setting helps to diagnose the extent to which selection can be ignored. The clear connection between probability of missingness and the concept of fitness in stylized selection scenarios is highlighted. It is not realistic to assume that a fixed selection threshold t holds in conceptual replication, as the chance of selection depends on observed and unobserved data, and on unequal amounts of information over individuals, aspects that a "soft" selection representation addresses explicitly. There does not seem to be a general prescription to accommodate potential distortions due to selection. In structures that combine cross-sectional, longitudinal and multi-trait data such as in animal breeding, balance is the exception rather than the rule. The Bayesian approach provides an integrated answer to inference, prediction and model choice under selection that goes beyond the likelihood-based approach, where breeding values are inferred indirectly. 
CONCLUSIONS The approach used here for inference and prediction under selection may or may not yield the best possible answers. One may believe that selection has been accounted for diligently, but the central problem of whether statistical inferences are good or bad does not have an unambiguous solution. On the other hand, the quality of predictions can be gauged empirically via appropriate training-testing of competing methods.
Affiliation(s)
- Daniel Gianola
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, USA
| | - Rohan L. Fernando
- Department of Animal Science, Iowa State University, Ames, IA, USA
| | - Chris C. Schön
- Department of Plant Breeding, Technical University of Munich, Freising, Germany
| |
|
7
|
Nantongo JS, Potts BM, Klápště J, Graham NJ, Dungey HS, Fitzgerald H, O'Reilly-Wapstra JM. Genomic selection for resistance to mammalian bark stripping and associated chemical compounds in radiata pine. G3 (BETHESDA, MD.) 2022; 12:jkac245. [PMID: 36218439 PMCID: PMC9635650 DOI: 10.1093/g3journal/jkac245] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/22/2022] [Accepted: 08/29/2022] [Indexed: 07/28/2023]
Abstract
The integration of genomic data into genetic evaluations can facilitate the rapid selection of superior genotypes and accelerate the breeding cycle in trees. In this study, 390 trees from 74 control-pollinated families were genotyped using a 36K Axiom SNP array. A total of 15,624 high-quality SNPs were used to develop genomic prediction models for mammalian bark stripping, tree height, and selected primary and secondary chemical compounds in the bark. Genetic parameters from different genomic prediction methods-single-trait best linear unbiased prediction based on a marker-based relationship matrix (genomic best linear unbiased prediction), multitrait single-step genomic best linear unbiased prediction, which integrated the marker-based and pedigree-based relationship matrices (single-step genomic best linear unbiased prediction) and the single-trait generalized ridge regression-were compared to equivalent single- or multitrait pedigree-based approaches (ABLUP). The influence of the statistical distribution of data on the genetic parameters was assessed. Results indicated that the heritability estimates were increased nearly 2-fold with genomic models compared to the equivalent pedigree-based models. Predictive accuracy of the single-step genomic best linear unbiased prediction was higher than the ABLUP for most traits. Allowing for heterogeneity in marker effects through the use of generalized ridge regression did not markedly improve predictive ability over genomic best linear unbiased prediction, arguing that most of the chemical traits are modulated by many genes with small effects. Overall, the traits with low pedigree-based heritability benefited more from genomic models compared to the traits with high pedigree-based heritability. There was no evidence that data skewness or the presence of outliers affected the genomic or pedigree-based genetic estimates.
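The marker-based relationship matrix that underlies genomic best linear unbiased prediction can be sketched as follows. The abstract does not say how the study built its matrix, so this is an assumption: the sketch uses the widely used first VanRaden construction, with genotypes coded as 0, 1, or 2 copies of one allele and allele frequencies estimated from the genotyped sample itself. The function name `vanraden_g` is hypothetical.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes: list of rows, one per individual, each a list of 0/1/2
    allele counts with the same number of markers (no missing values).
    """
    n = len(genotypes)
    if n == 0 or any(len(row) != len(genotypes[0]) for row in genotypes):
        raise ValueError("genotypes must be a non-empty rectangular matrix")
    m = len(genotypes[0])

    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]

    # Scaling factor; zero only if every marker is monomorphic.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")

    # Centre each marker by twice its allele frequency.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]

    # G[i][k] is the scaled inner product of centred genotype rows i and k.
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]
```

Because the markers are centred with frequencies from the same sample, each row of the resulting matrix sums to zero, which is a quick sanity check on real data.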
Affiliation(s)
- Judith S Nantongo
- Corresponding author: National Agricultural Research Organization, P.O Box 1752, Mukono, Uganda.
| | - Brad M Potts
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- ARC Training Centre for Forest Value, Hobart, TAS 7001, Australia
| | - Jaroslav Klápště
- Scion (New Zealand Forest Research Institute Ltd.), Rotorua 3046, New Zealand
| | - Natalie J Graham
- Scion (New Zealand Forest Research Institute Ltd.), Rotorua 3046, New Zealand
| | - Heidi S Dungey
- Scion (New Zealand Forest Research Institute Ltd.), Rotorua 3046, New Zealand
| | - Hugh Fitzgerald
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
| | - Julianne M O'Reilly-Wapstra
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- ARC Training Centre for Forest Value, Hobart, TAS 7001, Australia
| |
|
8
|
Bermann M, Cesarani A, Misztal I, Lourenco D. Past, present, and future developments in single-step genomic models. ITALIAN JOURNAL OF ANIMAL SCIENCE 2022. [DOI: 10.1080/1828051x.2022.2053366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Matias Bermann
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
| | - Alberto Cesarani
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Dipartimento di Agraria, Università degli Studi di Sassari, Sassari, Italy
| | - Ignacy Misztal
- Dipartimento di Agraria, Università degli Studi di Sassari, Sassari, Italy
| | - Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
| |
|
9
|
Puglisi D, Visioni A, Ozkan H, Kara İ, Lo Piero AR, Rachdad FE, Tondelli A, Valè G, Cattivelli L, Fricano A. High accuracy of genome-enabled prediction of belowground and physiological traits in barley seedlings. G3 GENES|GENOMES|GENETICS 2022; 12:6517783. [PMID: 35099521 PMCID: PMC8895982 DOI: 10.1093/g3journal/jkac022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 01/21/2022] [Indexed: 11/24/2022]
Abstract
In plants, the study of belowground traits is gaining momentum due to their importance for yield formation and the uptake of water and nutrients. In several cereal crops, seminal root number and seminal root angle are proxy traits of the root system architecture at the mature stages, which in turn contributes to modulating the uptake of water and nutrients. Along with seminal root number and seminal root angle, experimental evidence indicates that the transpiration rate response to evaporative demand or vapor pressure deficit is a key physiological trait that might be targeted to cope with drought, as reducing the water flux to leaves to limit transpiration rate at high levels of vapor pressure deficit allows better management of soil moisture. In the present study, we examined the phenotypic diversity of seminal root number, seminal root angle, and transpiration rate at the seedling stage in a panel of 8-way Multiparent Advanced Generation Inter-Crosses lines of winter barley and correlated these traits with grain yield measured in different site-by-season combinations. Second, phenotypic and genotypic data of the Multiparent Advanced Generation Inter-Crosses population were combined to fit and cross-validate different genomic prediction models for these belowground and physiological traits. Genomic prediction models for seminal root number were fitted using threshold and log-normal models, considering these data as an ordinal discrete variable and as count data, respectively, while for seminal root angle and transpiration rate, genomic prediction was implemented using models based on extended genomic best linear unbiased predictors. The results presented in this study show that genome-enabled prediction models of seminal root number, seminal root angle, and transpiration rate data have high predictive ability and that the best models investigated in the present study include first-order additive × additive epistatic interaction effects. 
Our analyses indicate that beyond grain yield, genomic prediction models might be used to predict belowground and physiological traits and pave the way to practical applications for barley improvement.
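One common way to represent first-order additive × additive epistasis in a GBLUP-type model is to use the element-wise (Hadamard) product of the additive genomic relationship matrix with itself as the covariance structure of the epistatic effects. The abstract does not state how the study constructed its epistatic term, so the sketch below is an assumption for illustration only; the function name `epistatic_kernel` and the optional rescaling to a mean diagonal of 1 are hypothetical choices.

```python
def epistatic_kernel(G, rescale=True):
    """Additive x additive relationship matrix as the Hadamard product G * G.

    G: square additive genomic relationship matrix given as a list of lists.
    rescale: if True, divide by the mean diagonal so the result has an
             average diagonal of 1 (a common normalisation, assumed here).
    """
    n = len(G)
    if n == 0 or any(len(row) != n for row in G):
        raise ValueError("G must be a non-empty square matrix")

    # Element-wise square of the additive relationships.
    H = [[G[i][j] * G[i][j] for j in range(n)] for i in range(n)]

    if rescale:
        mean_diag = sum(H[i][i] for i in range(n)) / n
        if mean_diag == 0.0:
            raise ValueError("mean diagonal is zero; cannot rescale")
        H = [[value / mean_diag for value in row] for row in H]
    return H
```

In a model with both terms, G would describe the additive effects and the matrix returned here would describe the epistatic effects, each with its own variance component.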
Affiliation(s)
- Damiano Puglisi
- Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, 95123 Catania, Italy
| | - Andrea Visioni
- Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, 6299 Rabat, Morocco
| | - Hakan Ozkan
- Faculty of Agriculture, Department of Field Crops, University of Cukurova, 01330 Adana, Turkey
| | - İbrahim Kara
- Bahri Dagdas International Agricultural Research Institute, Km Karatay/Konya 42020, Turkey
| | - Angela Roberta Lo Piero
- Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, 95123 Catania, Italy
| | - Fatima Ezzahra Rachdad
- Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, 6299 Rabat, Morocco
- Faculty of Sciences Ben M’sik, Department of Biology, Environment and Ecology Laboratory, Hassan II University of Casablanca, 7955 Casablanca, Morocco
| | - Alessandro Tondelli
- Council for Agricultural Research and Economics—Research Centre for Genomics and Bioinformatics, 29017 Fiorenzuola d’Arda (PC), Italy
| | - Giampiero Valè
- DiSIT, Dipartimento di Scienze e Innovazione Tecnologica, Università del Piemonte Orientale, 13100 Vercelli, Italy
| | - Luigi Cattivelli
- Council for Agricultural Research and Economics—Research Centre for Genomics and Bioinformatics, 29017 Fiorenzuola d’Arda (PC), Italy
| | - Agostino Fricano
- Council for Agricultural Research and Economics—Research Centre for Genomics and Bioinformatics, 29017 Fiorenzuola d’Arda (PC), Italy
| |
Collapse
|
10
|
Bartholomé J, Prakash PT, Cobb JN. Genomic Prediction: Progress and Perspectives for Rice Improvement. Methods Mol Biol 2022; 2467:569-617. [PMID: 35451791 DOI: 10.1007/978-1-0716-2205-6_21] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Genomic prediction can be a powerful tool to achieve greater rates of genetic gain for quantitative traits if thoroughly integrated into a breeding strategy. In rice as in other crops, the interest in genomic prediction is very strong with a number of studies addressing multiple aspects of its use, ranging from the more conceptual to the more practical. In this chapter, we review the literature on rice (Oryza sativa) and summarize important considerations for the integration of genomic prediction in breeding programs. The irrigated breeding program at the International Rice Research Institute is used as a concrete example on which we provide data and R scripts to reproduce the analysis but also to highlight practical challenges regarding the use of predictions. The adage "To someone with a hammer, everything looks like a nail" describes a common psychological pitfall that sometimes plagues the integration and application of new technologies to a discipline. We have designed this chapter to help rice breeders avoid that pitfall and appreciate the benefits and limitations of applying genomic prediction, as it is not always the best approach nor the first step to increasing the rate of genetic gain in every context.
Affiliation(s)
- Jérôme Bartholomé
- CIRAD, UMR AGAP Institut, Montpellier, France.
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France.
- Rice Breeding Platform, International Rice Research Institute, Manila, Philippines.
11
Elsen JM. Genomic Prediction of Complex Traits, Principles, Overview of Factors Affecting the Reliability of Genomic Prediction, and Algebra of the Reliability. Methods Mol Biol 2022; 2467:45-76. [PMID: 35451772 DOI: 10.1007/978-1-0716-2205-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The quality of the predictions of genetic values based on the genotyping of neutral markers (GEBVs) is key information for deciding whether or not to implement genomic selection. This quality depends on the part of the genetic variability captured by the markers and on the precision of the estimates of their effects. Selection index theory provided the framework for evaluating the accuracy of GEBVs once the information had been gathered, with the genomic relationship matrix (GRM) playing a central role. When this accuracy must be known a priori, the theory of quantitative genetics gives clues for calculating the expectation of this GRM. This chapter makes a critical inventory of the methods developed to calculate these accuracies a posteriori and a priori. The most significant factors affecting this accuracy are described (size of the reference population, number of markers, linkage disequilibrium, heritability).
Affiliation(s)
- Jean-Michel Elsen
- GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France.
12
Dekkers JCM, Su H, Cheng J. Predicting the accuracy of genomic predictions. Genet Sel Evol 2021; 53:55. [PMID: 34187354 PMCID: PMC8244147 DOI: 10.1186/s12711-021-00647-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/10/2020] [Accepted: 06/11/2021] [Indexed: 11/22/2022]
Abstract
Background: Mathematical models are needed for the design of breeding programs using genomic prediction. While deterministic models for selection on pedigree-based estimates of breeding values (PEBV) are available, these have not been fully developed for genomic selection, with a key missing component being the accuracy of genomic EBV (GEBV) of selection candidates. Here, a deterministic method was developed to predict this accuracy within a closed breeding population based on the accuracy of GEBV and PEBV in the reference population and the distance of selection candidates from their closest ancestors in the reference population.
Methods: The accuracy of GEBV was modeled as a combination of the accuracy of PEBV and of EBV based on genomic relationships deviated from pedigree (DEBV). Loss of the accuracy of DEBV from the reference to the target population was modeled based on the effective number of independent chromosome segments in the reference population (Me). Measures of Me derived from the inverse of the variance of relationships and from the accuracies of GEBV and PEBV in the reference population, derived using either a Fisher information or a selection index approach, were compared by simulation.
Results: Using simulation, both the Fisher and the selection index approach correctly predicted accuracy in the target population over time, both with and without selection. The index approach, however, resulted in estimates of Me that were less affected by heritability, reference size, and selection, and which are, therefore, more appropriate as a population parameter. The variance of relationships underpredicted Me and was greatly affected by selection. A leave-one-out cross-validation approach was proposed to estimate required accuracies of EBV in the reference population. Aspects of the methods were validated using real data.
Conclusions: A deterministic method was developed to predict the accuracy of GEBV in selection candidates in a closed breeding population. The population parameter Me that is required for these predictions can be derived from an available reference data set, and applied to other reference data sets and traits for that population. This method can be used to evaluate the benefit of genomic prediction and to optimize genomic selection breeding programs.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12711-021-00647-w.
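One of the Me measures compared in this abstract, the inverse of the variance of relationships, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the variance is taken over off-diagonal deviations of genomic from pedigree relationships, the individuals are simulated as unrelated and non-inbred (so the pedigree matrix is the identity), the loci are independent with a known allele frequency, and the function name `me_from_relationships` is hypothetical.

```python
import numpy as np

def me_from_relationships(G, A):
    """Me as 1 / var(G_ij - A_ij), taken over all pairs of distinct individuals."""
    deviations = np.asarray(G, dtype=float) - np.asarray(A, dtype=float)
    upper = np.triu_indices_from(deviations, k=1)   # each pair counted once
    return 1.0 / np.var(deviations[upper])

# Simulated example: independent loci, so Me should land near n_loci.
rng = np.random.default_rng(7)
n_ind, n_loci, freq = 200, 500, 0.5
genotypes = rng.binomial(2, freq, size=(n_ind, n_loci)).astype(float)
Z = (genotypes - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
G = Z @ Z.T / n_loci
A = np.eye(n_ind)                                   # assumed pedigree: unrelated
me = me_from_relationships(G, A)
```

In real populations with linkage and family structure the value is far smaller than the marker count, and the abstract reports that this particular measure underpredicted Me and was sensitive to selection.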
Affiliation(s)
- Jack C M Dekkers
- Department of Animal Science, Iowa State University, Ames, Iowa, USA.
- Hailin Su
- Department of Animal Science, Iowa State University, Ames, Iowa, USA
- Jian Cheng
- Department of Animal Science, Iowa State University, Ames, Iowa, USA
13
Puglisi D, Delbono S, Visioni A, Ozkan H, Kara İ, Casas AM, Igartua E, Valè G, Piero ARL, Cattivelli L, Tondelli A, Fricano A. Genomic Prediction of Grain Yield in a Barley MAGIC Population Modeling Genotype per Environment Interaction. FRONTIERS IN PLANT SCIENCE 2021; 12:664148. [PMID: 34108982 PMCID: PMC8183822 DOI: 10.3389/fpls.2021.664148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/04/2021] [Accepted: 04/26/2021] [Indexed: 06/12/2023]
Abstract
Multi-parent Advanced Generation Inter-crosses (MAGIC) lines have mosaic genomes that are generated by shuffling the genetic material of the founder parents following pre-defined crossing schemes. In cereal crops, these experimental populations have been extensively used to investigate the genetic bases of several traits and to dissect the genetic bases of epistasis. In plants, genomic prediction models are usually fitted using either diverse panels of mostly unrelated accessions or individuals of biparental families, and several empirical analyses have been conducted to evaluate the predictive ability of models fitted to these populations using different traits. In this paper, we constructed, genotyped and evaluated a barley MAGIC population of 352 individuals developed from a diverse set of eight founder parents showing contrasting phenotypes for grain yield. We combined phenotypic and genotypic information of this MAGIC population to fit several genomic prediction models, which were cross-validated to examine empirically how their predictive ability changes with the size of the training population. Moreover, several methods to optimize the composition of the training population were also applied to this MAGIC population and cross-validated to estimate the resulting predictive ability. Finally, extensive phenotypic data generated in field trials organized across an ample range of water regimes and climatic conditions in the Mediterranean were used to fit and cross-validate multi-environment genomic prediction models including G×E interaction, using both genomic best linear unbiased prediction and reproducing kernel Hilbert space regression with a non-linear Gaussian kernel.
Overall, our empirical analyses showed that genomic prediction models trained with a limited number of MAGIC lines can be used to predict grain yield with values of predictive ability that vary from 0.25 to 0.60 and that, beyond QTL mapping and analysis of epistatic effects, MAGIC populations might be used to successfully fit genomic prediction models. We concluded that for grain yield, the single-environment genomic prediction models examined in this study are equivalent in terms of predictive ability while, in general, multi-environment models that explicitly split marker effects into main and environment-specific effects outperform simpler multi-environment models.
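The two modelling ingredients named in this abstract, a Gaussian kernel for reproducing kernel Hilbert space regression and the split of genetic effects into main and environment-specific parts, can be sketched as covariance structures. This is a sketch under assumptions, not the authors' implementation: the marker data are simulated, the distance scaling is one common choice among several, every line is assumed to be observed in every environment with records sorted environment by environment, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(markers, bandwidth=1.0):
    """Gaussian kernel exp(-bandwidth * d2 / mean(d2)) between lines.

    d2 is the squared Euclidean distance between marker profiles; dividing by
    its off-diagonal mean is an assumed scaling choice.
    """
    x = np.asarray(markers, dtype=float)
    sq = np.sum(x * x, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    scale = d2[np.triu_indices_from(d2, k=1)].mean()
    return np.exp(-bandwidth * d2 / scale)

def multi_environment_kernels(kernel, n_env):
    """Covariance structures for main and environment-specific genetic effects."""
    main = np.kron(np.ones((n_env, n_env)), kernel)   # shared across environments
    specific = np.kron(np.eye(n_env), kernel)         # zero between environments
    return main, specific

# Hypothetical data: 5 lines, 300 markers, 3 environments.
rng = np.random.default_rng(42)
markers = rng.integers(0, 3, size=(5, 300))
K = gaussian_kernel(markers)
K_main, K_env = multi_environment_kernels(K, n_env=3)
```

A full model would attach a variance component to `K_main` and, typically, a separate one to each environment block of `K_env`; the single block-diagonal matrix here is a simplification.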
Affiliation(s)
- Damiano Puglisi
- Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, Catania, Italy
- Stefano Delbono
- Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
- Andrea Visioni
- Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, Avenue Hafiane Cherkaoui, Rabat, Morocco
- Hakan Ozkan
- Department of Field Crops, Faculty of Agriculture, University of Cukurova, Adana, Turkey
- İbrahim Kara
- Bahri Dagdas International Agricultural Research Institute, Konya, Turkey
- Ana M. Casas
- Aula Dei Experimental Station (EEAD-CSIC), Spanish Research Council, Zaragoza, Spain
- Ernesto Igartua
- Aula Dei Experimental Station (EEAD-CSIC), Spanish Research Council, Zaragoza, Spain
- Giampiero Valè
- DiSIT, Dipartimento di Scienze e Innovazione Tecnologica, Università del Piemonte Orientale, Vercelli, Italy
- Angela Roberta Lo Piero
- Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, Catania, Italy
- Luigi Cattivelli
- Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
- Alessandro Tondelli
- Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
- Agostino Fricano
- Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
14
Bermann M, Lourenco D, Breen V, Hawken R, Brito Lopes F, Misztal I. Modeling genetic differences of combined broiler chicken populations in single-step GBLUP. J Anim Sci 2021; 99:6154135. [PMID: 33649764 PMCID: PMC8355479 DOI: 10.1093/jas/skab056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Accepted: 02/17/2021] [Indexed: 11/13/2022]
Abstract
The introduction of animals from a different environment or population is a common practice in commercial livestock populations. In this study, we modeled the inclusion of a group of external birds into a local broiler chicken population for the purpose of genomic evaluations. The pedigree was composed of 242,413 birds and genotypes were available for 107,216 birds. A five-trait model that included one growth, two yield, and two efficiency traits was used for the analyses. The strategies to model the introduction of external birds were to include a fixed effect representing the origin of parents and to use unknown parent groups (UPG) or metafounders (MF). Genomic estimated breeding values (GEBV) were obtained with single-step GBLUP using the Algorithm for Proven and Young. Bias, dispersion, and accuracy of GEBV for the validation birds, that is, birds from the most recent generation, were computed. The bias and dispersion were estimated with the linear regression (LR) method, whereas accuracy was estimated by the LR method and predictive ability. When fixed UPG were fit without estimated inbreeding, the model did not converge. In contrast, models with fixed UPG and estimated inbreeding or random UPG converged and resulted in similar GEBV. The inclusion of an extra fixed effect in the model made the GEBV unbiased and reduced the inflation. Genomic predictions with MF were slightly biased and inflated due to the unbalanced number of observations assigned to each metafounder. When combining local and external populations, the greatest accuracy can be obtained by adding an extra fixed effect to account for the origin of parents plus UPG with estimated inbreeding or random UPG. To estimate the accuracy, the LR method is more consistent among scenarios, whereas the predictive ability greatly depends on the model specification.
Affiliation(s)
- Matias Bermann
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Vivian Breen
- Cobb-Vantress Inc., Siloam Springs, AR 72761, USA
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
15
Cheng J, Dekkers JCM, Fernando RL. Cross-validation of best linear unbiased predictions of breeding values using an efficient leave-one-out strategy. J Anim Breed Genet 2021; 138:519-527. [PMID: 33729622 DOI: 10.1111/jbg.12545] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/16/2020] [Revised: 02/06/2021] [Accepted: 02/20/2021] [Indexed: 01/22/2023]
Abstract
Empirical estimates of the accuracy of estimates of breeding values (EBV) can be obtained by cross-validation. Leave-one-out cross-validation (LOOCV) is an extreme case of k-fold cross-validation. Efficient strategies for LOOCV of predictions of phenotypes have been developed for a simple model with an overall mean and random marker or animal genetic effects. The objective here was to develop and evaluate an efficient LOOCV method for prediction of breeding values and other random effects under a general mixed linear model with multiple random effects. Conventional LOOCV of EBV requires inverting an (n-1)×(n-1) covariance matrix for each of n (= number of observations) data sets. Our efficient LOOCV obtains the required inverses from the inverse of the covariance matrix for all n observations. The efficient method can be applied to complex models with multiple fixed and random effects, but requires fixed effects to be treated as random, with large variances. An alternative is to precorrect observations using estimates of fixed effects obtained from the complete data, but this can lead to biases. The efficient LOOCV method was compared to conventional LOOCV of predictions of breeding values in terms of computational demands and accuracy. For a data set with 3,205 observations and a model with multiple random and fixed effects, the efficient LOOCV method was 962 times faster than the conventional LOOCV with precorrection for fixed effects based on each training data set but resulted in identical EBV. A computationally efficient LOOCV for prediction of breeding values for single- and multiple-trait mixed models with multiple fixed and random effects was successfully developed. The method enables cross-validation of predictions of breeding values and of any linear combination of random and/or fixed effects, along with leave-one-out precorrection of validation phenotypes.
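The idea of obtaining every leave-one-out result from the single inverse of the full covariance matrix can be illustrated in its simplest setting. The sketch below is not the general mixed-model method of the paper: it assumes zero-mean phenotypes, no fixed effects, known variance components, and simulated data, and the function names are hypothetical. Under those assumptions the prediction of record i from the other n-1 records equals y_i minus (V⁻¹y)_i divided by (V⁻¹)_ii, which the code checks against the conventional approach of solving n separate (n-1)×(n-1) systems.

```python
import numpy as np

def loo_predictions(V, y):
    """Leave-one-out predictions of every y_i using only the inverse of the full V."""
    P = np.linalg.inv(V)
    return y - (P @ y) / np.diag(P)

def loo_predictions_brute_force(V, y):
    """Conventional leave-one-out: solve an (n-1) x (n-1) system for each record."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        out[i] = V[i, keep] @ np.linalg.solve(V[np.ix_(keep, keep)], y[keep])
    return out

# Simulated example: V = G * var_g + I * var_e with assumed variances of 0.5 each.
rng = np.random.default_rng(1)
n, m = 30, 100
Z = rng.standard_normal((n, m))
G = Z @ Z.T / m
V = 0.5 * G + 0.5 * np.eye(n)
y = rng.standard_normal(n)

fast = loo_predictions(V, y)
slow = loo_predictions_brute_force(V, y)
```

Both routes give the same predictions here; the shortcut needs one n×n inverse instead of n smaller solves, which is where the speed-up comes from.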
Affiliation(s)
- Jian Cheng
- Department of Animal Science, Iowa State University, Ames, IA, USA
- Jack C M Dekkers
- Department of Animal Science, Iowa State University, Ames, IA, USA
- Rohan L Fernando
- Department of Animal Science, Iowa State University, Ames, IA, USA
16
Xu Y, Zhao Y, Wang X, Ma Y, Li P, Yang Z, Zhang X, Xu C, Xu S. Incorporation of parental phenotypic data into multi-omic models improves prediction of yield-related traits in hybrid rice. PLANT BIOTECHNOLOGY JOURNAL 2021; 19:261-272. [PMID: 32738177 PMCID: PMC7868986 DOI: 10.1111/pbi.13458] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/03/2019] [Revised: 06/14/2020] [Accepted: 07/22/2020] [Indexed: 05/15/2023]
Abstract
Hybrid breeding has been shown to effectively increase rice productivity. However, identifying desirable hybrids out of numerous potential combinations is a daunting challenge. Genomic selection holds great promise for accelerating hybrid breeding by enabling early selection before phenotypes are measured. With the recent advances in multi-omic technologies, hybrid prediction based on transcriptomic and metabolomic data has received increasing attention. However, current omic-based hybrid prediction has ignored parental phenotypic information, which is of fundamental importance in plant breeding. In this study, we integrated parental phenotypic information into various multi-omic prediction models applied in hybrid breeding of rice and compared the predictabilities of 15 combinations of four sets of predictors from the parents, that is, the genome, transcriptome, metabolome and phenome. The predictability of each combination was evaluated using best linear unbiased prediction and a modified fast HAT method. We found significant interactions between predictors and traits in predictability, but joint prediction with various combinations of the predictors significantly improved predictability relative to prediction from any single source of omic data for each trait investigated. Incorporation of parental phenotypic data into various omic predictors increased the predictability, on average by 13.6%, 54.5%, 19.9% and 8.3% for grain yield, number of tillers per plant, number of grains per panicle and 1000-grain weight, respectively. Among nine models incorporating parental traits, the AD-All model was the most effective. This novel strategy of incorporating parental phenotypic data into multi-omic prediction is expected to improve hybrid breeding progress, especially with the development of high-throughput phenotyping technologies.
Affiliation(s)
- Yang Xu
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Yue Zhao
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Xin Wang
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Ying Ma
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Pengcheng Li
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Zefeng Yang
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Xuecai Zhang
- International Maize and Wheat Improvement Center (CIMMYT), Mexico DF, Mexico
- Chenwu Xu
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
17
Identification of candidate genes encoding tumor-specific neoantigens in early- and late-stage colon adenocarcinoma. Aging (Albany NY) 2021; 13:4024-4044. [PMID: 33428592 PMCID: PMC7906157 DOI: 10.18632/aging.202370] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/24/2020] [Accepted: 10/31/2020] [Indexed: 12/24/2022]
Abstract
Colon adenocarcinoma (COAD) is one of the most common gastrointestinal malignant tumors and is characterized by a high mortality rate. Here, we integrated whole-exome and RNA sequencing data from The Cancer Genome Atlas and investigated the mutational spectra of COAD-overexpressed genes to define clinically relevant diagnostic/prognostic signatures and to unmask functional relationships with both tumor-infiltrating immune cells and regulatory miRNAs. We identified 24 recurrently mutated genes (frequency > 5%) encoding putative COAD-specific neoantigens. Five of them (NEB, DNAH2, ABCA12, CENPF and CELSR1) had not been previously reported as COAD biomarkers. Through machine learning-based feature selection, four early-stage-related (COL11A1, TG, SOX9, and DNAH2) and four late-stage-related (COL11A1, SOX9, TG and BRCA2) candidate neoantigen-encoding genes were selected as diagnostic signatures. They respectively showed 100% and 97% accuracy in predicting early- and late-stage patients, and an 8-gene signature had excellent prognostic performance predicting disease-free survival (DFS) in COAD patients. We also found significant correlations between the 24 candidate neoantigen genes and the abundance and/or activation status of 22 tumor-infiltrating immune cell types and 56 regulatory miRNAs. Our novel neoantigen-based signatures may improve diagnostic and prognostic accuracy and help design targeted immunotherapies for COAD treatment.
18
Jia M, Li Z, Pan M, Tao M, Lu X, Liu Y. Evaluation of immune infiltrating of thyroid cancer based on the intrinsic correlation between pair-wise immune genes. Life Sci 2020; 259:118248. [PMID: 32791153 DOI: 10.1016/j.lfs.2020.118248] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/07/2020] [Revised: 07/09/2020] [Accepted: 08/07/2020] [Indexed: 10/23/2022]
Abstract
INTRODUCTION: Unlike most mutation-driven cancers, thyroid cancer is thought to be highly dependent on changes in human hormone levels. Using changes in gene expression level as detection and diagnostic markers has become a research hotspot. The internal relationship between two genes and disease development is used to avoid the instability caused by single-gene fluctuation.
AIM: To achieve early diagnosis of thyroid cancer during tumorigenesis and recurrence using immune gene pairs (IGPS).
METHODS: We extracted thyroid cancer data from The Cancer Genome Atlas (TCGA) and used the CIBERSORT algorithm to estimate the infiltration of 22 immune cell types. We screened out IGPS that differed significantly between groups, then used a LinearSVC model to learn and screen features, combined with a deep learning neural network model to distinguish benign from malignant samples as well as patients in different groups.
KEY FINDINGS: The proportions of immune cells differed significantly between tumor stages and in relapse samples. We screened out 42 and 64 IGPS for the normal-tumor and non-relapsed groups, respectively; pairs such as ASCC3-MAP3K7 and ATF2-SOCS5 showed significant correlation in expression. We then used the IGPS to train the tumor diagnostic classifiers and obtained an average AUC of 0.99 for both after ten rounds of cross-validation.
SIGNIFICANCE: The IGPS give new insight into immune cell infiltration in thyroid cancer, and the deep learning model can be further used for early diagnosis of thyroid cancer and estimation of the risk of recurrence.
Affiliation(s)
- Meng Jia
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China
- Zhuyao Li
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China
- Mengjiao Pan
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China
- Mei Tao
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China
- Xiubo Lu
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China.
- Yang Liu
- Department of Radiotherapy, Henan Cancer Hospital and the Affiliated Cancer Hospital of Zhengzhou University, Zhengzhou 450008, China.
19
Bermann M, Legarra A, Hollifield MK, Masuda Y, Lourenco D, Misztal I. Validation of single-step GBLUP genomic predictions from threshold models using the linear regression method: An application in chicken mortality. J Anim Breed Genet 2020; 138:4-13. [PMID: 32985749 PMCID: PMC7756448 DOI: 10.1111/jbg.12507] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/11/2020] [Revised: 07/27/2020] [Accepted: 08/18/2020] [Indexed: 11/30/2022]
Abstract
The objective of this study was to determine whether the linear regression (LR) method could be used to validate genomic threshold models. Statistics for the LR method were computed from estimated breeding values (EBVs) using the whole and truncated data sets with variances from the reference and validation populations. The method was tested using simulated and real chicken data sets. The simulated data set included 10 generations of 4,500 birds each; genotypes were available for the last three generations. Each animal was assigned a continuous trait, which was converted to a binary score assuming an incidence of failure of 7%. The real data set included the survival status of 186,596 broilers (mortality rate equal to 7.2%) and genotypes of 18,047 birds. Both data sets were analysed using best linear unbiased predictor (BLUP) or single-step GBLUP (ssGBLUP). The whole data set included all phenotypes available, whereas in the partial data set, phenotypes of the most recent generation were removed. In the simulated data set, the accuracies based on the LR formulas were 0.45 for BLUP and 0.76 for ssGBLUP, whereas the correlations between true breeding values and EBVs (i.e. true accuracies) were 0.37 and 0.65, respectively. The gain in accuracy by adding genomic information was overestimated by 0.09 when using the LR method compared to the true increase in accuracy. However, when the estimated ratio between the additive variance computed based on pedigree only and on pedigree and genomic information was considered, the difference between true and estimated gain was <0.02. Accuracies of BLUP and ssGBLUP with the real data set were 0.41 and 0.47, respectively. This small improvement in accuracy when using ssGBLUP with the real data set was due to population structure and lower heritability. 
The LR method is a useful tool for estimating improvements in the accuracy of EBVs due to the inclusion of genomic information when traditional validation methods, such as k-fold validation and predictive ability, are not applicable.
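The comparison of EBVs from the whole and the truncated (partial) data sets can be sketched with three summary statistics. The definitions below follow the way the LR method is commonly described (bias as a difference of means, dispersion as a regression slope, and the whole-partial correlation as a ratio of accuracies); they are an assumption here, since the abstract does not spell out the formulas, and the function name `lr_statistics` and the toy numbers are hypothetical.

```python
import numpy as np

def lr_statistics(ebv_whole, ebv_partial):
    """LR-style validation statistics for the same validation animals.

    bias: mean(partial) - mean(whole), expected to be 0.
    dispersion: slope of whole on partial, expected to be 1.
    accuracy_ratio: correlation between whole and partial EBVs.
    """
    w = np.asarray(ebv_whole, dtype=float)
    p = np.asarray(ebv_partial, dtype=float)
    cov_wp = np.cov(w, p)[0, 1]
    return {
        "bias": p.mean() - w.mean(),
        "dispersion": cov_wp / np.var(p, ddof=1),
        "accuracy_ratio": np.corrcoef(w, p)[0, 1],
    }

# Toy example: partial EBVs are exactly half of the whole-data EBVs.
whole = [1.0, 2.0, 3.0, 4.0]
partial = [0.5, 1.0, 1.5, 2.0]
stats = lr_statistics(whole, partial)
```

In this toy case the partial EBVs are deflated, so the slope comes out at 2 rather than 1, while the correlation stays at 1 because the ranking is unchanged.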
Affiliation(s)
- Matias Bermann
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Yutaka Masuda
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA
20
Bresolin T, Dórea JRR. Infrared Spectrometry as a High-Throughput Phenotyping Technology to Predict Complex Traits in Livestock Systems. Front Genet 2020; 11:923. [PMID: 32973876 PMCID: PMC7468402 DOI: 10.3389/fgene.2020.00923] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/06/2020] [Accepted: 07/24/2020] [Indexed: 12/17/2022]
Abstract
High-throughput phenotyping technologies are growing in importance in livestock systems due to their ability to generate real-time, non-invasive, and accurate animal-level information. Collecting such individual-level information can generate novel traits and potentially improve animal selection and management decisions in livestock operations. One of the most relevant tools used in the dairy and beef industry to predict complex traits is infrared spectrometry, which is based on the analysis of the interaction between electromagnetic radiation and matter. Electromagnetic radiation spans an enormous range of wavelengths and frequencies known as the electromagnetic spectrum. The spectrum is divided into different regions, with the near- and mid-infrared regions being the main spectral regions used in livestock applications. The advantages of infrared spectrometry include speed, non-destructive measurement, and great potential for on-line analysis. This paper aims to review the use of mid- and near-infrared spectrometry techniques as tools to predict complex dairy and beef phenotypes, such as milk composition, feed efficiency, methane emission, fertility, energy balance, health status, and meat quality traits. Although several research studies have used these technologies to predict a wide range of phenotypes, most of them are based on Partial Least Squares (PLS) and did not consider other machine learning (ML) techniques to improve prediction quality. Therefore, we will discuss the role of analytical methods employed on spectral data to improve the predictive ability for complex traits in livestock operations. Furthermore, we will discuss different approaches to reduce data dimensionality and the impact of validation strategies on predictive quality.
Affiliation(s)
- Tiago Bresolin
- Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI, United States
| | - João R R Dórea
- Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI, United States
|
21
|
Zhang L, Giuste F, Vizcarra JC, Li X, Gutman D. Radiomics Features Predict CIC Mutation Status in Lower Grade Glioma. Front Oncol 2020; 10:937. [PMID: 32676453 PMCID: PMC7333647 DOI: 10.3389/fonc.2020.00937] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/24/2019] [Accepted: 05/12/2020] [Indexed: 12/15/2022]
Abstract
MRI in combination with genomic markers is critical in the management of gliomas. Radiomics and radiogenomics analyses facilitate the quantitative assessment of tumor properties, which can be used both to model molecular subtype and to predict disease progression. In this work, we report on the biomarker effects of mutation in CIC (the homolog of the Drosophila gene capicua) and on the ability of radiomics features to predict CIC mutation status in lower-grade gliomas (LGG). Genomic data of LGG patients from The Cancer Genome Atlas (TCGA) (n = 509) and corresponding MR images from TCIA (n = 120) were utilized. Following tumor segmentation, radiomics features were extracted from T1, T2, T2 Flair, and T1 contrast enhanced (CE) images. Lasso feature reduction was used to obtain the most important MR image features, and logistic regression was then used to predict CIC mutation status. In our study, CIC mutation rarely occurred in astrocytoma but had a high probability of occurrence in oligodendroglioma. The presence of CIC mutation was found to be associated with better survival of glioma patients (p < 1e−4, HR: 0.2445), even with co-occurrence of IDH mutation and 1p/19q co-deletion (p = 0.0362, HR: 0.3674). An eleven-feature model achieved glioma prediction accuracy of 94.2% (95% CI, 94.03–94.38%); a six-feature model achieved oligodendroglioma prediction accuracy of 92.3% (95% CI, 91.70–92.92%). MR images and derived images of gliomas with CIC mutation appear more complex and non-uniform but are associated with lower malignancy. Our study identified CIC as a potential prognostic factor in glioma that is closely associated with survival. MRI radiomic features could predict CIC mutation and reflect less malignant manifestations, such as milder necrosis and larger tumor volume in MRI and its derived images, that could help clinical judgment.
Affiliation(s)
- Luyuan Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China.,Department of Neurosurgery, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Felipe Giuste
- Department of Biomedical Engineering of the Georgia Institute of Technology, Emory University, Atlanta, GA, United States
| | - Juan C Vizcarra
- Department of Biomedical Engineering of the Georgia Institute of Technology, Emory University, Atlanta, GA, United States
| | - Xuejun Li
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
| | - David Gutman
- Department of Neurology, Emory University, Atlanta, GA, United States
|
22
|
Lopes F, Rosa G, Pinedo P, Santos JEP, Chebel RC, Galvao KN, Schuenemann GM, Bicalho RC, Gilbert RO, Rodrigez-Zas S, Seabury CM, Thatcher W. Genome-enable prediction for health traits using high-density SNP panel in US Holstein cattle. Anim Genet 2020; 51:192-199. [PMID: 31909828 PMCID: PMC7065151 DOI: 10.1111/age.12892] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 11/22/2019] [Indexed: 11/29/2022]
Abstract
The objective of this study was to compare accuracies of different Bayesian regression models in predicting molecular breeding values for health traits in Holstein cattle. The dataset was composed of 2505 records reporting the occurrence of retained fetal membranes (RFM), metritis (MET), mastitis (MAST), displaced abomasum (DA), lameness (LS), clinical endometritis (CE), respiratory disease (RD), dystocia (DYST) and subclinical ketosis (SCK) in Holstein cows, collected between 2012 and 2014 in 16 dairies located across the US. Cows were genotyped with the Illumina BovineHD (HD, 777K). The quality controls for SNP genotypes were HWE P-value of at least 1 × 10⁻¹⁰, MAF greater than 0.01 and call rate greater than 0.95. The FImpute program was used for imputation of missing SNP markers. The effect of each SNP was estimated using the Bayesian Ridge Regression (BRR), Bayes A, Bayes B and Bayes Cπ methods. The prediction quality was assessed by the area under the curve, the prediction mean square error and the correlation between genomic breeding value and the observed phenotype, using a leave-one-out cross-validation technique that avoids iterative cross-validation. The highest prediction accuracies achieved were: RFM [Bayes B (0.34)], MET [BRR (0.36)], MAST [Bayes B (0.55)], DA [Bayes Cπ (0.26)], LS [Bayes A (0.12)], CE [Bayes A (0.32)], RD [Bayes Cπ (0.23)], DYST [Bayes A (0.35)] and SCK [Bayes Cπ (0.38)]. Except for DA, LS and RD, the predictive abilities were similar between the methods. A strong relationship between the predictive ability and the heritability of the trait was observed, where traits with higher heritability achieved higher accuracy and lower bias when compared with those with low heritability. Overall, it has been shown that a high-density SNP panel can be used successfully to predict genomic breeding values of health traits in Holstein cattle and that the model of choice will depend mostly on the genetic architecture of the trait.
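The MAF and call-rate thresholds quoted in this abstract can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 copies of one allele with `None` for a missing call; the function names are hypothetical, and the HWE test and imputation steps are deliberately left out.

```python
from typing import List, Optional

Genotypes = List[Optional[int]]  # 0/1/2 allele counts, None = missing call (assumed coding)


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of animals with a non-missing genotype at this SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1.0 - p)


def passes_qc(genotypes: Genotypes,
              maf_min: float = 0.01,
              call_rate_min: float = 0.95) -> bool:
    """Apply the two thresholds from the abstract (MAF > 0.01, call rate > 0.95)."""
    return (minor_allele_frequency(genotypes) > maf_min
            and call_rate(genotypes) > call_rate_min)


snp_common = [0, 1, 2, 1, 0] * 20          # fully called, MAF 0.4
snp_monomorphic = [0] * 100                # fully called, MAF 0.0
snp_sparse = [1] * 90 + [None] * 10        # call rate 0.90
kept = [name for name, snp in [("common", snp_common),
                               ("monomorphic", snp_monomorphic),
                               ("sparse", snp_sparse)] if passes_qc(snp)]
```

In this toy panel only the first SNP survives both filters; real pipelines would typically run such checks in dedicated genotype-processing software.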
Affiliation(s)
- F Lopes
- Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA
| | - G Rosa
- Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA
| | - P Pinedo
- Department of Animal Sciences, Colorado State University, Fort Collins, CO, 80521, USA
| | - J E P Santos
- Department of Animal Sciences, University of Florida, Gainesville, FL, 32611, USA
| | - R C Chebel
- College of Veterinary Medicine, University of Florida, Gainesville, FL, 32611, USA
| | - K N Galvao
- College of Veterinary Medicine, University of Florida, Gainesville, FL, 32611, USA
| | - G M Schuenemann
- College of Veterinary Medicine, The Ohio State University, Columbus, OH, 43210, USA
| | - R C Bicalho
- College of Veterinary Medicine, Cornell University, Ithaca, NY, 14850, USA
| | - R O Gilbert
- School of Veterinary Medicine, Ross University, Saint Kitts, Saint Kitts and Nevis, West Indies
| | - S Rodrigez-Zas
- Department of Animal Sciences, University of Illinois, Urbana-Champaign, IL, 61790, USA
| | - C M Seabury
- College of Veterinary Medicine, Texas A&M University, College Station, TX, 77843, USA
| | - W Thatcher
- Department of Animal Sciences, University of Florida, Gainesville, FL, 32611, USA
|
23
|
Runcie D, Cheng H. Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods. G3 (BETHESDA, MD.) 2019; 9:3727-3741. [PMID: 31511297 PMCID: PMC6829121 DOI: 10.1534/g3.119.400598] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/02/2019] [Accepted: 09/10/2019] [Indexed: 01/08/2023]
Abstract
Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
Affiliation(s)
| | - Hao Cheng
- Department of Animal Science, University of California Davis, Davis, CA 95616
|
24
|
Waldmann P. On the Use of the Pearson Correlation Coefficient for Model Evaluation in Genome-Wide Prediction. Front Genet 2019; 10:899. [PMID: 31632436 PMCID: PMC6781837 DOI: 10.3389/fgene.2019.00899] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/03/2019] [Accepted: 08/23/2019] [Indexed: 01/24/2023]
Abstract
The large number of markers in genome-wide prediction demands the use of methods with regularization and model comparison based on some hold-out test prediction error measure. In quantitative genetics, it is common practice to calculate the Pearson correlation coefficient (r²) as a standardized measure of the predictive accuracy of a model. Based on arguments from the bias-variance trade-off theory in statistical learning, we show that shrinkage of the regression coefficients (i.e., QTL effects) reduces the prediction mean squared error (MSE) by introducing model bias compared with the ordinary least squares method. We also show that the LASSO and the adaptive LASSO (ALASSO) can reduce the model bias and prediction MSE by adding model variance. In an application of ridge regression, the LASSO and ALASSO to a simulated example based on results for 9,723 SNPs and 3,226 individuals, the best model selected was with the LASSO when r² was used as a measure. However, when model selection was based on test MSE and coefficient of determination R², the ALASSO proved to be the best method. Hence, use of r² may lead to selection of the wrong model and therefore also nonoptimal ranking of phenotype predictions and genomic breeding values. Instead, we propose use of the test MSE for model selection and R² as a standardized measure of the accuracy.
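The distinction this abstract draws between the squared Pearson correlation, the test MSE, and the coefficient of determination can be seen in a small numerical sketch. The data and function names below are hypothetical and are not taken from the paper; the point is only that predictions on the wrong scale can have a perfect correlation and still a large error.

```python
from statistics import mean
from typing import Sequence


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def pearson_r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Squared Pearson correlation between observations and predictions."""
    my, mh = mean(y), mean(y_hat)
    sxy = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    sxx = sum((a - my) ** 2 for a in y)
    syy = sum((b - mh) ** 2 for b in y_hat)
    return sxy ** 2 / (sxx * syy)


def coefficient_of_determination(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot; unlike r^2, it penalizes scale and location errors."""
    my = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_a = [2.0, 4.0, 6.0, 8.0, 10.0]   # perfectly correlated but on the wrong scale
pred_b = [1.5, 1.5, 3.0, 4.5, 4.5]    # slightly noisy but on the right scale

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, pearson_r2(y, pred), mse(y, pred), coefficient_of_determination(y, pred))
```

Here model A wins on r² (1.0 against 0.9) while model B wins on MSE (0.2 against 11) and on R² (0.9 against -4.5), so the choice of measure changes which model is selected.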
Affiliation(s)
- Patrik Waldmann
- Department of Animal Breeding and Genetics, The Swedish University of Agricultural Sciences, SLU, Uppsala, Sweden
|
25
|
Velazco JG, Malosetti M, Hunt CH, Mace ES, Jordan DR, van Eeuwijk FA. Combining pedigree and genomic information to improve prediction quality: an example in sorghum. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2019; 132:2055-2067. [PMID: 30968160 PMCID: PMC6588709 DOI: 10.1007/s00122-019-03337-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/08/2018] [Accepted: 03/26/2019] [Indexed: 05/10/2023]
Abstract
The use of a kinship matrix integrating pedigree- and marker-based relationships optimized the performance of genomic prediction in sorghum, especially for traits of lower heritability. Selection based on genome-wide markers has become an active breeding strategy in crops. Genomic prediction models can make use of pedigree information to account for the residual polygenic effects not captured by markers. Our aim was to evaluate the impact of using pedigree and genomic information on prediction quality of breeding values for different traits in sorghum. We explored BLUP models that use weighted combinations of pedigree and genomic relationship matrices. The optimal weighting factor was empirically determined in order to maximize predictive ability after evaluating a range of candidate weights. The phenotypic data consisted of testcross evaluations of sorghum parental lines across multiple environments. All lines were genotyped, and full pedigree information was available. The performance of the best predictive combined matrix was compared to that of models fitting the component matrices independently. Model performance was assessed using a cross-validation technique. Fitting a combined pedigree-genomic matrix with the optimal weight always yielded the largest increases in predictive ability and the largest reductions in prediction bias relative to the simple G-BLUP. However, the weight that optimized prediction varied across traits. The benefits of including pedigree information in the genomic model were more relevant for traits with lower heritability, such as grain yield and stay-green. Our results suggest that the combination of pedigree and genomic relatedness can be used to optimize predictions of complex traits in crops when the additive variation is not fully explained by markers.
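A weighted combination of pedigree and genomic relationship matrices, with the weight chosen from a grid of candidates, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the form K = w·A + (1 − w)·G, the function names, and the toy scoring function are all assumptions. In practice the score would be a cross-validated predictive ability from a BLUP model fitted with K.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def combine_relationship_matrices(A: Matrix, G: Matrix, w: float) -> Matrix:
    """Return K = w*A + (1 - w)*G for pedigree matrix A and genomic matrix G."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [[w * a + (1.0 - w) * g for a, g in zip(row_a, row_g)]
            for row_a, row_g in zip(A, G)]


def best_weight(A: Matrix, G: Matrix, weights: Sequence[float],
                score: Callable[[Matrix], float]) -> float:
    """Pick the candidate weight whose combined matrix maximizes `score`."""
    return max(weights, key=lambda w: score(combine_relationship_matrices(A, G, w)))


# Toy 2x2 example (hypothetical values).
A = [[1.0, 0.5], [0.5, 1.0]]   # pedigree-based relationships
G = [[1.1, 0.3], [0.3, 0.9]]   # marker-based relationships


def toy_score(K: Matrix) -> float:
    """Stand-in for cross-validated predictive ability; peaks when K[0][1] is 0.4."""
    return -abs(K[0][1] - 0.4)


chosen = best_weight(A, G, [0.0, 0.25, 0.5, 0.75, 1.0], toy_score)
```

With w = 0 the model reduces to plain G-BLUP and with w = 1 to a pedigree-only model, which is why both endpoints are usually kept in the candidate grid.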
Affiliation(s)
- Julio G Velazco
- Department of Plant Breeding, National Institute of Agricultural Technology (INTA), EEA Pergamino, B2700WAA, Pergamino, Argentina
- Biometris, Wageningen University and Research, 6700AA, Wageningen, The Netherlands
| | - Marcos Malosetti
- Biometris, Wageningen University and Research, 6700AA, Wageningen, The Netherlands
| | - Colleen H Hunt
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Hermitage Research Facility, Warwick, QLD, 4370, Australia
- Department of Agriculture and Fisheries, Hermitage Research Facility, Warwick, QLD, 4370, Australia
| | - Emma S Mace
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Hermitage Research Facility, Warwick, QLD, 4370, Australia
- Department of Agriculture and Fisheries, Hermitage Research Facility, Warwick, QLD, 4370, Australia
| | - David R Jordan
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Hermitage Research Facility, Warwick, QLD, 4370, Australia
| | - Fred A van Eeuwijk
- Biometris, Wageningen University and Research, 6700AA, Wageningen, The Netherlands.
|
26
|
High-frequency marker haplotypes in the genomic selection of dairy cattle. J Appl Genet 2019; 60:179-186. [PMID: 30877657 PMCID: PMC6483952 DOI: 10.1007/s13353-019-00489-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/29/2018] [Revised: 01/18/2019] [Accepted: 02/28/2019] [Indexed: 11/05/2022]
Abstract
The aim of this study was to predict the genomic breeding value (DGV) of production, selected conformation and reproductive traits, and somatic cell score of dairy cattle in Poland using high-frequency marker haplotypes. The dataset consisted of phenotypic, genotypic, and pedigree data of 1216 Polish Holstein-Friesian bulls. The genotypic data consisted of 54,000 single-nucleotide polymorphisms (SNPs). The data were divided into two subsets: a test dataset (n = 1064) and a validation dataset (n = 152). Genotypic data were selected using three criteria: the percentage of missing genotypes, minor allele frequency, and linkage disequilibrium. The purpose of the data selection was to identify blocks of SNPs that were then used for the construction of haplotypes. Only haplotypes with a frequency higher than 25% were selected. DGV was predicted using four variants of a linear model with random haplotype effects and deregressed breeding values as the response variables. The accuracy of genomic prediction was checked by comparing DGVs with estimated breeding values (EBVs) using two methods: Pearson’s correlations and the regression of EBV on DGV. The use of high-frequency haplotypes showed a tendency to underestimate DGVs. None of the models tested was clearly superior with regard to the traits studied. DGVs of production and conformation traits as well as somatic cell score (medium or high heritability traits) were more accurate than those estimated for fertility traits (low heritability traits).
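The two accuracy checks named in this abstract, Pearson's correlation and the regression of EBV on DGV, can be sketched in a few lines. The values and function names are hypothetical. Because the slope equals the correlation times the ratio of standard deviations, a slope above 1 implies that the DGVs vary less than the EBVs they are compared with.

```python
from statistics import mean
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def regression_slope(dgv: Sequence[float], ebv: Sequence[float]) -> float:
    """Least-squares slope of EBV regressed on DGV: cov(DGV, EBV) / var(DGV)."""
    md, me = mean(dgv), mean(ebv)
    sxy = sum((d - md) * (e - me) for d, e in zip(dgv, ebv))
    sxx = sum((d - md) ** 2 for d in dgv)
    return sxy / sxx


# Hypothetical validation animals: DGVs are exactly half the EBVs.
dgv = [0.5, 1.0, 1.5, 2.0]
ebv = [1.0, 2.0, 3.0, 4.0]

r = pearson_correlation(dgv, ebv)
b = regression_slope(dgv, ebv)
```

In this toy case the correlation is perfect while the slope is 2, showing that the two measures capture different aspects of prediction quality: ranking versus scale.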
|
27
|
Li Z, Gao N, Martini JWR, Simianer H. Integrating Gene Expression Data Into Genomic Prediction. Front Genet 2019; 10:126. [PMID: 30858865 PMCID: PMC6397893 DOI: 10.3389/fgene.2019.00126] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 10/14/2018] [Accepted: 02/04/2019] [Indexed: 01/14/2023]
Abstract
Gene expression profiles potentially hold valuable information for the prediction of breeding values and phenotypes. In this study, the utility of transcriptome data for phenotype prediction was tested with 185 inbred lines of Drosophila melanogaster for nine traits in two sexes. We incorporated the transcriptome data into genomic prediction via two methods: GTBLUP and GRBLUP, both combining single nucleotide polymorphisms (SNPs) and transcriptome data. The genotypic data was used to construct the common additive genomic relationship, which was used in genomic best linear unbiased prediction (GBLUP) or jointly in a linear mixed model with a transcriptome-based linear kernel (GTBLUP), or with a transcriptome-based Gaussian kernel (GRBLUP). We studied the predictive ability of the models and discuss a concept of "omics-augmented broad sense heritability" for the multi-omics era. For most traits, GRBLUP and GBLUP provided similar predictive abilities, but GRBLUP explained more of the phenotypic variance. There was only one trait (olfactory perception to Ethyl Butyrate in females) in which the predictive ability of GRBLUP (0.23) was significantly higher than the predictive ability of GBLUP (0.21). Our results suggest that accounting for transcriptome data has the potential to improve genomic predictions if transcriptome data can be included on a larger scale.
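The two transcriptome-based kernels mentioned in this abstract, a linear kernel and a Gaussian kernel, can be illustrated with a minimal sketch. The exact scaling and bandwidth used in the paper are not given here, so the division by the number of transcripts and the `bandwidth` parameter below are assumptions, as are the function names and the toy expression values.

```python
import math
from typing import List

Matrix = List[List[float]]


def linear_kernel(X: Matrix) -> Matrix:
    """K[i][j] = <x_i, x_j> / m, with m the number of transcripts (assumed scaling)."""
    m = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / m for xj in X] for xi in X]


def gaussian_kernel(X: Matrix, bandwidth: float) -> Matrix:
    """K[i][j] = exp(-||x_i - x_j||^2 / bandwidth); bandwidth is a tuning parameter."""
    def sq_dist(xi: List[float], xj: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(xi, xj))
    return [[math.exp(-sq_dist(xi, xj) / bandwidth) for xj in X] for xi in X]


# Three lines measured on two transcripts (hypothetical, already standardized).
expression = [[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]]

K_lin = linear_kernel(expression)
K_rbf = gaussian_kernel(expression, bandwidth=2.0)
```

Either kernel could then enter a mixed model next to the additive genomic relationship matrix. Lines with identical expression profiles get a Gaussian similarity of 1, and the similarity decays smoothly with distance rather than linearly.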
Affiliation(s)
- Zhengcao Li
- Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany
| | - Ning Gao
- State Key Laboratory of Biocontrol, Guangzhou Higher Education Mega Center, School of Life Science, Sun Yat-sen University, Guangzhou, China
| | | | - Henner Simianer
- Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany
|
28
|
Ma W, Qiu Z, Song J, Li J, Cheng Q, Zhai J, Ma C. A deep convolutional neural network approach for predicting phenotypes from genotypes. PLANTA 2018; 248:1307-1318. [PMID: 30101399 DOI: 10.1007/s00425-018-2976-9] [Citation(s) in RCA: 92] [Impact Index Per Article: 15.3] [Received: 04/09/2018] [Accepted: 07/11/2018] [Indexed: 05/21/2023]
Abstract
Deep learning is a promising technology to accurately select individuals with high phenotypic values based on genotypic data. Genomic selection (GS) is a promising breeding strategy by which the phenotypes of plant individuals are usually predicted based on genome-wide markers of genotypes. In this study, we present a deep learning method, named DeepGS, to predict phenotypes from genotypes. Using a deep convolutional neural network, DeepGS uses hidden variables that jointly represent features in genotypes when making predictions; it also employs convolution, sampling and dropout strategies to reduce the complexity of high-dimensional genotypic data. We used a large GS dataset to train DeepGS and compared its performance with other methods. The experimental results indicate that DeepGS can be used as a complement to the commonly used RR-BLUP in the prediction of phenotypes from genotypes. The complementarity between DeepGS and RR-BLUP can be exploited using an ensemble learning approach to select individuals with high phenotypic values more accurately, even in the absence of outlier individuals and subsets of genotypic markers. The source code of DeepGS and the ensemble learning approach has been packaged into Docker images to facilitate their application in different GS programs.
Affiliation(s)
- Wenlong Ma
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Zhixu Qiu
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Biomass Energy Center for Arid and Semi-arid Lands, Northwest A&F University, Shaanxi, 712100, Yangling, China
| | - Jie Song
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Jiajia Li
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Biomass Energy Center for Arid and Semi-arid Lands, Northwest A&F University, Shaanxi, 712100, Yangling, China
| | - Qian Cheng
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Biomass Energy Center for Arid and Semi-arid Lands, Northwest A&F University, Shaanxi, 712100, Yangling, China
| | - Jingjing Zhai
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, 712100, Shaanxi, China
| | - Chuang Ma
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China.
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, 712100, Shaanxi, China.
|
29
|
Gianola D, Cecchinato A, Naya H, Schön CC. Prediction of Complex Traits: Robust Alternatives to Best Linear Unbiased Prediction. Front Genet 2018; 9:195. [PMID: 29951082 PMCID: PMC6008589 DOI: 10.3389/fgene.2018.00195] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/03/2018] [Accepted: 05/14/2018] [Indexed: 12/05/2022]
Abstract
A widely used method for prediction of complex traits in animal and plant breeding is “genomic best linear unbiased prediction” (GBLUP). In a quantitative genetics setting, BLUP is a linear regression of phenotypes on a pedigree or on a genomic relationship matrix, depending on the type of input information available. Normality of the distributions of random effects and of model residuals is not required for BLUP but a Gaussian assumption is made implicitly. A potential downside is that Gaussian linear regressions are sensitive to outliers, genetic or environmental in origin. We present robust alternatives to BLUP that are simple to implement (relative to a fully Bayesian analysis), using a linear model with residual t or Laplace distributions instead of a Gaussian one, and evaluate the methods with milk yield records on Italian Brown Swiss cattle, grain yield data in inbred wheat lines, and three traits measured on accessions of Arabidopsis thaliana. The methods do not use Markov chain Monte Carlo sampling, and model hyper-parameters, viewed here as regularization “knobs,” are tuned via cross-validation. Uncertainty of predictions is evaluated by employing bootstrapping or by random reconstruction of training and testing sets. It was found (e.g., test-day milk yield in cows, flowering time and FRIGIDA expression in Arabidopsis) that the best predictions were often those obtained with the robust methods. The results obtained are encouraging and stimulate further investigation and generalization.
Affiliation(s)
- Daniel Gianola
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, United States.,Department of Dairy Science, University of Wisconsin-Madison, Madison, WI, United States.,Department of Plant Sciences, TUM School of Life Sciences, Technical University of Munich, Munich, Germany.,Department of Agronomy, Food Natural Resources, Animals and Environment, Università degli Studi di Padova, Padova, Italy.,Institut Pasteur de Montevideo, Montevideo, Uruguay
| | - Alessio Cecchinato
- Department of Agronomy, Food Natural Resources, Animals and Environment, Università degli Studi di Padova, Padova, Italy
| | - Hugo Naya
- Institut Pasteur de Montevideo, Montevideo, Uruguay
| | - Chris-Carolin Schön
- Department of Plant Sciences, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
|
30
|
Fritsche-Neto R, Akdemir D, Jannink JL. Accuracy of genomic selection to predict maize single-crosses obtained through different mating designs. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2018; 131:1153-1162. [PMID: 29445844 DOI: 10.1007/s00122-018-3068-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 08/26/2017] [Accepted: 02/08/2018] [Indexed: 05/02/2023]
Abstract
Testcross is the worst mating design to use as a training set to predict maize single-crosses that would be obtained through full diallel or North Carolina design II. Even though many papers have been published about genomic prediction (GP) in maize, the best mating design to build the training population has not been defined yet. Such a design must maximize the accuracy given constraints on costs and on the logistics of the crosses to be made. Hence, the aims of this work were (1) to empirically evaluate the effect of the mating designs, used as training sets, on genomic selection to predict maize single-crosses obtained through full diallel and North Carolina design II, and (2) to identify the possibility of reducing the number of crosses and parents to compose these training sets. Our results suggest that testcross is the worst mating design to use as a training set to predict maize single-crosses that would be obtained through full diallel or North Carolina design II. Moreover, North Carolina design II is the best training set to predict hybrids taken from full diallel. However, hybrids from full diallel and North Carolina design II can be well predicted using optimized training sets, which also allow reducing the total number of crosses to be made. Nevertheless, the number of parents and the crosses per parent in the training sets should be maximized.
Affiliation(s)
- Roberto Fritsche-Neto
- Department of Genetics, "Luiz de Queiroz" Agriculture College, University of São Paulo, Piracicaba, São Paulo, Brazil.
| | | | - Jean-Luc Jannink
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY, USA
|
32
|
Lopes FB, Wu XL, Li H, Xu J, Perkins T, Genho J, Ferretti R, Tait RG, Bauck S, Rosa GJM. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes. J Anim Breed Genet 2018; 135:14-27. [PMID: 29345073 DOI: 10.1111/jbg.12312] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/23/2017] [Accepted: 12/04/2017] [Indexed: 11/27/2022]
Abstract
Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that the pooling of animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies on the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gains in genomic prediction accuracies on de-regressed EBV were slightly smaller (i.e. 0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and the resulting accuracies were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle.
Affiliation(s)
- F B Lopes: Department of Animal Sciences, University of Wisconsin, Madison, WI, USA; Biostatistics and Bioinformatics, GeneSeek (A Neogen Company), Lincoln, NE, USA
- X-L Wu: Department of Animal Sciences, University of Wisconsin, Madison, WI, USA; Biostatistics and Bioinformatics, GeneSeek (A Neogen Company), Lincoln, NE, USA
- H Li: Department of Animal Sciences, University of Wisconsin, Madison, WI, USA; Biostatistics and Bioinformatics, GeneSeek (A Neogen Company), Lincoln, NE, USA
- J Xu: Biostatistics and Bioinformatics, GeneSeek (A Neogen Company), Lincoln, NE, USA; Department of Statistics, University of Nebraska, Lincoln, NE, USA
- T Perkins: International Brangus Breeders Association, San Antonio, TX, USA
- J Genho: Livestock Genetic Services LLC, Woodville, VA, USA
- R Ferretti: Biostatistics and Bioinformatics, GeneSeek (A Neogen Company), Lincoln, NE, USA
- R G Tait: Biostatistics and Bioinformatics, GeneSeek (A Neogen Company), Lincoln, NE, USA
- S Bauck: Biostatistics and Bioinformatics, GeneSeek (A Neogen Company), Lincoln, NE, USA
- G J M Rosa: Department of Animal Sciences, University of Wisconsin, Madison, WI, USA
|
33
|
Naya H, Peñagaricano F, Urioste JI. Modelling female fertility traits in beef cattle using linear and non-linear models. J Anim Breed Genet 2017; 134:202-212. [PMID: 28508488 DOI: 10.1111/jbg.12266] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/29/2016] [Accepted: 02/07/2017] [Indexed: 11/29/2022]
Abstract
Female fertility traits are key components of the profitability of beef cattle production. However, these traits are difficult and expensive to measure, particularly under extensive pastoral conditions, and consequently, fertility records are in general scarce and somewhat incomplete. Moreover, fertility traits are usually dominated by the effects of herd-year environment, and it is generally assumed that only small margins remain for genetic improvement. New ways of modelling genetic variation in these traits are needed. Inspired by the methodological developments made by Prof. Daniel Gianola and co-workers, we applied linear (Gaussian), Poisson, probit (threshold), censored Poisson and censored Gaussian models to three different kinds of endpoints, namely calving success (CS), number of days from first calving (CD) and number of failed oestrus (FE). For models involving FE and CS, non-linear models outperformed their linear counterparts. For models derived from CD, linear versions displayed better fit than the non-linear counterparts. Non-linear models showed consistently higher estimates of heritability and repeatability in all cases (h² < 0.08 and r < 0.13, for linear models; h² > 0.23 and r > 0.24, for non-linear models). While additive and permanent environment effects showed highly favourable correlations between all models (>0.789), consistency in selecting the 10% best sires showed important differences, mainly amongst the considered endpoints (FE, CS and CD). In consequence, endpoints should be considered as modelling different underlying genetic effects, with linear models more appropriate to describe CD and non-linear models better for FE and CS.
Affiliation(s)
- H Naya: Unidad de Bioinformática, Institut Pasteur de Montevideo, Montevideo, Uruguay; Facultad de Agronomía, Universidad de la República, Montevideo, Uruguay
- F Peñagaricano: Department of Animal Sciences, University of Florida, Gainesville, FL, USA; University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA
- J I Urioste: Facultad de Agronomía, Universidad de la República, Montevideo, Uruguay
|
34
|
Predicted Residual Error Sum of Squares of Mixed Models: An Application for Genomic Prediction. G3-GENES GENOMES GENETICS 2017; 7:895-909. [PMID: 28108552 PMCID: PMC5345720 DOI: 10.1534/g3.116.038059] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Abstract
Genomic prediction is a statistical method to predict phenotypes of polygenic traits using high-throughput genomic data. Most diseases and behaviors in humans and animals are polygenic traits. The majority of agronomic traits in crops are also polygenic. Accurate prediction of these traits can help medical professionals diagnose acute diseases and help breeders increase food production, and can therefore contribute significantly to human health and global food security. The best linear unbiased prediction (BLUP) is an important tool to analyze high-throughput genomic data for prediction. However, to judge the efficacy of the BLUP model with a particular set of predictors for a given trait, one has to provide an unbiased mechanism to evaluate the predictability. Cross-validation (CV) is an essential tool to achieve this goal, where a sample is partitioned into K parts of roughly equal size, one part is predicted using parameters estimated from the remaining K – 1 parts, and eventually every part is predicted using a sample excluding that part. Such a CV is called the K-fold CV. Unfortunately, CV presents a substantial increase in computational burden. We developed an alternative method, the HAT method, to replace CV. The new method corrects the estimated residual errors from the whole sample analysis using the leverage values of a hat matrix of the random effects to obtain the predicted residual errors. Properties of the HAT method were investigated using seven agronomic and 1000 metabolomic traits of an inbred rice population. Results showed that the HAT method is a very good approximation of the CV method. The method was also applied to 10 traits in 1495 hybrid rice with 1.6 million SNPs, and to human height of 6161 subjects with roughly 0.5 million SNPs of the Framingham heart study data. Predictabilities of the HAT and CV methods were all similar. The HAT method allows us to easily evaluate the predictabilities of genomic prediction for large numbers of traits in very large populations.
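The leverage correction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain ridge regression on marker codes as a stand-in for BLUP, a fixed shrinkage parameter `lam` rather than estimated variance components, no fixed effects, and simulated data. All function and variable names are hypothetical. Under these assumptions the corrected residual `e_i / (1 - h_ii)` equals the leave-one-out predicted residual exactly, which the sketch checks against an explicit refit.

```python
import numpy as np

def hat_predicted_residuals(Z, y, lam):
    """Predicted residuals from one whole-sample fit, via leverage correction."""
    p = Z.shape[1]
    # Hat matrix of the ridge fit: H = Z (Z'Z + lam*I)^-1 Z'
    H = Z @ np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T)
    resid = y - H @ y                      # ordinary (estimated) residuals
    return resid / (1.0 - np.diag(H))      # corrected by leverage h_ii

def loo_predicted_residuals(Z, y, lam):
    """Reference: refit n times, leaving one record out each time."""
    n, p = Z.shape
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Zi, yi = Z[keep], y[keep]
        u = np.linalg.solve(Zi.T @ Zi + lam * np.eye(p), Zi.T @ yi)
        out[i] = y[i] - Z[i] @ u           # error when record i is predicted
    return out

# Simulated marker data (assumption: 0/1/2 codes, centered per marker)
rng = np.random.default_rng(0)
n, p, lam = 60, 200, 50.0
Z = rng.integers(0, 3, size=(n, p)).astype(float)
Z -= Z.mean(axis=0)
y = Z @ rng.normal(0.0, 0.1, size=p) + rng.normal(0.0, 1.0, size=n)

hat_res = hat_predicted_residuals(Z, y, lam)
loo_res = loo_predicted_residuals(Z, y, lam)
press = float(np.sum(hat_res ** 2))        # predicted residual error sum of squares
print(np.allclose(hat_res, loo_res), press)
```

The single-fit version needs one linear solve, whereas the explicit loop needs one per record. With variance components re-estimated in every fold, as in a full mixed-model CV, the two would agree only approximately.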
|
35
|
Mikshowsky AA, Gianola D, Weigel KA. Assessing genomic prediction accuracy for Holstein sires using bootstrap aggregation sampling and leave-one-out cross validation. J Dairy Sci 2016; 100:453-464. [PMID: 27889124 DOI: 10.3168/jds.2016-11496] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/20/2016] [Accepted: 10/05/2016] [Indexed: 11/19/2022]
Abstract
Since the introduction of genome-enabled prediction for dairy cattle in 2009, genomic selection has markedly changed many aspects of the dairy genetics industry and enhanced the rate of response to selection for most economically important traits. Young dairy bulls are genotyped to obtain their genomic predicted transmitting ability (GPTA) and reliability (REL) values. These GPTA are a main factor in most purchasing, marketing, and culling decisions until bulls reach 5 yr of age and their milk-recorded offspring become available. At that time, daughter yield deviations (DYD) can be compared with the GPTA computed several years earlier. For most bulls, the DYD align well with the initial predictions. However, for some bulls, the difference between DYD and corresponding GPTA is quite large, and published REL are of limited value in identifying such bulls. A method of bootstrap aggregation sampling (bagging) using genomic BLUP (GBLUP) was applied to predict the GPTA of 2,963, 2,963, and 2,803 young Holstein bulls for protein yield, somatic cell score, and daughter pregnancy rate (DPR), respectively. For each trait, 50 bootstrap samples from a reference population comprising 2011 DYD of 8,610, 8,405, and 7,945 older Holstein bulls were used. Leave-one-out cross validation was also performed to assess prediction accuracy when removing specific bulls from the reference population. The main objectives of this study were (1) to assess the extent to which current REL values and alternative measures of variability, such as the bootstrap standard deviation (SD) of predictions, could detect bulls whose daughter performance deviates significantly from early genomic predictions, and (2) to identify factors associated with the reference population that inform about inaccurate genomic predictions. The SD of bootstrap predictions was a mildly useful metric for identifying bulls whose future daughter performance may deviate significantly from early GPTA for protein and DPR. Leave-one-out cross validation allowed us to identify groups of reference population bulls that were influential on other reference population bulls for protein yield and observe their effects on predictions of testing set bulls, as a whole and individually.
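The bagging procedure in this abstract can be sketched as follows. This is an illustrative sketch only, not the study's pipeline: ridge regression on marker codes stands in for GBLUP (the two give equivalent predictions when all markers share one variance), the shrinkage parameter `lam` is fixed by assumption, and the genotypes and phenotypes are simulated. All names are hypothetical. The number of bootstrap samples (50) follows the abstract.

```python
import numpy as np

def ridge_snp_effects(Z, y, lam):
    """Marker effects from a ridge fit with fixed shrinkage (stand-in for GBLUP)."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

def bagged_predictions(Z_ref, y_ref, Z_new, lam, n_boot=50, seed=0):
    """Bootstrap-aggregated predictions and their SD across bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = Z_ref.shape[0]
    preds = np.empty((n_boot, Z_new.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample reference animals with replacement
        u = ridge_snp_effects(Z_ref[idx], y_ref[idx], lam)
        preds[b] = Z_new @ u               # predict the young (testing) animals
    # Mean = bagged prediction; SD = per-animal spread across bootstrap fits
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

# Simulated data (assumption): 200 reference and 20 testing animals, 300 markers
rng = np.random.default_rng(1)
p = 300
Z_ref = rng.integers(0, 3, size=(200, p)).astype(float) - 1.0
Z_new = rng.integers(0, 3, size=(20, p)).astype(float) - 1.0
effects = rng.normal(0.0, 0.1, size=p)
y_ref = Z_ref @ effects + rng.normal(0.0, 1.0, size=200)

mean_pred, sd_pred = bagged_predictions(Z_ref, y_ref, Z_new, lam=100.0)
print(mean_pred[:3], sd_pred[:3])
```

A large bootstrap SD for an animal indicates that its prediction depends strongly on which reference animals happen to be sampled, which is the kind of variability measure the abstract compares against published REL.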
Affiliation(s)
- Daniel Gianola: Department of Dairy Science, University of Wisconsin, Madison 53706; Department of Animal Sciences, University of Wisconsin, Madison 53706; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison 53706
- Kent A Weigel: Department of Dairy Science, University of Wisconsin, Madison 53706
|