Reference Citation Analysis

For: Gianola D, Schön CC. Cross-Validation Without Doing Cross-Validation in Genome-Enabled Prediction. G3 (Bethesda) 2016;6:3107-28. [PMID: 27489209 DOI: 10.1534/g3.116.033381] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]

Cited by Other Article(s)

Li C, Yang Q, Liu B, Shi X, Liu Z, Yang C, Wang T, Xiao F, Zhang M, Shi A, Yan L. Ability of Genomic Prediction to Bi-Parent-Derived Breeding Population Using Public Data for Soybean Oil and Protein Content. PLANTS (BASEL, SWITZERLAND) 2024;13:1260. [PMID: 38732474 PMCID: PMC11085238 DOI: 10.3390/plants13091260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 04/21/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]

Abstract

Genomic selection (GS) is a marker-based selection method used to improve the genetic gain of quantitative traits in plant breeding. A large number of breeding datasets are available in the soybean database, and applying these public datasets in GS will improve breeding efficiency and reduce time and cost. However, the key problem to be solved is how to improve across-population prediction ability. The objectives of this study were to perform genomic prediction (GP) and estimate the prediction ability (PA) for seed oil and protein contents in soybean, using available public datasets to predict breeding populations in current, ongoing breeding programs. In this study, six public datasets of USDA GRIN soybean germplasm accessions, with phenotypic data on seed oil and protein contents from different experimental populations and genotypic data on single-nucleotide polymorphisms (SNPs), were used to perform GP and to predict a bi-parent-derived breeding population in our experiment. In within-population prediction of the bi-parental population, the average PA was 0.55 for seed oil content and 0.50 for protein content; when the six USDA populations were combined and used as the training set to predict the bi-parent-derived population, the PA was 0.45 for oil and 0.39 for protein content. The results showed that four cultivated USDA populations, used individually or combined, can serve as training sets to predict oil and protein contents in GS when the training set comprises 800 or more USDA germplasm accessions. The smaller the genetic distance between the training and testing populations, the higher the PA. PA also increased with population size. In across-population prediction, no significant difference in PA for oil and protein content was observed among different models. PA increased with the number of SNPs up to a marker set of 10,000 SNPs.
This study provides reasonable suggestions and methods for breeders to utilize public datasets for GS. It will aid breeders in developing GS-assisted breeding strategies to develop elite soybean cultivars with high oil and protein contents.

Affiliation(s)

Chenhui Li: College of Life Sciences, Hebei Agricultural University, Baoding 071001, China; Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Qing Yang: Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Bingqiang Liu: Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Xiaolei Shi: Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Zhi Liu: Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Chunyan Yang: Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Tao Wang: Handan Academy of Agricultural Science, Handan 056001, China
Fuming Xiao: Handan Academy of Agricultural Science, Handan 056001, China
Mengchen Zhang: Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Ainong Shi: Department of Horticulture, University of Arkansas, Fayetteville, AR 72701, USA
Long Yan: Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China

Bermann M, Legarra A, Munera AA, Misztal I, Lourenco D. Confidence intervals for validation statistics with data truncation in genomic prediction. Genet Sel Evol 2024;56:18. [PMID: 38459504 PMCID: PMC11234739 DOI: 10.1186/s12711-024-00883-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 01/31/2024] [Indexed: 03/10/2024]

Abstract

BACKGROUND

Validation by data truncation is a common practice in genetic evaluations because of the interest in predicting the genetic merit of a set of young selection candidates. Two of the most widely used validation methods in genetic evaluations use a single data partition: predictivity or predictive ability (the correlation between pre-adjusted phenotypes and estimated breeding values (EBV), divided by the square root of the heritability) and the linear regression (LR) method (comparison of "early" and "late" EBV). Both methods rely on a whole dataset and a partial dataset obtained by removing the information related to a set of validation individuals. EBV obtained with the partial dataset are compared against adjusted phenotypes for predictivity, or against EBV obtained with the whole dataset in the LR method. Confidence intervals for predictivity and the LR method can be obtained by replicating the validation for different samples (or folds), or by bootstrapping. Analytical confidence intervals would make it unnecessary to run several validations and would allow the quality of bootstrap intervals to be checked. However, analytical confidence intervals are unavailable for predictivity and the LR method.
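As a minimal sketch of the two single-partition statistics described above, the Python/NumPy functions below compute predictivity and three LR-method statistics for one validation set. The function names are hypothetical, and the definitions used (bias as the mean of partial-data EBV minus the mean of whole-data EBV; dispersion as the slope of the regression of whole-data EBV on partial-data EBV; ratio of accuracies as their correlation) are the usual LR-method definitions, assumed here rather than taken from this abstract. The LR reliability statistic is omitted because it also needs the additive variance and average inbreeding.

```python
import numpy as np


def predictivity(y_adj, ebv_partial, h2):
    """Correlation of pre-adjusted phenotypes with partial-data EBV, divided by sqrt(h2)."""
    r = np.corrcoef(y_adj, ebv_partial)[0, 1]
    return r / np.sqrt(h2)


def lr_statistics(ebv_partial, ebv_whole):
    """Bias, dispersion and ratio of accuracies for one validation set (LR method)."""
    p = np.asarray(ebv_partial, dtype=float)  # "early" EBV, validation records removed
    w = np.asarray(ebv_whole, dtype=float)    # "late" EBV, whole dataset
    # Bias: mean early EBV minus mean late EBV (ideal value 0).
    bias = p.mean() - w.mean()
    # Dispersion: slope of the regression of late on early EBV (ideal value 1).
    c = np.cov(w, p)  # 2 x 2 sample covariance matrix of (w, p)
    dispersion = c[0, 1] / c[1, 1]
    # Ratio of accuracies: correlation between early and late EBV.
    ratio_of_accuracies = np.corrcoef(p, w)[0, 1]
    return {"bias": bias, "dispersion": dispersion,
            "ratio_of_accuracies": ratio_of_accuracies}


# Example with made-up EBV for five validation individuals.
early = np.array([0.10, -0.30, 0.50, 0.20, -0.40])
late = np.array([0.15, -0.35, 0.60, 0.25, -0.45])
print(lr_statistics(early, late))
```

A ratio of accuracies close to 1 means the partial-data EBV were nearly as accurate as the whole-data EBV for the validation individuals.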

RESULTS

We derived standard errors and Wald confidence intervals for the predictivity and the statistics included in the LR method (bias, dispersion, ratio of accuracies, and reliability). The confidence intervals for the bias, dispersion, and reliability depend on the relationships and the prediction error variances and covariances across the individuals in the validation set. We developed approximations for large datasets that only need the reliabilities of the individuals in the validation set. The confidence intervals for the ratio of accuracies and predictivity were obtained through the Fisher transformation. We show the adequacy of both the analytical and the approximated analytical confidence intervals and compare them with bootstrap confidence intervals using two simulated examples. The analytical confidence intervals were closer to the simulated ones in both examples. Bootstrap confidence intervals tended to be narrower than the simulated ones. The approximated analytical confidence intervals were similar to those obtained by bootstrapping.
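The Fisher transformation mentioned above can be sketched as follows for a single correlation, such as the ratio of accuracies or the correlation inside predictivity. This is only the textbook version, assuming independent validation individuals so that the standard error of atanh(r) is 1/sqrt(n - 3); the study derives its own standard errors, which this sketch does not reproduce. The function name `fisher_ci` is hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate Wald interval for a correlation r on Fisher's z scale.

    Assumes n independent pairs, so se(atanh(r)) = 1 / sqrt(n - 3);
    z_crit defaults to the two-sided 95% standard normal quantile.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r)             # Fisher transformation
    se = 1.0 / math.sqrt(n - 3)
    # Build the interval on the z scale, then map it back with tanh.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


print(fisher_ci(0.6, 200))
```

Because predictivity is the correlation divided by sqrt(h2), a corresponding interval can be obtained by dividing both bounds by sqrt(h2), treating h2 as known.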

CONCLUSIONS

Estimating the sampling variation of predictivity and the statistics in the LR method without replication or bootstrap is possible for any dataset with the formulas presented in this study.

Yan Q, Fruzangohar M, Taylor J, Gong D, Walter J, Norman A, Shi JQ, Coram T. Improved genomic prediction using machine learning with Variational Bayesian sparsity. PLANT METHODS 2023;19:96. [PMID: 37660084 PMCID: PMC10474716 DOI: 10.1186/s13007-023-01073-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 08/22/2023] [Indexed: 09/04/2023]

Abstract

BACKGROUND

Genomic prediction has become a powerful modelling tool for assessing line performance in plant and livestock breeding programmes. Among genomic prediction modelling approaches, linear models have been shown to provide accurate predictions even when the number of genetic markers exceeds the number of data samples. However, breeding programmes are now compiling data from large numbers of lines and test environments for analysis, rendering these approaches computationally prohibitive. Machine learning (ML) now offers a solution to this problem through fully connected deep learning architectures and high parallelisation of the predictive task. However, the fully connected nature of these architectures inevitably over-parameterises the network, which must be addressed for efficient and accurate predictions.

RESULTS

In this research we explore the use of an ML architecture, which we call VBS-ML, that is governed by variational Bayesian sparsity in its initial layers. VBS-ML provides a mechanism for selecting important markers linked to the trait, immediately reducing network over-parameterisation. The selected markers then propagate to the remaining fully connected feed-forward components of the ML network to form the final genomic prediction. We illustrated the approach with four large Australian wheat breeding data sets, ranging from 2,665 to 10,375 lines genotyped across a large set of markers. For all data sets, the VBS-ML architecture improved genomic prediction accuracy over legacy linear modelling approaches.

CONCLUSIONS

An ML architecture governed by a variational Bayesian paradigm was shown to improve genomic prediction accuracy over legacy modelling approaches. This VBS-ML approach can dramatically decrease the parameter burden on the network and provides a computationally feasible way to improve genomic prediction with large breeding populations and many genetic markers.

Pravia MI, Navajas EA, Aguilar I, Ravagnolo O. Prediction ability of an alternative multi-trait genomic evaluation for residual feed intake. J Anim Breed Genet 2023;140:508-518. [PMID: 37186475 DOI: 10.1111/jbg.12775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/04/2023] [Accepted: 04/06/2023] [Indexed: 05/17/2023]

Abstract

Selection for feed efficiency is the goal of many beef cattle breeding programs. Residual feed intake (RFI) has been included in genetic evaluations to reduce feed intake without compromising performance traits such as liveweight, weight gain, or carcass traits. However, measuring feed intake is expensive, and only a small percentage of selection candidates are phenotyped. Genomic selection has become a very important tool for achieving effective genetic progress in these traits. Another effective strategy has been multi-trait prediction using easily recorded predictor traits on both reference animals and candidates without phenotypes, which could be another inexpensive way to increase accuracy. The objective of this work was to analyse and compare the prediction ability of two alternative approaches to predicting GEBVs for RFI. The population of inference was genotyped Hereford bull selection candidates in Uruguay. The first model was the conventional univariate model for RFI; the second was a multi-trait model that included a predictor trait (weaning weight, WW) in addition to the traits used in the first one: dry matter intake (DMI), metabolic mid-test weight (MWT), average daily gain (ADG), and ultrasound back fat (UBF). GEBVs from the multi-trait model were combined using selection index theory to derive RFI values. All analyses were performed using the ssGBLUP procedure. The prediction ability of both models was tested using two validation strategies: 30 replicates of random groups of animals, and validation across 9 different feed intake tests. Prediction quality was assessed by bias, dispersion, ratio of accuracies, and the relative increase in accuracy from adding phenotypic information. All parameters showed that the univariate model outperformed the multi-trait model, regardless of the validation strategy.
These results indicate that including WW as a proxy trait in a multi-trait analysis does not improve the prediction ability when all animals to be predicted are genotyped.
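One common way to combine multi-trait GEBVs into RFI GEBVs with selection index theory is to regress the DMI GEBV on the predictor-trait GEBVs using genetic (co)variances, so that the resulting RFI is genetically uncorrelated with the predictors. The sketch below assumes that approach; the abstract does not give the exact index weights used in the study, and the function name `rfi_gebv` and the example covariance matrix are hypothetical.

```python
import numpy as np


def rfi_gebv(gebv, G):
    """Genomic RFI = GEBV_DMI - GEBV_predictors @ b, with b from genetic (co)variances.

    gebv: (n_animals, 1 + k) array; column 0 is DMI, columns 1..k are predictor
          traits (e.g. MWT, ADG, UBF).
    G:    (1 + k) x (1 + k) additive genetic covariance matrix, same trait order.
    """
    gebv = np.asarray(gebv, dtype=float)
    G = np.asarray(G, dtype=float)
    G_xx = G[1:, 1:]                 # genetic covariances among predictor traits
    g_xy = G[1:, 0]                  # genetic covariances of predictors with DMI
    b = np.linalg.solve(G_xx, g_xy)  # genetic partial regression coefficients
    rfi = gebv[:, 0] - gebv[:, 1:] @ b
    return rfi, b


# Hypothetical (co)variances for DMI and two predictors, and GEBVs for two animals.
G = np.array([[4.0, 1.0, 0.5], [1.0, 2.0, 0.3], [0.5, 0.3, 1.0]])
gebv = np.array([[1.0, 0.0, 0.0], [2.0, 1.0, -1.0]])
rfi, b = rfi_gebv(gebv, G)
print(rfi, b)
```

By construction, G_xx @ b equals the DMI-predictor covariances, so the genetic covariance between this RFI and each predictor trait is zero.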

Wang K, Abid MA, Rasheed A, Crossa J, Hearne S, Li H. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. MOLECULAR PLANT 2023;16:279-293. [PMID: 36366781 DOI: 10.1016/j.molp.2022.11.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/06/2022] [Revised: 09/28/2022] [Accepted: 11/08/2022] [Indexed: 06/16/2023]

Abstract

Genomic prediction is an effective way to accelerate the rate of agronomic trait improvement in plants. Traditional methods typically use linear regression models with clear assumptions, but such methods cannot capture the complex relationships between genotypes and phenotypes. Non-linear models (e.g., deep neural networks) have been proposed as a superior alternative to linear models because they can capture complex non-additive effects. Here we introduce a deep learning (DL) method, deep neural network genomic prediction (DNNGP), for the integration of multi-omics data in plants. We trained DNNGP on four datasets and compared its performance with methods built on five classic models: genomic best linear unbiased prediction (GBLUP); two methods based on a machine learning (ML) framework, light gradient boosting machine (LightGBM) and support vector regression (SVR); and two methods based on a DL framework, deep learning genomic selection (DeepGS) and deep learning genome-wide association study (DLGWAS). DNNGP is novel in five ways. First, it can be applied to a variety of omics data to predict phenotypes. Second, the multilayered hierarchical structure of DNNGP dynamically learns features from raw data, avoiding overfitting and improving the convergence rate through a batch normalization layer, early stopping, and rectified linear unit (ReLU) activation functions. Third, DNNGP produced results competitive with those of the other five methods on small datasets and showed greater prediction accuracy than those methods on large-scale breeding data. Fourth, the computation time required by DNNGP was comparable with that of commonly used methods and up to 10 times shorter than that of DeepGS. Fifth, hyperparameters can easily be batch tuned on a local machine. Overall, DNNGP outperformed the widely used genomic selection (GS) methods it was compared with (GBLUP, LightGBM, SVR, DeepGS, and DLGWAS).
Moreover, DNNGP can generate robust assessments from diverse datasets, including omics data, and quickly incorporate complex and large datasets into usable models, making it a promising and practical approach for straightforward integration into existing GS platforms.

Gianola D, Fernando RL, Schön CC. Inference about quantitative traits under selection: a Bayesian revisitation for the post-genomic era. Genet Sel Evol 2022;54:78. [PMID: 36460973 PMCID: PMC9716705 DOI: 10.1186/s12711-022-00765-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 10/26/2022] [Indexed: 12/03/2022]

Abstract

BACKGROUND

Selection schemes distort inference when estimating differences between treatments or genetic associations between traits, and may degrade prediction of outcomes, e.g., the expected performance of the progeny of an individual with a certain genotype. If input and output measurements are not collected on random samples, inferences and predictions must be biased to some degree. Our paper revisits inference in quantitative genetics when using samples stemming from some selection process. The approach integrates the classical notion of fitness with that of missing data. The treatment is fully Bayesian, with inference and prediction dealt with in a unified manner. While the focus is on animal and plant breeding, the concepts apply to natural selection as well. Examples based on real data and stylized models illustrate how selection can be accounted for in four different situations, sometimes without success.

RESULTS

Our flexible "soft selection" setting helps to diagnose the extent to which selection can be ignored. The clear connection between probability of missingness and the concept of fitness in stylized selection scenarios is highlighted. It is not realistic to assume that a fixed selection threshold t holds in conceptual replication, as the chance of selection depends on observed and unobserved data, and on unequal amounts of information over individuals, aspects that a "soft" selection representation addresses explicitly. There does not seem to be a general prescription to accommodate potential distortions due to selection. In structures that combine cross-sectional, longitudinal and multi-trait data such as in animal breeding, balance is the exception rather than the rule. The Bayesian approach provides an integrated answer to inference, prediction and model choice under selection that goes beyond the likelihood-based approach, where breeding values are inferred indirectly.
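The attenuation that selection causes in estimated associations between traits, and the difference between a fixed threshold and a "soft" selection rule, can be illustrated with a small simulation. This is a classical range-restriction illustration in Python/NumPy, not the Bayesian treatment of the paper: two standard normal traits with correlation rho, hard truncation on x at a fixed threshold, and soft selection in which each individual is kept with probability logistic(beta * x). All parameter values and names are illustrative assumptions.

```python
import numpy as np


def simulate_selection(n=100_000, rho=0.5, top_fraction=0.2, beta=2.0, seed=1):
    """Correlation between traits x and y before and after hard or soft selection on x."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    x, y = xy[:, 0], xy[:, 1]

    # Hard selection: keep individuals above a fixed threshold t on x.
    t = np.quantile(x, 1.0 - top_fraction)
    hard = x > t

    # Soft selection: keep each individual with probability logistic(beta * x),
    # so the chance of being observed ("fitness") rises smoothly with x.
    p_keep = 1.0 / (1.0 + np.exp(-beta * x))
    soft = rng.random(n) < p_keep

    def corr(mask):
        return np.corrcoef(x[mask], y[mask])[0, 1]

    return {"full": corr(np.ones(n, dtype=bool)), "hard": corr(hard), "soft": corr(soft)}


print(simulate_selection())
```

With these settings, the correlation computed only on selected individuals falls below the full-population value of about 0.5, and more so under hard truncation than under this soft rule.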

CONCLUSIONS

The approach used here for inference and prediction under selection may or may not yield the best possible answers. One may believe that selection has been accounted for diligently, but the central problem of whether statistical inferences are good or bad does not have an unambiguous solution. On the other hand, the quality of predictions can be gauged empirically via appropriate training-testing of competing methods.

Nantongo JS, Potts BM, Klápště J, Graham NJ, Dungey HS, Fitzgerald H, O'Reilly-Wapstra JM. Genomic selection for resistance to mammalian bark stripping and associated chemical compounds in radiata pine. G3 (BETHESDA, MD.) 2022;12:jkac245. [PMID: 36218439 PMCID: PMC9635650 DOI: 10.1093/g3journal/jkac245] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/22/2022] [Accepted: 08/29/2022] [Indexed: 07/28/2023]

Abstract

The integration of genomic data into genetic evaluations can facilitate the rapid selection of superior genotypes and accelerate the breeding cycle in trees. In this study, 390 trees from 74 control-pollinated families were genotyped using a 36K Axiom SNP array. A total of 15,624 high-quality SNPs were used to develop genomic prediction models for mammalian bark stripping, tree height, and selected primary and secondary chemical compounds in the bark. Genetic parameters from three genomic prediction methods were compared with equivalent single- or multitrait pedigree-based approaches (ABLUP): single-trait best linear unbiased prediction based on a marker-based relationship matrix (genomic best linear unbiased prediction); multitrait single-step genomic best linear unbiased prediction, which integrated the marker-based and pedigree-based relationship matrices (single-step genomic best linear unbiased prediction); and single-trait generalized ridge regression. The influence of the statistical distribution of the data on the genetic parameters was also assessed. Results indicated that heritability estimates increased nearly 2-fold with genomic models compared with the equivalent pedigree-based models. Predictive accuracy of single-step genomic best linear unbiased prediction was higher than that of ABLUP for most traits. Allowing for heterogeneity in marker effects through generalized ridge regression did not markedly improve predictive ability over genomic best linear unbiased prediction, suggesting that most of the chemical traits are modulated by many genes with small effects. Overall, traits with low pedigree-based heritability benefited more from genomic models than traits with high pedigree-based heritability. There was no evidence that data skewness or the presence of outliers affected the genomic or pedigree-based genetic estimates.
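For readers less familiar with GBLUP, a minimal Python/NumPy sketch of a marker-based relationship matrix and the corresponding predictions follows. It assumes VanRaden's first genomic relationship matrix built from 0/1/2 genotype codes, a single-trait model with only an overall mean, and a known heritability; the mean is estimated by the simple average, which approximates the generalized least squares solution. Function names are hypothetical, and this is not the software used in the study.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix from an (individuals x SNPs) matrix of 0/1/2 codes."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0            # observed allele frequency per SNP
    Z = M - 2.0 * p                     # centre each SNP column by twice its frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup(y, G, h2):
    """GEBV u = G (G + lam I)^-1 (y - mean(y)), with lam = sigma_e^2 / sigma_g^2."""
    y = np.asarray(y, dtype=float)
    lam = (1.0 - h2) / h2
    n = len(y)
    return G @ np.linalg.solve(G + lam * np.eye(n), y - y.mean())


# Made-up genotypes (4 individuals x 3 SNPs) and phenotypes.
M = np.array([[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 1]])
y = np.array([1.0, 2.0, 3.0, 6.0])
print(gblup(y, vanraden_g(M), h2=0.5))
```

The generalized ridge regression compared in the study works at the marker level instead, allowing marker-specific shrinkage rather than the single variance implied by G.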

Bermann M, Cesarani A, Misztal I, Lourenco D. Past, present, and future developments in single-step genomic models. ITALIAN JOURNAL OF ANIMAL SCIENCE 2022. [DOI: 10.1080/1828051x.2022.2053366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Puglisi D, Visioni A, Ozkan H, Kara İ, Lo Piero AR, Rachdad FE, Tondelli A, Valè G, Cattivelli L, Fricano A. High accuracy of genome-enabled prediction of belowground and physiological traits in barley seedlings. G3 GENES|GENOMES|GENETICS 2022;12:6517783. [PMID: 35099521 PMCID: PMC8895982 DOI: 10.1093/g3journal/jkac022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 01/21/2022] [Indexed: 11/24/2022]

Abstract

In plants, the study of belowground traits is gaining momentum because of their importance for yield formation and the uptake of water and nutrients. In several cereal crops, seminal root number and seminal root angle are proxy traits for root system architecture at mature stages, which in turn helps modulate water and nutrient uptake. Along with seminal root number and seminal root angle, experimental evidence indicates that the response of transpiration rate to evaporative demand, or vapor pressure deficit, is a key physiological trait that might be targeted to improve drought tolerance, because limiting transpiration rate at high vapor pressure deficit reduces water flux to the leaves and allows soil moisture to be managed better. In the present study, we first examined the phenotypic diversity of seminal root number, seminal root angle, and transpiration rate at the seedling stage in a panel of 8-way Multiparent Advanced Generation Inter-Crosses lines of winter barley and correlated these traits with grain yield measured in different site-by-season combinations. Second, phenotypic and genotypic data of the Multiparent Advanced Generation Inter-Crosses population were combined to fit and cross-validate different genomic prediction models for these belowground and physiological traits. Genomic prediction models for seminal root number were fitted using threshold and log-normal models, treating these data as an ordinal discrete variable and as count data, respectively, while for seminal root angle and transpiration rate, genomic prediction was implemented using models based on extended genomic best linear unbiased predictors. The results show that genome-enabled prediction models of seminal root number, seminal root angle, and transpiration rate have high predictive ability and that the best models investigated include first-order additive × additive epistatic interaction effects.
Our analyses indicate that beyond grain yield, genomic prediction models might be used to predict belowground and physiological traits and pave the way to practical applications for barley improvement.
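Additive × additive epistasis is often added to GBLUP-type models through a relationship matrix formed as the element-wise (Hadamard) product of the additive genomic relationship matrix with itself. Assuming that is what the extended GBLUP models above refer to, a minimal sketch of predicting total genetic values with both kernels is shown below; the variance components are taken as known, the mean is estimated by the simple average, and the function names are hypothetical.

```python
import numpy as np


def additive_by_additive(G):
    """First-order additive x additive relationship matrix: element-wise product G * G."""
    return G * G


def eg_blup(y, G, var_a, var_aa, var_e):
    """Total genetic values for y = mu + a + aa + e,
    a ~ N(0, var_a G), aa ~ N(0, var_aa G*G), e ~ N(0, var_e I)."""
    y = np.asarray(y, dtype=float)
    K = var_a * G + var_aa * additive_by_additive(G)  # covariance of a + aa
    n = len(y)
    return K @ np.linalg.solve(K + var_e * np.eye(n), y - y.mean())


# Made-up relationship matrix and phenotypes for three lines.
G = np.array([[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]])
y = np.array([1.0, 3.0, 2.0])
print(eg_blup(y, G, var_a=1.0, var_aa=0.5, var_e=2.0))
```

Because both kernels share the same marker information, the epistatic term mainly changes how strongly closely related lines borrow information from each other.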

Bartholomé J, Prakash PT, Cobb JN. Genomic Prediction: Progress and Perspectives for Rice Improvement. Methods Mol Biol 2022;2467:569-617. [PMID: 35451791 DOI: 10.1007/978-1-0716-2205-6_21] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/14/2023]

Elsen JM. Genomic Prediction of Complex Traits, Principles, Overview of Factors Affecting the Reliability of Genomic Prediction, and Algebra of the Reliability. Methods Mol Biol 2022;2467:45-76. [PMID: 35451772 DOI: 10.1007/978-1-0716-2205-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]

Dekkers JCM, Su H, Cheng J. Predicting the accuracy of genomic predictions. Genet Sel Evol 2021;53:55. [PMID: 34187354 PMCID: PMC8244147 DOI: 10.1186/s12711-021-00647-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/10/2020] [Accepted: 06/11/2021] [Indexed: 11/22/2022]

Abstract

Background

Mathematical models are needed for the design of breeding programs using genomic prediction. While deterministic models for selection on pedigree-based estimates of breeding values (PEBV) are available, these have not been fully developed for genomic selection, with a key missing component being the accuracy of genomic EBV (GEBV) of selection candidates. Here, a deterministic method was developed to predict this accuracy within a closed breeding population based on the accuracy of GEBV and PEBV in the reference population and the distance of selection candidates from their closest ancestors in the reference population.

Methods

The accuracy of GEBV was modeled as a combination of the accuracy of PEBV and of EBV based on genomic relationships deviated from pedigree (DEBV). Loss of the accuracy of DEBV from the reference to the target population was modeled based on the effective number of independent chromosome segments in the reference population (M_e). Measures of M_e derived from the inverse of the variance of relationships and from the accuracies of GEBV and PEBV in the reference population, derived using either a Fisher information or a selection index approach, were compared by simulation.
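To make the role of M_e concrete, the sketch below uses two widely cited approximations rather than the authors' method: M_e taken as the inverse of the variance of off-diagonal relationships (optionally genomic minus pedigree relationships, mirroring the DEBV idea), and the deterministic accuracy formula sqrt(N h² / (N h² + M_e)) for a selection candidate not closely related to the reference population. Both definitions and the function names are assumptions for illustration.

```python
import numpy as np


def me_from_relationships(G, A=None):
    """M_e as 1 / variance of off-diagonal relationships (assumed definition).

    If a pedigree relationship matrix A is given, the variance of G - A is used,
    i.e. genomic relationships deviated from pedigree.
    """
    D = np.asarray(G, dtype=float) if A is None else np.asarray(G, dtype=float) - A
    iu = np.triu_indices_from(D, k=1)   # each off-diagonal pair counted once
    return 1.0 / np.var(D[iu])


def daetwyler_accuracy(n_ref, h2, m_e):
    """Expected GEBV accuracy sqrt(N h2 / (N h2 + M_e)) for an unrelated candidate."""
    nh2 = n_ref * h2
    return np.sqrt(nh2 / (nh2 + m_e))


G = np.array([[1.0, 0.1, -0.1], [0.1, 1.0, 0.0], [-0.1, 0.0, 1.0]])
print(me_from_relationships(G), daetwyler_accuracy(1000, 0.5, 500))
```

For example, with N = 1000, h² = 0.5 and M_e = 500 the formula gives an accuracy of sqrt(0.5), about 0.71.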

Results

Using simulation, both the Fisher and the selection index approach correctly predicted accuracy in the target population over time, both with and without selection. The index approach, however, resulted in estimates of M_e that were less affected by heritability, reference size, and selection, and which are, therefore, more appropriate as a population parameter. The variance of relationships underpredicted M_e and was greatly affected by selection. A leave-one-out cross-validation approach was proposed to estimate required accuracies of EBV in the reference population. Aspects of the methods were validated using real data.
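Leave-one-out cross-validation need not require refitting the model n times. For a GBLUP-type model y = u + e with u ~ N(0, G σ²g), e ~ N(0, I σ²e) and known λ = σ²e/σ²g, the prediction of each y_i from all other records equals y_i - [V⁻¹y]_i / [V⁻¹]_ii with V = G + λI, an exact identity for this Gaussian linear model. The sketch below assumes no fixed effects and checks the shortcut against brute-force refitting; it is not necessarily the procedure used by Dekkers et al., and the function names are hypothetical.

```python
import numpy as np


def loo_predictions_shortcut(y, G, lam):
    """Exact leave-one-out predictions for y = u + e from a single matrix inverse."""
    Vinv = np.linalg.inv(G + lam * np.eye(len(y)))   # V = G + lam * I (up to sigma_g^2)
    return y - (Vinv @ y) / np.diag(Vinv)


def loo_predictions_bruteforce(y, G, lam):
    """Same quantity by refitting without record i, n times (for checking only)."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        V = G[np.ix_(keep, keep)] + lam * np.eye(n - 1)
        out[i] = G[i, keep] @ np.linalg.solve(V, y[keep])  # BLUP of u_i given y_{-i}
    return out


rng = np.random.default_rng(0)
Z = rng.normal(size=(12, 40))
G = Z @ Z.T / 40 + 1e-3 * np.eye(12)   # made-up relationship matrix
y = rng.normal(size=12)
fast = loo_predictions_shortcut(y, G, lam=1.5)
slow = loo_predictions_bruteforce(y, G, lam=1.5)
print(np.allclose(fast, slow))
```

The correlation between y and the leave-one-out predictions then gives a predictive ability estimate from a single model fit; for large n, a Cholesky factorization would be preferable to an explicit inverse.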

Conclusions

A deterministic method was developed to predict the accuracy of GEBV in selection candidates in a closed breeding population. The population parameter M_e that is required for these predictions can be derived from an available reference data set, and applied to other reference data sets and traits for that population. This method can be used to evaluate the benefit of genomic prediction and to optimize genomic selection breeding programs.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12711-021-00647-w.

Puglisi D, Delbono S, Visioni A, Ozkan H, Kara İ, Casas AM, Igartua E, Valè G, Piero ARL, Cattivelli L, Tondelli A, Fricano A. Genomic Prediction of Grain Yield in a Barley MAGIC Population Modeling Genotype per Environment Interaction. FRONTIERS IN PLANT SCIENCE 2021;12:664148. [PMID: 34108982 PMCID: PMC8183822 DOI: 10.3389/fpls.2021.664148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/04/2021] [Accepted: 04/26/2021] [Indexed: 06/12/2023]

Abstract

Multi-parent Advanced Generation Inter-crosses (MAGIC) lines have mosaic genomes generated by shuffling the genetic material of the founder parents according to predefined crossing schemes. In cereal crops, these experimental populations have been used extensively to investigate the genetic bases of several traits and to dissect the genetic bases of epistasis. In plants, genomic prediction models are usually fitted using either diverse panels of mostly unrelated accessions or individuals from biparental families, and several empirical analyses have evaluated the predictive ability of models fitted to these populations for different traits. In this paper, we constructed, genotyped, and evaluated a barley MAGIC population of 352 individuals developed from a diverse set of eight founder parents with contrasting phenotypes for grain yield. We combined phenotypic and genotypic information on this MAGIC population to fit several genomic prediction models, which were cross-validated to examine how their predictive ability varies with training population size. Moreover, several methods for optimizing the composition of the training population were applied to this MAGIC population and cross-validated to estimate the resulting predictive ability. Finally, extensive phenotypic data from field trials conducted across a wide range of water regimes and climatic conditions in the Mediterranean were used to fit and cross-validate multi-environment genomic prediction models including G×E interaction, using both genomic best linear unbiased prediction and reproducing kernel Hilbert space regression with a non-linear Gaussian kernel.
Overall, our empirical analyses showed that genomic prediction models trained with a limited number of MAGIC lines can be used to predict grain yield with values of predictive ability that vary from 0.25 to 0.60 and that beyond QTL mapping and analysis of epistatic effects, MAGIC population might be used to successfully fit genomic prediction models. We concluded that for grain yield, the single-environment genomic prediction models examined in this study are equivalent in terms of predictive ability while, in general, multi-environment models that explicitly split marker effects in main and environmental-specific effects outperform simpler multi-environment models.
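To make the GBLUP versus Gaussian-kernel RKHS contrast above concrete, here is a minimal single-environment sketch in Python/NumPy. It is an illustration under stated assumptions, not the authors' pipeline: markers are assumed coded 0/1/2, the genomic relationship matrix uses VanRaden-style centering and scaling, the Gaussian bandwidth is set to the median squared distance (a common heuristic), and the variance ratio `lam` (residual over genetic variance) is fixed by hand instead of being estimated for each kernel, as it would be in practice. The data are simulated and all function names are hypothetical.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an n x p marker matrix coded 0/1/2."""
    freq = M.mean(axis=0) / 2.0                # allele frequency of each marker
    Z = M - 2.0 * freq                         # center each marker column
    return Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))

def gaussian_kernel(M):
    """Gaussian kernel exp(-d2 / theta); theta = median off-diagonal squared distance."""
    sq = np.sum(M ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * M @ M.T, 0.0)
    theta = np.median(D2[np.triu_indices_from(D2, k=1)])
    return np.exp(-D2 / theta)

def kernel_predict(K, y, train, test, lam):
    """Kernel prediction of test lines from training lines:
    mu + K[test, train] (K[train, train] + lam * I)^-1 (y[train] - mu)."""
    mu = y[train].mean()                       # overall mean as the only fixed effect
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(A, y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

# Toy comparison on simulated data (not the barley MAGIC data).
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.3, size=(200, 500)).astype(float)
y = M @ rng.normal(0.0, 0.1, size=500) + rng.normal(0.0, 1.0, size=200)
train, test = np.arange(150), np.arange(150, 200)
for name, K in [("GBLUP", vanraden_grm(M)), ("RKHS", gaussian_kernel(M))]:
    pred = kernel_predict(K, y, train, test, lam=1.0)
    print(name, "predictive ability:", round(np.corrcoef(pred, y[test])[0, 1], 3))
```

Both models share the same prediction equation and differ only in the kernel `K`, which is what allows them to be compared on an equal footing by cross-validation.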

Bermann M, Lourenco D, Breen V, Hawken R, Brito Lopes F, Misztal I. Modeling genetic differences of combined broiler chicken populations in single-step GBLUP. J Anim Sci 2021;99:6154135. [PMID: 33649764 PMCID: PMC8355479 DOI: 10.1093/jas/skab056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Accepted: 02/17/2021] [Indexed: 11/13/2022]

Abstract

The introduction of animals from a different environment or population is a common practice in commercial livestock populations. In this study, we modeled the inclusion of a group of external birds into a local broiler chicken population for the purpose of genomic evaluations. The pedigree was composed of 242,413 birds and genotypes were available for 107,216 birds. A five-trait model that included one growth, two yield, and two efficiency traits was used for the analyses. The strategies to model the introduction of external birds were to include a fixed effect representing the origin of parents and to use unknown parent groups (UPG) or metafounders (MF). Genomic estimated breeding values (GEBV) were obtained with single-step GBLUP using the Algorithm for Proven and Young. Bias, dispersion, and accuracy of GEBV for the validation birds, that is, from the most recent generation, were computed. The bias and dispersion were estimated with the linear regression (LR) method, whereas accuracy was estimated by the LR method and predictive ability. When fixed UPG were fit without estimated inbreeding, the model did not converge. In contrast, models with fixed UPG and estimated inbreeding or random UPG converged and resulted in similar GEBV. The inclusion of an extra fixed effect in the model made the GEBV unbiased and reduced the inflation. Genomic predictions with MF were slightly biased and inflated due to the unbalanced number of observations assigned to each metafounder. When combining local and external populations, the greatest accuracy can be obtained by adding an extra fixed effect to account for the origin of parents plus UPG with estimated inbreeding or random UPG. To estimate the accuracy, the LR method is more consistent among scenarios, whereas the predictive ability greatly depends on the model specification.

Cheng J, Dekkers JCM, Fernando RL. Cross-validation of best linear unbiased predictions of breeding values using an efficient leave-one-out strategy. J Anim Breed Genet 2021;138:519-527. [PMID: 33729622 DOI: 10.1111/jbg.12545] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/16/2020] [Revised: 02/06/2021] [Accepted: 02/20/2021] [Indexed: 01/22/2023]

Abstract

Empirical estimates of the accuracy of estimated breeding values (EBV) can be obtained by cross-validation. Leave-one-out cross-validation (LOOCV) is an extreme case of k-fold cross-validation. Efficient strategies for LOOCV of predictions of phenotypes have been developed for a simple model with an overall mean and random marker or animal genetic effects. The objective here was to develop and evaluate an efficient LOOCV method for prediction of breeding values and other random effects under a general mixed linear model with multiple random effects. Conventional LOOCV of EBV requires inverting an (n-1)×(n-1) covariance matrix for each of the n data sets, where n is the number of observations. Our efficient LOOCV obtains the required inverses from the inverse of the covariance matrix for all n observations. The efficient method can be applied to complex models with multiple fixed and random effects, but requires fixed effects to be treated as random, with large variances. An alternative is to precorrect observations using estimates of fixed effects obtained from the complete data, but this can lead to biases. The efficient LOOCV method was compared to conventional LOOCV of predictions of breeding values in terms of computational demands and accuracy. For a data set with 3,205 observations and a model with multiple random and fixed effects, the efficient LOOCV method was 962 times faster than conventional LOOCV with precorrection for fixed effects based on each training data set, yet it produced identical EBV. A computationally efficient LOOCV for prediction of breeding values for single- and multiple-trait mixed models with multiple fixed and random effects was successfully developed. The method enables cross-validation of predictions of breeding values and of any linear combination of random and/or fixed effects, along with leave-one-out precorrection of validation phenotypes.
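The central idea of the efficient strategy described above, recovering every leave-one-out prediction from a single inverse of the full covariance matrix, can be illustrated with a short NumPy sketch. It relies on the standard block-inverse identity inv(V[-i,-i]) = W[-i,-i] - W[-i,i] W[i,-i] / W[i,i], where W = inv(V). Assumptions not taken from the paper: variance components are known, observations have already been precorrected for fixed effects (the route the abstract notes can introduce bias), and the function names are hypothetical. This is not Cheng et al.'s implementation, which also handles fixed effects treated as random.

```python
import numpy as np

def loo_predictions(C, V, y):
    """Leave-one-out BLUPs of random effects using one inverse of V.

    C : (m, n) covariance between the m random effects to predict and y
    V : (n, n) covariance matrix of y (variance components assumed known)
    y : (n,) observations, assumed already corrected for fixed effects
    Row i of the result is E[u | y with record i left out].
    """
    W = np.linalg.inv(V)                       # the only matrix inverse needed
    Wy = W @ y
    # Column i of Z equals inv(V[-i,-i]) @ y[-i], with a 0 in position i, using
    # inv(V[-i,-i]) = W[-i,-i] - W[-i,i] W[i,-i] / W[i,i].
    Z = Wy[:, None] - W * (Wy / np.diag(W))[None, :]
    return (C @ Z).T

def loo_conventional(C, V, y):
    """Conventional LOOCV: solve an (n-1) x (n-1) system for every left-out record."""
    n = len(y)
    rows = []
    for i in range(n):
        keep = np.arange(n) != i
        rows.append(C[:, keep] @ np.linalg.solve(V[np.ix_(keep, keep)], y[keep]))
    return np.array(rows)

# Toy animal model: y = g + e, Var(g) = G * s2g, Var(e) = I * s2e.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
G = X @ X.T / 20.0                             # toy relationship matrix
s2g, s2e = 0.4, 0.6
V = s2g * G + s2e * np.eye(50)
C = s2g * G                                    # Cov(breeding values, y)
y = rng.normal(size=50)
fast, slow = loo_predictions(C, V, y), loo_conventional(C, V, y)
print(np.allclose(fast, slow))                 # the two routes agree
loo_ebv = np.diag(fast)                        # EBV of animal i when its own record is left out
```

Because only `V` is inverted, the cost is one n×n inversion plus matrix products, instead of n separate (n-1)×(n-1) solves.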

Xu Y, Zhao Y, Wang X, Ma Y, Li P, Yang Z, Zhang X, Xu C, Xu S. Incorporation of parental phenotypic data into multi-omic models improves prediction of yield-related traits in hybrid rice. Plant Biotechnol J 2021;19:261-272. [PMID: 32738177 PMCID: PMC7868986 DOI: 10.1111/pbi.13458] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/03/2019] [Revised: 06/14/2020] [Accepted: 07/22/2020] [Indexed: 05/15/2023]

Abstract

Hybrid breeding has been shown to effectively increase rice productivity. However, identifying desirable hybrids among numerous potential combinations is a daunting challenge. Genomic selection holds great promise for accelerating hybrid breeding by enabling early selection before phenotypes are measured. With recent advances in multi-omic technologies, hybrid prediction based on transcriptomic and metabolomic data has received increasing attention. However, current omic-based hybrid prediction has ignored parental phenotypic information, which is of fundamental importance in plant breeding. In this study, we integrated parental phenotypic information into various multi-omic prediction models applied in hybrid breeding of rice and compared the predictabilities of 15 combinations of four sets of parental predictors, that is, genome, transcriptome, metabolome and phenome. The predictability of each combination was evaluated using best linear unbiased prediction and a modified fast HAT method. We found significant interactions between predictors and traits in predictability, but for each trait investigated, joint prediction with various combinations of the predictors significantly improved predictability relative to prediction from any single omic data source. Incorporating parental phenotypic data into the various omic predictors increased predictability on average by 13.6%, 54.5%, 19.9% and 8.3% for grain yield, number of tillers per plant, number of grains per panicle and 1000-grain weight, respectively. Among the nine models incorporating parental traits, the AD-All model was the most effective. This novel strategy of incorporating parental phenotypic data into multi-omic prediction is expected to improve hybrid breeding progress, especially with the development of high-throughput phenotyping technologies.

Affiliation(s)

Yang Xu, Jiangsu Key Laboratory of Crop Genetics and Physiology; Key Laboratory of Plant Functional Genomics of Ministry of Education; Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding; Co‐Innovation Center for Modern Production Technology of Grain Crops; Agricultural College of Yangzhou University, Yangzhou, China
Yue Zhao, Jiangsu Key Laboratory of Crop Genetics and Physiology; Key Laboratory of Plant Functional Genomics of Ministry of Education; Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding; Co‐Innovation Center for Modern Production Technology of Grain Crops; Agricultural College of Yangzhou University, Yangzhou, China
Xin Wang, Jiangsu Key Laboratory of Crop Genetics and Physiology; Key Laboratory of Plant Functional Genomics of Ministry of Education; Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding; Co‐Innovation Center for Modern Production Technology of Grain Crops; Agricultural College of Yangzhou University, Yangzhou, China
Ying Ma, Jiangsu Key Laboratory of Crop Genetics and Physiology; Key Laboratory of Plant Functional Genomics of Ministry of Education; Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding; Co‐Innovation Center for Modern Production Technology of Grain Crops; Agricultural College of Yangzhou University, Yangzhou, China
Pengcheng Li, Jiangsu Key Laboratory of Crop Genetics and Physiology; Key Laboratory of Plant Functional Genomics of Ministry of Education; Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding; Co‐Innovation Center for Modern Production Technology of Grain Crops; Agricultural College of Yangzhou University, Yangzhou, China
Zefeng Yang, Jiangsu Key Laboratory of Crop Genetics and Physiology; Key Laboratory of Plant Functional Genomics of Ministry of Education; Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding; Co‐Innovation Center for Modern Production Technology of Grain Crops; Agricultural College of Yangzhou University, Yangzhou, China
Xuecai Zhang, International Maize and Wheat Improvement Center (CIMMYT), Mexico, DF, Mexico
Chenwu Xu, Jiangsu Key Laboratory of Crop Genetics and Physiology; Key Laboratory of Plant Functional Genomics of Ministry of Education; Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding; Co‐Innovation Center for Modern Production Technology of Grain Crops; Agricultural College of Yangzhou University, Yangzhou, China
Shizhong Xu, Department of Botany and Plant Sciences, University of California, Riverside, CA, USA

Identification of candidate genes encoding tumor-specific neoantigens in early- and late-stage colon adenocarcinoma. Aging (Albany NY) 2021;13:4024-4044. [PMID: 33428592 PMCID: PMC7906157 DOI: 10.18632/aging.202370] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/24/2020] [Accepted: 10/31/2020] [Indexed: 12/24/2022]

Jia M, Li Z, Pan M, Tao M, Lu X, Liu Y. Evaluation of immune infiltrating of thyroid cancer based on the intrinsic correlation between pair-wise immune genes. Life Sci 2020;259:118248. [PMID: 32791153 DOI: 10.1016/j.lfs.2020.118248] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/07/2020] [Revised: 07/09/2020] [Accepted: 08/07/2020] [Indexed: 10/23/2022]

Bermann M, Legarra A, Hollifield MK, Masuda Y, Lourenco D, Misztal I. Validation of single-step GBLUP genomic predictions from threshold models using the linear regression method: An application in chicken mortality. J Anim Breed Genet 2020;138:4-13. [PMID: 32985749 PMCID: PMC7756448 DOI: 10.1111/jbg.12507] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/11/2020] [Revised: 07/27/2020] [Accepted: 08/18/2020] [Indexed: 11/30/2022]

Abstract

The objective of this study was to determine whether the linear regression (LR) method could be used to validate genomic threshold models. Statistics for the LR method were computed from estimated breeding values (EBVs) obtained with the whole and truncated data sets, using variances from the reference and validation populations. The method was tested using simulated and real chicken data sets. The simulated data set included 10 generations of 4,500 birds each; genotypes were available for the last three generations. Each animal was assigned a continuous trait, which was converted to a binary score assuming an incidence of failure of 7%. The real data set included the survival status of 186,596 broilers (mortality rate of 7.2%) and genotypes of 18,047 birds. Both data sets were analysed using best linear unbiased prediction (BLUP) or single-step GBLUP (ssGBLUP). The whole data set included all available phenotypes, whereas in the partial data set, phenotypes of the most recent generation were removed. In the simulated data set, the accuracies based on the LR formulas were 0.45 for BLUP and 0.76 for ssGBLUP, whereas the correlations between true breeding values and EBVs (i.e., true accuracies) were 0.37 and 0.65, respectively. The LR method overestimated the gain in accuracy from adding genomic information by 0.09 relative to the true increase. However, when the estimated ratio between the additive variance computed from pedigree only and that computed from pedigree and genomic information was taken into account, the difference between the true and estimated gains was <0.02. Accuracies of BLUP and ssGBLUP with the real data set were 0.41 and 0.47, respectively. This small improvement in accuracy when using ssGBLUP with the real data set was due to population structure and lower heritability.
The LR method is a useful tool for estimating improvements in the accuracy of EBVs due to the inclusion of genomic information when traditional validation methods, such as k-fold validation and predictive ability, are not applicable.
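The LR statistics referred to above compare EBVs of the same validation animals computed from the partial data (û_p) and from the whole data (û_w). Below is a minimal sketch, assuming the definitions commonly used for the method: bias as the difference in mean EBV (û_p minus û_w, expected 0), dispersion as the slope of the regression of û_w on û_p (expected 1; values below 1 indicate inflation), ratio of accuracies as corr(û_p, û_w), and accuracy as sqrt(cov(û_p, û_w) / ((1 - F̄)σ²_u)), where F̄ is the average inbreeding of the validation animals and σ²_u the additive genetic variance. Treat these definitions and the function name as assumptions to check against the method's original description; this is not code from the study.

```python
import numpy as np

def lr_statistics(u_partial, u_whole, sigma2_u, mean_inbreeding=0.0):
    """LR-method validation statistics for a set of validation animals.

    u_partial : EBVs from the partial data (validation phenotypes removed)
    u_whole   : EBVs of the same animals from the whole data
    sigma2_u  : additive genetic variance (assumed known)
    mean_inbreeding : average inbreeding F-bar of the validation animals
    """
    up = np.asarray(u_partial, float)
    uw = np.asarray(u_whole, float)
    cov = np.cov(up, uw, ddof=1)                   # 2 x 2 covariance matrix
    bias = up.mean() - uw.mean()                   # expected 0 if unbiased
    dispersion = cov[0, 1] / cov[0, 0]             # slope of u_w on u_p; 1 = no inflation
    ratio = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])  # corr(u_p, u_w), ~ acc_p / acc_w
    acc_p = np.sqrt(max(cov[0, 1], 0.0) / ((1.0 - mean_inbreeding) * sigma2_u))
    return {"bias": bias, "dispersion": dispersion, "ratio": ratio, "accuracy": acc_p}
```

Only two EBV vectors and a variance are needed, which is why the approach remains usable for binary traits where k-fold predictive ability on observed 0/1 records is hard to interpret.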

Bresolin T, Dórea JRR. Infrared Spectrometry as a High-Throughput Phenotyping Technology to Predict Complex Traits in Livestock Systems. Front Genet 2020;11:923. [PMID: 32973876 PMCID: PMC7468402 DOI: 10.3389/fgene.2020.00923] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/06/2020] [Accepted: 07/24/2020] [Indexed: 12/17/2022]

Zhang L, Giuste F, Vizcarra JC, Li X, Gutman D. Radiomics Features Predict CIC Mutation Status in Lower Grade Glioma. Front Oncol 2020;10:937. [PMID: 32676453 PMCID: PMC7333647 DOI: 10.3389/fonc.2020.00937] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/24/2019] [Accepted: 05/12/2020] [Indexed: 12/15/2022]

Abstract

MRI in combination with genomic markers is critical in the management of gliomas. Radiomics and radiogenomics analyses facilitate the quantitative assessment of tumor properties, which can be used both to model molecular subtypes and to predict disease progression. In this work, we report on the effects of mutation in CIC (capicua, a gene first described in Drosophila) as a biomarker, together with the ability of radiomics features to predict CIC mutation status in lower-grade gliomas (LGG). Genomic data of LGG patients from The Cancer Genome Atlas (TCGA) (n = 509) and corresponding MR images from TCIA (n = 120) were used. Following tumor segmentation, radiomics features were extracted from T1, T2, T2 FLAIR, and T1 contrast-enhanced (CE) images. Lasso feature reduction was used to obtain the most important MR image features, and logistic regression was then used to predict CIC mutation status. In our study, CIC mutation rarely occurred in astrocytoma but had a high probability of occurrence in oligodendroglioma. The presence of CIC mutation was associated with better survival of glioma patients (p < 1e−4, HR: 0.2445), even with co-occurrence of IDH mutation and 1p/19q co-deletion (p = 0.0362, HR: 0.3674). An eleven-feature model achieved a glioma prediction accuracy of 94.2% (95% CI, 94.03–94.38%), and a six-feature model achieved an oligodendroglioma prediction accuracy of 92.3% (95% CI, 91.70–92.92%). MR images and derived images of gliomas with CIC mutation appear more complex and non-uniform, but these tumors are associated with lower malignancy. Our study identified CIC as a potential prognostic factor in glioma that is closely associated with survival. MRI radiomic features could predict CIC mutation and reflect less malignant manifestations, such as milder necrosis and larger tumor volume, in MRI and derived images, which could help clinical judgment.
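The two-step workflow above (Lasso to shrink the radiomic feature set, then logistic regression on the retained features) can be sketched with NumPy alone. Assumptions: a least-squares Lasso fitted by coordinate descent to the 0/1 mutation label serves as the screening step, its penalty `alpha` is fixed by hand rather than tuned by cross-validation, and the logistic model is fitted by plain gradient descent with a small ridge term. The study's software and settings are not described here, the data are simulated, and all function names are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center and scale each feature column (constant columns are left at 0)."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd

def lasso_cd(X, y, alpha, n_iter=200):
    """Least-squares Lasso by cyclic coordinate descent:
    minimizes (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 (y centered, X standardized)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()                 # current residual y - X b
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                # partial residual without feature j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_ss[j]  # soft threshold
            r -= X[:, j] * b[j]
    return b

def logistic_fit(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Logistic regression with intercept, fitted by gradient descent;
    a small L2 penalty on the slopes keeps weights finite on separable data."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        grad = Xb.T @ (prob - y) / len(y) + l2 * np.r_[0.0, w[1:]]
        w -= lr * grad
    return w

def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))

# Toy pipeline on simulated "radiomic" features (not TCGA/TCIA data).
rng = np.random.default_rng(0)
X = standardize(rng.normal(size=(120, 30)))
true_logit = 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.random(120) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
b = lasso_cd(X, y - y.mean(), alpha=0.05)      # step 1: Lasso feature reduction
selected = np.flatnonzero(b != 0)
w = logistic_fit(X[:, selected], y)            # step 2: logistic regression
train_acc = np.mean((predict_proba(w, X[:, selected]) > 0.5) == (y == 1))
print("selected features:", selected, "training accuracy:", round(train_acc, 3))
```

In a real analysis, accuracy should be assessed on held-out patients rather than on the training data as in this toy run, because both steps are fitted to the same labels.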

Lopes F, Rosa G, Pinedo P, Santos JEP, Chebel RC, Galvao KN, Schuenemann GM, Bicalho RC, Gilbert RO, Rodrigez-Zas S, Seabury CM, Thatcher W. Genome-enable prediction for health traits using high-density SNP panel in US Holstein cattle. Anim Genet 2020;51:192-199. [PMID: 31909828 PMCID: PMC7065151 DOI: 10.1111/age.12892] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 11/22/2019] [Indexed: 11/29/2022]

Abstract

The objective of this study was to compare accuracies of different Bayesian regression models in predicting molecular breeding values for health traits in Holstein cattle. The dataset was composed of 2505 records reporting the occurrence of retained fetal membranes (RFM), metritis (MET), mastitis (MAST), displaced abomasum (DA), lameness (LS), clinical endometritis (CE), respiratory disease (RD), dystocia (DYST) and subclinical ketosis (SCK) in Holstein cows, collected between 2012 and 2014 in 16 dairies located across the US. Cows were genotyped with the Illumina BovineHD (HD, 777K). The quality-control thresholds for SNP genotypes were an HWE P‐value of at least 1 × 10⁻¹⁰, MAF greater than 0.01 and call rate greater than 0.95. The FImpute program was used for imputation of missing SNP markers. The effect of each SNP was estimated using the Bayesian Ridge Regression (BRR), Bayes A, Bayes B and Bayes Cπ methods. The prediction quality was assessed by the area under the curve, the prediction mean square error and the correlation between genomic breeding value and the observed phenotype, using a leave‐one‐out cross‐validation technique that avoids iterative cross‐validation. The highest prediction accuracies achieved were: RFM [Bayes B (0.34)], MET [BRR (0.36)], MAST [Bayes B (0.55)], DA [Bayes Cπ (0.26)], LS [Bayes A (0.12)], CE [Bayes A (0.32)], RD [Bayes Cπ (0.23)], DYST [Bayes A (0.35)] and SCK [Bayes Cπ (0.38)]. Except for DA, LS and RD, the predictive abilities were similar between the methods. A strong relationship between the predictive ability and the heritability of the trait was observed: traits with higher heritability achieved higher accuracy and lower bias than those with low heritability. Overall, it has been shown that a high‐density SNP panel can be used successfully to predict genomic breeding values of health traits in Holstein cattle and that the model of choice will depend mostly on the genetic architecture of the trait.
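The three prediction-quality criteria named above (area under the curve, mean square error, and correlation) can be computed from leave-one-out predicted genomic values and observed 0/1 disease records as in the following NumPy sketch. The AUC uses the Mann-Whitney rank formulation, with tied scores given average ranks; this is a generic implementation with hypothetical function names, not the study's code.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic (ties get average ranks)."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0   # average 1-based rank of the tie group
        i = j + 1
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def mse(pred, obs):
    """Prediction mean square error."""
    return float(np.mean((np.asarray(pred, float) - np.asarray(obs, float)) ** 2))

def pearson(pred, obs):
    """Correlation between predicted values and observed phenotypes."""
    return float(np.corrcoef(pred, obs)[0, 1])

obs = np.array([0, 0, 1, 1])                 # observed disease status
gebv = np.array([0.1, 0.4, 0.35, 0.8])       # toy leave-one-out predictions
print(auc(gebv, obs))                        # 0.75
print(mse(gebv, obs), pearson(gebv, obs))
```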

Runcie D, Cheng H. Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods. G3 (Bethesda) 2019;9:3727-3741. [PMID: 31511297 PMCID: PMC6829121 DOI: 10.1534/g3.119.400598] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/02/2019] [Accepted: 09/10/2019] [Indexed: 01/08/2023]

Waldmann P. On the Use of the Pearson Correlation Coefficient for Model Evaluation in Genome-Wide Prediction. Front Genet 2019;10:899. [PMID: 31632436 PMCID: PMC6781837 DOI: 10.3389/fgene.2019.00899] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/03/2019] [Accepted: 08/23/2019] [Indexed: 01/24/2023]

Velazco JG, Malosetti M, Hunt CH, Mace ES, Jordan DR, van Eeuwijk FA. Combining pedigree and genomic information to improve prediction quality: an example in sorghum. Theor Appl Genet 2019;132:2055-2067. [PMID: 30968160 PMCID: PMC6588709 DOI: 10.1007/s00122-019-03337-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/08/2018] [Accepted: 03/26/2019] [Indexed: 05/10/2023]

High-frequency marker haplotypes in the genomic selection of dairy cattle. J Appl Genet 2019;60:179-186. [PMID: 30877657 PMCID: PMC6483952 DOI: 10.1007/s13353-019-00489-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/29/2018] [Revised: 01/18/2019] [Accepted: 02/28/2019] [Indexed: 11/05/2022]

Li Z, Gao N, Martini JWR, Simianer H. Integrating Gene Expression Data Into Genomic Prediction. Front Genet 2019;10:126. [PMID: 30858865 PMCID: PMC6397893 DOI: 10.3389/fgene.2019.00126] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 10/14/2018] [Accepted: 02/04/2019] [Indexed: 01/14/2023]

Ma W, Qiu Z, Song J, Li J, Cheng Q, Zhai J, Ma C. A deep convolutional neural network approach for predicting phenotypes from genotypes. Planta 2018;248:1307-1318. [PMID: 30101399 DOI: 10.1007/s00425-018-2976-9] [Citation(s) in RCA: 92] [Impact Index Per Article: 15.3] [Received: 04/09/2018] [Accepted: 07/11/2018] [Indexed: 05/21/2023]

Affiliation(s)

Wenlong Ma, State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China; Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, 712100, Shaanxi, China
Zhixu Qiu, State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China; Biomass Energy Center for Arid and Semi-arid Lands, Northwest A&F University, Shaanxi, 712100, Yangling, China
Jie Song, State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China; Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, 712100, Shaanxi, China
Jiajia Li, State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China; Biomass Energy Center for Arid and Semi-arid Lands, Northwest A&F University, Shaanxi, 712100, Yangling, China
Qian Cheng, State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China; Biomass Energy Center for Arid and Semi-arid Lands, Northwest A&F University, Shaanxi, 712100, Yangling, China
Jingjing Zhai, State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China; Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, 712100, Shaanxi, China
Chuang Ma, State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China; Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, 712100, Shaanxi, China

Gianola D, Cecchinato A, Naya H, Schön CC. Prediction of Complex Traits: Robust Alternatives to Best Linear Unbiased Prediction. Front Genet 2018;9:195. [PMID: 29951082 PMCID: PMC6008589 DOI: 10.3389/fgene.2018.00195] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/03/2018] [Accepted: 05/14/2018] [Indexed: 12/05/2022]

Fritsche-Neto R, Akdemir D, Jannink JL. Accuracy of genomic selection to predict maize single-crosses obtained through different mating designs. Theor Appl Genet 2018;131:1153-1162. [PMID: 29445844 DOI: 10.1007/s00122-018-3068-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 08/26/2017] [Accepted: 02/08/2018] [Indexed: 05/02/2023]

Lopes FB, Wu XL, Li H, Xu J, Perkins T, Genho J, Ferretti R, Tait RG, Bauck S, Rosa GJM. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes. J Anim Breed Genet 2018;135:14-27. [PMID: 29345073 DOI: 10.1111/jbg.12312] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/23/2017] [Accepted: 12/04/2017] [Indexed: 11/27/2022]

Abstract

Reliable genomic prediction of breeding values for quantitative traits requires a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped with the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that pooling animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies for the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracy on estimated breeding values (EBV) ranged from 12.60% to 31.27%, and the relative gains on de-regressed EBV were somewhat smaller (0.87% to 18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals in the training stage. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and these accuracies were slightly lower than the GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources to harness genomic prediction in Brangus beef cattle.

Naya H, Peñagaricano F, Urioste JI. Modelling female fertility traits in beef cattle using linear and non-linear models. J Anim Breed Genet 2017;134:202-212. [PMID: 28508488 DOI: 10.1111/jbg.12266] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/29/2016] [Accepted: 02/07/2017] [Indexed: 11/29/2022]

Predicted Residual Error Sum of Squares of Mixed Models: An Application for Genomic Prediction. G3 (Bethesda) 2017;7:895-909. [PMID: 28108552 PMCID: PMC5345720 DOI: 10.1534/g3.116.038059] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]

Abstract

Genomic prediction is a statistical method to predict phenotypes of polygenic traits using high-throughput genomic data. Most diseases and behaviors in humans and animals are polygenic traits. The majority of agronomic traits in crops are also polygenic. Accurate prediction of these traits can help medical professionals diagnose acute diseases and help breeders increase food production, and can therefore contribute significantly to human health and global food security. Best linear unbiased prediction (BLUP) is an important tool to analyze high-throughput genomic data for prediction. However, to judge the efficacy of the BLUP model with a particular set of predictors for a given trait, one has to provide an unbiased mechanism to evaluate the predictability. Cross-validation (CV) is an essential tool to achieve this goal: a sample is partitioned into K parts of roughly equal size, one part is predicted using parameters estimated from the remaining K – 1 parts, and eventually every part is predicted using a sample excluding that part. Such a CV is called K-fold CV. Unfortunately, CV substantially increases the computational burden. We developed an alternative method, the HAT method, to replace CV. The new method corrects the estimated residual errors from the whole-sample analysis using the leverage values of a hat matrix of the random effects to obtain the predicted residual errors. Properties of the HAT method were investigated using seven agronomic and 1000 metabolomic traits of an inbred rice population. Results showed that the HAT method is a very good approximation of the CV method. The method was also applied to 10 traits in 1495 rice hybrids with 1.6 million SNPs, and to human height of 6161 subjects with roughly 0.5 million SNPs from the Framingham Heart Study data. Predictabilities of the HAT and CV methods were all similar.
The HAT method allows us to easily evaluate the predictabilities of genomic prediction for large numbers of traits in very large populations.
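The leverage correction described above can be sketched for the simplest case: one kernel K, a known variance ratio lam = σ²e/σ²g, and the overall mean held fixed at the whole-sample average. Here the hat matrix is H = K(K + lam·I)^(-1), so I - H = lam·(K + lam·I)^(-1), and dividing each whole-sample residual by 1 - H_ii reproduces the leave-one-out residual exactly; the sketch checks this against brute-force refitting. Summarizing predictability as 1 - PRESS/SS is an assumption, the function names are hypothetical, and this is not Xu's implementation. With K-fold CV and re-estimated variance components, the HAT result is an approximation, as reported above.

```python
import numpy as np

def hat_press(K, y, lam):
    """HAT-style PRESS from a single whole-sample fit.

    Simplified model: y = mu + g + e, Var(g) = K * s2g, Var(e) = I * s2e,
    lam = s2e / s2g assumed known; mu is replaced by the sample mean.
    """
    yc = y - y.mean()
    n = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(n))  # hat matrix of the random effects
    resid = yc - H @ yc                         # whole-sample residuals
    press_resid = resid / (1.0 - np.diag(H))    # leverage-corrected (predicted) residuals
    press = float(np.sum(press_resid ** 2))
    predictability = 1.0 - press / float(np.sum(yc ** 2))  # assumed summary: 1 - PRESS/SS
    return press, predictability

def loo_press(K, y, lam):
    """Brute-force leave-one-out PRESS with the same mean and variance ratio."""
    yc = y - y.mean()
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        A = K[np.ix_(keep, keep)] + lam * np.eye(n - 1)
        errs[i] = yc[i] - K[i, keep] @ np.linalg.solve(A, yc[keep])
    return float(np.sum(errs ** 2))

rng = np.random.default_rng(7)
Z = rng.normal(size=(60, 200))
K = Z @ Z.T / 200.0                             # toy marker-based kernel
y = Z @ rng.normal(0.0, 0.1, size=200) + rng.normal(size=60)
press, r2 = hat_press(K, y, lam=1.0)
print(np.isclose(press, loo_press(K, y, lam=1.0)), round(r2, 3))
```

The single fit needs one n×n inverse, whereas brute-force LOOCV needs n refits, which is the source of the speed-up for large numbers of traits.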

Mikshowsky AA, Gianola D, Weigel KA. Assessing genomic prediction accuracy for Holstein sires using bootstrap aggregation sampling and leave-one-out cross validation. J Dairy Sci 2016;100:453-464. [PMID: 27889124 DOI: 10.3168/jds.2016-11496] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/20/2016] [Accepted: 10/05/2016] [Indexed: 11/19/2022]

Abstract

Since the introduction of genome-enabled prediction for dairy cattle in 2009, genomic selection has markedly changed many aspects of the dairy genetics industry and enhanced the rate of response to selection for most economically important traits. Young dairy bulls are genotyped to obtain their genomic predicted transmitting ability (GPTA) and reliability (REL) values. These GPTA are a main factor in most purchasing, marketing, and culling decisions until bulls reach 5 yr of age and their milk-recorded offspring become available. At that time, daughter yield deviations (DYD) can be compared with the GPTA computed several years earlier. For most bulls, the DYD align well with the initial predictions. However, for some bulls, the difference between DYD and corresponding GPTA is quite large, and published REL are of limited value in identifying such bulls. A method of bootstrap aggregation sampling (bagging) using genomic BLUP (GBLUP) was applied to predict the GPTA of 2,963, 2,963, and 2,803 young Holstein bulls for protein yield, somatic cell score, and daughter pregnancy rate (DPR), respectively. For each trait, 50 bootstrap samples were drawn from a reference population comprising the 2011 DYD of 8,610, 8,405, and 7,945 older Holstein bulls, respectively. Leave-one-out cross validation was also performed to assess prediction accuracy when removing specific bulls from the reference population. The main objectives of this study were (1) to assess the extent to which current REL values and alternative measures of variability, such as the bootstrap standard deviation (SD) of predictions, could detect bulls whose daughter performance deviates significantly from early genomic predictions, and (2) to identify factors associated with the reference population that inform about inaccurate genomic predictions. The SD of bootstrap predictions was a mildly useful metric for identifying bulls whose future daughter performance may deviate significantly from early GPTA for protein and DPR.
Leave-one-out cross validation allowed us to identify groups of reference population bulls that were influential on other reference population bulls for protein yield and to observe their effects on predictions of testing set bulls, both as a whole and individually.
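The bagging procedure described above (resample the reference population, refit GBLUP, predict the young bulls, and use the spread of their predictions as an instability signal) can be sketched as follows. Assumptions: simulated data, a fixed variance ratio `lam`, the mean as the only fixed effect, and a bull drawn more than once entering as repeated records; `n_boot=50` only mirrors the number of bootstrap samples reported in the abstract. Function names are hypothetical and this is not the study's code.

```python
import numpy as np

def gblup_predict(K, y, train, test, lam):
    """GBLUP-style prediction with the mean as the only fixed effect and fixed lam = s2e/s2g."""
    mu = y[train].mean()
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    return mu + K[np.ix_(test, train)] @ np.linalg.solve(A, y[train] - mu)

def bagged_gblup(K, y, ref, test, lam, n_boot=50, seed=0):
    """Bootstrap aggregation: resample reference animals with replacement, refit,
    predict the test animals, and return the bagged mean and bootstrap SD."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_boot, len(test)))
    for b in range(n_boot):
        boot = rng.choice(ref, size=len(ref), replace=True)  # duplicates act as repeated records
        preds[b] = gblup_predict(K, y, boot, test, lam)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

rng = np.random.default_rng(3)
Z = rng.normal(size=(300, 400))
K = Z @ Z.T / 400.0                             # toy genomic relationship matrix
y = Z @ rng.normal(0.0, 0.05, size=400) + rng.normal(size=300)
ref, young = np.arange(250), np.arange(250, 300)  # "older" reference vs "young" bulls
mean_pred, sd_pred = bagged_gblup(K, y, ref, young, lam=1.0)
flagged = young[np.argsort(sd_pred)[-5:]]       # bulls with the least stable predictions
print(flagged)
```

Adding `lam * I` keeps each bootstrap system solvable even when duplicated bulls make the relationship block singular.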
