1
Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches. Methods Mol Biol 2022; 2467:77-112. [PMID: 35451773 DOI: 10.1007/978-1-0716-2205-6_3]
Abstract
The efficiency of genomic selection strongly depends on the prediction accuracy of the genetic merit of candidates. Numerous papers have shown that the composition of the calibration set is a key contributor to prediction accuracy. A poorly defined calibration set can result in low accuracies, whereas an optimized one can considerably increase accuracy compared to random sampling at the same size. Alternatively, optimizing the calibration set can decrease phenotyping costs by reaching accuracy similar to that of random sampling with fewer phenotypic units. We present here the different factors that have to be considered when designing a calibration set, and review the different criteria proposed in the literature. We classify these criteria into two groups: model-free criteria based on relatedness, and criteria derived from the linear mixed model. We introduce criteria targeting specific prediction objectives including the prediction of highly diverse panels, biparental families, or hybrids. We also review different ways of updating the calibration set, and different procedures for optimizing phenotyping experimental designs.
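As a rough illustration of what a criterion "derived from the linear mixed model" can look like, the sketch below computes a simplified coefficient of determination (CD) for target individuals under GBLUP. It is not the criterion as defined in the chapter: it assumes a known mean, known variance components, and a toy relationship matrix built from simulated data. The function name `cd_of_targets` and the parameter `lam` (the ratio of residual to genetic variance) are hypothetical names chosen for this example.

```python
import numpy as np

def cd_of_targets(G, train_idx, target_idx, lam):
    """Simplified CD (reliability) of GBLUP predictions for target individuals.

    Assumes a known mean and known variance components;
    lam = sigma_e^2 / sigma_u^2 (hypothetical parameter name).
    """
    G = np.asarray(G, dtype=float)
    train_idx = np.asarray(train_idx)
    target_idx = np.asarray(target_idx)
    G_tt = G[np.ix_(train_idx, train_idx)]      # relationships within the calibration set
    G_ct = G[np.ix_(target_idx, train_idx)]     # relationships targets x calibration set
    V = G_tt + lam * np.eye(len(train_idx))     # phenotypic covariance, up to sigma_u^2
    # Variance of each predicted value (up to sigma_u^2): diag(G_ct V^{-1} G_tc)
    explained = np.einsum("ij,ji->i", G_ct, np.linalg.solve(V, G_ct.T))
    # CD = Var(u_hat_i) / Var(u_i)
    return explained / np.diag(G)[target_idx]

# Toy data (assumption): simulated marker scores, not a real panel.
rng = np.random.default_rng(1)
Z = rng.standard_normal((40, 500))
G = Z @ Z.T / Z.shape[1]
targets = np.arange(30, 40)
lam = 1.0  # corresponds to a heritability of 0.5 when diag(G) is close to 1

cd_small = cd_of_targets(G, np.arange(0, 10), targets, lam)
cd_large = cd_of_targets(G, np.arange(0, 30), targets, lam)
print(cd_small.mean(), cd_large.mean())
```

Averaging such values over the target individuals gives one number per candidate calibration set, which an optimization procedure could then try to maximize for a fixed set size.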
2
Elsen JM. Genomic Prediction of Complex Traits, Principles, Overview of Factors Affecting the Reliability of Genomic Prediction, and Algebra of the Reliability. Methods Mol Biol 2022; 2467:45-76. [PMID: 35451772 DOI: 10.1007/978-1-0716-2205-6_2]
Abstract
The quality of the predictions of genetic values based on the genotyping of neutral markers (GEBVs) is key information for deciding whether or not to implement genomic selection. This quality depends on the part of the genetic variability captured by the markers and on the precision of the estimates of their effects. Selection index theory provides the framework for evaluating the accuracy of GEBVs once the information has been gathered, with the genomic relationship matrix (GRM) playing a central role. When this accuracy must be known a priori, the theory of quantitative genetics gives guidance for calculating the expectation of this GRM. This chapter makes a critical inventory of the methods developed to calculate these accuracies a posteriori and a priori. The most significant factors affecting this accuracy are described (size of the reference population, number of markers, linkage disequilibrium, heritability).
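Since the GRM plays a central role here, a minimal sketch of how such a matrix is commonly built may help. It uses the widely used VanRaden-style construction (centered genotypes scaled by twice the sum of p(1-p)); the abstract does not say which GRM definition the chapter uses, so this choice, the function name `vanraden_grm`, and the simulated genotypes are all assumptions.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Genomic relationship matrix from an (individuals x markers) array coded 0/1/2."""
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0                 # observed allele frequency per marker
    Z = M - 2.0 * p                          # center each marker column
    scale = 2.0 * np.sum(p * (1.0 - p))      # expected total marker variance
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return Z @ Z.T / scale

# Toy data (assumption): random genotypes, 6 individuals x 200 markers.
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(6, 200))
G = vanraden_grm(geno)
print(np.round(G, 2))
```

Because allele frequencies are estimated from the same sample, each row of this matrix sums to zero, which is one reason a priori accuracy formulae work with the expectation of the GRM rather than with a single realized matrix.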
Affiliation(s)
- Jean-Michel Elsen
- GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France.
3
Picard Druet D, Varenne A, Herry F, Hérault F, Allais S, Burlot T, Le Roy P. Reliability of genomic evaluation for egg quality traits in layers. BMC Genet 2020; 21:17. [PMID: 32046634 PMCID: PMC7014768 DOI: 10.1186/s12863-020-0820-2] [Received: 07/19/2019] [Accepted: 01/31/2020]
Abstract
Background: Genomic evaluation, based on the use of thousands of genetic markers in addition to pedigree and phenotype information, has become the standard evaluation methodology in dairy cattle breeding programmes over the past several years. Despite the many differences between dairy cattle breeding and poultry breeding, genomic selection seems very promising for the avian sector, and studies are currently being conducted to optimize avian selection schemes. For this optimization, a key requirement is to properly predict the accuracy of genomic evaluation in pure line layers. Results: Genomic evaluation, whether performed on males or females, always proved more accurate than genetic evaluation. The gain was higher when phenotypic information was more limited, and increasing the size of the reference population increased the accuracy of genomic evaluation. By taking into account the increase of selection intensity and the decrease of the generation interval induced by genomic selection, the expected annual genetic gain would be higher with ancestry-based genomic evaluation of male candidates than with genetic evaluation based on collaterals. This advantage of genomic selection over genetic selection requires more detailed study for female candidates. Conclusions: In the population studied, the genomic evaluation of egg quality traits of breeding birds at birth seems to be a promising strategy, at least for the selection of males.
Affiliation(s)
- David Picard Druet
- PEGASE, INRAE, Agrocampus Ouest, 16 Le Clos, Saint-Gilles, 35590, France
- Florian Herry
- PEGASE, INRAE, Agrocampus Ouest, 16 Le Clos, Saint-Gilles, 35590, France; NOVOGEN, 5 rue des Compagnons, Plédran, 22960, France
- Frédéric Hérault
- PEGASE, INRAE, Agrocampus Ouest, 16 Le Clos, Saint-Gilles, 35590, France
- Sophie Allais
- PEGASE, INRAE, Agrocampus Ouest, 16 Le Clos, Saint-Gilles, 35590, France
- Pascale Le Roy
- PEGASE, INRAE, Agrocampus Ouest, 16 Le Clos, Saint-Gilles, 35590, France
4
Mangin B, Rincent R, Rabier CE, Moreau L, Goudemand-Dugue E. Training set optimization of genomic prediction by means of EthAcc. PLoS One 2019; 14:e0205629. [PMID: 30779753 PMCID: PMC6380617 DOI: 10.1371/journal.pone.0205629] [Received: 09/25/2018] [Accepted: 01/03/2019]
Abstract
Genomic prediction is a useful tool for plant and animal breeding programs and is starting to be used to predict human diseases as well. A shortcoming that slows the deployment of genomic selection is that the accuracy of the prediction is not known a priori. We propose EthAcc (Estimated THeoretical ACCuracy) as a method for estimating the accuracy given a training set that is genotyped and phenotyped. EthAcc is based on a causal quantitative trait loci model estimated by a genome-wide association study. This estimated causal model is crucial; therefore, we compared different methods to find the one yielding the best EthAcc. The multilocus mixed model was found to perform the best. We compared EthAcc to accuracy estimators that can be derived via a mixed marker model. We showed that EthAcc is the only approach to correctly estimate the accuracy. Moreover, in the case of a structured population, and in accordance with the achieved accuracy, EthAcc showed that the biggest training set is not always better than a smaller and more closely related training set. We then performed training set optimization with EthAcc and compared it to CDmean. EthAcc outperformed CDmean on real datasets from sugar beet, maize, and wheat. Nonetheless, its performance was mainly due to the use of an optimal but inaccessible set as the starting point of the optimization algorithm. EthAcc's precision and algorithm issues prevent it from reaching a good training set from a random start. Despite this drawback, we demonstrated that a substantial gain in accuracy can be obtained by performing training set optimization.
Affiliation(s)
- Brigitte Mangin
- LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
- Charles-Elie Rabier
- ISEM, Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- LIRMM, Univ. Montpellier, CNRS, Montpellier, France
- Laurence Moreau
- GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
5
Rio S, Mary-Huard T, Moreau L, Charcosset A. Genomic selection efficiency and a priori estimation of accuracy in a structured dent maize panel. Theor Appl Genet 2019; 132:81-96. [PMID: 30288553 DOI: 10.1007/s00122-018-3196-1] [Received: 04/25/2018] [Accepted: 09/22/2018]
Abstract
Population structure affects genomic selection efficiency as well as the ability to forecast accuracy using standard GBLUP. Genomic prediction models usually assume that the individuals used for calibration belong to the same population as those to be predicted. Most of the a priori indicators of precision, such as the coefficient of determination (CD), were derived from those same models. But genetic structure is a common feature in plant species, and it may impact genomic selection efficiency and the ability to forecast prediction accuracy. We investigated the impact of genetic structure in a dent maize panel ("Amaizing Dent") using different scenarios including within- or across-group predictions. For a given training set size, the best accuracies were achieved when predicting individuals using a model calibrated on the same genetic group. Nevertheless, a diverse training set representing all the groups had some predictive ability for all the validation sets, and adding extra-group individuals was almost always beneficial. This underlines the potential of such a generic training set for dent maize genomic selection applications. Alternative prediction models, taking genetic structure explicitly into account, did not improve the prediction accuracy compared to GBLUP. We also investigated the ability of different indicators of precision to forecast accuracy in the within- or across-group scenarios. Overall, the CD showed an encouraging tendency to differentiate scenarios, although for specific combinations of target populations and traits this indicator had no predictive value. One hypothesis to explain such erratic performance is that genetic structure acts through group-specific allele diversity at QTLs rather than through group-specific allele effects.
Affiliation(s)
- Simon Rio
- GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- Tristan Mary-Huard
- GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- MIA, INRA, AgroParisTech, Université Paris-Saclay, 75005, Paris, France
- Laurence Moreau
- GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- Alain Charcosset
- GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
6
7
Elsen JM. An analytical framework to derive the expected precision of genomic selection. Genet Sel Evol 2017; 49:95. [PMID: 29281960 PMCID: PMC5745666 DOI: 10.1186/s12711-017-0366-6] [Received: 07/03/2017] [Accepted: 12/01/2017]
Abstract
Background: Formulae to predict the precision or accuracy of genomic estimated breeding values (GEBV) are important when modelling selection schemes. Simple versions of such formulae have been proposed in the past, based on a number of simplifying hypotheses, including absence of linkage disequilibrium and linkage between loci, a population made up of unrelated individuals, and that all genetic variability of the trait is explained by the genotyped loci. These formulae were based on approximations that were not always clear. The objective of this paper is to offer a single framework to derive equations that predict the precision of GEBV from the size of the reference population, the heritability of the quantitative trait, and the number of QTL controlling it. Results: The exact formulation of the precision of GEBV involves the expectation of the inverse of a linear function of the genomic matrix, which cannot be calculated from simple algebra but can be approximated using a Taylor polynomial expansion. First order approximations performed better than the initial prediction equations published in the literature. Second order approximations produced almost perfect estimates of precision when compared to results from simulations that matched the assumptions required to derive the precision equations. Using this proposed framework, we present several generalizations, including multi-trait genomic evaluation. Conclusions: Although further improvements are needed to account for the complexity of practical situations, the equations proposed here can be used to derive the precision of GEBV when comparing breeding schemes a priori.
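For orientation, one of the "simple versions" of such formulae is often written in the form below, usually attributed to Daetwyler and colleagues. The abstract itself does not reproduce any equation, so this is given as an assumption about which earlier formula is meant, with symbols chosen for this note: $N$ is the size of the reference population, $h^2$ the heritability, and $n_{\mathrm{QTL}}$ the number of independent loci.

```latex
\[
  r^2 \;\approx\; \frac{N h^2}{N h^2 + n_{\mathrm{QTL}}}
\]
% Worked example with illustrative values N = 1000, h^2 = 0.5, n_QTL = 1000:
\[
  r^2 \approx \frac{1000 \times 0.5}{1000 \times 0.5 + 1000} = \frac{500}{1500} = \frac{1}{3},
  \qquad r \approx 0.58
\]
```

The expression makes the dependence described in the abstract explicit: precision rises with the size of the reference population and the heritability, and falls as the number of loci to be estimated grows.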
Affiliation(s)
- Jean-Michel Elsen
- GenPhySE (Génétique Physiologie et Systèmes d'Elevage), Université de Toulouse, INRA, ENVT, 31326, Castanet-Tolosan, France.