1
Azevedo CF, Ferrão LFV, Benevenuto J, de Resende MDV, Nascimento M, Nascimento ACC, Munoz PR. Using visual scores for genomic prediction of complex traits in breeding programs. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 137:9. [PMID: 38102495 DOI: 10.1007/s00122-023-04512-w] [Received: 07/09/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023]
Abstract
KEY MESSAGE: An approach for handling visual scores, which are prone to errors and subjectivity, was evaluated in simulated and blueberry recurrent selection breeding schemes to assist breeders in their decision-making. Most genomic prediction methods assume normality because of their simplicity and ease of implementation. However, in plant and animal breeding, continuous traits are often visually scored as categorical traits and then analyzed as Gaussian variables, which violates the normality assumption and could affect the prediction of breeding values and the estimation of genetic parameters. In this study, we examined the main challenges of using visual scores for genomic prediction and genetic parameter estimation with mixed models, Bayesian methods, and machine learning methods, and we evaluated these approaches on simulated and real breeding data sets. Our contribution is five-fold: (i) collecting data on a scale with an intermediate number of categories (1-3 and 1-5) is the best strategy, even when the errors associated with visual scores are considered; (ii) Linear Mixed Models and Bayesian Linear Regression are robust to the violation of normality, but marginal gains can be achieved with Bayesian Ordinal Regression Models (BORM) and Random Forest Classification; (iii) genetic parameters are better estimated with BORM; (iv) the conclusions drawn from simulated data also apply to real data in autotetraploid blueberry; and (v) a comparison of continuous and categorical phenotypes showed that, when collecting continuous phenotypes is not feasible, investing in the evaluation of 600-1000 categorical data points with low error is a strategy for improving predictive ability. Our findings point to the best approaches for using visual-score traits to exploit genetic information in breeding programs and highlight the importance of investing in the training of evaluator teams and in high-quality phenotyping.
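Visual scores are commonly modeled as a continuous latent liability observed through ordered cut points; ordinal regression models such as BORM are often formulated this way. The sketch below illustrates that idea under stated assumptions: the probit link, the threshold values, the evaluator-noise level and all function names are hypothetical and are not taken from the authors' implementation.

```python
import random
from math import erf, sqrt
from typing import List, Sequence


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), computed with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def score_from_liability(liability: float, thresholds: Sequence[float]) -> int:
    """Map a continuous liability to a visual score 1..K (K = len(thresholds) + 1).

    `thresholds` must be sorted in increasing order; a value exactly on a
    threshold falls into the lower category.
    """
    score = 1
    for tau in thresholds:
        if liability > tau:
            score += 1
    return score


def ordinal_probit_probs(eta: float, thresholds: Sequence[float]) -> List[float]:
    """P(score = k | eta) = Phi(tau_k - eta) - Phi(tau_{k-1} - eta),
    with tau_0 = -inf and tau_K = +inf (assumed probit threshold model)."""
    cdf = [0.0] + [std_normal_cdf(tau - eta) for tau in thresholds] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


if __name__ == "__main__":
    rng = random.Random(1)
    thresholds_1_to_3 = [-0.5, 0.5]  # hypothetical cut points for a 1-3 scale
    liabilities = [rng.gauss(0.0, 1.0) for _ in range(5)]
    # Hypothetical evaluator error: Gaussian noise added before scoring.
    scores = [score_from_liability(liab + rng.gauss(0.0, 0.3), thresholds_1_to_3)
              for liab in liabilities]
    print(scores)
    print(ordinal_probit_probs(0.0, thresholds_1_to_3))
```

Widening the threshold list (for example four cut points for a 1-5 scale) gives finer categories; the per-category probabilities always sum to one because the cut points partition the liability axis.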
Affiliation(s)
- Camila Ferreira Azevedo
- Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
- Luis Felipe Ventorim Ferrão
- Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
- Juliana Benevenuto
- Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
- Marcos Deon Vilela de Resende
- Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Department of Forestry Engineering, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Embrapa Café, Brasília, Distrito Federal, Brazil
- Moyses Nascimento
- Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Patricio R Munoz
- Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
2
Sala T, Puglisi D, Ferrari L, Salamone F, Tassone MR, Rotino GL, Fricano A, Losa A. Genome-wide analysis of genetic diversity in a germplasm collection including wild relatives and interspecific clones of garden asparagus. FRONTIERS IN PLANT SCIENCE 2023; 14:1187663. [PMID: 37476175 PMCID: PMC10354869 DOI: 10.3389/fpls.2023.1187663] [Received: 03/16/2023] [Accepted: 06/09/2023] [Indexed: 07/22/2023]
Abstract
The genus Asparagus includes approximately 240 species, the most important of which is garden asparagus (Asparagus officinalis L.), a vegetable crop cultivated worldwide for its edible spears. Besides garden asparagus, other species are also cultivated (e.g., Asparagus maritimus L.) or have been proposed as untapped sources of variability for breeding programs (e.g., Asparagus acutifolius L.). In the present work, we applied reduced-representation sequencing to a panel of 378 diverse asparagus genotypes, including commercial hybrids, interspecific lines, wild relatives of garden asparagus, and doubled haploids currently used in breeding programs, and identified more than 200K single-nucleotide polymorphisms (SNPs). These SNPs were used to assess the extent of linkage disequilibrium in the diploid gene pool of asparagus and, combined with preliminary phenotypic information, to conduct genome-wide association studies for sex and for traits tied to spear quality and production. Using the same phenotypic and genotypic information, we also fitted and cross-validated genome-enabled prediction models for the same set of traits. Overall, our analyses demonstrated that, unlike the diversity detected in wild species related to garden asparagus and in interspecific crosses, cultivated and wild genotypes of A. officinalis L. have a narrow genetic base, which is a contributing factor hampering the genetic improvement of this crop. Having estimated the extent of linkage disequilibrium and provided the first example of a genome-wide association study and of genome-enabled prediction in this species, we concluded that the asparagus panel examined here can lay the foundation for determining the genetic bases of agronomically important traits and for implementing predictive breeding tools to sustain breeding.
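The abstract mentions assessing the extent of linkage disequilibrium (LD) from SNP data but does not name the estimator. One widely used LD measure for unphased genotypes is r², the squared Pearson correlation of allele dosages (0/1/2) between two markers. The sketch below assumes that measure; the function name is hypothetical.

```python
from typing import Sequence


def dosage_r2(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele dosages.

    Often used as an LD measure when haplotype phase is unknown.
    Returns NaN when either marker is monomorphic (r^2 is undefined).
    """
    n = len(snp_a)
    if n != len(snp_b) or n < 2:
        raise ValueError("need two equal-length genotype vectors with n >= 2")
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    # Centered cross-products and sums of squares (the 1/n factors cancel in r^2).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov * cov / (var_a * var_b)
```

To describe LD decay, r² values for marker pairs would typically be grouped by physical distance and averaged per distance bin.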
Affiliation(s)
- Tea Sala
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics (CREA-GB), Montanaso Lombardo, LO, Italy
- Damiano Puglisi
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics (CREA-GB), Fiorenzuola d’Arda, PC, Italy
- Luisa Ferrari
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics (CREA-GB), Montanaso Lombardo, LO, Italy
- Filippo Salamone
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics (CREA-GB), Montanaso Lombardo, LO, Italy
- Maria Rosaria Tassone
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics (CREA-GB), Montanaso Lombardo, LO, Italy
- Giuseppe Leonardo Rotino
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics (CREA-GB), Montanaso Lombardo, LO, Italy
- Agostino Fricano
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics (CREA-GB), Fiorenzuola d’Arda, PC, Italy
- Alessia Losa
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics (CREA-GB), Montanaso Lombardo, LO, Italy
3
Montesinos-López OA, Gonzalez HN, Montesinos-López A, Daza-Torres M, Lillemo M, Montesinos-López JC, Crossa J. Comparing gradient boosting machine and Bayesian threshold BLUP for genome-based prediction of categorical traits in wheat breeding. THE PLANT GENOME 2022; 15:e20214. [PMID: 35535459 DOI: 10.1002/tpg2.20214] [Received: 02/05/2022] [Accepted: 03/21/2022] [Indexed: 06/14/2023]
Abstract
Genomic selection (GS) is a predictive methodology that is changing plant breeding. In GS, a statistical machine-learning model is trained on the available phenotypic and genotypic data and is then used to predict individuals that have only been genotyped. For this reason, several statistical machine-learning methods are being implemented in GS, but to improve the selection of new genotypes early in the prediction process, the exploration of new statistical machine-learning algorithms must continue. In this paper, we performed a benchmarking study comparing the Bayesian threshold genomic best linear unbiased predictor model (TGBLUP; popular in GS) with the gradient boosting machine (GBM). The comparison used four real wheat (Triticum aestivum L.) data sets with categorical traits, and performance in the testing set was measured with two metrics: the proportion of cases correctly classified (PCCC) and the Kappa coefficient. Under 10 random partitions with four testing proportions (20, 40, 60, and 80%), the GBM outperformed the TGBLUP model on both metrics (PCCC and Kappa coefficient) in three of the four data sets. In the larger data sets (Data Sets 3 and 4), the gain in prediction accuracy of the GBM was substantial. We therefore encourage further research on the GBM to evaluate its prediction performance in the context of GS.
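The two evaluation metrics named in the abstract have standard definitions: PCCC is the share of test individuals whose predicted category equals the observed one, and Cohen's Kappa rescales that agreement by the agreement expected by chance from the category frequencies. The sketch below computes both and adds a hypothetical `random_partition` helper to illustrate a random train/test split at a given testing proportion; it is not the authors' code, and the model-fitting step is deliberately left out.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def pccc(observed: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Proportion of cases correctly classified: share of exact category matches."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def cohen_kappa(observed: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e)."""
    n = len(observed)
    p_o = pccc(observed, predicted)
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    # Chance agreement from the marginal category frequencies of both vectors.
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # a single category in use: kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)


def random_partition(n: int, test_prop: float, seed: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split indices 0..n-1 into (train, test) at random."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_prop)
    return sorted(idx[n_test:]), sorted(idx[:n_test])
```

A design like the one described would loop over 10 seeds and the testing proportions 0.2, 0.4, 0.6 and 0.8, fit each model on the training indices, predict the testing indices, and average `pccc` and `cohen_kappa` across partitions.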
Affiliation(s)
- Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Univ. de Guadalajara, Guadalajara, Jalisco, 44430, México
- María Daza-Torres
- Dep. of Public Health Sciences, Univ. of California, Davis, CA, 95616, USA
- Morten Lillemo
- Dep. of Plant Sciences, Norwegian Univ. of Life Sciences, IHA/CIGENE, P.O. Box 5003, NO-1432, Ås, Norway
- José Crossa
- Colegio de Postgraduados, Montecillos, Edo. de México, 56230, México
- Biometrics and Statistics Unit, Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera, México-Veracruz, 52640, México
4
Puglisi D, Visioni A, Ozkan H, Kara İ, Lo Piero AR, Rachdad FE, Tondelli A, Valè G, Cattivelli L, Fricano A. High accuracy of genome-enabled prediction of belowground and physiological traits in barley seedlings. G3 GENES|GENOMES|GENETICS 2022; 12:6517783. [PMID: 35099521 PMCID: PMC8895982 DOI: 10.1093/g3journal/jkac022] [Received: 11/11/2021] [Accepted: 01/21/2022] [Indexed: 11/24/2022]
Abstract
In plants, the study of belowground traits is gaining momentum because of their importance for yield formation and for the uptake of water and nutrients. In several cereal crops, seminal root number and seminal root angle are proxy traits for root system architecture at mature stages, which in turn helps modulate the uptake of water and nutrients. Along with seminal root number and seminal root angle, experimental evidence indicates that the response of transpiration rate to evaporative demand, or vapor pressure deficit, is a key physiological trait that might be targeted for drought tolerance: reducing the water flux to leaves, and thereby limiting transpiration rate at high vapor pressure deficit, allows better management of soil moisture. In the present study, we examined the phenotypic diversity of seminal root number, seminal root angle, and transpiration rate at the seedling stage in a panel of 8-way Multiparent Advanced Generation Inter-Cross lines of winter barley, and correlated these traits with grain yield measured in different site-by-season combinations. Second, phenotypic and genotypic data of the Multiparent Advanced Generation Inter-Cross population were combined to fit and cross-validate different genomic prediction models for these belowground and physiological traits. Genomic prediction models for seminal root number were fitted using threshold and log-normal models, treating these data as an ordinal discrete variable and as count data, respectively, whereas for seminal root angle and transpiration rate, genomic prediction was implemented using models based on extended genomic best linear unbiased predictors. The results show that genome-enabled prediction models of seminal root number, seminal root angle, and transpiration rate have high predictive ability and that the best models investigated in the present study include first-order additive × additive epistatic interaction effects.
Our analyses indicate that, beyond grain yield, genomic prediction models might be used to predict belowground and physiological traits, paving the way for practical applications in barley improvement.
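The abstract reports that the best models include first-order additive × additive epistatic effects but does not give the kernels used. A common construction, assumed here, builds the additive genomic relationship matrix G following VanRaden's first method (dosages centered by twice the allele frequency and scaled by 2Σp(1−p)) and takes the additive × additive kernel as the element-wise (Hadamard) product G ∘ G. Function names are hypothetical, and some implementations further rescale G ∘ G, for example by its mean diagonal.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def vanraden_g(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """Additive genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    Rows are individuals, columns are markers coded as 0/1/2 allele dosages;
    Z centers each marker column by 2 * p_j (assumed VanRaden method 1).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(zi[k] * zj[k] for k in range(m)) / scale for zj in z] for zi in z]


def additive_by_additive(g: Matrix) -> Matrix:
    """First-order additive x additive kernel as the Hadamard product G o G."""
    return [[gij * gij for gij in row] for row in g]
```

In an extended GBLUP, G and G ∘ G would each define the covariance of a separate random genetic effect, so the additive and epistatic variance components are estimated jointly.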
Affiliation(s)
- Damiano Puglisi
- Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, 95123 Catania, Italy
- Andrea Visioni
- Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, 6299 Rabat, Morocco
- Hakan Ozkan
- Faculty of Agriculture, Department of Field Crops, University of Cukurova, 01330 Adana, Turkey
- İbrahim Kara
- Bahri Dagdas International Agricultural Research Institute, Km Karatay/Konya 42020, Turkey
- Angela Roberta Lo Piero
- Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, 95123 Catania, Italy
- Fatima Ezzahra Rachdad
- Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, 6299 Rabat, Morocco
- Faculty of Sciences Ben M’sik, Department of Biology, Environment and Ecology Laboratory, Hassan II University of Casablanca, 7955 Casablanca, Morocco
- Alessandro Tondelli
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics, 29017 Fiorenzuola d’Arda (PC), Italy
- Giampiero Valè
- DiSIT, Dipartimento di Scienze e Innovazione Tecnologica, Università del Piemonte Orientale, 13100 Vercelli, Italy
- Luigi Cattivelli
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics, 29017 Fiorenzuola d’Arda (PC), Italy
- Agostino Fricano
- Council for Agricultural Research and Economics – Research Centre for Genomics and Bioinformatics, 29017 Fiorenzuola d’Arda (PC), Italy