1
Couto EGO, Chaves SFS, Dias KOG, Morales-Marroquín JA, Alves-Pereira A, Motoike SY, Colombo CA, Zucchi MI. Training set optimization is a feasible alternative for perennial orphan crop domestication and germplasm management: an Acrocomia aculeata example. FRONTIERS IN PLANT SCIENCE 2024; 15:1441683. [PMID: 39323537 PMCID: PMC11423296 DOI: 10.3389/fpls.2024.1441683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 08/14/2024] [Indexed: 09/27/2024]
Abstract
Orphan perennial native species are gaining importance as sustainability in agriculture becomes crucial to mitigate climate change. Nevertheless, issues related to the undomesticated status and lack of improved germplasm impede the evolution of formal agricultural initiatives. Acrocomia aculeata - a neotropical palm with potential for oil production - is an example. Breeding efforts can aid the species to reach its full potential and increase market competitiveness. Here, we present genomic information and training set optimization as alternatives to boost orphan perennial native species breeding using Acrocomia aculeata as an example. Furthermore, we compared three SNP calling methods and, for the first time, presented the prediction accuracies of three yield-related traits. We collected data for two years from 201 wild individuals. These trees were genotyped, and three references were used for SNP calling: the oil palm genome, de novo sequencing, and the A. aculeata transcriptome. The traits analyzed were fruit dry mass (FDM), pulp dry mass (PDM), and pulp oil content (OC). We compared the predictive ability of GBLUP and BayesB models in cross- and real validation procedures. Afterwards, we tested several optimization criteria regarding consistency and the ability to provide the optimized training set that yielded less risk in both targeted and untargeted scenarios. Using the oil palm genome as a reference and GBLUP models had better results for the genomic prediction of FDM, OC, and PDM (prediction accuracies of 0.46, 0.45, and 0.39, respectively). Using the criteria PEV, r-score and core collection methodology provides risk-averse decisions. Training set optimization is an alternative to improve decision-making while leveraging genomic information as a cost-saving tool to accelerate plant domestication and breeding. 
The optimized training set can be used as a reference for the characterization of native species populations, aiding in decisions involving germplasm collection and construction of breeding populations.
Affiliation(s)
- Alessandro Alves-Pereira
- Genetics and Molecular Biology Department, Biology Institute, University of Campinas (UNICAMP), Campinas, Brazil
- Carlos Augusto Colombo
- Research Center of Plant Genetic Resources, Campinas Agronomic Institute, Campinas, Brazil
- Maria Imaculada Zucchi
- Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Piracicaba, Brazil
2
Xie Z, Weng L, He J, Feng X, Xu X, Ma Y, Bai P, Kong Q. PNNGS, a multi-convolutional parallel neural network for genomic selection. FRONTIERS IN PLANT SCIENCE 2024; 15:1410596. [PMID: 39290743 PMCID: PMC11405342 DOI: 10.3389/fpls.2024.1410596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Accepted: 08/19/2024] [Indexed: 09/19/2024]
Abstract
Genomic selection (GS) can accomplish breeding faster than phenotypic selection. Improving prediction accuracy is the key to promoting GS. To improve the GS prediction accuracy and stability, we introduce parallel convolution to deep learning for GS and call it a parallel neural network for genomic selection (PNNGS). In PNNGS, information passes through convolutions of different kernel sizes in parallel. The convolutions in each branch are connected with residuals. Four different Lp loss functions train PNNGS. Through experiments, the optimal number of parallel paths for rice, sunflower, wheat, and maize is found to be 4, 6, 4, and 3, respectively. Phenotype prediction is performed on 24 cases through ridge-regression best linear unbiased prediction (RRBLUP), random forests (RF), support vector regression (SVR), deep neural network genomic prediction (DNNGP), and PNNGS. Serial DNNGP and parallel PNNGS outperform the other three algorithms. On average, PNNGS prediction accuracy is 0.031 larger than DNNGP prediction accuracy, indicating that parallelism can improve the GS model. Plants are divided into clusters through principal component analysis (PCA) and K-means clustering algorithms. The sample sizes of different clusters vary greatly, indicating that this is unbalanced data. Through stratified sampling, the prediction stability and accuracy of PNNGS are improved. When the training samples are reduced in small clusters, the prediction accuracy of PNNGS decreases significantly. Increasing the sample size of small clusters is critical to improving the prediction accuracy of GS.
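The stratified-sampling idea in this abstract (cluster the plants with PCA and K-means, then sample every cluster proportionally so that small clusters stay represented) can be sketched as follows. This is a minimal NumPy illustration, not the PNNGS authors' code: the simulated genotypes, the group sizes, the function names, and the 80% training fraction are all assumptions made for the example.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the centered marker matrix onto its leading principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = points[labels == j].mean(axis=0)
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Sample the same fraction from every cluster, keeping at least one member."""
    rng = np.random.default_rng(seed)
    train = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        rng.shuffle(members)
        n_train = max(1, int(round(train_fraction * len(members))))
        train.extend(members[:n_train])
    train = np.sort(np.array(train))
    test = np.setdiff1d(np.arange(len(labels)), train)
    return train, test

# Hypothetical unbalanced panel: three groups of 80, 15 and 5 lines, 200 markers.
rng = np.random.default_rng(1)
sizes = [80, 15, 5]
freqs = [rng.uniform(0.05, 0.95, size=200) for _ in sizes]
X = np.vstack(
    [rng.binomial(2, f, size=(n, 200)) for n, f in zip(sizes, freqs)]
).astype(float)

labels = kmeans(pca_scores(X, 2), k=3)
train_idx, test_idx = stratified_split(labels, train_fraction=0.8)
```

A purely random split of the same panel could leave the smallest cluster out of the training set entirely, which is the situation the abstract reports as harmful to prediction accuracy.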
Affiliation(s)
- Zhengchao Xie
- Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Lin Weng
- Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Jingjing He
- Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Xianzhong Feng
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
- Xiaogang Xu
- School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, China
- Yinxing Ma
- Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Panpan Bai
- Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Qihui Kong
- Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
3
Kusmec A, Yeh CT'E, Schnable PS. Data-driven identification of environmental variables influencing phenotypic plasticity to facilitate breeding for future climates. THE NEW PHYTOLOGIST 2024. [PMID: 39183371 DOI: 10.1111/nph.19937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 05/20/2024] [Indexed: 08/27/2024]
Abstract
Phenotypic plasticity describes a genotype's ability to produce different phenotypes in response to different environments. Breeding crops that exhibit appropriate levels of plasticity for future climates will be crucial to meeting global demand, but knowledge of the critical environmental factors is limited to a handful of well-studied major crops. Using 727 maize (Zea mays L.) hybrids phenotyped for grain yield in 45 environments, we investigated the ability of a genetic algorithm and two other methods to identify environmental determinants of grain yield from a large set of candidate environmental variables constructed using minimal assumptions. The genetic algorithm identified pre- and postanthesis maximum temperature, mid-season solar radiation, and whole season net evapotranspiration as the four most important variables from a candidate set of 9150. Importantly, these four variables are supported by previous literature. After calculating reaction norms for each environmental variable, candidate genes were identified and gene annotations investigated to demonstrate how this method can generate insights into phenotypic plasticity. The genetic algorithm successfully identified known environmental determinants of hybrid maize grain yield. This demonstrates that the methodology could be applied to other less well-studied phenotypes and crops to improve understanding of phenotypic plasticity and facilitate breeding crops for future climates.
Affiliation(s)
- Aaron Kusmec
- Department of Agronomy, Iowa State University, Ames, IA, 50011-3650, USA
- Patrick S Schnable
- Department of Agronomy, Iowa State University, Ames, IA, 50011-3650, USA
- Plant Sciences Institute, Iowa State University, Ames, IA, 50011-3650, USA
4
Montesinos-López OA, Crespo-Herrera L, Pierre CS, Cano-Paez B, Huerta-Prado GI, Mosqueda-González BA, Ramos-Pulido S, Gerard G, Alnowibet K, Fritsche-Neto R, Montesinos-López A, Crossa J. Feature engineering of environmental covariates improves plant genomic-enabled prediction. FRONTIERS IN PLANT SCIENCE 2024; 15:1349569. [PMID: 38812738 PMCID: PMC11135473 DOI: 10.3389/fpls.2024.1349569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 04/11/2024] [Indexed: 05/31/2024]
Abstract
Introduction: Because genomic selection (GS) is a predictive methodology, it needs to guarantee high prediction accuracies for practical implementations. However, since many factors affect the prediction performance of this methodology, its practical implementation still needs to be improved in many breeding programs. For this reason, many strategies have been explored to improve the prediction performance of this methodology. Methods: When environmental covariates are incorporated as inputs in the genomic prediction models, this information only sometimes helps increase prediction performance. For this reason, this investigation explores the use of feature engineering on the environmental covariates to enhance the prediction performance of genomic prediction models. Results and discussion: We found that across data sets, feature engineering helps reduce prediction error, relative to including the environmental covariates without feature engineering, by 761.625% across predictors. These results are very promising regarding the potential of feature engineering to enhance prediction accuracy. However, since a significant gain in prediction accuracy was observed in only some data sets, further research is required to guarantee a robust feature engineering strategy to incorporate the environmental covariates.
Affiliation(s)
- Carolina Saint Pierre
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
- Bernabe Cano-Paez
- Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), México City, Mexico
- Sofia Ramos-Pulido
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Guillermo Gerard
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
- Khalid Alnowibet
- Department of Statistics and Operations Research, King Saud University, Riyadh, Saudi Arabia
- Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
- Louisiana State University, Baton Rouge, LA, United States
- Distinguished Scientist Fellowship Program, King Saud University, Riyadh, Saudi Arabia
- Instituto de Socioeconomia, Estadistica e Informatica, Colegio de Postgraduados, Montecillos, Edo. de México, Texcoco, Mexico
5
Bose S, Banerjee S, Kumar S, Saha A, Nandy D, Hazra S. Review of applications of artificial intelligence (AI) methods in crop research. J Appl Genet 2024; 65:225-240. [PMID: 38216788 DOI: 10.1007/s13353-023-00826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 12/23/2023] [Accepted: 12/26/2023] [Indexed: 01/14/2024]
Abstract
Sophisticated and modern crop improvement techniques can bridge the gap for feeding the ever-increasing population. Artificial intelligence (AI) refers to the simulation of human intelligence in machines, which refers to the application of computational algorithms, machine learning (ML) and deep learning (DL) techniques. This is aimed to generalise patterns and relationships from historical data, employing various mathematical optimisation techniques thus making prediction models for facilitating selection of superior genotypes. These techniques are less resource intensive and can solve the problem based on the analysis of large-scale phenotypic datasets. ML for genomic selection (GS) uses high-throughput genotyping technologies to gather genetic information on a large number of markers across the genome. The prediction of GS models is based on the mathematical relation between genotypic and phenotypic data from the training population. ML techniques have emerged as powerful tools for genome editing through analysing large-scale genomic data and facilitating the development of accurate prediction models. Precise phenotyping is a prerequisite to advance crop breeding for solving agricultural production-related issues. ML algorithms can solve this problem through generating predictive models, based on the analysis of large-scale phenotypic datasets. DL models also have the potential reliability of precise phenotyping. This review provides a comprehensive overview on various ML and DL models, their applications, potential to enhance the efficiency, specificity and safety towards advanced crop improvement protocols such as genomic selection, genome editing, along with phenotypic prediction to promote accelerated breeding.
Affiliation(s)
- Suvojit Bose
- Department of Vegetables and Spice Crops, Uttar Banga Krishi Viswavidyalaya, Pundibari, Cooch Behar, 736165, West Bengal, India
- Soumya Kumar
- School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Akash Saha
- School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Debalina Nandy
- School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Soham Hazra
- Department of Agriculture, Brainware University, Barasat, 700125, West Bengal, India.
6
Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-Gónzalez J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. MOLECULAR PLANT 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
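As a concrete reading of "accuracy measured with Pearson's correlation", the sketch below estimates the cross-validated correlation between predicted and observed phenotypes for a ridge-regression marker model. It is an illustrative assumption, not a method taken from the review: the data are simulated, the shrinkage parameter `lam` is fixed by hand rather than estimated from variance components, and all function names are hypothetical.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y, a ridge (RR-BLUP-like) marker-effect estimate."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_predictive_ability(X, y, lam, k=5, seed=0):
    """Mean Pearson correlation between predictions and observations over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        # Center with training-set means only, so no test information leaks in.
        mu_x, mu_y = X[train].mean(axis=0), y[train].mean()
        b = ridge_marker_effects(X[train] - mu_x, y[train] - mu_y, lam)
        pred = mu_y + (X[test] - mu_x) @ b
        cors.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(cors))

# Simulated population (assumption): 200 individuals, 300 markers, 20 causal loci.
rng = np.random.default_rng(42)
n, p, n_qtl = 200, 300, 20
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[rng.choice(p, n_qtl, replace=False)] = rng.normal(size=n_qtl)
g = X @ beta
y = g + rng.normal(scale=g.std(), size=n)  # noise variance ~ genetic variance

r_hat = cv_predictive_ability(X, y, lam=100.0)
print(f"mean cross-validated Pearson r = {r_hat:.2f}")
```

Because the correlation is taken against noisy phenotypes rather than true genetic values, the value obtained this way is usually called predictive ability; it stays below one even for a perfect model unless heritability is one.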
Affiliation(s)
- Admas Alemu
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden.
- Johanna Åstrand
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-Gónzalez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
- International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden.
- Aakash Chawade
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
7
Fernández-González J, Haquin B, Combes E, Bernard K, Allard A, Isidro Y Sánchez J. Maximizing efficiency in sunflower breeding through historical data optimization. PLANT METHODS 2024; 20:42. [PMID: 38493115 PMCID: PMC10943787 DOI: 10.1186/s13007-024-01151-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 01/30/2024] [Indexed: 03/18/2024]
Abstract
Genomic selection (GS) has become an increasingly popular tool in plant breeding programs, propelled by declining genotyping costs, an increase in computational power, and rediscovery of the best linear unbiased prediction methodology over the past two decades. This development has led to an accumulation of extensive historical datasets with genotypic and phenotypic information, triggering the question of how to best utilize these datasets. Here, we investigate whether all available data or a subset should be used to calibrate GS models for across-year predictions in a 7-year dataset of a commercial hybrid sunflower breeding program. We employed a multi-objective optimization approach to determine the ideal years to include in the training set (TRS). Next, for a given combination of TRS years, we further optimized the TRS size and its genetic composition. We developed the Min_GRM size optimization method which consistently found the optimal TRS size, reducing dimensionality by 20% with an approximately 1% loss in predictive ability. Additionally, the Tails_GEGVs algorithm displayed potential, outperforming the use of all data by using just 60% of it for grain yield, a high-complexity, low-heritability trait. Moreover, maximizing the genetic diversity of the TRS resulted in a consistent predictive ability across the entire range of genotypic values in the test set. Interestingly, the Tails_GEGVs algorithm, due to its ability to leverage heterogeneity, enhanced predictive performance for key hybrids with extreme genotypic values. Our study provides new insights into the optimal utilization of historical data in plant breeding programs, resulting in improved GS model predictive ability.
Affiliation(s)
- Javier Fernández-González
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, Pozuelo de Alarcón, Madrid, 28223, Spain.
- Julio Isidro Y Sánchez
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, Pozuelo de Alarcón, Madrid, 28223, Spain.
8
Lorenzi A, Bauland C, Pin S, Madur D, Combes V, Palaffre C, Guillaume C, Touzy G, Mary-Huard T, Charcosset A, Moreau L. Portability of genomic predictions trained on sparse factorial designs across two maize silage breeding cycles. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:75. [PMID: 38453705 PMCID: PMC11341662 DOI: 10.1007/s00122-024-04566-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 01/30/2024] [Indexed: 03/09/2024]
Abstract
KEY MESSAGE: We validated the efficiency of genomic predictions calibrated on sparse factorial training sets to predict the next generation of hybrids and tested different strategies for updating predictions along generations. Genomic selection offers new prospects for revisiting hybrid breeding schemes by replacing extensive phenotyping of individuals with genomic predictions. Finding the ideal design for training genomic prediction models is still an open question. Previous studies have shown promising predictive abilities using sparse factorial instead of tester-based training sets to predict single-cross hybrids from the same generation. This study aims to further investigate the use of factorials and their optimization to predict line general combining abilities (GCAs) and hybrid values across breeding cycles. It relies on two breeding cycles of a maize reciprocal genomic selection scheme involving multiparental connected reciprocal populations from flint and dent complementary heterotic groups selected for silage performances. Selection based on genomic predictions trained on a factorial design resulted in a significant genetic gain for dry matter yield in the new generation. Results confirmed the efficiency of sparse factorial training sets to predict candidate line GCAs and hybrid values across breeding cycles. Compared to a previous study based on the first generation, the advantage of factorial over tester training sets appeared lower across generations. Updating factorial training sets by adding single-cross hybrids between selected lines from the previous generation or a random subset of hybrids from the new generation both improved predictive abilities. The CDmean criterion helped determine the set of single-crosses to phenotype to update the training set efficiently. Our results validated the efficiency of sparse factorial designs for calibrating hybrid genomic prediction experimentally and showed the benefit of updating it along generations.
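To make the role of a CDmean-type criterion more tangible, the sketch below scores a training set by the mean expected reliability of the candidates' predictions and picks individuals greedily. It is a simplified illustration under stated assumptions and not the procedure used in the study: fixed effects and hybrid-specific (GCA/SCA) structure are ignored, the variance ratio `lam` is assumed known, the greedy search is a swapped-in stand-in for whatever optimizer a real application would use, and the marker data and function names are hypothetical.

```python
import numpy as np

def genomic_relationship(M):
    """VanRaden-style relationship matrix from a 0/1/2 marker matrix (rows = individuals)."""
    p = M.mean(axis=0) / 2.0
    W = M - 2.0 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def cd_mean(G, train, candidates, lam):
    """Mean reliability of candidate predictions for a given training set.

    Simplified form without fixed effects:
    CD_i = G[i,T] (G[T,T] + lam*I)^-1 G[T,i] / G[i,i], with lam = sigma_e^2 / sigma_g^2.
    """
    Gtt = G[np.ix_(train, train)] + lam * np.eye(len(train))
    Gct = G[np.ix_(candidates, train)]
    explained = np.einsum("ij,ji->i", Gct, np.linalg.solve(Gtt, Gct.T))
    return float(np.mean(explained / np.diag(G)[candidates]))

def greedy_cdmean(G, pool, candidates, size, lam):
    """Add, one at a time, the pool individual that raises the mean CD the most."""
    chosen, remaining = [], list(pool)
    for _ in range(size):
        scores = [cd_mean(G, chosen + [i], candidates, lam) for i in remaining]
        best = remaining[int(np.argmax(scores))]
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical data: 60 genotypes, 200 markers; 40 may be phenotyped, 20 are targets.
rng = np.random.default_rng(7)
M = rng.binomial(2, rng.uniform(0.1, 0.9, size=200), size=(60, 200)).astype(float)
G = genomic_relationship(M)
pool = list(range(40))
candidates = list(range(40, 60))

chosen = greedy_cdmean(G, pool, candidates, size=8, lam=1.0)
cd_chosen = cd_mean(G, chosen, candidates, lam=1.0)
```

The criterion needs only genotypes, so it can be evaluated before any phenotyping is done, which is what makes it usable for deciding which single-crosses to phenotype.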
Affiliation(s)
- Alizarine Lorenzi
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- RAGT2n, Genetics and Analytics Unit, 12510, Druelle, France
- Cyril Bauland
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Sophie Pin
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Delphine Madur
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Valérie Combes
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Carine Palaffre
- UE 0394 SMH, INRAE, 2297 Route de l'INRA, 40390, Saint-Martin-de-Hinx, France
- Gaëtan Touzy
- RAGT2n, Genetics and Analytics Unit, 12510, Druelle, France
- Tristan Mary-Huard
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Alain Charcosset
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Laurence Moreau
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France.
9
de Verdal H, Baertschi C, Frouin J, Quintero C, Ospina Y, Alvarez MF, Cao TV, Bartholomé J, Grenier C. Optimization of Multi-Generation Multi-location Genomic Prediction Models for Recurrent Genomic Selection in an Upland Rice Population. RICE (NEW YORK, N.Y.) 2023; 16:43. [PMID: 37758969 PMCID: PMC10533757 DOI: 10.1186/s12284-023-00661-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 09/19/2023] [Indexed: 09/29/2023]
Abstract
Genomic selection is a worthy breeding method to improve genetic gain in recurrent selection breeding schemes. The integration of multi-generation and multi-location information could significantly improve genomic prediction models in the context of shuttle breeding. The Cirad-CIAT upland rice breeding program applies recurrent genomic selection and seeks to optimize the scheme to increase genetic gain while reducing phenotyping efforts. We used a synthetic population (PCT27) of which S0 plants were all genotyped and advanced by selfing and bulk seed harvest to the S0:2, S0:3, and S0:4 generations. The PCT27 was then divided into two sets. The S0:2 and S0:3 progenies for PCT27A and the S0:4 progenies for PCT27B were phenotyped in two locations: Santa Rosa the target selection location, within the upland rice growing area, and Palmira, the surrogate location, far from the upland rice growing area but easier for experimentation. While the calibration used either one of the two sets phenotyped in one or two locations, the validation population was only the PCT27B phenotyped in Santa Rosa. Five scenarios of genomic prediction and 24 models were performed and compared. Training the prediction model with the PCT27B phenotyped in Santa Rosa resulted in predictive abilities ranging from 0.19 for grain zinc concentration to 0.30 for grain yield. Expanding the training set with the inclusion of the PCT27A resulted in greater predictive abilities for all traits but grain yield, with increases from 5% for plant height to 61% for grain zinc concentration. Models with the PCT27B phenotyped in two locations resulted in higher prediction accuracy when the models assumed no genotype-by-environment (G × E) interaction for flowering (0.38) and grain zinc concentration (0.27). For plant height, the model assuming a single G × E variance provided higher accuracy (0.28). 
The gain in predictive ability for grain yield was the greatest (0.25) when environment-specific variance deviation effect for G × E was considered. While the best scenario was specific to each trait, the results indicated that the gain in predictive ability provided by the multi-location and multi-generation calibration was low. Yet, this approach could lead to increased selection intensity, acceleration of the breeding cycle, and a sizable economic advantage for the program.
Affiliation(s)
- Hugues de Verdal
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France.
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France.
- Cédric Baertschi
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Julien Frouin
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Constanza Quintero
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
- Yolima Ospina
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
- Tuong-Vi Cao
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Jérôme Bartholomé
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
- Cécile Grenier
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France.
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France.
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia.
10
Liu Y, Ao M, Lu M, Zheng S, Zhu F, Ruan Y, Guan Y, Zhang A, Cui Z. Genomic selection to improve husk tightness based on genomic molecular markers in maize. FRONTIERS IN PLANT SCIENCE 2023; 14:1252298. [PMID: 37828926 PMCID: PMC10566295 DOI: 10.3389/fpls.2023.1252298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 09/04/2023] [Indexed: 10/14/2023]
Abstract
Introduction: The husk tightness (HTI) in maize plays a crucial role in regulating the water content of ears during the maturity stage, thereby influencing the quality of mechanical grain harvesting in China. Genomic selection (GS), which employs molecular markers, offers a promising approach for identifying and selecting inbred lines with the desired HTI trait in maize breeding. However, the effectiveness of GS is contingent upon various factors, including the genetic architecture of breeding populations, sequencing platforms, and statistical models. Methods: An association panel of maize inbred lines was grown across three sites over two years, divided into four subgroups. GS analysis for HTI prediction was performed using marker data from three sequencing platforms and six marker densities with six statistical methods. Results: The findings indicate that a loosely attached husk can aid in the dissipation of water from kernels in temperate maize germplasms across most environments but not necessarily for tropical-origin maize. Considering the balance between GS prediction accuracy and breeding cost, the optimal prediction strategy is the rrBLUP model, the 50K sequencing platform, a 30% proportion of the test population, and a marker density of r² = 0.1. Additionally, selecting a specific SS subgroup for sampling the testing set significantly enhances the predictive capacity for husk tightness. Discussion: The determination of the optimal GS prediction strategy for HTI provides an economically feasible reference for the practice of molecular breeding. It also serves as a reference method for GS breeding of other agronomic traits.
Affiliation(s)
- Yuncan Liu
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
- Shenyang City Key Laboratory of Maize Genomic Selection Breeding, College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang, Liaoning, China
| | - Man Ao
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
| | - Ming Lu
- Maize Research Institute, Jilin Academy of Agricultural Sciences, Gongzhuling, China
| | - Shubo Zheng
- Maize Research Institute, Jilin Academy of Agricultural Sciences, Gongzhuling, China
| | - Fangbo Zhu
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
| | - Yanye Ruan
- Shenyang City Key Laboratory of Maize Genomic Selection Breeding, College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang, Liaoning, China
| | - Yixin Guan
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
| | - Ao Zhang
- Shenyang City Key Laboratory of Maize Genomic Selection Breeding, College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang, Liaoning, China
| | - Zhenhai Cui
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
| |
|
11
|
Wang Q, Jiang S, Li T, Qiu Z, Yan J, Fu R, Ma C, Wang X, Jiang S, Cheng Q. G2P Provides an Integrative Environment for Multi-model genomic selection analysis to improve genotype-to-phenotype prediction. FRONTIERS IN PLANT SCIENCE 2023; 14:1207139. [PMID: 37600179 PMCID: PMC10437076 DOI: 10.3389/fpls.2023.1207139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 07/21/2023] [Indexed: 08/22/2023]
Abstract
Genotype-to-phenotype (G2P) prediction has become a mainstream paradigm to facilitate genomic selection (GS)-assisted breeding in the seed industry. Many methods have been introduced for building GS models, but their prediction precision may vary depending on species and specific traits. Therefore, evaluating multiple models and selecting the appropriate one is crucial to effective GS analysis. Here, we present the G2P container developed for the Singularity platform, which contains a library of 16 state-of-the-art GS models and 13 evaluation metrics. G2P works as an integrative environment offering comprehensive, unbiased evaluation analyses of the 16 GS models, which may be run in parallel on high-performance computing clusters. Based on the evaluation outcome, G2P performs auto-ensemble algorithms that can not only automatically select the most precise models but also integrate prediction results from multiple models. This functionality should further improve the precision of G2P prediction. Another noteworthy function is the refinement design of the training set, in which G2P optimizes the training set based on the genetic diversity analysis of a studied population. Although the optimized set contains fewer training samples than the original set, its prediction precision is almost equivalent to that obtained when using the whole set. This functionality is quite useful in practice, as it reduces the cost of phenotyping when constructing a training population. The G2P container and source code are freely accessible at https://g2p-env.github.io/.
Affiliation(s)
- Qian Wang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Shan Jiang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Tong Li
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Zhixu Qiu
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, China
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling, China
| | - Jun Yan
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Ran Fu
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Chuang Ma
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, China
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling, China
| | - Xiangfeng Wang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Shuqin Jiang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Qian Cheng
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| |
|
12
|
Montesinos-López OA, Crespo-Herrera L, Saint Pierre C, Bentley AR, de la Rosa-Santamaria R, Ascencio-Laguna JA, Agbona A, Gerard GS, Montesinos-López A, Crossa J. Do feature selection methods for selecting environmental covariables enhance genomic prediction accuracy? Front Genet 2023; 14:1209275. [PMID: 37554404 PMCID: PMC10405933 DOI: 10.3389/fgene.2023.1209275] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/20/2023] [Accepted: 07/03/2023] [Indexed: 08/10/2023]
Abstract
Genomic selection (GS) is transforming plant and animal breeding, but its practical implementation for complex traits and multi-environmental trials remains challenging. To address this issue, this study investigates the integration of environmental information with genotypic information in GS. The study proposes the use of two feature selection methods (Pearson's correlation and Boruta) for the integration of environmental information. Results indicate that the simple incorporation of environmental covariates may increase or decrease prediction accuracy depending on the case. However, optimal incorporation of environmental covariates using feature selection significantly improves prediction accuracy in four out of six datasets, by between 14.25% and 218.71% in terms of Normalized Root Mean Squared Error under a leave-one-environment-out cross-validation scenario, but no relevant gain was observed in terms of Pearson's correlation. In two datasets where environmental covariates are unrelated to the response variable, feature selection is unable to enhance prediction accuracy. Therefore, the study provides empirical evidence supporting the use of feature selection to improve the prediction power of GS.
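The correlation-based filter mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0.5 threshold, the toy data, and the choice to normalize RMSE by the mean of the observations are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_covariates(covs, y, threshold=0.5):
    """Keep covariates whose |r| with the response reaches the threshold.

    The threshold value is a hypothetical choice, not taken from the study.
    """
    return [name for name, vals in covs.items()
            if abs(pearson(vals, y)) >= threshold]

def nrmse(obs, pred):
    """RMSE divided by the mean observation (one common normalization, assumed here)."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return rmse / (sum(obs) / len(obs))

# Toy data: one informative and one uninformative environmental covariate.
y = [4.0, 5.0, 6.0, 7.0, 8.0]
covs = {
    "rainfall": [10.0, 20.0, 30.0, 40.0, 50.0],  # perfectly correlated with y
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],          # weakly correlated with y
}
selected = select_covariates(covs, y)
```

Only the covariates in `selected` would then be passed to the prediction model, and `nrmse` could be used to compare the models with and without the filter.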
Affiliation(s)
| | | | | | - Alison R. Bentley
- International Maize and Wheat Improvement Center (CIMMYT), El Battan, Mexico
| | | | | | - Afolabi Agbona
- International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Molecular & Environmental Plant Sciences, Texas A&M University, College Station, TX, United States
| | - Guillermo S. Gerard
- International Maize and Wheat Improvement Center (CIMMYT), El Battan, Mexico
| | - Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, JA, Mexico
| | - José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), El Battan, Mexico
- Colegio de Postgraduados, Campus Montecillos, Montecillos, Mexico
| |
|
13
|
Pégard M, Barre P, Delaunay S, Surault F, Karagić D, Milić D, Zorić M, Ruttink T, Julier B. Genome-wide genotyping data renew knowledge on genetic diversity of a worldwide alfalfa collection and give insights on genetic control of phenology traits. FRONTIERS IN PLANT SCIENCE 2023; 14:1196134. [PMID: 37476178 PMCID: PMC10354441 DOI: 10.3389/fpls.2023.1196134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 05/30/2023] [Indexed: 07/22/2023]
Abstract
China's and Europe's dependence on imported protein is a threat to the food self-sufficiency of these regions. It could be solved by growing more legumes, including alfalfa, which is the highest protein producer under temperate climates. To create productive and high-value varieties, the use of large genetic diversity combined with genomic evaluation could improve current breeding programs. To study alfalfa diversity, we used a set of 400 alfalfa accessions (i.e. populations), mainly from Europe, North and South America and China, with fall dormancy ranging from 3 to 7 on a scale of 11. Five breeders provided materials (617 accessions) that were compared to the 400 accessions. All accessions were genotyped using Genotyping-by-Sequencing (GBS) to obtain SNP allele frequencies. These genomic data were used to describe genetic diversity and identify genetic groups. The accessions were phenotyped for phenology traits (fall dormancy and flowering date) at two locations (Lusignan in France, Novi Sad in Serbia) from 2018 to 2021. QTL were detected with a Multi-Locus Mixed Model (mlmm). Subsequently, the quality of the genomic prediction for each trait was assessed by cross-validation, testing GBLUP, Bayesian Ridge Regression (BRR), and Bayesian Lasso methods. A genetic structure with seven groups was found. Most of these groups were related to the geographical origin of the accessions and showed that European and American material is genetically distinct from Chinese material. Several QTL associated with fall dormancy were found, and most of these were linked to genes. In our study, the infinitesimal methods showed a higher prediction quality than the Bayesian Lasso, and the genomic prediction achieved high (>0.75) predictive abilities in some cases. Our results are encouraging for alfalfa breeding by showing that it is possible to achieve high genomic prediction quality.
Affiliation(s)
| | | | | | | | - Djura Karagić
- Login EKO doo, Bulevar Zorana Đinđića 125, Novi Beograd, Serbia
| | - Dragan Milić
- International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
| | - Miroslav Zorić
- Login EKO doo, Bulevar Zorana Đinđića 125, Novi Beograd, Serbia
| | | | | |
|
14
|
He S, Liang S, Meng L, Cao L, Ye G. Sparse Phenotyping and Haplotype-Based Models for Genomic Prediction in Rice. RICE (NEW YORK, N.Y.) 2023; 16:27. [PMID: 37284992 DOI: 10.1186/s12284-023-00643-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 05/20/2023] [Indexed: 06/08/2023]
Abstract
Multi-environment genomic selection enables plant breeders to select varieties that are resilient to diverse environments or specifically adapted to particular environments, and it holds great potential for rice breeding. Realizing multi-environment genomic selection requires a robust training set with multi-environment phenotypic data. Given the large potential of genomic-prediction-enhanced sparse phenotyping to reduce the cost of multi-environment trials (MET), the establishment of a multi-environment training set could also benefit from it. Optimizing the genomic prediction methods is also crucial to enhance multi-environment genomic selection. Haplotype-based genomic prediction models can capture local epistatic effects, which can be conserved and accumulated across generations much like additive effects, thereby benefitting breeding. However, previous studies often used fixed-length haplotypes composed of a few adjacent molecular markers, disregarding the linkage disequilibrium (LD) that plays an essential role in determining haplotype length. In our study, based on three rice populations with different sizes and compositions, we investigated the usefulness and effectiveness of multi-environment training sets with varying phenotyping intensities and of different haplotype-based genomic prediction models built on LD-derived haplotype blocks for two agronomic traits, i.e., days to heading (DTH) and plant height (PH). Results showed that phenotyping merely 30% of the records in a multi-environment training set provides prediction accuracy comparable to that of high phenotyping intensities; local epistatic effects very likely exist for DTH; dividing the LD-derived haplotype blocks into small segments of two or three single nucleotide polymorphisms (SNPs) helps to maintain the predictive ability of haplotype-based models in large populations; and modelling the covariances between environments improves genomic prediction accuracy. Our study provides means to improve the efficiency of multi-environment genomic selection in rice.
Affiliation(s)
- Sang He
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, China
- CAAS-IRRI Joint Laboratory for Genomics-Assisted Germplasm Enhancement, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, China
| | - Shanshan Liang
- Tianjin Key Laboratory of Animal and Plant Resistance, College of Life Sciences, Tianjin Normal University, Tianjin, 300387, China
| | - Lijun Meng
- Kunpeng Institute of Modern Agriculture at Foshan, Foshan, 528200, China
| | - Liyong Cao
- Key Laboratory for Zhejiang Super Rice Research, China National Rice Research Institute, Hangzhou, 310006, China.
| | - Guoyou Ye
- CAAS-IRRI Joint Laboratory for Genomics-Assisted Germplasm Enhancement, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, China.
- Rice Breeding Innovations Platform, International Rice Research Institute, Metro Manila, Philippines.
| |
|
15
|
Wu PY, Ou JH, Liao CT. Sample size determination for training set optimization in genomic prediction. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:57. [PMID: 36912999 PMCID: PMC10011335 DOI: 10.1007/s00122-023-04254-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 11/07/2022] [Indexed: 06/18/2023]
Abstract
A practical approach is developed to determine a cost-effective optimal training set for selective phenotyping in a genomic prediction study. An R function is provided to facilitate the application of the approach. Genomic prediction (GP) is a statistical method used to select quantitative traits in animal or plant breeding. For this purpose, a statistical prediction model is first built that uses phenotypic and genotypic data in a training set. The trained model is then used to predict genomic estimated breeding values (GEBVs) for individuals within a breeding population. Setting the sample size of the training set usually takes into account time and space constraints that are inevitable in an agricultural experiment. However, the determination of the sample size remains an unresolved issue for a GP study. By applying the logistic growth curve to identify prediction accuracy for the GEBVs and the training set size, a practical approach was developed to determine a cost-effective optimal training set for a given genome dataset with known genotypic data. Three real genome datasets were used to illustrate the proposed approach. An R function is provided to facilitate widespread application of this approach to sample size determination, which can help breeders to identify a set of genotypes with an economical sample size for selective phenotyping.
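The idea of relating prediction accuracy to training set size through a logistic growth curve can be sketched as follows. This is not the authors' R function: the parametrization r(n) = K / (1 + exp(-(a + b·n))), the assumption that the asymptote K is known in advance, the linearized least-squares fit, and the synthetic noise-free data are all simplifications chosen for illustration.

```python
import math

def logistic(n, K, a, b):
    """Assumed accuracy curve: r(n) = K / (1 + exp(-(a + b*n)))."""
    return K / (1.0 + math.exp(-(a + b * n)))

def fit_logistic(sizes, accs, K):
    """Least-squares fit of logit(acc / K) = a + b*n, with asymptote K assumed known."""
    ys = [math.log((r / K) / (1.0 - r / K)) for r in accs]
    m = len(sizes)
    mx, my = sum(sizes) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in sizes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sizes, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def size_for_fraction(a, b, frac):
    """Smallest integer n at which the fitted curve reaches frac * K."""
    return math.ceil((math.log(frac / (1.0 - frac)) - a) / b)

# Synthetic, noise-free accuracies generated from a known curve (illustrative only).
K_TRUE, A_TRUE, B_TRUE = 0.6, -1.0, 0.02
sizes = [50, 100, 150, 200, 300]
accs = [logistic(n, K_TRUE, A_TRUE, B_TRUE) for n in sizes]

a_hat, b_hat = fit_logistic(sizes, accs, K_TRUE)
n_95 = size_for_fraction(a_hat, b_hat, 0.95)  # size reaching 95% of the asymptote
```

With real cross-validation accuracies the points are noisy and K must itself be estimated, so a nonlinear fit would usually replace the linearization used here.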
Affiliation(s)
- Po-Ya Wu
- Department of Agronomy, National Taiwan University, Taipei, Taiwan
- Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, Germany
| | - Jen-Hsiang Ou
- Department of Agronomy, National Taiwan University, Taipei, Taiwan
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Chen-Tuo Liao
- Department of Agronomy, National Taiwan University, Taipei, Taiwan.
| |
|
16
|
Ficht A, Konkin DJ, Cram D, Sidebottom C, Tan Y, Pozniak C, Rajcan I. Genomic selection for agronomic traits in a winter wheat breeding program. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:38. [PMID: 36897431 DOI: 10.1007/s00122-023-04294-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 12/19/2022] [Indexed: 06/18/2023]
Abstract
rAMP-seq based genomic selection for agronomic traits has been shown to be a useful tool for winter wheat breeding programs by increasing the rate of genetic gain. Genomic selection (GS) is an effective strategy to employ in a breeding program that focuses on optimizing quantitative traits, which results in the ability for breeders to select the best genotypes. GS was incorporated into a breeding program to determine the potential for implementation on an annual basis, with emphasis on selecting optimal parents and decreasing the time and costs associated with phenotyping large numbers of genotypes. The design options for applying repeat amplification sequencing (rAMP-seq) in bread wheat were explored, and a low-cost single primer pair strategy was implemented. A total of 1870 winter wheat genotypes were phenotyped and genotyped using rAMP-seq. The optimization of training to testing population size showed that the 70:30 ratio provided the most consistent prediction accuracy. Three GS models were tested, rrBLUP, RKHS and feed-forward neural networks using the University of Guelph Winter Wheat Breeding Program (UGWWBP) and Elite-UGWWBP populations. The models performed equally well for both populations and did not differ in prediction accuracy (r) for most agronomic traits, with the exception of yield, where RKHS performed the best with an r = 0.34 and 0.39 for each population, respectively. The ability to operate a breeding program where multiple selection strategies, including GS, are utilized will lead to higher efficiency in the program and ultimately lead to a higher rate of genetic gain.
Affiliation(s)
- Alexandra Ficht
- Department of Plant Agriculture, University of Guelph, Crop Science Building, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
| | - David J Konkin
- Aquatic and Crop Resource Development Research Centre, National Research Council of Canada, Saskatoon, Canada
| | - Dustin Cram
- Aquatic and Crop Resource Development Research Centre, National Research Council of Canada, Saskatoon, Canada
| | - Christine Sidebottom
- Aquatic and Crop Resource Development Research Centre, National Research Council of Canada, Saskatoon, Canada
| | - Yifang Tan
- Aquatic and Crop Resource Development Research Centre, National Research Council of Canada, Saskatoon, Canada
| | - Curtis Pozniak
- Department of Plant Sciences, Crop Development Centre, University of Saskatchewan, Room 2E64, Agriculture Building, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
| | - Istvan Rajcan
- Department of Plant Agriculture, University of Guelph, Crop Science Building, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada.
| |
|
17
|
Fernández-González J, Akdemir D, Isidro Y Sánchez J. A comparison of methods for training population optimization in genomic selection. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:30. [PMID: 36892603 PMCID: PMC9998580 DOI: 10.1007/s00122-023-04265-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 11/21/2022] [Indexed: 06/18/2023]
Abstract
Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50-55% (targeted) or 65-85% (untargeted) is needed to obtain 95% of the accuracy. With the advent of genomic selection (GS) as a widespread breeding tool, mechanisms to efficiently design an optimal training set for GS models have become more relevant, since they allow maximizing the accuracy while minimizing the phenotyping costs. The literature describes many training set optimization methods, but a comprehensive comparison among them is lacking. This work aimed to provide an extensive benchmark of optimization methods and optimal training set sizes by testing a wide range of them in seven datasets covering six different species, different genetic architectures, population structures, and heritabilities, and with several GS models, in order to provide guidelines for their application in breeding programs. Our results showed that targeted optimization (which uses information from the test set) performed better than untargeted optimization (which does not use test set data), especially when heritability was low. The mean coefficient of determination was the best targeted method, although it was computationally intensive. Minimizing the average relationship within the training set was the best strategy for untargeted optimization. Regarding the optimal training set size, maximum accuracy was obtained when the training set was the entire candidate set. Nevertheless, 50-55% of the candidate set was enough to reach 95-100% of the maximum accuracy in the targeted scenario, while 65-85% was needed for untargeted optimization. Our results also suggested that a diverse training set makes GS robust against population structure, while including clustering information was less effective. The choice of the GS model did not have a significant influence on the prediction accuracies.
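The untargeted criterion described in this abstract, minimizing the average relationship within the training set, can be illustrated with a small greedy heuristic. This is only a sketch under stated assumptions: the greedy forward search, the function names, and the toy relationship matrix are hypothetical, and the study itself may rely on a different search algorithm.

```python
from itertools import combinations

def mean_relationship(G, idx):
    """Average off-diagonal relationship among the individuals in idx."""
    pairs = list(combinations(idx, 2))
    return sum(G[i][j] for i, j in pairs) / len(pairs)

def greedy_diverse_set(G, size):
    """Greedy forward selection of a training set (size >= 2) with low mean relationship.

    Starts from the least related pair, then repeatedly adds the candidate
    that keeps the average pairwise relationship lowest. A heuristic only:
    it does not guarantee the global minimum.
    """
    n = len(G)
    chosen = list(min(combinations(range(n), 2), key=lambda p: G[p[0]][p[1]]))
    while len(chosen) < size:
        rest = [i for i in range(n) if i not in chosen]
        best = min(rest, key=lambda i: mean_relationship(G, chosen + [i]))
        chosen.append(best)
    return sorted(chosen)

# Toy relationship matrix: three families of two full sibs each (illustrative values).
N = 6
G = [[1.0 if i == j else (0.5 if i // 2 == j // 2 else 0.0) for j in range(N)]
     for i in range(N)]

training_set = greedy_diverse_set(G, 3)
```

On this toy matrix the heuristic picks one individual per family, which is the behaviour the diversity criterion is meant to encourage.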
Affiliation(s)
- Javier Fernández-González
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Madrid, Spain.
| | - Deniz Akdemir
- CIBMTR (Center for International Blood and Marrow Transplant Research), National Marrow Donor Program/Be The Match, Minneapolis, USA
| | - Julio Isidro Y Sánchez
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Madrid, Spain.
| |
|
18
|
Jeon D, Kang Y, Lee S, Choi S, Sung Y, Lee TH, Kim C. Digitalizing breeding in plants: A new trend of next-generation breeding based on genomic prediction. FRONTIERS IN PLANT SCIENCE 2023; 14:1092584. [PMID: 36743488 PMCID: PMC9892199 DOI: 10.3389/fpls.2023.1092584] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/08/2022] [Accepted: 01/05/2023] [Indexed: 06/18/2023]
Abstract
As the world's population grows and food needs diversify, the demand for cereals and horticultural crops with beneficial traits increases. To meet this variety of demands, suitable cultivars and innovative breeding methods need to be developed. Breeding methods have changed over time following advances in genetics. With the advent of new sequencing technology in the early 21st century, predictive breeding, such as genomic selection (GS), emerged when large-scale genomic information became available. GS shows good predictive ability for selecting individuals with traits of interest, even for quantitative traits, by using various types of whole-genome-scanning markers, breaking away from the limitations of marker-assisted selection (MAS). In the current review, we briefly describe the history of breeding techniques, each breeding method, the various statistical models applied to GS, and methods to increase GS efficiency. We also propose and define the term digital breeding. Digital breeding develops predictive breeding methods such as GS to a higher level, aiming to minimize human intervention by automating breeding design, the propagation of breeding populations, and selection, while taking into account various environments, climates, and topography during the breeding process. We also classify the phases of digital breeding based on the technologies and methods applied to each phase. This review provides an understanding of, and a direction for, the final evolution of plant breeding in the future.
Affiliation(s)
- Donghyun Jeon
- Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
| | - Yuna Kang
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
| | - Solji Lee
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
| | - Sehyun Choi
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
| | - Yeonjun Sung
- Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
| | - Tae-Ho Lee
- Genomics Division, National Institute of Agricultural Sciences, Jeonju, Republic of Korea
| | - Changsoo Kim
- Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
| |
|
19
|
Gevartosky R, Carvalho HF, Costa-Neto G, Montesinos-López OA, Crossa J, Fritsche-Neto R. Enviromic-based kernels may optimize resource allocation with multi-trait multi-environment genomic prediction for tropical Maize. BMC PLANT BIOLOGY 2023; 23:10. [PMID: 36604618 PMCID: PMC9814176 DOI: 10.1186/s12870-022-03975-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND Success in any genomic prediction platform depends directly on establishing a representative training set. This is a complex task even in single-trait, single-environment conditions, and it becomes even more intricate when additional information from envirotyping and correlated traits is considered. Here, we aimed to design optimized training sets for genomic prediction in multi-trait multi-environment trials and to assess how those methods may increase accuracy while reducing phenotyping costs. For that, we considered single-trait multi-environment trials and multi-trait multi-environment trials for three traits (grain yield, plant height, and ear height), two datasets, and two cross-validation schemes. Next, two strategies for designing optimized training sets were conceived: the first considering only the genomic by environment by trait interaction (GET), and the second including large-scale environmental data (W, enviromics) as a genomic by enviromic by trait interaction (GWT). The effective number of individuals (genotypes × environments × traits) was taken as the number that represents at least 98% of the variation of each kernel (GET or GWT); those individuals were then selected by a genetic algorithm based on prediction error variance criteria to compose an optimized training set for genomic prediction purposes. RESULTS The combined use of genomic and enviromic data efficiently designs optimized training sets for genomic prediction, improving the response to selection per dollar invested by up to 145% when compared to the model without enviromic data, and even more when compared to a cross-validation scheme with 70% of the data in the training set or to pure phenotypic selection. Prediction models that include G × E or enviromic data + G × E yielded better prediction ability. CONCLUSIONS Our findings indicate that a genomic by enviromic by trait interaction kernel associated with genetic algorithms is efficient and can be proposed as a promising approach to designing optimized training sets for genomic prediction when the variance-covariance matrix of traits is available. Additionally, great improvements in the genetic gains per dollar invested were observed, suggesting that resources can be allocated well by using the proposed approach.
Affiliation(s)
- Raysa Gevartosky
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil.
| | - Humberto Fanelli Carvalho
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain
| | - Germano Costa-Neto
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
- Institute for Genomics Diversity, Cornell University, Ithaca, NY, USA
| | | | - José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de México, Mexico
- Colegio de Postgraduados, CP 56230, Montecillos, Edo. de México, Mexico
| | - Roberto Fritsche-Neto
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
| |
|
20
|
Ballén-Taborda C, Lyerly J, Smith J, Howell K, Brown-Guedira G, Babar MA, Harrison SA, Mason RE, Mergoum M, Murphy JP, Sutton R, Griffey CA, Boyles RE. Utilizing genomics and historical data to optimize gene pools for new breeding programs: A case study in winter wheat. Front Genet 2022; 13:964684. [PMID: 36276956 PMCID: PMC9585219 DOI: 10.3389/fgene.2022.964684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 08/05/2022] [Indexed: 11/13/2022]
Abstract
With the rapid generation and preservation of both genomic and phenotypic information for many genotypes within crops and across locations, emerging breeding programs have a valuable opportunity to leverage these resources to 1) establish the most appropriate genetic foundation at program inception and 2) implement robust genomic prediction platforms that can effectively select future breeding lines. Integrating genomics-enabled breeding into cultivar development can save costs and allow resources to be reallocated towards advanced (i.e., later) stages of field evaluation, which can facilitate an increased number of testing locations and replicates within locations. In this context, a reestablished winter wheat breeding program was used as a case study to understand best practices to leverage and tailor existing genomic and phenotypic resources to determine optimal genetics for a specific target population of environments. First, historical multi-environment phenotype data, representing 1,285 advanced breeding lines, were compiled from multi-institutional testing as part of the SunGrains cooperative and used to produce GGE biplots and PCA for yield. Locations were clustered based on highly correlated line performance among the target population of environments into 22 subsets. For each of the subsets generated, EMMs and BLUPs were calculated using linear models with the ‘lme4’ R package. Second, for each subset, TPs representative of the new SC breeding lines were determined based on genetic relatedness using the ‘STPGA’ R package. Third, for each TP, phenotypic values and SNP data were incorporated into the ‘rrBLUP’ mixed models for generation of GEBVs of YLD, TW, HD and PH. Using a five-fold cross-validation strategy, an average accuracy of r = 0.42 was obtained for yield between all TPs. The validation performed with 58 SC elite breeding lines resulted in an accuracy of r = 0.62 when the TP included complete historical data.
Lastly, QTL-by-environment interaction for 18 major effect genes across three geographic regions was examined. Lines harboring major QTL in the absence of disease could potentially underperform (e.g., Fhb1 R-gene), whereas it is advantageous to express a major QTL under biotic pressure (e.g., stripe rust R-gene). This study highlights the importance of genomics-enabled breeding and multi-institutional partnerships to accelerate cultivar development.
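The five-fold cross-validation step described in this abstract can be illustrated with a minimal sketch. The study itself used the ‘rrBLUP’ R package; the Python code below is not that implementation. It assumes simulated marker data, a fixed ridge penalty (`lam`) rather than a variance-component estimate, and hypothetical function names (`ridge_predict`, `kfold_accuracy`). Accuracy is taken as the mean Pearson correlation between predicted and observed values in the held-out folds.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Fit a ridge-regression marker model on training lines and predict test lines.

    Assumption: lam is a fixed penalty chosen by the user, not estimated by REML.
    """
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - mean) for the marker effects.
    lhs = Xc.T @ Xc + lam * np.eye(Xc.shape[1])
    beta = np.linalg.solve(lhs, Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta

def kfold_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Mean correlation between predictions and observations over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = ridge_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        accuracies.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accuracies))

# Simulated data (hypothetical): 200 lines, 50 biallelic markers coded 0/1/2.
rng = np.random.default_rng(42)
X = rng.binomial(2, 0.5, size=(200, 50)).astype(float)
y = X @ rng.normal(size=50) + rng.normal(size=200)

accuracy = kfold_accuracy(X, y, k=5)
print(f"mean five-fold accuracy: {accuracy:.2f}")
```

Random fold assignment, as used here, tends to give more optimistic accuracies than validation on a genuinely new set of lines, which is why the abstract reports the 58-line validation separately.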
Affiliation(s)
- Carolina Ballén-Taborda
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
- Pee Dee Research and Education Center, Clemson University, Florence, SC, United States
- Jeanette Lyerly
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
- Jared Smith
- U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), Raleigh, NC, United States
- Kimberly Howell
- U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), Raleigh, NC, United States
- Gina Brown-Guedira
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
- U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), Raleigh, NC, United States
- Md. Ali Babar
- Agronomy Department, University of Florida, Gainesville, FL, United States
- Stephen A. Harrison
- School of Plant, Environmental and Soil Sciences, Louisiana State University, Baton Rouge, LA, United States
- Richard E. Mason
- College of Agricultural Sciences, Colorado State University, Fort Collins, CO, United States
- Mohamed Mergoum
- Department of Crop and Soil Sciences, University of Georgia, Griffin, GA, United States
- J. Paul Murphy
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
- Russell Sutton
- Department of Soil and Crop Sciences, Texas A&M University, Commerce, TX, United States
- Carl A. Griffey
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
- Richard E. Boyles
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
- Pee Dee Research and Education Center, Clemson University, Florence, SC, United States
- Correspondence: Richard E. Boyles
|
21
|
Li Z, Liu S, Conaty W, Zhu QH, Moncuquet P, Stiller W, Wilson I. Genomic prediction of cotton fibre quality and yield traits using Bayesian regression methods. Heredity (Edinb) 2022; 129:103-112. [PMID: 35523950 PMCID: PMC9338257 DOI: 10.1038/s41437-022-00537-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2021] [Revised: 04/05/2022] [Accepted: 04/07/2022] [Indexed: 01/26/2023]
Abstract
Genomic selection or genomic prediction (GP) has increasingly become an important molecular breeding technology for crop improvement. GP aims to utilise genome-wide marker data to predict genomic breeding value for traits of economic importance. Though GP studies have been widely conducted in various crop species such as wheat and maize, its application in cotton, an essential renewable textile fibre crop, is still significantly underdeveloped. We aim to develop a new GP-based breeding system that can improve the efficiency of our cotton breeding program. This article presents a GP study on cotton fibre quality and yield traits using 1385 breeding lines from the Commonwealth Scientific and Industrial Research Organisation (CSIRO, Australia) cotton breeding program, which were genotyped using a high-density SNP chip that generated 12,296 informative SNPs. The aim of this study was twofold: (1) to identify the models and data sources (i.e. genomic and pedigree) that produce the highest prediction accuracies; and (2) to assess the effectiveness of GP as a selection tool in the CSIRO cotton breeding program. The prediction analyses were conducted under various scenarios using different Bayesian predictive models. Results highlighted that the model combining genomic and pedigree information resulted in the best cross-validated prediction accuracies: 0.76 for fibre length, 0.65 for fibre strength, and 0.64 for lint yield. Overall, this work represents the largest-scale genomic selection study based on cotton breeding trial data. Prediction accuracies reported in our study indicate the potential of GP as a breeding tool for cotton. The study highlighted the importance of incorporating pedigree and environmental factors in GP models to optimise the prediction performance.
Affiliation(s)
- Zitong Li
- CSIRO Agriculture & Food, GPO Box 1600, Canberra, ACT, 2601, Australia
- Shiming Liu
- CSIRO Agriculture & Food, Locked Bag 59, Narrabri, NSW, 2390, Australia
- Warren Conaty
- CSIRO Agriculture & Food, Locked Bag 59, Narrabri, NSW, 2390, Australia
- Qian-Hao Zhu
- CSIRO Agriculture & Food, GPO Box 1600, Canberra, ACT, 2601, Australia
- Warwick Stiller
- CSIRO Agriculture & Food, Locked Bag 59, Narrabri, NSW, 2390, Australia
- Iain Wilson
- CSIRO Agriculture & Food, GPO Box 1600, Canberra, ACT, 2601, Australia
|
22
|
Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:77-112. [PMID: 35451773 DOI: 10.1007/978-1-0716-2205-6_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
Abstract
The efficiency of genomic selection strongly depends on the prediction accuracy of the genetic merit of candidates. Numerous papers have shown that the composition of the calibration set is a key contributor to prediction accuracy. A poorly defined calibration set can result in low accuracies, whereas an optimized one can considerably increase accuracy compared to random sampling, for the same size. Alternatively, optimizing the calibration set can be a way of decreasing the costs of phenotyping by enabling similar levels of accuracy compared to random sampling but with fewer phenotypic units. We present here the different factors that have to be considered when designing a calibration set, and review the different criteria proposed in the literature. We classified these criteria into two groups: model-free criteria based on relatedness, and criteria derived from the linear mixed model. We introduce criteria targeting specific prediction objectives including the prediction of highly diverse panels, biparental families, or hybrids. We also review different ways of updating the calibration set, and different procedures for optimizing phenotyping experimental designs.
|
23
|
Bartholomé J, Prakash PT, Cobb JN. Genomic Prediction: Progress and Perspectives for Rice Improvement. Methods Mol Biol 2022; 2467:569-617. [PMID: 35451791 DOI: 10.1007/978-1-0716-2205-6_21] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/14/2023]
Abstract
Genomic prediction can be a powerful tool to achieve greater rates of genetic gain for quantitative traits if thoroughly integrated into a breeding strategy. In rice as in other crops, the interest in genomic prediction is very strong with a number of studies addressing multiple aspects of its use, ranging from the more conceptual to the more practical. In this chapter, we review the literature on rice (Oryza sativa) and summarize important considerations for the integration of genomic prediction in breeding programs. The irrigated breeding program at the International Rice Research Institute is used as a concrete example on which we provide data and R scripts to reproduce the analysis but also to highlight practical challenges regarding the use of predictions. The adage "To someone with a hammer, everything looks like a nail" describes a common psychological pitfall that sometimes plagues the integration and application of new technologies to a discipline. We have designed this chapter to help rice breeders avoid that pitfall and appreciate the benefits and limitations of applying genomic prediction, as it is not always the best approach nor the first step to increasing the rate of genetic gain in every context.
Affiliation(s)
- Jérôme Bartholomé
- CIRAD, UMR AGAP Institut, Montpellier, France
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
- Rice Breeding Platform, International Rice Research Institute, Manila, Philippines
|
24
|
Cazenave X, Petit B, Lateur M, Nybom H, Sedlak J, Tartarini S, Laurens F, Durel CE, Muranty H. Combining genetic resources and elite material populations to improve the accuracy of genomic prediction in apple. G3 (BETHESDA, MD.) 2021; 12:6459174. [PMID: 34893831 PMCID: PMC9210277 DOI: 10.1093/g3journal/jkab420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2021] [Accepted: 11/29/2021] [Indexed: 11/12/2022]
Abstract
Genomic selection is an attractive strategy for apple breeding that could reduce the length of breeding cycles. A possible limitation to the practical implementation of this approach lies in the creation of a training set large and diverse enough to ensure accurate predictions. In this study, we investigated the potential of combining two available populations, i.e., genetic resources and elite material, in order to obtain a large training set with a high genetic diversity. We compared the predictive ability of genomic predictions within-population, across-population or when combining both populations, and tested a model accounting for population-specific marker effects in this last case. The obtained predictive abilities were moderate to high according to the studied trait and small increases in predictive ability could be obtained for some traits when the two populations were combined into a unique training set. We also investigated the potential of such a training set to predict hybrids resulting from crosses between the two populations, with a focus on the method to design the training set and the best proportion of each population to optimize predictions. The measured predictive abilities were very similar for all the proportions, except for the extreme cases where only one of the two populations was used in the training set, in which case predictive abilities could be lower than when using both populations. Using an optimization algorithm to choose the genotypes in the training set also led to higher predictive abilities than when the genotypes were chosen at random. Our results provide guidelines to initiate breeding programs that use genomic selection when the implementation of the training set is a limitation.
Affiliation(s)
- Xabi Cazenave
- Univ Angers, INRAE, Institut Agro, IRHS, SFR QuaSaV, F-49000 Angers, France
- Bernard Petit
- Univ Angers, INRAE, Institut Agro, IRHS, SFR QuaSaV, F-49000 Angers, France
- Marc Lateur
- Plant Breeding and Biodiversity, Centre Wallon de Recherches Agronomiques, Gembloux, Belgium
- Hilde Nybom
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Kristianstad, Sweden
- Jiri Sedlak
- Výzkumný a Šlechtitelský ústav Ovocnářský Holovousy s.r.o, Holovousy, Czech Republic
- Stefano Tartarini
- Department of Agricultural Sciences, University of Bologna, Bologna, Italy
- François Laurens
- Univ Angers, INRAE, Institut Agro, IRHS, SFR QuaSaV, F-49000 Angers, France
- Charles-Eric Durel
- Univ Angers, INRAE, Institut Agro, IRHS, SFR QuaSaV, F-49000 Angers, France
- Hélène Muranty
- Univ Angers, INRAE, Institut Agro, IRHS, SFR QuaSaV, F-49000 Angers, France
- Corresponding author
|
25
|
Rio S, Gallego-Sánchez L, Montilla-Bascón G, Canales FJ, Isidro Y Sánchez J, Prats E. Genomic prediction and training set optimization in a structured Mediterranean oat population. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:3595-3609. [PMID: 34341832 DOI: 10.1007/s00122-021-03916-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/08/2021] [Accepted: 07/13/2021] [Indexed: 05/22/2023]
Abstract
The strong genetic structure observed in Mediterranean oats affects the predictive ability of genomic prediction as well as the performance of training set optimization methods. In this study, we investigated the efficiency of genomic prediction and training set optimization in a highly structured population of cultivars and landraces of cultivated oat (Avena sativa) from the Mediterranean basin, including white (subsp. sativa) and red (subsp. byzantina) oats, genotyped using genotyping-by-sequencing markers and evaluated for agronomic traits in Southern Spain. For most traits, the predictive abilities were moderate to high with little difference between models, except for biomass, for which Bayes-B showed a substantial gain compared to other models. The consistency between the structure of the training population and the population to be predicted was key to the predictive ability of genomic predictions. The predictive ability of inter-subspecies predictions was indeed much lower than that of intra-subspecies predictions for all traits. Regarding training set optimization, the linear mixed model optimization criteria (prediction error variance (PEVmean) and coefficient of determination (CDmean)) performed better than the heuristic approach "partitioning around medoids," even under high population structure. The superiority of CDmean and PEVmean could be explained by their ability to adapt the representation of each genetic group according to those represented in the population to be predicted. These results represent an important step towards the implementation of genomic prediction in oat breeding programs and address important issues faced by the genomic prediction community regarding population structure and training set optimization.
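The two mixed-model criteria named in this abstract, PEVmean and CDmean, can be sketched numerically. The code below is a simplified illustration, not the authors' implementation: it assumes a GBLUP model whose only fixed effect is an intercept, a known variance ratio `lam` (residual variance over genetic variance), and a per-individual form of the criteria rather than the contrast-based definition used in parts of the literature. The function name `pev_cd_mean` and the simulated relationship matrix `K` are hypothetical.

```python
import numpy as np

def pev_cd_mean(K, train_idx, lam=1.0):
    """Mean PEV and mean CD of unphenotyped individuals for a candidate training set.

    K         : (N, N) positive-definite genomic relationship matrix
    train_idx : indices of individuals that would be phenotyped
    lam       : assumed ratio of residual to genetic variance
    PEV is expressed in units of the genetic variance.
    """
    N = K.shape[0]
    train_idx = np.asarray(train_idx)
    n = len(train_idx)
    # Incidence matrix linking the n records to the N individuals.
    Z = np.zeros((n, N))
    Z[np.arange(n), train_idx] = 1.0
    # M projects out the intercept (the only fixed effect assumed here).
    M = np.eye(n) - np.ones((n, n)) / n
    C = np.linalg.inv(Z.T @ M @ Z + lam * np.linalg.inv(K))
    pev = lam * np.diag(C)
    cd = 1.0 - pev / np.diag(K)
    target = np.setdiff1d(np.arange(N), train_idx)
    return float(pev[target].mean()), float(cd[target].mean())

# Simulated relationship matrix (hypothetical): 60 individuals, 300 markers.
rng = np.random.default_rng(1)
W = rng.binomial(2, 0.5, size=(60, 300)).astype(float)
W -= W.mean(axis=0)
K = W @ W.T / W.shape[1] + 0.01 * np.eye(60)  # small ridge keeps K invertible

candidate = rng.choice(60, size=20, replace=False)
pev_mean, cd_mean = pev_cd_mean(K, candidate, lam=1.0)
print(f"PEVmean = {pev_mean:.3f}, CDmean = {cd_mean:.3f}")
```

In an optimization setting, a search routine (for example an exchange or genetic algorithm) would evaluate such a function over many candidate training sets, minimizing PEVmean or maximizing CDmean.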
Affiliation(s)
- Simon Rio
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
- Luis Gallego-Sánchez
- Institute for Sustainable Agriculture, Spanish Research Council (CSIC), Córdoba, Spain
- Francisco J Canales
- Institute for Sustainable Agriculture, Spanish Research Council (CSIC), Córdoba, Spain
- Julio Isidro Y Sánchez
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
- Elena Prats
- Institute for Sustainable Agriculture, Spanish Research Council (CSIC), Córdoba, Spain
|
26
|
Dzievit MJ, Guo T, Li X, Yu J. Comprehensive analytical and empirical evaluation of genomic prediction across diverse accessions in maize. THE PLANT GENOME 2021; 14:e20160. [PMID: 34661990 DOI: 10.1002/tpg2.20160] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/08/2021] [Accepted: 08/23/2021] [Indexed: 06/13/2023]
Abstract
Efficiently exploiting natural genetic diversity captured by accessions stored in genebanks is crucial to genetic improvement of major crops. Selecting accessions of interest from genebanks has traditionally required information from extensive and expensive evaluation; however, low-cost genotyping combined with genomic prediction have enabled us to generate predicted genetic merits for the entire set with targeted phenotypic evaluation of representative subsets. To explore this general approach, analytical assessment and empirical validation of the maize (Zea mays L.) association population (MAP) as a training population were conducted in the present study. Cross-validation within the MAP revealed mostly modest to strong predictive ability for 36 traits, generally in parallel with the square root of heritability. The MAP was then used to train the prediction models to generate genomic estimated breeding values (GEBVs) for the Ames Diversity Panel. Empirical validation conducted for nine traits across two validation populations confirmed the accuracy level indicated by the cross-validation of the training population. An upper bound for reliability (U value) was calculated for the accessions in the prediction population using genotypic data. The group of accessions with high U values generally had high predictive ability, even though the range of observed trait values was similar to the group of accessions with low U values. Our comprehensive analysis validated the general approach of turbocharging genebanks with genomics and genomic prediction. In addition, breeders and researchers can consider both GEBVs and U values to balance the needs of improving specific traits and broadening genetic diversity when selecting accessions from genebanks.
Affiliation(s)
- Tingting Guo
- Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
- Xianran Li
- Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
- Jianming Yu
- Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
|
27
|
Larkin DL, Mason RE, Moon DE, Holder AL, Ward BP, Brown-Guedira G. Predicting Fusarium Head Blight Resistance for Advanced Trials in a Soft Red Winter Wheat Breeding Program With Genomic Selection. FRONTIERS IN PLANT SCIENCE 2021; 12:715314. [PMID: 34745156 PMCID: PMC8569947 DOI: 10.3389/fpls.2021.715314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Accepted: 09/27/2021] [Indexed: 06/13/2023]
Abstract
Many studies have evaluated the effectiveness of genomic selection (GS) using cross-validation within training populations; however, few have looked at its performance for forward prediction within a breeding program. The objectives for this study were to compare the performance of naïve GS (NGS) models without covariates and multi-trait GS (MTGS) models by predicting two years of F4:7 advanced breeding lines for three Fusarium head blight (FHB) resistance traits, deoxynivalenol (DON) accumulation, Fusarium damaged kernels (FDK), and severity (SEV), in soft red winter wheat, and comparing predictions with phenotypic performance over two years of selection based on selection accuracy and response to selection. On average, for DON, the NGS model correctly selected 69.2% of elite genotypes, while the MTGS model correctly selected 70.1% of elite genotypes, compared with 33.0% based on phenotypic selection from the advanced generation. During the 2018 breeding cycle, GS models had the greatest response to selection for DON, FDK, and SEV compared with phenotypic selection. The MTGS model performed better than NGS during the 2019 breeding cycle for all three traits, whereas NGS outperformed MTGS during the 2018 breeding cycle for all traits except for SEV. Overall, GS models were comparable to, if not better than, phenotypic selection for FHB resistance traits. This is particularly helpful when adverse environmental conditions prohibit accurate phenotyping. This study also shows that MTGS models can be effective for forward prediction when there are strong correlations between traits of interest and covariates in both training and validation populations.
Affiliation(s)
- Dylan L. Larkin
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Richard Esten Mason
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- David E. Moon
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Amanda L. Holder
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Brian P. Ward
- USDA-ARS SEA, Plant Science Research, Raleigh, NC, United States
- Gina Brown-Guedira
- USDA-ARS SEA, Plant Science Research, Raleigh, NC, United States
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, United States
|
28
|
Tanaka R, Lui-King J, Mandaharisoa ST, Rakotondramanana M, Ranaivo HN, Pariasca-Tanaka J, Kanegae HK, Iwata H, Wissuwa M. From gene banks to farmer's fields: using genomic selection to identify donors for a breeding program in rice to close the yield gap on smallholder farms. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:3397-3410. [PMID: 34264372 PMCID: PMC8440315 DOI: 10.1007/s00122-021-03909-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/20/2021] [Accepted: 07/06/2021] [Indexed: 06/13/2023]
Abstract
KEY MESSAGE: Despite phenotyping the training set under unfavorable conditions on smallholder farms in Madagascar, we were able to successfully apply genomic prediction to select donors among gene bank accessions. Poor soil fertility and low fertilizer application rates are the main reasons for the large yield gap observed for rice produced in sub-Saharan Africa. Traditional varieties that are preserved in gene banks were shown to possess traits and alleles that would improve the performance of modern varieties under such low-input conditions. How to accelerate the utilization of gene bank resources in crop improvement is an unresolved question, and here our objective was to test whether genomic prediction could aid in the selection of promising donors. A subset of the 3,024 sequenced accessions from the IRRI rice gene bank was phenotyped for yield and agronomic traits for two years in unfertilized farmers' fields in Madagascar, and based on these data, a genomic prediction model was developed. This model was applied to predict the performance of the entire set of 3,024 accessions, and the top predicted performers were sent to Madagascar for confirmatory trials. The prediction accuracies ranged from 0.10 to 0.30 for grain yield and from 0.25 to 0.63 for straw biomass, and reached 0.71 for heading date. Two accessions have subsequently been utilized as donors in rice breeding programs in Madagascar. Despite having conducted phenotypic evaluations under challenging conditions on smallholder farms, our results are encouraging, as the prediction accuracy realized in on-farm experiments was in the range of accuracies achieved in on-station studies. Thus, we could provide clear empirical evidence on the value of genomic selection in identifying suitable genetic resources for crop improvement, if genotypic data are available.
Affiliation(s)
- Ryokei Tanaka
- Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo, 113-8657, Japan
- James Lui-King
- International Program in Agricultural Development Studies, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo, 113-8657, Japan
- Crop, Livestock and Environment Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki, 305-8686, Japan
- Sarah Tojo Mandaharisoa
- Rice Research Department, The National Center for Applied Research On Rural Development (FOFIFA), Antananarivo, 101, Madagascar
- Mbolatantely Rakotondramanana
- Rice Research Department, The National Center for Applied Research On Rural Development (FOFIFA), Antananarivo, 101, Madagascar
- Harisoa Nicole Ranaivo
- Rice Research Department, The National Center for Applied Research On Rural Development (FOFIFA), Antananarivo, 101, Madagascar
- Juan Pariasca-Tanaka
- Crop, Livestock and Environment Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki, 305-8686, Japan
- Hiromi Kajiya Kanegae
- Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo, 113-8657, Japan
- Hiroyoshi Iwata
- Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo, 113-8657, Japan
- Matthias Wissuwa
- Crop, Livestock and Environment Division, Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki, 305-8686, Japan
|
29
|
Isidro y Sánchez J, Akdemir D. Training Set Optimization for Sparse Phenotyping in Genomic Selection: A Conceptual Overview. FRONTIERS IN PLANT SCIENCE 2021; 12:715910. [PMID: 34589099 PMCID: PMC8475495 DOI: 10.3389/fpls.2021.715910] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/27/2021] [Accepted: 08/10/2021] [Indexed: 06/13/2023]
Abstract
Genomic selection (GS) is becoming an essential tool in breeding programs due to its role in increasing genetic gain per unit time. The design of the training set (TRS) in GS is one of the key steps in the implementation of GS in plant and animal breeding programs mainly because (i) TRS optimization is critical for the efficiency and effectiveness of GS, (ii) breeders test genotypes in multi-year and multi-location trials to select the best-performing ones. In this framework, TRS optimization can help to decrease the number of genotypes to be tested and, therefore, reduce phenotyping cost and time, and (iii) we can obtain better prediction accuracies from optimally selected TRS than an arbitrary TRS. Here, we concentrate the efforts on reviewing the lessons learned from TRS optimization studies and their impact on crop breeding and discuss important features for the success of TRS optimization under different scenarios. In this article, we review the lessons learned from training population optimization in plants and the major challenges associated with the optimization of GS including population size, the relationship between training and test set (TS), update of TRS, and the use of different packages and algorithms for TRS implementation in GS. Finally, we describe general guidelines to improving the rate of genetic improvement by maximizing the use of the TRS optimization in the GS framework.
Affiliation(s)
- Julio Isidro y Sánchez
- Centro de Biotecnologia y Genómica de Plantas, Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria, Universidad Politécnica de Madrid, Campus de Montegancedo, Madrid, Spain
- Deniz Akdemir
- Animal and Crop Science Division, Agriculture and Food Science Centre, University College Dublin, Dublin, Ireland
|
30
|
Zhang W, Boyle K, Brule-Babel A, Fedak G, Gao P, Djama ZR, Polley B, Cuthbert R, Randhawa H, Graf R, Jiang F, Eudes F, Fobert PR. Evaluation of Genomic Prediction for Fusarium Head Blight Resistance with a Multi-Parental Population. BIOLOGY 2021; 10:biology10080756. [PMID: 34439988 PMCID: PMC8389552 DOI: 10.3390/biology10080756] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/30/2021] [Revised: 08/01/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022]
Abstract
Simple Summary: Genomic selection is a promising approach to select superior wheat lines with better resistance to Fusarium head blight. The accuracy of genomic selection is determined by many factors. In this study, we found that a large training population, genomic selection models incorporating biological information, and multi-environment modelling led to considerably better predictabilities. A training population designed by the coefficient of determination (CDmean) could increase the accuracy of prediction. Relatedness between the training population (TP) and the testing population is key to the accuracy of genomic selection across populations.
Abstract: Fusarium head blight (FHB) resistance is quantitatively inherited, controlled by multiple minor-effect genes, and highly affected by the interaction of genotype and environment. This makes genomic selection (GS), which uses genome-wide molecular marker data to predict genetic breeding values, a promising approach to select superior lines with better resistance. However, various factors can affect the accuracy of GS, and a better understanding of how these factors act could help ensure the success of applying GS to improve FHB resistance in wheat. In this study, we performed a comprehensive evaluation of factors that affect GS accuracies with a multi-parental population designed for FHB resistance. We found that larger sample sizes gave better accuracies. Training populations designed by CDmean-based optimization algorithms yielded significantly higher accuracies than random sampling, while the mean of the prediction error variance (PEVmean) had the poorest performance. Different genomic selection models performed similarly in terms of accuracy. Including previously known large-effect quantitative trait loci (QTL) as fixed effects in the GS model considerably improved the predictability. Multi-trait models had almost no effect, while the multi-environment model outperformed the single-environment model for prediction across different environments. Comparing within- and across-family prediction, better accuracies were obtained when the training population was more closely related to the testing population. However, achieving good accuracies for GS prediction across populations remains a challenge for GS application.
Affiliation(s)
- Wentao Zhang
- Aquatic and Crop Resources Development, National Research Council of Canada, Saskatoon, SK S7N 0W9, Canada; (K.B.); (P.G.); (B.P.)
- Correspondence: (W.Z.); (P.R.F.)
- Kerry Boyle
- Aquatic and Crop Resources Development, National Research Council of Canada, Saskatoon, SK S7N 0W9, Canada; (K.B.); (P.G.); (B.P.)
- Anita Brule-Babel
- Department of Plant Science, Agriculture Building, University of Manitoba, Winnipeg, MB R3T 2N2, Canada;
- George Fedak
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada; (G.F.); (Z.R.D.)
- Peng Gao
- Aquatic and Crop Resources Development, National Research Council of Canada, Saskatoon, SK S7N 0W9, Canada; (K.B.); (P.G.); (B.P.)
- Zeinab Robleh Djama
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada; (G.F.); (Z.R.D.)
- Brittany Polley
- Aquatic and Crop Resources Development, National Research Council of Canada, Saskatoon, SK S7N 0W9, Canada; (K.B.); (P.G.); (B.P.)
- Richard Cuthbert
- Swift Current Research and Development Centre, Agriculture and Agri-Food Canada, Swift Current, SK S9H 3X2, Canada;
- Harpinder Randhawa
- Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, AB T1J 4B1, Canada; (H.R.); (R.G.); (F.J.); (F.E.)
- Robert Graf
- Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, AB T1J 4B1, Canada; (H.R.); (R.G.); (F.J.); (F.E.)
- Fengying Jiang
- Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, AB T1J 4B1, Canada; (H.R.); (R.G.); (F.J.); (F.E.)
- Francois Eudes
- Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, AB T1J 4B1, Canada; (H.R.); (R.G.); (F.J.); (F.E.)
- Pierre R. Fobert
- Aquatic and Crop Resources Development, National Research Council of Canada, Ottawa, ON K1A 0R6, Canada
- Correspondence: (W.Z.); (P.R.F.)

31
Griot R, Allal F, Phocas F, Brard-Fudulea S, Morvezen R, Haffray P, François Y, Morin T, Bestin A, Bruant JS, Cariou S, Peyrou B, Brunier J, Vandeputte M. Optimization of Genomic Selection to Improve Disease Resistance in Two Marine Fishes, the European Sea Bass (Dicentrarchus labrax) and the Gilthead Sea Bream (Sparus aurata). Front Genet 2021; 12:665920. [PMID: 34335683 PMCID: PMC8317601 DOI: 10.3389/fgene.2021.665920] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/09/2021] [Accepted: 06/25/2021] [Indexed: 11/13/2022]
Abstract
Disease outbreaks are a major threat to the aquaculture industry, and can be controlled by selective breeding. With the development of high-throughput genotyping technologies, genomic selection may become accessible even in minor species. Training population size and marker density are among the main drivers of the prediction accuracy, which both have a high impact on the cost of genomic selection. In this study, we assessed the impact of training population size as well as marker density on the prediction accuracy of disease resistance traits in European sea bass (Dicentrarchus labrax) and gilthead sea bream (Sparus aurata). We performed a challenge to nervous necrosis virus (NNV) in two sea bass cohorts, a challenge to Vibrio harveyi in one sea bass cohort and a challenge to Photobacterium damselae subsp. piscicida in one sea bream cohort. Challenged individuals were genotyped on 57K-60K SNP chips. Markers were sampled to design virtual SNP chips of 1K, 3K, 6K, and 10K markers. Similarly, challenged individuals were randomly sampled to vary training population size from 50 to 800 individuals. The accuracy of genomic-based (GBLUP model) and pedigree-based estimated breeding values (EBV) (PBLUP model) was computed for each training population size using Monte-Carlo cross-validation. Genomic-based breeding values were also computed using the virtual chips to study the effect of marker density. For resistance to Viral Nervous Necrosis (VNN), as one major QTL was detected, the opportunity of marker-assisted selection was investigated by adding a QTL effect in both genomic and pedigree prediction models. As training population size increased, accuracy increased to reach values in range of 0.51-0.65 for full density chips. The accuracy could still increase with more individuals in the training population as the accuracy plateau was not reached. When using only the 6K density chip, accuracy reached at least 90% of that obtained with the full density chip. 
Adding the QTL effect increased the accuracy of the PBLUP model to values higher than the GBLUP model without the QTL effect. This work sets a framework for the practical implementation of genomic selection to improve the resistance to major diseases in European sea bass and gilthead sea bream.
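The Monte-Carlo cross-validation described in this abstract can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: the marker matrix, the heritability of 0.5, the variance ratio `lam`, and all function names are assumptions made for the example, and the fixed effect is handled by simply subtracting the training mean. The reported value is the correlation between predictions and held-out phenotypes (predictive ability).

```python
import numpy as np

def vanraden_g(markers):
    """Genomic relationship matrix from a 0/1/2 marker matrix (VanRaden-style scaling)."""
    p = markers.mean(axis=0) / 2.0                  # allele frequencies
    w = markers - 2.0 * p                           # centered genotypes
    return w @ w.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(g, y, train, test, lam):
    """GBLUP prediction of test individuals; lam = sigma_e^2 / sigma_g^2 (assumed known)."""
    mu = y[train].mean()                            # simple intercept estimate
    k = g[np.ix_(train, train)] + lam * np.eye(len(train))
    return g[np.ix_(test, train)] @ np.linalg.solve(k, y[train] - mu)

def monte_carlo_cv(g, y, n_train, n_reps, lam, rng):
    """Average predictive ability over repeated random train/test splits."""
    abilities = []
    for _ in range(n_reps):
        perm = rng.permutation(len(y))
        train, test = perm[:n_train], perm[n_train:]
        pred = gblup_predict(g, y, train, test, lam)
        abilities.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(abilities))

# Simulated example data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
markers = rng.binomial(2, 0.3, size=(300, 500)).astype(float)
true_bv = markers @ rng.normal(size=500)
y = true_bv + rng.normal(scale=np.sqrt(true_bv.var()), size=300)   # h2 about 0.5
g = vanraden_g(markers)

# Vary the training population size, as in the study design.
results = {n: monte_carlo_cv(g, y, n, n_reps=20, lam=1.0, rng=rng) for n in (50, 100, 200)}
print(results)
```

Repeating the loop over subsampled marker columns would mimic the virtual low-density chips in the same way.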
Affiliation(s)
- Ronan Griot
- SYSAAF, Station LPGP/INRAE, Campus de Beaulieu, Rennes, France; Université Paris-Saclay, INRAE, AgroParisTech, GABI, Jouy-en-Josas, France; MARBEC, Univ. Montpellier, Ifremer, CNRS, IRD, Palavas-les-Flots, France
- François Allal
- MARBEC, Univ. Montpellier, Ifremer, CNRS, IRD, Palavas-les-Flots, France
- Florence Phocas
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, Jouy-en-Josas, France
- Romain Morvezen
- SYSAAF, Station LPGP/INRAE, Campus de Beaulieu, Rennes, France
- Thierry Morin
- ANSES, Ploufragan-Plouzané-Niort Laboratory, Viral Fish Diseases Unit, National Reference Laboratory for Regulated Fish Diseases, Technopôle Brest-Iroise, Plouzané, France
- Bruno Peyrou
- Ecloserie Marine de Gravelines-Ichtus, Gravelines, France
- Joseph Brunier
- Ecloserie Marine de Gravelines-Ichtus, Gravelines, France
- Marc Vandeputte
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, Jouy-en-Josas, France; MARBEC, Univ. Montpellier, Ifremer, CNRS, IRD, Palavas-les-Flots, France

32
Cersonsky RK, Helfrecht BA, Engel EA, Kliavinek S, Ceriotti M. Improving sample and feature selection with principal covariates regression. MACHINE LEARNING-SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/abfe7c] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/12/2022]

33
Fritsche-Neto R, Galli G, Borges KLR, Costa-Neto G, Alves FC, Sabadin F, Lyra DH, Morais PPP, Braatz de Andrade LR, Granato I, Crossa J. Optimizing Genomic-Enabled Prediction in Small-Scale Maize Hybrid Breeding Programs: A Roadmap Review. FRONTIERS IN PLANT SCIENCE 2021; 12:658267. [PMID: 34276721 PMCID: PMC8281958 DOI: 10.3389/fpls.2021.658267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 05/10/2021] [Indexed: 06/13/2023]
Abstract
The usefulness of genomic prediction (GP) for many animal and plant breeding programs has been highlighted for many studies in the last 20 years. In maize breeding programs, mostly dedicated to delivering more highly adapted and productive hybrids, this approach has been proved successful for both large- and small-scale breeding programs worldwide. Here, we present some of the strategies developed to improve the accuracy of GP in tropical maize, focusing on its use under low budget and small-scale conditions achieved for most of the hybrid breeding programs in developing countries. We highlight the most important outcomes obtained by the University of São Paulo (USP, Brazil) and how they can improve the accuracy of prediction in tropical maize hybrids. Our roadmap starts with the efforts for germplasm characterization, moving on to the practices for mating design, and the selection of the genotypes that are used to compose the training population in field phenotyping trials. Factors including population structure and the importance of non-additive effects (dominance and epistasis) controlling the desired trait are also outlined. Finally, we explain how the source of the molecular markers, environmental, and the modeling of genotype-environment interaction can affect the accuracy of GP. Results of 7 years of research in a public maize hybrid breeding program under tropical conditions are discussed, and with the great advances that have been made, we find that what is yet to come is exciting. The use of open-source software for the quality control of molecular markers, implementing GP, and envirotyping pipelines may reduce costs in an efficient computational manner. We conclude that exploring new models/tools using high-throughput phenotyping data along with large-scale envirotyping may bring more resolution and realism when predicting genotype performances. 
Despite the initial costs, mostly for genotyping, the GP platforms in combination with these other data sources can be a cost-effective approach for predicting the performance of maize hybrids for a large set of growing conditions.
Affiliation(s)
- Roberto Fritsche-Neto
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Giovanni Galli
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Karina Lima Reis Borges
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Germano Costa-Neto
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Filipe Couto Alves
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, United States
- Felipe Sabadin
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Danilo Hottis Lyra
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, United Kingdom
- Italo Granato
- Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux (LEPSE), Institut National de la Recherche Agronomique (INRA), Univ. Montpellier, SupAgro, Montpellier, France
- Jose Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz, Texcoco, Mexico
- Colegio de Posgraduado, Montecillo, Mexico

34
Atanda SA, Olsen M, Crossa J, Burgueño J, Rincent R, Dzidzienyo D, Beyene Y, Gowda M, Dreher K, Boddupalli PM, Tongoona P, Danquah EY, Olaoye G, Robbins KR. Scalable Sparse Testing Genomic Selection Strategy for Early Yield Testing Stage. FRONTIERS IN PLANT SCIENCE 2021; 12:658978. [PMID: 34239521 PMCID: PMC8259603 DOI: 10.3389/fpls.2021.658978] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/26/2021] [Accepted: 05/25/2021] [Indexed: 06/08/2023]
Abstract
To enable a scalable sparse testing genomic selection (GS) strategy at preliminary yield trials in the CIMMYT maize breeding program, optimal approaches to incorporate genotype by environment interaction (GEI) in genomic prediction models are explored. Two cross-validation schemes were evaluated: CV1, predicting the genetic merit of new bi-parental populations that have been evaluated in some environments and not others, and CV2, predicting the genetic merit of half of a bi-parental population that has been phenotyped in some environments and not others using the coefficient of determination (CDmean) to determine optimized subsets of a full-sib family to be evaluated in each environment. We report similar prediction accuracies in CV1 and CV2, however, CV2 has an intuitive appeal in that all bi-parental populations have representation across environments, allowing efficient use of information across environments. It is also ideal for building robust historical data because all individuals of a full-sib family have phenotypic data, albeit in different environments. Results show that grouping of environments according to similar growing/management conditions improved prediction accuracy and reduced computational requirements, providing a scalable, parsimonious approach to multi-environmental trials and GS in early testing stages. We further demonstrate that complementing the full-sib calibration set with optimized historical data results in improved prediction accuracy for the cross-validation schemes.
Affiliation(s)
- Sikiru Adeniyi Atanda
- West Africa Center for Crop Improvement (WACCI), University of Ghana, Accra, Ghana
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States
- Michael Olsen
- International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Juan Burgueño
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Renaud Rincent
- French National Institute for Agriculture, Food, and Environment (INRAE), Paris, France
- Daniel Dzidzienyo
- West Africa Center for Crop Improvement (WACCI), University of Ghana, Accra, Ghana
- Yoseph Beyene
- International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Manje Gowda
- International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Kate Dreher
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Pangirayi Tongoona
- West Africa Center for Crop Improvement (WACCI), University of Ghana, Accra, Ghana
- Gbadebo Olaoye
- Agronomy Department, University of Ilorin, Ilorin, Nigeria
- Kelly R. Robbins
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States

35
Puglisi D, Delbono S, Visioni A, Ozkan H, Kara İ, Casas AM, Igartua E, Valè G, Piero ARL, Cattivelli L, Tondelli A, Fricano A. Genomic Prediction of Grain Yield in a Barley MAGIC Population Modeling Genotype per Environment Interaction. FRONTIERS IN PLANT SCIENCE 2021; 12:664148. [PMID: 34108982 PMCID: PMC8183822 DOI: 10.3389/fpls.2021.664148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/04/2021] [Accepted: 04/26/2021] [Indexed: 06/12/2023]
Abstract
Multi-parent Advanced Generation Inter-crosses (MAGIC) lines have mosaic genomes that are generated shuffling the genetic material of the founder parents following pre-defined crossing schemes. In cereal crops, these experimental populations have been extensively used to investigate the genetic bases of several traits and dissect the genetic bases of epistasis. In plants, genomic prediction models are usually fitted using either diverse panels of mostly unrelated accessions or individuals of biparental families and several empirical analyses have been conducted to evaluate the predictive ability of models fitted to these populations using different traits. In this paper, we constructed, genotyped and evaluated a barley MAGIC population of 352 individuals developed with a diverse set of eight founder parents showing contrasting phenotypes for grain yield. We combined phenotypic and genotypic information of this MAGIC population to fit several genomic prediction models which were cross-validated to conduct empirical analyses aimed at examining the predictive ability of these models varying the sizes of training populations. Moreover, several methods to optimize the composition of the training population were also applied to this MAGIC population and cross-validated to estimate the resulting predictive ability. Finally, extensive phenotypic data generated in field trials organized across an ample range of water regimes and climatic conditions in the Mediterranean were used to fit and cross-validate multi-environment genomic prediction models including G×E interaction, using both genomic best linear unbiased prediction and reproducing kernel Hilbert space along with a non-linear Gaussian Kernel. 
Overall, our empirical analyses showed that genomic prediction models trained with a limited number of MAGIC lines can be used to predict grain yield with values of predictive ability that vary from 0.25 to 0.60 and that beyond QTL mapping and analysis of epistatic effects, MAGIC population might be used to successfully fit genomic prediction models. We concluded that for grain yield, the single-environment genomic prediction models examined in this study are equivalent in terms of predictive ability while, in general, multi-environment models that explicitly split marker effects in main and environmental-specific effects outperform simpler multi-environment models.
Affiliation(s)
- Damiano Puglisi
- Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, Catania, Italy
- Stefano Delbono
- Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
- Andrea Visioni
- Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, Avenue Hafiane Cherkaoui, Rabat, Morocco
- Hakan Ozkan
- Department of Field Crops, Faculty of Agriculture, University of Cukurova, Adana, Turkey
- İbrahim Kara
- Bahri Dagdas International Agricultural Research Institute, Konya, Turkey
- Ana M. Casas
- Aula Dei Experimental Station (EEAD-CSIC), Spanish Research Council, Zaragoza, Spain
- Ernesto Igartua
- Aula Dei Experimental Station (EEAD-CSIC), Spanish Research Council, Zaragoza, Spain
- Giampiero Valè
- DiSIT, Dipartimento di Scienze e Innovazione Tecnologica, Università del Piemonte Orientale, Vercelli, Italy
- Angela Roberta Lo Piero
- Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, Catania, Italy
- Luigi Cattivelli
- Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
- Alessandro Tondelli
- Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
- Agostino Fricano
- Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy

36
Akdemir D, Rio S, Isidro y Sánchez J. TrainSel: An R Package for Selection of Training Populations. Front Genet 2021; 12:655287. [PMID: 34025720 PMCID: PMC8138169 DOI: 10.3389/fgene.2021.655287] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/18/2021] [Accepted: 03/31/2021] [Indexed: 01/01/2023]
Abstract
A major barrier to the wider use of supervised learning in emerging applications, such as genomic selection, is the lack of sufficient and representative labeled data to train prediction models. The amount and quality of labeled training data in many applications is usually limited and therefore careful selection of the training examples to be labeled can be useful for improving the accuracies in predictive learning tasks. In this paper, we present an R package, TrainSel, which provides flexible, efficient, and easy-to-use tools that can be used for the selection of training populations (STP). We illustrate its use, performance, and potentials in four different supervised learning applications within and outside of the plant breeding area.
Affiliation(s)
- Deniz Akdemir
- Agriculture & Food Science Centre, Animal and Crop Science Division, University College Dublin, Dublin, Ireland
- Simon Rio
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Julio Isidro y Sánchez
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain

37
David O, Le Rouzic A, Dillmann C. Optimization of sampling designs for pedigrees and association studies. Biometrics 2021; 78:1056-1066. [PMID: 33876835 DOI: 10.1111/biom.13476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2019] [Revised: 03/10/2021] [Accepted: 04/02/2021] [Indexed: 11/29/2022]
Abstract
In many studies, related individuals are phenotyped in order to infer how their genotype contributes to their phenotype, through the estimation of parameters such as breeding values or locus effects. When it is not possible to phenotype all the individuals, it is important to properly sample the population to improve the precision of the statistical analysis. This article studies how to optimize such sampling designs for pedigrees and association studies. Two sampling methods are developed, stratified sampling and D optimality. It is found that it is important to take account of mutation when sampling pedigrees with many generations: as the size of mutation effects increases, optimized designs sample more individuals in late generations. Optimized designs for association studies tend to improve the joint estimation of breeding values and locus effects, all the more as sample size is low and the genetic architecture of the trait is simple. When the trait is determined by few loci, they are reminiscent of classical experimental designs for regression models and tend to select homozygous individuals. When the trait is determined by many loci, locus effects may be difficult to estimate, even if an optimized design is used.
Affiliation(s)
- Olivier David
- Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
- Arnaud Le Rouzic
- Université Paris-Saclay, CNRS, IRD, Évolution, Génomes, Comportement, Écologie, 91198, Gif-sur-Yvette, France
- Christine Dillmann
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France

38
Lopez-Cruz M, de Los Campos G. Optimal breeding-value prediction using a sparse selection index. Genetics 2021; 218:6179494. [PMID: 33748861 PMCID: PMC8128408 DOI: 10.1093/genetics/iyab030] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/30/2020] [Accepted: 02/13/2021] [Indexed: 02/06/2023]
Abstract
Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a sparse selection index (SSI) that integrates selection index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-Best Linear Unbiased Predictor (G-BLUP) (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in 10 different environments) that the SSI can achieve significant (anywhere between 5 and 10%) gains in prediction accuracy relative to the G-BLUP.
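The relationship between the sparse selection index and G-BLUP stated in this abstract can be illustrated with a small sketch. It assumes the index weights for a prediction individual minimize 0.5·b'Pb − g'b + λ‖b‖₁, where P is the training genomic relationship matrix plus a variance-ratio ridge (`theta`) and g holds the relationships between the prediction individual and the training set. That objective, the coordinate-descent solver, the simulated data, and every name below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding operator used by L1-penalized coordinate descent."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def ssi_weights(p, g_i, lam, max_sweeps=5000, tol=1e-10):
    """Minimize 0.5*b'Pb - g_i'b + lam*||b||_1 by cyclic coordinate descent."""
    b = np.zeros(len(g_i))
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(len(b)):
            # Gradient target for coordinate j with its own contribution removed.
            r = g_i[j] - p[j] @ b + p[j, j] * b[j]
            new = soft_threshold(r, lam) / p[j, j]
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    return b

# Simulated relationships (hypothetical data, for illustration only).
rng = np.random.default_rng(7)
markers = rng.binomial(2, 0.4, size=(40, 200)).astype(float)
freq = markers.mean(axis=0) / 2.0
w = markers - 2.0 * freq
g = w @ w.T / (2.0 * np.sum(freq * (1.0 - freq)))

trn = np.arange(1, 40)                        # training individuals
theta = 1.0                                   # assumed sigma_e^2 / sigma_g^2
p = g[np.ix_(trn, trn)] + theta * np.eye(len(trn))
g_i = g[trn, 0]                               # individual 0 is the prediction target

b_gblup = ssi_weights(p, g_i, lam=0.0)                          # lambda = 0: G-BLUP weights
b_sparse = ssi_weights(p, g_i, lam=0.5 * np.max(np.abs(g_i)))   # only some support points kept
b_zero = ssi_weights(p, g_i, lam=np.max(np.abs(g_i)))           # penalty large enough to drop all
print(np.count_nonzero(b_gblup), np.count_nonzero(b_sparse), np.count_nonzero(b_zero))
```

The predicted value for the target would then be the weighted sum of mean-centered training phenotypes using the chosen weight vector, so each prediction individual gets its own subset of support points.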
Affiliation(s)
- Marco Lopez-Cruz
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Gustavo de Los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA; Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA

39
Beche E, Gillman JD, Song Q, Nelson R, Beissinger T, Decker J, Shannon G, Scaboo AM. Genomic prediction using training population design in interspecific soybean populations. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2021; 41:15. [PMID: 37309481 PMCID: PMC10236090 DOI: 10.1007/s11032-021-01203-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/02/2020] [Accepted: 01/11/2021] [Indexed: 06/14/2023]
Abstract
Agronomically important traits generally have complex genetic architecture, where many genes have a small and largely additive effect. Genomic prediction has been demonstrated to increase genetic gain and efficiency in plant breeding programs beyond marker-assisted selection and phenotypic selection. The objective of this study was to evaluate the impact of allelic origin, marker density, training population size, and cross-validation schemes on the accuracy of genomic prediction models in an interspecific soybean nested association mapping (NAM) panel. Three cross-validation schemes were used: (a) Within-Family (WF): training population and predictions are made exclusively within each family; (b) Across All families (AF): all the individuals from the three families were randomly assigned to either the training or validation set; (c) Leave one Family out (LFO): each family is predicted using a training set that contains the other two families. Predictive abilities increased with training population size up to 350 individuals, but no significant gains were noted beyond 250 individuals in the training population. The number of markers had a limited impact on the observed predictive ability across traits; increasing markers used in the model above 1000 revealed no significant increases in prediction accuracy. Predictive abilities for AF were not significantly different from the WF method, and predictive abilities across populations for the WF method had a range of 0.58 to 0.70 for maturity, protein, meal, and oil. Our results also showed encouraging prediction accuracies for grain yield (0.58-0.69) using the WF method. Partitioning genomic prediction between G. max and G. soja alleles revealed useful information to select material with a larger allele contribution from both parents and could accelerate allele introgression from exotic germplasm into the elite soybean gene pool. 
Supplementary Information The online version contains supplementary material available at 10.1007/s11032-021-01203-6.
Affiliation(s)
- Eduardo Beche
- Division of Plant Science, University of Missouri, Columbia, MO USA
- Qijian Song
- Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD USA
- Randall Nelson
- Department of Crop Sciences, University of Illinois, and USDA-Agricultural Research Service (retired), 1101 W. Peabody Dr., Urbana, IL 61801 USA
- Tim Beissinger
- Division of Plant Breeding Methodology, Department of Crop Sciences, Georg-August-Universität, Göttingen, Germany
- Jared Decker
- Division of Animal Science, University of Missouri, Columbia, MO USA
- Grover Shannon
- Division of Plant Science, University of Missouri, Columbia, MO USA
- Andrew M. Scaboo
- Division of Plant Science, University of Missouri, Columbia, MO USA

40
Kadam DC, Rodriguez OR, Lorenz AJ. Optimization of training sets for genomic prediction of early-stage single crosses in maize. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:687-699. [PMID: 33398385 DOI: 10.1007/s00122-020-03722-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/20/2020] [Accepted: 11/03/2020] [Indexed: 06/12/2023]
Abstract
Training population optimization algorithms are useful for efficiently training genomic prediction models for single-cross performance, especially if the population is extended beyond only realized crosses to all possible single crosses. Genomic prediction of single-cross performance could allow effective evaluation of all possible single crosses between all inbreds developed in a hybrid breeding program. The objectives of the present study were to investigate the effect of different levels of relatedness on genomic predictive ability of single crosses, evaluate the usefulness of deterministic formula to forecast prediction accuracy in advance, and determine the potential for TRS optimization based on prediction error variance (PEVmean) and coefficient of determination (CDmean) criteria. We used 481 single crosses made by crossing 89 random recombinant inbred lines (RILs) belonging to the Iowa stiff stalk synthetic group with 103 random RILs belonging to the non-stiff stalk synthetic heterotic group. As expected, predictive ability was enhanced by ensuring close relationships between TRSs and target sets, even when TRS sizes were smaller. We found that designing a TRS based on PEVmean or CDmean criteria is useful for increasing the efficiency of genomic prediction of maize single crosses. We went further and extended the sampling space from that of all observed single crosses to all possible single crosses, providing a much larger genetic space within which to design a training population. Using all possible single crosses increased the advantage of the PEVmean and CDmean methods based on expected prediction accuracy. This finding suggests that it may be worthwhile using an optimization algorithm to select a training population from all possible single crosses to maximize efficiency in training accurate models for hybrid genomic prediction.
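The PEVmean and CDmean criteria named in this abstract can be sketched for a simple mixed model with an intercept and a random genetic effect. The sketch assumes the commonly used contrast-based form, in which each target individual is contrasted with the population mean, PEV is expressed in units of the residual variance, and `lam` is an assumed residual-to-genetic variance ratio. The relationship matrix, set sizes, and function names are hypothetical and not taken from the study.

```python
import numpy as np

def cd_and_pev(a, train, targets, lam):
    """Per-target CD and PEV (in units of sigma_e^2) for a given training set."""
    n, n_trn, k = a.shape[0], len(train), len(targets)
    z = np.zeros((n_trn, n))
    z[np.arange(n_trn), train] = 1.0                       # incidence of phenotyped individuals
    x = np.ones((n_trn, 1))                                # intercept as the only fixed effect
    m = np.eye(n_trn) - x @ np.linalg.solve(x.T @ x, x.T)  # projects out the fixed effect
    c_inv = np.linalg.inv(z.T @ m @ z + lam * np.linalg.inv(a))
    t = np.full((n, k), -1.0 / n)                          # contrast: target vs population mean
    t[targets, np.arange(k)] += 1.0
    quad = lambda mat: np.einsum("ij,ij->j", t, mat @ t)   # c'Mc for every contrast column
    cd = quad(a - lam * c_inv) / quad(a)
    pev = quad(c_inv) / np.einsum("ij,ij->j", t, t)
    return cd, pev

# Simulated relationship matrix (hypothetical data, for illustration only).
rng = np.random.default_rng(3)
markers = rng.binomial(2, 0.5, size=(60, 300)).astype(float)
w = markers - markers.mean(axis=0)
a = w @ w.T / 300 + 0.05 * np.eye(60)      # small ridge keeps the matrix invertible

perm = rng.permutation(60)
targets = perm[40:]                         # individuals to be predicted
cd_small, pev_small = cd_and_pev(a, perm[:20], targets, lam=1.0)
cd_big, pev_big = cd_and_pev(a, perm[:40], targets, lam=1.0)
print("CDmean:", cd_small.mean(), cd_big.mean())
print("PEVmean:", pev_small.mean(), pev_big.mean())
```

A training set optimizer would search over candidate training sets of a fixed size, keeping the one with the highest CDmean or the lowest PEVmean. The comparison above only contrasts a smaller set with a larger set that contains it.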
Affiliation(s)
- Dnyaneshwar C Kadam
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Oscar R Rodriguez
- Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE, 68583, USA
- Aaron J Lorenz
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA.

41
Yu X, Leiboff S, Li X, Guo T, Ronning N, Zhang X, Muehlbauer GJ, Timmermans MC, Schnable PS, Scanlon MJ, Yu J. Genomic prediction of maize microphenotypes provides insights for optimizing selection and mining diversity. PLANT BIOTECHNOLOGY JOURNAL 2020; 18:2456-2465. [PMID: 32452105 PMCID: PMC7680549 DOI: 10.1111/pbi.13420] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/22/2019] [Revised: 05/05/2020] [Accepted: 05/13/2020] [Indexed: 05/25/2023]
Abstract
Effective evaluation of millions of crop genetic stocks is an essential component of exploiting genetic diversity to achieve global food security. By leveraging genomics and data analytics, genomic prediction is a promising strategy for efficiently exploring the potential of these gene banks, starting with the phenotyping of a small designed subset. Reliable genomic predictions have enhanced selection of many macroscopic phenotypes in plants and animals. However, the use of genomic prediction strategies for analysis of microscopic phenotypes is limited. Here, we exploited the power of genomic prediction for eight maize traits related to the shoot apical meristem (SAM), the microscopic stem cell niche that generates all the above-ground organs of the plant. With 435,713 genome-wide single-nucleotide polymorphisms (SNPs), we predicted SAM morphology traits for 2,687 diverse maize inbreds based on a model trained on 369 inbreds. An empirical validation experiment with 488 inbreds obtained prediction accuracies of 0.37-0.57 across the eight traits. In addition, we show that a significantly higher prediction accuracy was achieved by leveraging the U value (upper bound for reliability), which quantifies the genomic relationships of the validation set with the training set. Our findings suggest that double selection considering both prediction and reliability can be implemented in choosing selection candidates for phenotyping when exploring new diversity is desired. In this case, individuals with less extreme predicted values and moderate reliability values can be considered. Our study extends the approach of turbocharging gene banks via genomic prediction from macrophenotypes into the microphenotypic space.
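The U value mentioned above can be sketched under one common formulation, assumed here: the squared length of a candidate's marker vector after projection onto the row space of the training marker matrix, divided by the squared length of the original vector. The function name `u_value` and the toy matrices are hypothetical, marker codes are assumed to be centered already, and the paper's exact computation may differ.

```python
import numpy as np

def u_value(Z_train, z_new):
    """Upper bound for reliability of one candidate (assumed formulation).

    Z_train : (n_train, n_markers) marker matrix of the training set
    z_new   : (n_markers,) marker vector of the candidate (must be non-zero)
    """
    Z_train = np.asarray(Z_train, dtype=float)
    z = np.asarray(z_new, dtype=float)
    # Least-squares coefficients expressing z as a combination of training rows.
    coef, *_ = np.linalg.lstsq(Z_train.T, z, rcond=None)
    z_hat = Z_train.T @ coef  # projection of z onto the row space of Z_train
    return float(z_hat @ z_hat / (z @ z))

Z = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
print(u_value(Z, [1.0, 1.0, 0.0]))  # candidate fully spanned by the training set
print(u_value(Z, [1.0, 0.0, 1.0]))  # candidate only partly spanned
```

A value near 1 means the candidate's genotype is well represented by the training set; a value near 0 means the prediction is an extrapolation, which is the situation the double-selection idea in the abstract is meant to flag.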
Affiliation(s)
- Xiaoqing Yu
  - Department of Agronomy, Iowa State University, Ames, IA, USA
- Samuel Leiboff
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Xianran Li
  - Department of Agronomy, Iowa State University, Ames, IA, USA
- Tingting Guo
  - Department of Agronomy, Iowa State University, Ames, IA, USA
- Natalie Ronning
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Xiaoyu Zhang
  - Department of Plant Biology, University of Georgia, Athens, GA, USA
- Gary J. Muehlbauer
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA
- Michael J. Scanlon
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Jianming Yu
  - Department of Agronomy, Iowa State University, Ames, IA, USA
|
42
|
Pégard M, Segura V, Muñoz F, Bastien C, Jorge V, Sanchez L. Favorable Conditions for Genomic Evaluation to Outperform Classical Pedigree Evaluation Highlighted by a Proof-of-Concept Study in Poplar. FRONTIERS IN PLANT SCIENCE 2020; 11:581954. [PMID: 33193528 PMCID: PMC7655903 DOI: 10.3389/fpls.2020.581954] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/10/2020] [Accepted: 09/22/2020] [Indexed: 06/11/2023]
Abstract
Forest trees like poplar are particular in many ways compared to other domesticated species. They have long juvenile phases, ongoing crop-wild gene flow, extensive outcrossing, and slow growth. All these particularities tend to make the conduction of breeding programs and evaluation stages costly both in time and resources. Perennials like trees are therefore good candidates for the implementation of genomic selection (GS) which is a good way to accelerate the breeding process, by unchaining selection from phenotypic evaluation without affecting precision. In this study, we tried to compare GS to pedigree-based traditional evaluation, and evaluated under which conditions genomic evaluation outperforms classical pedigree evaluation. Several conditions were evaluated as the constitution of the training population by cross-validation, the implementation of multi-trait, single trait, additive and non-additive models with different estimation methods (G-BLUP or weighted G-BLUP). Finally, the impact of the marker densification was tested through four marker density sets. The population under study corresponds to a pedigree of 24 parents and 1,011 offspring, structured into 35 full-sib families. Four evaluation batches were planted in the same location and seven traits were evaluated on 1 and 2 years old trees. The quality of prediction was reported by the accuracy, the Spearman rank correlation and prediction bias and tested with a cross-validation and an independent individual test set. Our results show that genomic evaluation performance could be comparable to the already well-optimized pedigree-based evaluation under certain conditions. Genomic evaluation appeared to be advantageous when using an independent test set and a set of less precise phenotypes. Genome-based methods showed advantages over pedigree counterparts when ranking candidates at the within-family levels, for most of the families. 
Our study also showed that ranking criteria such as the Spearman rank correlation can reveal benefits of genomic selection that are hidden by biased predictions.
Affiliation(s)
- Vincent Segura
  - BioForA, INRA, ONF, Orléans, France
  - AGAP, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
|
43
|
Heslot N, Feoktistov V. Optimization of Selective Phenotyping and Population Design for Genomic Prediction. JOURNAL OF AGRICULTURAL, BIOLOGICAL AND ENVIRONMENTAL STATISTICS 2020. [DOI: 10.1007/s13253-020-00415-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 01/06/2023]
|
44
|
Roth M, Muranty H, Di Guardo M, Guerra W, Patocchi A, Costa F. Genomic prediction of fruit texture and training population optimization towards the application of genomic selection in apple. HORTICULTURE RESEARCH 2020; 7:148. [PMID: 32922820 PMCID: PMC7459338 DOI: 10.1038/s41438-020-00370-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 01/10/2020] [Revised: 07/18/2020] [Accepted: 07/24/2020] [Indexed: 05/11/2023]
Abstract
Texture is a complex trait and a major component of fruit quality in apple. While the major effect of MdPG1, a gene controlling firmness, has already been exploited in elite cultivars, the genetic basis of crispness remains poorly understood. To further improve fruit texture, harnessing loci with minor effects via genomic selection is therefore necessary. In this study, we measured acoustic and mechanical features in 537 genotypes to dissect the firmness and crispness components of fruit texture. Predictions of across-year phenotypic values for these components were calculated using a model calibrated with 8,294 SNP markers. The best prediction accuracies following cross-validations within the training set of 259 genotypes were obtained for the acoustic linear distance (0.64). Predictions for biparental families using the entire training set varied from low to high accuracy, depending on the family considered. While adding siblings or half-siblings into the training set did not clearly improve predictions, we performed an optimization of the training set size and composition for each validation set. This allowed us to increase prediction accuracies by 0.17 on average, with a maximal accuracy of 0.81 when predicting firmness in the 'Gala' × 'Pink Lady' family. Our results therefore identified key genetic parameters to consider when deploying genomic selection for texture in apple. In particular, we advise to rely on a large training population, with high phenotypic variability from which a 'tailored training population' can be extracted using a priori information on genetic relatedness, in order to predict a specific target population.
Affiliation(s)
- Morgane Roth
  - Plant Breeding Research Division, Agroscope, Wädenswil, Zurich, Switzerland
  - Present Address: GAFL, INRAE, 84140 Montfavet, France
- Hélène Muranty
  - IRHS, INRAE, Agrocampus-Ouest, Université d’Angers, SFR 4207 QuaSaV, Beaucouzé, France
- Mario Di Guardo
  - Department of Genomics and Biology of Fruit Crops, Research and Innovation Centre, Fondazione Edmund Mach (FEM), Via E. Mach 1, 38010 San Michele all’Adige, Italy
  - Department of Agriculture, Food and Environment (Di3A), University of Catania, Catania, Italy
- Walter Guerra
  - Research Centre Laimburg, Laimburg 6, 39040 Auer, Italy
- Andrea Patocchi
  - Plant Breeding Research Division, Agroscope, Wädenswil, Zurich, Switzerland
- Fabrizio Costa
  - Department of Genomics and Biology of Fruit Crops, Research and Innovation Centre, Fondazione Edmund Mach (FEM), Via E. Mach 1, 38010 San Michele all’Adige, Italy
  - Center Agriculture Food Environment, University of Trento, Via Mach 1, 38010 San Michele all’Adige, Italy
|
45
|
Preservation of Genetic Variation in a Breeding Population for Long-Term Genetic Gain. G3-GENES GENOMES GENETICS 2020; 10:2753-2762. [PMID: 32513654 PMCID: PMC7407475 DOI: 10.1534/g3.120.401354] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/17/2022]
Abstract
Genomic selection has been successfully implemented in plant and animal breeding. The transition of parental selection from phenotypic characteristics to genomic selection (GS) has reduced breeding time and cost while accelerating the rate of genetic progress. Although breeding methods have been adapted to include genomic selection, parental selection often involves truncation selection: selecting the individuals with the highest genomic estimated breeding values (GEBVs) in the hope that favorable properties will be passed to their offspring. This ensures genetic progress and delivers offspring with high genetic values. However, several favorable quantitative trait loci (QTL) alleles risk being eliminated from the breeding population during breeding. We show that this could reduce the mean genetic value that the breeding population could reach in the long term by up to 40%. In this paper, by means of a simulation study, we propose a new method for parental mating that preserves the genetic variation in the breeding population, preventing premature convergence of the genetic values to a local optimum and thus maximizing the genetic values in the long term. We not only prevent the fixation of several unfavorable QTL alleles, but also demonstrate that the genetic values can be increased by up to 15 percentage points compared with truncation selection.
|
46
|
Verges VL, Lyerly J, Dong Y, Van Sanford DA. Training Population Design With the Use of Regional Fusarium Head Blight Nurseries to Predict Independent Breeding Lines for FHB Traits. FRONTIERS IN PLANT SCIENCE 2020; 11:1083. [PMID: 32765564 PMCID: PMC7381120 DOI: 10.3389/fpls.2020.01083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2020] [Accepted: 06/30/2020] [Indexed: 06/11/2023]
Abstract
Fusarium head blight (FHB) is a devastating disease in cereals around the world. Because it is quantitatively inherited and technically difficult to reproduce, breeding to increase resistance in wheat germplasm is difficult and slow. Genomic selection (GS) is a form of marker-assisted selection (MAS) that simultaneously estimates all locus, haplotype, or marker effects across the entire genome to calculate genomic estimated breeding values (GEBVs). Since its inception, there have been many studies that demonstrate the utility of GS approaches to breeding for disease resistance in crops. In this study, the Uniform Northern (NUS) and Uniform Southern (SUS) soft red winter wheat scab nurseries (a total 452 lines) were evaluated as possible training populations (TP) to predict FHB traits in breeding lines of the UK (University of Kentucky) wheat breeding program. DON was best predicted by the SUS; Fusarium damaged kernels (FDK), FHB rating, and two indices, DSK index and DK index were best predicted by NUS. The highest prediction accuracies were obtained when the NUS and SUS were combined, reaching up to 0.5 for almost all traits except FHB rating. Highest prediction accuracies were obtained with bigger TP sizes (300-400) and there were not significant effects of TP optimization method for all traits, although at small TP size, the PEVmean algorithm worked better than other methods. To select for lines with tolerance to DON accumulation, a primary breeding target for many breeders, we compared selection based on DON BLUES with selection based on DON GEBVs, DSK GEBVs, and DK GEBVs. At selection intensities (SI) of 30-40%, DSK index showed the best performance with a 4-6% increase over direct selection for DON. 
Our results confirm the usefulness of regional nurseries as a source of lines to predict GEBVs for local breeding programs, and show that an index that includes DON, together with FDK and FHB rating, could be an excellent choice for identifying lines with low DON content and overall improved FHB resistance.
Affiliation(s)
- Virginia L. Verges
  - Department of Plant and Soil Sciences, University of Kentucky, Lexington, KY, United States
- Jeanette Lyerly
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, United States
- Yanhong Dong
  - Department of Plant Pathology, University of Minnesota, St. Paul, MN, United States
- David A. Van Sanford
  - Department of Plant and Soil Sciences, University of Kentucky, Lexington, KY, United States
|
47
|
Sallam AH, Conley E, Prakapenka D, Da Y, Anderson JA. Improving Prediction Accuracy Using Multi-allelic Haplotype Prediction and Training Population Optimization in Wheat. G3 (BETHESDA, MD.) 2020; 10:2265-2273. [PMID: 32371453 PMCID: PMC7341132 DOI: 10.1534/g3.120.401165] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/18/2020] [Accepted: 04/29/2020] [Indexed: 02/01/2023]
Abstract
The use of haplotypes may improve the accuracy of genomic prediction over single SNPs because haplotypes can better capture linkage disequilibrium and genomic similarity in different lines and may capture local high-order allelic interactions. Additionally, prediction accuracy could be improved by portraying population structure in the calibration set. A set of 383 advanced lines and cultivars that represent the diversity of the University of Minnesota wheat breeding program was phenotyped for yield, test weight, and protein content and genotyped using the Illumina 90K SNP Assay. Population structure was confirmed using single SNPs. Haplotype blocks of 5, 10, 15, and 20 adjacent markers were constructed for all chromosomes. A multi-allelic haplotype prediction algorithm was implemented and compared with single SNPs using both k-fold cross validation and stratified sampling optimization. After population structure was confirmed, stratified sampling improved the predictive ability compared with k-fold cross validation for yield and protein content, but reduced the predictive ability for test weight. In all cases, haplotype predictions outperformed single SNPs. Haplotypes of 15 adjacent markers showed the best improvement in accuracy for all traits; however, this was more pronounced for yield and protein content. The combined use of haplotypes of 15 adjacent markers and training population optimization significantly improved the predictive ability for yield and protein content by 14.3% (four percentage points) and 16.8% (seven percentage points), respectively, compared with using single SNPs and k-fold cross validation. These results emphasize the effectiveness of using haplotypes in genomic selection to increase genetic gain in self-fertilized crops.
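The construction of haplotype blocks from a fixed number of adjacent markers can be sketched as follows. This is an illustration only, not the multi-allelic prediction algorithm used in the study: it assumes fully inbred lines coded 0/1 at each marker (so each line carries one haplotype per block), and the function name `haplotype_alleles` is hypothetical.

```python
import numpy as np

def haplotype_alleles(genotypes, block_size):
    """Recode a 0/1 marker matrix into multi-allelic haplotype blocks.

    genotypes  : (n_lines, n_markers) matrix for inbred lines (assumed coding)
    block_size : number of adjacent markers per block (e.g. 5, 10, 15 or 20)
    Returns an (n_lines, n_blocks) integer matrix; within a block, lines that
    share the same marker pattern receive the same allele code.
    """
    G = np.asarray(genotypes)
    n_markers = G.shape[1]
    codes = []
    for start in range(0, n_markers, block_size):
        block = G[:, start:start + block_size]  # the last block may be shorter
        # Each distinct row pattern becomes one haplotype allele of the block.
        _, inverse = np.unique(block, axis=0, return_inverse=True)
        codes.append(inverse.reshape(-1))
    return np.column_stack(codes)

G = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 0, 1, 1]])
print(haplotype_alleles(G, block_size=2))
```

Each block column could then be expanded into indicator variables, one per allele, before fitting a prediction model; the number of alleles per block grows with block size, which is one reason an intermediate block length may work best.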
Affiliation(s)
- Emily Conley
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Yang Da
  - Department of Animal Science, and
- James A Anderson
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108
|
48
|
Ben-Sadoun S, Rincent R, Auzanneau J, Oury FX, Rolland B, Heumez E, Ravel C, Charmet G, Bouchet S. Economical optimization of a breeding scheme by selective phenotyping of the calibration set in a multi-trait context: application to bread making quality. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2020; 133:2197-2212. [PMID: 32303775 DOI: 10.1007/s00122-020-03590-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/19/2019] [Accepted: 03/31/2020] [Indexed: 05/27/2023]
Abstract
The trait-assisted genomic prediction approach is a way to improve genetic gain per unit cost, either by reducing the budget allocated to phenotyping or by increasing the program's size for the same budget. This study compares different strategies of genomic prediction to optimize resource allocation in breeding schemes by using information from cheaper correlated traits to predict a more expensive trait of interest. We used the bread wheat baking score (BMS) calculated for French registration as a case study. To conduct this project, 398 lines from a public breeding program were genotyped and phenotyped for BMS and correlated traits in 11 locations in France between 2000 and 2016. Single-trait (ST), multi-trait (MT) and trait-assisted (TA) strategies were compared in terms of predictive ability and cost. In the MT and TA strategies, information from dough strength (W), a cheaper trait correlated with BMS (r = 0.45), was evaluated in the training population or in both the training and the validation sets, respectively. TA models reduced the budget allocated to phenotyping by up to 65% while maintaining the predictive ability for BMS. TA models also improved the predictive ability for BMS compared to ST models for a fixed budget (maximum gain: +0.14 in cross-validation and +0.21 in forward prediction). We also demonstrated that the budget can be further reduced by approximately one fourth, while maintaining the same predictive ability, by reducing the number of phenotypic records used to estimate BMS adjusted means. In addition, we showed that the choice of the lines to be phenotyped can be optimized to minimize cost or maximize predictive ability. To do so, we extended the mean of the generalized coefficient of determination (CDmean) criterion to the multi-trait context (CDmulti).
Affiliation(s)
- S Ben-Sadoun
  - INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- R Rincent
  - INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- J Auzanneau
  - Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
- F X Oury
  - INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- B Rolland
  - INRAE-Agrocampus Ouest-Université Rennes 1, UMR 1349, IGEPP, BP 35327, 35653, Le Rheu Cedex, France
- E Heumez
  - INRAE-UE Lille, 2 chaussée Brunehaut, Estrées-Mons, BP 50136, 80203, Peronne Cedex, France
- C Ravel
  - INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- G Charmet
  - INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- S Bouchet
  - INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
|
49
|
Seye AI, Bauland C, Charcosset A, Moreau L. Revisiting hybrid breeding designs using genomic predictions: simulations highlight the superiority of incomplete factorials between segregating families over topcross designs. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2020; 133:1995-2010. [PMID: 32185420 DOI: 10.1007/s00122-020-03573-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/23/2019] [Accepted: 02/28/2020] [Indexed: 06/10/2023]
Abstract
Simulations showed that hybrid performances from an incomplete factorial between segregating families of two heterotic groups calibrate genomic predictions of hybrid value more efficiently than tester-based designs. Genomic selection offers new opportunities to revisit hybrid breeding by replacing extensive phenotyping of hybrid combinations with genomic predictions. A key question remains: identifying the best design to calibrate genomic prediction models. We proposed using single-cross hybrids from an incomplete factorial design between segregating populations and compared this strategy with a conventional approach based on topcross evaluation. Two multiparental segregating populations of lines, each specific to one heterotic group, were simulated. Hybrids considered as training sets were generated either (1) using a parental line from the opposite group as tester or (2) following an incomplete factorial design. Different specific combining ability (SCA) proportions were simulated by considering different levels of group divergence and dominance effects for the simulated QTL. For the incomplete factorial design, for the same number of hybrids, we considered different numbers of parental lines and different contributions of lines (one to four) to calibration hybrids. For different training set sizes, we evaluated prediction accuracies of new hybrids and genetic gains over three generations. At a given training set size, the factorial design was as efficient (in terms of accuracy) as the tester design in additive scenarios, but significantly outperformed the tester design when SCA was present. The number of contributions of each parental line to the incomplete factorial design had a small impact on accuracies.
Our simulations confirmed experimental results and showed that calibrating models on hybrids between two multiparental populations is a cost-efficient way to perform genomic predictions in both groups, opening prospects for revisiting reciprocal recurrent selection schemes.
Affiliation(s)
- A I Seye
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, 91190, Gif-sur-Yvette, France
- C Bauland
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, 91190, Gif-sur-Yvette, France
- A Charcosset
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, 91190, Gif-sur-Yvette, France
- L Moreau
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, 91190, Gif-sur-Yvette, France
|
50
|
Abstract
We developed an integrated R library called BWGS to enable easy computation of genomic estimated breeding values (GEBV) for genomic selection. BWGS, for BreedWheat Genomic Selection, was developed in the framework of a cooperative private-public partnership project called Breedwheat (https://breedwheat.fr) and relies on existing R libraries, all freely available from CRAN servers. The two main functions run 1) replicated random cross-validations within a training set of genotyped and phenotyped lines and 2) GEBV prediction for a set of genotyped-only lines. Options are available for 1) missing data imputation, 2) marker and training set selection and 3) genomic prediction with 15 different methods, either parametric or semi-parametric. The usefulness and efficiency of BWGS are illustrated using a population of wheat lines from a real breeding programme. Adjusted yield data from historical trials (highly unbalanced design) were used for testing the options of BWGS. In total, 760 candidate lines with adjusted phenotypes and genotypes for 47,839 robust SNPs were used. With a simple desktop computer, we obtained results comparable with previously published results on wheat genomic selection. As predicted by theory, the factors that most influence predictive ability, for a given trait of moderate heritability, are the size of the training population and a minimum number of markers for capturing every QTL's information. Missing data of up to 40%, if randomly distributed, do not degrade predictive ability once imputed, and up to 80% randomly distributed missing data are still acceptable once imputed with the Expectation-Maximization method of the rrBLUP package. It is worth noting that selecting the markers most associated with the trait does improve predictive ability compared with the whole set of markers, but only when marker selection is made on the whole population. When marker selection is made only on the sampled training set, this advantage nearly disappears, showing that it was due to overfitting. Few differences are observed between the 15 prediction models with this dataset. Although non-parametric methods that are supposed to capture non-additive effects have slightly better predictive accuracy, differences remain small. Finally, the GEBVs from the 15 prediction models are all highly correlated with each other. These results are encouraging for an efficient use of genomic selection in applied breeding programmes, and BWGS is a simple and powerful toolbox to apply in breeding programmes or training activities.
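The replicated random cross-validation described in this abstract can be sketched in a few lines. The sketch below is written in Python rather than R and does not use or reproduce the BWGS interface; it assumes ridge regression on marker codes as a stand-in for a single prediction method, simulated data in place of real lines, and hypothetical function names (`ridge_fit_predict`, `replicated_random_cv`).

```python
import numpy as np

def ridge_fit_predict(X_tr, y_tr, X_te, lam):
    """Fit ridge regression on centered training data and predict the test set."""
    mu_x = X_tr.mean(axis=0)
    mu_y = y_tr.mean()
    Xc = X_tr - mu_x
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]),
                           Xc.T @ (y_tr - mu_y))
    return mu_y + (X_te - mu_x) @ beta

def replicated_random_cv(X, y, n_rep=20, test_frac=0.2, lam=1.0, seed=0):
    """Return one predictive ability (Pearson r) per random train/test split."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(2, int(round(test_frac * n)))
    abilities = []
    for _ in range(n_rep):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        pred = ridge_fit_predict(X[train], y[train], X[test], lam)
        abilities.append(np.corrcoef(pred, y[test])[0, 1])
    return np.array(abilities)

# Simulated example (assumption): 200 lines, 50 markers coded 0/1/2, additive effects.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = X @ rng.normal(size=50) + rng.normal(size=200)
abilities = replicated_random_cv(X, y)
print(abilities.mean(), abilities.std())
```

Any marker selection step would have to be repeated inside each split, using the training lines only; selecting markers on the whole population before splitting leaks test-set information, which matches the overfitting effect the abstract reports.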
|