1
Couto EGO, Chaves SFS, Dias KOG, Morales-Marroquín JA, Alves-Pereira A, Motoike SY, Colombo CA, Zucchi MI. Training set optimization is a feasible alternative for perennial orphan crop domestication and germplasm management: an Acrocomia aculeata example. FRONTIERS IN PLANT SCIENCE 2024; 15:1441683. [PMID: 39323537 PMCID: PMC11423296 DOI: 10.3389/fpls.2024.1441683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 08/14/2024] [Indexed: 09/27/2024]
Abstract
Orphan perennial native species are gaining importance as sustainability in agriculture becomes crucial to mitigate climate change. Nevertheless, issues related to the undomesticated status and lack of improved germplasm impede the evolution of formal agricultural initiatives. Acrocomia aculeata, a neotropical palm with potential for oil production, is an example. Breeding efforts can help the species reach its full potential and increase market competitiveness. Here, we present genomic information and training set optimization as alternatives to boost orphan perennial native species breeding using Acrocomia aculeata as an example. Furthermore, we compared three SNP calling methods and, for the first time, presented the prediction accuracies of three yield-related traits. We collected data for two years from 201 wild individuals. These trees were genotyped, and three references were used for SNP calling: the oil palm genome, de novo sequencing, and the A. aculeata transcriptome. The traits analyzed were fruit dry mass (FDM), pulp dry mass (PDM), and pulp oil content (OC). We compared the predictive ability of GBLUP and BayesB models in cross- and real validation procedures. Afterwards, we tested several optimization criteria regarding consistency and the ability to provide the optimized training set that yielded less risk in both targeted and untargeted scenarios. Using the oil palm genome as a reference together with GBLUP models gave better results for the genomic prediction of FDM, OC, and PDM (prediction accuracies of 0.46, 0.45, and 0.39, respectively). The PEV and r-score criteria and the core collection methodology provide risk-averse decisions. Training set optimization is an alternative to improve decision-making while leveraging genomic information as a cost-saving tool to accelerate plant domestication and breeding. The optimized training set can be used as a reference for the characterization of native species populations, aiding in decisions involving germplasm collection and construction of breeding populations.
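The cross-validation step mentioned in this abstract can be illustrated with a minimal Python sketch. It only shows how 201 individuals (the sample size reported above) might be partitioned into folds; the function name `k_fold_indices`, the choice of five folds, and the seed are assumptions made for illustration and are not taken from the study.

```python
import random


def k_fold_indices(n_individuals, k, seed=0):
    """Split indices 0..n-1 into k shuffled folds of near-equal size."""
    if not 2 <= k <= n_individuals:
        raise ValueError("k must be between 2 and the number of individuals")
    indices = list(range(n_individuals))
    random.Random(seed).shuffle(indices)
    # Fold f takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[f::k] for f in range(k)]


# 201 individuals as in the abstract; k=5 is an assumed, commonly used value.
folds = k_fold_indices(201, k=5)
for test_fold in folds:
    # Everything outside the held-out fold forms the training set.
    train = [i for fold in folds if fold is not test_fold for i in fold]
    # A model would be fitted on `train` and scored on `test_fold` here.
    assert len(train) + len(test_fold) == 201
```

Each individual is held out exactly once, so every phenotype contributes to both training and validation across the five rounds.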
Affiliation(s)
- Alessandro Alves-Pereira
- Genetics and Molecular Biology Department, Biology Institute, University of Campinas (UNICAMP), Campinas, Brazil
- Carlos Augusto Colombo
- Research Center of Plant Genetic Resources, Campinas Agronomic Institute, Campinas, Brazil
- Maria Imaculada Zucchi
- Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Piracicaba, Brazil
2
Adunola P, Ferrão LFV, Benevenuto J, Azevedo CF, Munoz PR. Genomic selection optimization in blueberry: Data-driven methods for marker and training population design. THE PLANT GENOME 2024:e20488. [PMID: 39087863 DOI: 10.1002/tpg2.20488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/25/2024] [Accepted: 06/04/2024] [Indexed: 08/02/2024]
Abstract
Genomic prediction is a modern approach that uses genome-wide markers to predict the genetic merit of unphenotyped individuals. With the potential to shorten breeding cycles and increase selection accuracy, this tool has been designed to rank genotypes and maximize genetic gains. Despite this importance, its practical implementation in breeding programs requires critical allocation of resources for its application in a predictive framework. In this study, we integrated genetic and data-driven methods to allocate resources for phenotyping and genotyping tailored to genomic prediction. To this end, we used a historical blueberry (Vaccinium corymbosum L.) breeding dataset containing more than 3000 individuals, genotyped using probe-based target sequencing and phenotyped for three fruit quality traits over several years. Our contribution in this study is threefold: (i) for the genotyping resource allocation, the use of genetic data-driven methods to select an optimal set of markers slightly improved prediction results for all the traits; (ii) for the long-term implication, we carried out a simulation study and showed that the data-driven method yields a slightly higher genetic gain over 30 cycles than random marker sampling; and (iii) for the phenotyping resource allocation, we compared different optimization algorithms to select the training population, showing that it can be leveraged to increase predictive performance. Altogether, we provided a data-oriented decision-making approach for breeders by demonstrating that critical breeding decisions associated with resource allocation for genomic prediction can be tackled through a combination of statistical and genetic methods.
Affiliation(s)
- Paul Adunola
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, Florida, USA
- Luis Felipe V Ferrão
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, Florida, USA
- Juliana Benevenuto
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, Florida, USA
- Camila F Azevedo
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, Florida, USA
- Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Patricio R Munoz
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, Florida, USA
3
Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-González J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. MOLECULAR PLANT 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when Pearson's correlation is used as the metric) is an active research area, and accuracy remains far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
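Since this abstract measures accuracy with Pearson's correlation, a minimal Python sketch of that metric may help. The function name `pearson_r` and the five observed and predicted values are hypothetical and serve only to illustrate the calculation; they do not come from the review.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical phenotypes and genomic predictions for five test individuals.
observed = [4.1, 5.0, 3.2, 6.3, 4.8]
predicted = [4.0, 4.6, 3.9, 5.5, 4.9]
ability = pearson_r(observed, predicted)
```

A value of one would mean the predictions rank and scale the test individuals perfectly, which is the theoretical ceiling the abstract refers to.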
Affiliation(s)
- Admas Alemu
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden.
- Johanna Åstrand
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-González
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
- International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden.
- Aakash Chawade
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
4
Fernández-González J, Haquin B, Combes E, Bernard K, Allard A, Isidro Y Sánchez J. Maximizing efficiency in sunflower breeding through historical data optimization. PLANT METHODS 2024; 20:42. [PMID: 38493115 PMCID: PMC10943787 DOI: 10.1186/s13007-024-01151-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 01/30/2024] [Indexed: 03/18/2024]
Abstract
Genomic selection (GS) has become an increasingly popular tool in plant breeding programs, propelled by declining genotyping costs, an increase in computational power, and rediscovery of the best linear unbiased prediction methodology over the past two decades. This development has led to an accumulation of extensive historical datasets with genotypic and phenotypic information, triggering the question of how to best utilize these datasets. Here, we investigate whether all available data or a subset should be used to calibrate GS models for across-year predictions in a 7-year dataset of a commercial hybrid sunflower breeding program. We employed a multi-objective optimization approach to determine the ideal years to include in the training set (TRS). Next, for a given combination of TRS years, we further optimized the TRS size and its genetic composition. We developed the Min_GRM size optimization method which consistently found the optimal TRS size, reducing dimensionality by 20% with an approximately 1% loss in predictive ability. Additionally, the Tails_GEGVs algorithm displayed potential, outperforming the use of all data by using just 60% of it for grain yield, a high-complexity, low-heritability trait. Moreover, maximizing the genetic diversity of the TRS resulted in a consistent predictive ability across the entire range of genotypic values in the test set. Interestingly, the Tails_GEGVs algorithm, due to its ability to leverage heterogeneity, enhanced predictive performance for key hybrids with extreme genotypic values. Our study provides new insights into the optimal utilization of historical data in plant breeding programs, resulting in improved GS model predictive ability.
Affiliation(s)
- Javier Fernández-González
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, Pozuelo de Alarcón, Madrid, 28223, Spain.
- Julio Isidro Y Sánchez
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Campus de Montegancedo-UPM, Pozuelo de Alarcón, Madrid, 28223, Spain.
5
de Verdal H, Baertschi C, Frouin J, Quintero C, Ospina Y, Alvarez MF, Cao TV, Bartholomé J, Grenier C. Optimization of Multi-Generation Multi-location Genomic Prediction Models for Recurrent Genomic Selection in an Upland Rice Population. RICE (NEW YORK, N.Y.) 2023; 16:43. [PMID: 37758969 PMCID: PMC10533757 DOI: 10.1186/s12284-023-00661-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 09/19/2023] [Indexed: 09/29/2023]
Abstract
Genomic selection is a valuable breeding method to improve genetic gain in recurrent selection breeding schemes. The integration of multi-generation and multi-location information could significantly improve genomic prediction models in the context of shuttle breeding. The Cirad-CIAT upland rice breeding program applies recurrent genomic selection and seeks to optimize the scheme to increase genetic gain while reducing phenotyping efforts. We used a synthetic population (PCT27) of which S0 plants were all genotyped and advanced by selfing and bulk seed harvest to the S0:2, S0:3, and S0:4 generations. The PCT27 was then divided into two sets. The S0:2 and S0:3 progenies for PCT27A and the S0:4 progenies for PCT27B were phenotyped in two locations: Santa Rosa, the target selection location, within the upland rice growing area, and Palmira, the surrogate location, far from the upland rice growing area but easier for experimentation. While the calibration used either one of the two sets phenotyped in one or two locations, the validation population was only the PCT27B phenotyped in Santa Rosa. Five scenarios of genomic prediction and 24 models were run and compared. Training the prediction model with the PCT27B phenotyped in Santa Rosa resulted in predictive abilities ranging from 0.19 for grain zinc concentration to 0.30 for grain yield. Expanding the training set with the inclusion of the PCT27A resulted in greater predictive abilities for all traits but grain yield, with increases from 5% for plant height to 61% for grain zinc concentration. Models with the PCT27B phenotyped in two locations resulted in higher prediction accuracy when the models assumed no genotype-by-environment (G × E) interaction for flowering (0.38) and grain zinc concentration (0.27). For plant height, the model assuming a single G × E variance provided higher accuracy (0.28). The gain in predictive ability for grain yield was the greatest (0.25) when environment-specific variance deviation effect for G × E was considered. While the best scenario was specific to each trait, the results indicated that the gain in predictive ability provided by the multi-location and multi-generation calibration was low. Yet, this approach could lead to increased selection intensity, acceleration of the breeding cycle, and a sizable economic advantage for the program.
Affiliation(s)
- Hugues de Verdal
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France.
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France.
- Cédric Baertschi
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Julien Frouin
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Constanza Quintero
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
- Yolima Ospina
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
- Tuong-Vi Cao
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Jérôme Bartholomé
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
- Cécile Grenier
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France.
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France.
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia.
6
Vanavermaete D, Maenhout S, Fostier J, De Baets B. Oracle selection provides insight into how far off practice is from Utopia in plant breeding. FRONTIERS IN PLANT SCIENCE 2023; 14:1218665. [PMID: 37546253 PMCID: PMC10401442 DOI: 10.3389/fpls.2023.1218665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 06/27/2023] [Indexed: 08/08/2023]
Abstract
Since the introduction of genomic selection in plant breeding, high genetic gains have been realized in different plant breeding programs. Various methods based on genomic estimated breeding values (GEBVs) for selecting parental lines that maximize the genetic gain as well as methods for improving the predictive performance of genomic selection have been proposed. Unfortunately, it remains difficult to measure to what extent these methods really maximize long-term genetic values. In this study, we propose oracle selection, a hypothetical frame of mind that uses the ground truth to optimally select parents or optimize the training population in order to maximize the genetic gain in each breeding cycle. Clearly, oracle selection cannot be applied in a true breeding program, but allows for the assessment of existing parental selection and training population update methods and the evaluation of how far these methods are from the optimal utopian solution.
Affiliation(s)
- David Vanavermaete
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Steven Maenhout
- Predictive Breeding, Department of Plants and Crops, Ghent University, Ghent, Belgium
- Jan Fostier
- IDLab, Department of Information Technology, Ghent University - imec, Ghent, Belgium
- Bernard De Baets
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
7
Zhao F, Zhang P, Wang X, Akdemir D, Garrick D, He J, Wang L. Genetic gain and inbreeding from simulation of different genomic mating schemes for pig improvement. J Anim Sci Biotechnol 2023; 14:87. [PMID: 37309010 DOI: 10.1186/s40104-023-00872-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 04/02/2023] [Indexed: 06/14/2023]
Abstract
BACKGROUND: Genomic selection involves choosing as parents those elite individuals with the highest genomic estimated breeding values (GEBV) to accelerate the speed of genetic improvement in domestic animals. However, after multi-generation selection, the rate of inbreeding and the occurrence of homozygous harmful alleles might increase, which would reduce performance and genetic diversity. To mitigate these problems, we can utilize genomic mating (GM) based upon optimal mate allocation to construct the best genotypic combinations in the next generation. In this study, we used stochastic simulation to investigate the impact of various factors on the efficiencies of GM to optimize pairing combinations after genomic selection of candidates in a pig population. These factors included: the algorithm used to derive inbreeding coefficients; the trait heritability (0.1, 0.3 or 0.5); the kind of GM scheme (focused average GEBV or inbreeding); the approach for computing the genomic relationship matrix (by SNP or runs of homozygosity (ROH)). The outcomes were compared to three traditional mating schemes (random, positive assortative or negative assortative matings). In addition, the performance of the GM approach was tested on real datasets obtained from a Large White pig breeding population.
RESULTS: Genomic mating outperforms other approaches in limiting the inbreeding accumulation for the same expected genetic gain. The use of ROH-based genealogical relatedness in GM achieved faster genetic gains than using relatedness based on individual SNPs. The GROH-based GM schemes with the maximum genetic gain resulted in 0.9%-2.6% higher rates of genetic gain ΔG, and 13%-83.3% lower ΔF than positive assortative mating regardless of heritability. The rates of inbreeding were always the fastest with positive assortative mating. Results from a purebred Large White pig population confirmed that GM with ROH-based GRM was more efficient than traditional mating schemes.
CONCLUSION: Compared with traditional mating schemes, genomic mating can not only achieve sustainable genetic progress but also effectively control the rates of inbreeding accumulation in the population. Our findings demonstrated that breeders should consider using genomic mating for genetic improvement of pigs.
Affiliation(s)
- Fuping Zhao
- Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Pengfei Zhang
- Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Xiaoqing Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Deniz Akdemir
- Center for Blood and Marrow Transplant Research, Minneapolis, MN, USA
- Dorian Garrick
- AL Rae Centre for Genetics and Breeding, Massey University, Hamilton, 3240, New Zealand
- Jun He
- College of Animal Science and Biotechnology, Hunan Agricultural University, Changsha, 410128, China
- Lixian Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
8
Fernández-González J, Akdemir D, Isidro Y Sánchez J. A comparison of methods for training population optimization in genomic selection. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:30. [PMID: 36892603 PMCID: PMC9998580 DOI: 10.1007/s00122-023-04265-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 11/21/2022] [Indexed: 06/18/2023]
Abstract
Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50-55% (targeted) or 65-85% (untargeted) of the candidate set is needed to obtain 95% of the maximum accuracy. With the advent of genomic selection (GS) as a widespread breeding tool, mechanisms to efficiently design an optimal training set for GS models became more relevant, since they allow maximizing the accuracy while minimizing the phenotyping costs. The literature describes many training set optimization methods, but a comprehensive comparison among them is lacking. This work aimed to provide an extensive benchmark of optimization methods and optimal training set size by testing a wide range of them in seven datasets covering six different species, different genetic architectures, population structures, and heritabilities, and with several GS models, to provide some guidelines about their application in breeding programs. Our results showed that targeted optimization (uses information from the test set) performed better than untargeted (does not use test set data), especially when heritability was low. The mean coefficient of determination was the best targeted method, although it was computationally intensive. Minimizing the average relationship within the training set was the best strategy for untargeted optimization. Regarding the optimal training set size, maximum accuracy was obtained when the training set was the entire candidate set. Nevertheless, 50-55% of the candidate set was enough to reach 95-100% of the maximum accuracy in the targeted scenario, while 65-85% was needed for untargeted optimization. Our results also suggested that a diverse training set makes GS robust against population structure, while including clustering information was less effective. The choice of the GS model did not have a significant influence on the prediction accuracies.
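The untargeted idea of minimizing the average relationship within the training set can be sketched in a few lines of Python. This is a simple greedy heuristic written for illustration and is not the optimization algorithm used in the paper; the function names and the 4 x 4 genomic relationship matrix (two pairs of close relatives) are hypothetical.

```python
def avg_relationship(grm, subset):
    """Mean of the off-diagonal GRM entries among the individuals in `subset`."""
    pairs = [(i, j) for k, i in enumerate(subset) for j in subset[k + 1:]]
    if not pairs:
        return 0.0
    return sum(grm[i][j] for i, j in pairs) / len(pairs)


def greedy_diverse_training_set(grm, size):
    """Greedily grow a training set that keeps average relatedness low."""
    n = len(grm)
    if not 1 <= size <= n:
        raise ValueError("size must be between 1 and the number of candidates")
    # Seed with the candidate least related, in total, to all the others.
    first = min(range(n), key=lambda i: sum(grm[i][j] for j in range(n) if j != i))
    chosen = [first]
    while len(chosen) < size:
        remaining = [i for i in range(n) if i not in chosen]
        # Add the candidate whose inclusion gives the lowest average relationship.
        best = min(remaining, key=lambda i: avg_relationship(grm, chosen + [i]))
        chosen.append(best)
    return sorted(chosen)


# Hypothetical GRM: individuals 0 and 1 are close relatives, as are 2 and 3.
grm = [
    [1.0, 0.8, 0.0, 0.2],
    [0.8, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.7],
    [0.2, 0.0, 0.7, 1.0],
]
training_set = greedy_diverse_training_set(grm, size=2)
```

With this toy matrix the heuristic picks one individual from each family rather than two close relatives. A greedy search can miss the global optimum, so it should be read as a baseline rather than a replacement for the search procedures benchmarked in the study.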
Affiliation(s)
- Javier Fernández-González
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Madrid, Spain.
- Deniz Akdemir
- CIBMTR (Center for International Blood and Marrow Transplant Research), National Marrow Donor Program/Be The Match, Minneapolis, USA
- Julio Isidro Y Sánchez
- Centro de Biotecnologia y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Madrid, Spain.
9
Shahinnia F, Mohler V, Hartl L. Genetic Basis of Resistance to Warrior (-) Yellow Rust Race at the Seedling Stage in Current Central and Northern European Winter Wheat Germplasm. PLANTS (BASEL, SWITZERLAND) 2023; 12:420. [PMID: 36771509 PMCID: PMC9920722 DOI: 10.3390/plants12030420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 01/09/2023] [Accepted: 01/13/2023] [Indexed: 06/18/2023]
Abstract
To evaluate genetic variability and seedling plant response to a dominating Warrior (-) race of yellow rust in Northern and Central European germplasm, we used a population of 229 winter wheat cultivars and breeding lines for a genome-wide association study (GWAS). A wide variation in yellow rust disease severity (based on infection types 1-9) was observed in this panel. Four breeding lines, TS049 (from Austria), TS111, TS185, and TS229 (from Germany), and one cultivar, TS158 (KWS Talent), from Germany were found to be resistant to Warrior (-) FS 53/20 and Warrior (-) G 23/19. The GWAS identified five significant SNPs associated with yellow rust on chromosomes 1B, 2A, 5B, and 7A for Warrior (-) FS 53/20, while one SNP on chromosome 5B was associated with disease for Warrior (-) G 23/19. For Warrior (-) FS 53/20, we discovered a new QTL for yellow rust resistance associated with the marker Kukri_c5357_323 on chromosome 1B. The resistant alleles G and T at the marker loci Kukri_c5357_323 on chromosome 1B and Excalibur_c17489_804 on chromosome 5B showed the largest effects (1.21 and 0.81, respectively) on the severity of Warrior (-) FS 53/20 and Warrior (-) G 23/19. Our results provide the basis for knowledge-based resistance breeding in the face of the enormous impact of the Warrior (-) race on wheat production in Europe.
10
Shahinnia F, Geyer M, Schürmann F, Rudolphi S, Holzapfel J, Kempf H, Stadlmeier M, Löschenberger F, Morales L, Buerstmayr H, Sánchez JIY, Akdemir D, Mohler V, Lillemo M, Hartl L. Genome-wide association study and genomic prediction of resistance to stripe rust in current Central and Northern European winter wheat germplasm. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:3583-3595. [PMID: 36018343 PMCID: PMC9519682 DOI: 10.1007/s00122-022-04202-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 08/17/2022] [Indexed: 05/03/2023]
Abstract
We found two loci on chromosomes 2BS and 6AL that significantly contribute to stripe rust resistance in current European winter wheat germplasm. Stripe or yellow rust, caused by the fungus Puccinia striiformis Westend f. sp. tritici, is one of the most destructive wheat diseases. Sustainable management of wheat stripe rust can be achieved through the deployment of rust resistant cultivars. To detect effective resistance loci for use in breeding programs, an association mapping panel of 230 winter wheat cultivars and breeding lines from Northern and Central Europe was employed. Genotyping with the Illumina® iSelect® 25 K Infinium® single nucleotide polymorphism (SNP) genotyping array yielded 8812 polymorphic markers. Structure analysis revealed two subpopulations with 92 Austrian breeding lines and cultivars, which were separated from the other 138 genotypes from Germany, Norway, Sweden, Denmark, Poland, and Switzerland. Genome-wide association study for adult plant stripe rust resistance identified 12 SNP markers on six wheat chromosomes which showed consistent effects over several testing environments. Among these, two marker loci on chromosomes 2BS (RAC875_c1226_652) and 6AL (Tdurum_contig29607_413) were highly predictive in three independent validation populations of 1065, 1001, and 175 breeding lines. Lines with the resistant haplotype at both loci were nearly free of stripe rust symptoms. By using mixed linear models with those markers as fixed effects, we could increase predictive ability in the three populations by 0.13-0.46 compared to a standard genomic best linear unbiased prediction approach. The obtained results facilitate an efficient selection for stripe rust resistance against the current pathogen population in the Northern and Central European winter wheat gene pool.
Affiliation(s)
- Fahimeh Shahinnia
- Bavarian State Research Center for Agriculture, Institute for Crop Science and Plant Breeding, 85354, Freising, Germany.
- Manuel Geyer
- Bavarian State Research Center for Agriculture, Institute for Crop Science and Plant Breeding, 85354, Freising, Germany
- Sabine Rudolphi
- SECOBRA Saatzucht GmbH, Lagesche Str. 250, 32657, Lemgo, Germany
- Josef Holzapfel
- SECOBRA Saatzucht GmbH, Feldkirchen 3, 85368, Moosburg, Germany
- Hubert Kempf
- SECOBRA Saatzucht GmbH, Feldkirchen 3, 85368, Moosburg, Germany
- Laura Morales
- Department of Agrobiotechnology, Institute of Biotechnology in Plant Production, University of Natural Resources and Life Sciences Vienna, Konrad-Lorenz-Straße 20, 3430, Tulln an der Donau, Austria
- Hermann Buerstmayr
- Department of Agrobiotechnology, Institute of Biotechnology in Plant Production, University of Natural Resources and Life Sciences Vienna, Konrad-Lorenz-Straße 20, 3430, Tulln an der Donau, Austria
- Julio Isidro Y Sánchez
- Centro de Biotecnologia y Genómica de Plantas, Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria, Universidad Politécnica de Madrid, Campus de Montegancedo, Madrid, Spain
- Deniz Akdemir
- Center for International Blood and Marrow Transplant Research (CIBMTR), National Marrow Donor Program/Be The Match, Minneapolis, MN, USA
- Volker Mohler
- Bavarian State Research Center for Agriculture, Institute for Crop Science and Plant Breeding, 85354, Freising, Germany
- Morten Lillemo
- Department of Plant Sciences, Norwegian University of Life Sciences, P.O. Box 5003, 1432, Ås, Norway
- Lorenz Hartl
- Bavarian State Research Center for Agriculture, Institute for Crop Science and Plant Breeding, 85354, Freising, Germany.
11
Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:77-112. [PMID: 35451773 DOI: 10.1007/978-1-0716-2205-6_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
Abstract
The efficiency of genomic selection strongly depends on the prediction accuracy of the genetic merit of candidates. Numerous papers have shown that the composition of the calibration set is a key contributor to prediction accuracy. A poorly defined calibration set can result in low accuracies, whereas an optimized one can considerably increase accuracy compared to random sampling of the same size. Alternatively, optimizing the calibration set can be a way of decreasing the costs of phenotyping by enabling similar levels of accuracy compared to random sampling but with fewer phenotypic units. We present here the different factors that have to be considered when designing a calibration set, and review the different criteria proposed in the literature. We classified these criteria into two groups: model-free criteria based on relatedness, and criteria derived from the linear mixed model. We introduce criteria targeting specific prediction objectives including the prediction of highly diverse panels, biparental families, or hybrids. We also review different ways of updating the calibration set, and different procedures for optimizing phenotyping experimental designs.
|
12
|
NMR in Metabolomics: From Conventional Statistics to Machine Learning and Neural Network Approaches. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12062824] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/17/2022]
Abstract
NMR measurements combined with chemometrics provide a great amount of information for identifying potential biomarkers associated with a specific metabolic pathway. These kinds of data are useful in different fields, ranging from food to biomedical fields, including health science. The investigation of the whole set of metabolites in a sample, representing its fingerprint in the considered condition, is known as metabolomics and may take advantage of different statistical tools. The new frontier is to adopt self-learning techniques to enhance clustering or classification actions that can improve the predictive power over large amounts of data. Although machine learning is already employed in metabolomics, deep learning and artificial neural network approaches have only recently been applied successfully. In this work, we give an overview of the statistical approaches underlying the wide range of opportunities that machine learning and neural networks offer for accurate metabolite assignment and quantification. Various current challenges are discussed, such as proper metabolomics, deep learning architectures and model accuracy.
|
13
|
Rio S, Akdemir D, Carvalho T, Sánchez JIY. Assessment of genomic prediction reliability and optimization of experimental designs in multi-environment trials. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:405-419. [PMID: 34807267 PMCID: PMC8866390 DOI: 10.1007/s00122-021-03972-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/24/2021] [Accepted: 10/08/2021] [Indexed: 06/13/2023]
Abstract
New forms of the coefficient of determination can help to forecast the accuracy of genomic prediction and optimize experimental designs in multi-environment trials with genotype-by-environment interactions. In multi-environment trials, the relative performance of genotypes may vary depending on the environmental conditions, and this phenomenon is commonly referred to as genotype-by-environment interaction (G×E). With genomic prediction, G×E can be accounted for by modeling the genetic covariance between trials, even when the overall experimental design is highly unbalanced between trials, thanks to the genomic relationship between genotypes. In this study, we propose new forms of the coefficient of determination (CD, i.e., the expected model-based square correlation between a genetic value and its corresponding prediction) that can be used to forecast the genomic prediction reliability of genotypes, both for their trial-specific performance and their mean performance. As the expected prediction reliability based on these new CD criteria is generally a good approximation of the observed reliability, we demonstrate that they can be used to optimize multi-environment trials in the presence of G×E. In addition, this reliability may be highly variable between genotypes, especially in unbalanced designs with complex pedigree relationships between genotypes. Therefore, it can be useful for breeders to assess it before selecting genotypes based on their predicted genetic values. Using a wheat population evaluated both for simulated and phenology traits, and two maize populations evaluated for grain yield, we illustrate this approach and confirm the value of our new CD criteria.
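To make the CD definition in this abstract concrete, the sketch below computes the classical single-trait, single-environment CD of each genotype as var(û_i) / var(u_i), which equals the expected squared correlation between a genetic value and its BLUP. It is not the new multi-environment forms proposed in the paper. The function name `expected_cd`, the intercept-only fixed effect, the assumed-known variance components, and the simulated relationship matrix are all assumptions made for illustration.

```python
import numpy as np

def expected_cd(G, phenotyped, sigma_g2=1.0, sigma_e2=1.0):
    """Expected CD of every genotype under y = 1*mu + Z u + e, u ~ N(0, G*sigma_g2)."""
    n = G.shape[0]
    phenotyped = np.asarray(phenotyped)
    n_obs = len(phenotyped)
    Z = np.zeros((n_obs, n))
    Z[np.arange(n_obs), phenotyped] = 1.0  # one record per phenotyped genotype
    X = np.ones((n_obs, 1))
    V = sigma_g2 * Z @ G @ Z.T + sigma_e2 * np.eye(n_obs)
    Vinv = np.linalg.inv(V)
    # P removes the estimated fixed effect (here only the mean) from V^-1.
    P = Vinv - Vinv @ X @ np.linalg.inv(X.T @ Vinv @ X) @ X.T @ Vinv
    var_u_hat = sigma_g2**2 * G @ Z.T @ P @ Z @ G  # var(u_hat) = cov(u, u_hat)
    return np.diag(var_u_hat) / (sigma_g2 * np.diag(G))


# Hypothetical example: 30 genotypes, the first 20 phenotyped.
rng = np.random.default_rng(0)
W = rng.normal(size=(30, 100))
G = W @ W.T / 100
cd = expected_cd(G, phenotyped=np.arange(20))
print("mean CD, phenotyped:  ", cd[:20].mean())
print("mean CD, unphenotyped:", cd[20:].mean())
```

Because the CD depends only on the design and the relationship matrix, not on the phenotypes, it can be evaluated before any trial is run, which is what makes it usable as a design criterion.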
Affiliation(s)
- Simon Rio
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA) Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón Madrid, Spain
- Deniz Akdemir
  - CIBMTR (Center for International Blood and Marrow Transplant Research), National Marrow Donor Program/Be The Match, Minneapolis, MN USA
- Tiago Carvalho
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA) Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón Madrid, Spain
- Julio Isidro y Sánchez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA) Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón Madrid, Spain
|
14
|
Isidro y Sánchez J, Akdemir D. Training Set Optimization for Sparse Phenotyping in Genomic Selection: A Conceptual Overview. FRONTIERS IN PLANT SCIENCE 2021; 12:715910. [PMID: 34589099 PMCID: PMC8475495 DOI: 10.3389/fpls.2021.715910] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/27/2021] [Accepted: 08/10/2021] [Indexed: 06/13/2023]
Abstract
Genomic selection (GS) is becoming an essential tool in breeding programs due to its role in increasing genetic gain per unit time. The design of the training set (TRS) is one of the key steps in the implementation of GS in plant and animal breeding programs, mainly because (i) TRS optimization is critical for the efficiency and effectiveness of GS; (ii) breeders test genotypes in multi-year and multi-location trials to select the best-performing ones, and in this framework TRS optimization can help to decrease the number of genotypes to be tested and, therefore, reduce phenotyping cost and time; and (iii) we can obtain better prediction accuracies from an optimally selected TRS than from an arbitrary TRS. In this article, we review the lessons learned from TRS optimization studies in plants and their impact on crop breeding, and discuss important features for the success of TRS optimization under different scenarios. We also review the major challenges associated with the optimization of GS, including population size, the relationship between training and test set (TS), update of the TRS, and the use of different packages and algorithms for TRS implementation in GS. Finally, we describe general guidelines for improving the rate of genetic improvement by maximizing the use of TRS optimization in the GS framework.
Affiliation(s)
- Julio Isidro y Sánchez
  - Centro de Biotecnologia y Genómica de Plantas, Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria, Universidad Politécnica de Madrid, Campus de Montegancedo, Madrid, Spain
- Deniz Akdemir
  - Animal and Crop Science Division, Agriculture and Food Science Centre, University College Dublin, Dublin, Ireland
|
15
|
Zhang W, Boyle K, Brule-Babel A, Fedak G, Gao P, Djama ZR, Polley B, Cuthbert R, Randhawa H, Graf R, Jiang F, Eudes F, Fobert PR. Evaluation of Genomic Prediction for Fusarium Head Blight Resistance with a Multi-Parental Population. BIOLOGY 2021; 10:756. [PMID: 34439988 PMCID: PMC8389552 DOI: 10.3390/biology10080756] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/30/2021] [Revised: 08/01/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022]
Abstract
Simple Summary: Genomic selection is a promising approach to select superior wheat lines with better resistance to Fusarium head blight. The accuracy of genomic selection is determined by many factors. In this study, we found that a large training population, genomic selection models incorporating biological information, and multi-environment modelling led to considerably better predictabilities. A training population designed by the coefficient of determination (CDmean) could increase accuracy of prediction. Relatedness between the training population (TP) and the testing population is key for the accuracy of genomic selection across populations.
Abstract: Fusarium head blight (FHB) resistance is quantitatively inherited, controlled by multiple minor effect genes, and highly affected by the interaction of genotype and environment. This makes genomic selection (GS), which uses genome-wide molecular marker data to predict the genetic breeding value, a promising approach to select superior lines with better resistance. However, various factors can affect the accuracy of GS, and a better understanding of how these factors affect GS accuracy could ensure the success of applying GS to improve FHB resistance in wheat. In this study, we performed a comprehensive evaluation of factors that affect GS accuracies with a multi-parental population designed for FHB resistance. We found that larger sample sizes gave better accuracies. Training populations designed by CDmean-based optimization algorithms gave significantly higher accuracies than random sampling, while the mean of prediction error variance (PEVmean) had the poorest performance. Different genomic selection models performed similarly for accuracies. Including previously known large-effect quantitative trait loci (QTL) as fixed effects in the GS model considerably improved the predictability. Multi-trait models had almost no effect, while the multi-environment model outperformed the single-environment model for prediction across different environments. By comparing within- and across-family prediction, better accuracies were obtained with the training population more closely related to the testing population. However, achieving good accuracies for GS prediction across populations is still a challenging issue for GS application.
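The CDmean and PEVmean criteria mentioned in this abstract can be illustrated with a small exchange search over candidate training populations. This is a simplified sketch and not the algorithm or software used in the study: it scores individual genotypes rather than contrasts, fits only an intercept, and treats the variance ratio `lam` as known. The names `pev_and_cd` and `exchange_search` and the simulated relationship matrix are hypothetical.

```python
import numpy as np

def pev_and_cd(G, train_idx, lam):
    """PEV (in units of sigma_e2) and CD of every genotype for a given training set."""
    n, n_t = G.shape[0], len(train_idx)
    Z = np.zeros((n_t, n))
    Z[np.arange(n_t), train_idx] = 1.0
    M = np.eye(n_t) - np.full((n_t, n_t), 1.0 / n_t)  # projects out the intercept
    C22 = np.linalg.inv(Z.T @ M @ Z + lam * np.linalg.inv(G))
    pev = np.diag(C22)
    cd = 1.0 - lam * pev / np.diag(G)
    return pev, cd

def exchange_search(G, n_train, lam=1.0, criterion="cdmean", n_iter=200, seed=0):
    """Swap one training genotype at a time, keeping swaps that improve the score."""
    rng = np.random.default_rng(seed)
    n = G.shape[0]

    def score(train):
        pev, cd = pev_and_cd(G, train, lam)
        target = np.setdiff1d(np.arange(n), train)  # genotypes to be predicted
        # Higher is better for both criteria, so PEVmean enters with a minus sign.
        return cd[target].mean() if criterion == "cdmean" else -pev[target].mean()

    train = rng.choice(n, size=n_train, replace=False)
    start = best = score(train)
    for _ in range(n_iter):
        candidate = train.copy()
        outside = np.setdiff1d(np.arange(n), train)
        candidate[rng.integers(n_train)] = rng.choice(outside)
        s = score(candidate)
        if s > best:
            train, best = candidate, s
    return np.sort(train), start, best


# Hypothetical example: 40 genotypes, choose 10 to phenotype.
rng = np.random.default_rng(2)
W = rng.normal(size=(40, 150))
G = W @ W.T / 150 + 0.05 * np.eye(40)  # ridge keeps G invertible
for crit in ("cdmean", "pevmean"):
    train, start, best = exchange_search(G, n_train=10, criterion=crit)
    print(crit, "random start:", round(start, 4), "after search:", round(best, 4))
```

The sketch only shows how the two criteria are evaluated and optimized; whether an optimized set actually predicts better than a random one has to be checked with phenotypic data, as the study did.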
Affiliation(s)
- Wentao Zhang
  - Aquatic and Crop Resources Development, National Research Council of Canada, Saskatoon, SK S7N 0W9, Canada
  - Correspondence: (W.Z.); (P.R.F.)
- Kerry Boyle
  - Aquatic and Crop Resources Development, National Research Council of Canada, Saskatoon, SK S7N 0W9, Canada
- Anita Brule-Babel
  - Department of Plant Science, Agriculture Building, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
- George Fedak
  - Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada
- Peng Gao
  - Aquatic and Crop Resources Development, National Research Council of Canada, Saskatoon, SK S7N 0W9, Canada
- Zeinab Robleh Djama
  - Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada
- Brittany Polley
  - Aquatic and Crop Resources Development, National Research Council of Canada, Saskatoon, SK S7N 0W9, Canada
- Richard Cuthbert
  - Swift Current Research and Development Centre, Agriculture and Agri-Food Canada, Swift Current, SK S9H 3X2, Canada
- Harpinder Randhawa
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, AB T1J 4B1, Canada
- Robert Graf
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, AB T1J 4B1, Canada
- Fengying Jiang
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, AB T1J 4B1, Canada
- Francois Eudes
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, AB T1J 4B1, Canada
- Pierre R. Fobert
  - Aquatic and Crop Resources Development, National Research Council of Canada, Ottawa, ON K1A 0R6, Canada
  - Correspondence: (W.Z.); (P.R.F.)
|