1
Shahi D, Todd J, Gravois K, Hale A, Blanchard B, Kimbeng C, Pontif M, Baisakh N. Exploiting historical agronomic data to develop genomic prediction strategies for early clonal selection in the Louisiana sugarcane variety development program. THE PLANT GENOME 2025; 18:e20545. [PMID: 39740237 DOI: 10.1002/tpg2.20545] [Received: 08/27/2024] [Revised: 11/15/2024] [Accepted: 11/18/2024] [Indexed: 01/02/2025]
Abstract
Genomic selection can enhance the rate of genetic gain for cane and sucrose yield in sugarcane (Saccharum L.), an important industrial crop worldwide. We assessed the predictive ability (PA) for six traits, namely theoretical recoverable sugar (TRS), number of stalks (NS), stalk weight (SW), cane yield (CY), sugar yield (SY), and fiber content (Fiber), using 20,451 single nucleotide polymorphisms (SNPs) and 22 statistical models based on the genomic estimated breeding values of 567 genotypes within and across five stages of the Louisiana sugarcane breeding program. TRS and SW, which have high heritability, showed higher PA than the other traits, while NS had the lowest PA. Machine learning (ML) methods, such as random forest and support vector machine (SVM), outperformed the other models in predicting traits with low heritability. In cross-stage predictions, ML methods predicted TRS and SY with the highest accuracy, while Bayesian models predicted NS and CY with the highest accuracy. Extended genomic best linear unbiased prediction models accounting for dominance and epistatic effects showed a slight improvement in PA for a few traits. When NS and TRS, both of which can be available as early as stage 2, were considered together in a multi-trait selection model, the PA for SY in stage 5 could increase to as much as 0.66, compared with 0.30 for a single-trait model. The marker density assessment suggested that 9091 SNPs were sufficient for optimal PA for all traits. The study demonstrated the potential of using historical data to devise genomic prediction strategies for early clonal selection in sugarcane breeding programs.
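As an illustration of how predictive ability is commonly computed, the sketch below fits a basic GBLUP model from a VanRaden genomic relationship matrix and reports PA as the Pearson correlation between predicted and observed values in a held-out set. It is a minimal sketch under stated assumptions, not the authors' pipeline: genotypes are coded as diploid 0/1/2 dosages (a simplification for polyploid sugarcane), the variance ratio `lam` is fixed rather than estimated by REML, the intercept is approximated by the training mean, the data are simulated, and the function names are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 dosages."""
    p = M.mean(axis=0) / 2.0                 # allele frequency of each SNP
    Z = M - 2.0 * p                          # center markers by expected dosage
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, lam):
    """Predict genomic values for all rows of G from training phenotypes.

    g_hat = G[:, train] @ (G[train, train] + lam * I)^-1 @ (y[train] - mu),
    where lam = sigma_e^2 / sigma_g^2 is assumed known and mu is taken as
    the training mean (a simplification of the GLS estimate).
    """
    G_tt = G[np.ix_(train, train)]
    resid = y[train] - y[train].mean()
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), resid)
    return G[:, train] @ alpha

def predictive_ability(pred, obs):
    """PA as the Pearson correlation between predictions and observations."""
    return float(np.corrcoef(pred, obs)[0, 1])

# Simulated data: 200 clones, 500 SNPs, heritability of about 0.5.
rng = np.random.default_rng(0)
n, m = 200, 500
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
g = (M - M.mean(axis=0)) @ rng.normal(0.0, 0.1, size=m)
y = g + rng.normal(0.0, g.std(), size=n)

idx = rng.permutation(n)
train, test = idx[:160], idx[160:]
G = vanraden_g(M)
pred = gblup_predict(G, y, train, lam=1.0)
pa = predictive_ability(pred[test], y[test])
print(f"predictive ability: {pa:.2f}")
```

In practice `lam` would come from REML variance components; extended GBLUP variants add further relationship matrices alongside `G`, for example an additive-by-additive epistatic matrix formed as the element-wise product `G * G`.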
Affiliation(s)
- Dipendra Shahi
- School of Plant, Environmental and Soil Sciences, Louisiana State University Agricultural Center, Baton Rouge, Louisiana, USA
- James Todd
- Sugarcane Research Unit, USDA-ARS, Houma, Louisiana, USA
- Kenneth Gravois
- Sugar Research Station, Louisiana State University Agricultural Center, St. Gabriel, Louisiana, USA
- Anna Hale
- Sugarcane Research Unit, USDA-ARS, Houma, Louisiana, USA
- Brayden Blanchard
- Sugar Research Station, Louisiana State University Agricultural Center, St. Gabriel, Louisiana, USA
- Collins Kimbeng
- Sugar Research Station, Louisiana State University Agricultural Center, St. Gabriel, Louisiana, USA
- Michael Pontif
- Sugar Research Station, Louisiana State University Agricultural Center, St. Gabriel, Louisiana, USA
- Niranjan Baisakh
- School of Plant, Environmental and Soil Sciences, Louisiana State University Agricultural Center, Baton Rouge, Louisiana, USA
2
Weber SE, Roscher-Ehrig L, Kox T, Abbadi A, Stahl A, Snowdon RJ. Genomic prediction in Brassica napus: evaluating the benefit of imputed whole-genome sequencing data. Genome 2024; 67:210-222. [PMID: 38708850 DOI: 10.1139/gen-2023-0126] [Indexed: 05/07/2024]
Abstract
Advances in sequencing technology allow whole plant genomes to be sequenced with high quality. Combining genotypic and phenotypic data in genomic prediction helps breeders to select crossing partners in partially phenotyped populations. In plant breeding programs, however, the cost of sequencing entire breeding populations still exceeds available genotyping budgets. Hence, single nucleotide polymorphism (SNP) arrays remain the main genotyping method, even though arrays cannot capture the full genome-wide and population-wide diversity. A compromise is to genotype the entire population with an SNP array and a subset of the population with whole-genome sequencing. Both datasets can then be used to impute whole-genome sequencing markers onto the entire population. Here, we evaluate whether imputation of whole-genome sequencing data enhances genomic prediction, using data from a nested association mapping population of rapeseed (Brassica napus). Using two cross-validation schemes that mimic the prediction of close and distant relatives, we show that imputed marker data do not significantly improve prediction accuracy, likely because of redundancy in relationship estimates and imputation errors. Simulation studies showed only small improvements, further corroborating these findings. We conclude that SNP arrays already capture, through relationship and linkage disequilibrium, the information that imputation adds.
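The abstract does not spell out its two cross-validation schemes. One common way to mimic prediction of close versus distant relatives in a nested association mapping (NAM) population is to contrast random k-fold assignment, in which sibs of each test line usually remain in the training set, with leave-one-family-out assignment, in which a whole family is held out. The sketch below shows both under that assumption; the function names and toy family labels are hypothetical.

```python
import numpy as np

def random_kfold(n, k, seed=0):
    """Random, balanced fold labels; sibs of a test line usually stay in training."""
    rng = np.random.default_rng(seed)
    folds = np.empty(n, dtype=int)
    folds[rng.permutation(n)] = np.arange(n) % k
    return folds

def leave_one_family_out(family):
    """One fold per NAM family, so no test line has a full sib in training."""
    _, folds = np.unique(family, return_inverse=True)
    return folds.reshape(-1)

# Toy NAM layout: three families of four lines each.
family = np.repeat(["F1", "F2", "F3"], 4)
close = random_kfold(len(family), k=3)
distant = leave_one_family_out(family)

for f in range(3):
    test = distant == f
    # Families in the test fold never appear among the training lines.
    assert not set(family[test]) & set(family[~test])
print(distant)
```

Prediction accuracy is then computed per fold in the same way for both schemes; only the fold labels differ.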
Affiliation(s)
- Sven E Weber
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
- Lennard Roscher-Ehrig
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
- Andreas Stahl
- Julius Kuehn Institute (JKI), Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Quedlinburg, Germany
- Rod J Snowdon
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
3
Chen C, Bhuiyan SA, Ross E, Powell O, Dinglasan E, Wei X, Atkin F, Deomano E, Hayes B. Genomic prediction for sugarcane diseases including hybrid Bayesian-machine learning approaches. FRONTIERS IN PLANT SCIENCE 2024; 15:1398903. [PMID: 38751840 PMCID: PMC11095127 DOI: 10.3389/fpls.2024.1398903] [Received: 03/11/2024] [Accepted: 04/15/2024] [Indexed: 05/18/2024]
Abstract
Sugarcane smut and Pachymetra root rot are two serious diseases of sugarcane, with susceptible, infected crops losing over 30% of yield. A heritable component to both diseases has been demonstrated, suggesting that selection could improve disease resistance. Genomic selection could accelerate gains even further, enabling early selection of resistant seedlings for breeding and clonal propagation. In this study, we evaluated four types of algorithms for genomic prediction of clonal performance for disease resistance. The algorithms were genomic best linear unbiased prediction (GBLUP), including extensions to model dominance and epistasis; Bayesian methods, including BayesC and BayesR; and machine learning methods, including random forest, multilayer perceptron (MLP), a modified convolutional neural network (CNN), and attention networks designed to capture epistasis across the genome-wide markers. Simple hybrid methods, which first used BayesR/GWAS to identify a subset of 1000 markers with moderate to large marginal additive effects and then used attention networks to derive predictions from these effects and their interactions, were also developed and evaluated. The hypothesis behind this approach was that restricting the model to markers more likely to have an effect would allow better estimation of interaction effects than an extremely large number of possible interactions would, especially with our limited dataset size. To evaluate the methods, we applied both random five-fold cross-validation and a structured, PCA-based cross-validation that separated 4702 sugarcane clones (with disease phenotypes and genotypes for 26k genome-wide SNP markers) by genomic relationship. The Bayesian methods (BayesR and BayesC) gave the highest prediction accuracy, followed closely by the hybrid methods with attention networks.
The hybrid methods with attention networks gave the lowest variation in prediction accuracy across validation folds (and the lowest MSE), which may be a criterion worth considering in practical breeding programs. This suggests that hybrid methods incorporating the attention mechanism could be useful for genomic prediction of clonal performance, particularly where non-additive effects may be important.
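The first stage of the hybrid methods keeps a subset of markers with moderate to large marginal additive effects. The sketch below approximates that step with single-marker (GWAS-style) least-squares slopes instead of BayesR, ranks SNPs by absolute slope, and keeps the top `n_keep`; the attention-network stage is not shown. The function name, the simulated data, and the use of plain marginal slopes are assumptions for illustration.

```python
import numpy as np

def top_marginal_markers(M, y, n_keep):
    """Indices of the n_keep SNPs with the largest absolute marginal slope."""
    Mc = M - M.mean(axis=0)                  # center each marker
    yc = y - y.mean()
    ss = (Mc ** 2).sum(axis=0)
    ss[ss == 0] = np.inf                     # monomorphic SNPs get a slope of 0
    slope = Mc.T @ yc / ss                   # one simple regression per marker
    return np.argsort(-np.abs(slope))[:n_keep]

# Simulated data: 300 clones, 2000 SNPs, three causal SNPs with large effects.
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.3, size=(300, 2000)).astype(float)
causal = [5, 50, 500]
y = M[:, causal].sum(axis=1) + rng.normal(0.0, 0.5, size=300)

kept = top_marginal_markers(M, y, n_keep=10)
M_subset = M[:, kept]                        # input to the downstream model
print(sorted(kept[:3].tolist()))
```

The study used a subset of 1000 markers; `n_keep=10` here only reflects the small toy dataset.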
Affiliation(s)
- Chensong Chen
- Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Shamsul A. Bhuiyan
- Sugar Research Australia, Woodford, QLD, Australia
- Queensland Micro- and Nanotechnology Centre, Griffith University, Nathan, QLD, Australia
- Elizabeth Ross
- Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Owen Powell
- Center for Crop Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Eric Dinglasan
- Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Xianming Wei
- Sugar Research Australia, Indooroopilly, QLD, Australia
- Emily Deomano
- Sugar Research Australia, Indooroopilly, QLD, Australia
- Ben Hayes
- Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
4
Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-Gónzalez J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. MOLECULAR PLANT 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has advanced significantly in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of the key factors that have influenced GP in plant breeding during this period. We examined the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size: we explored its benefits and the diminishing returns beyond an optimum size, considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. Other vital factors are the density and distribution of single-nucleotide polymorphisms, the level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects. Using wheat, maize, and potato as examples, we summarize the effect of these factors on GP accuracy for various traits. The search for high GP accuracy (theoretically reaching one when Pearson's correlation is used as the metric) is an active research area, and accuracy remains far from optimal for various traits. We hypothesize that ultra-large genotypic and phenotypic datasets, effective training population optimization methods, and support from other omics approaches (transcriptomics, metabolomics, and proteomics), coupled with deep-learning algorithms, could overcome current limitations and achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
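The diminishing returns from enlarging the training population, and the role of heritability, can be illustrated with the deterministic approximation of Daetwyler et al. (2008), which is not taken from this review: expected accuracy r = sqrt(N h2 / (N h2 + Me)), where N is the training population size, h2 the trait heritability, and Me the number of independent chromosome segments. The value of Me below is an arbitrary illustrative choice, and the function name is hypothetical.

```python
import math

def expected_accuracy(n_train, h2, m_e):
    """Deterministic GP accuracy approximation (Daetwyler et al. 2008)."""
    x = n_train * h2
    return math.sqrt(x / (x + m_e))

# Each doubling of N adds less accuracy, and a lower h2 needs a larger N.
m_e = 500  # illustrative number of independent chromosome segments
for h2 in (0.2, 0.5):
    for n in (500, 1000, 2000, 4000):
        print(f"h2={h2}  N={n:5d}  r={expected_accuracy(n, h2, m_e):.3f}")
```

The formula approaches one only as N grows without bound, which is consistent with the observation that accuracy near the theoretical maximum remains out of reach for many traits.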
Affiliation(s)
- Admas Alemu
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Johanna Åstrand
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-Gónzalez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
- International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Aakash Chawade
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
5
Lin YC, Mayer M, Valle Torres D, Pook T, Hölker AC, Presterl T, Ouzunova M, Schön CC. Genomic prediction within and across maize landrace derived populations using haplotypes. FRONTIERS IN PLANT SCIENCE 2024; 15:1351466. [PMID: 38584949 PMCID: PMC10995330 DOI: 10.3389/fpls.2024.1351466] [Received: 12/06/2023] [Accepted: 03/05/2024] [Indexed: 04/09/2024]
Abstract
Genomic prediction (GP) using haplotypes is considered advantageous over GP relying solely on single nucleotide polymorphisms (SNPs), owing to haplotypes' enhanced ability to capture ancestral information and their higher linkage disequilibrium with quantitative trait loci (QTL). Many empirical studies have supported the advantages of haplotype-based GP over SNP-based approaches. Nevertheless, the performance of haplotype-based GP can vary significantly depending on multiple factors, including the traits studied, the genetic structure of the population, and the method used for haplotype construction. In this study, we compared haplotype- and SNP-based prediction accuracies in four populations derived from European maize landraces. Populations comprised either doubled haploid (DH) lines derived directly from landraces or gamete capture (GC) lines derived from crosses of the landraces with an inbred line. For two different landraces, both types of populations were generated, genotyped with 600k SNPs, and phenotyped as lines per se for five traits. Our study explores three prediction scenarios: (i) within each of the four populations, (ii) across DH and GC populations from the same landrace, and (iii) across landraces using either DH or GC populations. Three haplotype construction methods were evaluated: (1) fixed-window blocks (FixedHB), (2) LD-based blocks (HaploView), and (3) IBD-based blocks (HaploBlocker). In within-population predictions, the FixedHB and HaploView methods performed as well as or slightly better than SNPs for all traits. HaploBlocker improved accuracy for certain traits but performed worse for others. In predictions across populations, the HaploBlocker parameter setting that controls the construction of haplotypes shared between populations played a crucial role in obtaining optimal results.
When predicting across landraces, accuracies were low for both SNP and haplotype approaches, but for specific traits HaploBlocker gave a substantial improvement. This study provides recommendations for optimal haplotype construction and identifies relevant parameters for constructing haplotypes in the context of genomic prediction.
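Fixed-window haplotype blocks (FixedHB) can be sketched as follows, assuming fully homozygous lines (as for DH lines) coded 0/1 per SNP: consecutive SNPs are grouped into windows of a fixed size, each distinct allele pattern within a window is treated as one haplotype allele, and lines are coded with 0/1 indicators per haplotype allele. The window size and function name are hypothetical, since the abstract does not state the parameters used; the LD-based (HaploView) and IBD-based (HaploBlocker) methods are not shown.

```python
import numpy as np

def fixed_window_haplotypes(X, window):
    """Indicator matrix of haplotype alleles in fixed SNP windows.

    X: (lines x SNPs) array of homozygous genotypes coded 0/1.
    Returns one 0/1 column per distinct haplotype allele per window.
    """
    columns = []
    for start in range(0, X.shape[1], window):
        block = X[:, start:start + window]
        # Each distinct row pattern inside the window is one haplotype allele.
        _, hap = np.unique(block, axis=0, return_inverse=True)
        hap = hap.reshape(-1)
        columns.append(np.eye(hap.max() + 1, dtype=int)[hap])
    return np.hstack(columns)

X = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1]])
H = fixed_window_haplotypes(X, window=2)
print(H)
```

The matrix `H` can take the place of the SNP matrix in a GBLUP-type model; larger windows tend to yield more, rarer haplotype alleles per block.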
Affiliation(s)
- Yan-Cheng Lin
- Chair of Plant Breeding, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Manfred Mayer
- Chair of Plant Breeding, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Bayer CropScience Deutschland GmbH, Borken, Germany
- Daniel Valle Torres
- Chair of Plant Breeding, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Sugar Beet Breeding, Strube Research GmbH & Co. KG, Söllingen, Germany
- Torsten Pook
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, Netherlands
- Armin C. Hölker
- Product Development Maize and Oil Crops, KWS SAAT SE & Co. KGaA, Einbeck, Germany
- Thomas Presterl
- Product Development Maize and Oil Crops, KWS SAAT SE & Co. KGaA, Einbeck, Germany
- Milena Ouzunova
- Product Development Maize and Oil Crops, KWS SAAT SE & Co. KGaA, Einbeck, Germany
- Chris-Carolin Schön
- Chair of Plant Breeding, TUM School of Life Sciences, Technical University of Munich, Freising, Germany