1. Chen C, Powell O, Dinglasan E, Ross EM, Yadav S, Wei X, Atkin F, Deomano E, Hayes BJ. Genomic prediction with machine learning in sugarcane, a complex highly polyploid clonally propagated crop with substantial non-additive variation for key traits. The Plant Genome 2023; 16:e20390. [PMID: 37728221 DOI: 10.1002/tpg2.20390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 08/01/2023] [Accepted: 08/29/2023] [Indexed: 09/21/2023]
Abstract
Sugarcane has a complex, highly polyploid genome with multi-species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning (ML) methods, which are purportedly able to deal with high levels of complexity in prediction. Here, we investigated deep learning (DL) neural networks, including multilayer networks (MLP) and convolution neural networks (CNN), and an ensemble machine learning approach, random forest (RF), for genomic prediction in sugarcane. The data set used was 2912 sugarcane clones, scored for 26,086 genome wide single nucleotide polymorphism markers, with final assessment trial data for total cane harvested (TCH), commercial cane sugar (CCS), and fiber content (Fiber). The clones in the latest trial (2017) were used as a validation set. We compared prediction accuracy of these methods to genomic best linear unbiased prediction (GBLUP) extended to include dominance and epistatic effects. The prediction accuracies from GBLUP models were up to 0.37 for TCH, 0.43 for CCS, and 0.48 for Fiber, while the optimized ML models had prediction accuracies of 0.35 for TCH, 0.38 for CCS, and 0.48 for Fiber. Both RF and DL neural network models have comparable predictive ability with the additive GBLUP model but are less accurate than the extended GBLUP model.
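The additive GBLUP baseline and the accuracy measure described in this abstract (correlation between predicted and observed values in a held-out validation set) can be sketched roughly as follows. This is a minimal illustration on simulated data, not the authors' pipeline: it assumes diploid-style 0/1/2 marker dosages (a simplification for polyploid sugarcane), a fixed heritability `h2` in place of estimated variance components, and hypothetical helper names `additive_grm` and `gblup_predict`.

```python
import numpy as np

def additive_grm(M):
    """Allele-frequency-centered additive relationship matrix.

    M is a (clones x SNPs) matrix of 0/1/2 dosages (assumed coding).
    """
    p = M.mean(axis=0) / 2.0                  # allele frequency per SNP
    Z = M - 2.0 * p                           # center each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train_idx, test_idx, h2=0.4):
    """BLUP of validation clones, with h2 treated as known (an assumption)."""
    lam = (1.0 - h2) / h2                     # residual-to-genetic variance ratio
    mu = y_train.mean()
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_vt = G[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_vt @ alpha

# Simulated stand-in for marker and trial data
rng = np.random.default_rng(0)
n, m = 200, 500
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
g = M @ rng.normal(0.0, 1.0, m)               # purely additive true values
y = g + rng.normal(0.0, g.std(), n)           # heritability near 0.5

train_idx = np.arange(150)                    # earlier trials
test_idx = np.arange(150, n)                  # "latest trial" as validation set
G = additive_grm(M)
pred = gblup_predict(G, y[train_idx], train_idx, test_idx, h2=0.5)

# Prediction accuracy: correlation of predicted and observed phenotypes
accuracy = np.corrcoef(pred, y[test_idx])[0, 1]
print(f"validation accuracy: {accuracy:.2f}")
```

Holding out the most recent block of individuals, rather than a random subset, mirrors the forward-validation design mentioned in the abstract.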
Affiliation(s)
- Chensong Chen
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Owen Powell
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Eric Dinglasan
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Elizabeth M Ross
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Seema Yadav
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Ben J Hayes
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
2. Ma J, Cao Y, Wang Y, Ding Y. Development of the maize 5.5K loci panel for genomic prediction through genotyping by target sequencing. Frontiers in Plant Science 2022; 13:972791. [PMID: 36438102 PMCID: PMC9691890 DOI: 10.3389/fpls.2022.972791] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/19/2022] [Accepted: 10/24/2022] [Indexed: 06/16/2023]
Abstract
Genotyping platforms are important for genetic research and molecular breeding. In this study, a low-density genotyping platform containing 5.5K SNP markers was successfully developed in maize using genotyping by target sequencing (GBTS) technology with capture-in-solution. Two maize populations (Pop1 and Pop2) were used to validate the GBTS panel for genetic and molecular breeding studies. Pop1 comprised 942 hybrids derived from 250 inbred lines and four testers, and Pop2 contained 540 hybrids generated from 123 newly developed inbred lines and eight testers. The genetic analyses showed that the average polymorphic information content and genetic diversity values ranged from 0.27 to 0.38 in both populations using all filtered genotyping data. The mean missing rate was 1.23% across populations. The Structure and UPGMA tree analyses revealed similar genetic divergences (76-89%) in both populations. Genomic prediction analyses showed that the prediction accuracy of reproducing kernel Hilbert space (RKHS) was slightly lower than that of genomic best linear unbiased prediction (GBLUP) and three Bayesian methods for general combining ability of grain yield per plant and three yield-related traits in both populations, whereas RKHS with additive effects showed superior advantages over the other four methods in Pop1. In Pop1, the GBLUP and three Bayesian methods with additive-dominance model improved the prediction accuracies by 4.89-134.52% for the four traits in comparison to the additive model. In Pop2, the inclusion of dominance did not improve the accuracy in most cases. In general, low accuracies (0.33-0.43) were achieved for general combining ability of the four traits in Pop1, whereas moderate-to-high accuracies (0.52-0.65) were observed in Pop2. For hybrid performance prediction, the accuracies were moderate to high (0.51-0.75) for the four traits in both populations using the additive-dominance model. This study suggests a reliable genotyping platform that can be implemented in genomic selection-assisted breeding to accelerate new maize cultivar development and improvement.
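The contrast between an additive model and an additive-dominance model can be illustrated with a small kernel-based sketch. Everything here is an assumption made for illustration: the 0/1/2 dosage coding, a commonly used Hardy-Weinberg-centered dominance coding, fixed variance weights instead of estimated variance components, simulated data, and the hypothetical helper names `additive_dominance_kernels` and `kernel_blup`. It does not reproduce the GBLUP, Bayesian, or RKHS implementations used in the study.

```python
import numpy as np

def additive_dominance_kernels(M):
    """Additive (G) and dominance (D) relationship matrices.

    M is a (hybrids x SNPs) matrix of counted-allele dosages 0/1/2.
    """
    q = M.mean(axis=0) / 2.0                  # frequency of the counted allele
    p = 1.0 - q
    Z = M - 2.0 * q                           # centered additive coding
    G = Z @ Z.T / np.sum(2.0 * p * q)
    # Dominance coding: -2q^2 (0 copies), 2pq (heterozygote), -2p^2 (2 copies)
    W = np.where(M == 0, -2.0 * q**2,
                 np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    D = W @ W.T / np.sum((2.0 * p * q) ** 2)
    return G, D

def kernel_blup(K, y_train, train_idx, test_idx, resid_var):
    """Predict test individuals from a (variance-weighted) kernel K."""
    mu = y_train.mean()
    K_tt = K[np.ix_(train_idx, train_idx)] + resid_var * np.eye(len(train_idx))
    K_vt = K[np.ix_(test_idx, train_idx)]
    return mu + K_vt @ np.linalg.solve(K_tt, y_train - mu)

# Simulated hybrids with both additive and dominance effects
rng = np.random.default_rng(3)
n, m = 160, 300
M = rng.binomial(2, 0.4, size=(n, m)).astype(float)
a = rng.normal(0.0, 1.0, m)                   # additive effects
d = rng.normal(0.0, 1.0, m)                   # dominance effects
signal = M @ a + (M == 1).astype(float) @ d
y = signal + rng.normal(0.0, signal.std(), n)

G, D = additive_dominance_kernels(M)
train_idx, test_idx = np.arange(120), np.arange(120, n)

# Variance weights below are illustrative guesses, not estimates
pred_a = kernel_blup(0.5 * G, y[train_idx], train_idx, test_idx, resid_var=0.5)
pred_ad = kernel_blup(0.4 * G + 0.3 * D, y[train_idx], train_idx, test_idx,
                      resid_var=0.3)

acc_a = np.corrcoef(pred_a, y[test_idx])[0, 1]
acc_ad = np.corrcoef(pred_ad, y[test_idx])[0, 1]
print(f"additive: {acc_a:.2f}  additive-dominance: {acc_ad:.2f}")
```

Whether the dominance kernel helps depends on the population, which is consistent with the abstract reporting gains in Pop1 but not in most Pop2 cases.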
3. Roth M, Beugnot A, Mary-Huard T, Moreau L, Charcosset A, Fiévet JB. Improving genomic predictions with inbreeding and nonadditive effects in two admixed maize hybrid populations in single and multienvironment contexts. Genetics 2022; 220:6527635. [PMID: 35150258 PMCID: PMC8982028 DOI: 10.1093/genetics/iyac018] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/27/2021] [Accepted: 01/28/2022] [Indexed: 11/12/2022]
Abstract
Genetic admixture, resulting from the recombination between structural groups, is frequently encountered in breeding populations. In hybrid breeding, crossing admixed lines can generate substantial nonadditive genetic variance and contrasted levels of inbreeding which can impact trait variation. This study aimed at testing recent methodological developments for the modeling of inbreeding and nonadditive effects in order to increase prediction accuracy in admixed populations. Using two maize (Zea mays L.) populations of hybrids admixed between dent and flint heterotic groups, we compared a suite of five genomic prediction models incorporating (or not) parameters accounting for inbreeding and nonadditive effects with the natural and orthogonal interaction approach in single and multienvironment contexts. In both populations, variance decompositions showed the strong impact of inbreeding on plant yield, height, and flowering time which was supported by the superiority of prediction models incorporating this effect (+0.038 in predictive ability for mean yield). In most cases dominance variance was reduced when inbreeding was accounted for. The model including additivity, dominance, epistasis, and inbreeding effects appeared to be the most robust for prediction across traits and populations (+0.054 in predictive ability for mean yield). In a multienvironment context, we found that the inclusion of nonadditive and inbreeding effects was advantageous when predicting hybrids not yet observed in any environment. Overall, comparing variance decompositions was helpful to guide model selection for genomic prediction. Finally, we recommend the use of models including inbreeding and nonadditive parameters following the natural and orthogonal interaction approach to increase prediction accuracy in admixed populations.
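One simple way to picture "accounting for inbreeding" is to compute a marker-based inbreeding measure per hybrid and fit it as a covariate. The sketch below is a deliberately reduced illustration: it uses the proportion of homozygous markers as the inbreeding measure and ordinary least squares instead of the mixed models and the natural and orthogonal interaction parameterization compared in the study. The data are simulated and the function name `genomic_inbreeding` is hypothetical.

```python
import numpy as np

def genomic_inbreeding(M):
    """Proportion of homozygous markers per individual (M coded 0/1/2)."""
    return 1.0 - (M == 1).mean(axis=1)

# Simulate hybrids whose heterozygosity varies, as it can in admixed crosses
rng = np.random.default_rng(2)
n, m = 120, 400
het_rate = rng.uniform(0.2, 0.6, size=n)            # per-hybrid heterozygosity
is_het = rng.random((n, m)) < het_rate[:, None]
homozygous_call = rng.choice([0.0, 2.0], size=(n, m))
M = np.where(is_het, 1.0, homozygous_call)

f = genomic_inbreeding(M)

# Simulated yield with inbreeding depression (slope of -4 is an arbitrary choice)
y = 10.0 - 4.0 * f + rng.normal(0.0, 0.5, n)

# Fit intercept + inbreeding covariate by least squares
X = np.column_stack([np.ones(n), f])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"intercept: {coef[0]:.2f}  inbreeding slope: {coef[1]:.2f}")
```

In a full analysis this covariate would sit alongside additive, dominance, and epistatic random effects, which is where the abstract reports that dominance variance tends to shrink once inbreeding is modeled.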
Affiliation(s)
- Morgane Roth
  - Plant Breeding Research Division, Agroscope, Wädenswil, 8820 Zurich, Switzerland. Corresponding author: INRAE GAFL, 67 Allée des Chênes, 84140 Montfavet, France.
- Aurélien Beugnot
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, UMR GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- Tristan Mary-Huard
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, UMR GQE-Le Moulon, 91190 Gif-sur-Yvette, France; Université Paris-Saclay, INRAE, AgroParisTech, UMR MIA-Paris Paris, 75005 Paris, France
- Laurence Moreau
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, UMR GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- Alain Charcosset
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, UMR GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- Julie B Fiévet
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, UMR GQE-Le Moulon, 91190 Gif-sur-Yvette, France
4. Ishimori M, Takanashi H, Fujimoto M, Kajiya-Kanegae H, Yoneda J, Tokunaga T, Tsutsumi N, Iwata H. Spatial kernel models capturing field heterogeneity for accurate estimation of genetic potential. Breeding Science 2021; 71:444-455. [PMID: 34912171 PMCID: PMC8661485 DOI: 10.1270/jsbbs.20060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2020] [Accepted: 05/19/2021] [Indexed: 06/14/2023]
Abstract
According to Fisher's principles, an experimental field is typically divided into multiple blocks for local control. Although homogeneity is supposed within a block, this assumption may not be practical for large blocks, such as those including hundreds of plots. In line evaluation trials, which are essential in plant breeding, field heterogeneity must be carefully treated, because it can cause bias in the estimation of genetic potential. To more accurately estimate genotypic values in a large field trial, we developed spatial kernel models incorporating genome-wide markers, which consider continuous heterogeneity within a block and over the field. In the simulation study, the spatial kernel models were robust under various conditions. Although heritability, spatial autocorrelation range, replication number, and missing plots directly affected the estimation accuracy of genotypic values, the spatial kernel models always showed superior performance over the classical block model. We also employed these spatial kernel models for quantitative trait locus mapping. Finally, using field experimental data of bioenergy sorghum lines, we validated the performance of the spatial kernel models. The results suggested that a spatial kernel model is effective for evaluating the genetic potential of lines in a heterogeneous field.
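The general idea of combining a genomic kernel with a spatial kernel over plot positions can be sketched as below. This is an assumption-laden illustration and not the model from the paper: it uses a Gaussian kernel on row and column coordinates, one plot per line without replication, variance components treated as known, simulated data, and hypothetical function names.

```python
import numpy as np

def gaussian_spatial_kernel(coords, length_scale):
    """Gaussian kernel on plot coordinates; nearby plots get similar effects."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def blup_genetic_and_spatial(y, G, S, var_g, var_s, var_e):
    """Split centered phenotypes into genetic and spatial BLUPs.

    Model: y = mean + g + s + e, with g ~ N(0, var_g*G), s ~ N(0, var_s*S).
    The variance components are assumed known here.
    """
    n = len(y)
    V = var_g * G + var_s * S + var_e * np.eye(n)
    r = np.linalg.solve(V, y - y.mean())
    return var_g * G @ r, var_s * S @ r

# Field layout: 10 rows x 12 columns, one line per plot (assumption)
rng = np.random.default_rng(1)
rows, cols = np.meshgrid(np.arange(10), np.arange(12), indexing="ij")
coords = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
n = coords.shape[0]

# Simple marker-based genomic kernel from simulated 0/1/2 genotypes
M = rng.binomial(2, 0.5, size=(n, 300)).astype(float)
Z = M - M.mean(axis=0)
G = Z @ Z.T / Z.shape[1]
S = gaussian_spatial_kernel(coords, length_scale=3.0)

# Phenotype = genetic value + smooth field trend + noise
g_true = Z @ rng.normal(0.0, 0.1, 300)
trend = np.sin(coords[:, 0] / 3.0) + 0.1 * coords[:, 1]
y = g_true + trend + rng.normal(0.0, 0.5, n)

g_hat, s_hat = blup_genetic_and_spatial(y, G, S, var_g=1.0, var_s=1.0, var_e=0.25)
print(f"corr(g_hat, g_true) = {np.corrcoef(g_hat, g_true)[0, 1]:.2f}")
```

A classical block model would replace `S` with a block indicator structure, which cannot represent the continuous within-block trend that the spatial kernel is meant to absorb.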
Affiliation(s)
- Motoyuki Ishimori
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo 113-8657, Japan
- Hideki Takanashi
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo 113-8657, Japan
- Masaru Fujimoto
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo 113-8657, Japan
- Hiromi Kajiya-Kanegae
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo 113-8657, Japan
- Junichi Yoneda
  - EARTHNOTE Co. Ltd., 1388 Sokei, Ginoza, Okinawa 904-1303, Japan
- Nobuhiro Tsutsumi
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo 113-8657, Japan
- Hiroyoshi Iwata
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo 113-8657, Japan
5. Varshney RK, Bohra A, Yu J, Graner A, Zhang Q, Sorrells ME. Designing Future Crops: Genomics-Assisted Breeding Comes of Age. Trends in Plant Science 2021; 26:631-649. [PMID: 33893045 DOI: 10.1016/j.tplants.2021.03.010] [Citation(s) in RCA: 154] [Impact Index Per Article: 51.3] [Received: 12/11/2020] [Revised: 03/16/2021] [Accepted: 03/17/2021] [Indexed: 05/18/2023]
Abstract
Over the past decade, genomics-assisted breeding (GAB) has been instrumental in harnessing the potential of modern genome resources and characterizing and exploiting allelic variation for germplasm enhancement and cultivar development. Sustaining GAB in the future (GAB 2.0) will rely upon a suite of new approaches that fast-track targeted manipulation of allelic variation for creating novel diversity and facilitate their rapid and efficient incorporation in crop improvement programs. Genomic breeding strategies that optimize crop genomes with accumulation of beneficial alleles and purging of deleterious alleles will be indispensable for designing future crops. In coming decades, GAB 2.0 is expected to play a crucial role in breeding more climate-smart crop cultivars with higher nutritional value in a cost-effective and timely manner.
Affiliation(s)
- Rajeev K Varshney
  - Center of Excellence in Genomics and Systems Biology (CEGSB), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India; State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Western Australia, Australia
- Abhishek Bohra
  - Crop Improvement Division, ICAR-Indian Institute of Pulses Research (ICAR-IIPR), Kanpur, India
- Jianming Yu
  - Department of Agronomy, Iowa State University, Ames, IA, USA
- Andreas Graner
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Qifa Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
- Mark E Sorrells
  - Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY, USA