1
Fernandes IK, Vieira CC, Dias KOG, Fernandes SB. Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:189. [PMID: 39044035 PMCID: PMC11266441 DOI: 10.1007/s00122-024-04687-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 06/29/2024] [Indexed: 07/25/2024]
Abstract
KEY MESSAGE: Incorporating feature-engineered environmental data into machine learning-based genomic prediction models is an efficient approach to indirectly model genotype-by-environment interactions.
Complementing phenotypic traits and molecular markers with high-dimensional data such as climate and soil information is becoming a common practice in breeding programs. This study explored new ways to combine non-genetic information in genomic prediction models using machine learning. Using the multi-environment trial data from the Genomes To Fields initiative, different models to predict maize grain yield were adjusted using various inputs: genetic, environmental, or a combination of both, either in an additive (genetic-and-environmental; G+E) or a multiplicative (genotype-by-environment interaction; GEI) manner. When including environmental data, the mean prediction accuracy of machine learning genomic prediction models increased up to 7% over the well-established Factor Analytic Multiplicative Mixed Model among the three cross-validation scenarios evaluated. Moreover, using the G+E model was more advantageous than the GEI model given the superior, or at least comparable, prediction accuracy, the lower usage of computational memory and time, and the flexibility of accounting for interactions by construction. Our results illustrate the flexibility provided by the ML framework, particularly with feature engineering. We show that the feature engineering stage offers a viable option for envirotyping and generates valuable information for machine learning-based genomic prediction models. Furthermore, we verified that the genotype-by-environment interactions may be considered using tree-based approaches without explicitly including interactions in the model. These findings support the growing interest in merging high-dimensional genotypic and environmental data into predictive modeling.
Affiliation(s)
- Igor K Fernandes
  - Department of Crop, Soil, and Environmental Sciences, Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA
- Caio C Vieira
  - Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, USA
- Kaio O G Dias
  - Department of General Biology, Federal University of Viçosa, Viçosa, Brazil
- Samuel B Fernandes
  - Department of Crop, Soil, and Environmental Sciences, Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA
2
Peixoto MA, Leach KA, Jarquin D, Flannery P, Zystro J, Tracy WF, Bhering L, Resende MFR. Utilizing genomic prediction to boost hybrid performance in a sweet corn breeding program. FRONTIERS IN PLANT SCIENCE 2024; 15:1293307. [PMID: 38726298 PMCID: PMC11080654 DOI: 10.3389/fpls.2024.1293307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 03/26/2024] [Indexed: 05/12/2024]
Abstract
Sweet corn breeding programs, like field corn, focus on the development of elite inbred lines to produce commercial hybrids. For this reason, genomic selection models can help the in silico prediction of hybrid crosses from the elite lines, which is hypothesized to improve the test cross scheme, leading to higher genetic gain in a breeding program. This study aimed to explore the potential of implementing genomic selection in a sweet corn breeding program through hybrid prediction in a within-site across-year and across-site framework. A total of 506 hybrids were evaluated in six environments (California, Florida, and Wisconsin, in the years 2020 and 2021). A total of 20 traits from three different groups were measured (plant-, ear-, and flavor-related traits) across the six environments. Eight statistical models were considered for prediction, as the combination of two genomic prediction models (GBLUP and RKHS) with two different kernels (additive and additive + dominance), and in a single- and multi-trait framework. Also, three different cross-validation schemes were tested (CV1, CV0, and CV00). The different models were then compared based on the correlation between the estimated breeding values/total genetic values and phenotypic measurements. Overall, heritabilities and correlations varied among the traits. The models implemented showed good accuracies for trait prediction. The GBLUP implementation outperformed RKHS in all cross-validation schemes and models. Models with additive plus dominance kernels presented a slight improvement over the models with only additive kernels for some of the models examined. In addition, models for within-site across-year and across-site performed better in the CV0 than the CV00 scheme, on average. Hence, GBLUP should be considered as a standard model for sweet corn hybrid prediction. In addition, we found that the implementation of genomic prediction in a sweet corn breeding program presented reliable results, which can improve the testcross stage by identifying the top candidates that will reach advanced field-testing stages.
Affiliation(s)
- Marco Antônio Peixoto
  - Laboratório de Biometria, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
  - Department of Horticultural Sciences, University of Florida, Gainesville, FL, United States
- Kristen A. Leach
  - Department of Horticultural Sciences, University of Florida, Gainesville, FL, United States
- Diego Jarquin
  - Department of Agronomy, University of Florida, Gainesville, FL, United States
- Patrick Flannery
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, United States
- Jared Zystro
  - Organic Seed Alliance, Port Townsend, WA, United States
- William F. Tracy
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, United States
- Leonardo Bhering
  - Laboratório de Biometria, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Márcio F. R. Resende
  - Department of Horticultural Sciences, University of Florida, Gainesville, FL, United States
3
Toda Y, Sasaki G, Ohmori Y, Yamasaki Y, Takahashi H, Takanashi H, Tsuda M, Kajiya-Kanegae H, Tsujimoto H, Kaga A, Hirai M, Nakazono M, Fujiwara T, Iwata H. Reaction norm for genomic prediction of plant growth: modeling drought stress response in soybean. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:77. [PMID: 38460027 PMCID: PMC10924738 DOI: 10.1007/s00122-024-04565-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 01/30/2024] [Indexed: 03/11/2024]
Abstract
KEY MESSAGE: We proposed models to predict the effects of genomic and environmental factors on daily soybean growth and applied them to soybean growth data obtained with unmanned aerial vehicles.
Advances in high-throughput phenotyping technology have made it possible to obtain time-series plant growth data in field trials, enabling genotype-by-environment interaction (G × E) modeling of plant growth. Although the reaction norm is an effective method for quantitatively evaluating G × E and has been implemented in genomic prediction models, no reaction norm models have been applied to plant growth data. Here, we propose a novel reaction norm model for plant growth using spline and random forest models, in which daily growth is explained by environmental factors one day prior. The proposed model was applied to soybean canopy area and height to evaluate the influence of drought stress levels. Changes in the canopy area and height of 198 cultivars were measured by remote sensing using unmanned aerial vehicles. Multiple drought stress levels were set as treatments, and their time-series soil moisture was measured. The models were evaluated using three cross-validation schemes. Although accuracy of the proposed models did not surpass that of single-trait genomic prediction, the results suggest that our model can capture G × E, especially the latter growth period for the random forest model. Also, significant variations in the G × E of the canopy height during the early growth period were visualized using the spline model. This result indicates the effectiveness of the proposed models on plant growth data and the possibility of revealing G × E in various growth stages in plant breeding by applying statistical or machine learning models to time-series phenotype data.
Affiliation(s)
- Yusuke Toda
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Goshi Sasaki
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Yoshihiro Ohmori
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Yuji Yamasaki
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
  - Arid Land Research Center, Tottori University, Tottori, Japan
- Hirokazu Takahashi
  - Graduate School of Bioagricultural Sciences, Nagoya University, Nagoya, Japan
- Hideki Takanashi
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Mai Tsuda
  - Tsukuba-Plant Innovation Research Center (T-PIRC), University of Tsukuba, Tsukuba, Japan
- Akito Kaga
  - Institute of Crop Science, NARO, Tsukuba, Japan
- Masami Hirai
  - RIKEN Center for Sustainable Resource Science, Tsukuba, Japan
- Mikio Nakazono
  - Graduate School of Bioagricultural Sciences, Nagoya University, Nagoya, Japan
- Toru Fujiwara
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Hiroyoshi Iwata
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
4
Singer WM, Lee YC, Shea Z, Vieira CC, Lee D, Li X, Cunicelli M, Kadam SS, Khan MAW, Shannon G, Mian MAR, Nguyen HT, Zhang B. Soybean genetics, genomics, and breeding for improving nutritional value and reducing antinutritional traits in food and feed. THE PLANT GENOME 2023; 16:e20415. [PMID: 38084377 DOI: 10.1002/tpg2.20415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 10/25/2023] [Accepted: 10/27/2023] [Indexed: 12/22/2023]
Abstract
Soybean [Glycine max (L.) Merr.] is a globally important crop due to its valuable seed composition, versatile feed, food, and industrial end-uses, and consistent genetic gain. Successful genetic gain in soybean has led to widespread adaptation and increased value for producers, processors, and consumers. Specific focus on the nutritional quality of soybean seed composition for food and feed has further elucidated genetic knowledge and bolstered breeding progress. Seed components are historical and current targets for soybean breeders seeking to improve nutritional quality of soybean. This article reviews genetic and genomic foundations for improvement of nutritionally important traits, such as protein and amino acids, oil and fatty acids, carbohydrates, and specific food-grade considerations; discusses the application of advanced breeding technology such as CRISPR/Cas9 in creating seed composition variations; and provides future directions and breeding recommendations regarding soybean seed composition traits.
Affiliation(s)
- William M Singer
  - School of Plant and Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
- Yi-Chen Lee
  - Department of Agriculture, Fort Hays State University, Hays, Kansas, USA
- Zachary Shea
  - School of Plant and Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
- Caio Canella Vieira
  - Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, Arkansas, USA
- Dongho Lee
  - Fisher Delta Research, Extension, and Education Center, University of Missouri, Portageville, Missouri, USA
- Xiaoying Li
  - School of Plant and Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
- Mia Cunicelli
  - Soybean and Nitrogen Fixation Research Unit, USDA-ARS, Raleigh, North Carolina, USA
- Shaila S Kadam
  - Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
- Grover Shannon
  - Fisher Delta Research, Extension, and Education Center, University of Missouri, Portageville, Missouri, USA
- M A Rouf Mian
  - Soybean and Nitrogen Fixation Research Unit, USDA-ARS, Raleigh, North Carolina, USA
- Henry T Nguyen
  - Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
- Bo Zhang
  - School of Plant and Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
5
Jackson R, Buntjer JB, Bentley AR, Lage J, Byrne E, Burt C, Jack P, Berry S, Flatman E, Poupard B, Smith S, Hayes C, Barber T, Love B, Gaynor RC, Gorjanc G, Howell P, Mackay IJ, Hickey JM, Ober ES. Phenomic and genomic prediction of yield on multiple locations in winter wheat. Front Genet 2023; 14:1164935. [PMID: 37229190 PMCID: PMC10203586 DOI: 10.3389/fgene.2023.1164935] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/13/2023] [Accepted: 04/20/2023] [Indexed: 05/27/2023]
Abstract
Genomic selection has recently become an established part of breeding strategies in cereals. However, a limitation of linear genomic prediction models for complex traits such as yield is that these are unable to accommodate Genotype by Environment effects, which are commonly observed over trials on multiple locations. In this study, we investigated how this environmental variation can be captured by the collection of a large number of phenomic markers using high-throughput field phenotyping and whether it can increase GS prediction accuracy. For this purpose, 44 winter wheat (Triticum aestivum L.) elite populations, comprising 2,994 lines, were grown on two sites over 2 years, to approximate the size of trials in a practical breeding programme. At various growth stages, remote sensing data from multi- and hyperspectral cameras, as well as traditional ground-based visual crop assessment scores, were collected with approximately 100 different data variables collected per plot. The predictive power for grain yield was tested for the various data types, with or without genome-wide marker data sets. Models using phenomic traits alone had a greater predictive value (R² = 0.39-0.47) than genomic data (approximately R² = 0.1). The average improvement in predictive power by combining trait and marker data was 6%-12% over the best phenomic-only model, and performed best when data from one full location was used to predict the yield on an entire second location. The results suggest that genetic gain in breeding programmes can be increased by utilisation of large numbers of phenotypic variables using remote sensing in field trials, although at what stage of the breeding cycle phenomic selection could be most profitably applied remains to be answered.
Affiliation(s)
- Robert Jackson
  - The John Bingham Laboratory, NIAB, Cambridge, United Kingdom
- Jaap B. Buntjer
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Scotland, United Kingdom
- Jacob Lage
  - KWS UK Ltd, Thriplow, Royston, Cambridgeshire, United Kingdom
- Ed Byrne
  - KWS UK Ltd, Thriplow, Royston, Cambridgeshire, United Kingdom
- Chris Burt
  - RAGT UK, Ickleton, Saffron Walden, Cambridgeshire, United Kingdom
- Peter Jack
  - RAGT UK, Ickleton, Saffron Walden, Cambridgeshire, United Kingdom
- Simon Berry
  - Limagrain UK Ltd, Rothwell, Market Rasen, Lincolnshire, United Kingdom
- Edward Flatman
  - Limagrain UK Ltd, Rothwell, Market Rasen, Lincolnshire, United Kingdom
- Bruno Poupard
  - Limagrain UK Ltd, Rothwell, Market Rasen, Lincolnshire, United Kingdom
- Stephen Smith
  - Elsoms Wheat Limited, Spalding, Lincolnshire, United Kingdom
- Tobias Barber
  - The John Bingham Laboratory, NIAB, Cambridge, United Kingdom
- Bethany Love
  - The John Bingham Laboratory, NIAB, Cambridge, United Kingdom
- R. Chris Gaynor
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Scotland, United Kingdom
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Scotland, United Kingdom
- Phil Howell
  - The John Bingham Laboratory, NIAB, Cambridge, United Kingdom
- Ian J. Mackay
  - The John Bingham Laboratory, NIAB, Cambridge, United Kingdom
- John M. Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Scotland, United Kingdom
- Eric S. Ober
  - The John Bingham Laboratory, NIAB, Cambridge, United Kingdom