1
Resende RT, Xavier A, Silva PIT, Resende MPM, Jarquin D, Marcatti GE. GIS-based G × E modeling of maize hybrids through enviromic markers engineering. THE NEW PHYTOLOGIST 2024. [PMID: 39014516 DOI: 10.1111/nph.19951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Accepted: 06/22/2024] [Indexed: 07/18/2024]
Abstract
Through enviromics, precision breeding leverages innovative geotechnologies to customize crop varieties to specific environments, potentially improving both crop yield and genetic selection gains. In Brazil's four southernmost states, data from 183 distinct geographic field trials (also accounting for 2017-2021) covered information on 164 genotypes: 79 phenotyped maize hybrid genotypes for grain yield and their 85 nonphenotyped parents. Additionally, 1342 envirotypic covariates from weather, soil, sensor-based, and satellite sources were collected to engineer 10 K synthetic enviromic markers via machine learning. Soil, radiation light, and surface temperature variations remarkably affect differential genotype yield, hinting at ecophysiological adjustments including evapotranspiration and photosynthesis. The enviromic ensemble-based random regression model showcases superior predictive performance and efficiency compared to the baseline and kernel models, matching the best genotypes to specific geographic coordinates. Clustering analysis has identified regions that minimize genotype-environment (G × E) interactions. These findings underscore the potential of enviromics in crafting specific parental combinations to breed new, higher-yielding hybrid crops. The adequate use of envirotypic information can enhance the precision and efficiency of maize breeding by providing important inputs about the environmental factors that affect the average crop performance. Generating enviromic markers associated with grain yield can enable a better selection of hybrids for specific environments.
Affiliation(s)
- Rafael T Resende
  - Plant Breeding Sector, School of Agronomy (EA), Federal University of Goiás (UFG), Av. Esperança, s/n, Samambaia Campus, Goiânia, GO, 74690-900, Brazil
  - TheCROP, A Precision Breeding Project, Av. Esperança, n° 1533, FUNAPE, Samambaia Technological Park, Samambaia Campus - UFG, Goiânia, GO, 74690-612, Brazil
- Alencar Xavier
  - Corteva Agriscience, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
  - Purdue University, 915 Mitch Daniels Blvd, West Lafayette, IN, 47907, USA
- Marcela P M Resende
  - Plant Breeding Sector, School of Agronomy (EA), Federal University of Goiás (UFG), Av. Esperança, s/n, Samambaia Campus, Goiânia, GO, 74690-900, Brazil
- Diego Jarquin
  - University of Florida, 1604 McCarty Drive G052B McCarty Hall D, Gainesville, FL, 32611, USA
- Gustavo E Marcatti
  - TheCROP, A Precision Breeding Project, Av. Esperança, n° 1533, FUNAPE, Samambaia Technological Park, Samambaia Campus - UFG, Goiânia, GO, 74690-612, Brazil
  - Forest Engineering Department, Federal University of São João del Rei (UFSJ), Sete Lagoas Campus, MG-424 Highway, Km 47, Sete Lagoas, MG, 35701-970, Brazil
2
Delattre M, Toda Y, Tressou J, Iwata H. Modeling soybean growth: A mixed model approach. PLoS Comput Biol 2024; 20:e1011258. [PMID: 38990979 PMCID: PMC11265664 DOI: 10.1371/journal.pcbi.1011258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 07/23/2024] [Accepted: 06/17/2024] [Indexed: 07/13/2024] Open
Abstract
The evaluation of plant and animal growth, separately for genetic and environmental effects, is necessary for genetic understanding and genetic improvement of environmental responses of plants and animals. We propose to extend an existing approach that combines a nonlinear mixed-effects model (NLMEM) and the stochastic approximation of the Expectation-Maximization algorithm (SAEM) to analyze genetic and environmental effects on plant growth. These tools are widely used in many fields but very rarely in plant biology. During model formulation, a nonlinear function describes the shape of growth, and random effects describe genetic and environmental effects and their variability. Genetic relationships among the varieties were also integrated into the model using a genetic relationship matrix. The SAEM algorithm was chosen as an efficient alternative to MCMC methods, which are more commonly used in the domain. It was implemented to infer the expected growth patterns in the analyzed population and the expected curves for each variety through maximum-likelihood and maximum-a-posteriori approaches, respectively. The obtained estimates can be used to predict the growth curves for each variety. We illustrate the strengths of the proposed approach using simulated data and soybean plant growth data obtained from a soybean cultivation experiment conducted at the Arid Land Research Center, Tottori University. In this experiment, plant height was measured daily using drones, and the growth was monitored for approximately 200 soybean cultivars for which whole-genome sequence data were available. The NLMEM approach improved our understanding of the determinants of soybean growth and can be successfully used for the genomic prediction of growth pattern characteristics.
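The model structure described in this abstract (a nonlinear growth function whose parameters carry variety-level random effects correlated through a genetic relationship matrix) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not name the growth function, so the three-parameter logistic curve, the parameter names, the default means and standard deviations, and the helper names `logistic_growth` and `simulate_varieties` are all hypothetical. It simulates data from such a model; it does not implement SAEM.

```python
import numpy as np

def logistic_growth(t, asymptote, rate, midpoint):
    """Three-parameter logistic curve (an assumed growth shape, not the paper's)."""
    return asymptote / (1.0 + np.exp(-rate * (t - midpoint)))

def simulate_varieties(K, days, rng, mean=(80.0, 0.12, 40.0),
                       sd=(8.0, 0.01, 3.0), noise_sd=1.5):
    """Simulate plant-height curves for varieties related through K.

    K is an n x n genetic relationship matrix. Each growth parameter is
    drawn as mean + sd * (L @ z), with L the Cholesky factor of K, so the
    variety effects have covariance sd**2 * K. All numeric defaults are
    illustrative assumptions.
    """
    n = K.shape[0]
    L = np.linalg.cholesky(K + 1e-8 * np.eye(n))  # small jitter for stability
    params = np.column_stack([
        m + s * (L @ rng.standard_normal(n)) for m, s in zip(mean, sd)
    ])
    curves = np.array([logistic_growth(days, *p) for p in params])
    heights = curves + noise_sd * rng.standard_normal(curves.shape)
    return params, heights

rng = np.random.default_rng(0)
days = np.arange(0, 100)
params, heights = simulate_varieties(np.eye(5), days, rng)
```

With an identity matrix for `K` the varieties are treated as unrelated; passing a marker-derived relationship matrix makes related varieties receive correlated growth parameters.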
Affiliation(s)
- Maud Delattre
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Yusuke Toda
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Jessica Tressou
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
  - Paris-Saclay University-AgroParisTech-INRAE, UMR MIA-Paris-Saclay, Palaiseau, France
- Hiroyoshi Iwata
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
3
Carvalho HF, Rio S, García-Abadillo J, Isidro Y Sánchez J. Revisiting superiority and stability metrics of cultivar performances using genomic data: derivations of new estimators. PLANT METHODS 2024; 20:85. [PMID: 38844940 PMCID: PMC11155189 DOI: 10.1186/s13007-024-01207-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 05/08/2024] [Indexed: 06/10/2024]
Abstract
The selection of highly productive genotypes with stable performance across environments is a major challenge of plant breeding programs due to genotype-by-environment (GE) interactions. Over the years, different metrics have been proposed that aim at characterizing the superiority and/or stability of genotype performance across environments. However, these metrics are traditionally estimated using phenotypic values only and are not well suited to an unbalanced design in which genotypes are not observed in all environments. The objective of this research was to propose and evaluate new estimators of the following GE metrics: Ecovalence, Environmental Variance, Finlay-Wilkinson regression coefficient, and Lin-Binns superiority measure. Drawing from a multi-environment genomic prediction model, we derived the best linear unbiased prediction for each GE metric. These derivations included both a squared expectation and a variance term. To assess the effectiveness of our new estimators, we conducted simulations that varied in traits and environment parameters. In our results, new estimators consistently outperformed traditional phenotype-based estimators in terms of accuracy. By incorporating a variance term into our new estimators, in addition to the squared expectation term, we were able to improve the precision of our estimates, particularly for Ecovalence in situations where heritability was low and/or sparseness was high. All methods are implemented in a new R-package: GEmetrics. These genomic-based estimators enable estimating GE metrics in unbalanced designs and predicting GE metrics for new genotypes, which should help improve the selection efficiency of high-performance and stable genotypes across environments.
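For orientation, the traditional phenotype-based versions of the four GE metrics named in this abstract can be sketched as follows. This is a minimal illustration on a balanced genotype-by-environment table using the textbook definitions; it is not the genomic BLUP-based estimators the paper derives and is unrelated to the GEmetrics package API. The function name `ge_metrics` and the toy data are hypothetical.

```python
import numpy as np

def ge_metrics(y):
    """Phenotype-based GE metrics for a balanced genotypes x environments table."""
    y = np.asarray(y, dtype=float)
    n_env = y.shape[1]
    geno_mean = y.mean(axis=1, keepdims=True)   # mean of each genotype
    env_mean = y.mean(axis=0, keepdims=True)    # mean of each environment
    grand = y.mean()

    # Wricke's ecovalence: sum of squared GE interaction residuals.
    ge = y - geno_mean - env_mean + grand
    ecovalence = (ge ** 2).sum(axis=1)

    # Environmental variance: variance of a genotype across environments.
    env_variance = ((y - geno_mean) ** 2).sum(axis=1) / (n_env - 1)

    # Finlay-Wilkinson slope: regression on the centered environment mean.
    env_index = (env_mean - grand).ravel()
    fw_slope = ((y - geno_mean) @ env_index) / (env_index @ env_index)

    # Lin-Binns superiority: mean squared distance to the best genotype
    # in each environment, divided by 2.
    best = y.max(axis=0, keepdims=True)
    superiority = ((y - best) ** 2).sum(axis=1) / (2 * n_env)

    return {
        "ecovalence": ecovalence,
        "environmental_variance": env_variance,
        "finlay_wilkinson": fw_slope,
        "lin_binns": superiority,
    }

# Toy table: 3 genotypes (rows) observed in 3 environments (columns).
metrics = ge_metrics([[1, 2, 3], [2, 4, 6], [3, 3, 3]])
```

These formulas need every genotype in every environment, which is exactly the limitation with unbalanced designs that the abstract points out.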
Affiliation(s)
- Humberto Fanelli Carvalho
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA)-Universidad Politécnica de Madrid (UPM)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
- Simon Rio
  - CIRAD, UMR AGAP Institut, 34398, Montpellier, France
  - UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Julian García-Abadillo
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA)-Universidad Politécnica de Madrid (UPM)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
- Julio Isidro Y Sánchez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA)-Universidad Politécnica de Madrid (UPM)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
4
Montesinos-López OA, Herr AW, Crossa J, Montesinos-López A, Carter AH. Enhancing winter wheat prediction with genomics, phenomics and environmental data. BMC Genomics 2024; 25:544. [PMID: 38822262 PMCID: PMC11143639 DOI: 10.1186/s12864-024-10438-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 05/21/2024] [Indexed: 06/02/2024] Open
Abstract
In the realm of multi-environment prediction, when the goal is to predict a complete environment using the others as a training set, the efficiency of genomic selection (GS) falls short of expectations. Genotype by environment interaction poses a challenge in achieving high prediction accuracies. Consequently, current efforts are focused on enhancing efficiency by integrating various types of inputs, such as phenomics data, environmental information, and other omics data. In this study, we sought to evaluate the impact of incorporating environmental information into the modeling process, in addition to genomic and phenomics information. Our evaluation encompassed five data sets of soft white winter wheat, and the results revealed a significant improvement in prediction accuracy, as measured by the normalized root mean square error (NRMSE), through the integration of environmental information. Notably, there was an average gain in prediction accuracy of 49.19% in terms of NRMSE across the data sets. Moreover, the observed prediction accuracy ranged from 5.68% (data set 3) to 60.36% (data set 4), underscoring the substantial effect of integrating environmental information. By including genomic, phenomic, and environmental data in prediction models, plant breeding programs can improve selection efficiency across locations.
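The evaluation setting described here (predict one complete environment from the others, score with NRMSE) can be sketched as below. The abstract does not state how the RMSE is normalized, so dividing by the mean of the observed values is an assumption; the function names `nrmse` and `leave_one_environment_out` are hypothetical helpers, not part of any cited software.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE divided by the mean observed value (normalization is an assumption)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.mean(y_true)

def leave_one_environment_out(env_labels):
    """Yield (environment, train_idx, test_idx), holding out one environment at a time."""
    env_labels = np.asarray(env_labels)
    for env in np.unique(env_labels):
        is_test = env_labels == env
        yield env, np.flatnonzero(~is_test), np.flatnonzero(is_test)

# Each fold trains on all other environments and tests on the held-out one.
folds = list(leave_one_environment_out(["A", "A", "B", "C"]))
```

Lower NRMSE is better, so an improvement from adding environmental information shows up as a smaller value on the held-out environment.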
Affiliation(s)
- Andrew W Herr
  - Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Edo. de México, CP 52640, México
  - Universidad de Guadalajara, Montecillos, Edo. de México, CP 56230, México
- Arron H Carter
  - Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
5
Montesinos-López OA, Crespo-Herrera L, Pierre CS, Cano-Paez B, Huerta-Prado GI, Mosqueda-González BA, Ramos-Pulido S, Gerard G, Alnowibet K, Fritsche-Neto R, Montesinos-López A, Crossa J. Feature engineering of environmental covariates improves plant genomic-enabled prediction. FRONTIERS IN PLANT SCIENCE 2024; 15:1349569. [PMID: 38812738 PMCID: PMC11135473 DOI: 10.3389/fpls.2024.1349569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 04/11/2024] [Indexed: 05/31/2024]
Abstract
Introduction: Because genomic selection (GS) is a predictive methodology, it needs to guarantee high prediction accuracies for practical implementations. However, since many factors affect the prediction performance of this methodology, its practical implementation still needs to be improved in many breeding programs. For this reason, many strategies have been explored to improve the prediction performance of this methodology. Methods: When environmental covariates are incorporated as inputs in the genomic prediction models, this information only sometimes helps increase prediction performance. For this reason, this investigation explores the use of feature engineering on the environmental covariates to enhance the prediction performance of genomic prediction models. Results and discussion: We found that across data sets, feature engineering helps reduce prediction error, relative to including the environmental covariates without feature engineering, by 761.625% across predictors. These results are very promising regarding the potential of feature engineering to enhance prediction accuracy. However, since a significant gain in prediction accuracy was observed in only some data sets, further research is required to guarantee a robust feature engineering strategy to incorporate the environmental covariates.
Affiliation(s)
- Carolina Saint Pierre
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
- Bernabe Cano-Paez
  - Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), México City, Mexico
- Sofia Ramos-Pulido
  - Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Guillermo Gerard
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
- Khalid Alnowibet
  - Department of Statistics and Operations Research, King Saud University, Riyadh, Saudi Arabia
- Abelardo Montesinos-López
  - Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
  - Louisiana State University, Baton Rouge, LA, United States
  - Distinguished Scientist Fellowship Program, King Saud University, Riyadh, Saudi Arabia
  - Instituto de Socioeconomia, Estadistica e Informatica, Colegio de Postgraduados, Montecillos, Edo. de México, Texcoco, Mexico
6
Peixoto MA, Leach KA, Jarquin D, Flannery P, Zystro J, Tracy WF, Bhering L, Resende MFR. Utilizing genomic prediction to boost hybrid performance in a sweet corn breeding program. FRONTIERS IN PLANT SCIENCE 2024; 15:1293307. [PMID: 38726298 PMCID: PMC11080654 DOI: 10.3389/fpls.2024.1293307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 03/26/2024] [Indexed: 05/12/2024]
Abstract
Sweet corn breeding programs, like field corn, focus on the development of elite inbred lines to produce commercial hybrids. For this reason, genomic selection models can help the in silico prediction of hybrid crosses from the elite lines, which is hypothesized to improve the test cross scheme, leading to higher genetic gain in a breeding program. This study aimed to explore the potential of implementing genomic selection in a sweet corn breeding program through hybrid prediction in a within-site across-year and across-site framework. A total of 506 hybrids were evaluated in six environments (California, Florida, and Wisconsin, in the years 2020 and 2021). A total of 20 traits from three different groups were measured (plant-, ear-, and flavor-related traits) across the six environments. Eight statistical models were considered for prediction, as the combination of two genomic prediction models (GBLUP and RKHS) with two different kernels (additive and additive + dominance), and in a single- and multi-trait framework. Also, three different cross-validation schemes were tested (CV1, CV0, and CV00). The different models were then compared based on the correlation between the estimated breeding values/total genetic values and phenotypic measurements. Overall, heritabilities and correlations varied among the traits. The models implemented showed good accuracies for trait prediction. The GBLUP implementation outperformed RKHS in all cross-validation schemes and models. Models with additive plus dominance kernels presented a slight improvement over the models with only additive kernels for some of the models examined. In addition, models for within-site across-year and across-site performed better in the CV0 than the CV00 scheme, on average. Hence, GBLUP should be considered as a standard model for sweet corn hybrid prediction. 
In addition, we found that the implementation of genomic prediction in a sweet corn breeding program presented reliable results, which can improve the testcross stage by identifying the top candidates that will reach advanced field-testing stages.
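As background for the GBLUP model this abstract recommends, the following is a minimal additive-only sketch: a VanRaden-style genomic relationship matrix followed by a ridge-type solve for unobserved individuals. It assumes markers coded 0/1/2 and a known shrinkage value `lam` (the ratio of residual to genetic variance), which in practice would be estimated. The dominance kernel, multi-trait models, and cross-validation schemes from the study are not covered, and the function names `vanraden_g` and `gblup_predict` are hypothetical.

```python
import numpy as np

def vanraden_g(markers):
    """Additive genomic relationship matrix from an n x m matrix coded 0/1/2."""
    M = np.asarray(markers, dtype=float)
    p = M.mean(axis=0) / 2.0                 # allele frequency per marker
    Z = M - 2.0 * p                          # center each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train_idx, test_idx, lam):
    """Predict test individuals from training phenotypes.

    lam is the assumed ratio of residual to genetic variance; the overall
    mean is estimated as the plain training mean for simplicity.
    """
    y = np.asarray(y, dtype=float)
    mu = y[train_idx].mean()
    G_tt = G[np.ix_(train_idx, train_idx)]   # training x training block
    G_st = G[np.ix_(test_idx, train_idx)]    # test x training block
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)),
                            y[train_idx] - mu)
    return mu + G_st @ alpha

# Toy data: 20 individuals, 50 markers, random phenotypes.
rng = np.random.default_rng(1)
markers = rng.integers(0, 3, size=(20, 50))
y = rng.standard_normal(20)
G = vanraden_g(markers)
pred = gblup_predict(G, y, np.arange(15), np.arange(15, 20), lam=1.0)
```

Prediction accuracy in the sense used by the abstract would then be the correlation between `pred` and the observed phenotypes of the held-out individuals.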
Affiliation(s)
- Marco Antônio Peixoto
  - Laboratório de Biometria, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
  - Department of Horticultural Sciences, University of Florida, Gainesville, FL, United States
- Kristen A. Leach
  - Department of Horticultural Sciences, University of Florida, Gainesville, FL, United States
- Diego Jarquin
  - Department of Agronomy, University of Florida, Gainesville, FL, United States
- Patrick Flannery
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, United States
- Jared Zystro
  - Organic Seed Alliance, Port Townsend, WA, United States
- William F. Tracy
  - Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, United States
- Leonardo Bhering
  - Laboratório de Biometria, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Márcio F. R. Resende
  - Department of Horticultural Sciences, University of Florida, Gainesville, FL, United States
7
Araújo MS, Chaves SFS, Dias LAS, Ferreira FM, Pereira GR, Bezerra ARG, Alves RS, Heinemann AB, Breseghello F, Carneiro PCS, Krause MD, Costa-Neto G, Dias KOG. GIS-FA: an approach to integrating thematic maps, factor-analytic, and envirotyping for cultivar targeting. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:80. [PMID: 38472532 DOI: 10.1007/s00122-024-04579-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 02/06/2024] [Indexed: 03/14/2024]
Abstract
KEY MESSAGE: We propose an "enviromics" prediction model for recommending cultivars based on thematic maps aimed at decision-makers. Parsimonious methods that capture genotype-by-environment interaction (GEI) in multi-environment trials (MET) are important in breeding programs. Understanding the causes and factors of GEI allows the utilization of genotype adaptations in the target population of environments through environmental features and factor-analytic (FA) models. Here, we present a novel predictive breeding approach called GIS-FA, which integrates geographic information systems (GIS) techniques, FA models, partial least squares (PLS) regression, and enviromics to predict phenotypic performance in untested environments. The GIS-FA approach enables: (i) the prediction of the phenotypic performance of tested genotypes in untested environments, (ii) the selection of the best-ranking genotypes based on their overall performance and stability using the FA selection tools, and (iii) the creation of thematic maps showing overall or pairwise performance and stability for decision-making. We exemplify the usage of the GIS-FA approach using two datasets of rice [Oryza sativa (L.)] and soybean [Glycine max (L.) Merr.] in MET spread over tropical areas. In summary, our novel predictive method allows the identification of new breeding scenarios by pinpointing groups of environments where genotypes demonstrate superior predicted performance. It also facilitates and optimizes cultivar recommendations by utilizing thematic maps.
Affiliation(s)
- Maurício S Araújo
  - Department of Agronomy, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Saulo F S Chaves
  - Department of Agronomy, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Luiz A S Dias
  - Department of Agronomy, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Filipe M Ferreira
  - Department of Crop Science - College of Agricultural Sciences, São Paulo State University, Botucatu, São Paulo, Brazil
- Guilherme R Pereira
  - Department of Agronomy, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Rodrigo S Alves
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Alexandre B Heinemann
  - Brazilian Agricultural Research Corporation (Embrapa Rice and Beans), Santo Antônio de Goiás, Goiás, Brazil
- Flávio Breseghello
  - Brazilian Agricultural Research Corporation (Embrapa Rice and Beans), Santo Antônio de Goiás, Goiás, Brazil
- Pedro C S Carneiro
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Kaio O G Dias
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
8
Zhang Y, Zhang N, Chai X, Sun T. Machine learning for image-based multi-omics analysis of leaf veins. JOURNAL OF EXPERIMENTAL BOTANY 2023; 74:4928-4941. [PMID: 37410807 DOI: 10.1093/jxb/erad251] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/01/2023] [Accepted: 06/29/2023] [Indexed: 07/08/2023]
Abstract
Veins are a critical component of the plant growth and development system, playing an integral role in supporting and protecting leaves, as well as transporting water, nutrients, and photosynthetic products. A comprehensive understanding of the form and function of veins requires a dual approach that combines plant physiology with cutting-edge image recognition technology. The latest advancements in computer vision and machine learning have facilitated the creation of algorithms that can identify vein networks and explore their developmental progression. Here, we review the functional, environmental, and genetic factors associated with vein networks, along with the current status of research on image analysis. In addition, we discuss the methods of venous phenotype extraction and multi-omics association analysis using machine learning technology, which could provide a theoretical basis for improving crop productivity by optimizing the vein network architecture.
Affiliation(s)
- Yubin Zhang
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Ning Zhang
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Xiujuan Chai
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Tan Sun
  - Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing, China
  - Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
9
Messina CD, Gho C, Hammer GL, Tang T, Cooper M. Two decades of harnessing standing genetic variation for physiological traits to improve drought tolerance in maize. JOURNAL OF EXPERIMENTAL BOTANY 2023; 74:4847-4861. [PMID: 37354091 PMCID: PMC10474595 DOI: 10.1093/jxb/erad231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 06/15/2023] [Indexed: 06/26/2023]
Abstract
We review approaches to maize breeding for improved drought tolerance during flowering and grain filling in the central and western US corn belt and place our findings in the context of results from public breeding. Here we show that after two decades of dedicated breeding efforts, the rate of crop improvement under drought increased from 6.2 g m⁻² year⁻¹ to 7.5 g m⁻² year⁻¹, closing the genetic gain gap with respect to the 8.6 g m⁻² year⁻¹ observed under water-sufficient conditions. The improvement relative to the long-term genetic gain was possible by harnessing favourable alleles for physiological traits available in the reference population of genotypes. Experimentation in managed stress environments that maximized the genetic correlation with target environments was key for breeders to identify and select for these alleles. We also show that the embedding of physiological understanding within genomic selection methods via crop growth models can hasten genetic gain under drought. We estimate a prediction accuracy differential (Δr) above current prediction approaches of ~30% (Δr=0.11, r=0.38), which increases with increasing complexity of the trait environment system as estimated by Shannon information theory. We propose this framework to inform breeding strategies for drought stress across geographies and crops.
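The ~30% figure is consistent with a simple ratio, under the assumption (not stated explicitly in the abstract) that r = 0.38 is the accuracy of current prediction approaches and Δr = 0.11 is the absolute gain over it:

```latex
\[
\frac{\Delta r}{r} = \frac{0.11}{0.38} \approx 0.29 \quad (\text{about } 30\%),
\qquad
r + \Delta r = 0.38 + 0.11 = 0.49 .
\]
```

Under the alternative reading, in which 0.38 is the improved accuracy, the baseline would be 0.27 and the relative gain about 41%, which matches the stated ~30% less well.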
Affiliation(s)
- Carlos D Messina
  - Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, Brisbane, Qld 4072, Australia
- Carla Gho
  - School of Agriculture & Food Sciences, The University of Queensland, Brisbane, Qld 4072, Australia
- Graeme L Hammer
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, Brisbane, Qld 4072, Australia
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, Qld 4072, Australia
- Tom Tang
  - Corteva Agrisciences, Johnston, IA, USA
- Mark Cooper
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, Brisbane, Qld 4072, Australia
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, Qld 4072, Australia
10
Mora-Poblete F, Maldonado C, Henrique L, Uhdre R, Scapim CA, Mangolim CA. Multi-trait and multi-environment genomic prediction for flowering traits in maize: a deep learning approach. FRONTIERS IN PLANT SCIENCE 2023; 14:1153040. [PMID: 37593046 PMCID: PMC10428628 DOI: 10.3389/fpls.2023.1153040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Accepted: 07/12/2023] [Indexed: 08/19/2023]
Abstract
Maize (Zea mays L.), the third most widely cultivated cereal crop in the world, plays a critical role in global food security. To improve the efficiency of selecting superior genotypes in breeding programs, researchers have aimed to identify key genomic regions that impact agronomic traits. In this study, the performance of multi-trait, multi-environment deep learning models was compared to that of Bayesian models (Markov Chain Monte Carlo generalized linear mixed models (MCMCglmm), Bayesian Genomic Genotype-Environment Interaction (BGGE), and Bayesian Multi-Trait and Multi-Environment (BMTME)) in terms of the prediction accuracy of flowering-related traits (Anthesis-Silking Interval: ASI, Female Flowering: FF, and Male Flowering: MF). A tropical maize panel of 258 inbred lines from Brazil was evaluated in three sites (Cambira-2018, Sabaudia-2018, and Iguatemi-2020 and 2021) using approximately 290,000 single nucleotide polymorphisms (SNPs). The results demonstrated a 14.4% increase in prediction accuracy when employing multi-trait models compared to the use of a single trait in a single environment approach. The accuracy of predictions also improved by 6.4% when using a single trait in a multi-environment scheme compared to using multi-trait analysis. Additionally, deep learning models consistently outperformed Bayesian models in both single and multiple trait and environment approaches. A complementary genome-wide association study identified associations with 26 candidate genes related to flowering time traits, and 31 marker-trait associations were identified, accounting for 37%, 37%, and 22% of the phenotypic variation of ASI, FF and MF, respectively. In conclusion, our findings suggest that deep learning models have the potential to significantly improve the accuracy of predictions, regardless of the approach used and provide support for the efficacy of this method in genomic selection for flowering-related traits in tropical maize.
Affiliation(s)
- Carlos Maldonado
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Luma Henrique
  - Department of Agronomy, State University of Maringá, Paraná, Brazil
- Renan Uhdre
  - Department of Agronomy, State University of Maringá, Paraná, Brazil
11
Montesinos-López OA, Crespo-Herrera L, Saint Pierre C, Bentley AR, de la Rosa-Santamaria R, Ascencio-Laguna JA, Agbona A, Gerard GS, Montesinos-López A, Crossa J. Do feature selection methods for selecting environmental covariables enhance genomic prediction accuracy? Front Genet 2023; 14:1209275. [PMID: 37554404 PMCID: PMC10405933 DOI: 10.3389/fgene.2023.1209275] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/20/2023] [Accepted: 07/03/2023] [Indexed: 08/10/2023] Open
Abstract
Genomic selection (GS) is transforming plant and animal breeding, but its practical implementation for complex traits and multi-environmental trials remains challenging. To address this issue, this study investigates the integration of environmental information with genotypic information in GS. The study proposes the use of two feature selection methods (Pearson's correlation and Boruta) for the integration of environmental information. Results indicate that the simple incorporation of environmental covariates may increase or decrease prediction accuracy depending on the case. However, optimal incorporation of environmental covariates using feature selection significantly improves prediction accuracy in four out of six datasets, by between 14.25% and 218.71% under a leave-one-environment-out cross-validation scenario in terms of Normalized Root Mean Squared Error, but no relevant gain was observed in terms of Pearson's correlation. In two datasets where environmental covariates are unrelated to the response variable, feature selection is unable to enhance prediction accuracy. Therefore, the study provides empirical evidence supporting the use of feature selection to improve the prediction power of GS.
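A correlation-based filter of the kind named in this abstract can be sketched as follows. The abstract does not give the selection rule, so keeping covariates whose absolute Pearson correlation with the response exceeds a fixed threshold is an assumption, as are the threshold value and the function name `select_covariates_by_correlation`. The filter is meant to be fitted on training environments only, so the held-out environment does not leak into the selection. Boruta is not shown.

```python
import numpy as np

def select_covariates_by_correlation(X_train, y_train, threshold=0.5):
    """Return indices of covariates with |Pearson r| >= threshold, plus all r values.

    X_train: samples x covariates matrix of environmental covariates.
    y_train: response for the same training samples.
    Constant covariates get r = 0 instead of a division-by-zero result.
    """
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.flatnonzero(np.abs(r) >= threshold), r

# Toy example: covariate 0 tracks the response, covariate 2 is constant.
X = [[1, 5, 0], [2, 3, 0], [3, 6, 0], [4, 1, 0]]
y = [2, 4, 6, 8]
selected, r = select_covariates_by_correlation(X, y, threshold=0.9)
```

When no covariate is related to the response, as in two of the six datasets mentioned above, such a filter returns few or no columns and cannot be expected to help.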
Affiliation(s)
- Alison R. Bentley
  - International Maize and Wheat Improvement Center (CIMMYT), El Battan, Mexico
- Afolabi Agbona
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
  - Molecular & Environmental Plant Sciences, Texas A&M University, College Station, TX, United States
- Guillermo S. Gerard
  - International Maize and Wheat Improvement Center (CIMMYT), El Battan, Mexico
- Abelardo Montesinos-López
  - Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, JA, Mexico
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), El Battan, Mexico
  - Colegio de Postgraduados, Campus Montecillos, Montecillos, Mexico
12
Li Z, Gutierrez L. Editorial: Statistical methods for analyzing multiple environmental quantitative genomic data. Front Genet 2023; 14:1212804. [PMID: 37404327 PMCID: PMC10316013 DOI: 10.3389/fgene.2023.1212804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 06/09/2023] [Indexed: 07/06/2023]
Affiliation(s)
- Zitong Li
- CSIRO Agriculture and Food, Canberra, ACT, Australia
- Lucia Gutierrez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI, United States
|
13
|
Montesinos-López A, Rivera C, Pinto F, Piñera F, Gonzalez D, Reynolds M, Pérez-Rodríguez P, Li H, Montesinos-López OA, Crossa J. Multimodal deep learning methods enhance genomic prediction of wheat breeding. G3 (BETHESDA, MD.) 2023; 13:jkad045. [PMID: 36869747 PMCID: PMC10151399 DOI: 10.1093/g3journal/jkad045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 02/21/2023] [Accepted: 02/22/2023] [Indexed: 03/05/2023]
Abstract
While several statistical machine learning methods have been developed and studied for assessing the genomic prediction (GP) accuracy of unobserved phenotypes in plant breeding research, few methods have linked genomics and phenomics (imaging). Deep learning (DL) neural networks have been developed to increase the GP accuracy of unobserved phenotypes while simultaneously accounting for the complexity of genotype-environment interaction (GE); however, unlike conventional GP models, DL has not been investigated for when genomics is linked with phenomics. In this study we used 2 wheat data sets (DS1 and DS2) to compare a novel DL method with conventional GP models. Models fitted for DS1 were GBLUP, gradient boosting machine (GBM), support vector regression (SVR) and the DL method. Results indicated that for 1 year, DL provided better GP accuracy than results obtained by the other models. However, GP accuracy obtained for other years indicated that the GBLUP model was slightly superior to the DL. DS2 is comprised only of genomic data from wheat lines tested for 3 years, 2 environments (drought and irrigated) and 2-4 traits. DS2 results showed that when predicting the irrigated environment with the drought environment, DL had higher accuracy than the GBLUP model in all analyzed traits and years. When predicting drought environment with information on the irrigated environment, the DL model and GBLUP model had similar accuracy. The DL method used in this study is novel and presents a strong degree of generalization as several modules can potentially be incorporated and concatenated to produce an output for a multi-input data structure.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, Mexico
- Carolina Rivera
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Francisco Pinto
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Francisco Piñera
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- David Gonzalez
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Mathew Reynolds
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Huihui Li
- Institute of Crop Sciences, The National Key Facility for Crop Gene Resources and Genetic Improvement and CIMMYT China office, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Colegio de Postgraduados, Montecillos, Edo. de México, CP 56230, Mexico
|
14
|
Montesinos-López OA, Herr AW, Crossa J, Carter AH. Genomics combined with UAS data enhances prediction of grain yield in winter wheat. Front Genet 2023; 14:1124218. [PMID: 37065497 PMCID: PMC10090417 DOI: 10.3389/fgene.2023.1124218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 03/17/2023] [Indexed: 03/31/2023]
Abstract
With the human population continuing to increase worldwide, there is pressure to employ novel technologies to increase genetic gain in plant breeding programs that contribute to nutrition and food security. Genomic selection (GS) has the potential to increase genetic gain because it can accelerate the breeding cycle, increase the accuracy of estimated breeding values, and improve selection accuracy. However, with recent advances in high throughput phenotyping in plant breeding programs, the opportunity to integrate genomic and phenotypic data to increase prediction accuracy is present. In this paper, we applied GS to winter wheat data integrating two types of inputs: genomic and phenotypic. We observed the best accuracy of grain yield when combining both genomic and phenotypic inputs, while only using genomic information fared poorly. In general, the predictions with only phenotypic information were very competitive to using both sources of information, and in many cases using only phenotypic information provided the best accuracy. Our results are encouraging because it is clear we can enhance the prediction accuracy of GS by integrating high quality phenotypic inputs in the models.
Affiliation(s)
- Andrew W. Herr
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de México, México
- Colegio de Postgraduados, Montecillos, Edo. de México, México
- Arron H. Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- *Correspondence: Arron H. Carter,
|
15
|
Fradgley NS, Bacon J, Bentley AR, Costa-Neto G, Cottrell A, Crossa J, Cuevas J, Kerton M, Pope E, Swarbreck SM, Gardner KA. Prediction of near-term climate change impacts on UK wheat quality and the potential for adaptation through plant breeding. GLOBAL CHANGE BIOLOGY 2023; 29:1296-1313. [PMID: 36482280 PMCID: PMC10108302 DOI: 10.1111/gcb.16552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 11/17/2022] [Accepted: 11/29/2022] [Indexed: 05/26/2023]
Abstract
Wheat is a major crop worldwide, mainly cultivated for human consumption and animal feed. Grain quality is paramount in determining its value and downstream use. While we know that climate change threatens global crop yields, a better understanding of impacts on wheat end-use quality is also critical. Combining quantitative genetics with climate model outputs, we investigated UK-wide trends in genotypic adaptation for wheat quality traits. In our approach, we augmented genomic prediction models with environmental characterisation of field trials to predict trait values and climate effects in historical field trial data between 2001 and 2020. Addition of environmental covariates, such as temperature and rainfall, successfully enabled prediction of genotype by environment interactions (G × E), and increased prediction accuracy of most traits for new genotypes in new year cross validation. We then extended predictions from these models to much larger numbers of simulated environments using climate scenarios projected under Representative Concentration Pathways 8.5 for 2050-2069. We found geographically varying climate change impacts on wheat quality due to contrasting associations between specific weather covariables and quality traits across the UK. Notably, negative impacts on quality traits were predicted in the East of the UK due to increased summer temperatures while the climate in the North and South-west may become more favourable with increased summer temperatures. Furthermore, by projecting 167,040 simulated future genotype-environment combinations, we found only limited potential for breeding to exploit predictable G × E to mitigate year-to-year environmental variability for most traits except Hagberg falling number. This suggests low adaptability of current UK wheat germplasm across future UK climates. More generally, approaches demonstrated here will be critical to enable adaptation of global crops to near-term climate change.
Affiliation(s)
- Alison R. Bentley
- NIAB, Cambridge, UK
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Jaime Cuevas
- Universidad Autonoma del Estado de Quintana Roo, Chetumal, Quintana Roo, Mexico
- Keith A. Gardner
- NIAB, Cambridge, UK
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
|
16
|
Nguyen VH, Morantte RIZ, Lopena V, Verdeprado H, Murori R, Ndayiragije A, Katiyar SK, Islam MR, Juma RU, Flandez-Galvez H, Glaszmann JC, Cobb JN, Bartholomé J. Multi-environment Genomic Selection in Rice Elite Breeding Lines. RICE (NEW YORK, N.Y.) 2023; 16:7. [PMID: 36752880 PMCID: PMC9908796 DOI: 10.1186/s12284-023-00623-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/04/2022] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Assessing the performance of elite lines in target environments is essential for breeding programs to select the most relevant genotypes. One of the main complexities in this task resides in accounting for the genotype by environment interactions. Genomic prediction models that integrate information from multi-environment trials and environmental covariates can be efficient tools in this context. The objective of this study was to assess the predictive ability of different genomic prediction models to optimize the use of multi-environment information. We used 111 elite breeding lines representing the diversity of the International Rice Research Institute breeding program for irrigated ecosystems. The lines were evaluated for three traits (days to flowering, plant height, and grain yield) in 15 environments in Asia and Africa and genotyped with 882 SNP markers. We evaluated the efficiency of genomic prediction to predict untested environments using seven multi-environment models and three cross-validation scenarios. RESULTS The elite lines were found to belong to the indica group and more specifically the indica-1B subgroup which gathered improved material originating from the Green Revolution. Phenotypic correlations between environments were high for days to flowering and plant height (33% and 54% of pairwise correlations greater than 0.5) but low for grain yield (lower than 0.2 in most cases). Clustering analyses based on environmental covariates separated Asia's and Africa's environments into different clusters or subclusters. The predictive abilities ranged from 0.06 to 0.79 for days to flowering, 0.25 to 0.88 for plant height, and -0.29 to 0.62 for grain yield. We found that models integrating genotype-by-environment interaction effects did not perform significantly better than models integrating only main effects (genotypes and environment or environmental covariates). The different cross-validation scenarios showed that, in most cases, the use of all available environments gave better results than a subset. CONCLUSION Multi-environment genomic prediction models with main effects were sufficient for accurate phenotypic prediction of elite lines in targeted environments. These results will help refine the testing strategy to update the genomic prediction models to improve predictive ability.
Affiliation(s)
- Van Hieu Nguyen
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Institute of Crop Science, College of Agriculture and Food Science, University of the Philippines, Los Baños, Laguna, Philippines
- Rose Imee Zhella Morantte
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Vitaliano Lopena
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Holden Verdeprado
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Rosemary Murori
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Alexis Ndayiragije
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Sanjay Kumar Katiyar
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Md Rafiqul Islam
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Roselyne Uside Juma
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Hayde Flandez-Galvez
- Institute of Crop Science, College of Agriculture and Food Science, University of the Philippines, Los Baños, Laguna, Philippines
- Jean-Christophe Glaszmann
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Joshua N Cobb
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- RiceTec. Inc, PO Box 1305, Alvin, TX, 77512, USA
- Jérôme Bartholomé
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- CIRAD, UMR AGAP Institut, Cali, Colombia
- Alliance Bioversity-CIAT, Cali, Colombia
|
17
|
Gevartosky R, Carvalho HF, Costa-Neto G, Montesinos-López OA, Crossa J, Fritsche-Neto R. Enviromic-based kernels may optimize resource allocation with multi-trait multi-environment genomic prediction for tropical Maize. BMC PLANT BIOLOGY 2023; 23:10. [PMID: 36604618 PMCID: PMC9814176 DOI: 10.1186/s12870-022-03975-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND Success in any genomic prediction platform is directly dependent on establishing a representative training set. This is a complex task, even in single-trait single-environment conditions, and tends to be even more intricate when additional information from envirotyping and correlated traits is considered. Here, we aimed to design optimized training sets focused on genomic prediction, considering multi-trait multi-environment trials, and to assess how those methods may increase accuracy while reducing phenotyping costs. For that, we considered single-trait multi-environment trials and multi-trait multi-environment trials for three traits (grain yield, plant height, and ear height), two datasets, and two cross-validation schemes. Next, two strategies for designing optimized training sets were conceived: the first considering only the genomic by environment by trait interaction (GET), and the second including large-scale environmental data (W, enviromics) as genomic by enviromic by trait interaction (GWT). The effective number of individuals (genotypes × environments × traits) was assumed to be the number representing at least 98% of each kernel (GET or GWT) variation; those individuals were then selected by a genetic algorithm based on prediction error variance criteria to compose an optimized training set for genomic prediction purposes. RESULTS The combined use of genomic and enviromic data efficiently designs optimized training sets for genomic prediction, improving the response to selection per dollar invested by up to 145% when compared to the model without enviromic data, and even more when compared to a cross-validation scheme with 70% of the training set or pure phenotypic selection. Prediction models that include G × E or enviromic data + G × E yielded better prediction ability. CONCLUSIONS Our findings indicate that a genomic by enviromic by trait interaction kernel associated with genetic algorithms is efficient and can be proposed as a promising approach to designing optimized training sets for genomic prediction when the variance-covariance matrix of traits is available. Additionally, great improvements in the genetic gains per dollar invested were observed, suggesting that a good allocation of resources can be deployed by using the proposed approach.
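The "effective number of individuals" criterion in the abstract above (enough individuals to represent at least 98% of a kernel's variation) can be read as a cumulative-eigenvalue cutoff. The sketch below follows that reading, which is an assumption: the abstract does not give the exact computation, and the toy kernels and their Kronecker-product combination are hypothetical stand-ins for the GET and GWT kernels. The genetic-algorithm selection step is not sketched.

```python
import numpy as np


def effective_number(K, explained=0.98):
    """Smallest number of leading eigenvalues of K whose sum reaches `explained` of the total."""
    eigvals = np.linalg.eigvalsh(K)[::-1]          # eigenvalues in descending order
    eigvals = np.clip(eigvals, 0.0, None)          # guard against tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n = int(np.searchsorted(cumulative, explained)) + 1
    return min(n, len(eigvals))


# Hypothetical toy kernels: 2 genotypes and 3 environments.
G = np.array([[1.0, 0.5],
              [0.5, 1.0]])
E = np.array([[1.0, 0.2, 0.1],
              [0.2, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
K = np.kron(E, G)                                  # assumed genotype-by-environment kernel
n_eff = effective_number(K)                        # number of components kept at 98%
```

The value of `n_eff` sets the size of the training set; which individuals fill it is decided separately, in the study by a genetic algorithm driven by prediction error variance.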
Affiliation(s)
- Raysa Gevartosky
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
- Humberto Fanelli Carvalho
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Germano Costa-Neto
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
- Institute for Genomics Diversity, Cornell University, Ithaca, NY, USA
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de México, Mexico
- Colegio de Postgraduados, CP 56230, Montecillos, Edo. de México, Mexico
- Roberto Fritsche-Neto
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
|
18
|
Nishio M, Inoue K, Arakawa A, Ichinoseki K, Kobayashi E, Okamura T, Fukuzawa Y, Ogawa S, Taniguchi M, Oe M, Takeda M, Kamata T, Konno M, Takagi M, Sekiya M, Matsuzawa T, Inoue Y, Watanabe A, Kobayashi H, Shibata E, Ohtani A, Yazaki R, Nakashima R, Ishii K. Application of linear and machine learning models to genomic prediction of fatty acid composition in Japanese Black cattle. Anim Sci J 2023; 94:e13883. [PMID: 37909231 DOI: 10.1111/asj.13883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 08/29/2023] [Accepted: 09/15/2023] [Indexed: 11/02/2023]
Abstract
We collected 3180 records of oleic acid (C18:1) and monounsaturated fatty acid (MUFA) measured using gas chromatography (GC) and 6960 records of C18:1 and MUFA measured using near-infrared spectroscopy (NIRS) in intermuscular fat samples of Japanese Black cattle. We compared genomic prediction performance for four linear models (genomic best linear unbiased prediction [GBLUP], kinship-adjusted multiple loci [KAML], BayesC, and BayesLASSO) and five machine learning models (Gaussian kernel [GK], deep kernel [DK], random forest [RF], extreme gradient boost [XGB], and convolutional neural network [CNN]). For GC-based C18:1 and MUFA, KAML showed the highest accuracies, followed by BayesC, XGB, DK, GK, and BayesLASSO, with more than 6% gain of accuracy by KAML over GBLUP. Meanwhile, DK had the highest prediction accuracy for NIRS-based C18:1 and MUFA, but the difference in accuracies between DK and KAML was slight. For all traits, accuracies of RF and CNN were lower than those of GBLUP. The KAML extends GBLUP methods, of which marker effects are weighted, and involves only additive genetic effects; whereas machine learning methods capture non-additive genetic effects. Thus, KAML is the most suitable method for breeding of fatty acid composition in Japanese Black cattle.
Affiliation(s)
- Motohide Nishio
- Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Keiichi Inoue
- National Livestock Breeding Center, Fukushima, Japan
- University of Miyazaki, Miyazaki, Japan
- Aisaku Arakawa
- Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Eiji Kobayashi
- Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Yo Fukuzawa
- Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Shinichiro Ogawa
- Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Mika Oe
- Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Takehiro Kamata
- Aomori Prefectural Industrial Technology Research Center, Tsugaru, Japan
- Masaru Konno
- Iwate Agricultural Research Center Animal Industry Research Institute, Takizawa, Japan
- Michihiro Takagi
- Miyagi Prefecture Animal Industry Experiment Station, Osaki, Japan
- Mario Sekiya
- Akita Prefectural Livestock Experiment Station, Daisen, Japan
- Tamotsu Matsuzawa
- Livestock Research Centre, Fukushima Agricultural Technology Centre, Fukushima, Japan
- Yoshinobu Inoue
- Tottori Prefectural Livestock Research Center, Tottori, Japan
- Hiroshi Kobayashi
- Institute of Animal Production Okayama Prefectural Technology Center for Agriculture, Forestry and Fisheries, Misaki, Japan
- Eri Shibata
- Hiroshima Prefectural Technology Research Institute, Livestock Technology Research Center, Shobara, Japan
- Akihumi Ohtani
- Yamaguchi Prefectural Agriculture and Forestry General Technology Center, Mine, Japan
- Ryu Yazaki
- Oita Prefectural Agriculture, Forestry, and Fisheries Research Center, Takeda, Japan
- Ryotaro Nakashima
- Cattle Breeding Development Institute of Kagoshima Prefecture, Soo, Japan
- Kazuo Ishii
- Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
|
19
|
Costa-Neto G, Crespo-Herrera L, Fradgley N, Gardner K, Bentley AR, Dreisigacker S, Fritsche-Neto R, Montesinos-López OA, Crossa J. Envirome-wide associations enhance multi-year genome-based prediction of historical wheat breeding data. G3 (BETHESDA, MD.) 2022; 13:6861853. [PMID: 36454213 PMCID: PMC9911085 DOI: 10.1093/g3journal/jkac313] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/10/2022] [Revised: 11/02/2022] [Accepted: 11/03/2022] [Indexed: 12/03/2022]
Abstract
Linking high-throughput environmental data (enviromics) to genomic prediction (GP) is a cost-effective strategy for increasing selection intensity under genotype-by-environment interactions (G × E). This study developed a data-driven approach based on Environment-Phenotype Association (EPA) aimed at recycling important G × E information from historical breeding data. EPA was developed in two applications: (1) scanning a secondary source of genetic variation, weighted from the shared reaction-norms of past-evaluated genotypes and (2) pinpointing weights of the similarity among trial-sites (locations), given the historical impact of each envirotyping data variable for a given site. These results were then used as a dimensionality reduction strategy, integrating historical data to feed multi-environment GP models, which led to the development of four new G × E kernels considering genomics, enviromics, and EPA outcomes. The wheat trial data used included 36 locations, 8 years, and three target populations of environments (TPEs) in India. Four prediction scenarios and six kernel models within/across TPEs were tested. Our results suggest that the conventional GBLUP, without enviromic data or when omitting EPA, is inefficient in predicting the performance of wheat lines in future years. Nevertheless, when EPA was introduced as an intermediary learning step to reduce the dimensionality of the G × E kernels while connecting phenotypic and environmental-wide variation, a significant enhancement of G × E prediction accuracy was evident. EPA revealed that the effect of seasonality makes strategies such as "covariable selection" unfeasible because G × E is year-germplasm specific. We propose that the EPA effectively serves as a "reinforcement learner" algorithm capable of uncovering the effect of seasonality over the reaction-norms, with the benefits of better forecasting the similarities between past and future trialing sites. 
EPA combines the benefits of dimensionality reduction while reducing the uncertainty of genotype-by-year predictions and increasing the resolution of GP for the genotype-specific level.
Affiliation(s)
- Germano Costa-Neto
- Institute for Genomics Diversity, Cornell University, Ithaca, NY 14853, USA
- Leonardo Crespo-Herrera
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Nick Fradgley
- NIAB, 93 Lawrence Weaver Road, Cambridge CB3 0LE, UK
- Keith Gardner
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Alison R Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Osval A Montesinos-López
- Corresponding authors: Facultad de Telematica, Universidad de Colima, Mexico. ; and International Maize and Wheat Improvement Center (CIMMYT) and Colegio de Post-Graduados, Mexico.
- Jose Crossa
- Corresponding authors: Facultad de Telematica, Universidad de Colima, Mexico. ; and International Maize and Wheat Improvement Center (CIMMYT) and Colegio de Post-Graduados, Mexico.
|
20
|
Yue H, Olivoto T, Bu J, Li J, Wei J, Xie J, Chen S, Peng H, Nardino M, Jiang X. Multi-trait selection for mean performance and stability of maize hybrids in mega-environments delineated using envirotyping techniques. FRONTIERS IN PLANT SCIENCE 2022; 13:1030521. [PMID: 36452111 PMCID: PMC9702090 DOI: 10.3389/fpls.2022.1030521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 10/26/2022] [Indexed: 06/17/2023]
Abstract
Under global climate changes, understanding climate variables that are most associated with environmental kinships can contribute to improving the success of hybrid selection, mainly in environments with high climate variations. The main goal of this study is to integrate envirotyping techniques and multi-trait selection for mean performance and the stability of maize genotypes growing in the Huanghuaihai plain in China. A panel of 26 maize hybrids growing in 10 locations in two crop seasons was evaluated for 9 traits. Considering 20 years of climate information and 19 environmental covariables, we identified four mega-environments (ME) in the Huanghuaihai plain which grouped locations that share similar long-term weather patterns. All the studied traits were significantly affected by the genotype × mega-environment × year interaction, suggesting that evaluating maize stability using single-year, multi-environment trials may provide misleading recommendations. Counterintuitively, the highest yields were not observed in the locations with higher accumulated rainfall, leading to the hypothesis that lower vapor pressure deficit, minimum temperatures, and high relative humidity are climate variables that -under no water restriction- reduce plant transpiration and consequently the yield. Utilizing the multi-trait mean performance and stability index (MTMPS) prominent hybrids with satisfactory mean performance and stability across cultivation years were identified. G23 and G25 were selected within three out of the four mega-environments, being considered the most stable and widely adapted hybrids from the panel. The G5 showed satisfactory yield and stability across contrasting years in the drier, warmer, and with higher vapor pressure deficit mega-environment, which included locations in the Hubei province. 
Overall, this study opens the door to a more systematic and dynamic characterization of the environment to better understand the genotype-by-environment interaction in multi-environment trials.
Affiliation(s)
- Haiwang Yue
- Hebei Provincial Key Laboratory of Crops Drought Resistance Research, Dryland Farming Institute, Hebei Academy of Agriculture and Forestry Sciences, Hengshui, China
- Tiago Olivoto
- Department of Plant Science, Center of Agrarian Sciences, Federal University of Santa Catarina, Florianópolis, SC, Brazil
- Junzhou Bu
- Hebei Provincial Key Laboratory of Crops Drought Resistance Research, Dryland Farming Institute, Hebei Academy of Agriculture and Forestry Sciences, Hengshui, China
- Jie Li
- Hebei Provincial Key Laboratory of Crops Drought Resistance Research, Dryland Farming Institute, Hebei Academy of Agriculture and Forestry Sciences, Hengshui, China
- Jianwei Wei
- Hebei Provincial Key Laboratory of Crops Drought Resistance Research, Dryland Farming Institute, Hebei Academy of Agriculture and Forestry Sciences, Hengshui, China
- Junliang Xie
- Hebei Provincial Key Laboratory of Crops Drought Resistance Research, Dryland Farming Institute, Hebei Academy of Agriculture and Forestry Sciences, Hengshui, China
- Shuping Chen
- Hebei Provincial Key Laboratory of Crops Drought Resistance Research, Dryland Farming Institute, Hebei Academy of Agriculture and Forestry Sciences, Hengshui, China
- Haicheng Peng
- Hebei Provincial Key Laboratory of Crops Drought Resistance Research, Dryland Farming Institute, Hebei Academy of Agriculture and Forestry Sciences, Hengshui, China
- Maicon Nardino
- Department of Agronomy, Federal University of Viçosa, Viçosa, MG, Brazil
- Xuwen Jiang
- Maize Research Institute, Qingdao Agricultural University, Qingdao, China
|
21
|
Ma J, Cao Y, Wang Y, Ding Y. Development of the maize 5.5K loci panel for genomic prediction through genotyping by target sequencing. FRONTIERS IN PLANT SCIENCE 2022; 13:972791. [PMID: 36438102 PMCID: PMC9691890 DOI: 10.3389/fpls.2022.972791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Accepted: 10/24/2022] [Indexed: 06/16/2023]
Abstract
Genotyping platforms are important for genetic research and molecular breeding. In this study, a low-density genotyping platform containing 5.5K SNP markers was successfully developed in maize using genotyping by target sequencing (GBTS) technology with capture-in-solution. Two maize populations (Pop1 and Pop2) were used to validate the GBTS panel for genetic and molecular breeding studies. Pop1 comprised 942 hybrids derived from 250 inbred lines and four testers, and Pop2 contained 540 hybrids generated from 123 newly developed inbred lines and eight testers. The genetic analyses showed that the average polymorphic information content and genetic diversity values ranged from 0.27 to 0.38 in both populations using all filtered genotyping data. The mean missing rate was 1.23% across populations. The Structure and UPGMA tree analyses revealed similar genetic divergences (76-89%) in both populations. Genomic prediction analyses showed that the prediction accuracy of reproducing kernel Hilbert space (RKHS) was slightly lower than that of genomic best linear unbiased prediction (GBLUP) and three Bayesian methods for general combining ability of grain yield per plant and three yield-related traits in both populations, whereas RKHS with additive effects showed superior advantages over the other four methods in Pop1. In Pop1, the GBLUP and three Bayesian methods with the additive-dominance model improved the prediction accuracies by 4.89-134.52% for the four traits in comparison to the additive model. In Pop2, the inclusion of dominance did not improve the accuracy in most cases. In general, low accuracies (0.33-0.43) were achieved for general combining ability of the four traits in Pop1, whereas moderate-to-high accuracies (0.52-0.65) were observed in Pop2. For hybrid performance prediction, the accuracies were moderate to high (0.51-0.75) for the four traits in both populations using the additive-dominance model.
This study suggests a reliable genotyping platform that can be implemented in genomic selection-assisted breeding to accelerate the development and improvement of new maize cultivars.
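As an aid to reading the panel statistics in this abstract, the following minimal sketch shows how polymorphic information content (PIC), gene diversity, and missing rate are commonly computed for a single biallelic SNP. It assumes genotypes are coded 0/1/2 (count of the alternate allele) with None marking a missing call; the function names are illustrative and are not taken from the paper, whose exact pipeline is not described here.

```python
def allele_freqs(genotypes):
    """Return [ref_freq, alt_freq] from 0/1/2 genotype codes, ignoring missing (None) calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    alt = sum(called) / (2 * len(called))  # each individual carries two alleles
    return [1.0 - alt, alt]


def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return gene_diversity(freqs) - cross


def missing_rate(genotypes):
    """Fraction of individuals without a genotype call at this locus."""
    return sum(g is None for g in genotypes) / len(genotypes)


if __name__ == "__main__":
    snp = [0, 1, 2, 1, None]  # hypothetical calls for five individuals
    freqs = allele_freqs(snp)
    print(freqs, gene_diversity(freqs), pic(freqs), missing_rate(snp))
```

Averaging these per-locus values over all markers gives panel-level summaries of the kind reported; for a biallelic marker, PIC cannot exceed 0.375 and gene diversity cannot exceed 0.5.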
|
22
|
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOLECULAR PLANT 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Xingping Zhang
  - Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Huihui Li
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
- Hongjian Zheng
  - CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
- Jianan Zhang
  - MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
- Michael S Olsen
  - CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Rajeev K Varshney
  - State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
- Boddupalli M Prasanna
  - CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Qian Qian
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
|
23
|
Westhues CC, Simianer H, Beissinger TM. learnMET: an R package to apply machine learning methods for genomic prediction using multi-environment trial data. G3 GENES|GENOMES|GENETICS 2022; 12:6705235. [PMID: 36124944 PMCID: PMC9635651 DOI: 10.1093/g3journal/jkac226] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/26/2022] [Accepted: 07/29/2022] [Indexed: 12/04/2022]
Abstract
We introduce the R package learnMET, developed as a flexible framework to enable a collection of analyses on multi-environment trial breeding data with machine learning-based models. learnMET allows the combination of genomic information with environmental data such as climate and/or soil characteristics. Notably, the package can incorporate weather data from field weather stations or retrieve global meteorological datasets from a NASA database. Daily weather data can be aggregated over specific periods of time based on naive (for instance, nonoverlapping 10-day windows) or phenological approaches. Different machine learning methods for genomic prediction are implemented, including gradient-boosted decision trees, random forests, stacked ensemble models, and multilayer perceptrons. These prediction models can be evaluated in a user-friendly way via a collection of cross-validation schemes that mimic typical scenarios encountered by plant breeders working with multi-environment trial experimental data. The package is published under an MIT license and accessible on GitHub.
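The "naive" aggregation mentioned in this abstract (nonoverlapping 10-day windows) can be illustrated with a minimal sketch. This is not the learnMET API, which is an R package; the function name, its parameters, and the choice of the mean as the summary statistic are assumptions made purely for illustration.

```python
from statistics import mean


def aggregate_windows(daily_values, window=10, drop_incomplete=True):
    """Average a daily series over consecutive, nonoverlapping windows.

    daily_values: one numeric value per day (e.g. daily mean temperature).
    window: window length in days (10 in the example from the abstract).
    drop_incomplete: if True, a trailing window shorter than `window` is discarded.
    """
    summaries = []
    for start in range(0, len(daily_values), window):
        chunk = daily_values[start:start + window]
        if len(chunk) < window and drop_incomplete:
            break  # only the final chunk can be short
        summaries.append(mean(chunk))
    return summaries


if __name__ == "__main__":
    season = list(range(25))  # hypothetical 25-day series
    print(aggregate_windows(season))                         # two full windows
    print(aggregate_windows(season, drop_incomplete=False))  # plus the 5-day remainder
```

Each window summary then becomes one environmental covariate per trial; a phenological approach would replace the fixed window boundaries with growth-stage dates.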
Affiliation(s)
- Cathy C Westhues
  - Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, 37075 Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
- Henner Simianer
  - Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
  - Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, 37075 Goettingen, Germany
- Timothy M Beissinger
  - Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, 37075 Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
|
24
|
Montesinos-López OA, Montesinos-López A, Cano-Paez B, Hernández-Suárez CM, Santana-Mancilla PC, Crossa J. A Comparison of Three Machine Learning Methods for Multivariate Genomic Prediction Using the Sparse Kernels Method (SKM) Library. Genes (Basel) 2022; 13:genes13081494. [PMID: 36011405 PMCID: PMC9407886 DOI: 10.3390/genes13081494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/10/2022] [Accepted: 08/19/2022] [Indexed: 11/30/2022]
Abstract
Genomic selection (GS) changed the way plant breeders select genotypes. GS takes advantage of phenotypic and genotypic information to train a statistical machine learning model, which is used to predict phenotypic (or breeding) values of new lines for which only genotypic information is available. Therefore, many statistical machine learning methods have been proposed for this task. Multi-trait (MT) genomic prediction models take advantage of correlated traits to improve prediction accuracy. Therefore, some multivariate statistical machine learning methods are popular for GS. In this paper, we compare the prediction performance of three MT methods: the MT genomic best linear unbiased predictor (GBLUP), the MT partial least squares (PLS) and the multi-trait random forest (RF) methods. Benchmarking was performed with six real datasets. We found that the three investigated methods produce similar results, but under predictors with genotype (G) and environment (E), that is, E + G, the MT GBLUP achieved superior performance, whereas under predictors E + G + genotype × environment (GE) and G + GE, random forest achieved the best results. We also found that the best predictions were achieved under the predictors E + G and E + G + GE. Here, we also provide the R code for the implementation of these three statistical machine learning methods in the sparse kernel method (SKM) library, which offers not only options for single-trait prediction with various statistical machine learning methods but also some options for MT predictions that can help to better capture the complex patterns that are common in genomic selection datasets.
Affiliation(s)
- Abelardo Montesinos-López
  - Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44100, Mexico
  - Correspondence: (A.M.-L.); (J.C.)
- Bernabe Cano-Paez
  - Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), México City 04510, Mexico
- Carlos Moisés Hernández-Suárez
  - Instituto de Ciencias Tecnología e Innovación, Universidad Francisco Gavidia, El Progreso St., No. 2748, Colonia Flor Blanca, San Salvador CP 1101, El Salvador
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco 56237, Mexico
  - Colegio de Postgraduados, Montecillo 56230, Mexico
  - Correspondence: (A.M.-L.); (J.C.)
|
25
|
Montesinos-López OA, Montesinos-López A, Kismiantini, Roman-Gallardo A, Gardner K, Lillemo M, Fritsche-Neto R, Crossa J. Partial Least Squares Enhances Genomic Prediction of New Environments. Front Genet 2022; 13:920689. [PMID: 36313422 PMCID: PMC9608852 DOI: 10.3389/fgene.2022.920689] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/14/2022] [Accepted: 05/19/2022] [Indexed: 12/01/2022]
Abstract
In plant breeding, improving the prediction of future seasons or new locations and/or environments, a scenario also denoted as “leave one environment out,” is of paramount importance to increase the genetic gain in breeding programs and contribute to food and nutrition security worldwide. Genomic selection (GS) has the potential to increase the accuracy of predictions for future seasons or new locations because it is a predictive methodology. However, most statistical machine learning methods used for the task of predicting a new environment or season struggle to produce moderate or high prediction accuracies. For this reason, in this study we explore the use of the partial least squares (PLS) regression methodology for this specific task, and we benchmark its performance against the Bayesian Genomic Best Linear Unbiased Predictor (GBLUP) method. The benchmarking process was done with 14 real datasets. We found that in all datasets the PLS method outperformed the popular GBLUP method by margins between 0% (in the Indica data) and 228.28% (in the Disease data) across traits, environments, and types of predictors. Our results provide strong empirical evidence of the power of the PLS methodology for the prediction of future seasons or new environments.
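The "leave one environment out" scheme named in this abstract can be sketched as a simple split generator: every record from one environment is held out for testing while a model is trained on all remaining environments. The function name and the toy labels below are hypothetical; the sketch covers only the partitioning step and does not reproduce the PLS or GBLUP models compared in the study.

```python
def leave_one_environment_out(environments):
    """Yield (held_out_env, train_indices, test_indices) for each distinct environment.

    environments: one environment label per phenotypic record, in record order.
    """
    for env in sorted(set(environments)):
        train_idx = [i for i, e in enumerate(environments) if e != env]
        test_idx = [i for i, e in enumerate(environments) if e == env]
        yield env, train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical records: five observations spread over three environments.
    env_labels = ["E1", "E1", "E2", "E3", "E2"]
    for env, train_idx, test_idx in leave_one_environment_out(env_labels):
        print(f"held out {env}: train on {train_idx}, test on {test_idx}")
```

Because no record from the held-out environment is seen during training, the resulting accuracy reflects prediction of a genuinely new environment, which is the harder scenario the abstract refers to.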
|
26
|
Bustos-Korts D, Boer MP, Layton J, Gehringer A, Tang T, Wehrens R, Messina C, de la Vega AJ, van Eeuwijk FA. Identification of environment types and adaptation zones with self-organizing maps; applications to sunflower multi-environment data in Europe. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:2059-2082. [PMID: 35524815 PMCID: PMC9205840 DOI: 10.1007/s00122-022-04098-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/22/2021] [Accepted: 04/07/2022] [Indexed: 06/14/2023]
Abstract
We evaluate self-organizing maps (SOM) to identify adaptation zones and visualize multi-environment genotypic responses. We apply SOM to multiple traits and crop growth model output of large-scale European sunflower data. Genotype-by-environment interactions (G × E) complicate the selection of well-adapted varieties. A possible solution is to group trial locations into adaptation zones with G × E occurring mainly between zones. By selecting for good performance inside those zones, response to selection is increased. In this paper, we present a two-step procedure to identify adaptation zones that starts from a self-organizing map (SOM). In the SOM, trials across locations and years are assigned to groups, called units, that are organized on a two-dimensional grid. Units that are further apart contain more distinct trials. In an iterative process of reweighting trial contributions to units, the grid configuration is learnt simultaneously with the trial assignment to units. An aggregation of the units in the SOM by hierarchical clustering then produces environment types, i.e. trials with similar growing conditions. Adaptation zones can subsequently be identified by grouping trial locations with similar distributions of environment types across years. For the construction of SOMs, multiple data types can be combined. We compared environment types and adaptation zones obtained for European sunflower from quantitative traits like yield, oil content, phenology and disease scores with those obtained from environmental indices calculated with the crop growth model Sunflo. We also show how results are affected by input data organization and user-defined weights for genotypes and traits. Adaptation zones for European sunflower as identified by our SOM-based strategy captured substantial genotype-by-location interaction and pointed to trials in Spain, Turkey and South Bulgaria as inducing different genotypic responses.
Affiliation(s)
- Daniela Bustos-Korts
  - Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
- Martin P Boer
  - Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
- Jamie Layton
  - Corteva Agriscience, Ferme Barbara - 265, Route de Boutoli, 82700, Montech, France
- Anke Gehringer
  - Corteva Agriscience, Ferme Barbara - 265, Route de Boutoli, 82700, Montech, France
- Tom Tang
  - Corteva Agriscience, 7300 62nd Avenue, Johnston, IA, 50131, USA
- Ron Wehrens
  - Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
- Charlie Messina
  - Corteva Agriscience, 7300 62nd Avenue, Johnston, IA, 50131, USA
  - Horticultural Sciences Department, University of Florida, 2550 Hull Rd, Gainesville, FL, 32611, USA
- Fred A van Eeuwijk
  - Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
|
27
|
Atanda SA, Govindan V, Singh R, Robbins KR, Crossa J, Bentley AR. Sparse testing using genomic prediction improves selection for breeding targets in elite spring wheat. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:1939-1950. [PMID: 35348821 PMCID: PMC9205816 DOI: 10.1007/s00122-022-04085-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Accepted: 03/16/2022] [Indexed: 06/08/2023]
Abstract
Sparse testing using genomic prediction can be efficiently used to increase the number of testing environments while maintaining selection intensity in the early yield testing stage without increasing the breeding budget. Sparse testing using genomic prediction enables expanded use of selection environments in early-stage yield testing without increasing phenotyping cost. We evaluated different sparse testing strategies in the yield testing stage of a CIMMYT spring wheat breeding pipeline characterized by multiple populations, each with small family sizes of 1-9 individuals. Our results indicated that a substantial overlap between lines across environments should be used to achieve optimal prediction accuracy. As sparse testing leverages information generated within and across environments, the genetic correlations between environments and genomic relationships of lines across environments were the main drivers of prediction accuracy in multi-environment yield trials. Including information from previous evaluation years did not consistently improve the prediction performance. Genomic best linear unbiased prediction was found to be the best predictor of true breeding value, and therefore, we propose that it should be used as a selection decision metric in the early yield testing stages. We also propose it as a proxy for assessing prediction performance that mirrors breeders' advancement decisions, so that it can be readily applied by breeding programs.
Affiliation(s)
- Velu Govindan
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Ravi Singh
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Kelly R Robbins
  - Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, USA
- Jose Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Alison R Bentley
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
|
28
|
Heritable and Climatic Sources of Variation in Juvenile Tree Growth in an Austrian Common Garden Experiment of Central European Norway Spruce Populations. FORESTS 2022. [DOI: 10.3390/f13050809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Abstract
We leveraged publicly available data on juvenile tree height of 299 Central European Norway spruce populations grown in a common garden experiment across 24 diverse trial locations in Austria and weather data from the trial locations and population provenances to parse the heritable and climatic components of juvenile tree height variation. Principal component analysis of geospatial and weather variables demonstrated high interannual variation among trial environments, largely driven by differences in precipitation, and separation of population provenances based on altitude, temperature, and snowfall. Tree height was highly heritable and modeling the covariance between populations and trial environments based on climatic data led to more stable estimation of heritability and population × environment variance. Climatic similarity among population provenances was highly predictive of population × environment estimates for tree height.
|
29
|
Galli G, Sabadin F, Yassue RM, Galves C, Carvalho HF, Crossa J, Montesinos-López OA, Fritsche-Neto R. Automated Machine Learning: A Case Study of Genomic "Image-Based" Prediction in Maize Hybrids. FRONTIERS IN PLANT SCIENCE 2022; 13:845524. [PMID: 35321444 PMCID: PMC8936805 DOI: 10.3389/fpls.2022.845524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2021] [Accepted: 02/03/2022] [Indexed: 06/14/2023]
Abstract
Machine learning methods such as multilayer perceptrons (MLP) and convolutional neural networks (CNN) have emerged as promising methods for genomic prediction (GP). In this context, we assess the performance of MLP and CNN on regression and classification tasks in a case study with maize hybrids. The genomic information was provided to the MLP as a relationship matrix and to the CNN as "genomic images." In the regression task, the machine learning models were compared with GBLUP. In the classification task, MLP and CNN were compared with each other. In this case, the traits (plant height and grain yield) were discretized so as to create balanced (moderate selection intensity) and unbalanced (extreme selection intensity) datasets for further evaluations. An automatic hyperparameter search for MLP and CNN was performed, and the best models were reported. For both task types, several metrics were calculated under a validation scheme to assess the effect of the prediction method and other variables. Overall, MLP and CNN presented results competitive with GBLUP. We also bring new insights on automated machine learning for genomic prediction and its implications for plant breeding.
Affiliation(s)
- Giovanni Galli
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Felipe Sabadin
  - School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
- Rafael Massahiro Yassue
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Cassia Galves
  - Department of Food Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Jose Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Roberto Fritsche-Neto
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
  - International Rice Research Institute (IRRI), Los Baños, Philippines
|
30
|
Including dominance effects in the prediction model through locus-specific weights on heterozygous genotypes can greatly improve genomic predictive abilities. Heredity (Edinb) 2022; 128:154-158. [PMID: 35132207 PMCID: PMC8897419 DOI: 10.1038/s41437-022-00504-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/11/2021] [Revised: 01/18/2022] [Accepted: 01/19/2022] [Indexed: 11/29/2022]
Abstract
The dominance effect is considered to be a key factor affecting complex traits. However, previous studies have shown that the improvement of the model, including the dominance effect, is usually less than 1%. This study proposes a novel genomic prediction method called CADM, which combines additive and dominance genetic effects through locus-specific weights on heterozygous genotypes. To the best of our knowledge, this is the first study of weighting dominance effects for genomic prediction. This method was applied to the analysis of chicken (511 birds) and pig (3534 animals) datasets. A 5-fold cross-validation method was used to evaluate the genomic predictive ability. The CADM model was compared with typical models considering additive and dominance genetic effects (ADM) and the model considering only additive genetic effects (AM). Based on the chicken data, using the CADM model, the genomic predictive abilities were improved for all three traits (body weight at 12th week, eviscerating percentage, and breast muscle percentage), and the average improvement in prediction accuracy was 27.1% compared with the AM model, while the ADM model was not better than the AM model. Based on the pig data, the CADM model increased the genomic predictive ability for all the three pig traits (trait names are masked, here designated as T1, T2, and T3), with an average increase of 26.3%, and the ADM model did not improve, or even slightly decreased, compared with the AM model. The results indicate that dominant genetic variation is one of the important sources of phenotypic variation, and the novel prediction model significantly improves the accuracy of genomic prediction.
|
31
|
Cañas-Gutiérrez GP, Sepulveda-Ortega S, López-Hernández F, Navas-Arboleda AA, Cortés AJ. Inheritance of Yield Components and Morphological Traits in Avocado cv. Hass From "Criollo" "Elite Trees" via Half-Sib Seedling Rootstocks. FRONTIERS IN PLANT SCIENCE 2022; 13:843099. [PMID: 35685008 PMCID: PMC9171141 DOI: 10.3389/fpls.2022.843099] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/24/2021] [Accepted: 02/10/2022] [Indexed: 05/11/2023]
Abstract
Grafting induces precocity and maintains clonal integrity in fruit tree crops. However, the complex rootstock × scion interaction often precludes understanding how the tree phenotype is shaped, limiting the potential to select optimum rootstocks. Therefore, it is necessary to assess (1) how seedling progenies inherit trait variation from elite "plus trees", and (2) whether such family superiority may be transferred after grafting to the clonal scion. To bridge this gap, we quantified additive genetic parameters (i.e., narrow-sense heritability, h², and genetic-estimated breeding values, GEBVs) across landraces, "criollo", "plus trees" of the super-food fruit tree crop avocado (Persea americana Mill.), and their open-pollinated (OP) half-sib seedling families. Specifically, we used a genomic best linear unbiased prediction (G-BLUP) model to merge phenotypic characterization of 17 morpho-agronomic traits with genetic screening of 13 highly polymorphic SSR markers in a diverse panel of 104 avocado "criollo" "plus trees." Estimated additive genetic parameters were validated at a 5-year-old common garden trial (i.e., provenance test), in which 22 OP half-sib seedlings from 82 elite "plus trees" served as rootstocks for the cv. Hass clone. Heritability (h²) scores in the "criollo" "plus trees" ranged from 0.28 to 0.51. The highest h² values were observed for ribbed petiole and adaxial veins with 0.47 (95% CI 0.2-0.8) and 0.51 (CI 0.2-0.8), respectively. The h² scores for the agronomic traits ranged from 0.34 (CI 0.2-0.6) to 0.39 (CI 0.2-0.6) for seed weight, fruit weight, and total volume, respectively. When inspecting yield variation across 5-year-old grafted avocado cv. Hass trees with elite OP half-sib seedling rootstocks, the traits total number of fruits and fruits' weight, respectively, exhibited h² scores of 0.36 (± 0.23) and 0.11 (± 0.09).
Our results indicate that elite "criollo" "plus trees" may serve as promising donors of seedling rootstocks for avocado cv. Hass orchards due to the inheritance of their outstanding trait values. This reinforces the feasibility of leveraging natural variation from "plus trees" via OP half-sib seedling rootstock families. By jointly estimating half-sib family effects and rootstock-mediated heritability, this study promises to boost seedling rootstock breeding programs, while better discerning the consequences of grafting in fruit tree crops.
Affiliation(s)
- Gloria Patricia Cañas-Gutiérrez
  - Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Rionegro, Colombia
  - Corporation for Biological Research (CIB), Unit of Phytosanity and Biological Control, Medellín, Colombia
  - *Correspondence: Gloria Patricia Cañas-Gutiérrez
- Stella Sepulveda-Ortega
  - Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Rionegro, Colombia
- Felipe López-Hernández
  - Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Rionegro, Colombia
- Andrés J. Cortés
  - Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Rionegro, Colombia
  - Andrés J. Cortés
|
32
|
Manthena V, Jarquín D, Howard R. Integrating and optimizing genomic, weather, and secondary trait data for multiclass classification. Front Genet 2022; 13:1032691. [PMID: 37065625 PMCID: PMC10090538 DOI: 10.3389/fgene.2022.1032691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 12/22/2022] [Indexed: 04/18/2023]
Abstract
Modern plant breeding programs collect several data types such as weather, images, and secondary or associated traits besides the main trait (e.g., grain yield). Genomic data are high-dimensional and often crowd out smaller data types when naively combined to explain the response variable. There is a need to develop methods able to effectively combine different data types of differing sizes to improve predictions. Additionally, in the face of changing climate conditions, there is a need to develop methods able to effectively combine weather information with genotype data to better predict the performance of lines. In this work, we develop a novel three-stage classifier to predict multi-class traits by combining three data types: genomic, weather, and secondary trait. The method addressed various challenges in this problem, such as confounding, differing sizes of data types, and threshold optimization. The method was examined in different settings, including binary and multi-class responses, various penalization schemes, and class balances. Then, our method was compared to standard machine learning methods such as random forests and support vector machines using various classification accuracy metrics and using model size to evaluate the sparsity of the model. The results showed that our method performed similarly to or better than machine learning methods across various settings. More importantly, the classifiers obtained were highly sparse, allowing for a straightforward interpretation of relationships between the response and the selected predictors.
Affiliation(s)
- Vamsi Manthena
  - Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, United States
- Diego Jarquín
  - Agronomy Department, University of Florida, Gainesville, FL, United States
- Reka Howard
  - Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, United States
  - *Correspondence: Reka Howard
|
33
|
Martins Oliveira IC, Bernardeli A, Soler Guilhen JH, Pastina MM. Genomic Prediction of Complex Traits in an Allogamous Annual Crop: The Case of Maize Single-Cross Hybrids. Methods Mol Biol 2022; 2467:543-567. [PMID: 35451790 DOI: 10.1007/978-1-0716-2205-6_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
For many plant and animal species, commercial products are hybrids between individuals from different genetic groups. For allogamous plant species such as maize, the breeding objective is to produce single-cross hybrid varieties from two inbred lines each selected in complementary groups. Efficient hybrid breeding requires methods that (1) quickly generate homozygous and homogeneous parental lines with high combining abilities, (2) efficiently choose among the large number of available parental lines the most promising ones, and (3) predict the performances of sets of non-phenotyped single-cross hybrids, or hybrids phenotyped in a limited number of environments, based on their relationship with another set of hybrids with known performances. The maize breeding community has been developing model-based prediction of hybrid performances well before the genomic era. This chapter (1) provides a reminder of the maize breeding scheme before the genomic era; (2) describes how genomic data were incorporated in the prediction models involved in different steps of genomic-based single-cross maize hybrid breeding; and (3) reviews factors affecting the accuracy of genomic prediction, approaches for optimizing GP-based single-cross maize hybrid breeding schemes, and ensuring the long-term sustainability of genomic selection.
Affiliation(s)
- Arthur Bernardeli
- Department of Agronomy, Universidade Federal de Viçosa, Viçosa-MG, Brazil
|
34
|
Crossa J, Montesinos-López OA, Pérez-Rodríguez P, Costa-Neto G, Fritsche-Neto R, Ortiz R, Martini JWR, Lillemo M, Montesinos-López A, Jarquin D, Breseghello F, Cuevas J, Rincent R. Genome and Environment Based Prediction Models and Methods of Complex Traits Incorporating Genotype × Environment Interaction. Methods Mol Biol 2022; 2467:245-283. [PMID: 35451779 DOI: 10.1007/978-1-0716-2205-6_9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 06/14/2023]
Abstract
Genomic-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. As opposed to animal breeding, plant breeding includes extensive multienvironment and multiyear field trial data. Hence, genomic-enabled prediction models should include genotype × environment (G × E) interaction, which most of the time increases the prediction performance when the responses of lines differ from environment to environment. In this chapter, we describe a historical timeline since 2012 related to advances of the GS models that take into account G × E interaction. We describe theoretical and practical aspects of those GS models, including the gains in prediction performance when including G × E structures for both complex continuous and categorical scale traits. We then detail and explain the main G × E genomic prediction models for complex traits measured on continuous and noncontinuous (categorical) scales. Related to G × E interaction models, this review also examines the analyses of the information generated with high-throughput phenotype data (phenomic) and the joint analyses of multitrait and multienvironment field trial data that are also employed in the general assessment of multitrait G × E interaction. The inclusion of nongenomic data in increasing the accuracy and biological reliability of the G × E approach is also outlined. We show the recent advances in large-scale envirotyping (enviromics), and how the use of mechanistic computational modeling can derive the crop growth and development aspects useful for predicting phenotypes and explaining G × E.
Affiliation(s)
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Colegio de Postgraduados, Montecillos, Mexico
- Germano Costa-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
- Roberto Fritsche-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Alnarp, Sweden
- Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Morten Lillemo
- Department of Plant Sciences, Norwegian University of Life Sciences, IHA/CIGENE, Ås, Norway
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Jaime Cuevas
- Universidad de Quintana Roo, Chetumal, Quintana Roo, Mexico
- Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France
|
35
|
Rogers AR, Holland JB. Environment-specific genomic prediction ability in maize using environmental covariates depends on environmental similarity to training data. G3 (BETHESDA, MD.) 2021; 12:6486423. [PMID: 35100364 PMCID: PMC9245610 DOI: 10.1093/g3journal/jkab440] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/09/2021] [Accepted: 12/06/2021] [Indexed: 12/30/2022]
Abstract
Technology advances have made possible the collection of a wealth of genomic, environmental, and phenotypic data for use in plant breeding. Incorporation of environmental data into environment-specific genomic prediction is hindered in part because of inherently high data dimensionality. Computationally efficient approaches to combining genomic and environmental information may facilitate extension of genomic prediction models to new environments and germplasm, and better understanding of genotype-by-environment (G × E) interactions. Using genomic, yield trial, and environmental data on 1,918 unique hybrids evaluated in 59 environments from the maize Genomes to Fields project, we determined that a set of 10,153 SNP dominance coefficients and a 5-day temporal window size for summarizing environmental variables were optimal for genomic prediction using only genetic and environmental main effects. Adding marker-by-environment variable interactions required dimension reduction, and we found that reducing dimensionality of the genetic data while keeping the full set of environmental covariates was best for environment-specific genomic prediction of grain yield, leading to an increase in prediction ability of 2.7% to achieve a prediction ability of 80% across environments when data were masked at random. We then measured how prediction ability within environments was affected under stratified training-testing sets to approximate scenarios commonly encountered by plant breeders, finding that incorporation of marker-by-environment effects improved prediction ability in cases where training and test sets shared environments, but did not improve prediction in new untested environments. The environmental similarity between training and testing sets had a greater impact on the efficacy of prediction than genetic similarity between training and test sets.
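The idea of summarizing environmental variables over a fixed temporal window can be illustrated with a minimal sketch. The abstract reports only that a 5-day window was optimal; the use of non-overlapping windows, the mean as the summary statistic, the function name `window_means`, and the temperature values below are all assumptions made for illustration, not the authors' procedure.

```python
from statistics import mean


def window_means(daily_values, window=5):
    """Average a daily series over consecutive, non-overlapping windows.

    The final window may be shorter than `window` if the series length
    is not a multiple of the window size (an assumption of this sketch).
    """
    return [
        mean(daily_values[start:start + window])
        for start in range(0, len(daily_values), window)
    ]


# Hypothetical daily mean temperatures (degrees C) for one environment.
daily_temp = [20.0, 22.0, 21.0, 23.0, 24.0,
              25.0, 26.0, 24.0, 23.0, 22.0,
              30.0]

# One covariate per 5-day window instead of one per day.
summaries = window_means(daily_temp, window=5)
```

Summarizing this way reduces eleven daily values to three covariates, which is the kind of dimensionality reduction on the environmental side that the abstract discusses.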
Affiliation(s)
- Anna R Rogers
- Program in Genetics, North Carolina State University, Raleigh, NC 27695, USA
- James B Holland
- Program in Genetics, North Carolina State University, Raleigh, NC 27695, USA
- USDA-ARS Plant Science Research Unit, North Carolina State University, Raleigh, NC 27695, USA
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Corresponding author: Department of Agriculture-Agriculture Research Service, Box 7620 North Carolina State University, Raleigh, NC 27695-7620, USA.
|
36
|
Westhues CC, Mahone GS, da Silva S, Thorwarth P, Schmidt M, Richter JC, Simianer H, Beissinger TM. Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks. FRONTIERS IN PLANT SCIENCE 2021; 12:699589. [PMID: 34880880 PMCID: PMC8647909 DOI: 10.3389/fpls.2021.699589] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/23/2021] [Accepted: 10/15/2021] [Indexed: 05/26/2023]
Abstract
The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.
Affiliation(s)
- Cathy C. Westhues
- Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, Goettingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
- Sofia da Silva
- Kleinwanzlebener Saatzucht (KWS) SAAT SE, Einbeck, Germany
- Malthe Schmidt
- Kleinwanzlebener Saatzucht (KWS) SAAT SE, Einbeck, Germany
- Henner Simianer
- Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
- Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, Goettingen, Germany
- Timothy M. Beissinger
- Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, Goettingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
|
37
|
Fonseca JMO, Klein PE, Crossa J, Pacheco A, Perez-Rodriguez P, Ramasamy P, Klein R, Rooney WL. Assessing combining abilities, genomic data, and genotype × environment interactions to predict hybrid grain sorghum performance. THE PLANT GENOME 2021; 14:e20127. [PMID: 34370387 DOI: 10.1002/tpg2.20127] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/08/2020] [Accepted: 06/08/2021] [Indexed: 05/02/2023]
Abstract
Genomic selection in maize (Zea mays L.) has been one factor that has increased the rate of genetic gain when compared with other cereals. However, the technological foundations in maize also exist in other cereal crops that would allow prediction of hybrid performance based on general (GCA) and specific (SCA) combining abilities applied through genomic-enabled prediction models. Further, the incorporation of genotype × environment (G × E) interaction effects presents an opportunity to deploy hybrids to targeted environments. To test these concepts, a factorial mating design of elite yet divergent grain sorghum lines generated hybrids for evaluation. Inbred parents were genotyped, and markers were used to assess population structure and develop the genomic relationship matrix (GRM). Grain yield, height, and days to anthesis were collected for hybrids in replicated trials, and best linear unbiased estimates were used to train classical GCA-SCA-based and genomic (GB) models under a hierarchical Bayesian framework. To incorporate population structure, GB was fitted using the GRM of both parents and hybrids. For GB models, G × E interaction effects were included by the Hadamard product between GRM and environments. A leave-one-out cross-validation scheme was used to study the prediction capacity of models. Classical and genomic models effectively predicted hybrid performance, and prediction accuracy increased by including genomic data. Genomic models effectively partitioned the variation due to GCA, SCA, and their interaction with the environment. A strategy to implement genomic selection for hybrid sorghum [Sorghum bicolor (L.) Moench] breeding is presented herein.
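The Hadamard-product construction of a G × E covariance mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 2 × 2 GRM values, the identity environment kernel (environments treated as independent), the observation layout, and the helper names `expand` and `hadamard` are all assumptions.

```python
def expand(kernel, index):
    """Expand a kernel to observation level: out[i][j] = kernel[index[i]][index[j]]."""
    return [[kernel[a][b] for b in index] for a in index]


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical genomic relationship matrix for two hybrids.
grm = [[1.0, 0.5],
       [0.5, 1.0]]

# Hypothetical environment kernel: identity, i.e. independent environments.
env = [[1.0, 0.0],
       [0.0, 1.0]]

# Four observations: each hybrid grown in each of two environments.
geno_of_obs = [0, 1, 0, 1]
env_of_obs = [0, 0, 1, 1]

K_g = expand(grm, geno_of_obs)   # genomic covariance among observations
K_e = expand(env, env_of_obs)    # environmental covariance among observations
K_ge = hadamard(K_g, K_e)        # G x E interaction covariance
```

With an identity environment kernel, `K_ge` is block diagonal: genomic relationships are retained within an environment and set to zero between environments, so interaction effects borrow information only among relatives grown in the same environment.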
Affiliation(s)
- Jales M O Fonseca
- Dep. of Soil and Crop Sciences, Texas A&M Univ., College Station, TX, 77843, USA
- Patricia E Klein
- Dep. of Horticultural Sciences, Texas A&M Univ., College Station, TX, 77843, USA
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), El Batán, Mexico
- Angela Pacheco
- International Maize and Wheat Improvement Center (CIMMYT), El Batán, Mexico
- Perumal Ramasamy
- Agriculture Research Center, Kansas State Univ., Hays, KS, 67601, USA
- Robert Klein
- Southern Plains Agricultural Research Center, USDA-ARS, College Station, TX, 77845, USA
- William L Rooney
- Dep. of Soil and Crop Sciences, Texas A&M Univ., College Station, TX, 77843, USA
|
38
|
Gianola D. Opinionated Views on Genome-Assisted Inference and Prediction During a Pandemic. FRONTIERS IN PLANT SCIENCE 2021; 12:717284. [PMID: 34421971 PMCID: PMC8377666 DOI: 10.3389/fpls.2021.717284] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/30/2021] [Accepted: 06/30/2021] [Indexed: 06/13/2023]
|
39
|
Costa-Neto G, Galli G, Carvalho HF, Crossa J, Fritsche-Neto R. EnvRtype: a software to interplay enviromics and quantitative genomics in agriculture. G3-GENES GENOMES GENETICS 2021; 11:6129777. [PMID: 33835165 PMCID: PMC8049414 DOI: 10.1093/g3journal/jkab040] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 12/22/2020] [Accepted: 01/21/2021] [Indexed: 11/13/2022]
Abstract
Envirotyping is an essential technique used to unfold the nongenetic drivers associated with the phenotypic adaptation of living organisms. Here, we introduce the EnvRtype R package, a novel toolkit developed to interplay large-scale envirotyping data (enviromics) into quantitative genomics. To start a user-friendly envirotyping pipeline, this package offers: (1) remote sensing tools for collecting (get_weather and extract_GIS functions) and processing ecophysiological variables (processWTH function) from raw environmental data at single locations or worldwide; (2) environmental characterization by typing environments and profiling descriptors of environmental quality (env_typing function), in addition to gathering environmental covariables as quantitative descriptors for predictive purposes (W_matrix function); and (3) identification of environmental similarity that can be used as an enviromic-based kernel (env_typing function) in whole-genome prediction (GP), aimed at increasing ecophysiological knowledge in genomic best-unbiased predictions (GBLUP) and emulating reaction norm effects (get_kernel and kernel_model functions). We highlight literature mining concepts in fine-tuning envirotyping parameters for each plant species and target growing environments. We show that envirotyping for predictive breeding collects raw data and processes it in an eco-physiologically smart way. Examples of its use for creating global-scale envirotyping networks and integrating reaction-norm modeling in GP are also outlined. We conclude that EnvRtype provides a cost-effective envirotyping pipeline capable of providing high quality enviromic data for a diverse set of genomic-based studies, especially for increasing accuracy in GP across untested growing environments.
Affiliation(s)
- Germano Costa-Neto
- Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Giovanni Galli
- Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Humberto Fanelli Carvalho
- Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- José Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico-Veracruz, El Batan Km. 45, CP 56237 Mexico; Colegio de Postgraduados, Montecillos, Edo. de Mexico, CP 56264, Mexico
- Roberto Fritsche-Neto
- Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Quantitative Genetics and Biometrics Cluster, International Rice Research Institute (IRRI), Los Baños, Philippines
|
40
|
Fritsche-Neto R, Galli G, Borges KLR, Costa-Neto G, Alves FC, Sabadin F, Lyra DH, Morais PPP, Braatz de Andrade LR, Granato I, Crossa J. Optimizing Genomic-Enabled Prediction in Small-Scale Maize Hybrid Breeding Programs: A Roadmap Review. FRONTIERS IN PLANT SCIENCE 2021; 12:658267. [PMID: 34276721 PMCID: PMC8281958 DOI: 10.3389/fpls.2021.658267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 05/10/2021] [Indexed: 06/13/2023]
Abstract
The usefulness of genomic prediction (GP) for many animal and plant breeding programs has been highlighted by many studies in the last 20 years. In maize breeding programs, mostly dedicated to delivering more highly adapted and productive hybrids, this approach has proved successful for both large- and small-scale breeding programs worldwide. Here, we present some of the strategies developed to improve the accuracy of GP in tropical maize, focusing on its use under low budget and small-scale conditions achieved for most of the hybrid breeding programs in developing countries. We highlight the most important outcomes obtained by the University of São Paulo (USP, Brazil) and how they can improve the accuracy of prediction in tropical maize hybrids. Our roadmap starts with the efforts for germplasm characterization, moving on to the practices for mating design, and the selection of the genotypes that are used to compose the training population in field phenotyping trials. Factors including population structure and the importance of non-additive effects (dominance and epistasis) controlling the desired trait are also outlined. Finally, we explain how the source of the molecular markers, environmental, and the modeling of genotype-environment interaction can affect the accuracy of GP. Results of 7 years of research in a public maize hybrid breeding program under tropical conditions are discussed, and with the great advances that have been made, we find that what is yet to come is exciting. The use of open-source software for the quality control of molecular markers, implementing GP, and envirotyping pipelines may reduce costs in an efficient computational manner. We conclude that exploring new models/tools using high-throughput phenotyping data along with large-scale envirotyping may bring more resolution and realism when predicting genotype performances. Despite the initial costs, mostly for genotyping, the GP platforms in combination with these other data sources can be a cost-effective approach for predicting the performance of maize hybrids for a large set of growing conditions.
Affiliation(s)
- Roberto Fritsche-Neto
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Giovanni Galli
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Karina Lima Reis Borges
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Germano Costa-Neto
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Filipe Couto Alves
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, United States
- Felipe Sabadin
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Danilo Hottis Lyra
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, United Kingdom
- Italo Granato
- Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux (LEPSE), Institut National de la Recherche Agronomique (INRA), Univ. Montpellier, SupAgro, Montpellier, France
- Jose Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz, Texcoco, Mexico
- Colegio de Posgraduado, Montecillo, Mexico
|
41
|
Li X, Guo T, Wang J, Bekele WA, Sukumaran S, Vanous AE, McNellie JP, Tibbs-Cortes LE, Lopes MS, Lamkey KR, Westgate ME, McKay JK, Archontoulis SV, Reynolds MP, Tinker NA, Schnable PS, Yu J. An integrated framework reinstating the environmental dimension for GWAS and genomic selection in crops. MOLECULAR PLANT 2021; 14:874-887. [PMID: 33713844 DOI: 10.1016/j.molp.2021.03.010] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 09/30/2020] [Revised: 02/03/2021] [Accepted: 03/09/2021] [Indexed: 05/08/2023]
Abstract
Identifying mechanisms and pathways involved in gene-environment interplay and phenotypic plasticity is a long-standing challenge. It is highly desirable to establish an integrated framework with an environmental dimension for complex trait dissection and prediction. A critical step is to identify an environmental index that is both biologically relevant and estimable for new environments. With extensive field-observed complex traits, environmental profiles, and genome-wide single nucleotide polymorphisms for three major crops (maize, wheat, and oat), we demonstrated that identifying such an environmental index (i.e., a combination of environmental parameter and growth window) enables genome-wide association studies and genomic selection of complex traits to be conducted with an explicit environmental dimension. Interestingly, genes identified for two reaction-norm parameters (i.e., intercept and slope) derived from flowering time values along the environmental index were less colocalized for a diverse maize panel than for wheat and oat breeding panels, agreeing with the different diversity levels and genetic constitutions of the panels. In addition, we showcased the usefulness of this framework for systematically forecasting the performance of diverse germplasm panels in new environments. This general framework and the companion CERIS-JGRA analytical package should facilitate biologically informed dissection of complex traits, enhanced performance prediction in breeding for future climates, and coordinated efforts to enrich our understanding of mechanisms underlying phenotypic variation.
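The two reaction-norm parameters named in the abstract (intercept and slope) can be illustrated by regressing one genotype's trait values on an environmental index. The sketch below uses ordinary least squares in plain Python; the index values, the flowering-time values, and the function name `reaction_norm` are hypothetical, and the intercept here is taken at an index value of zero, which may differ from the convention used in the CERIS-JGRA package.

```python
def reaction_norm(env_index, phenotype):
    """Least-squares intercept and slope of phenotype regressed on env_index."""
    n = len(env_index)
    mean_x = sum(env_index) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in env_index)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(env_index, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical environmental index for four environments
# (e.g. mean temperature within a growth window, in degrees C).
env_index = [10.0, 12.0, 14.0, 16.0]

# Hypothetical flowering time (days) of one genotype in those environments.
flowering = [80.0, 76.0, 72.0, 68.0]

intercept, slope = reaction_norm(env_index, flowering)

# Forecast for a new environment whose index can be estimated in advance.
predicted_new = intercept + slope * 13.0
```

Repeating the regression for every genotype yields two values per genotype, which is what allows the intercept and slope to be treated as traits in their own right for association mapping or genomic selection, as the abstract describes.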
Affiliation(s)
- Xianran Li
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Tingting Guo
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Jinyu Wang
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Wubishet A Bekele
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, Canada
- Sivakumar Sukumaran
- International Maize and Wheat Improvement Center (CIMMYT), Mexico City, Mexico
- Adam E Vanous
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- James P McNellie
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Marta S Lopes
- International Maize and Wheat Improvement Center (CIMMYT), Mexico City, Mexico
- Kendall R Lamkey
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Mark E Westgate
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- John K McKay
- Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO 80523, USA
- Matthew P Reynolds
- International Maize and Wheat Improvement Center (CIMMYT), Mexico City, Mexico
- Nicholas A Tinker
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, Canada
- Jianming Yu
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
|
42
|
Powell OM, Voss-Fels KP, Jordan DR, Hammer G, Cooper M. Perspectives on Applications of Hierarchical Gene-To-Phenotype (G2P) Maps to Capture Non-stationary Effects of Alleles in Genomic Prediction. FRONTIERS IN PLANT SCIENCE 2021; 12:663565. [PMID: 34149761 PMCID: PMC8211918 DOI: 10.3389/fpls.2021.663565] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Accepted: 04/13/2021] [Indexed: 05/26/2023]
Abstract
Genomic prediction of complex traits across environments, breeding cycles, and populations remains a challenge for plant breeding. A potential explanation for this is that underlying non-additive genetic (GxG) and genotype-by-environment (GxE) interactions generate allele substitution effects that are non-stationary across different contexts. Such non-stationary effects of alleles are either ignored or assumed to be implicitly captured by most gene-to-phenotype (G2P) maps used in genomic prediction. The implicit capture of non-stationary effects of alleles requires the G2P map to be re-estimated across different contexts. We discuss the development and application of hierarchical G2P maps that explicitly capture non-stationary effects of alleles and have successfully increased short-term prediction accuracy in plant breeding. These hierarchical G2P maps achieve increases in prediction accuracy by allowing intermediate processes such as other traits and environmental factors and their interactions to contribute to complex trait variation. However, long-term prediction remains a challenge. The plant breeding community should undertake complementary simulation and empirical experiments to interrogate various hierarchical G2P maps that connect GxG and GxE interactions simultaneously. The existing genetic correlation framework can be used to assess the magnitude of non-stationary effects of alleles and the predictive ability of these hierarchical G2P maps in long-term, multi-context genomic predictions of complex traits in plant breeding.
Affiliation(s)
- Owen M. Powell
- Queensland Alliance for Agriculture and Food Innovation, Centre for Crop Science, The University of Queensland, St Lucia, QLD, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, St Lucia, QLD, Australia
- Kai P. Voss-Fels
- Queensland Alliance for Agriculture and Food Innovation, Centre for Crop Science, The University of Queensland, St Lucia, QLD, Australia
- David R. Jordan
- Queensland Alliance for Agriculture and Food Innovation, Hermitage Research Facility, The University of Queensland, Warwick, QLD, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, St Lucia, QLD, Australia
- Graeme Hammer
- Queensland Alliance for Agriculture and Food Innovation, Centre for Crop Science, The University of Queensland, St Lucia, QLD, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, St Lucia, QLD, Australia
- Mark Cooper
- Queensland Alliance for Agriculture and Food Innovation, Centre for Crop Science, The University of Queensland, St Lucia, QLD, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, The University of Queensland, St Lucia, QLD, Australia
|
43
|
Smith DT, Potgieter AB, Chapman SC. Scaling up high-throughput phenotyping for abiotic stress selection in the field. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:1845-1866. [PMID: 34076731 DOI: 10.1007/s00122-021-03864-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/02/2021] [Accepted: 05/13/2021] [Indexed: 05/18/2023]
Abstract
High-throughput phenotyping (HTP) is in its infancy for deployment in large-scale breeding programmes. With the ability to measure correlated traits associated with physiological ideotypes, in-field phenotyping methods are available for screening of abiotic stress responses. As cropping environments become more hostile and unpredictable due to the effects of climate change, the need to characterise variability across spatial and temporal scales will become increasingly important. The sensor technologies that have enabled HTP from macroscopic through to satellite sensors may also be utilised here to complement spatial characterisation using envirotyping, which can improve estimations of genotypic performance across environments by better accounting for variation at the plot, trial and inter-trial levels. Climate change is leading to increased variation at all physical and temporal scales in the cropping environment. Maintaining yield stability under circumstances with greater levels of abiotic stress while capitalising upon yield potential in good years, requires approaches to plant breeding that target the physiological limitations to crop performance in specific environments. This requires dynamic modelling of conditions within target populations of environments, GxExM predictions, clustering of environments so breeding trajectories can be defined, and the development of screens that enable selection for genetic gain to occur. High-throughput phenotyping (HTP), combined with related technologies used for envirotyping, can help to address these challenges. Non-destructive analysis of the morphological, biochemical and physiological qualities of plant canopies using HTP has great potential to complement whole-genome selection, which is becoming increasingly common in breeding programmes. A range of novel analytic techniques, such as machine learning and deep learning, combined with a widening range of sensors, allow rapid assessment of large breeding populations that are repeatable and objective. Secondary traits underlying radiation use efficiency and water use efficiency can be screened with HTP for selection at the early stages of a breeding programme. HTP and envirotyping technologies can also characterise spatial variability at trial and within-plot levels, which can be used to correct for spatial variations that confound measurements of genotypic values. This review explores HTP for abiotic stress selection through a physiological trait lens and additionally investigates the use of envirotyping and EC to characterise spatial variability at all physical scales in METs.
Affiliation(s)
- Daniel T Smith
  - The University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
- Andries B Potgieter
  - Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD, 4072, Australia
- Scott C Chapman
  - The University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
|
44
|
Cortés AJ, López-Hernández F. Harnessing Crop Wild Diversity for Climate Change Adaptation. Genes (Basel) 2021; 12:783. [PMID: 34065368 PMCID: PMC8161384 DOI: 10.3390/genes12050783] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 03/29/2021] [Revised: 04/28/2021] [Accepted: 05/19/2021] [Indexed: 12/20/2022]
Abstract
Warming and drought are reducing global crop production with a potential to substantially worsen global malnutrition. As with the green revolution in the last century, plant genetics may offer concrete opportunities to increase yield and crop adaptability. However, the rate at which the threat is happening requires powering new strategies in order to meet the global food demand. In this review, we highlight major recent 'big data' developments from both empirical and theoretical genomics that may speed up the identification, conservation, and breeding of exotic and elite crop varieties with the potential to feed humans. We first emphasize the major bottlenecks to capture and utilize novel sources of variation in abiotic stress (i.e., heat and drought) tolerance. We argue that adaptation of crop wild relatives to dry environments could be informative on how plant phenotypes may react to a drier climate because natural selection has already tested more options than humans ever will. Because isolated pockets of cryptic diversity may still persist in remote semi-arid regions, we encourage new habitat-based population-guided collections for genebanks. We continue discussing how to systematically study abiotic stress tolerance in these crop collections of wild and landraces using geo-referencing and extensive environmental data. By uncovering the genes that underlie the tolerance adaptive trait, natural variation has the potential to be introgressed into elite cultivars. However, unlocking adaptive genetic variation hidden in related wild species and early landraces remains a major challenge for complex traits that, as abiotic stress tolerance, are polygenic (i.e., regulated by many low-effect genes). Therefore, we finish prospecting modern analytical approaches that will serve to overcome this issue. 
Concretely, genomic prediction, machine learning, and multi-trait gene editing, all offer innovative alternatives to speed up more accurate pre- and breeding efforts toward the increase in crop adaptability and yield, while matching future global food demands in the face of increased heat and drought. In order for these 'big data' approaches to succeed, we advocate for a trans-disciplinary approach with open-source data and long-term funding. The recent developments and perspectives discussed throughout this review ultimately aim to contribute to increased crop adaptability and yield in the face of heat waves and drought events.
Affiliation(s)
- Andrés J. Cortés
  - Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia
  - Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Sede Medellín, Medellín 050034, Colombia
- Felipe López-Hernández
  - Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia
|
45
|
Costa-Neto G, Galli G, Carvalho HF, Crossa J, Fritsche-Neto R. EnvRtype: a software to interplay enviromics and quantitative genomics in agriculture. G3 (BETHESDA, MD.) 2021; 11. [PMID: 33835165 DOI: 10.1101/2020.10.14.339705] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/22/2020] [Accepted: 01/21/2021] [Indexed: 05/20/2023]
Abstract
Envirotyping is an essential technique used to unfold the nongenetic drivers associated with the phenotypic adaptation of living organisms. Here, we introduce the EnvRtype R package, a novel toolkit developed to interplay large-scale envirotyping data (enviromics) into quantitative genomics. To start a user-friendly envirotyping pipeline, this package offers: (1) remote sensing tools for collecting (get_weather and extract_GIS functions) and processing ecophysiological variables (processWTH function) from raw environmental data at single locations or worldwide; (2) environmental characterization by typing environments and profiling descriptors of environmental quality (env_typing function), in addition to gathering environmental covariables as quantitative descriptors for predictive purposes (W_matrix function); and (3) identification of environmental similarity that can be used as an enviromic-based kernel (env_typing function) in whole-genome prediction (GP), aimed at increasing ecophysiological knowledge in genomic best-unbiased predictions (GBLUP) and emulating reaction norm effects (get_kernel and kernel_model functions). We highlight literature mining concepts in fine-tuning envirotyping parameters for each plant species and target growing environments. We show that envirotyping for predictive breeding collects raw data and processes it in an eco-physiologically smart way. Examples of its use for creating global-scale envirotyping networks and integrating reaction-norm modeling in GP are also outlined. We conclude that EnvRtype provides a cost-effective envirotyping pipeline capable of providing high quality enviromic data for a diverse set of genomic-based studies, especially for increasing accuracy in GP across untested growing environments.
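The idea of turning a matrix of environmental covariables into an environmental-similarity kernel can be illustrated with a minimal sketch. This is not EnvRtype code (EnvRtype is an R package, and its function signatures are not reproduced here); the function name `enviromic_linear_kernel`, the toy covariates, and the scaling choices below are assumptions made purely for illustration. It uses a common linear form, K = ZZᵀ/q on standardized covariates, which may differ from what the package computes.

```python
import numpy as np


def enviromic_linear_kernel(W):
    """Hypothetical helper: linear environmental-relatedness kernel.

    W is an (n_environments, n_covariates) array of environmental
    covariables. Columns are centered and scaled, constant columns are
    dropped, and K = Z Z^T / q is rescaled so its mean diagonal is 1.
    Assumes at least one covariate varies across environments.
    """
    W = np.asarray(W, dtype=float)
    mean = W.mean(axis=0)
    std = W.std(axis=0, ddof=1)
    keep = std > 0                      # constant covariates carry no signal
    Z = (W[:, keep] - mean[keep]) / std[keep]
    K = Z @ Z.T / Z.shape[1]            # average cross-product over covariates
    return K / np.mean(np.diag(K))      # rescale to unit mean diagonal


# Toy data (invented): rows are environments, columns are
# mean temperature, seasonal rainfall, and soil pH.
W = np.array([
    [25.0, 800.0, 5.5],
    [24.0, 820.0, 5.6],
    [18.0, 400.0, 6.8],
    [17.5, 420.0, 7.0],
])
K = enviromic_linear_kernel(W)
print(np.round(K, 2))
```

In this toy case the first two environments (warm, wet) come out more similar to each other than to the last two (cool, dry). A matrix of this kind is the sort of object that can enter a mixed model as the covariance structure of environment effects.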
Affiliation(s)
- Germano Costa-Neto
  - Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Giovanni Galli
  - Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Humberto Fanelli Carvalho
  - Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- José Crossa
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico-Veracruz, El Batan Km. 45, CP 56237 Mexico; Colegio de Postgraduados, Montecillos, Edo. de Mexico, CP 56264, Mexico
- Roberto Fritsche-Neto
  - Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
  - Quantitative Genetics and Biometrics Cluster, International Rice Research Institute (IRRI), Los Baños, Philippines
|
46
|
Costa-Neto G, Crossa J, Fritsche-Neto R. Enviromic Assembly Increases Accuracy and Reduces Costs of the Genomic Prediction for Yield Plasticity in Maize. FRONTIERS IN PLANT SCIENCE 2021; 12:717552. [PMID: 34691099 PMCID: PMC8529011 DOI: 10.3389/fpls.2021.717552] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/31/2021] [Accepted: 09/03/2021] [Indexed: 05/21/2023]
Abstract
Quantitative genetics states that phenotypic variation is a consequence of the interaction between genetic and environmental factors. Predictive breeding is based on this statement, and because of this, ways of modeling genetic effects are still evolving. At the same time, the same refinement must be used for processing environmental information. Here, we present an "enviromic assembly approach," which includes using ecophysiology knowledge in shaping environmental relatedness into whole-genome predictions (GP) for plant breeding (referred to as enviromic-aided genomic prediction, E-GP). We propose that the quality of an environment is defined by the core of environmental typologies and their frequencies, which describe different zones of plant adaptation. From this, we derived markers of environmental similarity cost-effectively. Combined with the traditional additive and non-additive effects, this approach may better represent the putative phenotypic variation observed across diverse growing conditions (i.e., phenotypic plasticity). Then, we designed optimized multi-environment trials coupling genetic algorithms, enviromic assembly, and genomic kinships capable of providing in-silico realization of the genotype-environment combinations that must be phenotyped in the field. As proof of concept, we highlighted two E-GP applications: (1) managing the lack of phenotypic information in training accurate GP models across diverse environments and (2) guiding an early screening for yield plasticity exerting optimized phenotyping efforts. Our approach was tested using two tropical maize sets, two types of enviromics assembly, six experimental network sizes, and two types of optimized training set across environments. We observed that E-GP outperforms benchmark GP in all scenarios, especially when considering smaller training sets. The representativeness of genotype-environment combinations is more critical than the size of multi-environment trials (METs). 
The conventional genomic best-unbiased prediction (GBLUP) is inefficient in predicting the quality of a yet-to-be-seen environment, while enviromic assembly enabled it by increasing the accuracy of yield plasticity predictions. Furthermore, we discussed theoretical backgrounds underlying how intrinsic envirotype-phenotype covariances within the phenotypic records can impact the accuracy of GP. The E-GP is an efficient approach to better use environmental databases to deliver climate-smart solutions, reduce field costs, and anticipate future scenarios.
Affiliation(s)
- Germano Costa-Neto
  - Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo (ESALQ/USP), Piracicaba, Brazil
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, United States
  - *Correspondence: Germano Costa-Neto
- Jose Crossa
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Mexico City, Mexico
  - Colegio de Postgraduados, Mexico City, Mexico
- Roberto Fritsche-Neto
  - Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo (ESALQ/USP), Piracicaba, Brazil
  - Breeding Analytics and Data Management Unit, International Rice Research Institute (IRRI), Los Baños, Philippines
|
47
|
Crossa J, Fritsche-Neto R, Montesinos-Lopez OA, Costa-Neto G, Dreisigacker S, Montesinos-Lopez A, Bentley AR. The Modern Plant Breeding Triangle: Optimizing the Use of Genomics, Phenomics, and Enviromics Data. FRONTIERS IN PLANT SCIENCE 2021; 12:651480. [PMID: 33936136 PMCID: PMC8085545 DOI: 10.3389/fpls.2021.651480] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 01/09/2021] [Accepted: 02/11/2021] [Indexed: 05/04/2023]
Affiliation(s)
- Jose Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, de Mexico, Mexico
  - Colegio de Postgraduados, Montecillo, Edo. de Mexico, Mexico
- Roberto Fritsche-Neto
  - Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo, São Paulo, Brazil
- Germano Costa-Neto
  - Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo, São Paulo, Brazil
- Susanne Dreisigacker
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, de Mexico, Mexico
- Abelardo Montesinos-Lopez
  - Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
- Alison R. Bentley
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, de Mexico, Mexico
  - *Correspondence: Alison R. Bentley
|
48
|
Reyes-Herrera PH, Muñoz-Baena L, Velásquez-Zapata V, Patiño L, Delgado-Paz OA, Díaz-Diez CA, Navas-Arboleda AA, Cortés AJ. Inheritance of Rootstock Effects in Avocado (Persea americana Mill.) cv. Hass. FRONTIERS IN PLANT SCIENCE 2020; 11:555071. [PMID: 33424874 PMCID: PMC7785968 DOI: 10.3389/fpls.2020.555071] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/23/2020] [Accepted: 11/17/2020] [Indexed: 05/16/2023]
Abstract
Grafting is typically utilized to merge adapted seedling rootstocks with highly productive clonal scions. This process implies the interaction of multiple genomes to produce a unique tree phenotype. However, the interconnection of both genotypes obscures individual contributions to phenotypic variation (rootstock-mediated heritability), hampering tree breeding. Therefore, our goal was to quantify the inheritance of seedling rootstock effects on scion traits using avocado (Persea americana Mill.) cv. Hass as a model fruit tree. We characterized 240 diverse rootstocks from 8 avocado cv. Hass orchards with similar management in three regions of the province of Antioquia, northwest Andes of Colombia, using 13 microsatellite markers, i.e., simple sequence repeats (SSRs). Parallel to this, we recorded 20 phenotypic traits (including morphological, biomass/reproductive, and fruit yield and quality traits) in the scions for 3 years (2015-2017). Relatedness among rootstocks was inferred through the genetic markers and inputted in a "genetic prediction" model to calculate narrow-sense heritabilities (h²) on scion traits. We used three different randomization tests to highlight traits with consistently significant heritability estimates. This strategy allowed us to capture five traits with significant heritability values that ranged from 0.33 to 0.45 and model fits (r) that oscillated between 0.58 and 0.73 across orchards. The results showed significance in the rootstock effects for four complex harvest and quality traits (i.e., total number of fruits, number of fruits with exportation quality, and number of fruits discarded because of low weight or thrips damage), whereas the only morphological trait that had a significant heritability value was overall trunk height (an emergent property of the rootstock-scion interaction). These findings suggest the inheritance of rootstock effects, beyond root phenotype, on a surprisingly wide spectrum of scion traits in "Hass" avocado. 
They also reinforce the utility of polymorphic SSRs for relatedness reconstruction and genetic prediction of complex traits. This research is, to date, the most cohesive evidence of narrow-sense inheritance of rootstock effects in a tropical fruit tree crop. Ultimately, our work highlights the importance of considering the rootstock-scion interaction to broaden the genetic basis of fruit tree breeding programs while enhancing our understanding of the consequences of grafting.
Affiliation(s)
- Paula H. Reyes-Herrera
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA)—CI Tibaitatá, Mosquera, Colombia
- Laura Muñoz-Baena
  - Department of Microbiology and Immunology, Western University, London, ON, Canada
- Valeria Velásquez-Zapata
  - Department of Plant Pathology and Microbiology, Interdepartmental Bioinformatics and Computational Biology, Iowa State University, Ames, IA, United States
- Laura Patiño
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA)—CI La Selva, Rionegro, Colombia
- Oscar A. Delgado-Paz
  - Facultad de Ingenierías, Universidad Católica de Oriente—UCO, Rionegro, Antioquia
- Cipriano A. Díaz-Diez
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA)—CI La Selva, Rionegro, Colombia
- Andrés J. Cortés
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA)—CI La Selva, Rionegro, Colombia
|
49
|
Cortés AJ, Restrepo-Montoya M, Bedoya-Canas LE. Modern Strategies to Assess and Breed Forest Tree Adaptation to Changing Climate. FRONTIERS IN PLANT SCIENCE 2020; 11:583323. [PMID: 33193532 PMCID: PMC7609427 DOI: 10.3389/fpls.2020.583323] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 07/14/2020] [Accepted: 09/29/2020] [Indexed: 05/02/2023]
Abstract
Studying the genetics of adaptation to new environments in ecologically and industrially important tree species is currently a major research line in the fields of plant science and genetic improvement for tolerance to abiotic stress. Specifically, exploring the genomic basis of local adaptation is imperative for assessing the conditions under which trees will successfully adapt in situ to global climate change. However, this knowledge has scarcely been used in conservation and forest tree improvement because woody perennials face major research limitations such as their outcrossing reproductive systems, long juvenile phase, and huge genome sizes. Therefore, in this review we discuss predictive genomic approaches that promise increasing adaptive selection accuracy and shortening generation intervals. They may also assist the detection of novel allelic variants from tree germplasm, and disclose the genomic potential of adaptation to different environments. For instance, natural populations of tree species invite using tools from the population genomics field to study the signatures of local adaptation. Conventional genetic markers and whole genome sequencing both help identifying genes and markers that diverge between local populations more than expected under neutrality, and that exhibit unique signatures of diversity indicative of "selective sweeps." Ultimately, these efforts inform the conservation and breeding status capable of pivoting forest health, ecosystem services, and sustainable production. Key long-term perspectives include understanding how trees' phylogeographic history may affect the adaptive relevant genetic variation available for adaptation to environmental change. Encouraging "big data" approaches (machine learning-ML) capable of comprehensively merging heterogeneous genomic and ecological datasets is becoming imperative, too.
Affiliation(s)
- Andrés J. Cortés
  - Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, Rionegro, Colombia
  - Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia – Sede Medellín, Medellín, Colombia
- Manuela Restrepo-Montoya
  - Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia – Sede Medellín, Medellín, Colombia
- Larry E. Bedoya-Canas
  - Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia – Sede Medellín, Medellín, Colombia
|