1
Ray S, Jarquin D, Howard R. Comparing artificial-intelligence techniques with state-of-the-art parametric prediction models for predicting soybean traits. THE PLANT GENOME 2023; 16:e20263. [PMID: 36484148 DOI: 10.1002/tpg2.20263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2021] [Accepted: 05/16/2022] [Indexed: 05/10/2023]
Abstract
Soybean [Glycine max (L.) Merr.] is a significant source of protein and oil and is also widely used as animal feed. Thus, developing lines that are superior in terms of yield, protein, and oil content is important to feed the ever-growing population. As opposed to high-cost phenotyping, genotyping is both cost and time efficient for breeders because evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use the marker and environment data effectively to predict the yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (genomic best linear unbiased predictor [GBLUP]), a kernel method (Gaussian kernel [GK]), an artificial-intelligence (AI) method (deep learning [DL]), and a hybrid method that corresponds to the emulation of a DL model using a kernel method (an arc-cosine kernel [AK]) in terms of their prediction accuracies for predicting grain yield, oil, and protein using data from the soybean nested association mapping experiment (1,379 genotypes tested in six environments, all genotypes in all environments). The relative performance of the four methods varied with the response variable and with whether the model included the genotype × environment interaction (G×E) effects. The GBLUP consistently showed better performance, whereas GK and AK followed a similar pattern to GBLUP and DL performed slightly worse than the other three methods in most of the cases; however, this may also be attributed to suboptimal hyperparameters. The DL method performed notably worse than the other three methods in the presence of the G×E effects.
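The three kernel-type relationship matrices named in this abstract (linear for GBLUP, Gaussian, and arc-cosine) can be sketched from a marker matrix as follows. This is a minimal illustration, not the authors' implementation: the marker coding, the median scaling of distances, the `bandwidth` parameter, and the use of the order-1, single-layer arc-cosine kernel are all assumptions made here.

```python
import numpy as np

def linear_kernel(X):
    """GBLUP-style relationship matrix from column-centered markers."""
    Z = X - X.mean(axis=0)
    return Z @ Z.T / X.shape[1]

def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian kernel on squared Euclidean distances scaled by their median (assumed scaling)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d2 = d2 / np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-bandwidth * d2)

def arc_cosine_kernel(X):
    """Order-1 arc-cosine kernel, which emulates one hidden layer of ReLU units."""
    norms = np.linalg.norm(X, axis=1)
    outer = np.outer(norms, norms)
    theta = np.arccos(np.clip(X @ X.T / outer, -1.0, 1.0))
    return outer * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

# Hypothetical toy data: 6 lines scored at 40 markers coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(6, 40)).astype(float)
G, GK, AK = linear_kernel(X), gaussian_kernel(X), arc_cosine_kernel(X)
```

Any of the three matrices can then serve as the covariance structure of the genetic effect in a mixed model, which is what makes the methods directly comparable.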
Affiliation(s)
- Susweta Ray
- Dep. of Statistics, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583, USA
- Diego Jarquin
- Dep. of Agronomy, Univ. of Florida, Gainesville, FL, 32611, USA
- Reka Howard
- Dep. of Statistics, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583, USA

2
Bermann M, Lourenco D, Forneris NS, Legarra A, Misztal I. On the equivalence between marker effect models and breeding value models and direct genomic values with the Algorithm for Proven and Young. Genet Sel Evol 2022; 54:52. [PMID: 35842585 PMCID: PMC9288049 DOI: 10.1186/s12711-022-00741-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/22/2021] [Accepted: 06/29/2022] [Indexed: 12/04/2022] [Open Access]
Abstract
Background Single-step genomic predictions obtained from a breeding value model require calculating the inverse of the genomic relationship matrix ($\mathbf{G}^{-1}$). The Algorithm for Proven and Young (APY) creates a sparse representation of $\mathbf{G}^{-1}$ with a low computational cost. APY consists of selecting a group of core animals and expressing the breeding values of the remaining animals as a linear combination of those from the core animals plus an error term. The objectives of this study were to: (1) extend APY to marker effects models; (2) derive equations for marker effect estimates when APY is used for breeding value models; and (3) show the implication of selecting a specific group of core animals in terms of a marker effects model. Results We derived a family of marker effects models called APY-SNP-BLUP. It differs from the classic marker effects model in that the row space of the genotype matrix is reduced and an error term is fitted for non-core animals. We derived formulas for marker effect estimates that take this error term into account. The prediction error variance (PEV) of the marker effect estimates depends on the PEV for core animals but not directly on the PEV of the non-core animals. We extended the APY-SNP-BLUP to include a residual polygenic effect and accommodate non-genotyped animals. We show that selecting a specific group of core animals is equivalent to selecting a subspace of the row space of the genotype matrix. As the number of core animals increases, subspaces corresponding to different sets of core animals tend to overlap, showing that random selection of core animals is algebraically justified. Conclusions The APY-(ss)GBLUP models can be expressed in terms of marker effect models. When the number of core animals is equal to the rank of the genotype matrix, APY-SNP-BLUP is identical to the classic marker effects model. If the number of core animals is less than the rank of the genotype matrix, genotypes for non-core animals are imputed as a linear combination of the genotypes of the core animals. For estimating SNP effects, only relationships and estimated breeding values for core animals are needed.
Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00741-7.
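The core/non-core construction described in this abstract can be illustrated with the commonly cited block form of the APY inverse, in which only the core block is inverted densely and each non-core animal contributes a single diagonal term. The sketch below assumes that standard form; the function name `apy_inverse`, the toy data, and the choice of core animals are hypothetical, and it is not the APY-SNP-BLUP derivation of the paper.

```python
import numpy as np

def apy_inverse(G, core):
    """Sparse-structured inverse of G under APY for a given set of core animals."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])   # only dense inversion needed
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                # regression of non-core on core
    # Error variance of each non-core animal: g_ii - g_ic Gcc^{-1} g_ci
    m = np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P)
    m_inv = 1.0 / m
    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + (P * m_inv) @ P.T
    Ginv[np.ix_(core, noncore)] = -P * m_inv
    Ginv[np.ix_(noncore, core)] = -(P * m_inv).T
    Ginv[noncore, noncore] = m_inv                   # diagonal block for non-core animals
    return Ginv

# Hypothetical toy example: 8 animals, 200 markers, first 5 animals as core.
rng = np.random.default_rng(0)
Z = rng.integers(0, 3, size=(8, 200)).astype(float)
Z -= Z.mean(axis=0)
G = Z @ Z.T / Z.shape[1] + 0.01 * np.eye(8)          # small ridge keeps G invertible
core = np.arange(5)
Ginv_apy = apy_inverse(G, core)
```

The cost is dominated by inverting the core block, which is why the number of core animals, rather than the total number of genotyped animals, drives the computation.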
Affiliation(s)
- Matias Bermann
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Natalia S Forneris
- Facultad de Agronomía, Universidad de Buenos Aires, C1417DSQ, Buenos Aires, Argentina
- Instituto de Investigaciones en Producción Animal (INPA), CONICET - Universidad de Buenos Aires, C1427CWO, Buenos Aires, Argentina
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA

3
Montesinos López OA, Mosqueda González BA, Palafox González A, Montesinos López A, Crossa J. A General-Purpose Machine Learning R Library for Sparse Kernels Methods With an Application for Genome-Based Prediction. Front Genet 2022; 13:887643. [PMID: 35719365 PMCID: PMC9205295 DOI: 10.3389/fgene.2022.887643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/01/2022] [Accepted: 05/02/2022] [Indexed: 11/28/2022] [Open Access]
Abstract
The adoption of machine learning frameworks in areas beyond computer science has been facilitated by the development of user-friendly software tools that do not require an advanced understanding of computer programming. In this paper, we present a new software package (sparse kernel methods, SKM), developed in the R language, for implementing six of the most popular supervised machine learning algorithms (generalized boosted machines, generalized linear models, support vector machines, random forest, Bayesian regression models, and deep neural networks) with the optional use of sparse kernels. The SKM focuses on user simplicity, as it does not try to include all the available machine learning algorithms, but rather the most important aspects of these six algorithms in an easy-to-understand format. Another relevant contribution of this package is a function for the computation of seven different kernels. These are Linear, Polynomial, Sigmoid, Gaussian, Exponential, Arc-Cosine 1 and Arc-Cosine L (with L = 2, 3, …) and their sparse versions, which allow users to create kernel machines without modifying the statistical machine learning algorithm. It is important to point out that the main contribution of our package resides in the functionality for the computation of the sparse version of seven basic kernels, which is indispensable for reducing the computational resources needed to implement kernel machine learning methods without a significant loss in prediction performance. Performance of the SKM is evaluated in a genome-based prediction framework using both a maize and a wheat data set. Nevertheless, the use of this package is not restricted to genome prediction problems, and it can be used in many different applications.
Affiliation(s)
- Abel Palafox González
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
- Abelardo Montesinos López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
- *Correspondence: Abelardo Montesinos López; José Crossa
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Colegio de Postgraduados, Montecillo, Mexico

4
Brzozowski LJ, Campbell MT, Hu H, Caffe M, Gutiérrez LA, Smith KP, Sorrells ME, Gore MA, Jannink JL. Generalizable approaches for genomic prediction of metabolites in plants. THE PLANT GENOME 2022; 15:e20205. [PMID: 35470586 DOI: 10.1002/tpg2.20205] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/19/2021] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
Plant metabolites are important traits for plant breeders seeking to improve nutrition and agronomic performance, yet integrating selection for metabolomic traits can be limited by phenotyping expense and degree of genetic characterization, especially of uncommon metabolites. As such, developing generalizable genomic selection methods based on biochemical pathway biology for metabolites that are transferable across plant populations would benefit plant breeding programs. We tested genomic prediction accuracy for >600 metabolites measured by gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS) in oat (Avena sativa L.) seed. Using a discovery germplasm panel, we conducted a metabolite genome-wide association study (mGWAS) and selected loci to use in multikernel models that encompassed metabolome-wide mGWAS results or mGWAS from specific metabolite structures or biosynthetic pathways. Metabolite kernels developed from LC-MS metabolites in the discovery panel improved prediction accuracy of LC-MS metabolite traits in the validation panel consisting of more advanced breeding lines. No approach, however, improved prediction accuracy for GC-MS metabolites. We ranked model performance by metabolite and found that metabolites with similar polarity had consistent rankings of models. Overall, testing biological rationales for developing kernels for genomic prediction across populations contributes to developing frameworks for plant breeding for metabolite traits.
Affiliation(s)
- Lauren J Brzozowski
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Malachy T Campbell
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Haixiao Hu
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Melanie Caffe
- Dep. of Agronomy, Horticulture & Plant Science, South Dakota State Univ., Brookings, SD, 57006, USA
- Lucía Gutiérrez
- Dep. of Agronomy, Univ. of Wisconsin-Madison, Madison, WI, 53706, USA
- Kevin P Smith
- Dep. of Agronomy & Plant Genetics, Univ. of Minnesota, St. Paul, MN, 55108, USA
- Mark E Sorrells
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Michael A Gore
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Jean-Luc Jannink
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- USDA-ARS, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, 14853, USA

5
Billings GT, Jones MA, Rustgi S, Bridges WC, Holland JB, Hulse-Kemp AM, Campbell BT. Outlook for Implementation of Genomics-Based Selection in Public Cotton Breeding Programs. PLANTS 2022; 11:plants11111446. [PMID: 35684219 PMCID: PMC9182660 DOI: 10.3390/plants11111446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 05/09/2022] [Accepted: 05/16/2022] [Indexed: 11/16/2022]
Abstract
Researchers have used quantitative genetics to map cotton fiber quality and agronomic performance loci, but many alleles may be population or environment-specific, limiting their usefulness in a pedigree selection, inbreeding-based system. Here, we utilized genotypic and phenotypic data on a panel of 80 important historical Upland cotton (Gossypium hirsutum L.) lines to investigate the potential for genomics-based selection within a cotton breeding program’s relatively closed gene pool. We performed a genome-wide association study (GWAS) to identify alleles correlated to 20 fiber quality, seed composition, and yield traits and looked for a consistent detection of GWAS hits across 14 individual field trials. We also explored the potential for genomic prediction to capture genotypic variation for these quantitative traits and tested the incorporation of GWAS hits into the prediction model. Overall, we found that genomic selection programs for fiber quality can begin immediately, and the prediction ability for most other traits is lower but commensurate with heritability. Stably detected GWAS hits can improve prediction accuracy, although a significance threshold must be carefully chosen to include a marker as a fixed effect. We place these results in the context of modern public cotton line-breeding and highlight the need for a community-based approach to amass the data and expertise necessary to launch US public-sector cotton breeders into the genomics-based selection era.
Affiliation(s)
- Grant T. Billings
- Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC 27695, USA
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Michael A. Jones
- Pee Dee Research and Education Center, Clemson University, Florence, SC 29506, USA
- Sachin Rustgi
- Pee Dee Research and Education Center, Clemson University, Florence, SC 29506, USA
- William C. Bridges
- Department of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, USA
- James B. Holland
- Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC 27695, USA
- Plant Sciences Research Unit, The Agricultural Research Service of U.S. Department of Agriculture, Raleigh, NC 27695, USA
- Amanda M. Hulse-Kemp
- Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC 27695, USA
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Genomics and Bioinformatics Research Unit, The Agricultural Research Service of U.S. Department of Agriculture, Raleigh, NC 27965, USA
- Correspondence: A.M.H.-K.; B.T.C.
- B. Todd Campbell
- Coastal Plains Soil, Water, and Plant Research Center, The Agricultural Research Service of U.S. Department of Agriculture, Florence, SC 29501, USA

6
Montesinos-López OA, Montesinos-López JC, Montesinos-López A, Ramírez-Alcaraz JM, Poland J, Singh R, Dreisigacker S, Crespo L, Mondal S, Govidan V, Juliana P, Espino JH, Shrestha S, Varshney RK, Crossa J. Bayesian multitrait kernel methods improve multienvironment genome-based prediction. G3 (BETHESDA, MD.) 2022; 12:6446035. [PMID: 34849802 PMCID: PMC9210316 DOI: 10.1093/g3journal/jkab406] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/03/2021] [Accepted: 11/18/2021] [Indexed: 11/14/2022]
Abstract
When multitrait data are available, the preferred models are those that are able to account for correlations between phenotypic traits because when the degree of correlation is moderate or large, this increases the genomic prediction accuracy. For this reason, in this article, we explore Bayesian multitrait kernel methods for genomic prediction and we illustrate the power of these models with three real datasets. The kernels under study were the linear, Gaussian, polynomial, and sigmoid kernels; they were compared with the conventional Ridge regression and GBLUP multitrait models. The results show that, in general, the Gaussian kernel method outperformed conventional Bayesian Ridge and GBLUP multitrait linear models by 2.2–17.45% (datasets 1–3) in terms of prediction performance based on the mean square error of prediction. This improvement in terms of prediction performance of the Bayesian multitrait kernel method can be attributed to the fact that the proposed model is able to capture nonlinear patterns more efficiently than linear multitrait models. However, not all kernels perform well in the datasets used for evaluation, which is why more than one kernel should be evaluated to be able to choose the best kernel.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Guadalajara 44430, Mexico
- Corresponding authors: Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco 44430, Mexico (A.M.-L.); International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico (J.C.)
- Jesse Poland
- Department of Agronomy, Kansas State University, 2004 Throckmorton Plant Science Center, Manhattan, KS 66506, USA
- Ravi Singh
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
- Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
- Leonardo Crespo
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
- Sushismita Mondal
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
- Velu Govidan
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
- Philomin Juliana
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
- Julio Huerta Espino
- Campo Experimental Valle de Mexico, Instituto Nacional de Investigaciones Forestales, Agricolas y Pecuarias (INIFAP), Universidad Autónoma de Chapingo, Texcoco 56235, Mexico
- Sandesh Shrestha
- Department of Agronomy, Kansas State University, 2004 Throckmorton Plant Science Center, Manhattan, KS 66506, USA
- Rajeev K Varshney
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch 6150, Australia
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
- Colegio de Postgraduados, Montecillos, Edo. de México 56230, Mexico

7
Abstract
Motivation Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods search in structure databases, allowing us to overcome this limitation. The best-performing in silico methods use machine learning to predict a molecular fingerprint from tandem mass spectra, then use the predicted fingerprint to search in a molecular structure database. Predicted molecular fingerprints are also of great interest for compound class annotation, de novo structure elucidation, and other tasks. So far, kernel support vector machines are the best tool for fingerprint prediction. However, they cannot be trained on all publicly available reference spectra because their training time scales cubically with the number of training data. Results We use the Nyström approximation to transform the kernel into a linear feature map. We evaluate two methods that use this feature map as input: a linear support vector machine and a deep neural network (DNN). For evaluation, we use a cross-validated dataset of 156 017 compounds and three independent datasets with 1734 compounds. We show that the combination of kernel method and DNN outperforms the kernel support vector machine, which is the current gold standard, as well as a DNN on tandem mass spectra on all evaluation datasets. Availability and implementation The deep kernel learning method for fingerprint prediction is part of the SIRIUS software, available at https://bio.informatik.uni-jena.de/software/sirius.
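The Nyström step mentioned in this abstract, which turns a kernel into an explicit linear feature map, can be sketched as follows. This is a generic illustration only: the RBF kernel stands in for the spectrum kernels used in the actual method, and the names `rbf`, `nystroem_features`, `gamma`, and `eps`, as well as the random landmark choice, are assumptions introduced here.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Stand-in kernel (assumption): Gaussian RBF between rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def nystroem_features(X, landmarks, kernel=rbf, eps=1e-10):
    """Map X to features Phi such that Phi @ Phi.T approximates kernel(X, X)."""
    K_mm = kernel(landmarks, landmarks)
    K_nm = kernel(X, landmarks)
    w, U = np.linalg.eigh(K_mm)          # K_mm = U diag(w) U^T
    keep = w > eps                       # drop numerically null directions
    return (K_nm @ U[:, keep]) / np.sqrt(w[keep])

# Hypothetical toy data: 30 samples, 5 features, 10 randomly chosen landmarks.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
landmarks = X[rng.choice(30, size=10, replace=False)]
Phi = nystroem_features(X, landmarks)
```

Because `Phi` is an ordinary feature matrix, it can be passed to any linear model or neural network, which avoids the cubic training cost of a full kernel machine.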
Affiliation(s)
- Kai Dührkop
- To whom correspondence should be addressed.

8
Crossa J, Montesinos-López OA, Pérez-Rodríguez P, Costa-Neto G, Fritsche-Neto R, Ortiz R, Martini JWR, Lillemo M, Montesinos-López A, Jarquin D, Breseghello F, Cuevas J, Rincent R. Genome and Environment Based Prediction Models and Methods of Complex Traits Incorporating Genotype × Environment Interaction. Methods Mol Biol 2022; 2467:245-283. [PMID: 35451779 DOI: 10.1007/978-1-0716-2205-6_9] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 06/14/2023]
Abstract
Genomic-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. As opposed to animal breeding, plant breeding includes extensive multienvironment and multiyear field trial data. Hence, genomic-enabled prediction models should include genotype × environment (G × E) interaction, which most of the time increases the prediction performance when the responses of lines differ from environment to environment. In this chapter, we describe a historical timeline since 2012 of advances in GS models that take into account G × E interaction. We describe theoretical and practical aspects of those GS models, including the gains in prediction performance when including G × E structures for both complex continuous and categorical scale traits. Then, we detail and explain the main G × E genomic prediction models for complex traits measured on continuous and noncontinuous (categorical) scales. In relation to G × E interaction models, this review also examines the analysis of information generated with high-throughput phenotype data (phenomics) and the joint analysis of multitrait and multienvironment field trial data, which is also employed in the general assessment of multitrait G × E interaction. The inclusion of nongenomic data to increase the accuracy and biological reliability of the G × E approach is also outlined. We show the recent advances in large-scale envirotyping (enviromics), and how the use of mechanistic computational modeling can derive the crop growth and development aspects useful for predicting phenotypes and explaining G × E.
Affiliation(s)
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Colegio de Postgraduados, Montecillos, Mexico
- Germano Costa-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
- Roberto Fritsche-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Alnarp, Sweden
- Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Morten Lillemo
- Department of Plant Sciences, Norwegian University of Life Sciences, IHA/CIGENE, Ås, Norway
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Jaime Cuevas
- Universidad de Quintana Roo, Chetumal, Quintana Roo, Mexico
- Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France

9
Aoun M, Carter A, Thompson YA, Ward B, Morris CF. Environment characterization and genomic prediction for end-use quality traits in soft white winter wheat. THE PLANT GENOME 2021; 14:e20128. [PMID: 34396703 DOI: 10.1002/tpg2.20128] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/28/2021] [Accepted: 06/08/2021] [Indexed: 06/13/2023]
Abstract
End-use quality phenotyping is laborious and expensive; thus, testing may not occur until later generations in wheat breeding programs. We investigated the pattern of genotype × environment (G × E) interaction for end-use quality traits in soft white wheat (Triticum aestivum L.) and tested the effectiveness of implementing genomic selection to optimize breeding for these traits. We used a multi-environment unbalanced dataset comprising 672 breeding lines and cultivars adapted to the Pacific Northwest region of the United States, which were evaluated for 14 end-use quality traits. Genetic correlations between environments based on factor analytic models showed low-to-moderate G × E interaction for most traits but high G × E interaction for grain and flour protein. A total of 40,518 single-nucleotide polymorphism markers were used for genomic prediction. Genomic prediction accuracies were high for most traits, thereby justifying the use of genomic selection to assist breeding for superior end-use quality in soft white wheat. Excluding outlier environments based on genetic correlations between environments was more effective in increasing genomic prediction accuracies than excluding them based on environment clustering analysis. For kernel size, kernel weight, milling score, ash, and flour swelling volume, excluding outlier environments increased prediction accuracies by 1-11%. However, for grain and flour protein, flour yield, and cookie diameter, excluding outlier environments did not improve genomic prediction performance.
Affiliation(s)
- Meriem Aoun
- Dep. of Crop and Soil Sciences, Washington State Univ., Pullman, WA, 99164, USA
- Arron Carter
- Dep. of Crop and Soil Sciences, Washington State Univ., Pullman, WA, 99164, USA
- Yvonne A Thompson
- USDA-ARS Western Wheat & Pulse Quality Laboratory, Washington State Univ., Pullman, WA, 99164, USA
- Brian Ward
- USDA-ARS Plant Science Research Campus, Raleigh, NC, 27695, USA
- Dep. of Horticulture and Crop Science, Ohio State University, Wooster, OH, 44691, USA
- Craig F Morris
- USDA-ARS Western Wheat & Pulse Quality Laboratory, Washington State Univ., Pullman, WA, 99164, USA

10
Fritsche-Neto R, Galli G, Borges KLR, Costa-Neto G, Alves FC, Sabadin F, Lyra DH, Morais PPP, Braatz de Andrade LR, Granato I, Crossa J. Optimizing Genomic-Enabled Prediction in Small-Scale Maize Hybrid Breeding Programs: A Roadmap Review. FRONTIERS IN PLANT SCIENCE 2021; 12:658267. [PMID: 34276721 PMCID: PMC8281958 DOI: 10.3389/fpls.2021.658267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 05/10/2021] [Indexed: 06/13/2023]
Abstract
The usefulness of genomic prediction (GP) for many animal and plant breeding programs has been highlighted by many studies in the last 20 years. In maize breeding programs, mostly dedicated to delivering more highly adapted and productive hybrids, this approach has proved successful for both large- and small-scale breeding programs worldwide. Here, we present some of the strategies developed to improve the accuracy of GP in tropical maize, focusing on its use under the low-budget and small-scale conditions typical of most hybrid breeding programs in developing countries. We highlight the most important outcomes obtained by the University of São Paulo (USP, Brazil) and how they can improve the accuracy of prediction in tropical maize hybrids. Our roadmap starts with the efforts for germplasm characterization, moving on to the practices for mating design, and the selection of the genotypes that are used to compose the training population in field phenotyping trials. Factors including population structure and the importance of non-additive effects (dominance and epistasis) controlling the desired trait are also outlined. Finally, we explain how the source of the molecular markers, environmental data, and the modeling of genotype-environment interaction can affect the accuracy of GP. Results of 7 years of research in a public maize hybrid breeding program under tropical conditions are discussed, and with the great advances that have been made, we find that what is yet to come is exciting. The use of open-source software for the quality control of molecular markers, implementing GP, and envirotyping pipelines may reduce costs in a computationally efficient manner. We conclude that exploring new models/tools using high-throughput phenotyping data along with large-scale envirotyping may bring more resolution and realism when predicting genotype performances. Despite the initial costs, mostly for genotyping, the GP platforms in combination with these other data sources can be a cost-effective approach for predicting the performance of maize hybrids for a large set of growing conditions.
Affiliation(s)
- Roberto Fritsche-Neto
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Giovanni Galli
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Karina Lima Reis Borges
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Germano Costa-Neto
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Filipe Couto Alves
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, United States
- Felipe Sabadin
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Danilo Hottis Lyra
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, United Kingdom
- Italo Granato
- Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux (LEPSE), Institut National de la Recherche Agronomique (INRA), Univ. Montpellier, SupAgro, Montpellier, France
- Jose Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz, Texcoco, Mexico
- Colegio de Postgraduados, Montecillo, Mexico
|
11
|
Sinha P, Singh VK, Bohra A, Kumar A, Reif JC, Varshney RK. Genomics and breeding innovations for enhancing genetic gain for climate resilience and nutrition traits. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:1829-1843. [PMID: 34014373 PMCID: PMC8205890 DOI: 10.1007/s00122-021-03847-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/15/2020] [Accepted: 04/29/2021] [Indexed: 05/03/2023]
Abstract
KEY MESSAGE: Integrating genomics technologies and breeding methods to tweak core parameters of the breeder's equation could accelerate delivery of climate-resilient and nutrient-rich crops for future food security. Accelerating genetic gain in crop improvement programs with respect to climate resilience and nutrition traits, and realizing the improved gain in farmers' fields, requires the integration of several approaches. This article focuses on innovative approaches to address core components of the breeder's equation. A prerequisite to enhancing genetic variance ($\sigma^2_g$) is the identification or creation of favorable alleles/haplotypes and their deployment for improving key traits. Novel alleles for new and existing target traits need to be accessed and added to the breeding population while maintaining genetic diversity. Selection intensity ($i$) in the breeding program can be improved by testing a larger population size, enabled by statistical designs with minimal replications and high-throughput phenotyping. Selection priorities and the criteria used to select the appropriate portion of the population also play an important role. The most important component of the breeder's equation is heritability ($h^2$). Heritability estimates depend on several factors, including the size and type of the population and the statistical methods used. The present article starts with a brief discussion of potential ways to enhance $\sigma^2_g$ in the population. We highlight statistical methods and experimental designs that could improve trait heritability estimation. We also offer a perspective on reducing the breeding cycle time ($t$), which could be achieved through the selection of appropriate parents, optimizing the breeding scheme, rapid fixation of target alleles, and combining speed breeding with breeding programs to optimize trials for release. Finally, we summarize knowledge from multiple disciplines for enhancing genetic gains for climate resilience and nutritional traits.
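As a rough illustration of how the four components named in the abstract interact, the sketch below uses the textbook form of the breeder's equation, gain per year = i · h² · σ_p / t, with σ_p = sqrt(σ²g / h²). The abstract does not state the exact form used in the article, so this form, the function name `genetic_gain_per_year`, and the numeric inputs are assumptions chosen for illustration only.

```python
import math


def genetic_gain_per_year(sigma2_g, h2, i, t):
    """Expected genetic gain per year under the textbook breeder's equation.

    ASSUMPTION: h2 = sigma2_g / sigma2_p, so the phenotypic standard
    deviation is sqrt(sigma2_g / h2) and the response per cycle is
    i * h2 * sigma_p. Dividing by the cycle length t gives gain per year.
    """
    if sigma2_g < 0:
        raise ValueError("sigma2_g must be non-negative")
    if not 0 < h2 <= 1:
        raise ValueError("h2 must lie in (0, 1]")
    if t <= 0:
        raise ValueError("t must be positive")
    sigma_p = math.sqrt(sigma2_g / h2)  # phenotypic standard deviation
    return i * h2 * sigma_p / t


if __name__ == "__main__":
    # Hypothetical inputs: same population, breeding cycle halved from 2 to 1 year.
    baseline = genetic_gain_per_year(sigma2_g=4.0, h2=0.25, i=2.0, t=2.0)
    faster = genetic_gain_per_year(sigma2_g=4.0, h2=0.25, i=2.0, t=1.0)
    print(baseline, faster)
```

Under this form, halving the cycle time doubles the expected gain per year, whereas raising heritability at a fixed genetic variance increases gain only in proportion to the square root of $h^2$.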
Affiliation(s)
- Pallavi Sinha
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- International Rice Research Institute (IRRI), IRRI South Asia Hub, ICRISAT, Hyderabad, India
- Vikas K Singh
- International Rice Research Institute (IRRI), IRRI South Asia Hub, ICRISAT, Hyderabad, India
- Abhishek Bohra
- ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, India
- Arvind Kumar
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Jochen C Reif
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Rajeev K Varshney
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, WA, Australia
|
12
|
Montesinos-López A, Montesinos-López OA, Montesinos-López JC, Flores-Cortes CA, de la Rosa R, Crossa J. A guide for kernel generalized regression methods for genomic-enabled prediction. Heredity (Edinb) 2021; 126:577-596. [PMID: 33649571 PMCID: PMC8115678 DOI: 10.1038/s41437-021-00412-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/24/2020] [Revised: 01/23/2021] [Accepted: 01/24/2021] [Indexed: 01/30/2023]
Abstract
The primary objective of this paper is to provide a guide on implementing Bayesian generalized kernel regression methods for genomic prediction in the statistical software R. Such methods are quite efficient at capturing complex non-linear patterns that conventional linear regression models cannot capture. Furthermore, these methods are also powerful for leveraging environmental covariates, for example in genotype × environment (G×E) prediction. In this study we describe how to build seven kernels: linear, polynomial, sigmoid, Gaussian, exponential, arc-cosine 1 and arc-cosine L. Additionally, we provide illustrative examples for implementing exact kernel methods for genomic prediction under single-environment, multi-environment and multi-trait frameworks, as well as for implementing sparse kernel methods under a multi-environment framework. These examples are followed by a discussion of the strengths and limitations of kernel methods and, subsequently, by conclusions about the main contributions of this paper.
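To make the kernel families listed in the abstract more concrete, the following is a minimal sketch of six of them, written in Python with only the standard library rather than in R as in the paper. It is not the authors' code. The function names, the parameters `gamma`, `coef0` and `degree`, and the scaling of the linear kernel by the number of markers are assumptions made for illustration. The arc-cosine 1 function uses the standard order-1 arc-cosine form and assumes non-zero marker vectors; the recursive arc-cosine L kernel is omitted.

```python
import math


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def _sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y):
    # ASSUMPTION: inner product scaled by the number of markers.
    return _dot(x, y) / len(x)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=2):
    return (gamma * _dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * _dot(x, y) + coef0)


def gaussian_kernel(x, y, gamma=1.0):
    return math.exp(-gamma * _sq_dist(x, y))


def exponential_kernel(x, y, gamma=1.0):
    return math.exp(-gamma * math.sqrt(_sq_dist(x, y)))


def arc_cosine_1_kernel(x, y):
    # Order-1 arc-cosine kernel; both vectors must be non-zero.
    norm_x, norm_y = math.sqrt(_dot(x, x)), math.sqrt(_dot(y, y))
    cos_theta = max(-1.0, min(1.0, _dot(x, y) / (norm_x * norm_y)))
    theta = math.acos(cos_theta)
    return norm_x * norm_y * (math.sin(theta) + (math.pi - theta) * cos_theta) / math.pi


def kernel_matrix(markers, kernel, **params):
    """Pairwise kernel matrix for a list of marker vectors (one per line)."""
    return [[kernel(xi, xj, **params) for xj in markers] for xi in markers]


if __name__ == "__main__":
    # Hypothetical marker matrix: 3 lines x 4 markers coded 0/1/2.
    markers = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 2]]
    for row in kernel_matrix(markers, gaussian_kernel, gamma=0.1):
        print(row)
```

The resulting symmetric matrix plays the role that the genomic relationship matrix plays in a linear model: it would be passed as the covariance structure of the random genotype effect in whichever mixed-model or Bayesian software is used.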
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, México
- Roberto de la Rosa
- Colegio de Postgraduados (CP), Campus Tabasco, Producción Agroalimentaria en el Trópico, H. Cárdenas, Tabasco, México
- José Crossa
- Colegio de Postgraduados, Campus Montecillos, CP 56230, Montecillos, Edo. de México, México
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, México
|