1
Bartholomé J, Ospina JO, Sandoval M, Espinosa N, Arcos J, Ospina Y, Frouin J, Beartschi C, Ghneim T, Grenier C. Genomic selection for tolerance to aluminum toxicity in a synthetic population of upland rice. PLoS One 2024; 19:e0307009. [PMID: 39173048 PMCID: PMC11341055 DOI: 10.1371/journal.pone.0307009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Accepted: 06/28/2024] [Indexed: 08/24/2024]
Abstract
Over half of the world's arable land is acidic, which constrains cereal production. In South America, different rice-growing regions (Cerrado in Brazil and Llanos in Colombia and Venezuela) are particularly affected due to high aluminum toxicity levels. For this reason, efforts have been made to breed for tolerance to aluminum toxicity using synthetic populations. The breeding program of CIAT-CIRAD is a good example of the use of recurrent selection to increase productivity for the Llanos in Colombia. In this study, we evaluated the performance of genomic prediction models to optimize the breeding scheme by hastening the development of an improved synthetic population and elite lines. We characterized 334 families at the S0:4 generation in two conditions. One condition was the control, managed with liming, while the other had high aluminum toxicity. Four traits were considered: days to flowering (FL), plant height (PH), grain yield (YLD), and zinc concentration in the polished grain (ZN). The population presented a high tolerance to aluminum toxicity, with more than 72% of the families showing a higher yield under aluminum conditions. The performance of the families under the aluminum toxicity condition was predicted using four different models: a single-environment model and three multi-environment models. The multi-environment models differed in the way they integrated genotype-by-environment interactions. The best predictive abilities were achieved using multi-environment models: 0.67 for FL, 0.60 for PH, 0.53 for YLD, and 0.65 for ZN. The gain of multi-environment over single-environment models ranged from 71% for YLD to 430% for FL. The selection of the best-performing families based on multi-trait indices, including the four traits mentioned above, facilitated the identification of suitable families for recombination. This information will be used to develop a new cycle of recurrent selection through genomic selection.
Affiliation(s)
- Jérôme Bartholomé
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Alliance Bioversity CIAT, Cali, Colombia
- Natalia Espinosa
- Alliance Bioversity CIAT, Cali, Colombia
- FEDEARROZ–Fondo Nacional del Arroz, Bogotá, Colombia
- Jairo Arcos
- HarvestPlus Program, Alliance Bioversity CIAT, Cali, Colombia
- Julien Frouin
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Cédric Beartschi
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Thaura Ghneim
- Departamento de Ciencias Biológicas, Universidad ICESI, Cali, Colombia
- Cécile Grenier
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
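Several abstracts in this list report predictive ability and percentage gains of one model over another. In genomic prediction studies, predictive ability is commonly computed as the Pearson correlation between predicted values and observed phenotypes in a validation set. The sketch below assumes that definition and assumes a gain is expressed relative to the reference model, as (PA_new - PA_ref) / PA_ref × 100. The function names are illustrative and not taken from any of the cited studies, and the values in the usage line are hypothetical.

```python
from math import sqrt


def predictive_ability(predicted, observed):
    """Pearson correlation between predicted values and observed phenotypes."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sd_p = sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_o = sqrt(sum((o - mean_o) ** 2 for o in observed))
    if sd_p == 0 or sd_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_p * sd_o)


def relative_gain(pa_new, pa_ref):
    """Percentage gain of a model's predictive ability over a reference model."""
    return (pa_new - pa_ref) / pa_ref * 100.0


# Hypothetical values, for illustration only
print(round(relative_gain(0.60, 0.40), 1))  # prints 50.0
```

If the gains in the first abstract were computed this way, a 71% gain would mean the multi-environment predictive ability was 1.71 times the single-environment one.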
2
Bose S, Banerjee S, Kumar S, Saha A, Nandy D, Hazra S. Review of applications of artificial intelligence (AI) methods in crop research. J Appl Genet 2024; 65:225-240. [PMID: 38216788 DOI: 10.1007/s13353-023-00826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 12/23/2023] [Accepted: 12/26/2023] [Indexed: 01/14/2024]
Abstract
Sophisticated and modern crop improvement techniques can bridge the gap for feeding the ever-increasing population. Artificial intelligence (AI) refers to the simulation of human intelligence in machines through the application of computational algorithms, machine learning (ML) and deep learning (DL) techniques. These techniques aim to generalise patterns and relationships from historical data, employing various mathematical optimisation techniques to build prediction models that facilitate the selection of superior genotypes. They are less resource intensive and can solve problems based on the analysis of large-scale phenotypic datasets. ML for genomic selection (GS) uses high-throughput genotyping technologies to gather genetic information on a large number of markers across the genome. GS models make predictions based on the mathematical relationship between genotypic and phenotypic data from the training population. ML techniques have emerged as powerful tools for genome editing by analysing large-scale genomic data and facilitating the development of accurate prediction models. Precise phenotyping is a prerequisite to advance crop breeding for solving agricultural production-related issues. ML algorithms can address this by generating predictive models based on the analysis of large-scale phenotypic datasets. DL models also have the potential to deliver reliable, precise phenotyping. This review provides a comprehensive overview of various ML and DL models, their applications, and their potential to enhance the efficiency, specificity and safety of advanced crop improvement protocols such as genomic selection and genome editing, along with phenotypic prediction to promote accelerated breeding.
Affiliation(s)
- Suvojit Bose
- Department of Vegetables and Spice Crops, Uttar Banga Krishi Viswavidyalaya, Pundibari, Cooch Behar, 736165, West Bengal, India
- Soumya Kumar
- School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Akash Saha
- School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Debalina Nandy
- School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Soham Hazra
- Department of Agriculture, Brainware University, Barasat, 700125, West Bengal, India
3
Crozier D, Winans ND, Hoffmann L, Patil NY, Klein PE, Klein RR, Rooney WL. Evaluating and Predicting the Performance of Sorghum Lines in an Elite by Exotic Backcross-Nested Association Mapping Population. PLANTS (BASEL, SWITZERLAND) 2024; 13:879. [PMID: 38592905 PMCID: PMC10975396 DOI: 10.3390/plants13060879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 03/09/2024] [Accepted: 03/14/2024] [Indexed: 04/11/2024]
Abstract
Maintaining or introducing genetic diversity into plant breeding programs is necessary for continual genetic gain; however, diversity at the cost of reduced performance is not something sought by breeders. To this end, backcross-nested association mapping (BC-NAM) populations, in which the recurrent parent is an elite line, can be employed as a strategy to introgress diversity from unadapted accessions while maintaining agronomic performance. This study evaluates (i) the hybrid performance of sorghum lines from 18 BC1-NAM families and (ii) the potential of genomic prediction to screen lines from BC1-NAM families for hybrid performance prior to phenotypic evaluation. Despite the diverse geographical origins and agronomic performance of the unadapted parents for BC1-NAM families, many BC1-derived lines performed significantly better in the hybrid trials than the elite recurrent parent, R.Tx436. The genomic prediction accuracies for grain yield, plant height, and days to mid-anthesis were acceptable, but the prediction accuracies for plant height were lower than expected. While the prediction accuracies increased when including more individuals in the training set, improvements tended to plateau between two and five lines per family, with larger training sets being required for more complex traits such as grain yield. Therefore, genomic prediction models can be optimized in a large BC1-NAM population with a relatively low fraction of individuals needing to be evaluated. These results suggest that genomic prediction is an effective method of pre-screening lines within BC1-NAM families prior to evaluation in extensive hybrid field trials.
Affiliation(s)
- Daniel Crozier
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Noah D. Winans
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Leo Hoffmann
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Department of Horticulture Sciences, University of Florida, Gainesville, FL 32611, USA
- Nikhil Y. Patil
- Department of Horticultural Sciences, Texas A&M University, College Station, TX 77845, USA
- Health Sciences Center, University of Oklahoma, Oklahoma City, OK 73104, USA
- Patricia E. Klein
- Health Sciences Center, University of Oklahoma, Oklahoma City, OK 73104, USA
- Robert R. Klein
- Crop Germplasm Research Unit, United States Department of Agriculture Agricultural Research Service, College Station, TX 77843, USA
- William L. Rooney
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
4
Martins FB, Aono AH, Moraes ADCL, Ferreira RCU, Vilela MDM, Pessoa-Filho M, Rodrigues-Motta M, Simeão RM, de Souza AP. Genome-wide family prediction unveils molecular mechanisms underlying the regulation of agronomic traits in Urochloa ruziziensis. FRONTIERS IN PLANT SCIENCE 2023; 14:1303417. [PMID: 38148869 PMCID: PMC10749977 DOI: 10.3389/fpls.2023.1303417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 11/15/2023] [Indexed: 12/28/2023]
Abstract
Tropical forage grasses, particularly those belonging to the Urochloa genus, play a crucial role in cattle production and serve as the main food source for animals in tropical and subtropical regions. The majority of these species are apomictic and tetraploid, highlighting the significance of U. ruziziensis, a sexual diploid species that can be tetraploidized for use in interspecific crosses with apomictic species. As a means to support breeding programs, our study investigates the feasibility of genome-wide family prediction in U. ruziziensis families to predict agronomic traits. Fifty half-sibling families were assessed for green matter yield, dry matter yield, regrowth capacity, leaf dry matter, and stem dry matter across different clippings established in contrasting seasons with varying available water capacity. Genotyping was performed using a genotyping-by-sequencing approach based on DNA samples from family pools. In addition to conventional genomic prediction methods, machine learning and feature selection algorithms were employed to reduce the necessary number of markers for prediction and enhance predictive accuracy across phenotypes. To explore the regulation of agronomic traits, our study evaluated the significance of selected markers for prediction using a tree-based approach, potentially linking these regions to quantitative trait loci (QTLs). In a multiomic approach, genes from the species transcriptome were mapped and correlated to those markers. A gene coexpression network was modeled with gene expression estimates from a diverse set of U. ruziziensis genotypes, enabling a comprehensive investigation of molecular mechanisms associated with these regions. The heritabilities of the evaluated traits ranged from 0.44 to 0.92. A total of 28,106 filtered SNPs were used to predict phenotypic measurements, achieving a mean predictive ability of 0.762. 
By employing feature selection techniques, we could reduce the dimensionality of SNP datasets, revealing potential genotype-phenotype associations. The functional annotation of genes near these markers revealed associations with auxin transport and biosynthesis of lignin, flavonol, and folic acid. Further exploration with the gene coexpression network uncovered associations with DNA metabolism, stress response, and circadian rhythm. These genes and regions represent important targets for expanding our understanding of the metabolic regulation of agronomic traits and offer valuable insights applicable to species breeding. Our work represents an innovative contribution to molecular breeding techniques for tropical forages, presenting a viable marker-assisted breeding approach and identifying target regions for future molecular studies on these agronomic traits.
Affiliation(s)
- Felipe Bitencourt Martins
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Alexandre Hild Aono
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Aline da Costa Lima Moraes
- Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Marco Pessoa-Filho
- Embrapa Cerrados, Brazilian Agricultural Research Corporation, Brasília, Brazil
- Rosangela Maria Simeão
- Embrapa Gado de Corte, Brazilian Agricultural Research Corporation, Campo Grande, Mato Grosso, Brazil
- Anete Pereira de Souza
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
5
de Verdal H, Baertschi C, Frouin J, Quintero C, Ospina Y, Alvarez MF, Cao TV, Bartholomé J, Grenier C. Optimization of Multi-Generation Multi-location Genomic Prediction Models for Recurrent Genomic Selection in an Upland Rice Population. RICE (NEW YORK, N.Y.) 2023; 16:43. [PMID: 37758969 PMCID: PMC10533757 DOI: 10.1186/s12284-023-00661-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 09/19/2023] [Indexed: 09/29/2023]
Abstract
Genomic selection is a worthy breeding method to improve genetic gain in recurrent selection breeding schemes. The integration of multi-generation and multi-location information could significantly improve genomic prediction models in the context of shuttle breeding. The Cirad-CIAT upland rice breeding program applies recurrent genomic selection and seeks to optimize the scheme to increase genetic gain while reducing phenotyping efforts. We used a synthetic population (PCT27) of which S0 plants were all genotyped and advanced by selfing and bulk seed harvest to the S0:2, S0:3, and S0:4 generations. The PCT27 was then divided into two sets. The S0:2 and S0:3 progenies for PCT27A and the S0:4 progenies for PCT27B were phenotyped in two locations: Santa Rosa, the target selection location within the upland rice growing area, and Palmira, the surrogate location, far from the upland rice growing area but easier for experimentation. While the calibration used either one of the two sets phenotyped in one or two locations, the validation population was only the PCT27B phenotyped in Santa Rosa. Five genomic prediction scenarios and 24 models were compared. Training the prediction model with the PCT27B phenotyped in Santa Rosa resulted in predictive abilities ranging from 0.19 for grain zinc concentration to 0.30 for grain yield. Expanding the training set with the inclusion of the PCT27A resulted in greater predictive abilities for all traits but grain yield, with increases from 5% for plant height to 61% for grain zinc concentration. Models with the PCT27B phenotyped in two locations resulted in higher prediction accuracy when the models assumed no genotype-by-environment (G × E) interaction for flowering (0.38) and grain zinc concentration (0.27). For plant height, the model assuming a single G × E variance provided higher accuracy (0.28).
The gain in predictive ability for grain yield was the greatest (0.25) when environment-specific variance deviation effect for G × E was considered. While the best scenario was specific to each trait, the results indicated that the gain in predictive ability provided by the multi-location and multi-generation calibration was low. Yet, this approach could lead to increased selection intensity, acceleration of the breeding cycle, and a sizable economic advantage for the program.
Affiliation(s)
- Hugues de Verdal
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France.
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France.
- Cédric Baertschi
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Julien Frouin
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Constanza Quintero
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
- Yolima Ospina
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
- Tuong-Vi Cao
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Jérôme Bartholomé
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
- Cécile Grenier
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398, Montpellier, France
- Alliance Bioversity-CIAT, A.A.6713, Km 17 Recta Palmira Cali, Cali, Colombia
6
Mora-Poblete F, Maldonado C, Henrique L, Uhdre R, Scapim CA, Mangolim CA. Multi-trait and multi-environment genomic prediction for flowering traits in maize: a deep learning approach. FRONTIERS IN PLANT SCIENCE 2023; 14:1153040. [PMID: 37593046 PMCID: PMC10428628 DOI: 10.3389/fpls.2023.1153040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Accepted: 07/12/2023] [Indexed: 08/19/2023]
Abstract
Maize (Zea mays L.), the third most widely cultivated cereal crop in the world, plays a critical role in global food security. To improve the efficiency of selecting superior genotypes in breeding programs, researchers have aimed to identify key genomic regions that impact agronomic traits. In this study, the performance of multi-trait, multi-environment deep learning models was compared to that of Bayesian models (Markov Chain Monte Carlo generalized linear mixed models (MCMCglmm), Bayesian Genomic Genotype-Environment Interaction (BGGE), and Bayesian Multi-Trait and Multi-Environment (BMTME)) in terms of the prediction accuracy of flowering-related traits (Anthesis-Silking Interval: ASI, Female Flowering: FF, and Male Flowering: MF). A tropical maize panel of 258 inbred lines from Brazil was evaluated in three sites (Cambira-2018, Sabaudia-2018, and Iguatemi-2020 and 2021) using approximately 290,000 single nucleotide polymorphisms (SNPs). The results demonstrated a 14.4% increase in prediction accuracy when employing multi-trait models compared to the use of a single trait in a single environment approach. The accuracy of predictions also improved by 6.4% when using a single trait in a multi-environment scheme compared to using multi-trait analysis. Additionally, deep learning models consistently outperformed Bayesian models in both single and multiple trait and environment approaches. A complementary genome-wide association study identified associations with 26 candidate genes related to flowering time traits, and 31 marker-trait associations were identified, accounting for 37%, 37%, and 22% of the phenotypic variation of ASI, FF and MF, respectively. In conclusion, our findings suggest that deep learning models have the potential to significantly improve the accuracy of predictions, regardless of the approach used and provide support for the efficacy of this method in genomic selection for flowering-related traits in tropical maize.
Affiliation(s)
- Carlos Maldonado
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Luma Henrique
- Department of Agronomy, State University of Maringá, Paraná, Brazil
- Renan Uhdre
- Department of Agronomy, State University of Maringá, Paraná, Brazil
7
Fradgley NS, Bacon J, Bentley AR, Costa‐Neto G, Cottrell A, Crossa J, Cuevas J, Kerton M, Pope E, Swarbreck SM, Gardner KA. Prediction of near-term climate change impacts on UK wheat quality and the potential for adaptation through plant breeding. GLOBAL CHANGE BIOLOGY 2023; 29:1296-1313. [PMID: 36482280 PMCID: PMC10108302 DOI: 10.1111/gcb.16552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 11/17/2022] [Accepted: 11/29/2022] [Indexed: 05/26/2023]
Abstract
Wheat is a major crop worldwide, mainly cultivated for human consumption and animal feed. Grain quality is paramount in determining its value and downstream use. While we know that climate change threatens global crop yields, a better understanding of impacts on wheat end-use quality is also critical. Combining quantitative genetics with climate model outputs, we investigated UK-wide trends in genotypic adaptation for wheat quality traits. In our approach, we augmented genomic prediction models with environmental characterisation of field trials to predict trait values and climate effects in historical field trial data between 2001 and 2020. Addition of environmental covariates, such as temperature and rainfall, successfully enabled prediction of genotype by environment interactions (G × E), and increased prediction accuracy of most traits for new genotypes in new year cross validation. We then extended predictions from these models to much larger numbers of simulated environments using climate scenarios projected under Representative Concentration Pathways 8.5 for 2050-2069. We found geographically varying climate change impacts on wheat quality due to contrasting associations between specific weather covariables and quality traits across the UK. Notably, negative impacts on quality traits were predicted in the East of the UK due to increased summer temperatures while the climate in the North and South-west may become more favourable with increased summer temperatures. Furthermore, by projecting 167,040 simulated future genotype-environment combinations, we found only limited potential for breeding to exploit predictable G × E to mitigate year-to-year environmental variability for most traits except Hagberg falling number. This suggests low adaptability of current UK wheat germplasm across future UK climates. More generally, approaches demonstrated here will be critical to enable adaptation of global crops to near-term climate change.
Affiliation(s)
- Alison R. Bentley
- NIAB, Cambridge, UK
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Jaime Cuevas
- Universidad Autonoma del Estado de Quintana Roo, Chetumal, Quintana Roo, Mexico
- Keith A. Gardner
- NIAB, Cambridge, UK
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
8
Filho CCF, Andrade MHML, Nunes JAR, Jarquin DH, Rios EF. Genomic prediction for complex traits across multiples harvests in alfalfa (Medicago sativa L.) is enhanced by enviromics. THE PLANT GENOME 2023:e20306. [PMID: 36815221 DOI: 10.1002/tpg2.20306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 12/17/2022] [Indexed: 06/18/2023]
Abstract
Breeding for dry matter yield and persistence in alfalfa (Medicago sativa L.) can take several years as these traits must be evaluated under multiple harvests. Therefore, genotype-by-harvest interaction should be incorporated into genomic prediction models to explore genotypes' adaptability and stability. In this study, we investigated how enviromics could help to predict the genotypic performance under multiharvest alfalfa breeding trials by evaluating 177 families across 11 harvests under four cross-validation scenarios. All scenarios were analyzed using six models in a Bayesian mixed model framework. Our results demonstrate that models accounting for enviromic information led to an increase in genetic variance and a decrease in error variance, indicating a better biological explanation when enviromic information was incorporated. Furthermore, models that accounted for enviromic data led to higher predictive ability (PA) when fewer harvests were used in the training data set. The best enviromic models (M2 and M3) outperformed the base model (GBLUP model-M0) for predicting adaptability and persistence across all cross-validation scenarios. Incorporating environmental covariates also provided higher PA for persistence compared with the base model, as predictions increased from 0 to 0.16, 0.20, 0.56, and 0.46 for CV00, CV1, CV0, and CV2, respectively. The results also demonstrate that GBLUP without an enviromic term has low power to predict persistence; thus, adopting enviromics is a cheap and efficient alternative to increase accuracy and biological meaning.
Affiliation(s)
- José Airton Rodrigues Nunes
- Departamento de Biologia, Instituto de Ciências Naturais, Universidade Federal de Lavras, Lavras, Minas Gerais, Brazil
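The alfalfa abstract above refers to the cross-validation scenarios CV00, CV1, CV0 and CV2. In the multi-environment genomic prediction literature these labels are commonly used for: CV1, untested genotypes in tested environments; CV2, tested genotypes in tested environments with some genotype-environment cells hidden (sparse testing); CV0, tested genotypes in an untested environment; and CV00, untested genotypes in an untested environment. The sketch below assumes those conventional definitions and treats each harvest as an environment. The helper `split_cv` and the family and harvest labels are hypothetical, and the exact partitions used in the cited study may differ.

```python
from itertools import product


def split_cv(genotypes, environments, scheme,
             new_genos=frozenset(), new_env=None, masked=frozenset()):
    """Split (genotype, environment) cells into training and test lists.

    Hypothetical helper illustrating the conventional CV schemes:
      CV1  - genotypes in new_genos are never phenotyped; predict them everywhere
      CV2  - only the cells listed in masked are hidden (sparse testing)
      CV0  - environment new_env is never observed; predict all genotypes in it
      CV00 - predict new_genos in new_env; neither appears in training
    """
    cells = list(product(genotypes, environments))
    if scheme == "CV1":
        test = [c for c in cells if c[0] in new_genos]
        train = [c for c in cells if c[0] not in new_genos]
    elif scheme == "CV2":
        test = [c for c in cells if c in masked]
        train = [c for c in cells if c not in masked]
    elif scheme == "CV0":
        test = [c for c in cells if c[1] == new_env]
        train = [c for c in cells if c[1] != new_env]
    elif scheme == "CV00":
        test = [c for c in cells if c[0] in new_genos and c[1] == new_env]
        # Drop every record of the new genotypes and of the new environment.
        train = [c for c in cells if c[0] not in new_genos and c[1] != new_env]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return train, test


# Hypothetical labels: three families evaluated over two harvests
families = ["F1", "F2", "F3"]
harvests = ["H1", "H2"]
for scheme in ("CV1", "CV2", "CV0", "CV00"):
    train, test = split_cv(families, harvests, scheme,
                           new_genos={"F3"}, new_env="H2", masked={("F1", "H2")})
    print(scheme, "train:", len(train), "test:", test)
```

Moving from CV2 to CV00 shrinks the training set and removes all information about both the target genotypes and the target environment, which is one way to read why the reported predictive abilities are lowest for CV00.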
9
Morales L, Ametz C, Dallinger HG, Löschenberger F, Neumayer A, Zimmerl S, Buerstmayr H. Comparison of linear and semi-parametric models incorporating genomic, pedigree, and associated loci information for the prediction of resistance to stripe rust in an Austrian winter wheat breeding program. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:23. [PMID: 36692839 PMCID: PMC9873752 DOI: 10.1007/s00122-023-04249-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 11/11/2022] [Indexed: 06/17/2023]
Abstract
We used a historical dataset on stripe rust resistance across 11 years in an Austrian winter wheat breeding program to evaluate genomic and pedigree-based linear and semi-parametric prediction methods. Stripe rust (yellow rust) is an economically important foliar disease of wheat (Triticum aestivum L.) caused by the fungus Puccinia striiformis f. sp. tritici. Resistance to stripe rust is controlled by both qualitative (R-genes) and quantitative (small- to medium-effect quantitative trait loci, QTL) mechanisms. Genomic and pedigree-based prediction methods can accelerate selection for quantitative traits such as stripe rust resistance. Here we tested linear and semi-parametric models incorporating genomic, pedigree, and QTL information for cross-validated, forward, and pairwise prediction of adult plant resistance to stripe rust across 11 years (2008-2018) in an Austrian winter wheat breeding program. Semi-parametric genomic modeling had the greatest predictive ability and genetic variance overall, but differences between models were small. Including QTL as covariates improved predictive ability in some years where highly significant QTL had been detected via genome-wide association analysis. Predictive ability was moderate within years (cross-validated) but poor in cross-year frameworks.
Affiliation(s)
- Laura Morales
- Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, University of Natural Resources and Life Sciences Vienna, Tulln, Austria.
- Hermann Gregor Dallinger
- Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, University of Natural Resources and Life Sciences Vienna, Tulln, Austria
- Saatzucht Donau GmbH and CoKG, Probstdorf, Austria
- Simone Zimmerl
- Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, University of Natural Resources and Life Sciences Vienna, Tulln, Austria
- Hermann Buerstmayr
- Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, University of Natural Resources and Life Sciences Vienna, Tulln, Austria
10
Costa-Neto G, Crespo-Herrera L, Fradgley N, Gardner K, Bentley AR, Dreisigacker S, Fritsche-Neto R, Montesinos-López OA, Crossa J. Envirome-wide associations enhance multi-year genome-based prediction of historical wheat breeding data. G3 (BETHESDA, MD.) 2022; 13:6861853. [PMID: 36454213 PMCID: PMC9911085 DOI: 10.1093/g3journal/jkac313] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/10/2022] [Revised: 11/02/2022] [Accepted: 11/03/2022] [Indexed: 12/03/2022]
Abstract
Linking high-throughput environmental data (enviromics) to genomic prediction (GP) is a cost-effective strategy for increasing selection intensity under genotype-by-environment interactions (G × E). This study developed a data-driven approach based on Environment-Phenotype Association (EPA) aimed at recycling important G × E information from historical breeding data. EPA was developed in two applications: (1) scanning a secondary source of genetic variation, weighted from the shared reaction-norms of past-evaluated genotypes and (2) pinpointing weights of the similarity among trial-sites (locations), given the historical impact of each envirotyping data variable for a given site. These results were then used as a dimensionality reduction strategy, integrating historical data to feed multi-environment GP models, which led to the development of four new G × E kernels considering genomics, enviromics, and EPA outcomes. The wheat trial data used included 36 locations, 8 years, and three target populations of environments (TPEs) in India. Four prediction scenarios and six kernel models within/across TPEs were tested. Our results suggest that the conventional GBLUP, without enviromic data or when omitting EPA, is inefficient in predicting the performance of wheat lines in future years. Nevertheless, when EPA was introduced as an intermediary learning step to reduce the dimensionality of the G × E kernels while connecting phenotypic and environmental-wide variation, a significant enhancement of G × E prediction accuracy was evident. EPA revealed that the effect of seasonality makes strategies such as "covariable selection" unfeasible because G × E is year-germplasm specific. We propose that the EPA effectively serves as a "reinforcement learner" algorithm capable of uncovering the effect of seasonality over the reaction-norms, with the benefits of better forecasting the similarities between past and future trialing sites. EPA combines the benefits of dimensionality reduction while reducing the uncertainty of genotype-by-year predictions and increasing the resolution of GP for the genotype-specific level.
Affiliation(s)
- Germano Costa-Neto
- Institute for Genomics Diversity, Cornell University, Ithaca, NY 14853, USA
- Leonardo Crespo-Herrera
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Nick Fradgley
- NIAB, 93 Lawrence Weaver Road, Cambridge CB3 0LE, UK
- Keith Gardner
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Alison R Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Osval A Montesinos-López
- Corresponding authors: Facultad de Telematica, Universidad de Colima, Mexico; and International Maize and Wheat Improvement Center (CIMMYT) and Colegio de Post-Graduados, Mexico.
- Jose Crossa
- Corresponding authors: Facultad de Telematica, Universidad de Colima, Mexico; and International Maize and Wheat Improvement Center (CIMMYT) and Colegio de Post-Graduados, Mexico.
11
Mbo Nkoulou LF, Ngalle HB, Cros D, Adje COA, Fassinou NVH, Bell J, Achigan-Dako EG. Perspective for genomic-enabled prediction against black sigatoka disease and drought stress in polyploid species. FRONTIERS IN PLANT SCIENCE 2022; 13:953133. [PMID: 36388523 PMCID: PMC9650417 DOI: 10.3389/fpls.2022.953133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Accepted: 09/28/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection (GS) is being explored in plant breeding as a promising tool for addressing biotic and abiotic threats. Polyploid plants such as bananas (Musa spp.) face drought and black sigatoka disease (BSD), which restrict their production. Conventional plant breeding faces difficulties, particularly high phenotyping costs and long generation intervals. To overcome these difficulties, GS is being explored as an alternative with great potential for reducing the cost and time of the selection process. So far, GS has not had the same success in polyploid plants as in diploid plants because of the complexity of their genomes. In this review, we present the main constraints to the application of GS in polyploid plants and the prospects for overcoming them. Particular emphasis is placed on breeding for BSD and drought, two major threats to banana production, with banana used in this review as a model polyploid plant. It emerges that the difficulty of obtaining good-quality markers in polyploids is the first challenge for GS in polyploid plants, because the main tools were developed for diploid species. In addition, mastering genetic interactions such as dominance and epistasis effects, as well as the genotype-by-environment interaction, which are very common in polyploid plants, remains a major challenge. To get around these challenges, we present bioinformatics tools as well as artificial intelligence approaches, including machine learning. Furthermore, a scheme for applying GS to banana for BSD and drought is proposed. This review is of paramount importance for breeding programs that seek to shorten the selection cycle of polyploids despite the complexity of their genomes.
Affiliation(s)
- Luther Fort Mbo Nkoulou
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- Institute of Agricultural Research for Development, Centre de Recherche Agricole de Mbalmayo (CRAM), Mbalmayo, Cameroon
- Hermine Bille Ngalle
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- David Cros
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, Montpellier, France
- Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, University of Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Institut Agro, Montpellier, France
- Charlotte O. A. Adje
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Nicodeme V. H. Fassinou
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Joseph Bell
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- Enoch G. Achigan-Dako
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
12
A divide-and-conquer approach for genomic prediction in rubber tree using machine learning. Sci Rep 2022; 12:18023. [PMID: 36289298 PMCID: PMC9605989 DOI: 10.1038/s41598-022-20416-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2022] [Accepted: 09/13/2022] [Indexed: 01/20/2023]
Abstract
Rubber tree (Hevea brasiliensis) is the main feedstock for commercial rubber; however, its long vegetative cycle has hindered the development of more productive varieties via breeding programs. With the availability of H. brasiliensis genomic data, several linkage maps with associated quantitative trait loci have been constructed and suggested as a tool for marker-assisted selection. Nonetheless, novel genomic strategies are still needed, and genomic selection (GS) may facilitate rubber tree breeding programs aimed at reducing the required cycles for performance assessment. Even though such a methodology has already been shown to be a promising tool for rubber tree breeding, increased model predictive capabilities and practical application are still needed. Here, we developed a novel machine learning-based approach for predicting rubber tree stem circumference based on molecular markers. Through a divide-and-conquer strategy, we propose a neural network prediction system with two stages: (1) subpopulation prediction and (2) phenotype estimation. This approach yielded higher accuracies than traditional statistical models in a single-environment scenario. By delivering large accuracy improvements, our methodology represents a powerful tool for use in Hevea GS strategies. Therefore, the incorporation of machine learning techniques into rubber tree GS represents an opportunity to build more robust models and optimize Hevea breeding programs.
13
Westhues CC, Simianer H, Beissinger TM. learnMET: an R package to apply machine learning methods for genomic prediction using multi-environment trial data. G3 GENES|GENOMES|GENETICS 2022; 12:6705235. [PMID: 36124944 PMCID: PMC9635651 DOI: 10.1093/g3journal/jkac226] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/26/2022] [Accepted: 07/29/2022] [Indexed: 12/04/2022]
Abstract
We introduce the R package learnMET, developed as a flexible framework for analyzing multi-environment trial breeding data with machine learning-based models. learnMET allows genomic information to be combined with environmental data such as climate and/or soil characteristics. Notably, the package can incorporate weather data from field weather stations or retrieve global meteorological datasets from a NASA database. Daily weather data can be aggregated over specific periods of time based on naive (for instance, nonoverlapping 10-day windows) or phenological approaches. Different machine learning methods for genomic prediction are implemented, including gradient-boosted decision trees, random forests, stacked ensemble models, and multilayer perceptrons. These prediction models can be evaluated in a user-friendly way via a collection of cross-validation schemes that mimic typical scenarios encountered by plant breeders working with multi-environment trial experimental data. The package is published under an MIT license and accessible on GitHub.
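The naive aggregation mentioned above (nonoverlapping 10-day windows) can be sketched as follows. This is a minimal illustration under stated assumptions, not learnMET's implementation: the function name `aggregate_windows`, the use of the mean as the window summary, and dropping an incomplete trailing window are all choices made here for illustration.

```python
import numpy as np

def aggregate_windows(daily, window=10):
    """Summarize a daily weather series by its mean over nonoverlapping windows.

    Hypothetical helper: a trailing partial window (fewer than `window` days)
    is dropped, which is one possible convention among several.
    """
    daily = np.asarray(daily, dtype=float)
    n_full = len(daily) // window          # number of complete windows
    trimmed = daily[: n_full * window]     # discard the incomplete tail
    return trimmed.reshape(n_full, window).mean(axis=1)

# 25 days of synthetic values -> two complete 10-day windows (days 21-25 dropped)
window_means = aggregate_windows(np.arange(25))
```

For the synthetic series above, the two window means are 4.5 and 14.5. Summing instead of averaging (e.g., for rainfall) would only change the final reduction.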
Affiliation(s)
- Cathy C Westhues
- Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, 37075 Goettingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
- Henner Simianer
- Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
- Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, 37075 Goettingen, Germany
- Timothy M Beissinger
- Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, 37075 Goettingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
14
Ortiz R, Crossa J, Reslow F, Perez-Rodriguez P, Cuevas J. Genome-Based Genotype × Environment Prediction Enhances Potato (Solanum tuberosum L.) Improvement Using Pseudo-Diploid and Polysomic Tetraploid Modeling. FRONTIERS IN PLANT SCIENCE 2022; 13:785196. [PMID: 35197995 PMCID: PMC8859116 DOI: 10.3389/fpls.2022.785196] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/28/2021] [Accepted: 01/05/2022] [Indexed: 05/03/2023]
Abstract
Potato breeding must improve its efficiency by increasing the reliability of selection and by identifying promising germplasm for crossing. This study reports the prediction accuracy of genomic-estimated breeding values for several potato (Solanum tuberosum L.) breeding clones and released cultivars that were evaluated for various traits at three locations in northern and southern Sweden. Three dosages of marker alleles [pseudo-diploid (A), additive tetrasomic polyploidy (B), and additive-non-additive tetrasomic polyploidy (C)] were considered in the genome-based prediction models. Models were fitted for single environments and for multiple environments (accounting for the genotype-by-environment interaction, or G × E), and two kernels were compared: the conventional linear Genomic Best Linear Unbiased Prediction (GBLUP) kernel (GB) and the non-linear Gaussian kernel (GK). Each kernel was used either with the single-kernel genetic matrices of A, B, or C, or with two-kernel genetic matrices built from B and C; for a single environment these are models 1 and 2, respectively, and for multiple environments models 3 and 4, respectively. In the single-site analyses, the trait with the highest prediction accuracy at all sites under A, B, and C, for models 1 and 2, and for both GB and GK, was tuber starch percentage. Total tuber weight also showed relatively high prediction accuracy. Model 2 increased prediction accuracy over model 1. The non-linear Gaussian kernel (GK) did not show any clear advantage over the linear GBLUP kernel (GB). Prediction accuracy estimates from the multi-environment analyses (models 3 and 4) were higher than those from the single-environment analyses. Model 4 with GB, combined with marker structure B, was the best method for predicting most tuber traits. Most traits gave relatively high prediction accuracy under the combinations of marker structures (A, B, C, and B-C) and methods (GB and GK) fitted with the multi-environment G × E model.
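To make the GB versus GK comparison concrete, the sketch below shows one common way each kernel is built from a marker matrix. Assumptions (not taken from the study): markers are coded as allele dosages (e.g., 0 to 4 for a tetraploid under an additive coding), the linear kernel is the cross-product of column-centered markers divided by the number of markers, and the Gaussian kernel uses squared Euclidean distances scaled by their off-diagonal median, a widely used bandwidth heuristic. The study's exact scaling may differ.

```python
import numpy as np

def linear_kernel(M):
    """GBLUP-style linear kernel: centered-marker cross-product / number of markers."""
    M = np.asarray(M, dtype=float)
    X = M - M.mean(axis=0)                  # center each marker column
    return X @ X.T / X.shape[1]

def gaussian_kernel(M, h=1.0):
    """Gaussian kernel exp(-h * d2_ij / median(d2)), d2 = squared Euclidean distance."""
    M = np.asarray(M, dtype=float)
    sq = np.sum(M ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * M @ M.T
    d2 = np.maximum(d2, 0.0)                # clip tiny negative rounding errors
    off_diag = d2[~np.eye(len(M), dtype=bool)]
    return np.exp(-h * d2 / np.median(off_diag))

# Hypothetical example: 6 clones x 50 markers with dosages 0..4
rng = np.random.default_rng(1)
M = rng.integers(0, 5, size=(6, 50)).astype(float)
G_lin, K_gauss = linear_kernel(M), gaussian_kernel(M)
```

The linear kernel models additive similarity, while the Gaussian kernel decays non-linearly with genetic distance; the abstract reports no clear advantage of the latter for these potato data.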
Affiliation(s)
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Lomma, Sweden
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Fredrik Reslow
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Lomma, Sweden
- Jaime Cuevas
- División de Ciencias, Ingeniería y Tecnologías, Universidad de Quintana Roo, Chetumal, Mexico
15
Crossa J, Montesinos-López OA, Pérez-Rodríguez P, Costa-Neto G, Fritsche-Neto R, Ortiz R, Martini JWR, Lillemo M, Montesinos-López A, Jarquin D, Breseghello F, Cuevas J, Rincent R. Genome and Environment Based Prediction Models and Methods of Complex Traits Incorporating Genotype × Environment Interaction. Methods Mol Biol 2022; 2467:245-283. [PMID: 35451779 DOI: 10.1007/978-1-0716-2205-6_9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 06/14/2023]
Abstract
Genomic-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. In contrast to animal breeding, plant breeding involves extensive multienvironment and multiyear field trial data. Hence, genomic-enabled prediction models should include the genotype × environment (G × E) interaction, which in most cases increases prediction performance when the responses of lines differ from environment to environment. In this chapter, we describe a historical timeline, starting in 2012, of advances in GS models that take G × E interaction into account. We describe theoretical and practical aspects of these GS models, including the gains in prediction performance obtained by including G × E structures for both continuous and categorical complex traits. We then detail and explain the main G × E genomic prediction models for complex traits measured on continuous and noncontinuous (categorical) scales. In relation to G × E interaction models, this review also examines the analysis of information generated with high-throughput phenotyping (phenomics) and the joint analysis of multitrait and multienvironment field trial data, which is also employed in the general assessment of multitrait G × E interaction. The role of nongenomic data in increasing the accuracy and biological reliability of the G × E approach is also outlined. We show recent advances in large-scale envirotyping (enviromics) and how mechanistic computational modeling of crop growth and development can be used to predict phenotypes and explain G × E.
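A widely used way to encode G × E in genomic prediction models of this kind is the element-wise (Hadamard) product of a genomic kernel and an environment kernel, both expanded to the level of individual phenotypic records. The sketch below illustrates that construction under assumptions made here: a genomic relationship matrix `G` is given, each record is described by integer line and environment indices, and the environment kernel defaults to the identity, so genetic covariance in the interaction term is shared only between records from the same environment. The helper names are hypothetical, and this is not code from the chapter.

```python
import numpy as np

def incidence(labels, n_levels):
    """Incidence matrix mapping each record to its level (line or environment)."""
    labels = np.asarray(labels)
    Z = np.zeros((len(labels), n_levels))
    Z[np.arange(len(labels)), labels] = 1.0
    return Z

def gxe_kernel(G, line_idx, env_idx, n_env, E=None):
    """Record-level G x E kernel: (Z_L G Z_L') * (Z_E E Z_E'), element-wise.

    With E = identity (the default), the interaction term links two records
    only if they come from the same environment. Passing an environmental
    similarity matrix as E gives a reaction-norm style kernel instead.
    """
    Z_L = incidence(line_idx, G.shape[0])
    Z_E = incidence(env_idx, n_env)
    E = np.eye(n_env) if E is None else E
    return (Z_L @ G @ Z_L.T) * (Z_E @ E @ Z_E.T)

# Hypothetical example: 2 lines, each observed in 2 environments (4 records)
G = np.array([[1.0, 0.5], [0.5, 1.0]])
K_ge = gxe_kernel(G, line_idx=[0, 1, 0, 1], env_idx=[0, 0, 1, 1], n_env=2)
```

In a full model this kernel would enter as the covariance of a G × E random effect alongside the main genomic and environmental effects.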
Affiliation(s)
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Colegio de Postgraduados, Montecillos, Mexico
- Germano Costa-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
- Roberto Fritsche-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Alnarp, Sweden
- Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Morten Lillemo
- Department of Plant Sciences, Norwegian University of Life Sciences, IHA/CIGENE, Ås, Norway
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Jaime Cuevas
- Universidad de Quintana Roo, Chetumal, Quintana Roo, Mexico
- Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France
16
Rogers AR, Holland JB. Environment-specific genomic prediction ability in maize using environmental covariates depends on environmental similarity to training data. G3 (BETHESDA, MD.) 2021; 12:6486423. [PMID: 35100364 PMCID: PMC9245610 DOI: 10.1093/g3journal/jkab440] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/09/2021] [Accepted: 12/06/2021] [Indexed: 12/30/2022]
Abstract
Technological advances have made it possible to collect a wealth of genomic, environmental, and phenotypic data for use in plant breeding. Incorporation of environmental data into environment-specific genomic prediction is hindered in part by inherently high data dimensionality. Computationally efficient approaches to combining genomic and environmental information may facilitate the extension of genomic prediction models to new environments and germplasm, and a better understanding of genotype-by-environment (G × E) interactions. Using genomic, yield trial, and environmental data on 1,918 unique hybrids evaluated in 59 environments from the maize Genomes to Fields project, we determined that a set of 10,153 SNP dominance coefficients and a 5-day temporal window size for summarizing environmental variables were optimal for genomic prediction using only genetic and environmental main effects. Adding marker-by-environment variable interactions required dimension reduction, and we found that reducing the dimensionality of the genetic data while keeping the full set of environmental covariates was best for environment-specific genomic prediction of grain yield, increasing prediction ability by 2.7% to reach a prediction ability of 80% across environments when data were masked at random. We then measured how prediction ability within environments was affected under stratified training-testing sets that approximate scenarios commonly encountered by plant breeders, finding that incorporating marker-by-environment effects improved prediction ability when training and test sets shared environments, but did not improve prediction in new, untested environments. The environmental similarity between training and testing sets had a greater impact on the efficacy of prediction than the genetic similarity between them.
Affiliation(s)
- Anna R Rogers
- Program in Genetics, North Carolina State University, Raleigh, NC 27695, USA
- James B Holland
- Program in Genetics, North Carolina State University, Raleigh, NC 27695, USA
- USDA-ARS Plant Science Research Unit, North Carolina State University, Raleigh, NC 27695, USA
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Corresponding author: Department of Agriculture, Agriculture Research Service, Box 7620 North Carolina State University, Raleigh, NC 27695-7620, USA.
17
Ueki M, Tamiya G. Smooth-threshold multivariate genetic prediction incorporating gene–environment interactions. G3 GENES|GENOMES|GENETICS 2021; 11:6343458. [PMID: 34849749 PMCID: PMC8664495 DOI: 10.1093/g3journal/jkab278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2021] [Accepted: 07/12/2021] [Indexed: 11/17/2022]
Abstract
We propose a genetic prediction modeling approach for genome-wide association study (GWAS) data that can include not only marginal gene effects but also gene–environment (GxE) interaction effects—i.e., multiplicative effects of environmental factors with genes rather than merely additive effects of each. The proposed approach is a straightforward extension of our previous multiple regression-based method, STMGP (smooth-threshold multivariate genetic prediction), with the new feature being that genome-wide test statistics from a GxE interaction analysis are used to weight the corresponding variants. We develop a simple univariate regression approximation to the GxE interaction effect that allows a direct fit of the STMGP framework without modification. The sparse nature of our model automatically removes irrelevant predictors (including variants and GxE combinations), and the model is able to simultaneously incorporate multiple environmental variables. Simulation studies to evaluate the proposed method in comparison with other modeling approaches demonstrate its superior performance under the presence of GxE interaction effects. We illustrate the usefulness of our prediction model through application to real GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
Affiliation(s)
- Masao Ueki
- School of Information and Data Sciences, Nagasaki University, Nagasaki 852-8521, Japan
- Gen Tamiya
- Tohoku University Graduate School of Medicine, Sendai, Miyagi 980-8575, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Miyagi 980-8573, Japan
18
Jighly A, Hayden M, Daetwyler H. Integrating genomic selection with a genotype plus genotype x environment (GGE) model improves prediction accuracy and computational efficiency. PLANT, CELL & ENVIRONMENT 2021; 44:3459-3470. [PMID: 34231236 DOI: 10.1111/pce.14145] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/10/2021] [Accepted: 06/30/2021] [Indexed: 06/13/2023]
Abstract
Genotype-by-environment interaction (GEI) is one of the major factors affecting the prediction accuracy of genomic selection (GS) models. Standard models have low power to model complex GEI, and they fail to predict phenotypes in unobserved environments. Here, we developed a novel prediction model that accounts for GEI, named 3GS, which combines genotype plus genotype × environment (GGE) analysis with GS. The model calculates the principal components (PCs) of the environmental phenotypes using GGE analysis, predicts these PCs using standard GS models, and then converts the GEBVs of the PCs (pcGEBVs) back to the original phenotypes. We demonstrated three advantages of the new model. First, 3GS showed significantly higher prediction accuracy, primarily for deviating environments that have low to negative correlations with other environments. Second, 3GS can predict new genotypes in unobserved environments with high accuracy. Third, the computational complexity of 3GS increases linearly with the number of environments and the population size, unlike standard models, which exhibit an exponential increase; this makes 3GS hundreds of times faster than standard models for large data sets. 3GS can improve prediction accuracy with minimal resources in modern breeding programmes in which massive populations are phenotyped in multiple environments with high-throughput techniques.
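The general idea described above (decompose the genotype × environment phenotype table into PCs, predict the PC scores genomically, then map predictions back to environments) can be sketched as below. This is a toy illustration, not the authors' 3GS code. Assumptions: a complete phenotype table for training genotypes, environment-mean centering as in GGE analysis, a hypothetical number of retained PCs `k`, and ridge regression on markers (an RR-BLUP-like stand-in with hypothetical penalty `lam`) in place of whatever standard GS model the authors used.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_new, lam=1.0):
    """Ridge regression on centered markers (RR-BLUP-like stand-in)."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p),
                           Xc.T @ (y_train - y_train.mean()))
    return y_train.mean() + (X_new - mu) @ beta

def pc_genomic_prediction(Y, X_train, X_new, k=2, lam=1.0):
    """Predict all environments for new genotypes through k phenotypic PCs.

    Y: (n_train, n_env) phenotype table, one column per environment.
    """
    env_means = Y.mean(axis=0)                       # environment main effects
    U, s, Vt = np.linalg.svd(Y - env_means, full_matrices=False)
    scores = U[:, :k] * s[:k]                        # genotype scores on k PCs
    pred_scores = np.column_stack(
        [ridge_fit_predict(X_train, scores[:, j], X_new, lam) for j in range(k)]
    )
    return env_means + pred_scores @ Vt[:k]          # back-transform to environments

# Hypothetical data: 20 genotypes, 5 markers, 4 environments
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
B = rng.normal(size=(5, 4))
Y = X @ B
pred = pc_genomic_prediction(Y, X, X[:3], k=4, lam=1e-8)
```

Only k single-trait models are fitted regardless of the number of environments, which is the intuition behind the linear scaling claimed in the abstract.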
Affiliation(s)
- Abdulqader Jighly
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, Victoria, Australia
- Matthew Hayden
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, Victoria, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, Australia
- Hans Daetwyler
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, Victoria, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, Australia
19
Costa-Neto G, Galli G, Carvalho HF, Crossa J, Fritsche-Neto R. EnvRtype: a software to interplay enviromics and quantitative genomics in agriculture. G3-GENES GENOMES GENETICS 2021; 11:6129777. [PMID: 33835165 PMCID: PMC8049414 DOI: 10.1093/g3journal/jkab040] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 12/22/2020] [Accepted: 01/21/2021] [Indexed: 11/13/2022]
Abstract
Envirotyping is an essential technique used to unfold the nongenetic drivers associated with the phenotypic adaptation of living organisms. Here, we introduce the EnvRtype R package, a novel toolkit developed to integrate large-scale envirotyping data (enviromics) into quantitative genomics. To start a user-friendly envirotyping pipeline, this package offers: (1) remote sensing tools for collecting (get_weather and extract_GIS functions) and processing ecophysiological variables (processWTH function) from raw environmental data at single locations or worldwide; (2) environmental characterization by typing environments and profiling descriptors of environmental quality (env_typing function), in addition to gathering environmental covariables as quantitative descriptors for predictive purposes (W_matrix function); and (3) identification of environmental similarity that can be used as an enviromic-based kernel (env_typing function) in whole-genome prediction (GP), aimed at increasing ecophysiological knowledge in genomic best linear unbiased prediction (GBLUP) and emulating reaction norm effects (get_kernel and kernel_model functions). We highlight literature mining concepts in fine-tuning envirotyping parameters for each plant species and target growing environment. We show that envirotyping for predictive breeding collects raw data and processes it in an eco-physiologically smart way. Examples of its use for creating global-scale envirotyping networks and integrating reaction-norm modeling in GP are also outlined. We conclude that EnvRtype provides a cost-effective envirotyping pipeline capable of providing high-quality enviromic data for a diverse set of genomic-based studies, especially for increasing accuracy in GP across untested growing environments.
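As a rough illustration of how a matrix of environmental covariables (environments × covariables, the role of the W matrix described above) can be turned into an environmental similarity kernel, the sketch below standardizes each covariable and forms a scaled cross-product. This is a generic linear-kernel construction assumed here for illustration; it does not call or reproduce EnvRtype's `W_matrix` or `get_kernel` functions, whose internal scaling may differ.

```python
import numpy as np

def env_kernel(W):
    """Linear environmental similarity kernel from covariables.

    W: (n_env, q) array, one row per environment, one column per covariable.
    Columns are standardized so covariables measured in different units
    (e.g., mm of rainfall, degrees C) contribute on a comparable scale.
    """
    W = np.asarray(W, dtype=float)
    sd = W.std(axis=0)
    sd[sd == 0] = 1.0                  # constant covariables carry no information
    Ws = (W - W.mean(axis=0)) / sd
    return Ws @ Ws.T / Ws.shape[1]

# Hypothetical example: 3 environments x 2 covariables (temperature, rainfall)
W = np.array([[20.0, 100.0],
              [25.0, 150.0],
              [30.0, 200.0]])
K_env = env_kernel(W)
```

A kernel like this can serve as the environment covariance in a reaction-norm model, so that related (rather than identical) environments share information.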
Affiliation(s)
- Germano Costa-Neto
- Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Giovanni Galli
- Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Humberto Fanelli Carvalho
- Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- José Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico-Veracruz, El Batan Km. 45, CP 56237 Mexico; Colegio de Postgraduados, Montecillos, Edo. de Mexico, CP 56264, Mexico
- Roberto Fritsche-Neto
- Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Quantitative Genetics and Biometrics Cluster, International Rice Research Institute (IRRI), Los Baños, Philippines
20
Fritsche-Neto R, Galli G, Borges KLR, Costa-Neto G, Alves FC, Sabadin F, Lyra DH, Morais PPP, Braatz de Andrade LR, Granato I, Crossa J. Optimizing Genomic-Enabled Prediction in Small-Scale Maize Hybrid Breeding Programs: A Roadmap Review. FRONTIERS IN PLANT SCIENCE 2021; 12:658267. [PMID: 34276721 PMCID: PMC8281958 DOI: 10.3389/fpls.2021.658267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 05/10/2021] [Indexed: 06/13/2023]
Abstract
The usefulness of genomic prediction (GP) for many animal and plant breeding programs has been highlighted by many studies over the last 20 years. In maize breeding programs, mostly dedicated to delivering more highly adapted and productive hybrids, this approach has proved successful for both large- and small-scale breeding programs worldwide. Here, we present some of the strategies developed to improve the accuracy of GP in tropical maize, focusing on its use under the low-budget, small-scale conditions typical of most hybrid breeding programs in developing countries. We highlight the most important outcomes obtained by the University of São Paulo (USP, Brazil) and how they can improve the accuracy of prediction in tropical maize hybrids. Our roadmap starts with germplasm characterization, moves on to mating design practices, and then to the selection of the genotypes used to compose the training population in field phenotyping trials. Factors including population structure and the importance of non-additive effects (dominance and epistasis) controlling the desired trait are also outlined. Finally, we explain how the source of molecular markers, environmental data, and the modeling of genotype-environment interaction can affect the accuracy of GP. Results of 7 years of research in a public maize hybrid breeding program under tropical conditions are discussed; given the great advances already made, we find that what is yet to come is exciting. The use of open-source software for the quality control of molecular markers, implementing GP, and envirotyping pipelines may reduce costs in a computationally efficient manner. We conclude that exploring new models and tools that use high-throughput phenotyping data along with large-scale envirotyping may bring more resolution and realism when predicting genotype performance. Despite the initial costs, mostly for genotyping, GP platforms combined with these other data sources can be a cost-effective approach for predicting the performance of maize hybrids across a large set of growing conditions.
Affiliation(s)
- Roberto Fritsche-Neto
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Giovanni Galli
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Karina Lima Reis Borges
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Germano Costa-Neto
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Filipe Couto Alves
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, United States
- Felipe Sabadin
- Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Danilo Hottis Lyra
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, United Kingdom
- Italo Granato
- Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux (LEPSE), Institut National de la Recherche Agronomique (INRA), Univ. Montpellier, SupAgro, Montpellier, France
- Jose Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz, Texcoco, Mexico
- Colegio de Posgraduado, Montecillo, Mexico
21
Sinha P, Singh VK, Bohra A, Kumar A, Reif JC, Varshney RK. Genomics and breeding innovations for enhancing genetic gain for climate resilience and nutrition traits. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:1829-1843. [PMID: 34014373 PMCID: PMC8205890 DOI: 10.1007/s00122-021-03847-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/15/2020] [Accepted: 04/29/2021] [Indexed: 05/03/2023]
Abstract
KEY MESSAGE: Integrating genomics technologies and breeding methods to tweak core parameters of the breeder's equation could accelerate delivery of climate-resilient and nutrient-rich crops for future food security. Accelerating genetic gain in crop improvement programs with respect to climate resilience and nutrition traits, and realizing the improved gain in farmers' fields, require the integration of several approaches. This article focuses on innovative approaches to address core components of the breeder's equation. A prerequisite to enhancing genetic variance (σ²g) is the identification or creation of favorable alleles/haplotypes and their deployment for improving key traits. Novel alleles for new and existing target traits need to be accessed and added to the breeding population while maintaining genetic diversity. Selection intensity (i) in the breeding program can be improved by testing a larger population, enabled by statistical designs with minimal replication and by high-throughput phenotyping. Selection priorities and the criteria used to select an appropriate portion of the population also play an important role. The most important component of the breeder's equation is heritability (h²). Heritability estimates depend on several factors, including the size and type of the population and the statistical methods used. The present article starts with a brief discussion of potential ways to enhance σ²g in the population. We highlight statistical methods and experimental designs that could improve trait heritability estimation. We also offer a perspective on reducing the breeding cycle time (t), which could be achieved through the selection of appropriate parents, optimizing the breeding scheme, rapid fixation of target alleles, and combining speed breeding with breeding programs to optimize trials for release. Finally, we summarize knowledge from multiple disciplines for enhancing genetic gains for climate resilience and nutritional traits.
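The breeder's equation discussed above is commonly written per unit time as ΔG = i · h² · σp / t, where σp is the phenotypic standard deviation; because h² = σ²g / σ²p, this is equivalent to i · h · σg / t. The Python sketch below illustrates that assumed form and is not code from the article. All input values are hypothetical; i = 1.755 is the standardized selection intensity for keeping the top 10% of a normally distributed trait.

```python
import math

def genetic_gain_per_year(i: float, h2: float, sigma2_g: float, t: float) -> float:
    """Expected genetic gain per unit time from the breeder's equation.

    Uses Delta_G = i * h2 * sigma_p / t, where sigma_p = sqrt(sigma2_g / h2)
    is the phenotypic SD implied by h2 and sigma2_g. This is algebraically
    the same as i * sqrt(h2) * sqrt(sigma2_g) / t.
    """
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    if t <= 0 or sigma2_g < 0:
        raise ValueError("t must be positive and sigma2_g non-negative")
    sigma_p = math.sqrt(sigma2_g / h2)
    return i * h2 * sigma_p / t

# Hypothetical scenario: top 10% selected, h2 = 0.5, sigma2_g = 4, 4-year cycle
baseline = genetic_gain_per_year(i=1.755, h2=0.5, sigma2_g=4.0, t=4.0)
# Same program with the cycle shortened to 2 years (e.g., via speed breeding)
faster = genetic_gain_per_year(i=1.755, h2=0.5, sigma2_g=4.0, t=2.0)
print(round(baseline, 3), round(faster, 3))
```

With h² held fixed, the gain in this form grows with the square root of σ²g but inversely with t, so halving the cycle time doubles the gain per year.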
Affiliation(s)
- Pallavi Sinha
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
  - International Rice Research Institute (IRRI), IRRI South Asia Hub, ICRISAT, Hyderabad, India
- Vikas K Singh
  - International Rice Research Institute (IRRI), IRRI South Asia Hub, ICRISAT, Hyderabad, India
- Abhishek Bohra
  - ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, India
- Arvind Kumar
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Jochen C Reif
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Rajeev K Varshney
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
  - State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, WA, Australia

22
Costa-Neto G, Galli G, Carvalho HF, Crossa J, Fritsche-Neto R. EnvRtype: a software to interplay enviromics and quantitative genomics in agriculture. G3 (Bethesda) 2021; 11. [PMID: 33835165 DOI: 10.1101/2020.10.14.339705] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/22/2020] [Accepted: 01/21/2021] [Indexed: 05/20/2023]
Abstract
Envirotyping is an essential technique used to unfold the nongenetic drivers associated with the phenotypic adaptation of living organisms. Here, we introduce the EnvRtype R package, a novel toolkit developed to interplay large-scale envirotyping data (enviromics) into quantitative genomics. To start a user-friendly envirotyping pipeline, this package offers: (1) remote sensing tools for collecting (get_weather and extract_GIS functions) and processing ecophysiological variables (processWTH function) from raw environmental data at single locations or worldwide; (2) environmental characterization by typing environments and profiling descriptors of environmental quality (env_typing function), in addition to gathering environmental covariables as quantitative descriptors for predictive purposes (W_matrix function); and (3) identification of environmental similarity that can be used as an enviromic-based kernel (env_typing function) in whole-genome prediction (GP), aimed at increasing ecophysiological knowledge in genomic best linear unbiased prediction (GBLUP) and emulating reaction-norm effects (get_kernel and kernel_model functions). We highlight literature-mining concepts for fine-tuning envirotyping parameters for each plant species and target growing environment. We show that envirotyping for predictive breeding collects raw data and processes it in an eco-physiologically smart way. Examples of its use for creating global-scale envirotyping networks and integrating reaction-norm modeling in GP are also outlined. We conclude that EnvRtype provides a cost-effective envirotyping pipeline capable of providing high-quality enviromic data for a diverse set of genomic-based studies, especially for increasing accuracy in GP across untested growing environments.
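EnvRtype is an R package, and the sketch below does not reproduce its W_matrix or get_kernel functions. It only illustrates, in Python and under stated assumptions, one common way an enviromic similarity kernel can be built from a matrix W of environmental covariables (rows are environments, columns are covariables): each covariable is centered and scaled, and K_W = W_s W_sᵀ / q, where q is the number of covariables. The covariable names and values are made up for illustration.

```python
import numpy as np

def environmental_kernel(W: np.ndarray) -> np.ndarray:
    """Linear similarity kernel between environments from covariables.

    W has one row per environment and one column per environmental
    covariable. Columns are centered and scaled to unit variance so that
    no single covariable dominates, then K = W_s @ W_s.T / q.
    """
    W = np.asarray(W, dtype=float)
    sd = W.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant covariables stay at zero after centering
    W_s = (W - W.mean(axis=0)) / sd
    return W_s @ W_s.T / W.shape[1]

# Hypothetical covariables for four environments:
# columns = [mean temperature (°C), rainfall (mm), solar radiation (MJ/m²/day)]
W = np.array([
    [24.1, 310.0, 18.2],
    [26.3, 150.0, 21.5],
    [23.8, 420.0, 16.9],
    [27.0, 120.0, 22.0],
])
K_W = environmental_kernel(W)
print(np.round(K_W, 2))
```

Environments with similar hot, dry, high-radiation profiles (rows 2 and 4) receive positive similarity, and contrasting ones receive negative similarity. In a reaction-norm model, a kernel of this kind could be combined with a genomic kernel, for example through an element-wise product.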
Affiliation(s)
- Germano Costa-Neto
  - Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Giovanni Galli
  - Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- Humberto Fanelli Carvalho
  - Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
- José Crossa
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico-Veracruz, El Batan Km. 45, CP 56237 Mexico
  - Colegio de Postgraduados, Montecillos, Edo. de Mexico, CP 56264, Mexico
- Roberto Fritsche-Neto
  - Department of Genetics, 'Luiz de Queiroz' Agriculture College, University of São Paulo, São Paulo, Brazil
  - Quantitative Genetics and Biometrics Cluster, International Rice Research Institute (IRRI), Los Baños, Philippines

23
Costa-Neto G, Crossa J, Fritsche-Neto R. Enviromic Assembly Increases Accuracy and Reduces Costs of the Genomic Prediction for Yield Plasticity in Maize. Front Plant Sci 2021; 12:717552. [PMID: 34691099 PMCID: PMC8529011 DOI: 10.3389/fpls.2021.717552] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/31/2021] [Accepted: 09/03/2021] [Indexed: 05/21/2023]
Abstract
Quantitative genetics states that phenotypic variation is a consequence of the interaction between genetic and environmental factors. Predictive breeding is based on this statement, and because of this, ways of modeling genetic effects are still evolving. At the same time, the same refinement must be applied to processing environmental information. Here, we present an "enviromic assembly approach," which uses ecophysiology knowledge to shape environmental relatedness into whole-genome predictions (GP) for plant breeding (referred to as enviromic-aided genomic prediction, E-GP). We propose that the quality of an environment is defined by the core of environmental typologies and their frequencies, which describe different zones of plant adaptation. From this, we derived markers of environmental similarity cost-effectively. Combined with the traditional additive and non-additive effects, this approach may better represent the putative phenotypic variation observed across diverse growing conditions (i.e., phenotypic plasticity). Then, we designed optimized multi-environment trials coupling genetic algorithms, enviromic assembly, and genomic kinships, capable of providing an in-silico realization of the genotype-environment combinations that must be phenotyped in the field. As proof of concept, we highlighted two E-GP applications: (1) managing the lack of phenotypic information in training accurate GP models across diverse environments and (2) guiding early screening for yield plasticity with optimized phenotyping effort. Our approach was tested using two tropical maize sets, two types of enviromic assembly, six experimental network sizes, and two types of optimized training set across environments. We observed that E-GP outperforms benchmark GP in all scenarios, especially when considering smaller training sets. The representativeness of genotype-environment combinations is more critical than the size of multi-environment trials (METs).
The conventional genomic best linear unbiased prediction (GBLUP) is inefficient in predicting the quality of a yet-to-be-seen environment, while enviromic assembly enabled it by increasing the accuracy of yield plasticity predictions. Furthermore, we discussed the theoretical background underlying how intrinsic envirotype-phenotype covariances within the phenotypic records can impact the accuracy of GP. E-GP is an efficient approach to better use environmental databases to deliver climate-smart solutions, reduce field costs, and anticipate future scenarios.
Affiliation(s)
- Germano Costa-Neto
  - Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo (ESALQ/USP), Piracicaba, Brazil
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, United States
  - *Correspondence: Germano Costa-Neto
- Jose Crossa
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Mexico City, Mexico
  - Colegio de Postgraduados, Mexico City, Mexico
- Roberto Fritsche-Neto
  - Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo (ESALQ/USP), Piracicaba, Brazil
  - Breeding Analytics and Data Management Unit, International Rice Research Institute (IRRI), Los Baños, Philippines

24
Cuevas J, Montesinos-López OA, Martini JWR, Pérez-Rodríguez P, Lillemo M, Crossa J. Approximate Genome-Based Kernel Models for Large Data Sets Including Main Effects and Interactions. Front Genet 2020; 11:567757. [PMID: 33193659 PMCID: PMC7594507 DOI: 10.3389/fgene.2020.567757] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/30/2020] [Accepted: 08/28/2020] [Indexed: 11/23/2022]
Abstract
The rapid development of molecular markers and sequencing technologies has made it possible to use genomic prediction (GP) and selection (GS) in animal and plant breeding. However, when the number of observations (n) is large (thousands or millions), the computational difficulty of handling (inverting and decomposing) these large genomic kernel relationship matrices increases exponentially. This problem grows when genomic × environment interaction and multi-trait kernels are included in the model. In this research, we propose selecting a small number of lines m (m < n) to construct an approximate kernel of lower rank than the original, thus exponentially decreasing the required computing time. First, we describe the full genomic method for a single environment (FGSE) with a covariance matrix (kernel) including all n lines. Second, we select m lines and approximate the original kernel for the single-environment model (APSE). Similarly, but including main effects and G × E, we explain a full genomic method with a genotype × environment model (FGGE), and, using m lines, we approximate the kernel method with G × E (APGE). We applied the proposed method to two wheat data sets of different sizes (n) using the standard linear-kernel Genomic Best Linear Unbiased Predictor (GBLUP) and also using eigenvalue decomposition. In both data sets, we compared the prediction performance and computing time of FGSE versus APSE; we also compared FGGE versus APGE. Results showed competitive prediction performance of the approximate methods with a significant reduction in computing time. Genomic prediction accuracy depends on the decay of the eigenvalues (amount of variance information lost) of the original kernel as well as on the number of selected lines m.
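Exact inversion or eigendecomposition of an n × n kernel costs on the order of n³ operations, which is what makes large n expensive. A standard way to build a rank-m approximation from m selected lines is the Nyström form K ≈ K_nm K_mm⁺ K_mn. The Python sketch below assumes that form, a linear GBLUP-style kernel, and uniform random selection of the m lines, which may differ from the authors' exact scheme; the genotypes are simulated.

```python
import numpy as np

def linear_kernel(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """GBLUP-style linear kernel A B' / p, where p is the number of markers."""
    return A @ B.T / A.shape[1]

def nystrom_kernel(X: np.ndarray, m: int, rng: np.random.Generator) -> np.ndarray:
    """Rank-m Nystrom approximation K_nm K_mm^+ K_mn built from m random lines.

    The full n x n matrix is formed here only so it can be compared with the
    exact kernel; large-n code would keep the n x m factor instead.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # indices of the m selected lines
    K_nm = linear_kernel(X, X[idx])              # n x m cross-kernel
    K_mm = K_nm[idx]                             # m x m kernel among selected lines
    return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

rng = np.random.default_rng(2020)
X = rng.integers(0, 3, size=(500, 2000)).astype(float)  # simulated 0/1/2 genotypes
X -= X.mean(axis=0)                                      # center each marker
K = linear_kernel(X, X)
for m in (50, 100, 250):
    K_hat = nystrom_kernel(X, m, rng)
    rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
    print(f"m={m}: relative Frobenius error {rel_err:.3f}")
```

Independent simulated markers give a slowly decaying eigenvalue spectrum, so the error here shrinks only gradually with m; the abstract notes that accuracy depends on exactly that decay and on the number of selected lines.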
Affiliation(s)
- J W R Martini
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Morten Lillemo
  - Department of Plant Sciences (IPV), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Jose Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
  - Colegio de Postgraduados, Texcoco, Mexico

25
Costa-Neto G, Fritsche-Neto R, Crossa J. Nonlinear kernels, dominance, and envirotyping data increase the accuracy of genome-based prediction in multi-environment trials. Heredity (Edinb) 2020; 126:92-106. [PMID: 32855544 PMCID: PMC7852533 DOI: 10.1038/s41437-020-00353-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 05/23/2020] [Revised: 07/29/2020] [Accepted: 07/30/2020] [Indexed: 01/15/2023]
Abstract
Modern whole-genome prediction (WGP) frameworks that focus on multi-environment trials (MET) integrate large-scale genomics, phenomics, and envirotyping data. However, the more complex the statistical model, the longer the computational processing time, and complexity does not always result in accuracy gains. We investigated the use of new kernel methods and modeling structures involving genomic and nongenomic sources of variation in two MET maize data sets. Five WGP models were considered, advancing in complexity from a main-effect additive model (A) to more complex structures, including dominance deviations (D), genotype × environment interaction (AE and DE), and the reaction-norm model using environmental covariables (W) and their interaction with A and D (AW + DW). Combinations of those models built with three different kernel methods, the Gaussian kernel (GK), the Deep kernel (DK), and the benchmark genomic best linear unbiased predictor (GBLUP/GB), were tested under three prediction scenarios: newly developed hybrids (CV1), sparse MET conditions (CV2), and new environments (CV0). GK and DK outperformed GB in prediction accuracy and in reduction of computation time (up to ~20%) under all model-kernel scenarios. GK was more efficient in capturing the variation due to A + AE and D + DE effects and translated it into accuracy gains (up to ~85% compared with GB). DK provided more consistent predictions, even for more complex structures such as W + AW + DW. Our results suggest that DK and GK are more efficient in translating model complexity into accuracy, and more suitable for including dominance and reaction-norm effects in a biologically accurate and faster way.
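The linear (GB) and Gaussian (GK) kernels compared in this abstract can be contrasted in a few lines. The Python sketch below assumes the common definitions GB = X_c X_cᵀ / p on column-centered markers and GK(i, j) = exp(−h · d²ᵢⱼ / q), where dᵢⱼ is the Euclidean distance between the marker profiles of lines i and j, q is a scaling factor (here the median off-diagonal squared distance), and h is a bandwidth set to 1 for illustration. These are assumptions rather than the paper's exact parameterization, and the markers are simulated.

```python
import numpy as np

def gblup_kernel(X: np.ndarray) -> np.ndarray:
    """Linear (GB) kernel on column-centered markers: Xc Xc' / p."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T / X.shape[1]

def gaussian_kernel(X: np.ndarray, h: float = 1.0) -> np.ndarray:
    """Gaussian (GK) kernel exp(-h * d^2 / q), q = median off-diagonal d^2."""
    sq_norms = np.sum(X**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)        # clip tiny negative rounding errors
    np.fill_diagonal(d2, 0.0)       # a line is at distance zero from itself
    q = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-h * d2 / q)

rng = np.random.default_rng(7)
X = rng.integers(0, 3, size=(6, 500)).astype(float)  # 6 lines, 500 simulated markers
print(np.round(gblup_kernel(X), 2))
print(np.round(gaussian_kernel(X), 2))
```

The Gaussian kernel is bounded in (0, 1] with ones on the diagonal, and h controls how quickly similarity decays with genetic distance, which lets it capture nonlinear relationships that the linear kernel cannot.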
Affiliation(s)
- Germano Costa-Neto
  - Department of Genetics, "Luiz de Queiroz" Agriculture College, University of São Paulo, São Paulo, Brazil
- Roberto Fritsche-Neto
  - Department of Genetics, "Luiz de Queiroz" Agriculture College, University of São Paulo, São Paulo, Brazil
- José Crossa
  - Biometrics and Statistics Unit, Genetic Resources Program, and Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Mexico, Mexico

26
Azodi CB, Pardo J, VanBuren R, de Los Campos G, Shiu SH. Transcriptome-Based Prediction of Complex Traits in Maize. Plant Cell 2020; 32:139-151. [PMID: 31641024 PMCID: PMC6961623 DOI: 10.1105/tpc.19.00332] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 05/06/2019] [Revised: 09/24/2019] [Accepted: 10/21/2019] [Indexed: 05/11/2023]
Abstract
The ability to predict traits from genome-wide sequence information (i.e., genomic prediction) has improved our understanding of the genetic basis of complex traits and transformed breeding practices. Transcriptome data may also be useful for genomic prediction. However, it remains unclear how well transcript levels can predict traits, particularly when traits are scored at different development stages. Using maize (Zea mays) genetic markers and transcript levels from seedlings to predict mature plant traits, we found that transcript and genetic marker models have similar performance. When the transcripts and genetic markers with the greatest weights (i.e., the most important) in those models were used in one joint model, performance increased. Furthermore, genetic markers important for predictions were not close to or identified as regulatory variants for important transcripts. These findings demonstrate that transcript levels are useful for predicting traits and that their predictive power is not simply due to genetic variation in the transcribed genomic regions. Finally, genetic marker models identified only 1 of 14 benchmark flowering-time genes, while transcript models identified 5. These data highlight that, in addition to being useful for genomic prediction, transcriptome data can provide a link between traits and variation that cannot be readily captured at the sequence level.
Affiliation(s)
- Christina B Azodi
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
  - The DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan 48824
- Jeremy Pardo
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan 48824
- Robert VanBuren
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan 48824
  - Department of Horticulture, Michigan State University, East Lansing, Michigan 48824
- Gustavo de Los Campos
  - Epidemiology and Biostatistics and Statistics and Probability Departments, Michigan State University, East Lansing, Michigan 48824
- Shin-Han Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
  - The DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan 48824
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, Michigan 48824

27
Cuevas J, Montesinos-López O, Juliana P, Guzmán C, Pérez-Rodríguez P, González-Bucio J, Burgueño J, Montesinos-López A, Crossa J. Deep Kernel for Genomic and Near Infrared Predictions in Multi-environment Breeding Trials. G3 (Bethesda) 2019; 9:2913-2924. [PMID: 31289023 PMCID: PMC6723142 DOI: 10.1534/g3.119.400493] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/28/2019] [Accepted: 07/04/2019] [Indexed: 01/15/2023]
Abstract
Kernel methods are flexible and easy to interpret and have been successfully used in genomic-enabled prediction of various plant species. Kernel methods used in genomic prediction comprise the linear genomic best linear unbiased predictor (GBLUP or GB) kernel, and the Gaussian kernel (GK). In general, these kernels have been used with two statistical models: single-environment and genomic × environment (GE) models. Recently near infrared spectroscopy (NIR) has been used as an inexpensive and non-destructive high-throughput phenotyping method for predicting unobserved line performance in plant breeding trials. In this study, we used a non-linear arc-cosine kernel (AK) that emulates deep learning artificial neural networks. We compared AK prediction accuracy with the prediction accuracy of GB and GK kernel methods in four genomic data sets, one of which also includes pedigree and NIR information. Results show that for all four data sets, AK and GK kernels achieved higher prediction accuracy than the linear GB kernel for the single-environment and GE multi-environment models. In addition, AK achieved similar or slightly higher prediction accuracy than the GK kernel. For all data sets, the GE model achieved higher prediction accuracy than the single-environment model. For the data set that includes pedigree, markers and NIR, results show that the NIR wavelength alone achieved lower prediction accuracy than the genomic information alone; however, the pedigree plus NIR information achieved only slightly lower prediction accuracy than the marker plus the NIR high-throughput data.
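The degree-1 arc-cosine kernel (AK) used to emulate a deep neural network is usually written as AK(xᵢ, xⱼ) = (1/π) ‖xᵢ‖ ‖xⱼ‖ J(θᵢⱼ), with J(θ) = sin θ + (π − θ) cos θ and θᵢⱼ the angle between the two marker vectors; applying the same map again to the resulting kernel plays the role of adding a hidden layer. The Python sketch below follows that assumed form on simulated markers and is not the authors' implementation.

```python
import numpy as np

def arc_cosine_step(K: np.ndarray) -> np.ndarray:
    """Apply one degree-1 arc-cosine layer to a kernel (Gram) matrix K."""
    norms = np.sqrt(np.diag(K))
    cos_theta = K / np.outer(norms, norms)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # clip rounding outside [-1, 1]
    J = np.sin(theta) + (np.pi - theta) * np.cos(theta)
    return np.outer(norms, norms) * J / np.pi

def arc_cosine_kernel(X: np.ndarray, layers: int = 1) -> np.ndarray:
    """Deep arc-cosine kernel: start from X X' and apply `layers` AK steps."""
    K = X @ X.T
    for _ in range(layers):
        K = arc_cosine_step(K)
    return K

rng = np.random.default_rng(3)
X = rng.integers(0, 3, size=(5, 300)).astype(float)  # 5 lines, 300 simulated markers
AK1 = arc_cosine_kernel(X, layers=1)
AK3 = arc_cosine_kernel(X, layers=3)
print(np.round(AK1 / AK1.max(), 3))
print(np.round(AK3 / AK3.max(), 3))
```

Because J(0) = π, each layer leaves the diagonal unchanged, and since sin θ − θ cos θ ≥ 0 on [0, π], the cosine similarity between two lines can only increase with depth. The number of layers is therefore a tuning parameter that would need to be chosen for each data set.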
Affiliation(s)
- Jaime Cuevas
  - Universidad de Quintana Roo, Chetumal, Quintana Roo, 77019 México
- Philomin Juliana
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de Mexico, Mexico
- Carlos Guzmán
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de Mexico, Mexico
- Juan Burgueño
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de Mexico, Mexico
- Abelardo Montesinos-López
  - Department of Mathematics, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, 44430
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de Mexico, Mexico

28
An R Package for Bayesian Analysis of Multi-environment and Multi-trait Multi-environment Data for Genome-Based Prediction. G3 (Bethesda) 2019; 9:1355-1369. [PMID: 30819822 PMCID: PMC6505148 DOI: 10.1534/g3.119.400126] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 01/11/2023]
Abstract
Evidence that genomic selection (GS) is a technology that is revolutionizing plant breeding continues to grow. However, it is very well documented that its success strongly depends on statistical models, which are used by GS to perform predictions of candidate genotypes that were not phenotyped. Because there is no universally better model for prediction and models for each type of response variable are needed (continuous, binary, ordinal, count, etc.), an active area of research aims to develop statistical models for the prediction of univariate and multivariate traits in GS. However, most of the models developed so far are for univariate and continuous (Gaussian) traits. Therefore, to overcome the lack of multivariate statistical models for genome-based prediction by improving the original version of the BMTME, we propose an improved Bayesian multi-trait and multi-environment (BMTME) R package for analyzing breeding data with multiple traits and multiple environments. We also introduce Bayesian multi-output regressor stacking (BMORS) functions that are considerably efficient in terms of computational resources. The package allows parameter estimation and evaluates the prediction performance of multi-trait and multi-environment data in a reliable, efficient and user-friendly way. We illustrate the use of the BMTME with real toy datasets to show all the facilities that the software offers the user. However, for large datasets, the BME() and BMTME() functions of the BMTME R package are very intense in terms of computing time; on the other hand, less intensive computing is required with BMORS functions BMORS() and BMORS_Env() that are also included in the BMTME package.

29
Souza LM, Francisco FR, Gonçalves PS, Scaloppi Junior EJ, Le Guen V, Fritsche-Neto R, Souza AP. Genomic Selection in Rubber Tree Breeding: A Comparison of Models and Methods for Managing G×E Interactions. Front Plant Sci 2019; 10:1353. [PMID: 31708955 PMCID: PMC6824234 DOI: 10.3389/fpls.2019.01353] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/09/2019] [Accepted: 10/01/2019] [Indexed: 05/18/2023]
Abstract
Several genomic prediction models combining genotype × environment (G×E) interactions have recently been developed and used for genomic selection (GS) in plant breeding programs. G×E interactions reduce selection accuracy and limit genetic gains in plant breeding. Two data sets were used to compare the prediction abilities of multienvironment G×E genomic models and two kernel methods. Specifically, a linear kernel, or GB (genomic best linear unbiased predictor [GBLUP]), and a nonlinear kernel, or Gaussian kernel (GK), were used to compare the prediction accuracies (PAs) of four genomic prediction models: 1) a single-environment, main genotypic effect model (SM); 2) a multienvironment, main genotypic effect model (MM); 3) a multienvironment, single-variance G×E deviation model (MDs); and 4) a multienvironment, environment-specific variance G×E deviation model (MDe). We evaluated the utility of genomic selection (GS) for 435 individual rubber trees at two sites and genotyped the individuals via genotyping-by-sequencing (GBS) of single-nucleotide polymorphisms (SNPs). Prediction models were used to estimate stem circumference (SC) during the first 4 years of tree development in conjunction with a broad-sense heritability (H²) of 0.60. Applying the model (SM, MM, MDs, and MDe) and kernel method (GB and GK) combinations to the rubber tree data revealed that the multienvironment models were superior to the single-environment genomic models, regardless of the kernel (GB or GK) used, suggesting that introducing interactions between markers and environmental conditions increases the proportion of variance explained by the model and, more importantly, the PA. Compared with the classic breeding method (CBM), methods in which GS is incorporated resulted in a 5-fold increase in response to selection for SC with multienvironment GS (MM, MDe, or MDs).
Furthermore, GS resulted in a more balanced selection response for SC and contributed to a reduction in selection time when used in conjunction with traditional genetic breeding programs. Given the rapid advances in genotyping methods and their declining costs and given the overall costs of large-scale progeny testing and shortened breeding cycles, we expect GS to be implemented in rubber tree breeding programs.
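One common way to encode the single-variance G×E deviation model (MDs) is to add an interaction kernel formed as the element-wise (Hadamard) product of the genomic relationship among records and an environment incidence kernel, so that the G×E covariance between two records is non-zero only when both come from the same environment. The Python sketch below assumes that construction, a linear genomic kernel, and a balanced layout of lines × sites with simulated markers; it is illustrative and not the authors' code.

```python
import numpy as np

def gxe_kernels(G: np.ndarray, n_env: int) -> tuple[np.ndarray, np.ndarray]:
    """Main-effect and G x E kernels for a balanced lines x environments layout.

    Records are ordered environment by environment, each containing all lines.
    Returns (K_main, K_gxe), where K_main = Z_g G Z_g' and
    K_gxe = K_main * (Z_E Z_E') (Hadamard product), which keeps genomic
    covariance only between records from the same environment.
    """
    n_lines = G.shape[0]
    Z_g = np.tile(np.eye(n_lines), (n_env, 1))            # record -> line incidence
    Z_E = np.kron(np.eye(n_env), np.ones((n_lines, 1)))   # record -> environment incidence
    K_main = Z_g @ G @ Z_g.T
    K_gxe = K_main * (Z_E @ Z_E.T)
    return K_main, K_gxe

rng = np.random.default_rng(435)
X = rng.integers(0, 3, size=(4, 200)).astype(float)  # 4 simulated trees, 200 SNPs
X -= X.mean(axis=0)
G = X @ X.T / X.shape[1]                             # linear (GB-style) kernel
K_main, K_gxe = gxe_kernels(G, n_env=2)              # e.g., two sites
print(K_main.shape, K_gxe.shape)  # (8, 8) (8, 8)
```

In a kernel regression or Bayesian model, K_main and K_gxe would enter as two random-effect covariance structures with their own variance components; an environment-specific variant such as MDe would instead give each environment block its own variance.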
Affiliation(s)
- Livia M. Souza
  - Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
- Felipe R. Francisco
  - Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
- Paulo S. Gonçalves
  - Center of Rubber Tree and Agroforestry Systems, Agronomic Institute (IAC), Votuporanga, Brazil
- Vincent Le Guen
  - Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR AGAP, Montpellier, France
- Roberto Fritsche-Neto
  - Department of Genetics, Escola Superior de Agricultura “Luiz de Queiroz”, Universidade de São Paulo (ESALQ/USP), Piracicaba, Brazil
- Anete P. Souza
  - Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
  - Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, Brazil
  - *Correspondence: Anete P. Souza,