1
Montesinos-López OA, Chavira-Flores M, Kiasmiantini, Crespo-Herrera L, Saint Piere C, Li H, Fritsche-Neto R, Al-Nowibet K, Montesinos-López A, Crossa J. A review of multimodal deep learning methods for genomic-enabled prediction in plant breeding. Genetics 2024:iyae161. [PMID: 39499217 DOI: 10.1093/genetics/iyae161] [Received: 07/14/2024] [Accepted: 09/25/2024] [Indexed: 11/07/2024]
Abstract
Deep learning methods have been applied to improve on the prediction accuracy of traditional statistical methods in plant breeding. Although deep learning seems to be a promising approach for genomic prediction, it has shown some limitations, since conventional deep learning methods fail to leverage all available information. Multimodal deep learning methods aim to improve the predictive power of their unimodal counterparts by introducing several modalities (sources) of input information. In this review, we introduce some basic theoretical concepts of multimodal deep learning and provide a list of the most widely used neural network architectures in deep learning, as well as the available strategies to fuse data from different modalities. We mention some of the available computational resources for the practical implementation of multimodal deep learning problems. Finally, we review applications of multimodal deep learning to genomic selection in plant breeding and other related fields. We present a meta-picture of the practical performance of multimodal deep learning methods to highlight how these tools can help address complex problems in plant breeding. We discuss some relevant considerations that researchers should keep in mind when applying multimodal deep learning methods. Multimodal deep learning holds significant potential for various fields, including genomic selection. While multimodal deep learning displays enhanced prediction capabilities over unimodal deep learning and other machine learning methods, it demands more computational resources. Multimodal deep learning effectively captures intermodal interactions, especially when integrating data from different sources. To apply multimodal deep learning in genomic selection, suitable architectures and fusion strategies must be chosen.
It is relevant to keep in mind that multimodal deep learning, like unimodal deep learning, is a powerful tool but should be carefully applied. Given its predictive edge over traditional methods, multimodal deep learning is valuable in addressing challenges in plant breeding and food security amid a growing global population.
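The fusion strategies mentioned above differ mainly in where the modalities are combined. The sketch below is a minimal illustration under stated assumptions, not the review's method: a closed-form ridge regression stands in for a neural network, the data are simulated, and the helper names (`ridge_fit`, `early_fusion`, `late_fusion`) and the penalty `lam` are hypothetical. Early fusion concatenates the modality features before fitting one model; late fusion fits one model per modality and averages their predictions.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression on centered data; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ (y - y_mean))
    return coef, x_mean, y_mean


def ridge_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def early_fusion(modalities_train, y, modalities_test, lam=1.0):
    # Concatenate all modality features column-wise, then fit a single model.
    model = ridge_fit(np.hstack(modalities_train), y, lam)
    return ridge_predict(model, np.hstack(modalities_test))


def late_fusion(modalities_train, y, modalities_test, lam=1.0):
    # Fit one model per modality, then average the per-modality predictions.
    preds = [
        ridge_predict(ridge_fit(X_tr, y, lam), X_te)
        for X_tr, X_te in zip(modalities_train, modalities_test)
    ]
    return np.mean(preds, axis=0)


rng = np.random.default_rng(0)
n, n_test = 200, 50
markers = rng.integers(0, 3, size=(n + n_test, 100)).astype(float)  # simulated SNPs coded 0/1/2
env = rng.normal(size=(n + n_test, 5))                              # simulated environmental covariates
y_all = markers[:, :10].sum(axis=1) + 2.0 * env[:, 0] + rng.normal(size=n + n_test)

train, test = [markers[:n], env[:n]], [markers[n:], env[n:]]
y_train, y_test = y_all[:n], y_all[n:]

for name, fuse in [("early", early_fusion), ("late", late_fusion)]:
    pred = fuse(train, y_train, test, lam=10.0)
    print(f"{name} fusion: r = {np.corrcoef(pred, y_test)[0, 1]:.3f}")
```

Intermediate (joint) fusion, in which each modality passes through its own network branch before the learned representations are merged, requires a deep learning framework and is not shown here.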
Affiliation(s)
- Moises Chavira-Flores
- Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM), Ciudad de México 04510, México
- Kiasmiantini
- Statistics Study Program, Universitas Negeri Yogyakarta, Yogyakarta, 55281 Yogyakarta, Indonesia
- Leo Crespo-Herrera
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Texcoco, CP 52640 Edo. de México, México
- Carolina Saint Piere
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Texcoco, CP 52640 Edo. de México, México
- HuiHui Li
- Institute of Crop Science, Chinese Academy of Agricultural Sciences (CAAS), China Office, 12 Zhongguancun South Street, Beijing 100081, China
- Khalid Al-Nowibet
- Department of Statistics and Operations Research, King Saud University, Riyadh 11451, Saudi Arabia
- Abelardo Montesinos-López
- Departamento de Matematicas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, México
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Texcoco, CP 52640 Edo. de México, México
- Louisiana State University, Baton Rouge, LA 70803, USA
- Colegio de Postgraduados, Montecillos, Edo. de México CP 56230, México
- Distinguished Scientist Fellowship Program, King Saud University, Riyadh 11451, Saudi Arabia
2
Zhu W, Li W, Zhang H, Li L. Big data and artificial intelligence-aided crop breeding: Progress and prospects. J Integr Plant Biol 2024. [PMID: 39467106 DOI: 10.1111/jipb.13791] [Received: 06/30/2024] [Revised: 08/25/2024] [Accepted: 09/10/2024] [Indexed: 10/30/2024]
Abstract
The past decade has witnessed rapid developments in gene discovery, biological big data (BBD), artificial intelligence (AI)-aided technologies, and molecular breeding. These advancements are expected to accelerate crop breeding under the pressure of increasing demands for food. Here, we first summarize current breeding methods and discuss the need for new ways to support breeding efforts. Then, we review how to combine BBD and AI technologies for genetic dissection, exploring functional genes, predicting regulatory elements and functional domains, and phenotypic prediction. Finally, we propose the concept of intelligent precision design breeding (IPDB) driven by AI technology and offer ideas about how to implement IPDB. We hope that IPDB will improve the predictability and efficiency of crop breeding and reduce its cost compared with current technologies. As an example of IPDB, we explore the possibilities offered by CropGPT, which combines biological techniques, bioinformatics, and the breeding art of breeders, and presents an open, shareable, and cooperative breeding system. IPDB provides integrated services and communication platforms for biologists, bioinformatics experts, germplasm resource specialists, breeders, dealers, and farmers, and should be well suited for future breeding.
Affiliation(s)
- Wanchao Zhu
- Key Laboratory of Biology and Genetic Improvement of Maize in Arid Area of Northwest Region, College of Agronomy, Northwest A&F University, Yangling, 712100, China
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Weifu Li
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, 430070, China
- Hongwei Zhang
- State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
3
Xiang Y, Xia C, Li L, Wei R, Rong T, Liu H, Lan H. Genomic prediction of yield-related traits and genome-based establishment of heterotic pattern in maize hybrid breeding of Southwest China. Front Plant Sci 2024; 15:1441555. [PMID: 39315371 PMCID: PMC11416964 DOI: 10.3389/fpls.2024.1441555] [Received: 05/31/2024] [Accepted: 08/21/2024] [Indexed: 09/25/2024]
Abstract
When implemented in maize (Zea mays L.) breeding, genomic prediction can accelerate the breeding process and substantially reduce costs. In this study, 11 yield-related traits of maize were used to evaluate four genomic prediction methods: rrBLUP, HEBLP|A, RF, and LightGBM. Across all 11 traits, rrBLUP had predictive accuracy similar to that of HEBLP|A, as did RF to LightGBM, but rrBLUP and HEBLP|A outperformed RF and LightGBM in 8 traits. Furthermore, a genomic prediction-based heterotic pattern of yield was established from 64,620 maize crosses in Southwest China. The results showed that one of the parent lines of the top 5% of crosses came from temperate-tropical or tropical germplasm, which is highly consistent with the actual situation in breeding, and that the Reid+ × Suwan+ heterotic pattern will be a major heterotic pattern in Southwest China in the future.
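rrBLUP treats all marker effects as random with a common variance, which makes it a ridge regression on the marker matrix. A minimal numpy sketch under stated assumptions: genotypes are simulated and coded 0/1/2, the shrinkage parameter `lam` is fixed (in practice it corresponds to the ratio of residual variance to marker-effect variance), and the helper names are hypothetical rather than the API of the rrBLUP R package. It checks that predicting through estimated marker effects and predicting through the linear kernel ZZ' give the same values.

```python
import numpy as np


def rrblup_marker_effects(Z, y, lam):
    """Ridge-regression BLUP of marker effects: u = (Z'Z + lam*I)^-1 Z'(y - mean(y))."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ (y - y.mean()))


def kernel_blup_predict(Z_train, y, Z_new, lam):
    """Same model in 'individual' form, using the linear kernel K = Z Z'."""
    n = Z_train.shape[0]
    K = Z_train @ Z_train.T
    alpha = np.linalg.solve(K + lam * np.eye(n), y - y.mean())
    return y.mean() + (Z_new @ Z_train.T) @ alpha


rng = np.random.default_rng(1)
n_train, n_new, n_markers = 80, 20, 500
geno = rng.integers(0, 3, size=(n_train + n_new, n_markers)).astype(float)
Z = geno - geno[:n_train].mean(axis=0)        # center on training-set marker means
true_u = rng.normal(scale=0.1, size=n_markers)
y = Z[:n_train] @ true_u + rng.normal(size=n_train)

lam = 50.0
u_hat = rrblup_marker_effects(Z[:n_train], y, lam)
pred_marker = y.mean() + Z[n_train:] @ u_hat                        # via marker effects
pred_kernel = kernel_blup_predict(Z[:n_train], y, Z[n_train:], lam)  # via kernel
print(np.allclose(pred_marker, pred_kernel))  # expected: True
```

The kernel form solves an n × n system instead of a p × p one, which is why it is usually cheaper when markers greatly outnumber individuals.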
Affiliation(s)
- Yong Xiang
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Chao Xia
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Lujiang Li
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Rujun Wei
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Tingzhao Rong
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Hailan Liu
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Hai Lan
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
4
Vourlaki IT, Ramos-Onsins SE, Pérez-Enciso M, Castanera R. Evaluation of deep learning for predicting rice traits using structural and single-nucleotide genomic variants. Plant Methods 2024; 20:121. [PMID: 39127715 DOI: 10.1186/s13007-024-01250-y] [Received: 01/24/2024] [Accepted: 07/28/2024] [Indexed: 08/12/2024]
Abstract
BACKGROUND Structural genomic variants (SVs) are prevalent in plant genomes and have played an important role in evolution and domestication, as they constitute a significant source of genomic and phenotypic variability. Nevertheless, most methods in quantitative genetics focusing on crop improvement, such as genomic prediction, consider only Single Nucleotide Polymorphisms (SNPs). Deep Learning (DL) is a promising strategy for genomic prediction, but its performance using SVs and SNPs as genetic markers remains unknown.
RESULTS We used rice to investigate whether combining SVs and SNPs can result in better trait prediction than SNPs alone and to examine the potential advantage of DL networks over Bayesian linear models. Specifically, the performances of BayesC (considering additive effects) and a Bayesian Reproducing Kernel Hilbert Space (RKHS) regression (considering both additive and non-additive effects) were compared with those of two different DL architectures, the Multilayer Perceptron and the Convolutional Neural Network, to explore their prediction ability using various marker input strategies. We found that exploiting structural and nucleotide variation slightly improved prediction ability on complex traits in 87% of the cases. DL models outperformed Bayesian models in 75% of the studied cases, considering the four traits and the two validation strategies used. Finally, DL systematically improved prediction ability for binary traits compared with the Bayesian models.
CONCLUSIONS Our study reveals that the use of structural genomic variants can improve trait prediction in rice, independently of the methodology used. Also, our results suggest that DL networks can perform better than Bayesian models in the prediction of binary traits, and in quantitative traits when the training and target sets are not closely related. This highlights the potential of DL to enhance crop improvement in specific scenarios and the importance of considering SVs in addition to SNPs in genomic selection.
Affiliation(s)
- Ioanna-Theoni Vourlaki
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193, Barcelona, Spain.
- IRTA (Institut de Recerca i Tecnologia Agroalimentàries), Caldes de Montbui, 08140, Barcelona, Spain.
- Sebastián E Ramos-Onsins
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193, Barcelona, Spain
- Miguel Pérez-Enciso
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193, Barcelona, Spain
- Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
- Universitat Autónoma de Barcelona, 08193, Barcelona, Spain
- Raúl Castanera
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193, Barcelona, Spain.
- IRTA (Institut de Recerca i Tecnologia Agroalimentàries), Caldes de Montbui, 08140, Barcelona, Spain.
5
King J, Dreisigacker S, Reynolds M, Bandyopadhyay A, Braun HJ, Crespo-Herrera L, Crossa J, Govindan V, Huerta J, Ibba MI, Robles-Zazueta CA, Saint Pierre C, Singh PK, Singh RP, Achary VMM, Bhavani S, Blasch G, Cheng S, Dempewolf H, Flavell RB, Gerard G, Grewal S, Griffiths S, Hawkesford M, He X, Hearne S, Hodson D, Howell P, Jalal Kamali MR, Karwat H, Kilian B, King IP, Kishii M, Kommerell VM, Lagudah E, Lan C, Montesinos-Lopez OA, Nicholson P, Pérez-Rodríguez P, Pinto F, Pixley K, Rebetzke G, Rivera-Amado C, Sansaloni C, Schulthess U, Sharma S, Shewry P, Subbarao G, Tiwari TP, Trethowan R, Uauy C. Wheat genetic resources have avoided disease pandemics, improved food security, and reduced environmental footprints: A review of historical impacts and future opportunities. Glob Chang Biol 2024; 30:e17440. [PMID: 39185562 DOI: 10.1111/gcb.17440] [Received: 01/23/2024] [Revised: 05/29/2024] [Accepted: 06/03/2024] [Indexed: 08/27/2024]
Abstract
The use of plant genetic resources (PGR), namely wild relatives, landraces, and isolated breeding gene pools, has had substantial impacts on wheat breeding for resistance to biotic and abiotic stresses, while increasing nutritional value, end-use quality, and grain yield. In the Global South, post-Green Revolution genetic yield gains are generally achieved with minimal additional inputs. As a result, production has increased, and millions of hectares of natural ecosystems have been spared. Without PGR-derived disease resistance, fungicide use would easily have doubled, massively increasing selection pressure for fungicide resistance. It is estimated that in wheat, a billion liters of fungicide application have been avoided since 2000 alone. This review presents examples of successful use of PGR, including the relentless battle against wheat rust epidemics and pandemics, defense against diseases that jump species barriers, such as blast, biofortification yielding nutrient-dense varieties, and the use of novel genetic variation to improve polygenic traits such as climate resilience. Crop breeding genepools urgently need to be diversified to increase yields across a range of environments (>200 Mha globally), under less predictable weather and biotic stress pressure, while increasing input use efficiency. Given that the ~0.8 million PGR accessions in wheat collections worldwide are relatively untapped, and given the massive impacts of the tiny fraction studied so far, larger-scale screening and introgression promise solutions to emerging challenges, facilitated by advanced phenomic and genomic tools. The first translocations in wheat to modify rhizosphere microbiome interactions (reducing biological nitrification, reducing greenhouse gases, and increasing nitrogen use efficiency) are a landmark proof of concept. Phenomics and next-generation sequencing have already elucidated exotic haplotypes associated with biotic and complex abiotic traits that are now mainstreamed in breeding.
Big data from decades of global yield trials can elucidate the benefits of PGR across environments. This kind of impact cannot be achieved without widescale sharing of germplasm and other breeding technologies through networks and public-private partnerships in a pre-competitive space.
Affiliation(s)
- Julie King
- School of Biosciences, The University of Nottingham, Loughborough, UK
- Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Matthew Reynolds
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Anindya Bandyopadhyay
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Hans-Joachim Braun
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Colegio de Postgraduados, Montecillos, Mexico
- Velu Govindan
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Julio Huerta
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias (INIFAP), Campo Experimental Valle de México, Texcoco, Mexico
- Maria Itria Ibba
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Carolina Saint Pierre
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Pawan K Singh
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Ravi P Singh
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Huazhong Agricultural University, Wuhan, Hubei, China
- V Mohan Murali Achary
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Sridhar Bhavani
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Gerald Blasch
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Shifeng Cheng
- Chinese Academy of Agricultural Science (AGIS), Shenzhen, China
- Hannes Dempewolf
- Crop, Livestock and Environment Division, Japan International Research Center for Agricultural Sciences (JIRCAS), Ibaraki, Japan
- Guillermo Gerard
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Surbhi Grewal
- School of Biosciences, The University of Nottingham, Loughborough, UK
- Xinyao He
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Sarah Hearne
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- David Hodson
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Phil Howell
- National Institute of Agricultural Botany (NIAB), Cambridge, UK
- Hannes Karwat
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Ian P King
- School of Biosciences, The University of Nottingham, Loughborough, UK
- Masahiro Kishii
- Crop, Livestock and Environment Division, Japan International Research Center for Agricultural Sciences (JIRCAS), Ibaraki, Japan
- Evans Lagudah
- Commonwealth Scientific and Industrial Research Organization (CSIRO), Agriculture and Food, Canberra, Australian Capital Territory, Australia
- Caixia Lan
- Huazhong Agricultural University, Wuhan, Hubei, China
- Paul Nicholson
- John Innes Centre (JIC), Norwich Research Park, Norwich, UK
- Francisco Pinto
- Department of Plant Sciences, Centre for Crop Systems Analysis, Wageningen University Research, Wageningen, The Netherlands
- Kevin Pixley
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Greg Rebetzke
- Commonwealth Scientific and Industrial Research Organization (CSIRO), Agriculture and Food, Canberra, Australian Capital Territory, Australia
- Carolina Rivera-Amado
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Carolina Sansaloni
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Urs Schulthess
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- CIMMYT-China Joint Center for Wheat and Maize Improvement, Henan Agricultural University, Zhengzhou, China
- Guntar Subbarao
- Crop, Livestock and Environment Division, Japan International Research Center for Agricultural Sciences (JIRCAS), Ibaraki, Japan
- Thakur Prasad Tiwari
- International Maize and Wheat Improvement Center (CIMMYT) and Affiliates, Texcoco, Mexico
- Richard Trethowan
- School of Life and Environmental Sciences, Plant Breeding Institute, Sydney Institute of Agriculture, University of Sydney, Narrabri, New South Wales, Australia
- Cristobal Uauy
- John Innes Centre (JIC), Norwich Research Park, Norwich, UK
6
Larue F, Rouan L, Pot D, Rami JF, Luquet D, Beurier G. Linking genetic markers and crop model parameters using neural networks to enhance genomic prediction of integrative traits. Front Plant Sci 2024; 15:1393965. [PMID: 39139722 PMCID: PMC11319263 DOI: 10.3389/fpls.2024.1393965] [Received: 03/11/2024] [Accepted: 07/04/2024] [Indexed: 08/15/2024]
Abstract
Introduction Predicting the performance (yield or other integrative traits) of cultivated plants is complex because it involves estimating not only the genetic value of the selection candidates and the interactions between genotype and environment (GxE), but also the epistatic interactions between genomic regions for a given trait and the interactions between the traits contributing to the integrative trait. Classical Genomic Prediction (GP) models mostly account for additive effects and are not suited to estimating non-additive effects such as epistasis. Therefore, machine learning and deep learning methods have previously been proposed to model these non-linear effects.
Methods In this study, we propose a type of Artificial Neural Network (ANN), the Convolutional Neural Network (CNN), and compare it with two classical GP regression methods for their ability to predict an integrative trait of sorghum: aboveground fresh weight accumulation. We also suggest that the use of a crop growth model (CGM) can enhance predictions of integrative traits by decomposing them into more heritable intermediate traits.
Results The CNN outperformed both the LASSO and Bayes C methods in accuracy, suggesting that CNNs are better suited to predicting integrative traits. Furthermore, the predictive ability of the combined CGM-GP approach surpassed that of GP without CGM integration, irrespective of the regression method used.
Discussion These results are consistent with recent work aiming to develop Genome-to-Phenotype models and advocate the use of non-linear prediction methods and of combined CGM-GP to enhance the prediction of crop performance.
Affiliation(s)
- Florian Larue
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Montpellier, France
- Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Université Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRA), Institut Agro, Montpellier, France
- Lauriane Rouan
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Montpellier, France
- Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Université Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRA), Institut Agro, Montpellier, France
- David Pot
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Montpellier, France
- Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Université Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRA), Institut Agro, Montpellier, France
- Jean-François Rami
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Montpellier, France
- Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Université Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRA), Institut Agro, Montpellier, France
- Delphine Luquet
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Montpellier, France
- Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Université Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRA), Institut Agro, Montpellier, France
- Grégory Beurier
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Montpellier, France
- Unité Mixte de Recherche, Institut Amélioration Génétique et Adaptation des Plantes méditerranéennes et Tropicales (UMR AGAP), Université Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRA), Institut Agro, Montpellier, France
7
DeSalvio AJ, Adak A, Murray SC, Jarquín D, Winans ND, Crozier D, Rooney WL. Near-infrared reflectance spectroscopy phenomic prediction can perform similarly to genomic prediction of maize agronomic traits across environments. Plant Genome 2024; 17:e20454. [PMID: 38715204 DOI: 10.1002/tpg2.20454] [Received: 02/02/2024] [Revised: 03/12/2024] [Accepted: 04/01/2024] [Indexed: 07/02/2024]
Abstract
For nearly two decades, genomic prediction and selection have supported efforts to increase genetic gains in plant and animal improvement programs. However, novel phenomic strategies for predicting complex traits in maize have recently proven beneficial when integrated into across-environment sparse genomic prediction models. One phenomic data modality is whole-grain near-infrared spectroscopy (NIRS), which records reflectance values of biological samples (e.g., maize kernels) based on chemical composition. Predictions of hybrid maize grain yield (GY) and 500-kernel weight (KW) across 2 years (2011-2012) and two management conditions (water-stressed and well-watered) were conducted using combinations of reflectance data obtained from high-throughput F2 whole-kernel scans and genomic data obtained from genotyping-by-sequencing, within four different cross-validation (CV) schemes (CV2, CV1, CV0, and CV00). When predicting the performance of untested genotypes in characterized environments (CV1), genomic data were better than phenomic data for GY (genomic 0.689 ± 0.024 vs. phenomic 0.612 ± 0.045), but phenomic data were better than genomic data for KW (genomic 0.535 ± 0.034 vs. phenomic 0.617 ± 0.145). Multi-kernel models (combinations of phenomic and genomic relationship matrices) did not surpass single-kernel models for GY prediction in CV1 or CV00 (prediction of untested genotypes in uncharacterized environments); however, these models did outperform the single-kernel models for prediction of KW in these same CVs. Lasso regression applied to the NIRS data set selected a subset of 216 NIRS bands that achieved prediction abilities comparable to those of the full phenomic data set of 3,112 bands when predicting GY and KW under CV1 and CV00.
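Multi-kernel models of this kind combine a genomic and a phenomic relationship matrix. A minimal sketch, assuming simulated genotypes and reflectance spectra: it builds a VanRaden-style genomic relationship matrix, a phenomic relationship matrix from band-standardized spectra, and predicts held-out lines by kernel ridge regression on a fixed weighted sum of the two kernels. The weight `w`, the penalty `lam`, and the function names are hypothetical; multi-kernel GBLUP models usually give each kernel its own random effect with an estimated variance component, which this simplification does not do.

```python
import numpy as np


def genomic_relationship(M):
    """VanRaden-style G matrix from genotypes coded 0/1/2 (rows = lines)."""
    p = M.mean(axis=0) / 2.0                   # allele frequency per marker
    W = M - 2.0 * p                            # center each marker by 2p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))


def phenomic_relationship(R):
    """Relationship matrix from reflectance spectra (rows = lines, cols = bands)."""
    S = (R - R.mean(axis=0)) / R.std(axis=0)  # standardize each band
    return S @ S.T / R.shape[1]


def kernel_predict(K, y, train_idx, test_idx, lam):
    """Kernel ridge prediction of test lines from training phenotypes."""
    K_tt = K[np.ix_(train_idx, train_idx)]
    K_st = K[np.ix_(test_idx, train_idx)]
    y_tr = y[train_idx]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train_idx)), y_tr - y_tr.mean())
    return y_tr.mean() + K_st @ alpha


rng = np.random.default_rng(2)
n_lines, n_markers, n_bands = 150, 300, 200
M = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)  # simulated genotypes
R = rng.normal(size=(n_lines, n_bands))                           # simulated spectra
# Hypothetical trait with a genomic and a spectral component plus noise.
y = ((M - M.mean(axis=0)) @ rng.normal(scale=0.1, size=n_markers)
     + 0.5 * R[:, :5].sum(axis=1) + rng.normal(size=n_lines))

G, P = genomic_relationship(M), phenomic_relationship(R)
w = 0.5                                        # assumed kernel weight
K = w * G + (1.0 - w) * P
train_idx, test_idx = np.arange(120), np.arange(120, n_lines)
for name, kern in [("G", G), ("P", P), ("G+P", K)]:
    pred = kernel_predict(kern, y, train_idx, test_idx, lam=1.0)
    print(name, round(np.corrcoef(pred, y[test_idx])[0, 1], 3))
```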
Affiliation(s)
- Aaron J DeSalvio
- Interdisciplinary Graduate Program in Genetics and Genomics (Department of Biochemistry and Biophysics), Texas A&M University, College Station, Texas, USA
- Alper Adak
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, USA
- Seth C Murray
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, USA
- Diego Jarquín
- Department of Agronomy, University of Florida, Gainesville, Florida, USA
- Noah D Winans
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, USA
- Daniel Crozier
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, USA
- William L Rooney
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, USA
8
Cortés AJ. Abiotic Stress Tolerance Boosted by Genetic Diversity in Plants. Int J Mol Sci 2024; 25:5367. [PMID: 38791404 PMCID: PMC11121514 DOI: 10.3390/ijms25105367] [Received: 02/29/2024] [Accepted: 03/14/2024] [Indexed: 05/26/2024]
Abstract
Plant breeding [...].
Affiliation(s)
- Andrés J. Cortés
- Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 vía Rionegro—Las Palmas, Rionegro 054048, Colombia;
- Facultad de Ciencias Agrarias—de Ciencias Forestales, Universidad Nacional de Colombia—Sede Medellín, Medellín 050034, Colombia
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Lomma 23436, Sweden
9
Lourenço VM, Ogutu JO, Rodrigues RAP, Posekany A, Piepho HP. Genomic prediction using machine learning: a comparison of the performance of regularized regression, ensemble, instance-based and deep learning methods on synthetic and empirical data. BMC Genomics 2024; 25:152. [PMID: 38326768 PMCID: PMC10848392 DOI: 10.1186/s12864-023-09933-x] [Received: 03/21/2023] [Accepted: 12/20/2023] [Indexed: 02/09/2024]
Abstract
BACKGROUND The accurate prediction of genomic breeding values is central to genomic selection in both plant and animal breeding studies. Genomic prediction involves the use of thousands of molecular markers spanning the entire genome and therefore requires methods able to efficiently handle high-dimensional data. Not surprisingly, machine learning methods are increasingly advocated for and used in genomic prediction studies. These methods encompass different groups of supervised and unsupervised learning methods. Although several studies have compared the predictive performances of individual methods, studies comparing the predictive performance of different groups of methods are rare. However, such studies are crucial for (i) identifying groups of methods with superior genomic predictive performance and (ii) assessing the merits and demerits of such groups of methods relative to each other and to the established classical methods. Here, we comparatively evaluate the genomic predictive performance and informally assess the computational cost of several groups of supervised machine learning methods, specifically, regularized regression methods, deep, ensemble and instance-based learning algorithms, using one simulated animal breeding dataset and three empirical maize breeding datasets obtained from a commercial breeding program. RESULTS Our results show that the relative predictive performance and computational expense of the groups of machine learning methods depend upon both the data and target traits and that for classical regularized methods, increasing model complexity can incur huge computational costs but does not necessarily improve predictive accuracy. Thus, despite their greater complexity and computational burden, neither the adaptive nor the group regularized methods clearly improved upon the results of their simple regularized counterparts.
This rules out selecting a single procedure among machine learning methods for routine use in genomic prediction. The results also show that, because of their competitive predictive performance, computational efficiency, simplicity and therefore relatively few tuning parameters, the classical linear mixed model and regularized regression methods are likely to remain strong contenders for genomic prediction. CONCLUSIONS The dependence of predictive performance and computational burden on target datasets and traits calls for increasing investments in enhancing the computational efficiency of machine learning algorithms and computing resources.
Affiliation(s)
- Vanda M Lourenço
- Center for Mathematics and Applications (NOVA Math) and Department of Mathematics, NOVA SST, 2829-516, Caparica, Portugal.
- Joseph O Ogutu
- Institute of Crop Science, Biostatistics Unit, University of Hohenheim, Fruwirthstrasse 23, 70599, Stuttgart, Germany.
- Rui A P Rodrigues
- Center for Mathematics and Applications (NOVA Math) and Department of Mathematics, NOVA SST, 2829-516, Caparica, Portugal
- Alexandra Posekany
- Research Unit of Computational Statistics, Vienna University of Technology, Wiedner Hauptstr. 8-10, 1040, Vienna, Austria
- Hans-Peter Piepho
- Institute of Crop Science, Biostatistics Unit, University of Hohenheim, Fruwirthstrasse 23, 70599, Stuttgart, Germany
10
Chen C, Powell O, Dinglasan E, Ross EM, Yadav S, Wei X, Atkin F, Deomano E, Hayes BJ. Genomic prediction with machine learning in sugarcane, a complex highly polyploid clonally propagated crop with substantial non-additive variation for key traits. Plant Genome 2023; 16:e20390. [PMID: 37728221 DOI: 10.1002/tpg2.20390] [Received: 12/11/2022] [Revised: 08/01/2023] [Accepted: 08/29/2023] [Indexed: 09/21/2023]
Abstract
Sugarcane has a complex, highly polyploid genome with multi-species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning (ML) methods, which are purportedly able to deal with high levels of complexity in prediction. Here, we investigated deep learning (DL) neural networks, including multilayer perceptrons (MLP) and convolutional neural networks (CNN), and an ensemble machine learning approach, random forest (RF), for genomic prediction in sugarcane. The data set comprised 2912 sugarcane clones, scored for 26,086 genome-wide single nucleotide polymorphism markers, with final assessment trial data for total cane harvested (TCH), commercial cane sugar (CCS), and fiber content (Fiber). The clones in the latest trial (2017) were used as a validation set. We compared the prediction accuracy of these methods to that of genomic best linear unbiased prediction (GBLUP) extended to include dominance and epistatic effects. The prediction accuracies from GBLUP models were up to 0.37 for TCH, 0.43 for CCS, and 0.48 for Fiber, while the optimized ML models had prediction accuracies of 0.35 for TCH, 0.38 for CCS, and 0.48 for Fiber. Both RF and DL neural network models have predictive ability comparable to the additive GBLUP model but are less accurate than the extended GBLUP model.
Affiliation(s)
- Chensong Chen
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Owen Powell
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Eric Dinglasan
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Elizabeth M Ross
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Seema Yadav
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Ben J Hayes
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
11
Weber SE, Chawla HS, Ehrig L, Hickey LT, Frisch M, Snowdon RJ. Accurate prediction of quantitative traits with failed SNP calls in canola and maize. Front Plant Sci 2023; 14:1221750. [PMID: 37936929 PMCID: PMC10627008 DOI: 10.3389/fpls.2023.1221750] [Received: 05/12/2023] [Accepted: 10/05/2023] [Indexed: 11/09/2023]
Abstract
In modern plant breeding, genomic selection is becoming the gold standard to select superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs commonly rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection candidates. For this purpose, SNP arrays with moderate to high marker density represent a robust and cost-effective tool to generate reproducible, easy-to-handle, high-throughput genotype data from large-scale breeding populations. However, SNP arrays are prone to technical errors that lead to failed allele calls. To overcome this problem, failed calls are often imputed, based on the assumption that failed SNP calls are purely technical. However, this ignores biological causes of failed calls, such as deletions, and there is increasing evidence that gene presence-absence and other kinds of genome structural variants can play a role in phenotypic expression. Because deletions are frequently not in linkage disequilibrium with their flanking SNPs, imputation of missing SNP calls can potentially obscure valuable marker-trait associations. In this study, we analyze published datasets for canola and maize using four parametric and two machine learning models and demonstrate that failed allele calls in genomic prediction are highly predictive for important agronomic traits. We present two statistical pipelines, based on population structure and linkage disequilibrium, that enable the filtering of failed SNP calls that likely have biological causes. For the population and trait examined, prediction accuracy based on these filtered failed allele calls was competitive with standard SNP-based prediction, underscoring the potential value of missing data in genomic prediction approaches.
The combination of SNPs with all failed allele calls or with the filtered allele calls did not outperform prediction based on SNPs alone, due to redundancy in genomic relationship estimates.
Affiliation(s)
- Sven E. Weber
- Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lennard Ehrig
- Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lee T. Hickey
- Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
- Matthias Frisch
- Department of Biometry and Population Genetics, Justus Liebig University, Giessen, Germany
- Rod J. Snowdon
- Department of Plant Breeding, Justus Liebig University, Giessen, Germany
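Weber et al. treat failed SNP calls as potentially informative rather than purely technical. A minimal sketch of that encoding idea follows, assuming genotypes coded as 0/1/2 allele dosages with `None` marking a failed call. The hypothetical helper `encode_with_failed_calls` returns a conventionally mean-imputed SNP matrix together with a separate 0/1 failed-call indicator matrix that a prediction model could use as extra features. It does not reproduce the authors' filtering pipelines based on population structure and linkage disequilibrium.

```python
from typing import List, Optional, Tuple

# Assumed coding: 0, 1 or 2 copies of the alternative allele; None = failed call
Genotype = Optional[int]


def encode_with_failed_calls(
    calls: List[List[Genotype]],
) -> Tuple[List[List[float]], List[List[int]]]:
    """Hypothetical helper: rows are lines, columns are SNPs.

    Returns (mean-imputed SNP matrix, failed-call indicator matrix).
    """
    n_snps = len(calls[0])

    # Column means over successful calls, used for conventional imputation
    means = []
    for j in range(n_snps):
        observed = [row[j] for row in calls if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)

    # Replace each failed call with its SNP's mean dosage
    imputed = [
        [float(g) if g is not None else means[j] for j, g in enumerate(row)]
        for row in calls
    ]

    # Keep the failure itself as a feature: 1 = failed call (e.g. a possible deletion)
    failed = [[1 if g is None else 0 for g in row] for row in calls]
    return imputed, failed


imputed, failed = encode_with_failed_calls([[0, 2, None], [1, None, None], [2, 2, 1]])
```

According to the abstract, combining SNPs with failed-call features did not outperform SNP-only prediction in that study because of redundancy in the genomic relationship estimates, so the indicator matrix is worth comparing on its own as well as in combination.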
12
Yan Q, Fruzangohar M, Taylor J, Gong D, Walter J, Norman A, Shi JQ, Coram T. Improved genomic prediction using machine learning with Variational Bayesian sparsity. Plant Methods 2023; 19:96. [PMID: 37660084 PMCID: PMC10474716 DOI: 10.1186/s13007-023-01073-3] [Received: 09/15/2022] [Accepted: 08/22/2023] [Indexed: 09/04/2023]
Abstract
BACKGROUND Genomic prediction has become a powerful modelling tool for assessing line performance in plant and livestock breeding programmes. Among the genomic prediction modelling approaches, linear-based models have proven to provide accurate predictions even when the number of genetic markers exceeds the number of data samples. However, breeding programmes are now compiling data from large numbers of lines and test environments for analyses, rendering these approaches computationally prohibitive. Machine learning (ML) now offers a solution to this problem through the construction of fully connected deep learning architectures and high parallelisation of the predictive task. However, the fully connected nature of these architectures immediately generates an over-parameterisation of the network that needs addressing for efficient and accurate predictions. RESULTS In this research we explore the use of an ML architecture, which we call VBS-ML, governed by variational Bayesian sparsity in its initial layers. The use of VBS-ML provides a mechanism for feature selection of important markers linked to the trait, immediately reducing the network over-parameterisation. Selected markers then propagate to the remaining fully connected feed-forward components of the ML network to form the final genomic prediction. We illustrated the approach with four large Australian wheat breeding data sets that range from 2665 lines to 10375 lines genotyped across a large set of markers. For all data sets, the use of the VBS-ML architecture improved genomic prediction accuracy over legacy linear-based modelling approaches. CONCLUSIONS An ML architecture governed under a variational Bayesian paradigm was shown to improve genomic prediction accuracy over legacy modelling approaches.
This VBS-ML approach can be used to dramatically decrease the parameter burden on the network and provide a computationally feasible approach for improving genomic prediction conducted with large breeding population numbers and genetic markers.
Affiliation(s)
- Qingsen Yan
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Mario Fruzangohar
- School of Food, Agriculture and Wine, University of Adelaide, Adelaide, Australia
- Julian Taylor
- School of Food, Agriculture and Wine, University of Adelaide, Adelaide, Australia
- Dong Gong
- School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
- James Walter
- Australian Grains Technologies, Roseworthy, Australia
- Adam Norman
- Australian Grains Technologies, Roseworthy, Australia
- Javen Qinfeng Shi
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
- Tristan Coram
- Australian Grains Technologies, Roseworthy, Australia
13
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023]
Abstract
Sequencing technologies have rapidly evolved over the past two decades, and new technologies are being continually developed and commercialized. The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Trushar Shah
- International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya
- Vivek Thakur
- Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India
- Sanjay Kalia
- Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India
- Sean Mayes
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Abhishek Rathore
- Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
14
Lee HJ, Lee JH, Gondro C, Koh YJ, Lee SH. deepGBLUP: joint deep learning networks and GBLUP framework for accurate genomic prediction of complex traits in Korean native cattle. Genet Sel Evol 2023; 55:56. [PMID: 37525091 PMCID: PMC10392020 DOI: 10.1186/s12711-023-00825-y] [Received: 12/30/2022] [Accepted: 07/07/2023] [Indexed: 08/02/2023]
Abstract
BACKGROUND Genomic prediction has become widespread as a valuable tool to estimate genetic merit in animal and plant breeding. Here we develop a novel genomic prediction algorithm, called deepGBLUP, which integrates deep learning networks and a genomic best linear unbiased prediction (GBLUP) framework. The deep learning networks assign marker effects using locally-connected layers and subsequently use them to estimate an initial genomic value through fully-connected layers. The GBLUP framework estimates three genomic values (additive, dominance, and epistasis) by leveraging respective genetic relationship matrices. Finally, deepGBLUP predicts a final genomic value by summing all the estimated genomic values. RESULTS We compared the proposed deepGBLUP with the conventional GBLUP and Bayesian methods. Extensive experiments demonstrate that the proposed deepGBLUP yields state-of-the-art performance on Korean native cattle data across diverse traits, marker densities, and training sizes. In addition, they show that the proposed deepGBLUP can outperform the previous methods on simulated data across various heritabilities and quantitative trait loci (QTL) effects. CONCLUSIONS We introduced a novel genomic prediction algorithm, deepGBLUP, which successfully integrates deep learning networks and GBLUP framework. Through comprehensive evaluations on the Korean native cattle data and simulated data, deepGBLUP consistently achieved superior performance across various traits, marker densities, training sizes, heritabilities, and QTL effects. Therefore, deepGBLUP is an efficient method to estimate an accurate genomic value. The source code and manual for deepGBLUP are available at https://github.com/gywns6287/deepGBLUP .
Affiliation(s)
- Hyo-Jun Lee
- Department of Bio-AI Convergence, Chungnam National University, 305-764, Daejeon, Korea
- Jun Heon Lee
- Division of Animal and Dairy Science, Chungnam National University, 305-764, Daejeon, Korea
- Cedric Gondro
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
- Yeong Jun Koh
- Department of Computer Science and Engineering, Chungnam National University, 305-764, Daejeon, Korea.
- Seung Hwan Lee
- Division of Animal and Dairy Science, Chungnam National University, 305-764, Daejeon, Korea.
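The deepGBLUP abstract describes a GBLUP framework that estimates separate genomic values from genetic relationship matrices and sums them into a final genomic value. A minimal sketch of that GBLUP side only follows; it is not the deepGBLUP deep-learning module and not the authors' code. It assumes the VanRaden (method 1) additive relationship matrix, uses its Hadamard square as an additive-by-additive epistasis kernel (a common choice, assumed here; dominance is omitted), and computes the joint BLUP of each component with variance components taken as known rather than estimated. The marker matrix, phenotypes, and variance values are made up for illustration.

```python
import numpy as np


def vanraden_g(M: np.ndarray) -> np.ndarray:
    """Additive genomic relationship matrix (VanRaden 2008, method 1).

    M is an n x m matrix of marker dosages coded 0/1/2 (lines x markers).
    """
    p = M.mean(axis=0) / 2.0                  # allele frequency of each marker
    Z = M - 2.0 * p                           # centre every marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def multi_kernel_blup(kernels, var_components, var_e, y):
    """Joint BLUP of each genomic component in y = mu + sum_k u_k + e.

    Assumes u_k ~ N(0, s2_k * K_k) and e ~ N(0, var_e * I), with all
    variance components known (in practice they are estimated, e.g. by REML).
    """
    n = len(y)
    V = var_e * np.eye(n)                     # phenotypic covariance matrix
    for K, s2 in zip(kernels, var_components):
        V = V + s2 * K
    ones = np.ones(n)
    # Generalized least squares estimate of the intercept
    mu = ones @ np.linalg.solve(V, y) / (ones @ np.linalg.solve(V, ones))
    r = np.linalg.solve(V, y - mu)            # V^{-1} (y - mu)
    components = [s2 * (K @ r) for K, s2 in zip(kernels, var_components)]
    return mu, components


# Illustrative data: 5 lines x 4 markers, made-up phenotypes and variances
M = np.array([[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1],
              [1, 2, 1, 0], [0, 2, 2, 1]], dtype=float)
y = np.array([4.1, 3.2, 5.0, 4.4, 3.9])
G = vanraden_g(M)
G_aa = G * G                                  # additive-by-additive kernel (Hadamard product)
mu, (u_a, u_aa) = multi_kernel_blup([G, G_aa], [0.5, 0.2], 0.3, y)
total_genomic_value = u_a + u_aa              # summed genomic value per line
```

Fitting both kernels jointly, rather than fitting each one separately and adding the results, avoids attributing the same signal twice to the additive and epistatic components.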
15
Putra AR, Yen JDL, Fournier-Level A. Forecasting trait responses in novel environments to aid seed provenancing under climate change. Mol Ecol Resour 2023; 23:565-580. [PMID: 36308465 DOI: 10.1111/1755-0998.13728] [Received: 11/03/2021] [Revised: 10/23/2022] [Accepted: 10/27/2022] [Indexed: 11/28/2022]
Abstract
Revegetation projects face the major challenge of sourcing optimal plant material. This is often done with limited information about plant performance and increasingly requires factoring in resilience to climate change. Functional traits can be used as quantitative indices of plant performance and guide seed provenancing, but trait values expected under novel conditions are often unknown. To support climate-resilient provenancing efforts, we develop a trait prediction model that integrates the effect of genetic variation with fine-scale temperature variation. We train our model on multiple field plantings of Arabidopsis thaliana and predict two relevant fitness traits, days-to-bolting and fecundity, across the species' European range. Prediction accuracy was high for days-to-bolting and moderate for fecundity, with the majority of trait variation explained by temperature differences between plantings. Projection under future climate predicted a decline in fecundity, although this response was heterogeneous across the range. In response, we identified novel genotypes that could be introduced to genetically offset the fitness decay. Our study highlights the value of predictive models to aid seed provenancing and improve the success of revegetation projects.
Affiliation(s)
- Andhika R Putra
- School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
- Jian D L Yen
- Arthur Rylah Institute for Environmental Research, Heidelberg, Victoria, Australia
16
Ray S, Jarquin D, Howard R. Comparing artificial-intelligence techniques with state-of-the-art parametric prediction models for predicting soybean traits. Plant Genome 2023; 16:e20263. [PMID: 36484148 DOI: 10.1002/tpg2.20263] [Received: 08/13/2021] [Accepted: 05/16/2022] [Indexed: 05/10/2023]
Abstract
Soybean [Glycine max (L.) Merr.] is a significant source of protein and oil and is also widely used as animal feed. Thus, developing lines that are superior in terms of yield, protein, and oil content is important to feed the ever-growing population. As opposed to high-cost phenotyping, genotyping is both cost- and time-efficient for breeders, because evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use the marker and environment data effectively to predict the yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (genomic best linear unbiased predictor [GBLUP]), a kernel method (Gaussian kernel [GK]), an artificial-intelligence (AI) method (deep learning [DL]), and a hybrid method that corresponds to the emulation of a DL model using a kernel method (an arc-cosine kernel [AK]) in terms of their prediction accuracy for grain yield, oil, and protein, using data from the soybean nested association mapping experiment (1,379 genotypes tested in six environments, all genotypes in all environments). The relative performance of the four methods varied with the response variable and with whether the model included genotype × environment interaction (G×E) effects. GBLUP consistently performed better, GK and AK followed a similar pattern to GBLUP, and DL performed slightly worse than the other three methods in most cases; however, this may also be attributed to suboptimal hyperparameters. DL performed particularly poorly relative to the other three methods in the presence of G×E effects.
Affiliation(s)
- Susweta Ray
- Dep. of Statistics, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583, USA
- Diego Jarquin
- Dep. of Agronomy, Univ. of Florida, Gainesville, FL, 32611, USA
- Reka Howard
- Dep. of Statistics, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583, USA
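Ray et al. compare GBLUP with a Gaussian kernel (GK) and with an arc-cosine kernel (AK) that emulates a deep network. A minimal sketch of both kernels computed from a marker matrix follows. It assumes the commonly used GK form in which squared Euclidean distances are scaled by their median, and the degree-1 arc-cosine kernel of Cho and Saul applied recursively. The bandwidth `h`, the number of `levels`, and the example marker matrix are assumptions for illustration, not settings reported by the study.

```python
import numpy as np


def gaussian_kernel(X: np.ndarray, h: float = 1.0) -> np.ndarray:
    """Gaussian kernel K_ij = exp(-h * d_ij^2 / q).

    q is the median off-diagonal squared Euclidean distance between lines;
    the bandwidth h is a tuning parameter.
    """
    sq = np.sum(X ** 2, axis=1)
    # Squared distances; clamp tiny negative values caused by rounding
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    q = np.median(D2[np.triu_indices_from(D2, k=1)])
    return np.exp(-h * D2 / q)


def arc_cosine_kernel(X: np.ndarray, levels: int = 1) -> np.ndarray:
    """Degree-1 arc-cosine kernel (Cho and Saul), applied `levels` times.

    Each level mimics one more ReLU hidden layer of infinite width.
    Assumes no all-zero rows (their norm would be zero).
    """
    K = X @ X.T                               # level 0: linear kernel
    for _ in range(levels):
        norms = np.sqrt(np.diag(K))
        outer = np.outer(norms, norms)
        # Angle between every pair of inputs (clip guards against rounding)
        theta = np.arccos(np.clip(K / outer, -1.0, 1.0))
        J = np.sin(theta) + (np.pi - theta) * np.cos(theta)
        K = outer * J / np.pi
    return K


# Illustrative 4 lines x 3 markers, coded 0/1/2
X = np.array([[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]], dtype=float)
GK = gaussian_kernel(X, h=1.0)
AK = arc_cosine_kernel(X, levels=2)
```

Either matrix can take the place of the genomic relationship matrix in a GBLUP-type model, which is what makes AK a kernel-based emulation of a deep network rather than a trained network.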
17
Nguyen VH, Morantte RIZ, Lopena V, Verdeprado H, Murori R, Ndayiragije A, Katiyar SK, Islam MR, Juma RU, Flandez-Galvez H, Glaszmann JC, Cobb JN, Bartholomé J. Multi-environment Genomic Selection in Rice Elite Breeding Lines. Rice (N Y) 2023; 16:7. [PMID: 36752880 PMCID: PMC9908796 DOI: 10.1186/s12284-023-00623-6] [Received: 10/04/2022] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Assessing the performance of elite lines in target environments is essential for breeding programs to select the most relevant genotypes. One of the main complexities in this task resides in accounting for genotype by environment interactions. Genomic prediction models that integrate information from multi-environment trials and environmental covariates can be efficient tools in this context. The objective of this study was to assess the predictive ability of different genomic prediction models to optimize the use of multi-environment information. We used 111 elite breeding lines representing the diversity of the International Rice Research Institute breeding program for irrigated ecosystems. The lines were evaluated for three traits (days to flowering, plant height, and grain yield) in 15 environments in Asia and Africa and genotyped with 882 SNP markers. We evaluated the efficiency of genomic prediction in untested environments using seven multi-environment models and three cross-validation scenarios. RESULTS The elite lines were found to belong to the indica group, more specifically the indica-1B subgroup, which gathers improved material originating from the Green Revolution. Phenotypic correlations between environments were high for days to flowering and plant height (33% and 54% of pairwise correlations greater than 0.5) but low for grain yield (lower than 0.2 in most cases). Clustering analyses based on environmental covariates separated Asia's and Africa's environments into different clusters or subclusters. The predictive abilities ranged from 0.06 to 0.79 for days to flowering, from 0.25 to 0.88 for plant height, and from -0.29 to 0.62 for grain yield. We found that models integrating genotype-by-environment interaction effects did not perform significantly better than models integrating only main effects (genotypes and environment or environmental covariates).
The different cross-validation scenarios showed that, in most cases, the use of all available environments gave better results than a subset. CONCLUSION Multi-environment genomic prediction models with main effects were sufficient for accurate phenotypic prediction of elite lines in targeted environments. These results will help refine the testing strategy to update the genomic prediction models to improve predictive ability.
Affiliation(s)
- Van Hieu Nguyen
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Institute of Crop Science, College of Agriculture and Food Science, University of the Philippines, Los Baños, Laguna, Philippines
- Rose Imee Zhella Morantte
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Vitaliano Lopena
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Holden Verdeprado
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Rosemary Murori
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Alexis Ndayiragije
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Sanjay Kumar Katiyar
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Md Rafiqul Islam
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Roselyne Uside Juma
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- Hayde Flandez-Galvez
- Institute of Crop Science, College of Agriculture and Food Science, University of the Philippines, Los Baños, Laguna, Philippines
- Jean-Christophe Glaszmann
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Joshua N Cobb
- Rice Breeding Innovation Platform, International Rice Research Institute, DAPO, Box7777, Metro Manila, Philippines
- RiceTec. Inc, PO Box 1305, Alvin, TX, 77512, USA
- Jérôme Bartholomé
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France.
- CIRAD, UMR AGAP Institut, Cali, Colombia.
- Alliance Bioversity-CIAT, Cali, Colombia.
18
Jeon D, Kang Y, Lee S, Choi S, Sung Y, Lee TH, Kim C. Digitalizing breeding in plants: A new trend of next-generation breeding based on genomic prediction. Front Plant Sci 2023; 14:1092584. [PMID: 36743488 PMCID: PMC9892199 DOI: 10.3389/fpls.2023.1092584] [Received: 11/08/2022] [Accepted: 01/05/2023] [Indexed: 06/18/2023]
Abstract
As the world's population grows and food needs diversify, the demand for cereals and horticultural crops with beneficial traits increases. To meet a variety of demands, suitable cultivars and innovative breeding methods need to be developed. Breeding methods have changed over time following advances in genetics. With the advent of new sequencing technology in the early 21st century, predictive breeding, such as genomic selection (GS), emerged when large-scale genomic information became available. GS shows good predictive ability for selecting individuals with traits of interest, even for quantitative traits, by using various types of whole-genome-scanning markers, breaking away from the limitations of marker-assisted selection (MAS). In the current review, we briefly describe the history of breeding techniques, each breeding method, various statistical models applied to GS, and methods to increase GS efficiency. Building on this, we propose and define the term digital breeding. Digital breeding takes predictive breeding methods such as GS to a higher level, aiming to minimize human intervention by automating breeding design and the propagation of breeding populations, and to make selections that account for various environments, climates, and topography during the breeding process. We also classified the phases of digital breeding based on the technologies and methods applied to each phase. This review provides an understanding of, and a direction for, the final evolution of plant breeding in the future.
Affiliation(s)
- Donghyun Jeon
- Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Yuna Kang
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Solji Lee
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Sehyun Choi
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Yeonjun Sung
- Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Tae-Ho Lee
- Genomics Division, National Institute of Agricultural Sciences, Jeonju, Republic of Korea
- Changsoo Kim
- Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
19
Jubair S, Domaratzki M. Crop genomic selection with deep learning and environmental data: A survey. Front Artif Intell 2023; 5:1040295. [PMID: 36703955 PMCID: PMC9871498 DOI: 10.3389/frai.2022.1040295] [Received: 09/09/2022] [Accepted: 12/22/2022] [Indexed: 01/12/2023]
Abstract
Machine learning techniques for crop genomic selections, especially for single-environment plants, are well-developed. These machine learning models, which use dense genome-wide markers to predict phenotype, routinely perform well on single-environment datasets, especially for complex traits affected by multiple markers. On the other hand, machine learning models for predicting crop phenotype, especially deep learning models, using datasets that span different environmental conditions, have only recently emerged. Models that can accept heterogeneous data sources, such as temperature, soil conditions and precipitation, are natural choices for modeling GxE in multi-environment prediction. Here, we review emerging deep learning techniques that incorporate environmental data directly into genomic selection models.
Affiliation(s)
- Sheikh Jubair
- Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
- Mike Domaratzki
- Department of Computer Science, University of Western Ontario, London, ON, Canada
20
Cuevas J, Reslow F, Crossa J, Ortiz R. Modeling genotype × environment interaction for single and multitrait genomic prediction in potato (Solanum tuberosum L.). G3 (Bethesda) 2022; 13:6883526. [PMID: 36477309 PMCID: PMC9911059 DOI: 10.1093/g3journal/jkac322] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/09/2022] [Revised: 11/01/2022] [Accepted: 11/28/2022] [Indexed: 12/13/2022]
Abstract
In this study, we extend research on genomic prediction (GP) to polysomic polyploid plant species, with the main objective of investigating single-trait (ST) and multitrait (MT) multienvironment (ME) models using field trial data from 3 locations in Sweden [Helgegården (HEL), Mosslunda (MOS), Umeå (UM)] over 2 years (2020, 2021) for 253 potato cultivars and breeding clones, covering 5 tuber weight traits and 2 tuber flesh quality characteristics. This research investigated the GP of 4 genome-based prediction models with genotype × environment interactions (GEs): (1) ST reaction norm model (M1), (2) ST model considering covariances between environments (M2), (3) ST M2 extended to include a random vector that utilizes the environmental covariances (M3), and (4) MT model with GE (M4). Several prediction problems were analyzed to assess the GP accuracy of each of the 4 models. Results for the prediction of traits in HEL, the high-yield-potential testing site, in 2021 show that the best-predicted traits were tuber flesh starch (%), weight of tubers above 60 or below 40 mm in size, and total tuber weight. In terms of GP accuracy, model M4 gave the best prediction accuracy for 3 traits, namely weight of tubers 40-50 or above 60 mm in size and total tuber weight, and very similar accuracy for starch. For MOS in 2021, the best-predicted traits were starch, weight of tubers above 60, 50-60, or below 40 mm in size, and total tuber weight. MT model M4 was the best GP model in terms of accuracy when some cultivars are observed for some traits. For GP accuracy of traits in UM in 2021, the best-predicted traits were the weight of tubers above 60, 50-60, or below 40 mm in size, and the best model was MT M4, followed by models ST M3 and M2.
Affiliation(s)
- Jaime Cuevas
- Departamento de Energía, Universidad Autónoma del Estado de Quintana Roo, Chetumal, Quintana Roo 77019, México
- Fredrik Reslow
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), P.O. Box 190, Lomma SE 23436, Sweden
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, Texcoco 56237, Edo. de Mexico, Mexico; Colegio de Postgraduados, Montecillos, Edo. de México 56230, México
- Rodomiro Ortiz
- Corresponding author: Sveriges Lantbruksuniversitet, Inst. för Växtförädling (Swedish University of Agricultural Sciences, Department of Plant Breeding), Box 190, SE 23 422 Lomma, Sweden.
21
Łysko A, Popiela A, Forczmański P, V. AM, Lukács BA, Barta Z, Maćków W, Wolski GJ. Comparison of discriminant methods and deep learning analysis in plant taxonomy: a case study of Elatine. Sci Rep 2022; 12:20450. [PMID: 36443472 PMCID: PMC9705712 DOI: 10.1038/s41598-022-24660-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/29/2022] [Accepted: 11/18/2022] [Indexed: 11/29/2022]
Abstract
Elatine is a genus in which flower and seed characteristics are the most important diagnostic features; in particular, seed shape and the structure of the seed coat have been found to be the most reliable identification characters. We combined classic discriminant methods with deep learning techniques to analyze seed morphometric data from 28 populations of six Elatine species from 11 countries throughout the Northern Hemisphere, compared the results obtained, and then checked their taxonomic classification. Our findings indicate that among the discriminant methods, Quadratic Discriminant Analysis (QDA) had the highest percentage of correct matching (mean fit 91.23%); only the deep learning method based on a Convolutional Neural Network (CNN) achieved a higher match (mean fit 93.40%). The QDA method recognized the seeds of E. brochonii and E. orthosperma with 99% accuracy, and the CNN method with 100%. Other taxa, such as E. alsinastrum, E. triandra, E. californica and E. hungarica, were matched with an accuracy of at least 95% (CNN). Our results indicate that the CNN obtains remarkably more accurate classifications than classic discriminant methods and better recognizes the entire pool of taxa analyzed. The least recognized species are E. macropoda and E. hexandra (88% and 78% match).
Affiliation(s)
- Andrzej Łysko
- Faculty of Computer Science and Information Technology, West Pomeranian University of Technology in Szczecin, Szczecin, Poland
- Agnieszka Popiela
- Institute of Biology, University of Szczecin, Szczecin, Poland
- Paweł Forczmański
- Faculty of Computer Science and Information Technology, West Pomeranian University of Technology in Szczecin, Szczecin, Poland
- Attila Molnár V.
- Department of Botany, University of Debrecen, Debrecen, Hungary; ELKH-DE Conservation Biology Research Group, Debrecen, Hungary
- Balázs András Lukács
- ELKH-DE Conservation Biology Research Group, Debrecen, Hungary; Wetland Ecology Research Group, Centre of Ecological Research, Debrecen, Hungary
- Zoltán Barta
- ELKH-DE Behavioural Ecology Research Group, Department of Evolutionary Zoology and Human Biology, University of Debrecen, Debrecen, Hungary
- Witold Maćków
- Faculty of Computer Science and Information Technology, West Pomeranian University of Technology in Szczecin, Szczecin, Poland
- Grzegorz J. Wolski
- Department of Geobotany and Plant Ecology, Faculty of Biology and Environmental Protection, University of Łódź, ul. Banacha 12/16, 90-237 Łódź, Poland
22
John M, Haselbeck F, Dass R, Malisi C, Ricca P, Dreischer C, Schultheiss SJ, Grimm DG. A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species. Front Plant Sci 2022; 13:932512. [PMID: 36407627 PMCID: PMC9673477 DOI: 10.3389/fpls.2022.932512] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/29/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data, leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models, and especially the selected markers, under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we confirm results from other studies that more complex neural network-based architectures are not superior to well-established methods for phenotype prediction. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.
Affiliation(s)
- Maura John
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
- Florian Haselbeck
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
- Dominik G. Grimm
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
- Technical University of Munich, Department of Informatics, Garching, Germany
23
Ballén-Taborda C, Lyerly J, Smith J, Howell K, Brown-Guedira G, Babar MA, Harrison SA, Mason RE, Mergoum M, Murphy JP, Sutton R, Griffey CA, Boyles RE. Utilizing genomics and historical data to optimize gene pools for new breeding programs: A case study in winter wheat. Front Genet 2022; 13:964684. [PMID: 36276956 PMCID: PMC9585219 DOI: 10.3389/fgene.2022.964684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 08/05/2022] [Indexed: 11/13/2022]
Abstract
With the rapid generation and preservation of both genomic and phenotypic information for many genotypes within crops and across locations, emerging breeding programs have a valuable opportunity to leverage these resources to 1) establish the most appropriate genetic foundation at program inception and 2) implement robust genomic prediction platforms that can effectively select future breeding lines. Integrating genomics-enabled breeding into cultivar development can save costs and allow resources to be reallocated towards advanced (i.e., later) stages of field evaluation, which can facilitate an increased number of testing locations and replicates within locations. In this context, a reestablished winter wheat breeding program was used as a case study to understand best practices for leveraging and tailoring existing genomic and phenotypic resources to determine optimal genetics for a specific target population of environments. First, historical multi-environment phenotype data, representing 1,285 advanced breeding lines, were compiled from multi-institutional testing as part of the SunGrains cooperative and used to produce GGE (genotype plus genotype-by-environment) biplots and a principal component analysis (PCA) for yield. Locations were clustered into 22 subsets based on highly correlated line performance among the target population of environments. For each subset, estimated marginal means (EMMs) and best linear unbiased predictions (BLUPs) were calculated using linear models with the 'lme4' R package. Second, for each subset, training populations (TPs) representative of the new South Carolina (SC) breeding lines were determined based on genetic relatedness using the 'STPGA' R package. Third, for each TP, phenotypic values and SNP data were incorporated into 'rrBLUP' mixed models to generate genomic estimated breeding values (GEBVs) for grain yield (YLD), test weight (TW), heading date (HD), and plant height (PH). Using a five-fold cross-validation strategy, an average accuracy of r = 0.42 was obtained for yield across all TPs. Validation with 58 SC elite breeding lines gave an accuracy of r = 0.62 when the TP included complete historical data.
Lastly, QTL-by-environment interaction for 18 major effect genes across three geographic regions was examined. Lines harboring major QTL in the absence of disease could potentially underperform (e.g., Fhb1 R-gene), whereas it is advantageous to express a major QTL under biotic pressure (e.g., stripe rust R-gene). This study highlights the importance of genomics-enabled breeding and multi-institutional partnerships to accelerate cultivar development.
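The five-fold cross-validation accuracy described above is the Pearson correlation between observed and predicted phenotypes in held-out folds. A minimal NumPy sketch of that procedure follows, under loudly stated assumptions: it uses ridge regression on SNP dosages with a fixed shrinkage parameter as a stand-in for the `rrBLUP` mixed model (which estimates the variance ratio from the data), the data are simulated, and the function names `ridge_marker_effects` and `kfold_accuracy` are hypothetical. It does not reproduce the study's training-population design or its results.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Fit marker effects by ridge regression (the penalized form of RR-BLUP).

    X: n x p SNP dosage matrix (0/1/2); y: phenotypes of length n.
    lam stands in for sigma_e^2 / sigma_u^2 and is fixed here (assumption);
    a mixed-model package would estimate the variance components instead.
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # center markers on training means
    mu = y.mean()                        # intercept = training mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - mu))
    return mu, x_mean, beta


def kfold_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Mean Pearson r between observed and predicted phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    rs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        mu, x_mean, beta = ridge_marker_effects(X[train], y[train], lam)
        pred = mu + (X[test] - x_mean) @ beta
        rs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(rs))


# Hypothetical simulated data: 200 lines x 500 SNPs, 20 causal loci, h2 about 0.5.
rng = np.random.default_rng(1)
n, p = 200, 500
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
effects = np.zeros(p)
effects[rng.choice(p, size=20, replace=False)] = rng.normal(size=20)
g = X @ effects
y = g + rng.normal(scale=g.std(), size=n)

# lam = sigma_e^2 / sigma_u^2 = p * 2pq when h2 = 0.5 and allele frequency = 0.5
acc = kfold_accuracy(X, y, k=5, lam=p * 0.5)
print(f"five-fold accuracy r = {acc:.2f}")
```

Fixing `lam` from an assumed heritability keeps the sketch short; replacing it with a REML estimate per training set would be closer to what a mixed-model fit does.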
Affiliation(s)
- Carolina Ballén-Taborda
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
- Pee Dee Research and Education Center, Clemson University, Florence, SC, United States
- Jeanette Lyerly
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
- Jared Smith
- U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), Raleigh, NC, United States
- Kimberly Howell
- U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), Raleigh, NC, United States
- Gina Brown-Guedira
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
- U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), Raleigh, NC, United States
- Md. Ali Babar
- Agronomy Department, University of Florida, Gainesville, FL, United States
- Stephen A. Harrison
- School of Plant, Environmental and Soil Sciences, Louisiana State University, Baton Rouge, LA, United States
- Richard E. Mason
- College of Agricultural Sciences, Colorado State University, Fort Collins, CO, United States
- Mohamed Mergoum
- Department of Crop and Soil Sciences, University of Georgia, Griffin, GA, United States
- J. Paul Murphy
- Crop and Soil Sciences Department, North Carolina State University, Raleigh, NC, United States
- Russell Sutton
- Department of Soil and Crop Sciences, Texas A&M University, Commerce, TX, United States
- Carl A. Griffey
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
- Richard E. Boyles
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
- Pee Dee Research and Education Center, Clemson University, Florence, SC, United States
- *Correspondence: Richard E. Boyles
24
Raffo MA, Sarup P, Andersen JR, Orabi J, Jahoor A, Jensen J. Integrating a growth degree-days based reaction norm methodology and multi-trait modeling for genomic prediction in wheat. Front Plant Sci 2022; 13:939448. [PMID: 36119585 PMCID: PMC9481302 DOI: 10.3389/fpls.2022.939448] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/09/2022] [Accepted: 08/08/2022] [Indexed: 05/26/2023]
Abstract
Multi-trait and multi-environment analyses can improve genomic prediction by exploiting between-trait correlations and genotype-by-environment interactions. In the context of reaction norm models, genotype-by-environment interactions can be described as functions of high-dimensional sets of markers and environmental covariates. However, comprehensive multi-trait reaction norm models accounting for marker × environmental covariates interactions are lacking. In this article, we propose to extend a reaction norm model incorporating genotype-by-environment interactions through (co)variance structures of markers and environmental covariates to a multi-trait reaction norm case. To do that, we propose a novel methodology for characterizing the environment at different growth stages based on growth degree-days (GDD). The proposed models were evaluated by variance components estimation and predictive performance for winter wheat grain yield and protein content in a set of 2,015 F6-lines. Cross-validation analyses were performed using leave-one-year-location-out (CV1) and leave-one-breeding-cycle-out (CV2) strategies. The modeling of genomic [SNPs] × environmental covariates interactions significantly improved predictive ability and reduced the variance inflation of predicted genetic values for grain yield and protein content in both cross-validation schemes. Trait-assisted genomic prediction was carried out for multi-trait models, and it significantly enhanced predictive ability and reduced variance inflation in all scenarios. The genotype by environment interaction modeling via genomic [SNPs] × environmental covariates interactions, combined with trait-assisted genomic prediction, boosted the benefits in predictive performance. 
The proposed multi-trait reaction norm methodology is a comprehensive approach that allows capitalizing on the benefits of multi-trait models accounting for between-trait correlations and reaction norm models exploiting high-dimensional genomic and environmental information.
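The growth degree-days (GDD) idea above can be illustrated with a small sketch. It assumes the simple averaging method, daily GDD = max(0, (Tmax + Tmin)/2 - Tbase), and assigns each day to a growth stage by cumulative GDD so that a daily covariate can be summarized per stage. The base temperature, the stage thresholds, and the helper names `daily_gdd` and `stage_means` are hypothetical illustrations; the exact stage definitions used in the study are not given in the abstract.

```python
def daily_gdd(t_max, t_min, t_base=0.0):
    """Growing degree-days for one day, simple averaging method (assumed)."""
    return max(0.0, (t_max + t_min) / 2.0 - t_base)


def stage_means(t_max, t_min, covariate, thresholds, t_base=0.0):
    """Average a daily environmental covariate within GDD-defined stages.

    thresholds: increasing cumulative-GDD boundaries (hypothetical values).
    A day falls in stage s when its cumulative GDD has reached exactly s of
    the thresholds, so len(thresholds) + 1 stages are returned; stages with
    no days are reported as None.
    """
    n_stages = len(thresholds) + 1
    sums = [0.0] * n_stages
    counts = [0] * n_stages
    cumulative = 0.0
    for hi, lo, x in zip(t_max, t_min, covariate):
        cumulative += daily_gdd(hi, lo, t_base)
        stage = sum(cumulative >= t for t in thresholds)  # thresholds passed
        sums[stage] += x
        counts[stage] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy example: three days, base temperature 5 degrees C, covariate = rainfall (mm).
t_max = [10.0, 20.0, 30.0]
t_min = [0.0, 10.0, 20.0]
rain = [1.0, 2.0, 3.0]
total = sum(daily_gdd(hi, lo, 5.0) for hi, lo in zip(t_max, t_min))
print(total)                                               # 30.0
print(stage_means(t_max, t_min, rain, [5.0, 20.0], 5.0))  # [1.0, 2.0, 3.0]
```

Averaging within stages is one way to turn daily weather into a short vector of stage-specific environmental covariates; sums or extremes per stage would be equally plausible summaries.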
Affiliation(s)
- Miguel Angel Raffo
- Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Pernille Sarup
- Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Nordic Seed A/S, Odder, Denmark
- Ahmed Jahoor
- Nordic Seed A/S, Odder, Denmark
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Just Jensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
25
Forkman J, Malik WA, Hadasch S, Piepho HP. Testing components of two-way interaction in multi-environment trials. Commun Stat Theory Methods 2022. [DOI: 10.1080/03610926.2022.2108058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Johannes Forkman
- Department of Crop Production Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Waqas Ahmed Malik
- Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
- Steffen Hadasch
- Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
- Hans-Peter Piepho
- Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
26
A Review of Deep Learning Applications for the Next Generation of Cognitive Networks. Appl Sci 2022. [DOI: 10.3390/app12126262] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
Intelligence capabilities will be the cornerstone in the development of next-generation cognitive networks. These capabilities allow networks to observe network conditions, learn from them, and then, using the prior knowledge gained, respond to their operating environment to optimize network performance. This study aims to offer an overview of the current state of the art in the use of deep learning in applications for intelligent cognitive networks, which can serve as a reference for future initiatives in this field. To this end, a systematic literature review was carried out in three databases, and eligible articles that focused on using deep learning to solve challenges posed by current cognitive networks were selected. As a result, 14 articles were analyzed. The results showed that applying deep learning-based algorithms to optimize cognitive data networks has been approached from different perspectives in recent years, and experimentally, to test its technological feasibility. In addition, its implications for solving fundamental challenges in current wireless networks are discussed.
27
Zhang Q, Zhang Q, Jensen J. Association Studies and Genomic Prediction for Genetic Improvements in Agriculture. Front Plant Sci 2022; 13:904230. [PMID: 35720549 PMCID: PMC9201771 DOI: 10.3389/fpls.2022.904230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 05/16/2022] [Indexed: 06/15/2023]
Abstract
To feed the fast-growing global population with sufficient food using limited global resources, it is urgent to develop and use cutting-edge technologies and to improve the efficiency of agricultural production. In this review, we introduce the concepts, theories, methods, applications and future implications of association studies and of predicting unknown genetic values or future phenotypic events using genomics in agricultural breeding. Genome-wide association studies can identify quantitative trait loci associated with phenotypes of importance in agriculture, while genomic prediction uses individual genetic values to rank selection candidates to improve the next generation of plants or animals. These technologies and methods have improved the efficiency of genetic improvement programs for agricultural production via elite animal breeds and plant varieties. With the development of new data acquisition technologies, more and more data will be collected from high-throughput technologies to assist agricultural breeding. Extracting useful information from these large amounts of data will be crucial, and to face this challenge, more efficient algorithms need to be developed and used to analyze these data. Such development will require knowledge from multiple research disciplines.
Affiliation(s)
- Qianqian Zhang
- Institute of Biotechnology, Beijing Academy of Agricultural and Forestry Sciences, Beijing, China
- Qin Zhang
- College of Animal Science and Technology, Shandong Agricultural University, Taian, China
- College of Animal Science and Technology, China Agricultural University, Beijing, China
- Just Jensen
- Centre for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
28
Danilevicz MF, Gill M, Anderson R, Batley J, Bennamoun M, Bayer PE, Edwards D. Plant Genotype to Phenotype Prediction Using Machine Learning. Front Genet 2022; 13:822173. [PMID: 35664329 PMCID: PMC9159391 DOI: 10.3389/fgene.2022.822173] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/25/2021] [Accepted: 03/07/2022] [Indexed: 12/13/2022]
Abstract
Genomic prediction tools support crop breeding based on statistical methods, such as the genomic best linear unbiased prediction (GBLUP). However, these tools are not designed to capture non-linear relationships within multi-dimensional datasets, or to deal with high-dimensional datasets such as imagery collected by unmanned aerial vehicles. Machine learning (ML) algorithms have the potential to surpass the prediction accuracy of current tools used for genotype to phenotype prediction, due to their capacity to autonomously extract data features and represent their relationships at multiple levels of abstraction. This review addresses the challenges of applying statistical and machine learning methods for predicting phenotypic traits based on genetic markers, environment data, and imagery for crop breeding. We present the advantages and disadvantages of explainable model structures, discuss the potential of machine learning models for genotype to phenotype prediction in crop breeding, and the challenges, including the scarcity of high-quality datasets, inconsistent metadata annotation and the requirements of ML models.
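GBLUP, named above as the standard statistical baseline, can be sketched in a few lines. This minimal NumPy version assumes VanRaden's (2008) first genomic relationship matrix and a known residual-to-genetic variance ratio `lam`; in practice the variance components would be estimated (for example by REML), the overall mean would be a GLS estimate, and the names `vanraden_g` and `gblup_predict` are hypothetical.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix, VanRaden (2008) method 1.

    M: n x p matrix of allele dosages coded 0/1/2.
    """
    freq = M.mean(axis=0) / 2.0               # allele frequency per marker
    Z = M - 2.0 * freq                         # center dosages
    scale = 2.0 * np.sum(freq * (1.0 - freq))
    return Z @ Z.T / scale


def gblup_predict(G, y, train, test, lam):
    """Return mu + BLUP(g) for `test` lines given `train` phenotypes.

    Model: y = mu + g + e, g ~ N(0, G * s2g), e ~ N(0, I * s2e), with the
    ratio lam = s2e / s2g assumed known, and mu approximated by the
    training mean instead of its GLS estimate.
    """
    G_tt = G[np.ix_(train, train)]
    G_vt = G[np.ix_(test, train)]
    mu = y[train].mean()
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G_vt @ alpha


# Hypothetical usage on simulated dosages (not data from the review).
rng = np.random.default_rng(0)
M = rng.binomial(2, 0.3, size=(100, 1000)).astype(float)
G = vanraden_g(M)
y = rng.normal(size=100)
train, test = np.arange(80), np.arange(80, 100)
pred = gblup_predict(G, y, train, test, lam=1.0)
```

Because prediction runs through the n x n relationship matrix rather than p marker effects, GBLUP stays cheap when markers far outnumber lines, which is the usual genomic selection setting.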
Affiliation(s)
- Monica F. Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mitchell Gill
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Robyn Anderson
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mohammed Bennamoun
- School of Physics, Mathematics and Computing, University of Western Australia, Perth, WA, Australia
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- *Correspondence: David Edwards
29
Genome-Enabled Prediction Methods Based on Machine Learning. Methods Mol Biol 2022; 2467:189-218. [PMID: 35451777 DOI: 10.1007/978-1-0716-2205-6_7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/18/2022]
Abstract
Growth of artificial intelligence and machine learning (ML) methodology has been explosive in recent years. In this class of procedures, computers gain knowledge from sets of experiences and provide forecasts or classifications. In genome-wide prediction (GWP), many ML studies have been carried out. This chapter describes the main semiparametric and nonparametric algorithms used in GWP in animals and plants. Thirty-four comparative ML studies conducted in the last decade were used in a meta-analysis based on a Thurstonian model to identify the algorithms with the best predictive qualities. Some kernel, Bayesian, and ensemble methods displayed greater robustness and predictive ability. However, the type of study and the data distribution must be considered in order to choose the most appropriate model for a given problem.
30
Rico-Chávez AK, Franco JA, Fernandez-Jaramillo AA, Contreras-Medina LM, Guevara-González RG, Hernandez-Escobedo Q. Machine Learning for Plant Stress Modeling: A Perspective towards Hormesis Management. Plants (Basel) 2022; 11:plants11070970. [PMID: 35406950 PMCID: PMC9003083 DOI: 10.3390/plants11070970] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/07/2022] [Revised: 03/28/2022] [Accepted: 03/31/2022] [Indexed: 01/11/2023]
Abstract
Plant stress is one of the most significant factors affecting plant fitness and, consequently, food production. However, plant stress may also be profitable since it behaves hormetically; at low doses, it stimulates positive traits in crops, such as the synthesis of specialized metabolites and additional stress tolerance. The controlled exposure of crops to low doses of stressors is therefore called hormesis management, and it is a promising method to increase crop productivity and quality. Nevertheless, hormesis management has severe limitations derived from the complexity of plant physiological responses to stress. Many technological advances assist plant stress science in overcoming such limitations, which results in extensive datasets originating from the multiple layers of the plant defensive response. For that reason, artificial intelligence tools, particularly Machine Learning (ML) and Deep Learning (DL), have become crucial for processing and interpreting data to accurately model plant stress responses such as genomic variation, gene and protein expression, and metabolite biosynthesis. In this review, we discuss the most recent ML and DL applications in plant stress science, focusing on their potential for improving the development of hormesis management protocols.
Affiliation(s)
- Amanda Kim Rico-Chávez
- Unidad de Ingeniería en Biosistemas, Facultad de Ingeniería Campus Amazcala, Universidad Autónoma de Querétaro, Carretera Chichimequillas, s/n km 1, El Marqués CP 76265, Mexico
- Jesus Alejandro Franco
- Escuela Nacional de Estudios Superiores Unidad Juriquilla, UNAM, Querétaro CP 76230, Mexico
- Arturo Alfonso Fernandez-Jaramillo
- Unidad Académica de Ingeniería Biomédica, Universidad Politécnica de Sinaloa, Carretera Municipal Libre Mazatlán Higueras km 3, Col. Genaro Estrada, Mazatlán CP 82199, Mexico
- Luis Miguel Contreras-Medina
- Unidad de Ingeniería en Biosistemas, Facultad de Ingeniería Campus Amazcala, Universidad Autónoma de Querétaro, Carretera Chichimequillas, s/n km 1, El Marqués CP 76265, Mexico
- Ramón Gerardo Guevara-González
- Unidad de Ingeniería en Biosistemas, Facultad de Ingeniería Campus Amazcala, Universidad Autónoma de Querétaro, Carretera Chichimequillas, s/n km 1, El Marqués CP 76265, Mexico
- Corresponding author
- Quetzalcoatl Hernandez-Escobedo
- Escuela Nacional de Estudios Superiores Unidad Juriquilla, UNAM, Querétaro CP 76230, Mexico
- Corresponding author
31
Galli G, Sabadin F, Yassue RM, Galves C, Carvalho HF, Crossa J, Montesinos-López OA, Fritsche-Neto R. Automated Machine Learning: A Case Study of Genomic "Image-Based" Prediction in Maize Hybrids. Front Plant Sci 2022; 13:845524. [PMID: 35321444 PMCID: PMC8936805 DOI: 10.3389/fpls.2022.845524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2021] [Accepted: 02/03/2022] [Indexed: 06/14/2023]
Abstract
Machine learning methods such as multilayer perceptrons (MLP) and convolutional neural networks (CNN) have emerged as promising methods for genomic prediction (GP). In this context, we assess the performance of MLP and CNN on regression and classification tasks in a case study with maize hybrids. The genomic information was provided to the MLP as a relationship matrix and to the CNN as "genomic images." In the regression task, the machine learning models were compared with GBLUP. In the classification task, MLP and CNN were compared with each other. For this task, the traits (plant height and grain yield) were discretized so as to create balanced (moderate selection intensity) and unbalanced (extreme selection intensity) datasets for further evaluation. An automatic hyperparameter search was performed for MLP and CNN, and the best models were reported. For both task types, several metrics were calculated under a validation scheme to assess the effect of the prediction method and other variables. Overall, MLP and CNN produced results competitive with GBLUP. We also provide new insights into automated machine learning for genomic prediction and its implications for plant breeding.
Affiliation(s)
- Giovanni Galli
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Felipe Sabadin
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
- Rafael Massahiro Yassue
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Cassia Galves
- Department of Food Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Roberto Fritsche-Neto
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- International Rice Research Institute (IRRI), Los Baños, Philippines
32
Rio S, Akdemir D, Carvalho T, Sánchez JIY. Assessment of genomic prediction reliability and optimization of experimental designs in multi-environment trials. Theor Appl Genet 2022; 135:405-419. [PMID: 34807267 PMCID: PMC8866390 DOI: 10.1007/s00122-021-03972-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/24/2021] [Accepted: 10/08/2021] [Indexed: 06/13/2023]
Abstract
New forms of the coefficient of determination can help to forecast the accuracy of genomic prediction and optimize experimental designs in multi-environment trials with genotype-by-environment interactions. In multi-environment trials, the relative performance of genotypes may vary depending on the environmental conditions, a phenomenon commonly referred to as genotype-by-environment interaction (G × E). With genomic prediction, G × E can be accounted for by modeling the genetic covariance between trials, even when the overall experimental design is highly unbalanced between trials, thanks to the genomic relationship between genotypes. In this study, we propose new forms of the coefficient of determination (CD, i.e., the expected model-based squared correlation between a genetic value and its corresponding prediction) that can be used to forecast the genomic prediction reliability of genotypes, both for their trial-specific performance and for their mean performance. Because the expected prediction reliability based on these new CD criteria is generally a good approximation of the observed reliability, we demonstrate that they can be used to optimize multi-environment trials in the presence of G × E. In addition, this reliability may be highly variable between genotypes, especially in unbalanced designs with complex pedigree relationships between genotypes. Therefore, it can be useful for breeders to assess it before selecting genotypes based on their predicted genetic values. Using a wheat population evaluated for both simulated and phenology traits, and two maize populations evaluated for grain yield, we illustrate this approach and confirm the value of our new CD criteria.
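The CD described above is the expected squared correlation between a genetic value and its prediction. As an illustration only, the sketch below computes the classical single-trial CD for a GBLUP model from Henderson's mixed-model equations, CD_i = 1 - PEV_i / (σ²_g G_ii). It is not the new multi-environment CD forms proposed by the authors; the function name, the intercept-only fixed effects, and the toy data are assumptions.

```python
import numpy as np

def coefficient_of_determination(G, Z, sigma2_g, sigma2_e):
    """Classical single-trial CD_i = 1 - PEV_i / (sigma2_g * G_ii) for the GBLUP model
    y = 1*mu + Z u + e, with u ~ N(0, G*sigma2_g) and e ~ N(0, I*sigma2_e).

    Hypothetical helper for illustration; not the multi-environment CD of the paper.
    """
    n_obs = Z.shape[0]
    X = np.ones((n_obs, 1))                  # intercept-only fixed effects (assumption)
    lam = sigma2_e / sigma2_g                # variance ratio used in the MME
    # Coefficient matrix of Henderson's mixed-model equations
    C = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(G)],
    ])
    C_uu = np.linalg.inv(C)[1:, 1:]          # inverse block for the genetic effects
    pev = np.diag(C_uu) * sigma2_e           # prediction error variances
    return 1.0 - pev / (sigma2_g * np.diag(G))

# Three unrelated genotypes; the first two have two records each, the third has none.
G = np.eye(3)
Z = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]], dtype=float)
print(coefficient_of_determination(G, Z, sigma2_g=1.0, sigma2_e=1.0))
```

With unrelated genotypes (G = I), each genotype with two records gets a CD of 1/3 in this toy case, while the unphenotyped third genotype gets a CD of zero; with a realistic G, relatives would raise the CD of unphenotyped genotypes, which is what makes the criterion useful for designing trials.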
Affiliation(s)
- Simon Rio
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA) Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón Madrid, Spain
- Deniz Akdemir
- CIBMTR (Center for International Blood and Marrow Transplant Research), National Marrow Donor Program/Be The Match, Minneapolis, MN USA
- Tiago Carvalho
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA) Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón Madrid, Spain
- Julio Isidro y Sánchez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA) Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón Madrid, Spain
33
Martini JWR, Gao N, Crossa J. Incorporating Omics Data in Genomic Prediction. Methods Mol Biol 2022; 2467:341-357. [PMID: 35451782 DOI: 10.1007/978-1-0716-2205-6_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this chapter, we discuss the motivation for integrating other types of omics data into genomic prediction methods. We give an overview of literature investigating the performance of omics-enhanced predictions, and highlight potential pitfalls when applying these methods in breeding. We emphasize that the statistical methods available for genomic data can be transferred to the general omics case. However, when using a framework of omic relationship matrices, the standardization of the variables may be more relevant than it is for a genomic relationship matrix based on single-nucleotide polymorphisms.
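To illustrate why standardization matters for omic relationship matrices, the sketch below builds O = WW'/p from an n × p feature matrix with and without scaling each feature to unit variance. The function name and the simulated feature scales are assumptions for illustration; it also assumes no feature is constant (a zero standard deviation would cause a division by zero).

```python
import numpy as np

def omic_relationship(W, standardize=True):
    """Relationship matrix O = W W' / p from an n x p omic feature matrix W.

    Columns are always centered; with standardize=True they are also scaled to unit
    variance so that high-variance features (e.g., highly expressed transcripts)
    do not dominate the matrix. Assumes no feature is constant.
    """
    W = np.asarray(W, dtype=float)
    W = W - W.mean(axis=0)
    if standardize:
        W = W / W.std(axis=0)
    return W @ W.T / W.shape[1]

rng = np.random.default_rng(seed=7)
# Hypothetical data: 50 lines x 200 features whose scales differ by orders of magnitude.
scales = np.logspace(-2, 2, num=200)
W = rng.normal(size=(50, 200)) * scales
O_std = omic_relationship(W)
O_raw = omic_relationship(W, standardize=False)
print(np.mean(np.diag(O_std)))   # 1.0, up to floating-point rounding
```

Without scaling, the few features with the largest variance dominate `O_raw`; after scaling, every feature contributes equally and the average diagonal element is 1, much like a genomic relationship matrix built from standardized SNP codes.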
Affiliation(s)
- Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico.
- Ning Gao
- School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico
34
Montesinos-López OA, Montesinos-López A, Mosqueda-Gonzalez BA, Montesinos-López JC, Crossa J. Accounting for Correlation Between Traits in Genomic Prediction. Methods Mol Biol 2022; 2467:285-327. [PMID: 35451780 DOI: 10.1007/978-1-0716-2205-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Genome-enabled prediction plays a key role in the success of genomic selection (GS). However, according to the No Free Lunch Theorem, there is no universal model that performs well on all data sets. For this reason, many statistical and machine learning models are available for genomic prediction. When multitrait data are available, models that account for correlations between phenotypic traits are preferred, since they help increase prediction accuracy when the degree of correlation is moderate to large. In this chapter, we therefore review multitrait models for genome-enabled prediction and illustrate their power with real examples. We also provide details of the software (R code) available to help users implement these models with their own data. The multitrait models were implemented under conventional Bayesian ridge regression and the best linear unbiased predictor, and also under a deep learning framework. The multitrait deep learning framework helps implement prediction models with mixed outcomes (continuous, binary, ordinal, and count, measured on different scales), which is not easy with conventional statistical models. The illustrative examples are very detailed to make the implementation of multitrait models in plant and animal breeding friendlier for breeders and scientists.
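Multitrait GBLUP-type models commonly describe the genetic values of t traits on n lines jointly by assuming Var(u) = Σ_T ⊗ G, where Σ_T is the t × t between-trait genetic covariance matrix and G is the genomic relationship matrix. A minimal sketch of that structure with hypothetical numbers (a generic illustration, not the chapter's R code):

```python
import numpy as np

# Hypothetical inputs: genomic relationship G for 4 lines and a 2 x 2 genetic
# covariance between two traits.
G = np.array([
    [1.0, 0.5, 0.2, 0.1],
    [0.5, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.2, 0.4, 1.0],
])
Sigma_T = np.array([
    [1.0, 0.6],
    [0.6, 2.0],
])

# Covariance of the stacked genetic values u = (u_trait1, u_trait2): Var(u) = Sigma_T kron G.
V_u = np.kron(Sigma_T, G)

# The off-diagonal blocks carry the between-trait genetic covariance, which is what lets
# records on one trait inform predictions for a correlated trait.
print(V_u.shape)                                     # (8, 8)
print(np.allclose(V_u[:4, 4:], Sigma_T[0, 1] * G))   # True
```

When the genetic covariance in Σ_T is zero, the off-diagonal blocks vanish and the model reduces to independent single-trait analyses, which is consistent with the abstract's point that multitrait models pay off mainly when trait correlations are moderate to large.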
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Brandon A Mosqueda-Gonzalez
- Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Esq. Miguel Othón de Mendizábal, Mexico City, Mexico
- José Crossa
- Colegio de Postgraduados, Montecillos, Mexico.
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz, Mexico.
35
Howard R, Jarquin D, Crossa J. Overview of Genomic Prediction Methods and the Associated Assumptions on the Variance of Marker Effect, and on the Architecture of the Target Trait. Methods Mol Biol 2022; 2467:139-156. [PMID: 35451775 DOI: 10.1007/978-1-0716-2205-6_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Genomic selection (GS) is a methodology that revolutionized the process of breeding improved genetic materials in plant and animal breeding programs. It uses the predicted genomic values of untested/unobserved genotypes as surrogates for phenotypes during the selection process. These predicted genomic values are obtained exclusively from the marker profiles of the untested genotypes, and breeders can use them to screen the genotypes to be advanced in the breeding pipeline, to identify potential parents for the next improvement cycles, or to find optimal crosses for targeted genotypes, among other uses. Conceptually, GS first requires a set of genotypes with both molecular marker information and phenotypic data for model calibration; the performance of untested genotypes is then predicted from their marker profiles only. Breeders are therefore expected to use these values when making selections. Even though the concept of GS seems trivial, modern sequencing technologies deliver high-dimensional data in which the number of molecular markers (p) far exceeds the number of data points available for model fitting (n; p ≫ n), so a completely renovated set of prediction models was needed to cope with this challenge. In this chapter, we provide a conceptual framework for comparing statistical models that overcome the "large p, small n" problem. Given the very large diversity of GS models, only the most popular are presented here; we focus on linear regression-based models and nonparametric models that predict genomic estimated breeding values (GEBV) in a single environment for a single trait, mainly in the context of plant breeding.
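The "large p, small n" problem can be illustrated with ridge regression on markers, the shrinkage idea behind rrBLUP-type models. The sketch below uses simulated genotypes and an arbitrary penalty `lam=100.0` (both assumptions) and solves an n × n system instead of a p × p one; the two forms give identical marker effects because (X'X + λI_p)^{-1}X' = X'(XX' + λI_n)^{-1}.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Ridge (rrBLUP-style) marker effects via the n x n 'dual' system.

    beta = X' (X X' + lam I_n)^{-1} y equals the usual p x p solution
    (X'X + lam I_p)^{-1} X' y, but only an n x n system is solved, which keeps
    shrinkage models cheap when p >> n.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

rng = np.random.default_rng(seed=0)
n, p = 40, 500                                         # many more markers than lines
X = rng.integers(0, 3, size=(n, p)).astype(float)      # simulated 0/1/2 genotype codes
X -= X.mean(axis=0)                                    # center each marker
y = X[:, :10] @ rng.normal(size=10) + rng.normal(size=n)
y -= y.mean()

beta = ridge_marker_effects(X, y, lam=100.0)           # one effect per marker
y_hat = X @ beta                                       # fitted genomic values
```

Without the penalty, X'X is singular whenever p > n, so ordinary least squares has no unique solution; the ridge term makes the problem well posed, and the dual form keeps its cost governed by n rather than p.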
Affiliation(s)
- Réka Howard
- University of Nebraska-Lincoln, Lincoln, NE, USA.
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
36
Crossa J, Montesinos-López OA, Pérez-Rodríguez P, Costa-Neto G, Fritsche-Neto R, Ortiz R, Martini JWR, Lillemo M, Montesinos-López A, Jarquin D, Breseghello F, Cuevas J, Rincent R. Genome and Environment Based Prediction Models and Methods of Complex Traits Incorporating Genotype × Environment Interaction. Methods Mol Biol 2022; 2467:245-283. [PMID: 35451779 DOI: 10.1007/978-1-0716-2205-6_9] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 06/14/2023]
Abstract
Genome-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. In contrast to animal breeding, plant breeding includes extensive multienvironment and multiyear field trial data. Hence, genome-enabled prediction models should include genotype × environment (G × E) interaction, which in most cases increases prediction performance when the responses of lines differ from environment to environment. In this chapter, we describe a historical timeline, starting in 2012, of advances in GS models that take G × E interaction into account. We describe theoretical and practical aspects of those GS models, including the gains in prediction performance obtained by including G × E structures for complex traits measured on both continuous and categorical scales. We then detail and explain the main G × E genomic prediction models for complex traits measured on continuous and noncontinuous (categorical) scales. Regarding G × E interaction models, this review also examines the analysis of information generated by high-throughput phenotyping (phenomics) and the joint analysis of multitrait and multienvironment field trial data, which is also employed in the general assessment of multitrait G × E interaction. The role of nongenomic data in increasing the accuracy and biological reliability of the G × E approach is also outlined. We show recent advances in large-scale envirotyping (enviromics) and how mechanistic computational modeling can derive crop growth and development aspects useful for predicting phenotypes and explaining G × E.
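One widely used way to encode G × E in genomic models is to multiply, element by element, the genomic relationship between records by an indicator of whether the records share an environment. The sketch below assumes that structure; it is one of several G × E covariance choices discussed in this literature, not a specific model from the chapter, and the function name and toy layout are hypothetical.

```python
import numpy as np

def gxe_kernel(G, line_idx, env_idx):
    """Record-level G x E covariance structure K_GE = (Z_g G Z_g') * (Z_e Z_e').

    Two records covary through G only when they come from the same environment;
    the elementwise (Hadamard) product zeroes every between-environment entry.
    """
    line_idx = np.asarray(line_idx)
    env_idx = np.asarray(env_idx)
    K_g = G[np.ix_(line_idx, line_idx)]                          # Z_g G Z_g'
    K_e = (env_idx[:, None] == env_idx[None, :]).astype(float)   # Z_e Z_e'
    return K_g * K_e

G = np.array([[1.0, 0.4, 0.1],
              [0.4, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
# Hypothetical layout: three lines, each evaluated in two environments.
lines = [0, 1, 2, 0, 1, 2]
envs = [0, 0, 0, 1, 1, 1]
K = gxe_kernel(G, lines, envs)
print(K[0, 1], K[0, 4])   # 0.4 0.0 (same environment vs. different environments)
```

In such models this kernel is usually added to a main-effect genomic kernel, so the main effect captures what lines share across environments and the interaction term captures environment-specific deviations.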
Affiliation(s)
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Colegio de Postgraduados, Montecillos, Mexico
- Germano Costa-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
- Roberto Fritsche-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Alnarp, Sweden
- Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Morten Lillemo
- Department of Plant Sciences, Norwegian University of Life Sciences, IHA/CIGENE, Ås, Norway
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Jaime Cuevas
- Universidad de Quintana Roo, Chetumal, Quintana Roo, Mexico.
- Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France.
37
Rogers AR, Holland JB. Environment-specific genomic prediction ability in maize using environmental covariates depends on environmental similarity to training data. G3 (Bethesda) 2021; 12:6486423. [PMID: 35100364 PMCID: PMC9245610 DOI: 10.1093/g3journal/jkab440] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/09/2021] [Accepted: 12/06/2021] [Indexed: 12/30/2022]
Abstract
Technological advances have made possible the collection of a wealth of genomic, environmental, and phenotypic data for use in plant breeding. Incorporating environmental data into environment-specific genomic prediction is hindered in part by the inherently high dimensionality of the data. Computationally efficient approaches to combining genomic and environmental information may facilitate the extension of genomic prediction models to new environments and germplasm, as well as a better understanding of genotype-by-environment (G × E) interactions. Using genomic, yield trial, and environmental data on 1,918 unique hybrids evaluated in 59 environments from the maize Genomes to Fields project, we determined that a set of 10,153 SNP dominance coefficients and a 5-day temporal window for summarizing environmental variables were optimal for genomic prediction using only genetic and environmental main effects. Adding marker-by-environment variable interactions required dimension reduction, and we found that reducing the dimensionality of the genetic data while keeping the full set of environmental covariates was best for environment-specific genomic prediction of grain yield, increasing prediction ability by 2.7% to reach 80% across environments when data were masked at random. We then measured how prediction ability within environments was affected under stratified training-testing sets that approximate scenarios commonly encountered by plant breeders, finding that incorporating marker-by-environment effects improved prediction ability when training and test sets shared environments, but did not improve prediction in new, untested environments. The environmental similarity between training and test sets had a greater impact on prediction efficacy than the genetic similarity between them.
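The 5-day temporal window mentioned above refers to summarizing daily environmental variables over fixed periods before they enter the model. A minimal sketch, assuming non-overlapping windows summarized by their mean and dropping any incomplete final window (the study's exact aggregation may differ; the helper name and data are hypothetical):

```python
import numpy as np

def window_means(daily_values, window=5):
    """Average a daily environmental series over consecutive, non-overlapping windows.

    Trailing days that do not fill a complete window are dropped (a simplifying assumption).
    """
    daily_values = np.asarray(daily_values, dtype=float)
    n_windows = len(daily_values) // window
    trimmed = daily_values[: n_windows * window]
    return trimmed.reshape(n_windows, window).mean(axis=1)

daily_max_temp = np.arange(1, 13)      # hypothetical 12 days of readings
print(window_means(daily_max_temp))    # [3. 8.]
```

Wider windows yield fewer environmental covariates per variable, which is one way to control the dimensionality problem the abstract describes; narrower windows keep more temporal detail at the cost of more covariates.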
Affiliation(s)
- Anna R Rogers
- Program in Genetics, North Carolina State University, Raleigh, NC 27695, USA
- James B Holland
- Program in Genetics, North Carolina State University, Raleigh, NC 27695, USA
- USDA-ARS Plant Science Research Unit, North Carolina State University, Raleigh, NC 27695, USA
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Corresponding author: Department of Agriculture, Agricultural Research Service, Box 7620, North Carolina State University, Raleigh, NC 27695-7620, USA
38
Montesinos-López OA, Montesinos-López A, Mosqueda-González BA, Bentley AR, Lillemo M, Varshney RK, Crossa J. A New Deep Learning Calibration Method Enhances Genome-Based Prediction of Continuous Crop Traits. Front Genet 2021; 12:798840. [PMID: 34976026 PMCID: PMC8718701 DOI: 10.3389/fgene.2021.798840] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/20/2021] [Accepted: 11/18/2021] [Indexed: 11/13/2022]
Abstract
Genomic selection (GS) has the potential to revolutionize predictive plant breeding. A reference population is phenotyped and genotyped to train a statistical model that is then used to perform genome-enabled predictions of new individuals that were only genotyped. In this vein, deep neural networks, a type of machine learning model, have been widely adopted in GS studies because, as nonparametric methods, they are more adept at capturing nonlinear patterns. However, training deep neural networks is very challenging because of the numerous hyperparameters that need to be tuned, especially since imperfect tuning can result in biased predictions. In this paper, we propose a simple method for calibrating (adjusting) the predictions of continuous response variables obtained from deep learning applications. We evaluated the proposed deep learning calibration method (DL_M2) on four crop breeding data sets and compared its performance with that of the standard deep learning method (DL_M1) and the standard genomic best linear unbiased predictor (GBLUP). While GBLUP was the most accurate model overall, the proposed calibration method (DL_M2) increased genome-enabled prediction performance in all data sets compared with the traditional DL method (DL_M1). Taken together, our results support extending the use of the proposed calibration method to evaluate its potential and consistency for prediction in the context of GS applied to plant breeding.
Affiliation(s)
- Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
- *Correspondence: Abelardo Montesinos-López, Rajeev K. Varshney, José Crossa
- Brandon A. Mosqueda-González
- Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Esq. Miguel Othón de Mendizábal, Mexico City, Mexico
- Alison R. Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Morten Lillemo
- Department of Plant Sciences, Norwegian University of Life Sciences, IHA/CIGENE, Ås, Norway
- Rajeev K. Varshney
- Centre of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Murdoch University, Perth, WA, Australia
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Colegio de Postgraduados, Montecillo, Mexico
39
Washburn JD, Cimen E, Ramstein G, Reeves T, O'Briant P, McLean G, Cooper M, Hammer G, Buckler ES. Predicting phenotypes from genetic, environment, management, and historical data using CNNs. Theor Appl Genet 2021; 134:3997-4011. [PMID: 34448888 DOI: 10.1007/s00122-021-03943-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/28/2021] [Accepted: 08/18/2021] [Indexed: 06/13/2023]
Abstract
Convolutional Neural Networks (CNNs) can perform similarly to or better than standard genomic prediction methods when sufficient genetic, environmental, and management data are provided. Predicting phenotypes from genetic (G), environmental (E), and management (M) conditions is a long-standing challenge with implications for agriculture, medicine, and conservation. Most methods reduce the factors in a dataset (feature engineering) in a subjective and potentially oversimplified manner. Deep neural networks such as Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN) can overcome this by allowing the data itself to determine which factors are most important. CNN models were developed for predicting agronomic yield from a combination of replicated trials and historical yield survey data. The results were more accurate than standard methods when tested on held-out G, E, and M data (r = 0.50 vs. r = 0.43), and slightly worse than standard methods when only G was held out (r = 0.74 vs. r = 0.80). Pre-training on historical data increased accuracy compared with training on trial data alone. Saliency map analysis indicated that the CNN had "learned" to prioritize many factors of known agricultural importance.
Affiliation(s)
- Jacob D Washburn
- United States Department of Agriculture, Agricultural Research Service, Columbia, MO, 65211, USA.
- Emre Cimen
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Computational Intelligence and Optimization Laboratory, Industrial Engineering Department, Eskisehir Technical University, Eskisehir, Turkey
- Guillaume Ramstein
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000, Aarhus, Denmark
- Timothy Reeves
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Patrick O'Briant
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Greg McLean
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- Mark Cooper
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- Graeme Hammer
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Department of Agriculture, Agricultural Research Service, Ithaca, NY, 14850, USA
40
Sandhu K, Patil SS, Pumphrey M, Carter A. Multitrait machine- and deep-learning models for genomic selection using spectral information in a wheat breeding program. Plant Genome 2021; 14:e20119. [PMID: 34482627 DOI: 10.1002/tpg2.20119] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 03/05/2021] [Accepted: 05/18/2021] [Indexed: 06/13/2023]
Abstract
Prediction of breeding values is central to plant breeding and has been revolutionized by the adoption of genomic selection (GS). Machine- and deep-learning algorithms applied to complex traits in plants can improve prediction accuracies. Because of the tremendous increase in the data collected in breeding programs and the slow rate of genetic gain, the potential of artificial intelligence for analyzing these data needs to be explored. The main objective of this study was to optimize multitrait (MT) machine- and deep-learning models for predicting grain yield and grain protein content in wheat (Triticum aestivum L.) using spectral information. This study compares the performance of four machine- and deep-learning-based unitrait (UT) and MT models with traditional genomic best linear unbiased predictor (GBLUP) and Bayesian models. The dataset consisted of 650 recombinant inbred lines (RILs) from a spring wheat breeding program grown for three years (2014-2016), and spectral data were collected at the heading and grain filling stages. The MT-GS models performed 0-28.5% and -0.04 to 15% better than the UT-GS models. Random forest and multilayer perceptron were the best-performing machine- and deep-learning models for predicting both traits. The four Bayesian models explored gave similar accuracies, which were lower than those of the machine- and deep-learning-based models, and required more computational time. The green normalized difference vegetation index (GNDVI) best predicted grain protein content in seven of the nine MT-GS models. Overall, this study concluded that machine- and deep-learning-based MT-GS models increased prediction accuracy and should be employed in large-scale breeding programs.
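GNDVI is computed from near-infrared and green reflectance as (NIR - Green) / (NIR + Green). A minimal sketch with hypothetical reflectance values (the band values used in the study are not given here):

```python
import numpy as np

def gndvi(nir, green):
    """Green normalized difference vegetation index: (NIR - Green) / (NIR + Green)."""
    nir = np.asarray(nir, dtype=float)
    green = np.asarray(green, dtype=float)
    return (nir - green) / (nir + green)

# Hypothetical plot-level reflectances for two plots.
print(gndvi([0.45, 0.40], [0.08, 0.10]))
```

Because the index is a normalized ratio, it always lies between -1 and 1, which makes values comparable across plots and sampling dates before they are used as secondary traits in a multitrait model.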
Affiliation(s)
- Karansher Sandhu
- Department of Crop and Soil Sciences, WA State University, Pullman, WA, 99164, USA
- Shruti Sunil Patil
- School of Electrical Engineering and Computer Science, WA State University, Pullman, WA, 99164, USA
- Michael Pumphrey
- Department of Crop and Soil Sciences, WA State University, Pullman, WA, 99164, USA
- Arron Carter
- Department of Crop and Soil Sciences, WA State University, Pullman, WA, 99164, USA
41
Montesinos-Lopez OA, Montesinos-Lopez JC, Salazar E, Barron JA, Montesinos-Lopez A, Buenrostro-Mariscal R, Crossa J. Application of a Poisson deep neural network model for the prediction of count data in genome-based prediction. Plant Genome 2021; 14:e20118. [PMID: 34323393 DOI: 10.1002/tpg2.20118] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/19/2021] [Accepted: 05/15/2021] [Indexed: 06/13/2023]
Abstract
Genomic selection (GS) is revolutionizing conventional ways of developing new plants and animals. However, because it is a predictive methodology, GS strongly depends on statistical and machine learning to perform these predictions. Many models are available for GS with continuous outcomes. Unfortunately, for count data outcomes, there are few efficient statistical machine learning models for large datasets or for datasets with fewer observations than independent variables. For this reason, in this paper, we applied the univariate version of the previously proposed Poisson deep neural network (PDNN) to genomic prediction of count data. The model was implemented with (a) the negative log-likelihood of the Poisson distribution as the loss function, (b) the rectified linear unit (ReLU) as the activation function in the hidden layers, and (c) the exponential activation function in the output layer. The advantage of the PDNN model is that it captures complex patterns in the data by applying many nonlinear transformations in the hidden layers. Moreover, since it was implemented with TensorFlow as the back end and Keras as the front end, the model can be applied to moderate and large datasets, which is a significant advantage over previous GS models for count data. The PDNN model was compared with deep learning models with continuous outcomes, conventional generalized Poisson regression models, and conventional Bayesian regression methods. We found that the PDNN model outperformed the Bayesian regression and generalized Poisson regression methods in terms of prediction accuracy, although it was not better than the conventional deep neural network with continuous outcomes.
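The three ingredients listed above (Poisson negative log-likelihood loss, ReLU hidden layers, exponential output) can be sketched as a single-hidden-layer forward pass in NumPy. This is an illustration, not the authors' TensorFlow/Keras implementation; the layer sizes, initial weights, and simulated data are assumptions, and the training loop (gradient-based minimization of the loss) is omitted.

```python
import numpy as np

def relu(z):
    """Rectified linear unit used in the hidden layer."""
    return np.maximum(z, 0.0)

def pdnn_forward(X, W1, b1, w2, b2):
    """One-hidden-layer Poisson network: ReLU hidden layer, exponential output.

    The exponential output guarantees a strictly positive predicted mean count.
    """
    hidden = relu(X @ W1 + b1)
    return np.exp(hidden @ w2 + b2)

def poisson_nll(y, mu):
    """Mean Poisson negative log-likelihood, dropping the constant log(y!) term."""
    return np.mean(mu - y * np.log(mu))

rng = np.random.default_rng(seed=3)
X = rng.integers(0, 3, size=(20, 50)).astype(float)   # hypothetical marker matrix
y = rng.poisson(lam=4.0, size=20).astype(float)       # hypothetical count phenotype
W1 = rng.normal(scale=0.1, size=(50, 8))
b1 = np.zeros(8)
w2 = rng.normal(scale=0.1, size=8)
b2 = np.log(y.mean())                                 # start near the mean count

mu = pdnn_forward(X, W1, b1, w2, b2)
print(poisson_nll(y, mu))
```

Setting the output bias to the log of the mean count makes an untrained network with zero output weights predict the mean count for every line, a convenient starting point; the exponential output keeps every predicted mean positive, which the logarithm in the loss requires.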
Affiliation(s)
- Jose C Montesinos-Lopez
- Dep. de Estadística, Centro de Investigación en Matemáticas, Guanajuato, Guanajuato, 36023, México
- Eduardo Salazar
- Facultad de Telemática, Univ. de Colima, Colima, Colima, 28040, México
- Jose Alberto Barron
- Dep. of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina, s/n La Molina 15024, Lima, Perú
- Abelardo Montesinos-Lopez
- Dep. de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías, Univ. de Guadalajara, Guadalajara, Jalisco, 44430, México
- Jose Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera km 45, Mexico-Veracruz, Texcoco, Edo. de México, CP 52640, México
- Colegio de Post-Graduados, CP 56230, Montecillos, Edo. de México, Texcoco, México
42
Tomar V, Singh D, Dhillon GS, Chung YS, Poland J, Singh RP, Joshi AK, Gautam Y, Tiwari BS, Kumar U. Increased Predictive Accuracy of Multi-Environment Genomic Prediction Model for Yield and Related Traits in Spring Wheat (Triticum aestivum L.). Front Plant Sci 2021; 12:720123. [PMID: 34691100 PMCID: PMC8531512 DOI: 10.3389/fpls.2021.720123] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/03/2021] [Accepted: 09/03/2021] [Indexed: 06/13/2023]
Abstract
Genomic selection (GS) has the potential to improve the selection gain for complex traits in crop breeding programs in resource-poor countries. GS model performance in multi-environment (ME) trials was assessed for 141 advanced breeding lines under four field environments via cross-predictions. We compared the prediction accuracy (PA) of two GS models, with and without accounting for environmental variation, on four quantitative traits of significant importance, i.e., grain yield (GRYLD), thousand-grain weight, days to heading, and days to maturity, under North and Central Indian conditions. For each trait, we generated PA using two different ME cross-validation (CV) schemes representing actual breeding scenarios: (1) predicting untested lines in tested environments through the ME model (ME_CV1) and (2) predicting tested lines in untested environments through the ME model (ME_CV2). The ME predictions were compared with the baseline single-environment (SE) GS model (SE_CV1), representing a breeding scenario in which relationships and interactions are not leveraged across environments. Our results suggest that ME models provide a clear advantage over SE models in terms of robust trait predictions. Both ME models provided 2-3 times higher prediction accuracies for all four traits across the four tested environments, highlighting the importance of accounting for environmental variance in GS models. While the improvement in PA from SE to ME models was significant, the CV1 and CV2 schemes did not show any clear differences within ME, indicating that the ME model was able to predict untested environments and untested lines equally well. Overall, our results provide important insight into the impact of environmental variation on GS in smaller breeding programs, which can potentially increase their rate of genetic gain by leveraging ME wheat breeding trials.
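The two ME cross-validation schemes described above differ only in what is masked: all records of some lines (ME_CV1) or all records of some environments (ME_CV2). A minimal sketch of the masking, assuming one record per line-environment combination (the helper name and layout are hypothetical):

```python
import numpy as np

def me_cv_split(lines, envs, scheme, held_out):
    """Boolean train/test masks for two multi-environment CV schemes.

    ME_CV1: hold out every record of the `held_out` lines
            (untested lines in tested environments).
    ME_CV2: hold out every record from the `held_out` environments
            (tested lines in untested environments).
    """
    lines = np.asarray(lines)
    envs = np.asarray(envs)
    if scheme == "ME_CV1":
        test = np.isin(lines, held_out)
    elif scheme == "ME_CV2":
        test = np.isin(envs, held_out)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return ~test, test

# Hypothetical layout: 4 lines x 3 environments, one record per combination.
lines = np.repeat(np.arange(4), 3)
envs = np.tile(np.arange(3), 4)
train, test = me_cv_split(lines, envs, "ME_CV1", held_out=[0, 1])
print(set(lines[train]) & set(lines[test]))   # set(): no line appears in both sets
```

Under ME_CV1 the model must rely on genomic relationships to predict lines it has never seen, whereas under ME_CV2 it must transfer information about known lines to a new environment; in practice these masks would be repeated over several folds.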
Affiliation(s)
- Vipin Tomar
- Borlaug Institute for South Asia, Ludhiana, India
- Department of Biological Sciences and Biotechnology, Institute of Advanced Research, Gandhinagar, India
- International Maize and Wheat Improvement Center, New Delhi, India
- Daljit Singh
- Department of Plant Pathology, Kansas State University, Manhattan, KS, United States
- Guriqbal Singh Dhillon
- Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala, India
- Yong Suk Chung
- Department of Plant Resources and Environment, Jeju National University, Jeju-si, South Korea
- Jesse Poland
- Department of Plant Pathology, Kansas State University, Manhattan, KS, United States
- Ravi Prakash Singh
- Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Mexico
- Arun Kumar Joshi
- Borlaug Institute for South Asia, Ludhiana, India
- International Maize and Wheat Improvement Center, New Delhi, India
- Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Mexico
- Budhi Sagar Tiwari
- Department of Biological Sciences and Biotechnology, Institute of Advanced Research, Gandhinagar, India
- Uttam Kumar
- Borlaug Institute for South Asia, Ludhiana, India
- International Maize and Wheat Improvement Center, New Delhi, India
- Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Mexico
43
Amas J, Anderson R, Edwards D, Cowling W, Batley J. Status and advances in mining for blackleg (Leptosphaeria maculans) quantitative resistance (QR) in oilseed rape (Brassica napus). Theor Appl Genet 2021; 134:3123-3145. [PMID: 34104999 PMCID: PMC8440254 DOI: 10.1007/s00122-021-03877-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/02/2021] [Accepted: 05/29/2021] [Indexed: 05/04/2023]
Abstract
KEY MESSAGE Quantitative resistance (QR) loci discovered through genetic and genomic analyses are abundant in the Brassica napus genome, providing an opportunity for their utilization in enhancing blackleg resistance. Quantitative resistance (QR) has long been utilized to manage blackleg in Brassica napus (canola, oilseed rape), even before major resistance genes (R-genes) were extensively explored in breeding programmes. In contrast to R-gene-mediated qualitative resistance, QR reduces blackleg symptoms rather than completely eliminating the disease. As a polygenic trait, QR is controlled by numerous genes with modest effects, which exerts less pressure on the pathogen to evolve; hence, its effectiveness is more durable compared to R-gene-mediated resistance. Furthermore, combining QR with major R-genes has been shown to enhance resistance against diseases in important crops, including oilseed rape. For these reasons, there has been a renewed interest among breeders in utilizing QR in crop improvement. However, the mechanisms governing QR are largely unknown, limiting its deployment. Advances in genomics are facilitating the dissection of the genetic and molecular underpinnings of QR, resulting in the discovery of several loci and genes that can be potentially deployed to enhance blackleg resistance. Here, we summarize the efforts undertaken to identify blackleg QR loci in oilseed rape using linkage and association analysis. We update the knowledge on the possible mechanisms governing QR and the advances in searching for the underlying genes. Lastly, we lay out strategies to accelerate the genetic improvement of blackleg QR in oilseed rape using improved phenotyping approaches and genomic prediction tools.
Affiliation(s)
- Junrey Amas
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6001 Australia
- Robyn Anderson
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6001 Australia
- David Edwards
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6001 Australia
- Wallace Cowling
- School of Agriculture and Environment and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009 Australia
- Jacqueline Batley
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6001 Australia
44
Montesinos-López A, Runcie DE, Ibba MI, Pérez-Rodríguez P, Montesinos-López OA, Crespo LA, Bentley AR, Crossa J. Multi-trait genomic-enabled prediction enhances accuracy in multi-year wheat breeding trials. G3 (Bethesda) 2021; 11:6332007. [PMID: 34568924 PMCID: PMC8496321 DOI: 10.1093/g3journal/jkab270] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/01/2021] [Accepted: 07/25/2021] [Indexed: 11/14/2022]
Abstract
Implementing genomic-based prediction models in genomic selection requires an understanding of the measures used to evaluate the prediction accuracy of different models and methods on multi-trait data. In this study, we compared prediction accuracy using six large multi-trait wheat data sets (quality and grain yield). Each year (testing) was predicted from the previous year (training) to assess the prediction accuracy of four different models. The results indicated that the conventional Pearson’s correlation between observed and predicted values underestimated the true correlation, whereas the corrected Pearson’s correlation calculated by fitting a bivariate model was higher, by 2.53–11.46%, than the Pearson’s correlation divided by the square root of the heritability across traits. Across the data sets, the corrected Pearson’s correlation was higher than the uncorrected one by 5.80–14.01%. Overall, we found that for grain yield, prediction performance was higher with a multi-trait model than with a single-trait model. The higher the absolute genetic correlation between traits, the greater the benefit of multi-trait models for increasing the genomic-enabled prediction accuracy of traits.
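The simpler of the two corrections compared in this abstract divides the observed-versus-predicted Pearson correlation r by the square root of the trait heritability, r / sqrt(h2). The following is a minimal Python sketch of that division only, not the authors' code: the helper names and the toy data are hypothetical, and the bivariate-model correction the study favors is not shown.

```python
import math
from typing import Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Plain (uncorrected) Pearson correlation between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def heritability_corrected_r(observed: Sequence[float], predicted: Sequence[float], h2: float) -> float:
    """Hypothetical helper: divide predictive ability r by sqrt(h2)."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in (0, 1]")
    return pearson_r(observed, predicted) / math.sqrt(h2)


# Toy data invented for illustration; r is exactly 0.5 here.
obs = [1.0, 2.0, 3.0]
pred = [1.0, 3.0, 2.0]
print(pearson_r(obs, pred))                       # 0.5
print(heritability_corrected_r(obs, pred, 0.64))  # 0.625
```

Because h2 is at most 1, the corrected value is never smaller in magnitude than r, and it can exceed 1 when h2 is small or poorly estimated.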
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico
- Daniel E Runcie
- Department of Plant Sciences, College of Agricultural & Environmental Sciences, University of California Davis, Davis CA 95616, USA
- Maria Itria Ibba
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, México
- Leonardo A Crespo
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, México
- Alison R Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, México
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, México
- Colegio de Postgraduados (COLPOS), Montecillos, Edo. de México, México
45
Sandhu KS, Aoun M, Morris CF, Carter AH. Genomic Selection for End-Use Quality and Processing Traits in Soft White Winter Wheat Breeding Program with Machine and Deep Learning Models. Biology (Basel) 2021; 10:689. [PMID: 34356544 PMCID: PMC8301459 DOI: 10.3390/biology10070689] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/30/2021] [Revised: 07/13/2021] [Accepted: 07/17/2021] [Indexed: 01/12/2023]
Abstract
Breeding for grain yield, biotic and abiotic stress resistance, and end-use quality is a central goal of wheat breeding programs. Screening for end-use quality traits is usually secondary to grain yield because of high labor needs, the cost of testing, and the large seed quantities required for phenotyping. Genomic selection offers an alternative: performance can be predicted from genome-wide markers in forward and across-location prediction, where a previous year's data set is used to build the models. Because breeding programs generate large data sets, we explored the potential of machine and deep learning models to predict fourteen end-use quality traits in a winter wheat breeding program. The population consisted of 666 wheat genotypes screened for five years (2015–19) at two locations (Pullman and Lind, WA, USA). Nine models, including two machine learning models (random forest and support vector machine) and two deep learning models (convolutional neural network and multilayer perceptron), were evaluated under cross-validation, forward, and across-location prediction. Prediction accuracies across traits ranged from 0.45–0.81, 0.29–0.55, and 0.27–0.50 under cross-validation, forward, and across-location prediction, respectively. In general, forward prediction accuracy increased over time as the training set grew, and this trend was more evident for the machine and deep learning models. Deep learning models were superior to the traditional ridge regression best linear unbiased prediction (RRBLUP) and Bayesian models under all prediction scenarios. The high accuracy observed for end-use quality traits in this study supports predicting them in early generations, so that superior genotypes can be advanced to more extensive grain yield trials. Furthermore, the superior performance of machine and deep learning models strengthens the case for including them in large-scale breeding programs to predict complex traits.
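The forward and across-location schemes described in this abstract amount to two ways of partitioning the same records. Below is a minimal Python sketch under stated assumptions: each record is a hypothetical dict with `year` and `location` keys, and forward prediction is assumed to be cumulative (train on every earlier year), which matches the abstract's note that accuracy rose as training data accumulated; the study's exact splits may differ.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record layout: {"line": ..., "year": int, "location": str, "trait": float}
Record = Dict[str, Any]


def forward_splits(records: List[Record]) -> List[Tuple[int, List[Record], List[Record]]]:
    """Forward prediction, assumed cumulative: for each year after the first,
    train on all earlier years and test on that year."""
    years = sorted({r["year"] for r in records})
    splits = []
    for target in years[1:]:
        train = [r for r in records if r["year"] < target]
        test = [r for r in records if r["year"] == target]
        splits.append((target, train, test))
    return splits


def across_location_split(
    records: List[Record], train_loc: str, test_loc: str
) -> Tuple[List[Record], List[Record]]:
    """Across-location prediction: fit on one site, predict the other."""
    train = [r for r in records if r["location"] == train_loc]
    test = [r for r in records if r["location"] == test_loc]
    return train, test


# Toy records (no trait values) for the 2015-2019 seasons at both sites.
toy = [{"year": y, "location": loc} for y in range(2015, 2020) for loc in ("Pullman", "Lind")]
for target, train, test in forward_splits(toy):
    print(target, len(train), len(test))
# 2016 2 2
# 2017 4 2
# 2018 6 2
# 2019 8 2
```

Any of the models named in the abstract could then be fitted on each `train` list and scored on the matching `test` list, so that every model is compared on identical partitions.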
Affiliation(s)
- Karansher Singh Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA
- Meriem Aoun
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA
- Craig F. Morris
- USDA-ARS Western Wheat Quality Laboratory, E-202 Food Quality Building, Washington State University, Pullman, WA 99164, USA
- Arron H. Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA
46
Reynolds MP, Lewis JM, Ammar K, Basnet BR, Crespo-Herrera L, Crossa J, Dhugga KS, Dreisigacker S, Juliana P, Karwat H, Kishii M, Krause MR, Langridge P, Lashkari A, Mondal S, Payne T, Pequeno D, Pinto F, Sansaloni C, Schulthess U, Singh RP, Sonder K, Sukumaran S, Xiong W, Braun HJ. Harnessing translational research in wheat for climate resilience. J Exp Bot 2021; 72:5134-5157. [PMID: 34139769 PMCID: PMC8272565 DOI: 10.1093/jxb/erab256] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/27/2021] [Accepted: 06/14/2021] [Indexed: 05/24/2023]
Abstract
Although wheat (Triticum aestivum and Triticum durum) is the world's most widely grown crop, research investment in it lags behind that in other staple crops. Current yield gains will not meet 2050 needs, and climate stresses compound this challenge. However, there is good evidence that heat and drought resilience can be boosted by translating promising ideas into novel breeding technologies, for example by using powerful new tools in genetics and remote sensing. Such technologies can also be applied to identify climate resilience traits among the vast and largely untapped reserve of wheat genetic resources held in collections worldwide. This review describes multi-pronged research opportunities at the core of the Heat and Drought Wheat Improvement Consortium (coordinated by CIMMYT), which together create a pipeline to boost heat and drought resilience, specifically: improving crop design targets using big data approaches; developing phenomic tools for field-based screening and research; applying genomic technologies to elucidate the bases of climate resilience traits; and applying these outputs to develop next-generation breeding methods. The global impact of these outputs will be validated through the International Wheat Improvement Network, a global germplasm development and testing system that contributes key productivity traits to approximately half of the global wheat-growing area.
Affiliation(s)
- Matthew P Reynolds
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Janet M Lewis
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Karim Ammar
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Bhoja R Basnet
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Kanwarpal S Dhugga
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Philomin Juliana
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Hannes Karwat
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Masahiro Kishii
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Margaret R Krause
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Peter Langridge
- School of Agriculture, Food and Wine, University of Adelaide, Waite Campus, PMB1, Glen Osmond SA 5064, Australia
- Wheat Initiative, Julius Kühn-Institute, Königin-Luise-Str. 19, 14195 Berlin, Germany
- Azam Lashkari
- CIMMYT-Henan Collaborative Innovation Center, Henan Agricultural University, Zhengzhou, 450002, PR China
- Suchismita Mondal
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Thomas Payne
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Diego Pequeno
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Francisco Pinto
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Carolina Sansaloni
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Urs Schulthess
- CIMMYT-Henan Collaborative Innovation Center, Henan Agricultural University, Zhengzhou, 450002, PR China
- Ravi P Singh
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Kai Sonder
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Wei Xiong
- CIMMYT-Henan Collaborative Innovation Center, Henan Agricultural University, Zhengzhou, 450002, PR China
- Hans J Braun
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
47
Cortés AJ, López-Hernández F. Harnessing Crop Wild Diversity for Climate Change Adaptation. Genes (Basel) 2021; 12:783. [PMID: 34065368 PMCID: PMC8161384 DOI: 10.3390/genes12050783] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 03/29/2021] [Revised: 04/28/2021] [Accepted: 05/19/2021] [Indexed: 12/20/2022]
Abstract
Warming and drought are reducing global crop production and could substantially worsen global malnutrition. As with the green revolution in the last century, plant genetics may offer concrete opportunities to increase yield and crop adaptability. However, the speed at which the threat is unfolding calls for new strategies to meet global food demand. In this review, we highlight major recent 'big data' developments from both empirical and theoretical genomics that may speed up the identification, conservation, and breeding of exotic and elite crop varieties with the potential to feed humans. We first emphasize the major bottlenecks to capturing and utilizing novel sources of variation in abiotic stress (i.e., heat and drought) tolerance. We argue that the adaptation of crop wild relatives to dry environments could be informative about how plant phenotypes may react to a drier climate, because natural selection has already tested more options than humans ever will. Because isolated pockets of cryptic diversity may still persist in remote semi-arid regions, we encourage new habitat-based, population-guided collections for genebanks. We then discuss how to systematically study abiotic stress tolerance in these collections of crop wild relatives and landraces using geo-referencing and extensive environmental data. Once the genes underlying adaptive tolerance traits are uncovered, this natural variation can be introgressed into elite cultivars. However, unlocking adaptive genetic variation hidden in related wild species and early landraces remains a major challenge for complex traits that, like abiotic stress tolerance, are polygenic (i.e., regulated by many low-effect genes). Therefore, we finish by surveying modern analytical approaches that will help overcome this issue.
Concretely, genomic prediction, machine learning, and multi-trait gene editing all offer innovative alternatives to speed up more accurate pre-breeding and breeding efforts toward increasing crop adaptability and yield, while matching future global food demand in the face of increased heat and drought. For these 'big data' approaches to succeed, we advocate a trans-disciplinary approach with open-source data and long-term funding. The recent developments and perspectives discussed throughout this review ultimately aim to contribute to increased crop adaptability and yield in the face of heat waves and drought events.
Affiliation(s)
- Andrés J. Cortés
- Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia
- Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Sede Medellín, Medellín 050034, Colombia
- Felipe López-Hernández
- Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia
48
Abstract
Technological developments have revolutionized measurements on plant genotypes and phenotypes, leading to routine production of large, complex data sets. This has led to increased efforts to extract meaning from these measurements and to integrate various data sets. Concurrently, machine learning has rapidly evolved and is now widely applied in science in general and in plant genotyping and phenotyping in particular. Here, we review the application of machine learning in the context of plant science and plant breeding. We focus on analyses at different phenotype levels, from biochemical to yield, and in connecting genotypes to these. In this way, we illustrate how machine learning offers a suite of methods that enable researchers to find meaningful patterns in relevant plant data.
Affiliation(s)
- Aalt Dirk Jan van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
- Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
- Gert Kootstra
- Farm Technology, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
- Willem Kruijer
- Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
49
Montesinos-López OA, Montesinos-López A, Pérez-Rodríguez P, Barrón-López JA, Martini JWR, Fajardo-Flores SB, Gaytan-Lugo LS, Santana-Mancilla PC, Crossa J. A review of deep learning applications for genomic selection. BMC Genomics 2021; 22:19. [PMID: 33407114 PMCID: PMC7789712 DOI: 10.1186/s12864-020-07319-x] [Citation(s) in RCA: 89] [Impact Index Per Article: 29.7] [Received: 07/18/2020] [Accepted: 12/10/2020] [Indexed: 11/24/2022]
Abstract
BACKGROUND Several conventional genomic Bayesian (or non-Bayesian) prediction methods have been proposed, including the standard additive genetic effect model, for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. DL methods are nonparametric models flexible enough to adapt to complicated associations between inputs and outputs, including very complex patterns. MAIN BODY We review the applications of deep learning (DL) methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods, including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared to traditional genomic prediction approaches, as well as current trends in DL applications. CONCLUSIONS The main requirement for using DL is high-quality, sufficiently large training data. Based on the current literature on GS in plant and animal breeding, we did not find clear superiority of DL in terms of prediction power compared to conventional genome-based prediction models. Nevertheless, there is clear evidence that DL algorithms capture nonlinear patterns more efficiently than conventional genome-based models. Deep learning algorithms are able to integrate data from different sources, as is usually needed in GS-assisted breeding, and they show the ability to improve prediction accuracy for large plant breeding data sets. It is important to apply DL to large training-testing data sets.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, Mexico
- José Alberto Barrón-López
- Department of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina s/n La Molina, 15024, Lima, Peru
- Johannes W R Martini
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, Mexico
- Laura S Gaytan-Lugo
- School of Mechanical and Electrical Engineering, Universidad de Colima, 28040, Colima, Colima, Mexico
- José Crossa
- Colegio de Postgraduados, CP 56230, Montecillos, Edo. de México, Mexico
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, Mexico
50
Sandhu KS, Lozada DN, Zhang Z, Pumphrey MO, Carter AH. Deep Learning for Predicting Complex Traits in Spring Wheat Breeding Program. Front Plant Sci 2021; 11:613325. [PMID: 33469463 PMCID: PMC7813801 DOI: 10.3389/fpls.2020.613325] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/02/2020] [Accepted: 11/30/2020] [Indexed: 05/12/2023]
Abstract
Genomic selection (GS) is transforming the field of plant breeding, and models that improve prediction accuracy for complex traits are needed. Analytical methods for complex data sets traditionally used in other disciplines represent an opportunity for improving prediction accuracy in GS. Deep learning (DL) is a branch of machine learning (ML) that trains models using densely connected artificial neural networks. The objective of this research was to evaluate the potential of DL models in the Washington State University spring wheat breeding program. We compared the performance of two DL algorithms, namely multilayer perceptron (MLP) and convolutional neural network (CNN), with ridge regression best linear unbiased predictor (rrBLUP), a commonly used GS model. The data set consisted of 650 recombinant inbred lines (RILs) from a spring wheat nested association mapping (NAM) population planted in the 2014–2016 growing seasons. We predicted five quantitative traits with varying genetic architecture using cross-validations (CVs), independent validations, and different sets of SNP markers. Hyperparameters of the DL models were optimized by minimizing the root mean square error in the training set, and overfitting was avoided using dropout and regularization. DL models gave 0 to 5% higher prediction accuracy than the rrBLUP model under both cross and independent validations for all five traits used in this study. Furthermore, MLP produced 5% higher prediction accuracy than CNN for grain yield and grain protein content. Altogether, DL approaches obtained better prediction accuracy for each trait and should be incorporated into a plant breeder's toolkit for use in large-scale breeding programs.
Affiliation(s)
- Karansher S. Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- Dennis N. Lozada
- Department of Plant and Environmental Sciences, New Mexico State University, Las Cruces, NM, United States
- Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- Michael O. Pumphrey
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- Arron H. Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States