1
Bose S, Banerjee S, Kumar S, Saha A, Nandy D, Hazra S. Review of applications of artificial intelligence (AI) methods in crop research. J Appl Genet 2024; 65:225-240. [PMID: 38216788 DOI: 10.1007/s13353-023-00826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 12/23/2023] [Accepted: 12/26/2023] [Indexed: 01/14/2024]
Abstract
Sophisticated and modern crop improvement techniques can bridge the gap for feeding the ever-increasing population. Artificial intelligence (AI) refers to the simulation of human intelligence in machines through the application of computational algorithms, machine learning (ML) and deep learning (DL) techniques. These techniques aim to generalise patterns and relationships from historical data, employing various mathematical optimisation techniques to build prediction models that facilitate selection of superior genotypes. These techniques are less resource intensive and can solve the problem based on the analysis of large-scale phenotypic datasets. ML for genomic selection (GS) uses high-throughput genotyping technologies to gather genetic information on a large number of markers across the genome. The prediction of GS models is based on the mathematical relation between genotypic and phenotypic data from the training population. ML techniques have emerged as powerful tools for genome editing through analysing large-scale genomic data and facilitating the development of accurate prediction models. Precise phenotyping is a prerequisite to advance crop breeding for solving agricultural production-related issues. ML algorithms can solve this problem by generating predictive models based on the analysis of large-scale phenotypic datasets. DL models also show potential for reliable and precise phenotyping. This review provides a comprehensive overview of various ML and DL models, their applications, and their potential to enhance the efficiency, specificity and safety of advanced crop improvement protocols such as genomic selection and genome editing, along with phenotypic prediction to promote accelerated breeding.
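The abstract's statement that GS prediction rests on a mathematical relation between genotypes and phenotypes in a training population can be illustrated with a minimal sketch. It assumes a ridge regression on marker codes (0/1/2), simulated data, and an arbitrary penalty `lam`; the function names and all numbers are hypothetical and are not taken from the review.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge estimate of marker effects on centered data: (X'X + lam*I)^-1 X'y."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def predict(X, intercept, beta):
    """Predicted phenotype (or breeding value) for new genotypes."""
    return intercept + X @ beta

# Simulated example (assumption): 250 lines, 500 markers, 20 causal markers.
rng = np.random.default_rng(0)
n_train, n_test, n_markers = 200, 50, 500
X = rng.integers(0, 3, size=(n_train + n_test, n_markers)).astype(float)
true_effects = np.zeros(n_markers)
true_effects[:20] = rng.normal(0, 1, 20)
y = X @ true_effects + rng.normal(0, 1.0, n_train + n_test)

# Train on the "training population", predict the unphenotyped candidates.
intercept, beta = fit_ridge(X[:n_train], y[:n_train], lam=50.0)
y_hat = predict(X[n_train:], intercept, beta)

# Predictive correlation between predicted and observed phenotypes.
accuracy = np.corrcoef(y_hat, y[n_train:])[0, 1]
print(f"predictive correlation: {accuracy:.2f}")
```

In practice the penalty would be tuned by cross-validation or derived from variance components rather than fixed by hand.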
Affiliation(s)
- Suvojit Bose
  - Department of Vegetables and Spice Crops, Uttar Banga Krishi Viswavidyalaya, Pundibari, Cooch Behar, 736165, West Bengal, India
- Soumya Kumar
  - School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Akash Saha
  - School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Debalina Nandy
  - School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Soham Hazra
  - Department of Agriculture, Brainware University, Barasat, 700125, West Bengal, India
2
Hong JK, Kim YM, Cho ES, Lee JB, Kim YS, Park HB. Application of deep learning with bivariate models for genomic prediction of sow lifetime productivity-related traits. Anim Biosci 2024; 37:622-630. [PMID: 38228129 PMCID: PMC10915216 DOI: 10.5713/ab.23.0264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 08/31/2023] [Accepted: 11/03/2023] [Indexed: 01/18/2024] Open
Abstract
OBJECTIVE Pig breeders cannot obtain phenotypic information at the time of selection for sow lifetime productivity (SLP). They would benefit from obtaining genetic information of candidate sows. Genomic data interpreted using deep learning (DL) techniques could contribute to the genetic improvement of SLP to maximize farm profitability because DL models capture nonlinear genetic effects such as dominance and epistasis more efficiently than conventional genomic prediction methods based on linear models. This study aimed to investigate the usefulness of DL for the genomic prediction of two SLP-related traits: lifetime number of litters (LNL) and lifetime pig production (LPP). METHODS Two bivariate DL models, convolutional neural network (CNN) and local convolutional neural network (LCNN), were compared with conventional bivariate linear models (i.e., genomic best linear unbiased prediction, Bayesian ridge regression, Bayes A, and Bayes B). Phenotype and pedigree data were collected from 40,011 sows that had husbandry records. Among these, 3,652 pigs were genotyped using the PorcineSNP60K BeadChip. RESULTS The best predictive correlation for LNL was obtained with CNN (0.28), followed by LCNN (0.26) and conventional linear models (approximately 0.21). For LPP, the best predictive correlation was also obtained with CNN (0.29), followed by LCNN (0.27) and conventional linear models (approximately 0.25). A similar trend was observed with the mean squared error of prediction for the SLP traits. CONCLUSION This study provides an example of a CNN that can outperform linear model-based genomic prediction approaches when nonlinear interaction components are important, because LNL and LPP exhibited strong epistatic interaction components. Additionally, our results suggest that applying bivariate DL models could also contribute to the prediction accuracy by utilizing the genetic correlation between LNL and LPP.
Affiliation(s)
- Joon-Ki Hong
  - Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Yong-Min Kim
  - Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Eun-Seok Cho
  - Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Jae-Bong Lee
  - Korea Zoonosis Research Institute, Jeonbuk National University, Iksan 54531, Korea
- Young-Sin Kim
  - Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Hee-Bok Park
  - Department of Animal Resources Science, Kongju National University, Yesan 32439, Korea
  - Resource Science Research Institute, Kongju National University, Yesan 32439, Korea
3
Robles-Zazueta CA, Crespo-Herrera LA, Piñera-Chavez FJ, Rivera-Amado C, Aradottir GI. Climate change impacts on crop breeding: Targeting interacting biotic and abiotic stresses for wheat improvement. The Plant Genome 2024; 17:e20365. [PMID: 37415292 DOI: 10.1002/tpg2.20365] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/14/2023] [Revised: 05/23/2023] [Accepted: 05/30/2023] [Indexed: 07/08/2023]
Abstract
Wheat (Triticum aestivum L.) as a staple crop is closely interwoven into the development of modern society. Its influence on culture and economic development is global. Recent instability in wheat markets has demonstrated its importance in guaranteeing food security across national borders. Climate change threatens food security as it interacts with a multitude of factors impacting wheat production. The challenge needs to be addressed with a multidisciplinary perspective delivered across research, private, and government sectors. Many experimental studies have identified the major biotic and abiotic stresses impacting wheat production, but fewer have addressed the combinations of stresses that occur simultaneously or sequentially during the wheat growth cycle. Here, we argue that biotic and abiotic stress interactions, and the genetics and genomics underlying them, have been insufficiently addressed by the crop science community. We propose this as a reason for the limited transfer of practical and feasible climate adaptation knowledge from research projects into routine farming practice. To address this gap, we propose that novel methodology integration can align large volumes of data available from crop breeding programs with increasingly cheaper omics tools to predict wheat performance under different climate change scenarios. Underlying this is our proposal that breeders design and deliver future wheat ideotypes based on new or enhanced understanding of the genetic and physiological processes that are triggered when wheat is subjected to combinations of stresses. By defining this to a trait and/or genetic level, new insights can be made for yield improvement under future climate conditions.
Affiliation(s)
- Carlos A Robles-Zazueta
  - Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, México
- Carolina Rivera-Amado
  - Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, México
4
Lourenço VM, Ogutu JO, Rodrigues RAP, Posekany A, Piepho HP. Genomic prediction using machine learning: a comparison of the performance of regularized regression, ensemble, instance-based and deep learning methods on synthetic and empirical data. BMC Genomics 2024; 25:152. [PMID: 38326768 PMCID: PMC10848392 DOI: 10.1186/s12864-023-09933-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 12/20/2023] [Indexed: 02/09/2024] Open
Abstract
BACKGROUND The accurate prediction of genomic breeding values is central to genomic selection in both plant and animal breeding studies. Genomic prediction involves the use of thousands of molecular markers spanning the entire genome and therefore requires methods able to efficiently handle high dimensional data. Not surprisingly, machine learning methods are becoming widely advocated for and used in genomic prediction studies. These methods encompass different groups of supervised and unsupervised learning methods. Although several studies have compared the predictive performances of individual methods, studies comparing the predictive performance of different groups of methods are rare. However, such studies are crucial for identifying (i) groups of methods with superior genomic predictive performance and assessing (ii) the merits and demerits of such groups of methods relative to each other and to the established classical methods. Here, we comparatively evaluate the genomic predictive performance and informally assess the computational cost of several groups of supervised machine learning methods, specifically, regularized regression methods, deep, ensemble and instance-based learning algorithms, using one simulated animal breeding dataset and three empirical maize breeding datasets obtained from a commercial breeding program. RESULTS Our results show that the relative predictive performance and computational expense of the groups of machine learning methods depend upon both the data and target traits and that for classical regularized methods, increasing model complexity can incur huge computational costs but does not necessarily always improve predictive accuracy. Thus, despite their greater complexity and computational burden, neither the adaptive nor the group regularized methods clearly improved upon the results of their simple regularized counterparts. This rules out selection of one procedure among machine learning methods for routine use in genomic prediction. The results also show that, because of their competitive predictive performance, computational efficiency, simplicity and therefore relatively few tuning parameters, the classical linear mixed model and regularized regression methods are likely to remain strong contenders for genomic prediction. CONCLUSIONS The dependence of predictive performance and computational burden on target datasets and traits calls for increasing investments in enhancing the computational efficiency of machine learning algorithms and computing resources.
Affiliation(s)
- Vanda M Lourenço
  - Center for Mathematics and Applications (NOVA Math) and Department of Mathematics, NOVA SST, 2829-516, Caparica, Portugal
- Joseph O Ogutu
  - Institute of Crop Science, Biostatistics Unit, University of Hohenheim, Fruwirthstrasse 23, 70599, Stuttgart, Germany
- Rui A P Rodrigues
  - Center for Mathematics and Applications (NOVA Math) and Department of Mathematics, NOVA SST, 2829-516, Caparica, Portugal
- Alexandra Posekany
  - Research Unit of Computational Statistics, Vienna University of Technology, Wiedner Hauptstr. 8-10, 1040, Vienna, Austria
- Hans-Peter Piepho
  - Institute of Crop Science, Biostatistics Unit, University of Hohenheim, Fruwirthstrasse 23, 70599, Stuttgart, Germany
5
Montesinos-López A, Rivera C, Pinto F, Piñera F, Gonzalez D, Reynolds M, Pérez-Rodríguez P, Li H, Montesinos-López OA, Crossa J. Multimodal deep learning methods enhance genomic prediction of wheat breeding. G3 (Bethesda, Md.) 2023; 13:jkad045. [PMID: 36869747 PMCID: PMC10151399 DOI: 10.1093/g3journal/jkad045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 02/21/2023] [Accepted: 02/22/2023] [Indexed: 03/05/2023]
Abstract
While several statistical machine learning methods have been developed and studied for assessing the genomic prediction (GP) accuracy of unobserved phenotypes in plant breeding research, few methods have linked genomics and phenomics (imaging). Deep learning (DL) neural networks have been developed to increase the GP accuracy of unobserved phenotypes while simultaneously accounting for the complexity of genotype-environment interaction (GE); however, unlike conventional GP models, DL has not been investigated for when genomics is linked with phenomics. In this study we used 2 wheat data sets (DS1 and DS2) to compare a novel DL method with conventional GP models. Models fitted for DS1 were GBLUP, gradient boosting machine (GBM), support vector regression (SVR) and the DL method. Results indicated that for 1 year, DL provided better GP accuracy than results obtained by the other models. However, GP accuracy obtained for other years indicated that the GBLUP model was slightly superior to the DL. DS2 is comprised only of genomic data from wheat lines tested for 3 years, 2 environments (drought and irrigated) and 2-4 traits. DS2 results showed that when predicting the irrigated environment with the drought environment, DL had higher accuracy than the GBLUP model in all analyzed traits and years. When predicting drought environment with information on the irrigated environment, the DL model and GBLUP model had similar accuracy. The DL method used in this study is novel and presents a strong degree of generalization as several modules can potentially be incorporated and concatenated to produce an output for a multi-input data structure.
Affiliation(s)
- Abelardo Montesinos-López
  - Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, Mexico
- Carolina Rivera
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Francisco Pinto
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Francisco Piñera
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- David Gonzalez
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Mathew Reynolds
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Huihui Li
  - Institute of Crop Sciences, The National Key Facility for Crop Gene Resources and Genetic Improvement and CIMMYT China office, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jose Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
  - Colegio de Postgraduados, Montecillos, Edo. de México, CP 56230, Mexico
6
Jubair S, Domaratzki M. Crop genomic selection with deep learning and environmental data: A survey. Front Artif Intell 2023; 5:1040295. [PMID: 36703955 PMCID: PMC9871498 DOI: 10.3389/frai.2022.1040295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 12/22/2022] [Indexed: 01/12/2023] Open
Abstract
Machine learning techniques for crop genomic selections, especially for single-environment plants, are well-developed. These machine learning models, which use dense genome-wide markers to predict phenotype, routinely perform well on single-environment datasets, especially for complex traits affected by multiple markers. On the other hand, machine learning models for predicting crop phenotype, especially deep learning models, using datasets that span different environmental conditions, have only recently emerged. Models that can accept heterogeneous data sources, such as temperature, soil conditions and precipitation, are natural choices for modeling GxE in multi-environment prediction. Here, we review emerging deep learning techniques that incorporate environmental data directly into genomic selection models.
Affiliation(s)
- Sheikh Jubair
  - Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
- Mike Domaratzki
  - Department of Computer Science, University of Western Ontario, London, ON, Canada
7
Cuevas J, Reslow F, Crossa J, Ortiz R. Modeling genotype × environment interaction for single and multitrait genomic prediction in potato (Solanum tuberosum L.). G3 (Bethesda, Md.) 2022; 13:6883526. [PMID: 36477309 PMCID: PMC9911059 DOI: 10.1093/g3journal/jkac322] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/09/2022] [Revised: 11/01/2022] [Accepted: 11/28/2022] [Indexed: 12/13/2022]
Abstract
In this study, we extend research on genomic prediction (GP) to polysomic polyploid plant species with the main objective to investigate single-trait (ST) and multitrait (MT) multienvironment (ME) models using field trial data from 3 locations in Sweden [Helgegården (HEL), Mosslunda (MOS), Umeå (UM)] over 2 years (2020, 2021) of 253 potato cultivars and breeding clones for 5 tuber weight traits and 2 tuber flesh quality characteristics. This research investigated the GP of 4 genome-based prediction models with genotype × environment interactions (GEs): (1) ST reaction norm model (M1), (2) ST model considering covariances between environments (M2), (3) ST M2 extended to include a random vector that utilizes the environmental covariances (M3), and (4) MT model with GE (M4). Several prediction problems were analyzed to assess the GP accuracy of each of the 4 models. Results of the prediction of traits in HEL, the high yield potential testing site in 2021, show that the best-predicted traits were tuber flesh starch (%), weight of tuber above 60 or below 40 mm in size, and the total tuber weight. In terms of GP accuracy, model M4 gave the best prediction accuracy in 3 traits, namely tuber weight of 40-50 or above 60 mm in size, and total tuber weight, and very similar in the starch trait. For MOS in 2021, the best predictive traits were starch, weight of tubers above 60, 50-60, or below 40 mm in size, and the total tuber weight. MT model M4 was the best GP model based on its accuracy when some cultivars are observed in some traits. For the GP accuracy of traits in UM in 2021, the best predictive traits were the weight of tubers above 60, 50-60, or below 40 mm in size, and the best model was MT M4, followed by models ST M3 and M2.
Affiliation(s)
- Jaime Cuevas
  - Departamento de Energía, Universidad Autónoma del Estado de Quintana Roo, Chetumal, Quintana Roo 77019, México
- Fredrik Reslow
  - Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), P.O. Box 190, Lomma SE 23436, Sweden
- Jose Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, Texcoco 56237, Edo. de Mexico, Mexico
  - Colegio de Postgraduados, Montecillos, Edo. de México 56230, México
- Rodomiro Ortiz
  - Corresponding author: Sveriges Lantbruksuniversitet, Inst. för Växtförädling, Box 190, SE 23 422 Lomma, Sweden.
8
Zandberg JD, Fernandez CT, Danilevicz MF, Thomas WJW, Edwards D, Batley J. The Global Assessment of Oilseed Brassica Crop Species Yield, Yield Stability and the Underlying Genetics. Plants (Basel, Switzerland) 2022; 11:2740. [PMID: 36297764 PMCID: PMC9610009 DOI: 10.3390/plants11202740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 10/08/2022] [Accepted: 10/09/2022] [Indexed: 06/16/2023]
Abstract
The global demand for oilseeds is increasing along with the human population. The family of Brassicaceae crops are no exception, typically harvested as a valuable source of oil, rich in beneficial molecules important for human health. The global capacity for improving Brassica yield has steadily risen over the last 50 years, with the major crop Brassica napus (rapeseed, canola) production increasing to ~72 Gt in 2020. In contrast, the production of Brassica mustard crops has fluctuated, rarely improving in farming efficiency. The drastic increase in global yield of B. napus is largely due to the demand for a stable source of cooking oil. Furthermore, with the adoption of highly efficient farming techniques, yield enhancement programs, breeding programs, the integration of high-throughput phenotyping technology and establishing the underlying genetics, B. napus yields have increased by >450 fold since 1978. Yield stability has been improved with new management strategies targeting diseases and pests, as well as by understanding the complex interaction of environment, phenotype and genotype. This review assesses the global yield and yield stability of agriculturally important oilseed Brassica species and discusses how contemporary farming and genetic techniques have driven improvements.
Affiliation(s)
- Jaco D. Zandberg
  - School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
- Monica F. Danilevicz
  - School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
- William J. W. Thomas
  - School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
- David Edwards
  - Center for Applied Bioinformatics, School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
- Jacqueline Batley
  - School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
9
Pham H, Reisner J, Swift A, Olafsson S, Vardeman S. Crop phenotype prediction using biclustering to explain genotype-by-environment interactions. Frontiers in Plant Science 2022; 13:975976. [PMID: 36204056 PMCID: PMC9530907 DOI: 10.3389/fpls.2022.975976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 08/25/2022] [Indexed: 06/16/2023]
Abstract
Phenotypic variation in plants is attributed to genotype (G), environment (E), and genotype-by-environment interaction (GEI). Although the main effects of G and E are typically larger and easier to model, the GEI effects are important and are a critical factor when considering issues such as why some genotypes perform consistently well across a range of environments. In plant breeding, a major challenge is limited information, including the fact that a single genotype is tested in only a small subset of all possible test environments. The two-way table of phenotype responses will therefore commonly contain missing data. In this paper, we propose a new model of GEI effects that only requires an input of a two-way table of phenotype observations, with genotypes as rows and environments as columns, and that does not assume the completeness of data. Our analysis can deal with this scenario as it utilizes a novel biclustering algorithm that can handle missing values, resulting in an output of homogeneous cells with no interactions between G and E. In other words, we identify subsets of genotypes and environments where phenotype can be modeled simply. Based on this, we fit no-interaction models to predict phenotypes of a given crop and draw insights into how a particular cultivar will perform in the unused test environments. Our new methodology is validated on data from different plant species and phenotypes and shows superior performance compared to well-studied statistical approaches.
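The no-interaction step described in the abstract can be sketched for one homogeneous block of the two-way table. The sketch below is not the authors' biclustering algorithm. It only assumes that, within a block, the phenotype follows y_ij = mu + g_i + e_j, and it fits that model by least squares on the observed cells so that missing cells can be predicted. The function name and the toy table are hypothetical.

```python
import numpy as np

def fit_additive(table):
    """Least-squares fit of y_ij = mu + g_i + e_j using observed cells only (NaN = missing)."""
    n_g, n_e = table.shape
    rows, cols = np.nonzero(~np.isnan(table))
    y = table[rows, cols]
    # Design matrix: intercept, genotype indicators, environment indicators.
    A = np.zeros((y.size, 1 + n_g + n_e))
    A[:, 0] = 1.0
    A[np.arange(y.size), 1 + rows] = 1.0
    A[np.arange(y.size), 1 + n_g + cols] = 1.0
    # The indicators are redundant, so take the minimum-norm least-squares solution.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    mu, g, e = coef[0], coef[1:1 + n_g], coef[1 + n_g:]
    # Fitted values for every cell, including the unobserved ones.
    return mu + g[:, None] + e[None, :]

# Toy table (assumption): 3 genotypes x 4 environments with purely additive effects.
g_true = np.array([0.0, 1.0, 3.0])
e_true = np.array([10.0, 12.0, 15.0, 11.0])
full = g_true[:, None] + e_true[None, :]
observed = full.copy()
observed[0, 1] = np.nan  # genotype 0 was not tested in environment 1
observed[2, 3] = np.nan  # genotype 2 was not tested in environment 3

fitted = fit_additive(observed)
print(np.round(fitted, 2))
```

Predictions for the missing cells are only well defined when the observed cells connect all genotypes and environments in the block, as they do in this toy table.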
Affiliation(s)
- Hieu Pham
  - Department of Information Systems, Supply Chain, and Analytics, College of Business, The University of Alabama in Huntsville, Huntsville, AL, United States
- John Reisner
  - Department of Statistics, Iowa State University, Ames, IA, United States
- Ashley Swift
  - Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, United States
- Sigurdur Olafsson
  - Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, United States
- Stephen Vardeman
  - Department of Statistics, Iowa State University, Ames, IA, United States
  - Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, United States
10
Zhang Q, Zhang Q, Jensen J. Association Studies and Genomic Prediction for Genetic Improvements in Agriculture. Frontiers in Plant Science 2022; 13:904230. [PMID: 35720549 PMCID: PMC9201771 DOI: 10.3389/fpls.2022.904230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 05/16/2022] [Indexed: 06/15/2023]
Abstract
To feed the fast-growing global population with sufficient food using limited global resources, it is urgent to develop and utilize cutting-edge technologies and improve the efficiency of agricultural production. In this review, we specifically introduce the concepts, theories, methods, applications and future implications of association studies and of predicting unknown genetic value or future phenotypic events using genomics in the area of breeding in agriculture. Genome-wide association studies can identify the quantitative genetic loci associated with phenotypes of importance in agriculture, while genomic prediction utilizes individual genetic value to rank selection candidates to improve the next generation of plants or animals. These technologies and methods have improved the efficiency of genetic improvement programs for agricultural production via elite animal breeds and plant varieties. With the development of new data acquisition technologies, more and more data will be collected from high-throughput technologies to assist agricultural breeding. It will be crucial to extract useful information from these large amounts of data; to face this challenge, more efficient algorithms need to be developed and utilized for analyzing these data. Such development will require knowledge from multiple disciplines of research.
Affiliation(s)
- Qianqian Zhang
  - Institute of Biotechnology, Beijing Academy of Agricultural and Forestry Sciences, Beijing, China
- Qin Zhang
  - College of Animal Science and Technology, Shandong Agricultural University, Taian, China
  - College of Animal Science and Technology, China Agricultural University, Beijing, China
- Just Jensen
  - Centre for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
11
Mathew B, Hauptmann A, Léon J, Sillanpää MJ. NeuralLasso: Neural Networks Meet Lasso in Genomic Prediction. Frontiers in Plant Science 2022; 13:800161. [PMID: 35574107 PMCID: PMC9100816 DOI: 10.3389/fpls.2022.800161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/22/2021] [Accepted: 03/18/2022] [Indexed: 06/15/2023]
Abstract
Prediction of complex traits based on genome-wide marker information is of central importance for both animal and plant breeding. Numerous models have been proposed for the prediction of complex traits, and considerable effort is still being given to improving the prediction accuracy of these models, because various genetic factors like additive, dominance and epistasis effects can influence the prediction accuracy of such models. Recently, machine learning (ML) methods have been widely applied for prediction in both animal and plant breeding programs. In this study, we propose a new algorithm for genomic prediction which is based on neural networks, but incorporates classical elements of LASSO. Our new method is able to account for the local epistasis (higher order interaction between the neighboring markers) in the prediction. We compare the prediction accuracy of our new method with the most commonly used prediction methods, such as BayesA, BayesB, Bayesian Lasso (BL), genomic BLUP and Elastic Net (EN) using the heterogeneous stock mouse and rice field data sets.
Affiliation(s)
- Boby Mathew
  - Bayer CropScience, Monheim am Rhein, Germany
  - Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
- Andreas Hauptmann
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
  - Department of Computer Science, University College London, London, United Kingdom
- Jens Léon
  - Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
- Mikko J. Sillanpää
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
12
Yang J, Guo X, Li Y, Marinello F, Ercisli S, Zhang Z. A survey of few-shot learning in smart agriculture: developments, applications, and challenges. Plant Methods 2022; 18:28. [PMID: 35248105 PMCID: PMC8897954 DOI: 10.1186/s13007-022-00866-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 12/20/2021] [Accepted: 03/01/2022] [Indexed: 05/08/2023]
Abstract
With the rise of artificial intelligence, deep learning is gradually applied to the field of agriculture and plant science. However, the excellent performance of deep learning needs to be established on massive numbers of samples. In the field of plant science and biology, it is not easy to obtain a large amount of labeled data. The emergence of few-shot learning solves this problem. It imitates the ability of humans' rapid learning and can learn a new task with only a small number of labeled samples, which greatly reduces the time cost and financial resources. At present, the advanced few-shot learning methods are mainly divided into four categories based on: data augmentation, metric learning, external memory, and parameter optimization, solving the over-fitting problem from different viewpoints. This review comprehensively expounds on few-shot learning in smart agriculture, introduces the definition of few-shot learning, four kinds of learning methods, the publicly available datasets for few-shot learning, various applications in smart agriculture, and the challenges in smart agriculture in future development.
Affiliation(s)
- Jiachen Yang
- School of Electrical and Information Engineering, Tianjin University, Tianjin, China
| | - Xiaolan Guo
- School of Electrical and Information Engineering, Tianjin University, Tianjin, China
| | - Yang Li
- College of Mechanical and Electrical Engineering, Shihezi University, Xinjiang, China.
| | - Francesco Marinello
- Department of Land Environment Agriculture and Forestry, University of Padova, Legnaro, Italy
| | - Sezai Ercisli
- Department of Horticulture, Faculty of Agriculture, Ataturk University, Erzurum, Turkey
| | - Zhuo Zhang
- School of Electrical and Information Engineering, Tianjin University, Tianjin, China
| |
|
13
|
Montesinos-López OA, Montesinos-López JC, Montesinos-López A, Ramírez-Alcaraz JM, Poland J, Singh R, Dreisigacker S, Crespo L, Mondal S, Govidan V, Juliana P, Espino JH, Shrestha S, Varshney RK, Crossa J. Bayesian multitrait kernel methods improve multienvironment genome-based prediction. G3 (Bethesda, Md.) 2022; 12:6446035. [PMID: 34849802 PMCID: PMC9210316 DOI: 10.1093/g3journal/jkab406] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/03/2021] [Accepted: 11/18/2021] [Indexed: 11/14/2022]
Abstract
When multitrait data are available, the preferred models are those that are able to account for correlations between phenotypic traits because when the degree of correlation is moderate or large, this increases the genomic prediction accuracy. For this reason, in this article, we explore Bayesian multitrait kernel methods for genomic prediction and we illustrate the power of these models with three real datasets. The kernels under study were the linear, Gaussian, polynomial, and sigmoid kernels; they were compared with the conventional Ridge regression and GBLUP multitrait models. The results show that, in general, the Gaussian kernel method outperformed conventional Bayesian Ridge and GBLUP multitrait linear models by 2.2–17.45% (datasets 1–3) in terms of prediction performance based on the mean square error of prediction. This improvement in prediction performance of the Bayesian multitrait kernel method can be attributed to the fact that the proposed model is able to capture nonlinear patterns more efficiently than linear multitrait models. However, not all kernels perform well in the datasets used for evaluation, which is why more than one kernel should be evaluated in order to choose the best one.
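To make the four kernel families named in this abstract concrete, here is a minimal NumPy sketch that builds each kernel matrix from a marker matrix. It is an illustration only, not the authors' implementation: the scaling by the number of markers, the parameter values (`gamma`, `degree`, `coef0`), and the simulated genotypes are all assumptions.

```python
import numpy as np

def linear_kernel(X):
    """Linear kernel: inner products of marker profiles, scaled by marker count p."""
    return X @ X.T / X.shape[1]

def gaussian_kernel(X, gamma=1.0):
    """Gaussian kernel: exp(-gamma * squared Euclidean distance / p)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists / X.shape[1])

def polynomial_kernel(X, degree=2, coef0=1.0):
    """Polynomial kernel: (scaled inner product + coef0) ** degree."""
    return (X @ X.T / X.shape[1] + coef0) ** degree

def sigmoid_kernel(X, coef0=1.0):
    """Sigmoid kernel: tanh(scaled inner product + coef0)."""
    return np.tanh(X @ X.T / X.shape[1] + coef0)

# Hypothetical data: 5 lines genotyped at 20 markers coded 0/1/2, then column-centered.
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(5, 20)).astype(float)
markers -= markers.mean(axis=0)

kernels = {
    "linear": linear_kernel(markers),
    "gaussian": gaussian_kernel(markers),
    "polynomial": polynomial_kernel(markers),
    "sigmoid": sigmoid_kernel(markers),
}
for name, K in kernels.items():
    print(name, K.shape)
```

Each matrix can stand in for the genomic relationship matrix of a GBLUP-type model, which is one way to compare several kernels on the same data, as the abstract recommends.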
Affiliation(s)
| | | | - Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Guadalajara 44430, Mexico
- Corresponding author: Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco 44430, Mexico. (A.M.-L.); International Maize and Wheat Improvement Center (CIMMYT). Km 45 Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo de Mexico, Mexico. (J.C.)
| | | | - Jesse Poland
- Department of Agronomy, Kansas State University, 2004 Throckmorton Plant Science Center, Manhattan, KS 66506, USA
| | - Ravi Singh
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
| | - Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
| | - Leonardo Crespo
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
| | - Sushismita Mondal
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
| | - Velu Govidan
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
| | - Philomin Juliana
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
| | - Julio Huerta Espino
- Campo Experimental Valle de Mexico, Instituto Nacional de Investigaciones Forestales, Agricolas y Pecuarias (INIFAP), Universidad Autónoma de Chapingo, Texcoco 56235, Mexico
| | - Sandesh Shrestha
- Department of Agronomy, Kansas State University, 2004 Throckmorton Plant Science Center, Manhattan, KS 66506, USA
| | - Rajeev K Varshney
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch 6150, Australia
| | - José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, CP 52640, Texcoco, Edo. de Mexico, Mexico
- Colegio de Postgraduados, Montecillos, Edo. de México 56230, Mexico
| |
|
14
|
Sandhu KS, Merrick LF, Sankaran S, Zhang Z, Carter AH. Prospectus of Genomic Selection and Phenomics in Cereal, Legume and Oilseed Breeding Programs. Front Genet 2022. [PMCID: PMC8814369 DOI: 10.3389/fgene.2021.829131] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 11/13/2022]
Abstract
The last decade witnessed an unprecedented increase in the adoption of genomic selection (GS) and phenomics tools in plant breeding programs, especially in major cereal crops. GS has demonstrated the potential for selecting superior genotypes with high precision and accelerating the breeding cycle. Phenomics is a rapidly advancing domain that aims to alleviate phenotyping bottlenecks and explores new large-scale phenotyping and data acquisition methods. In this review, we discuss the lessons learned from GS and phenomics in six self-pollinated crops, primarily rice, wheat, soybean, common bean, chickpea, and groundnut, and we discuss their implementation schemes after assessing their impact on the breeding programs. The status of the adoption of genomics and phenomics is provided for those crops, with a complete GS overview. GS's progress until 2020 is discussed in detail, and relevant information and links to the source code are provided for implementing this technology in plant breeding programs, with most of the examples drawn from wheat breeding programs. Detailed information about various phenotyping tools is provided to strengthen the field of phenomics for plant breeders in the coming years. Finally, we highlight the benefits of merging genomic selection, phenomics, and machine and deep learning, which have produced extraordinary results in recent years in wheat, rice, and soybean. Hence, there is potential for adopting these technologies in crops like the common bean, chickpea, and groundnut. The adoption of phenomics and GS in different breeding programs will accelerate genetic gain, which would improve food security and help meet the need to feed an ever-growing population.
Affiliation(s)
- Karansher S. Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- *Correspondence: Karansher S. Sandhu
| | - Lance F. Merrick
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
| | - Sindhuja Sankaran
- Department of Biological System Engineering, Washington State University, Pullman, WA, United States
| | - Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
| | - Arron H. Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
| |
|
15
|
Varona L, Legarra A, Toro MA, Vitezica ZG. Genomic Prediction Methods Accounting for Nonadditive Genetic Effects. Methods Mol Biol 2022; 2467:219-243. [PMID: 35451778 DOI: 10.1007/978-1-0716-2205-6_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/14/2023]
Abstract
The use of genomic information for prediction of future phenotypes or breeding values of selection candidates has become standard over the last decade. However, most procedures for genomic prediction only consider the additive (or substitution) effects associated with polymorphic markers. Nevertheless, the implementation of models that consider nonadditive genetic variation may be of interest because they (1) may increase predictive ability, (2) can be used to define mate allocation procedures in plant and animal breeding schemes, and (3) can be used to benefit from nonadditive genetic variation in crossbreeding or purebred breeding schemes. This study reviews the available methods for incorporating nonadditive effects into genomic prediction procedures and their potential applications in predicting future phenotypic performance, mate allocation, and crossbred and purebred selection. Finally, a brief outline of some future research lines is also proposed.
Affiliation(s)
- Luis Varona
- Departamento de Anatomía, Embriología y Genética Animal, Universidad de Zaragoza, Zaragoza, Spain.
- Instituto Agroalimentario de Aragón (IA2), Zaragoza, Spain.
| | | | - Miguel A Toro
- Dpto. Producción Agraria, ETS Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
| | | |
|
16
|
Montesinos-López OA, Montesinos-López A, Mosqueda-Gonzalez BA, Montesinos-López JC, Crossa J. Accounting for Correlation Between Traits in Genomic Prediction. Methods Mol Biol 2022; 2467:285-327. [PMID: 35451780 DOI: 10.1007/978-1-0716-2205-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Genomic-enabled prediction plays a key role in the success of genomic selection (GS). However, according to the No Free Lunch Theorem, there is no universal model that performs well for all data sets. For this reason, many statistical and machine learning models are available for genomic prediction. When multitrait data are available, models that are able to account for correlations between phenotypic traits are preferred, since these models help increase the prediction accuracy when the degree of correlation is moderate to large. In this chapter we therefore review multitrait models for genome-enabled prediction and illustrate the power of these models with real examples. In addition, we provide details of the software (R code) available for their application to help users implement these models with their own data. The multitrait models were implemented under conventional Bayesian Ridge regression and best linear unbiased predictor, but also under a deep learning framework. The multitrait deep learning framework helps implement prediction models with mixed outcomes (continuous, binary, ordinal, and count, measured on different scales), which is not easy in conventional statistical models. The illustrative examples are very detailed in order to make the implementation of multitrait models in plant and animal breeding friendlier for breeders and scientists.
Affiliation(s)
| | - Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
| | - Brandon A Mosqueda-Gonzalez
- Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Esq. Miguel Othón de Mendizábal, Mexico city, Mexico
| | | | - José Crossa
- Colegio de Postgraduados, Montecillos, Mexico.
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz, Mexico.
| |
|
17
|
Crossa J, Montesinos-López OA, Pérez-Rodríguez P, Costa-Neto G, Fritsche-Neto R, Ortiz R, Martini JWR, Lillemo M, Montesinos-López A, Jarquin D, Breseghello F, Cuevas J, Rincent R. Genome and Environment Based Prediction Models and Methods of Complex Traits Incorporating Genotype × Environment Interaction. Methods Mol Biol 2022; 2467:245-283. [PMID: 35451779 DOI: 10.1007/978-1-0716-2205-6_9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 06/14/2023]
Abstract
Genomic-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. As opposed to animal breeding, plant breeding includes extensive multienvironment and multiyear field trial data. Hence, genomic-enabled prediction models should include genotype × environment (G × E) interaction, which most of the time increases the prediction performance when the responses of lines differ from environment to environment. In this chapter, we describe a historical timeline since 2012 of advances in GS models that take G × E interaction into account. We describe theoretical and practical aspects of those GS models, including the gains in prediction performance when including G × E structures for both complex continuous and categorical scale traits. Then, we detail and explain the main G × E genomic prediction models for complex traits measured on continuous and noncontinuous (categorical) scales. In relation to G × E interaction models, this review also examines the analysis of information generated from high-throughput phenotype data (phenomics) and the joint analysis of multitrait and multienvironment field trial data, which is also employed in the general assessment of multitrait G × E interaction. The inclusion of nongenomic data to increase the accuracy and biological reliability of the G × E approach is also outlined. We show the recent advances in large-scale envirotyping (enviromics), and how mechanistic computational modeling can derive the crop growth and development aspects useful for predicting phenotypes and explaining G × E.
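One common way to encode a G × E structure in kernel-based genomic prediction is to take the element-wise (Hadamard) product of a genomic relationship matrix and an environmental relationship matrix, both expanded to the level of individual records. The sketch below illustrates that construction only; it is not necessarily the formulation the chapter emphasises. The function name, the identity environment matrix, and the simulated markers are assumptions.

```python
import numpy as np

def gxe_kernel(G, E, line_idx, env_idx):
    """Record-level G x E covariance: Hadamard product of expanded G and E.

    G: lines x lines genomic relationship matrix.
    E: environments x environments relationship matrix.
    line_idx, env_idx: for each record, the index of its line and its environment.
    """
    Kg = G[np.ix_(line_idx, line_idx)]  # genomic similarity between records
    Ke = E[np.ix_(env_idx, env_idx)]    # environmental similarity between records
    return Kg * Ke                      # element-wise product

# Hypothetical trial: 3 lines, each observed in 2 environments (6 records).
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(3, 50)).astype(float)
X -= X.mean(axis=0)
G = X @ X.T / X.shape[1]
E = np.eye(2)  # assumption: environments treated as unrelated

line_idx = np.repeat(np.arange(3), 2)  # [0, 0, 1, 1, 2, 2]
env_idx = np.tile(np.arange(2), 3)     # [0, 1, 0, 1, 0, 1]
K_ge = gxe_kernel(G, E, line_idx, env_idx)
print(K_ge.shape)
```

With an identity `E`, records from different environments get zero interaction covariance; replacing `E` with a matrix built from environmental covariates would let information flow between similar environments.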
Affiliation(s)
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
- Colegio de Postgraduados, Montecillos, Mexico
| | | | | | - Germano Costa-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
| | - Roberto Fritsche-Neto
- Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz" (ESALQ/USP), São Paulo, Brazil
| | - Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Alnarp, Sweden
| | - Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, Mexico
| | - Morten Lillemo
- Department of Plant Sciences, Norwegian University of Life Sciences, IHA/CIGENE, Ås, Norway
| | - Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
| | | | | | - Jaime Cuevas
- Universidad de Quintana Roo, Chetumal, Quintana Roo, Mexico.
| | - Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France.
| |
|
18
|
Montesinos-López OA, Montesinos-López A, Mosqueda-González BA, Bentley AR, Lillemo M, Varshney RK, Crossa J. A New Deep Learning Calibration Method Enhances Genome-Based Prediction of Continuous Crop Traits. Front Genet 2021; 12:798840. [PMID: 34976026 PMCID: PMC8718701 DOI: 10.3389/fgene.2021.798840] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/20/2021] [Accepted: 11/18/2021] [Indexed: 11/13/2022]
Abstract
Genomic selection (GS) has the potential to revolutionize predictive plant breeding. A reference population is phenotyped and genotyped to train a statistical model that is used to perform genome-enabled predictions of new individuals that were only genotyped. In this vein, deep neural networks, a type of machine learning model, have been widely adopted for use in GS studies, as they are nonparametric methods, which makes them more adept at capturing nonlinear patterns. However, the training process for deep neural networks is very challenging due to the numerous hyper-parameters that need to be tuned, especially since imperfect tuning can result in biased predictions. In this paper we propose a simple method for calibrating (adjusting) the prediction of continuous response variables resulting from deep learning applications. We evaluated the proposed deep learning calibration method (DL_M2) using four crop breeding data sets, and its performance was compared with the standard deep learning method (DL_M1), as well as the standard genomic Best Linear Unbiased Predictor (GBLUP). While the GBLUP was the most accurate model overall, the proposed deep learning calibration method (DL_M2) helped increase the genome-enabled prediction performance in all data sets when compared with the traditional DL method (DL_M1). Taken together, we provide evidence for extending the use of the proposed calibration method to evaluate its potential and consistency for predicting performance in the context of GS applied to plant breeding.
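The abstract does not spell out how DL_M2 adjusts the predictions, so the following sketch shows only a generic post-hoc linear recalibration, which should not be read as the authors' method. The idea is to regress observed values on model predictions in a validation set and apply the fitted line to new predictions. The function names and the toy numbers are assumptions.

```python
import numpy as np

def fit_linear_calibration(pred_val, obs_val):
    """Fit obs = intercept + slope * pred on a validation set by least squares."""
    slope, intercept = np.polyfit(pred_val, obs_val, deg=1)
    return intercept, slope

def apply_calibration(pred, intercept, slope):
    """Adjust new predictions with the fitted calibration line."""
    return intercept + slope * np.asarray(pred, dtype=float)

# Hypothetical validation data: the model's predictions are systematically shrunk.
pred_val = np.array([1.0, 2.0, 3.0, 4.0])
obs_val = 0.5 + 2.0 * pred_val

intercept, slope = fit_linear_calibration(pred_val, obs_val)
calibrated = apply_calibration([5.0], intercept, slope)
print(intercept, slope, calibrated)
```

A recalibration of this kind changes the scale and offset of the predictions, so it can reduce bias and mean squared error, but it leaves the ranking of candidates (and hence the Pearson correlation, for a positive slope) unchanged.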
Affiliation(s)
| | - Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
- *Correspondence: Abelardo Montesinos-López; Rajeev K. Varshney; José Crossa
| | - Brandon A. Mosqueda-González
- Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Esq. Miguel Othón de Mendizábal, Mexico city, Mexico
| | - Alison R. Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Morten Lillemo
- Department of Plant Sciences, Norwegian University of Life Sciences, IHA/CIGENE, Ås, Norway
| | - Rajeev K. Varshney
- Centre of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Murdoch University, Perth, WA, Australia
| | - José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Colegio de Postgraduados, Montecillo, Mexico
| |
|
19
|
Washburn JD, Cimen E, Ramstein G, Reeves T, O'Briant P, McLean G, Cooper M, Hammer G, Buckler ES. Predicting phenotypes from genetic, environment, management, and historical data using CNNs. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2021; 134:3997-4011. [PMID: 34448888 DOI: 10.1007/s00122-021-03943-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/28/2021] [Accepted: 08/18/2021] [Indexed: 06/13/2023]
Abstract
Convolutional Neural Networks (CNNs) can perform similarly to or better than standard genomic prediction methods when sufficient genetic, environmental, and management data are provided. Predicting phenotypes from genetic (G), environmental (E), and management (M) conditions is a long-standing challenge with implications for agriculture, medicine, and conservation. Most methods reduce the factors in a dataset (feature engineering) in a subjective and potentially oversimplified manner. Deep neural networks such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) can overcome this by allowing the data itself to determine which factors are most important. CNN models were developed for predicting agronomic yield from a combination of replicated trials and historical yield survey data. The results were more accurate than standard methods when tested on held-out G, E, and M data (r = 0.50 vs. r = 0.43), and slightly less accurate than standard methods when only G was held out (r = 0.74 vs. r = 0.80). Pre-training on historical data increased accuracy compared to trial data alone. Saliency map analysis indicated the CNN has "learned" to prioritize many factors of known agricultural importance.
Affiliation(s)
- Jacob D Washburn
- United States Department of Agriculture, Agricultural Research Service, Columbia, MO, 65211, USA.
| | - Emre Cimen
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Computational Intelligence and Optimization Laboratory, Industrial Engineering Department, Eskisehir Technical University, Eskisehir, Turkey
| | - Guillaume Ramstein
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000, Aarhus, Denmark
| | - Timothy Reeves
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
| | - Patrick O'Briant
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
| | - Greg McLean
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
| | - Mark Cooper
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
| | - Graeme Hammer
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
| | - Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Department of Agriculture, Agricultural Research Service, Ithaca, NY, 14850, USA
| |
|
20
|
Xu Y, Liu X, Cao X, Huang C, Liu E, Qian S, Liu X, Wu Y, Dong F, Qiu CW, Qiu J, Hua K, Su W, Wu J, Xu H, Han Y, Fu C, Yin Z, Liu M, Roepman R, Dietmann S, Virta M, Kengara F, Zhang Z, Zhang L, Zhao T, Dai J, Yang J, Lan L, Luo M, Liu Z, An T, Zhang B, He X, Cong S, Liu X, Zhang W, Lewis JP, Tiedje JM, Wang Q, An Z, Wang F, Zhang L, Huang T, Lu C, Cai Z, Wang F, Zhang J. Artificial intelligence: A powerful paradigm for scientific research. Innovation (N Y) 2021; 2:100179. [PMID: 34877560 PMCID: PMC8633405 DOI: 10.1016/j.xinn.2021.100179] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 08/29/2021] [Accepted: 10/26/2021] [Indexed: 12/18/2022]
Abstract
Artificial intelligence (AI) coupled with promising machine learning (ML) techniques well known from computer science is broadly affecting many aspects of various fields including science and technology, industry, and even our day-to-day life. The ML techniques have been developed to analyze high-throughput data with a view to obtaining useful insights, categorizing, predicting, and making evidence-based decisions in novel ways, which will promote the growth of novel applications and fuel the sustainable booming of AI. This paper undertakes a comprehensive survey on the development and application of AI in different aspects of fundamental sciences, including information science, mathematics, medical science, materials science, geoscience, life science, physics, and chemistry. The challenges that each discipline of science meets, and the potentials of AI techniques to handle these challenges, are discussed in detail. Moreover, we shed light on new research trends entailing the integration of AI into each scientific discipline. The aim of this paper is to provide a broad research guideline on fundamental sciences with potential infusion of AI, to help motivate researchers to deeply understand the state-of-the-art applications of AI-based fundamental sciences, and thereby to help promote the continuous development of these fundamental sciences.
Affiliation(s)
- Yongjun Xu
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xin Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xin Cao
- Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai 200032, China
| | - Changping Huang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Enke Liu
- Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
- Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China
| | - Sen Qian
- Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Xingchen Liu
- Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China
| | - Yanjun Wu
- Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Fengliang Dong
- National Center for Nanoscience and Technology, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Cheng-Wei Qiu
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583, Singapore
| | - Junjun Qiu
- Department of Gynaecology, Obstetrics and Gynaecology Hospital, Fudan University, Shanghai 200011, China
- Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Shanghai 200011, China
| | - Keqin Hua
- Department of Gynaecology, Obstetrics and Gynaecology Hospital, Fudan University, Shanghai 200011, China
- Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Shanghai 200011, China
| | - Wentao Su
- School of Food Science and Technology, Dalian Polytechnic University, Dalian 116034, China
| | - Jian Wu
- Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou 310058, China
| | - Huiyu Xu
- Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing 100191, China
| | - Yong Han
- Zhejiang Provincial People’s Hospital, Hangzhou 310014, China
| | - Chenguang Fu
- School of Materials Science and Engineering, Zhejiang University, Hangzhou 310027, China
| | - Zhigang Yin
- Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou 350002, China
| | - Miao Liu
- Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
- Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China
| | - Ronald Roepman
- Medical Center, Radboud University, 6500 Nijmegen, the Netherlands
| | - Sabine Dietmann
- Institute for Informatics, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Marko Virta
- Department of Microbiology, University of Helsinki, 00014 Helsinki, Finland
| | - Fredrick Kengara
- School of Pure and Applied Sciences, Bomet University College, Bomet 20400, Kenya
| | - Ze Zhang
- Agriculture College of Shihezi University, Xinjiang 832000, China
| | - Lifu Zhang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
- Agriculture College of Shihezi University, Xinjiang 832000, China
| | - Taolan Zhao
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
| | - Ji Dai
- The Brain Cognition and Brain Disease Institute, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Shenzhen-Hong Kong Institute of Brain Science-Shenzhen Fundamental Research Institutions, Shenzhen 518055, China
| | | | - Liang Lan
- Department of Communication Studies, Hong Kong Baptist University, Hong Kong, China
| | - Ming Luo
- South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou 510650, China
| | - Zhaofeng Liu
- Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tao An
- Shanghai Astronomical Observatory, Chinese Academy of Sciences, Shanghai 200030, China
| | - Bin Zhang
- Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China
| | - Xiao He
- Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
| | - Shan Cong
- Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences, Suzhou 215123, China
| | - Xiaohong Liu
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
| | - Wei Zhang
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
| | - James P. Lewis
- Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China
| | - James M. Tiedje
- Center for Microbial Ecology, Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
| | - Qi Wang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhejiang Lab, Hangzhou 311121, China
| | - Zhulin An
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Fei Wang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Libo Zhang
- Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China
| | - Chuan Lu
- Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion SY23 3FL, UK
| | - Zhipeng Cai
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
| | - Fang Wang
- Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jiabao Zhang
- Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
21
|
Bayer PE, Petereit J, Danilevicz MF, Anderson R, Batley J, Edwards D. The application of pangenomics and machine learning in genomic selection in plants. The Plant Genome 2021; 14:e20112. [PMID: 34288550 DOI: 10.1002/tpg2.20112] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/14/2021] [Accepted: 05/01/2021] [Indexed: 05/10/2023]
Abstract
Genomic selection approaches have increased the speed of plant breeding, leading to growing crop yields over the last decade. However, climate change is impacting current and future yields, resulting in the need to further accelerate breeding efforts to cope with these changing conditions. Here we present approaches to accelerate plant breeding and incorporate nonadditive effects in genomic selection by applying state-of-the-art machine learning approaches. These approaches are made more powerful by the inclusion of pangenomes, which represent the entire genome content of a species. Understanding the strengths and limitations of machine learning methods, compared with more traditional genomic selection efforts, is paramount to the successful application of these methods in crop breeding. We describe examples of genomic selection and pangenome-based approaches in crop breeding, discuss machine learning-specific challenges, and highlight the potential for the application of machine learning in genomic selection. We believe that careful implementation of machine learning approaches will support crop improvement to help counter the adverse outcomes of climate change on crop production.
Affiliation(s)
- Philipp E Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Jakob Petereit
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Monica Furaste Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Robyn Anderson
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
|
22
|
Sandhu K, Patil SS, Pumphrey M, Carter A. Multitrait machine- and deep-learning models for genomic selection using spectral information in a wheat breeding program. THE PLANT GENOME 2021; 14:e20119. [PMID: 34482627 DOI: 10.1002/tpg2.20119] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 03/05/2021] [Accepted: 05/18/2021] [Indexed: 06/13/2023]
Abstract
Prediction of breeding values is central to plant breeding and has been revolutionized by the adoption of genomic selection (GS). Use of machine- and deep-learning algorithms applied to complex traits in plants can improve prediction accuracies. Because of the tremendous increase in collected data in breeding programs and the slow rate of genetic gain increase, it is required to explore the potential of artificial intelligence in analyzing the data. The main objectives of this study include optimization of multitrait (MT) machine- and deep-learning models for predicting grain yield and grain protein content in wheat (Triticum aestivum L.) using spectral information. This study compares the performance of four machine- and deep-learning-based unitrait (UT) and MT models with traditional genomic best linear unbiased predictor (GBLUP) and Bayesian models. The dataset consisted of 650 recombinant inbred lines (RILs) from a spring wheat breeding program grown for three years (2014-2016), and spectral data were collected at heading and grain filling stages. The MT-GS models performed 0-28.5 and -0.04 to 15% superior to the UT-GS models. Random forest and multilayer perceptron were the best performing machine- and deep-learning models to predict both traits. Four explored Bayesian models gave similar accuracies, which were less than machine- and deep-learning-based models and required increased computational time. Green normalized difference vegetation index (GNDVI) best predicted grain protein content in seven out of the nine MT-GS models. Overall, this study concluded that machine- and deep-learning-based MT-GS models increased prediction accuracy and should be employed in large-scale breeding programs.
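The green normalized difference vegetation index named in this abstract is conventionally computed from near-infrared and green reflectance as (NIR - Green) / (NIR + Green). A minimal sketch follows; the plot names and reflectance values are hypothetical and are not taken from the study.

```python
def gndvi(nir: float, green: float) -> float:
    """Green normalized difference vegetation index from two reflectances."""
    total = nir + green
    if total == 0:
        raise ValueError("NIR + green reflectance must be non-zero")
    return (nir - green) / total


# Hypothetical canopy reflectances (fractions between 0 and 1) for three plots.
plots = {"plot_a": (0.50, 0.10), "plot_b": (0.40, 0.20), "plot_c": (0.30, 0.30)}

# One index value per plot; such values could serve as secondary traits
# alongside grain yield or protein content in a multitrait model.
gndvi_by_plot = {name: gndvi(nir, green) for name, (nir, green) in plots.items()}
```

Values close to 1 indicate much higher near-infrared than green reflectance; equal reflectances give 0.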
Affiliation(s)
- Karansher Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
| | - Shruti Sunil Patil
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA
| | - Michael Pumphrey
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
| | - Arron Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
|
23
|
Montesinos-Lopez OA, Montesinos-Lopez JC, Salazar E, Barron JA, Montesinos-Lopez A, Buenrostro-Mariscal R, Crossa J. Application of a Poisson deep neural network model for the prediction of count data in genome-based prediction. THE PLANT GENOME 2021; 14:e20118. [PMID: 34323393 DOI: 10.1002/tpg2.20118] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/19/2021] [Accepted: 05/15/2021] [Indexed: 06/13/2023]
Abstract
Genomic selection (GS) is revolutionizing conventional ways of developing new plants and animals. However, because it is a predictive methodology, GS strongly depends on statistical and machine learning to perform these predictions. For continuous outcomes, more models are available for GS. Unfortunately, for count data outcomes, there are few efficient statistical machine learning models for large datasets or for datasets with fewer observations than independent variables. For this reason, in this paper, we applied the univariate version of the Poisson deep neural network (PDNN) proposed earlier for genomic predictions of count data. The model was implemented with (a) the negative log-likelihood of Poisson distribution as the loss function, (b) the rectified linear activation unit as the activation function in hidden layers, and (c) the exponential activation function in the output layer. The advantage of the PDNN model is that it captures complex patterns in the data by implementing many nonlinear transformations in the hidden layers. Moreover, since it was implemented in Tensorflow as the back-end, and in Keras as the front-end, the model can be applied to moderate and large datasets, which is a significant advantage over previous GS models for count data. The PDNN model was compared with deep learning models with continuous outcomes, conventional generalized Poisson regression models, and conventional Bayesian regression methods. We found that the PDNN model outperformed the Bayesian regression and generalized Poisson regression methods in terms of prediction accuracy, although it was not better than the conventional deep neural network with continuous outcomes.
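The three components listed in this abstract (ReLU hidden layers, an exponential output activation, and the Poisson negative log-likelihood as loss) can be illustrated with a small forward pass. This is a plain-Python sketch, not the authors' TensorFlow/Keras implementation; the function names, layer sizes, weights, and marker values are all hypothetical.

```python
import math


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: weights[j] holds the input weights of unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def pdnn_forward(markers, hidden_w, hidden_b, out_w, out_b):
    """Return the predicted Poisson mean for one individual."""
    hidden = relu(dense(markers, hidden_w, hidden_b))
    linear = dense(hidden, [out_w], [out_b])[0]
    return math.exp(linear)  # exponential output keeps the mean positive


def poisson_nll(count, mean):
    """Negative log-likelihood of an observed count under Poisson(mean)."""
    return mean - count * math.log(mean) + math.lgamma(count + 1)


# Hypothetical marker codes (0/1/2) and untrained, hand-picked weights.
markers = [0, 1, 2]
hidden_w = [[0.1, 0.2, -0.1], [-0.3, 0.0, 0.2]]
hidden_b = [0.0, 0.1]
out_w, out_b = [0.5, 1.0], 0.0

predicted_mean = pdnn_forward(markers, hidden_w, hidden_b, out_w, out_b)
loss = poisson_nll(2, predicted_mean)
```

Training would adjust the weights to reduce this loss summed over individuals; for a single observed count the loss is smallest when the predicted mean equals that count.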
Affiliation(s)
| | - Jose C Montesinos-Lopez
- Dep. de Estadística, Centro de Investigación en Matemáticas, Guanajuato, Guanajuato, 36023, México
| | - Eduardo Salazar
- Facultad de Telemática, Univ. de Colima, Colima, Colima, 28040, México
| | - Jose Alberto Barron
- Dep. of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina, s/n La Molina 15024, Lima, Perú
| | - Abelardo Montesinos-Lopez
- Dep. de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías, Univ. de Guadalajara, Guadalajara, Jalisco, 44430, México
| | | | - Jose Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera km 45, Mexico-Veracruz, Texcoco, Edo. de México, CP 52640, México
- Colegio de Post-Graduados, CP 56230, Montecillos, Edo. de México, Texcoco, México
|
24
|
Vu NT, Phuc TH, Oanh KTP, Sang NV, Trang TT, Nguyen NH. Accuracies of genomic predictions for disease resistance of striped catfish to Edwardsiella ictaluri using artificial intelligence algorithms. G3-GENES GENOMES GENETICS 2021; 12:6408442. [PMID: 34788431 PMCID: PMC8727988 DOI: 10.1093/g3journal/jkab361] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/05/2021] [Accepted: 10/10/2021] [Indexed: 02/04/2023]
Abstract
Assessments of genomic prediction accuracies using artificial intelligence (AI) algorithms (i.e., machine and deep learning methods) are currently unavailable or very limited in aquaculture species. The principal aim of this study was to examine the predictive performance of these new methods for disease resistance to Edwardsiella ictaluri in a population of striped catfish Pangasianodon hypophthalmus and to make comparisons with four common methods, i.e., pedigree-based best linear unbiased prediction (PBLUP), genomic-based best linear unbiased prediction (GBLUP), single-step GBLUP (ssGBLUP) and a nonlinear Bayesian approach (notably BayesR). Our analyses using machine learning (i.e., ML-KAML) and deep learning (i.e., DL-MLP and DL-CNN) together with the four common methods (PBLUP, GBLUP, ssGBLUP, and BayesR) were conducted for two main disease resistance traits (i.e., survival status coded as 0 and 1 and survival time, i.e., days that the animals were still alive after the challenge test) in a pedigree consisting of 560 individual animals (490 offspring and 70 parents) genotyped for 14,154 single nucleotide polymorphisms (SNPs). The results using 6,470 SNPs after quality control showed that machine learning methods outperformed PBLUP, GBLUP, and ssGBLUP, with increases in prediction accuracy for both traits of 9.1–15.4%. However, the prediction accuracies obtained from machine learning methods were comparable to those estimated using BayesR. Imputation of missing genotypes using AlphaFamImpute increased the prediction accuracies by 5.3–19.2% in all the methods and data used. On the other hand, there were insignificant decreases (0.3–5.6%) in the prediction accuracies for both survival status and survival time when multivariate models were used in comparison to univariate analyses. 
Interestingly, the genomic prediction accuracies based on only highly significant SNPs (P < 0.00001, 318–400 SNPs for survival status and 1,362–1,589 SNPs for survival time) were somewhat lower (0.3–15.6%) than those obtained from the whole set of 6,470 SNPs. In most of our analyses, the accuracies of genomic prediction were somewhat higher for survival time than survival status (0/1 data). It is concluded that although there are prospects for the application of genomic selection to increase disease resistance to E. ictaluri in striped catfish breeding programs, further evaluation of these methods should be made in independent families/populations when more data are accumulated in future generations to avoid possible biases in the genetic parameters estimates and prediction accuracies for the disease-resistant traits studied in this population of striped catfish P. hypophthalmus.
Affiliation(s)
- Nguyen Thanh Vu
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
| | - Tran Huu Phuc
- Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
| | - Kim Thi Phuong Oanh
- Institute of Genome Research, Vietnam Academy of Science and Technology, Hanoi, Vietnam
| | - Nguyen Van Sang
- Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
| | - Trinh Thi Trang
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Vietnam National University of Agriculture, Gia Lam 131000, Vietnam
| | - Nguyen Hong Nguyen
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia
|
25
|
Razzaq A, Kaur P, Akhter N, Wani SH, Saleem F. Next-Generation Breeding Strategies for Climate-Ready Crops. FRONTIERS IN PLANT SCIENCE 2021; 12:620420. [PMID: 34367194 PMCID: PMC8336580 DOI: 10.3389/fpls.2021.620420] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 10/22/2020] [Accepted: 06/14/2021] [Indexed: 05/17/2023]
Abstract
Climate change is a threat to global food security due to the reduction of crop productivity around the globe. Food security is a matter of concern for stakeholders and policymakers as the global population is predicted to surpass 10 billion in the coming years. Crop improvement via modern breeding techniques, along with efficient agronomic practices, innovations in microbiome applications, and exploitation of the natural variation in underutilized crops, is an excellent way forward to fulfill future food requirements. In this review, we describe the next-generation breeding tools that can be used to increase crop production by developing climate-resilient superior genotypes to cope with the future challenges of global food security. Recent innovations in genomic-assisted breeding (GAB) strategies allow the construction of highly annotated crop pan-genomes to give a snapshot of the full landscape of genetic diversity (GD) and recapture the lost gene repertoire of a species. Pan-genomes provide new platforms to exploit these unique genes or genetic variation for optimizing breeding programs. The advent of next-generation clustered regularly interspaced short palindromic repeat/CRISPR-associated (CRISPR/Cas) systems, such as prime editing, base editing, and de novo domestication, has institutionalized the idea that genome editing is revamped for crop improvement. Also, the availability of versatile Cas orthologs, including Cas9, Cas12, Cas13, and Cas14, has improved editing efficiency. CRISPR/Cas systems now have numerous applications in crop research and have been used successfully to edit major crops for resistance against abiotic and biotic stress. By adopting high-throughput phenotyping approaches and big data analytics tools like artificial intelligence (AI) and machine learning (ML), agriculture is heading toward automation or digitalization. 
The integration of speed breeding with genomic and phenomic tools can allow rapid gene identifications and ultimately accelerate crop improvement programs. In addition, the integration of next-generation multidisciplinary breeding platforms can open exciting avenues to develop climate-ready crops toward global food security.
Affiliation(s)
- Ali Razzaq
- Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad, Pakistan
| | - Parwinder Kaur
- UWA School of Agriculture and Environment, The University of Western Australia, Perth, WA, Australia
| | - Naheed Akhter
- College of Allied Health Professional, Faculty of Medical Sciences, Government College University Faisalabad, Faisalabad, Pakistan
| | - Shabir Hussain Wani
- Mountain Research Center for Field Crops, Khudwani, Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Srinagar, India
| | - Fozia Saleem
- Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad, Pakistan
|
26
|
Sandhu KS, Aoun M, Morris CF, Carter AH. Genomic Selection for End-Use Quality and Processing Traits in Soft White Winter Wheat Breeding Program with Machine and Deep Learning Models. BIOLOGY 2021; 10:689. [PMID: 34356544 PMCID: PMC8301459 DOI: 10.3390/biology10070689] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/30/2021] [Revised: 07/13/2021] [Accepted: 07/17/2021] [Indexed: 01/12/2023]
Abstract
Breeding for grain yield, biotic and abiotic stress resistance, and end-use quality are important goals of wheat breeding programs. Screening for end-use quality traits is usually secondary to grain yield due to high labor needs, cost of testing, and large seed requirements for phenotyping. Genomic selection provides an alternative to predict performance using genome-wide markers under forward and across location predictions, where a previous year's dataset can be used to build the models. Due to large datasets in breeding programs, we explored the potential of machine and deep learning models to predict fourteen end-use quality traits in a winter wheat breeding program. The population used consisted of 666 wheat genotypes screened for five years (2015-19) at two locations (Pullman and Lind, WA, USA). Nine different models, including two machine learning (random forest and support vector machine) and two deep learning models (convolutional neural network and multilayer perceptron) were explored for cross-validation, forward, and across locations predictions. The prediction accuracies for different traits varied from 0.45-0.81, 0.29-0.55, and 0.27-0.50 under cross-validation, forward, and across location predictions, respectively. In general, forward prediction accuracies kept increasing over time due to increments in training data size, and this was more evident for machine and deep learning models. Deep learning models were superior to the traditional ridge regression best linear unbiased prediction (RRBLUP) and Bayesian models under all prediction scenarios. The high accuracy observed for end-use quality traits in this study supports predicting them in early generations, leading to the advancement of superior genotypes to more extensive grain yield trials. Furthermore, the superior performance of machine and deep learning models strengthens the idea to include them in large scale breeding programs for predicting complex traits.
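The forward prediction scenario in this abstract, where earlier seasons train the model and a later season is predicted, can be sketched as a growing-window split. The function name and the exact windowing are assumptions for illustration; the study's own partitioning may differ.

```python
def forward_splits(years):
    """Yield (training_years, test_year) pairs for forward prediction.

    Each test year is predicted from all earlier years, so the training
    set grows by one season at every step.
    """
    ordered = sorted(set(years))
    for i in range(1, len(ordered)):
        yield ordered[:i], ordered[i]


# Trial years taken from the abstract (2015-19).
splits = list(forward_splits([2015, 2016, 2017, 2018, 2019]))
for train_years, test_year in splits:
    print(f"train on {train_years} -> predict {test_year}")
```

Because the training window grows with each step, later test years are predicted from more data, which is consistent with the abstract's observation that forward accuracies increased as training size grew.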
Affiliation(s)
- Karansher Singh Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA; (K.S.S.); (M.A.)
| | - Meriem Aoun
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA; (K.S.S.); (M.A.)
| | - Craig F. Morris
- USDA-ARS Western Wheat Quality Laboratory, E-202 Food Quality Building, Washington State University, Pullman, WA 99164, USA;
| | - Arron H. Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA; (K.S.S.); (M.A.)
|
27
|
Reynolds MP, Lewis JM, Ammar K, Basnet BR, Crespo-Herrera L, Crossa J, Dhugga KS, Dreisigacker S, Juliana P, Karwat H, Kishii M, Krause MR, Langridge P, Lashkari A, Mondal S, Payne T, Pequeno D, Pinto F, Sansaloni C, Schulthess U, Singh RP, Sonder K, Sukumaran S, Xiong W, Braun HJ. Harnessing translational research in wheat for climate resilience. JOURNAL OF EXPERIMENTAL BOTANY 2021; 72:5134-5157. [PMID: 34139769 PMCID: PMC8272565 DOI: 10.1093/jxb/erab256] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/27/2021] [Accepted: 06/14/2021] [Indexed: 05/24/2023]
Abstract
Despite being the world's most widely grown crop, research investments in wheat (Triticum aestivum and Triticum durum) fall behind those in other staple crops. Current yield gains will not meet 2050 needs, and climate stresses compound this challenge. However, there is good evidence that heat and drought resilience can be boosted through translating promising ideas into novel breeding technologies using powerful new tools in genetics and remote sensing, for example. Such technologies can also be applied to identify climate resilience traits from among the vast and largely untapped reserve of wheat genetic resources in collections worldwide. This review describes multi-pronged research opportunities at the focus of the Heat and Drought Wheat Improvement Consortium (coordinated by CIMMYT), which together create a pipeline to boost heat and drought resilience, specifically: improving crop design targets using big data approaches; developing phenomic tools for field-based screening and research; applying genomic technologies to elucidate the bases of climate resilience traits; and applying these outputs in developing next-generation breeding methods. The global impact of these outputs will be validated through the International Wheat Improvement Network, a global germplasm development and testing system that contributes key productivity traits to approximately half of the global wheat-growing area.
Affiliation(s)
- Matthew P Reynolds
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Janet M Lewis
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Karim Ammar
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Bhoja R Basnet
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | | | - José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Kanwarpal S Dhugga
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | | | - Philomin Juliana
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Hannes Karwat
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Masahiro Kishii
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Margaret R Krause
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Peter Langridge
- School of Agriculture, Food and Wine, University of Adelaide, Waite Campus, PMB1, Glen Osmond SA 5064, Australia
- Wheat Initiative, Julius Kühn-Institute, Königin-Luise-Str. 19, 14195 Berlin, Germany
| | - Azam Lashkari
- CIMMYT-Henan Collaborative Innovation Center, Henan Agricultural University, Zhengzhou, 450002, PR China
| | - Suchismita Mondal
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Thomas Payne
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Diego Pequeno
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Francisco Pinto
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Carolina Sansaloni
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Urs Schulthess
- CIMMYT-Henan Collaborative Innovation Center, Henan Agricultural University, Zhengzhou, 450002, PR China
| | - Ravi P Singh
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Kai Sonder
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | | | - Wei Xiong
- CIMMYT-Henan Collaborative Innovation Center, Henan Agricultural University, Zhengzhou, 450002, PR China
| | - Hans J Braun
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
|
28
|
Abstract
Technological developments have revolutionized measurements on plant genotypes and phenotypes, leading to routine production of large, complex data sets. This has led to increased efforts to extract meaning from these measurements and to integrate various data sets. Concurrently, machine learning has rapidly evolved and is now widely applied in science in general and in plant genotyping and phenotyping in particular. Here, we review the application of machine learning in the context of plant science and plant breeding. We focus on analyses at different phenotype levels, from biochemical to yield, and in connecting genotypes to these. In this way, we illustrate how machine learning offers a suite of methods that enable researchers to find meaningful patterns in relevant plant data.
Affiliation(s)
- Aalt Dirk Jan van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
- Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
| | - Gert Kootstra
- Farm Technology, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
| | - Willem Kruijer
- Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
|
29
|
Sandhu KS, Lozada DN, Zhang Z, Pumphrey MO, Carter AH. Deep Learning for Predicting Complex Traits in Spring Wheat Breeding Program. FRONTIERS IN PLANT SCIENCE 2021; 11:613325. [PMID: 33469463 PMCID: PMC7813801 DOI: 10.3389/fpls.2020.613325] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/02/2020] [Accepted: 11/30/2020] [Indexed: 05/12/2023]
Abstract
Genomic selection (GS) is transforming the field of plant breeding and implementing models that improve prediction accuracy for complex traits is needed. Analytical methods for complex datasets traditionally used in other disciplines represent an opportunity for improving prediction accuracy in GS. Deep learning (DL) is a branch of machine learning (ML) which focuses on densely connected networks using artificial neural networks for training the models. The objective of this research was to evaluate the potential of DL models in the Washington State University spring wheat breeding program. We compared the performance of two DL algorithms, namely multilayer perceptron (MLP) and convolutional neural network (CNN), with ridge regression best linear unbiased predictor (rrBLUP), a commonly used GS model. The dataset consisted of 650 recombinant inbred lines (RILs) from a spring wheat nested association mapping (NAM) population planted from 2014-2016 growing seasons. We predicted five different quantitative traits with varying genetic architecture using cross-validations (CVs), independent validations, and different sets of SNP markers. Hyperparameters were optimized for DL models by lowering the root mean square in the training set, avoiding model overfitting using dropout and regularization. DL models gave 0 to 5% higher prediction accuracy than rrBLUP model under both cross and independent validations for all five traits used in this study. Furthermore, MLP produces 5% higher prediction accuracy than CNN for grain yield and grain protein content. Altogether, DL approaches obtained better prediction accuracy for each trait, and should be incorporated into a plant breeder's toolkit for use in large scale breeding programs.
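The rrBLUP baseline in this abstract is, at its core, ridge regression on marker codes: marker effects solve (X'X + lambda*I) b = X'y, with all effects shrunk by a single penalty. The sketch below solves that system directly on a toy example. The function name, the genotype matrix, and the fixed penalty are hypothetical; in practice the penalty is derived from estimated variance components rather than set by hand.

```python
def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Build the augmented system [X'X + lam*I | X'y].
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical toy data: three lines, two markers, one phenotype each.
X = [[1, 0], [0, 1], [1, 1]]
y = [1.0, 2.0, 3.0]

unpenalized = ridge_marker_effects(X, y, lam=0.0)  # ordinary least squares
shrunken = ridge_marker_effects(X, y, lam=1.0)     # effects pulled toward zero

# Predicted genetic value of a new line carrying both markers.
prediction = sum(b * x for b, x in zip(shrunken, [1, 1]))
```

With no penalty the toy system recovers effects of 1 and 2; with a penalty of 1 they shrink to 0.875 and 1.375, which is the behaviour that lets ridge regression cope with more markers than phenotyped lines.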
Affiliation(s)
- Karansher S. Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
| | - Dennis N. Lozada
- Department of Plant and Environmental Sciences, New Mexico State University, Las Cruces, NM, United States
| | - Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
| | - Michael O. Pumphrey
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
| | - Arron H. Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
|
30
|
Crossa J, Fritsche-Neto R, Montesinos-Lopez OA, Costa-Neto G, Dreisigacker S, Montesinos-Lopez A, Bentley AR. The Modern Plant Breeding Triangle: Optimizing the Use of Genomics, Phenomics, and Enviromics Data. FRONTIERS IN PLANT SCIENCE 2021; 12:651480. [PMID: 33936136 PMCID: PMC8085545 DOI: 10.3389/fpls.2021.651480] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 01/09/2021] [Accepted: 02/11/2021] [Indexed: 05/04/2023]
Affiliation(s)
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, de Mexico, Mexico
- Colegio de Postgraduados, Montecillo, Edo. de Mexico, Mexico
| | - Roberto Fritsche-Neto
- Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo, São Paulo, Brazil
| | | | - Germano Costa-Neto
- Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo, São Paulo, Brazil
| | - Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, de Mexico, Mexico
| | - Abelardo Montesinos-Lopez
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
| | - Alison R. Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, de Mexico, Mexico
- *Correspondence: Alison R. Bentley
|
31
|
Maldonado C, Mora-Poblete F, Contreras-Soto RI, Ahmar S, Chen JT, do Amaral Júnior AT, Scapim CA. Genome-Wide Prediction of Complex Traits in Two Outcrossing Plant Species Through Deep Learning and Bayesian Regularized Neural Network. FRONTIERS IN PLANT SCIENCE 2020; 11:593897. [PMID: 33329658 PMCID: PMC7728740 DOI: 10.3389/fpls.2020.593897] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/11/2020] [Accepted: 10/27/2020] [Indexed: 05/25/2023]
Abstract
Genomic selection models were investigated to predict several complex traits in breeding populations of Zea mays L. and Eucalyptus globulus Labill. For this, the following methods of Machine Learning (ML) were implemented: (i) Deep Learning (DL) and (ii) Bayesian Regularized Neural Network (BRNN) both in combination with different hyperparameters. These ML methods were also compared with Genomic Best Linear Unbiased Prediction (GBLUP) and different Bayesian regression models [Bayes A, Bayes B, Bayes Cπ, Bayesian Ridge Regression, Bayesian LASSO, and Reproducing Kernel Hilbert Space (RKHS)]. DL models, using Rectified Linear Units (as the activation function), had higher predictive ability values, which varied from 0.27 (pilodyn penetration of 6 years old eucalypt trees) to 0.78 (flowering-related traits of maize). Moreover, the larger mini-batch size (100%) had a significantly higher predictive ability for wood-related traits than the smaller mini-batch size (10%). On the other hand, in the BRNN method, the architectures of one and two layers that used only the pureline function showed better results of prediction, with values ranging from 0.21 (pilodyn penetration) to 0.71 (flowering traits). A significant increase in the prediction ability was observed for DL in comparison with other methods of genomic prediction (Bayesian alphabet models, GBLUP, RKHS, and BRNN). Another important finding was the usefulness of DL models (through an iterative algorithm) as an SNP detection strategy for genome-wide association studies. The results of this study confirm the importance of DL for genome-wide analyses and crop/tree improvement strategies, which holds promise for accelerating breeding progress.
Affiliation(s)
- Carlos Maldonado
- Instituto de Ciencias Agroalimentarias, Animales y Ambientales, Universidad de O’ Higgins, San Fernando, Chile
| | | | - Rodrigo Iván Contreras-Soto
- Instituto de Ciencias Agroalimentarias, Animales y Ambientales, Universidad de O’ Higgins, San Fernando, Chile
| | - Sunny Ahmar
- Institute of Biological Sciences, University of Talca, Talca, Chile
- College of Plant Sciences and Technology, Huazhong Agricultural University, Wuhan, China
| | - Jen-Tsung Chen
- Department of Life Sciences, National University of Kaohsiung, Kaohsiung, Taiwan
| | - Antônio Teixeira do Amaral Júnior
- Laboratory de Melhoramento Genético Veget al., Universidade Estadual do Norte Fluminense Darcy Ribeiro/CCTA, Campos dos Goytacazes, Brazil
| | | |
Collapse
|
32
|
Pook T, Freudenthal J, Korte A, Simianer H. Using Local Convolutional Neural Networks for Genomic Prediction. Front Genet 2020; 11:561497. [PMID: 33281867 PMCID: PMC7689358 DOI: 10.3389/fgene.2020.561497] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/12/2020] [Accepted: 10/12/2020] [Indexed: 11/18/2022]
Abstract
The prediction of breeding values and phenotypes is of central importance for both livestock and crop breeding. In this study, we analyze the use of artificial neural networks (ANN) and, in particular, local convolutional neural networks (LCNN) for genomic prediction, as a region-specific filter corresponds much better with our prior knowledge of the genetic architecture of traits than traditional convolutional neural networks. Model performances are evaluated on a simulated maize data panel (n = 10,000; p = 34,595) and real Arabidopsis data (n = 2,039; p = 180,000) for a variety of traits based on their predictive ability. The baseline LCNN, containing one local convolutional layer (kernel size: 10) and two fully connected layers with 64 nodes each, outperforms commonly proposed ANNs (multilayer perceptrons and convolutional neural networks) for almost all considered traits. For traits with high heritability and a large training population, as present in the simulated data, LCNNs even outperform state-of-the-art methods like genomic best linear unbiased prediction (GBLUP), Bayesian models and extended GBLUP, indicated by an increase in predictive ability of up to 24%. However, for small training populations, these state-of-the-art methods outperform all considered ANNs. Nevertheless, the LCNN still outperforms all other considered ANNs by around 10%. Minor improvements to the tested baseline network architecture of the LCNN were obtained by increasing the kernel size and reducing the stride, whereas the number of subsequent fully connected layers and their node sizes had negligible impact. Although gains in predictive ability were obtained for large-scale data sets by using LCNNs, the practical use of ANNs comes with additional problems, such as the need to genotype all considered individuals and the lack of estimates of heritability and reliability. Furthermore, breeding values are additive by design, whereas ANN-based estimates are not.
However, ANNs also come with new opportunities, as networks can easily be extended to account for additional inputs (omics, weather etc.) and outputs (multi-trait models), and computing time increases linearly with the number of individuals. With advances in high-throughput phenotyping and cheaper genotyping, ANNs can become a valid alternative for genomic prediction.
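As an illustration of the region-specific filter described in this abstract, the following is a minimal pure-Python sketch of a one-dimensional locally connected layer, in which every window of markers has its own weight vector instead of sharing one filter along the genome. The function name `local_conv1d`, the stride of 10, the ReLU activation and the random toy genotype are assumptions made for this example; only the kernel size of 10 is taken from the abstract, and this is not the authors' implementation.

```python
import random


def local_conv1d(x, weights, biases, kernel_size, stride):
    """Locally connected 1D layer: one weight vector per window (no weight sharing)."""
    n_windows = (len(x) - kernel_size) // stride + 1
    assert len(weights) == n_windows and len(biases) == n_windows
    out = []
    for w in range(n_windows):
        start = w * stride
        window = x[start:start + kernel_size]
        # Each genomic region is filtered with its own weights[w].
        z = sum(wi * xi for wi, xi in zip(weights[w], window)) + biases[w]
        out.append(max(0.0, z))  # ReLU activation (assumed for this sketch)
    return out


random.seed(0)
n_markers, kernel_size, stride = 50, 10, 10  # stride is a hypothetical choice
genotype = [random.choice([0, 1, 2]) for _ in range(n_markers)]  # toy allele counts
n_windows = (n_markers - kernel_size) // stride + 1
weights = [[random.gauss(0.0, 0.1) for _ in range(kernel_size)]
           for _ in range(n_windows)]
biases = [0.0] * n_windows

features = local_conv1d(genotype, weights, biases, kernel_size, stride)
print(len(features))  # one feature per genomic window
```

In a full network these window features would feed the fully connected layers; an ordinary convolutional layer would instead reuse a single weight vector for all windows.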
Affiliation(s)
- Torsten Pook
- Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Goettingen, Göttingen, Germany
- Jan Freudenthal
- Center for Computational and Theoretical Biology, University of Wuerzburg, Wuerzburg, Germany
- Arthur Korte
- Center for Computational and Theoretical Biology, University of Wuerzburg, Wuerzburg, Germany
- Henner Simianer
- Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Goettingen, Göttingen, Germany
|
33
|
Montesinos-López OA, Montesinos-López JC, Singh P, Lozano-Ramirez N, Barrón-López A, Montesinos-López A, Crossa J. A Multivariate Poisson Deep Learning Model for Genomic Prediction of Count Data. G3 (Bethesda) 2020; 10:4177-4190. [PMID: 32934019 PMCID: PMC7642922 DOI: 10.1534/g3.120.401631] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/30/2020] [Accepted: 09/13/2020] [Indexed: 01/24/2023]
Abstract
The paradigm called genomic selection (GS) is a revolutionary way of developing new plants and animals. This is a predictive methodology, since it uses learning methods to perform its task. Unfortunately, there is no universal model that can be used for all types of predictions; for this reason, specific methodologies are required for each type of output (response variables). Since there is a lack of efficient methodologies for multivariate count data outcomes, in this paper, a multivariate Poisson deep neural network (MPDN) model is proposed for the genomic prediction of various count outcomes simultaneously. The MPDN model uses the negative log-likelihood of a Poisson distribution as its loss function, the rectified linear unit (ReLU) activation function in the hidden layers to capture nonlinear patterns, and the exponential activation function in the output layer to produce outputs on the same scale as the counts. The proposed MPDN model was compared to conventional generalized Poisson regression models and univariate Poisson deep learning models in two experimental data sets of count data. We found that the proposed MPDN outperformed univariate Poisson deep neural network models, but did not outperform, in terms of prediction, the univariate generalized Poisson regression models. All deep learning models were implemented with TensorFlow as the back-end and Keras as the front-end, which allows implementing these models on moderate and large data sets, a significant advantage over previous GS models for multivariate count data.
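The three ingredients named in this abstract (ReLU hidden layers, an exponential output activation and a Poisson negative log-likelihood loss) can be sketched in plain Python as below. The helper names (`dense`, `mpdn_forward`, `poisson_nll`), the single hidden layer and all weight values are hypothetical and chosen only for illustration; the authors' Keras/TensorFlow implementation is not reproduced here.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def dense(x, weights, biases):
    # weights holds one row of coefficients per output unit
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def mpdn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One ReLU hidden layer, then an exponential output so every rate is positive."""
    hidden = relu(dense(x, w_hidden, b_hidden))
    log_rates = dense(hidden, w_out, b_out)
    return [math.exp(eta) for eta in log_rates]  # one Poisson rate per count trait


def poisson_nll(counts, rates):
    """Mean Poisson negative log-likelihood over the count traits."""
    return sum(r - y * math.log(r) + math.lgamma(y + 1)
               for y, r in zip(counts, rates)) / len(counts)


# Toy example: three markers, two hidden units, two count traits (made-up numbers).
markers = [0, 1, 2]
w_hidden = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
b_hidden = [0.0, 0.0]
w_out = [[1.0, 0.5], [-0.5, 1.0]]
b_out = [0.1, 0.2]

rates = mpdn_forward(markers, w_hidden, b_hidden, w_out, b_out)
loss = poisson_nll([2, 1], rates)
print(rates, loss)
```

Because the output layer exponentiates a linear predictor, the predicted rates stay positive whatever the weights are, which is what allows the Poisson likelihood to be evaluated for any input.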
Affiliation(s)
- Pawan Singh
- Biometrics and Statistics Unit, Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico-Veracruz, CP 52640, Mexico
- Nerida Lozano-Ramirez
- Biometrics and Statistics Unit, Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico-Veracruz, CP 52640, Mexico
- Alberto Barrón-López
- Department of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina s/n La Molina, 15024, Lima, Perú
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Jalisco, México
- José Crossa
- Biometrics and Statistics Unit, Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico-Veracruz, CP 52640, Mexico
- Colegio de Post-Graduados, Montecillos Texcoco. Edo. de Mexico
|
34
|
Ibba MI, Crossa J, Montesinos-López OA, Montesinos-López A, Juliana P, Guzman C, Delorean E, Dreisigacker S, Poland J. Genome-based prediction of multiple wheat quality traits in multiple years. Plant Genome 2020; 13:e20034. [PMID: 33217204 DOI: 10.1002/tpg2.20034] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/19/2020] [Accepted: 05/26/2020] [Indexed: 05/20/2023]
Abstract
Wheat quality improvement is an important objective in all wheat breeding programs. However, due to the cost, time and quantity of seed required, wheat quality is typically analyzed only in the last stages of the breeding cycle on a limited number of samples. The use of genomic prediction could greatly help to select for wheat quality more efficiently by reducing the cost and time required for this analysis. Here we evaluated the prediction performance for 13 wheat quality traits under two multi-trait models (Bayesian multi-trait multi-environment [BMTME] and multi-trait ridge regression [MTR]) using five data sets of wheat lines evaluated in the field during two consecutive years. Lines in the second year (testing) were predicted using the quality information obtained in the first year (training). Moderate to high prediction accuracies were found for most quality traits, suggesting that the use of genomic selection could be feasible. The best predictions were obtained with the BMTME model in all traits and the worst with the MTR model. The best predictions with the BMTME model under the mean arctangent absolute percentage error (MAAPE) were for test weight across the five data sets, whereas the worst predictions were for the alveograph trait ALVPL. In contrast, under Pearson's correlation, the best predictions depended on the data set. The results obtained suggest that the BMTME model should be preferred for multi-trait prediction analyses. This model captures not only the correlation among traits but also the correlation among environments, which helps to increase the prediction accuracy.
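The MAAPE criterion mentioned in this abstract averages the arctangent of the absolute relative error, so each term is bounded by π/2 even when an observed value is close to zero. A minimal sketch follows, assuming the usual definition mean(arctan(|(y − ŷ)/y|)); the function name `maape` and the handling of an observed value of exactly zero are choices made for this example, not details taken from the paper.

```python
import math


def maape(observed, predicted):
    """Mean arctangent absolute percentage error, in radians (0 = perfect, < pi/2)."""
    total = 0.0
    for y, y_hat in zip(observed, predicted):
        if y == 0:
            # Assumed convention: the limit arctan(inf) = pi/2 for a non-zero error,
            # and 0 when the prediction is also exactly zero.
            total += 0.0 if y_hat == 0 else math.pi / 2
        else:
            total += math.atan(abs((y - y_hat) / y))
    return total / len(observed)


# Hypothetical test-weight values, for illustration only.
observed = [78.0, 80.5, 76.2]
predicted = [77.1, 81.0, 75.0]
print(maape(observed, predicted))
```

Lower values indicate better predictions, which is the opposite direction to Pearson's correlation, the other criterion used in the abstract.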
Affiliation(s)
- Maria Itria Ibba
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera, Mexico-Veracruz, CP, 52640, Mexico
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera, Mexico-Veracruz, CP, 52640, Mexico
- Colegio de Postgraduados (COLPOS), Montecillos, Edo. de México, CP, 56230, México
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, 44430, México
- Philomin Juliana
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera, Mexico-Veracruz, CP, 52640, Mexico
- Carlos Guzman
- Departamento de Genética, Escuela Técnica Superior de Ingeniería Agronómica y de Montes, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
- Emily Delorean
- Department of Agronomy, Kansas State University, 2004 Throckmorton Plant Science Center, Manhattan, KS, 66506, USA
- Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera, Mexico-Veracruz, CP, 52640, Mexico
- Jesse Poland
- Department of Agronomy, Kansas State University, 2004 Throckmorton Plant Science Center, Manhattan, KS, 66506, USA
|
35
|
Kim KD, Kang Y, Kim C. Application of Genomic Big Data in Plant Breeding: Past, Present, and Future. Plants (Basel) 2020; 9:E1454. [PMID: 33126607 PMCID: PMC7694055 DOI: 10.3390/plants9111454] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/28/2020] [Revised: 10/26/2020] [Accepted: 10/26/2020] [Indexed: 01/11/2023]
Abstract
Plant breeding has a long history of developing new varieties that have ensured the food security of the human population. During this long journey together with humanity, plant breeders have successfully integrated the latest innovations in science and technology to accelerate the increase in crop production and quality. For the past two decades, since the completion of human genome sequencing, genomic tools and sequencing technologies have advanced remarkably, and adopting these innovations has reduced the cost and/or increased the speed of the plant breeding process. Currently, with the growing mass of genomic data and digitalized biological data, interdisciplinary approaches using new technologies could lead to a new paradigm of plant breeding. In this review, we summarize the overall history and advances of plant breeding, which have been aided by plant genomic research. We highlight the key advances in the field of plant genomics that have impacted plant breeding over the past decades and introduce the current status of innovative approaches such as genomic selection, which could overcome limitations of conventional breeding and enhance the rate of genetic gain.
Affiliation(s)
- Kyung Do Kim
- Department of Bioscience and Bioinformatics, Myongji University, Yongin 17058, Korea
- Yuna Kang
- Department of Crop Science, Chungnam National University, Daejeon 34134, Korea
- Changsoo Kim
- Department of Crop Science, Chungnam National University, Daejeon 34134, Korea
- Department of Smart Agriculture Systems, Chungnam National University, Daejeon 34134, Korea
|
36
|
Pérez-Rodríguez P, Flores-Galarza S, Vaquera-Huerta H, Del Valle-Paniagua DH, Montesinos-López OA, Crossa J. Genome-based prediction of Bayesian linear and non-linear regression models for ordinal data. Plant Genome 2020; 13:e20021. [PMID: 33016610 DOI: 10.1002/tpg2.20021] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/10/2020] [Revised: 03/21/2020] [Accepted: 03/28/2020] [Indexed: 06/11/2023]
Abstract
Linear and non-linear models used in applications of genomic selection (GS) can fit different types of responses (e.g., continuous, ordinal, binary). In recent years, several genomic-enabled prediction models have been developed for predicting complex traits in genomic-assisted animal and plant breeding. These models include linear, non-linear and non-parametric models, mostly for continuous responses and less frequently for categorical responses. Several linear and non-linear models are special cases of a more general family of statistical models known as artificial neural networks, which provide better prediction ability than other models. In this paper, we propose a Bayesian Regularized Neural Network (BRNNO) for modelling ordinal data. The proposed model was fitted using a Bayesian framework; we used the data augmentation algorithm to facilitate computations. The proposed model was fitted using the Gibbs Maximum a Posteriori and Generalized EM algorithm implemented by combining code written in C and R programming languages. The new model was tested with two real maize datasets evaluated for Septoria and GLS diseases and was compared with the Bayesian Ordered Probit Model (BOPM). Results indicated that the BRNNO model performed better in terms of genomic-based prediction than the BOPM model.
Affiliation(s)
- José Crossa
- Colegio de Postgraduados, CP 56230, Montecillos, Edo. de México
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600, Cd. de México
|
37
|
Koumakis L. Deep learning models in genomics; are we there yet? Comput Struct Biotechnol J 2020; 18:1466-1473. [PMID: 32637044 PMCID: PMC7327302 DOI: 10.1016/j.csbj.2020.06.017] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 03/10/2020] [Revised: 06/07/2020] [Accepted: 06/08/2020] [Indexed: 12/23/2022]
Abstract
With the evolution of biotechnology and the introduction of high-throughput sequencing, researchers have the ability to produce and analyze vast amounts of genomics data. Since genomics produces big data, most bioinformatics algorithms are based on machine learning methodologies, and lately deep learning, to identify patterns, make predictions and model the progression or treatment of a disease. Advances in deep learning have created an unprecedented momentum in biomedical informatics and have given rise to new bioinformatics and computational biology research areas. It is evident that deep learning models can provide higher accuracies in specific genomics tasks than the state-of-the-art methodologies. Given the growing trend in the application of deep learning architectures in genomics research, in this mini review we outline the most prominent models, highlight possible pitfalls and discuss future directions. We foresee deep learning accelerating changes in the area of genomics, especially for multi-scale and multimodal data analysis for precision medicine.
Affiliation(s)
- Lefteris Koumakis
- Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science, Heraklion, Crete, Greece
|
38
|
Moreira FF, Oliveira HR, Volenec JJ, Rainey KM, Brito LF. Integrating High-Throughput Phenotyping and Statistical Genomic Methods to Genetically Improve Longitudinal Traits in Crops. Front Plant Sci 2020; 11:681. [PMID: 32528513 PMCID: PMC7264266 DOI: 10.3389/fpls.2020.00681] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 02/26/2020] [Accepted: 04/30/2020] [Indexed: 05/28/2023]
Abstract
The rapid development of remote sensing in agronomic research allows the dynamic nature of longitudinal traits to be adequately described, which may enhance the genetic improvement of crop efficiency. For traits such as light interception, biomass accumulation, and responses to stressors, the data generated by the various high-throughput phenotyping (HTP) methods require adequate statistical techniques to evaluate phenotypic records over time. As a consequence, information about plant functioning and activation of genes, as well as the interaction of gene networks at different stages of plant development and in response to environmental stimuli, can be exploited. In this review, we outline the current analytical approaches in quantitative genetics that are applied to longitudinal traits in crops throughout development, describe the advantages and pitfalls of each approach, and indicate future research directions and opportunities.
Affiliation(s)
- Fabiana F. Moreira
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Hinayah R. Oliveira
- Department of Animal Sciences, Purdue University, West Lafayette, IN, United States
- Jeffrey J. Volenec
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Katy M. Rainey
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Luiz F. Brito
- Department of Animal Sciences, Purdue University, West Lafayette, IN, United States
|
39
|
Francisco Ribeiro P, Camargo Rodriguez AV. Emerging Advanced Technologies to Mitigate the Impact of Climate Change in Africa. Plants (Basel) 2020; 9:E381. [PMID: 32204576 PMCID: PMC7154875 DOI: 10.3390/plants9030381] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/18/2020] [Revised: 03/08/2020] [Accepted: 03/17/2020] [Indexed: 12/17/2022]
Abstract
Agriculture remains critical to Africa's socioeconomic development, employing 65% of the work force and contributing 32% of GDP (Gross Domestic Product). Low productivity, which characterises food production in many African countries, remains a major concern. Compounded by the effects of climate change and lack of technical expertise, recent reports suggest that the impacts of climate change on agriculture and food systems in African countries may have further-reaching consequences than previously anticipated. Thus, it has become imperative that African scientists and farmers adopt new technologies which facilitate their research and provide smart agricultural solutions for mitigating current and future climate change-related challenges. Advanced technologies have been developed across the globe to facilitate adaptation to climate change in the agriculture sector. Clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein 9 (Cas9), synthetic biology, and genomic selection, among others, constitute examples of some of these technologies. In this work, emerging advanced technologies with the potential to effectively mitigate climate change in Africa are reviewed. The authors show how these technologies can be utilised to enhance knowledge discovery for increased production in a climate change-impacted environment. We conclude that the application of these technologies could empower African scientists to explore agricultural strategies more resilient to the effects of climate change. Additionally, we conclude that support for African scientists from the international community in various forms is necessary to help Africans avoid the full undesirable effects of climate change.
|
40
|
Zingaretti LM, Gezan SA, Ferrão LFV, Osorio LF, Monfort A, Muñoz PR, Whitaker VM, Pérez-Enciso M. Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species. Front Plant Sci 2020; 11:25. [PMID: 32117371 PMCID: PMC7015897 DOI: 10.3389/fpls.2020.00025] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 10/22/2019] [Accepted: 01/10/2020] [Indexed: 05/21/2023]
Abstract
Genomic prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly due to limited genome resources and the complexity of this system. Deep learning (DL) techniques comprise a heterogeneous collection of machine learning algorithms that have excelled at many prediction tasks. A potential advantage of DL for GP over standard linear model methods is that DL can potentially take into account all genetic interactions, including dominance and epistasis, which are expected to be of special relevance in most polyploids. In this study, we evaluated the predictive accuracy of linear and DL techniques in two important small fruits or berries: strawberry and blueberry. The two datasets contained a total of 1,358 allopolyploid strawberry (2n=8x=112) and 1,802 autopolyploid blueberry (2n=4x=48) individuals, genotyped for 9,908 and 73,045 single nucleotide polymorphism (SNP) markers, respectively, and phenotyped for five agronomic traits each. DL depends on numerous parameters that influence performance and optimizing hyperparameter values can be a critical step. Here we show that interactions between hyperparameter combinations should be expected and that the number of convolutional filters and regularization in the first layers can have an important effect on model performance. In terms of genomic prediction, we did not find an advantage of DL over linear model methods, except when the epistasis component was important. Linear Bayesian models were better than convolutional neural networks for the full additive architecture, whereas the opposite was observed under strong epistasis. However, by using a parameterization capable of taking into account these non-linear effects, Bayesian linear models can match or exceed the predictive accuracy of DL. 
A semiautomatic implementation of the DL pipeline is available at https://github.com/lauzingaretti/deepGP/.
Affiliation(s)
- Laura M. Zingaretti
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Barcelona, Spain
- Salvador Alejandro Gezan
- School of Forest Resources and Conservation, University of Florida, Gainesville, FL, United States
- Luis Felipe V. Ferrão
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Luis F. Osorio
- IFAS Gulf Coast Research and Education Center, University of Florida, Wimauma, FL, United States
- Amparo Monfort
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Barcelona, Spain
- Institut de Recerca i Tecnologia Agroalimentàries (IRTA), Barcelona, Spain
- Patricio R. Muñoz
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Vance M. Whitaker
- IFAS Gulf Coast Research and Education Center, University of Florida, Wimauma, FL, United States
- Miguel Pérez-Enciso
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Barcelona, Spain
- ICREA, Passeig de Lluís Companys 23, Barcelona, Spain
|
41
|
Gianola D, Fernando RL. A Multiple-Trait Bayesian Lasso for Genome-Enabled Analysis and Prediction of Complex Traits. Genetics 2020; 214:305-331. [PMID: 31879318 PMCID: PMC7017027 DOI: 10.1534/genetics.119.302934] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/05/2019] [Accepted: 12/20/2019] [Indexed: 12/21/2022]
Abstract
A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on possibly a huge number of molecular markers, and with a Gaussian residual distribution posed. Each (one per marker) of the [Formula: see text] vectors of regression coefficients (T: number of traits) is assigned the same T-variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when [Formula: see text] The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior, and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed using a scale-mixture of normal distributions representation. MBL is demonstrated in a bivariate context employing two publicly available data sets using a bivariate genomic best linear unbiased prediction model (GBLUP) for benchmarking results. The first data set is one where wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits: rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially, i.e., "short" vectors are more strongly shrunk toward the origin than in GBLUP; conversely, "long" vectors are shrunk less. A predictive comparison was carried out as well in wheat, where the comparators of MBL were bivariate GBLUP and bivariate Bayes Cπ, a variable selection procedure. A training-testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions. In Pinus, MBL gave better predictions than either a Bayesian bivariate GBLUP or the single-trait Bayesian LASSO.
MBL has been implemented in the Julia language package JWAS, and is now available for the scientific community to explore with different traits, species, and environments. It is well known that there is no universally best prediction machine, and MBL represents a new resource in the armamentarium for genome-enabled analysis and prediction of complex traits.
Affiliation(s)
- Daniel Gianola
- Department of Animal Sciences, University of Wisconsin-Madison, Wisconsin 53706
- Department of Dairy Science, University of Wisconsin-Madison, Wisconsin 53706
- Department of Animal Science, Iowa State University, Ames, Iowa 50011
- Department of Plant Sciences, Technical University of Munich (TUM), TUM School of Life Sciences, Freising, 85354 Germany
- Rohan L Fernando
- Department of Animal Science, Iowa State University, Ames, Iowa 50011
|
42
|
Crossa J, Martini JWR, Gianola D, Pérez-Rodríguez P, Jarquin D, Juliana P, Montesinos-López O, Cuevas J. Deep Kernel and Deep Learning for Genome-Based Prediction of Single Traits in Multienvironment Breeding Trials. Front Genet 2019; 10:1168. [PMID: 31921277 PMCID: PMC6913188 DOI: 10.3389/fgene.2019.01168] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 09/16/2019] [Accepted: 10/23/2019] [Indexed: 11/13/2022]
Abstract
Deep learning (DL) is a promising method for genomic-enabled prediction. However, the implementation of DL is difficult because many hyperparameters (number of hidden layers, number of neurons, learning rate, number of epochs, batch size, etc.) need to be tuned. For this reason, deep kernel methods, which only require defining the number of layers, may be an attractive alternative. Deep kernel methods emulate DL models with a large number of neurons, but are defined by relatively easily computed covariance matrices. In this research, we compared the genome-based prediction of DL to a deep kernel (arc-cosine kernel, AK), to the commonly used non-additive Gaussian kernel (GK), as well as to the conventional additive genomic best linear unbiased predictor (GBLUP/GB). We used two real wheat data sets for benchmarking these methods. On average, AK and GK outperformed DL and GB. The gain in terms of prediction performance of AK and GK over DL and GB was not large, but AK and GK have the advantage that only one parameter, the number of layers (AK) or the bandwidth parameter (GK), has to be tuned in each method. Furthermore, although AK and GK had similar performance, deep kernel AK is easier to implement than GK, since the parameter "number of layers" is more easily determined than the bandwidth parameter of GK. Comparing AK and DL for the data set of year 2015-2016, the difference in performance of the two methods was bigger, with AK predicting much better than DL. On this data, the optimization of the hyperparameters for DL was difficult and the finally used parameters may have been suboptimal. Our results suggest that AK is a good alternative to DL with the advantage that practically no tuning process is required.
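To illustrate why the arc-cosine kernel (AK) has only the number of layers to tune, the sketch below applies the standard order-1 arc-cosine recursion to a pair of marker vectors. The function name `arc_cosine_kernel`, the toy vectors and the use of raw (unscaled) marker codes are assumptions for this example; the marker scaling and the model into which the authors plug the kernel are not reproduced.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def arc_cosine_kernel(x, y, n_layers=1):
    """Order-1 arc-cosine kernel applied recursively for n_layers layers.

    Assumes x and y are non-zero vectors. The diagonal terms k(x, x) and
    k(y, y) are unchanged by the recursion, so only k(x, y) is updated.
    """
    kxx, kyy, kxy = dot(x, x), dot(y, y), dot(x, y)
    for _ in range(n_layers):
        norm = math.sqrt(kxx * kyy)
        cos_theta = max(-1.0, min(1.0, kxy / norm))  # clamp against rounding error
        theta = math.acos(cos_theta)
        kxy = (norm / math.pi) * (math.sin(theta) + (math.pi - theta) * cos_theta)
    return kxy


# Hypothetical marker codes for two lines, for illustration only.
line_a = [1.0, 0.0, -1.0, 1.0]
line_b = [1.0, 1.0, 0.0, -1.0]
for layers in (1, 2, 3):
    print(layers, arc_cosine_kernel(line_a, line_b, n_layers=layers))
```

Evaluating this function for every pair of lines gives a covariance matrix that can stand in for the genomic relationship matrix of a kernel model, with `n_layers` as the single setting chosen by the user.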
Affiliation(s)
- José Crossa
- Biometrics and Statistics Unit, Genetic Resources Program, and Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Programa de Postgrado de Socioeconomia, Estadistica e Informatica, Colegio de Postgraduados, Texcoco, Mexico
- Johannes W R Martini
- Biometrics and Statistics Unit, Genetic Resources Program, and Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Daniel Gianola
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, United States
- Paulino Pérez-Rodríguez
- Programa de Postgrado de Socioeconomia, Estadistica e Informatica, Colegio de Postgraduados, Texcoco, Mexico
- Diego Jarquin
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, United States
- Philomin Juliana
- Biometrics and Statistics Unit, Genetic Resources Program, and Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Jaime Cuevas
- Departamento de Ciencias, Universidad de Quintana Roo, Chetumal, Mexico
|
43
|
Montesinos-López OA, Montesinos-López A, Tuberosa R, Maccaferri M, Sciara G, Ammar K, Crossa J. Multi-Trait, Multi-Environment Genomic Prediction of Durum Wheat With Genomic Best Linear Unbiased Predictor and Deep Learning Methods. Front Plant Sci 2019; 10:1311. [PMID: 31787990 PMCID: PMC6856087 DOI: 10.3389/fpls.2019.01311] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 06/18/2019] [Accepted: 09/20/2019] [Indexed: 05/23/2023]
Abstract
Although durum wheat (Triticum turgidum var. durum Desf.) is a minor cereal crop representing just 5-7% of the world's total wheat crop, it is a staple food in Mediterranean countries, where it is used to produce pasta, couscous, bulgur and bread. In this paper, we address multi-trait prediction of grain yield (GY), days to heading (DH) and plant height (PH) of 270 durum wheat lines that were evaluated in 43 environments (country-location-year combinations) across a broad range of water regimes in the Mediterranean Basin and other locations. Multi-trait prediction analyses were performed by implementing a multi-trait deep learning model (MTDL) with a feed-forward network topology and a rectified linear unit activation function, using a grid search to select the hyper-parameters. The results of the multi-trait deep learning method were also compared with univariate predictions of the genomic best linear unbiased predictor (GBLUP) method and of the univariate counterpart of the multi-trait deep learning method (UDL). All models were implemented with and without the genotype × environment interaction term. For the UDL and MTDL methods, the best predictions were obtained without the genotype × environment interaction term. For the GBLUP method, however, the best predictions were obtained when the genotype × environment interaction term was taken into account. In general, the best predictions were obtained with the GBLUP model, although the predictions of the MTDL were very similar to those of the GBLUP model. This result provides more evidence that the GBLUP model is a powerful approach for genomic prediction, but also that the deep learning method is a practical approach for predicting univariate and multivariate traits in the context of genomic selection.
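For readers unfamiliar with the GBLUP baseline, a minimal single-trait sketch is given below. It assumes a kernel-ridge formulation with a linear genomic relationship matrix built from column-centred markers, a known shrinkage ratio `lam` (residual variance over genetic variance), and an intercept estimated by the training mean. The genotype × environment term and the multi-trait models of the paper are not included, and all names (`gblup_predict`, `M_train`, `lam`) and the simulated data are hypothetical.

```python
import numpy as np

def gblup_predict(M_train, y_train, M_test, lam=1.0):
    """Predict test phenotypes from markers with a GBLUP-style linear kernel.

    lam is an assumed, fixed variance ratio; in practice it is estimated.
    """
    mu_m = M_train.mean(axis=0)
    Z_tr = M_train - mu_m                 # centre markers on training means
    Z_te = M_test - mu_m
    p = M_train.shape[1]
    G_tr = Z_tr @ Z_tr.T / p              # genomic relationships, train x train
    G_te = Z_te @ Z_tr.T / p              # genomic relationships, test x train
    mu = y_train.mean()                   # simple intercept estimate
    alpha = np.linalg.solve(G_tr + lam * np.eye(len(y_train)), y_train - mu)
    return mu + G_te @ alpha

# Hypothetical simulated example: 40 lines, 200 markers coded 0/1/2.
rng = np.random.default_rng(7)
M_demo = rng.integers(0, 3, size=(40, 200)).astype(float)
y_demo = M_demo @ rng.normal(scale=0.1, size=200) + rng.normal(size=40)
pred_demo = gblup_predict(M_demo[:30], y_demo[:30], M_demo[30:], lam=1.0)
print(pred_demo.shape)
```

Because the kernel is linear, this is algebraically the same as ridge regression on the centred markers, which is one reason GBLUP has so few quantities to tune compared with a neural network.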
Affiliation(s)
- Abelardo Montesinos-López
  - Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
- Roberto Tuberosa
  - Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Marco Maccaferri
  - Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Giuseppe Sciara
  - Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Karim Ammar
  - Global Wheat Breeding Program, International Maize and Wheat Improvement Center (CIMMYT), Mexico City, Mexico
- José Crossa
  - Global Wheat Breeding Program, International Maize and Wheat Improvement Center (CIMMYT), Mexico City, Mexico
|
44
|
Cuevas J, Montesinos-López O, Juliana P, Guzmán C, Pérez-Rodríguez P, González-Bucio J, Burgueño J, Montesinos-López A, Crossa J. Deep Kernel for Genomic and Near Infrared Predictions in Multi-environment Breeding Trials. G3 (Bethesda) 2019; 9:2913-2924. [PMID: 31289023 PMCID: PMC6723142 DOI: 10.1534/g3.119.400493] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/28/2019] [Accepted: 07/04/2019] [Indexed: 01/15/2023]
Abstract
Kernel methods are flexible and easy to interpret and have been successfully used in genomic-enabled prediction of various plant species. Kernel methods used in genomic prediction comprise the linear genomic best linear unbiased predictor (GBLUP or GB) kernel and the Gaussian kernel (GK). In general, these kernels have been used with two statistical models: single-environment and genomic × environment (GE) models. Recently, near-infrared spectroscopy (NIR) has been used as an inexpensive and non-destructive high-throughput phenotyping method for predicting unobserved line performance in plant breeding trials. In this study, we used a non-linear arc-cosine kernel (AK) that emulates deep learning artificial neural networks. We compared AK prediction accuracy with the prediction accuracy of GB and GK kernel methods in four genomic data sets, one of which also includes pedigree and NIR information. Results show that for all four data sets, the AK and GK kernels achieved higher prediction accuracy than the linear GB kernel for both the single-environment and GE multi-environment models. In addition, AK achieved similar or slightly higher prediction accuracy than the GK kernel. For all data sets, the GE model achieved higher prediction accuracy than the single-environment model. For the data set that includes pedigree, markers and NIR, results show that the NIR wavelengths alone achieved lower prediction accuracy than the genomic information alone; however, the pedigree plus NIR information achieved only slightly lower prediction accuracy than the marker plus NIR high-throughput data.
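The Gaussian kernel compared here depends on a single bandwidth parameter. The sketch below shows one common way to build it, assuming squared Euclidean distances between marker profiles scaled by their median; that scaling is an assumption of this example and may differ from the one used in the study. The names `gaussian_kernel` and `h` and the toy data are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, h=1.0):
    """Gaussian kernel exp(-h * d_ij^2 / q) between rows of X.

    q is the median squared distance over distinct pairs (an assumed scaling);
    h is the bandwidth parameter that would need tuning.
    """
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    D2 = np.maximum(D2, 0.0)                          # remove tiny negative rounding
    np.fill_diagonal(D2, 0.0)
    q = np.median(D2[np.triu_indices_from(D2, k=1)])
    return np.exp(-h * D2 / q)

# Hypothetical toy data: 5 lines scored at 8 centred markers.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(5, 8))
for h in (0.5, 1.0, 2.0):
    K_demo = gaussian_kernel(X_demo, h=h)
    print(h, K_demo[0, 1])
```

Larger values of `h` make the similarity between lines decay faster with distance, which is why the bandwidth usually has to be chosen by cross-validation or estimated from the data.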
Affiliation(s)
- Jaime Cuevas
  - Universidad de Quintana Roo, Chetumal, Quintana Roo, 77019 México
- Philomin Juliana
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de Mexico, Mexico
- Carlos Guzmán
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de Mexico, Mexico
- Juan Burgueño
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de Mexico, Mexico
- Abelardo Montesinos-López
  - Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, 44430
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de Mexico, Mexico
|