1
Pedrosa VB, Chen SY, Gloria LS, Doucette JS, Boerman JP, Rosa GJM, Brito LF. Machine learning methods for genomic prediction of cow behavioral traits measured by automatic milking systems in North American Holstein cattle. J Dairy Sci 2024; 107:4758-4771. [PMID: 38395400 DOI: 10.3168/jds.2023-24082] [Received: 08/12/2023] [Accepted: 01/18/2024] [Indexed: 02/25/2024]
Abstract
Identifying genome-enabled methods that provide more accurate genomic prediction is crucial when evaluating complex traits such as dairy cow behavior. In this study, we aimed to compare the predictive performance of traditional genomic prediction methods and deep learning algorithms for genomic prediction of milking refusals (MREF) and milking failures (MFAIL) in North American Holstein cows measured by automatic milking systems (milking robots). A total of 1,993,509 daily records from 4,511 genotyped Holstein cows were collected by 36 milking robot stations. After quality control, 57,600 SNPs were available for the analyses. Four genomic prediction methods were considered: Bayesian least absolute shrinkage and selection operator (LASSO), multilayer perceptron (MLP), convolutional neural network (CNN), and GBLUP. We implemented the first 3 methods using the Keras and TensorFlow libraries in Python (v.3.9), whereas the GBLUP method was implemented using the BLUPF90+ family programs. The accuracy of genomic prediction (mean square error) for MREF and MFAIL was 0.34 (0.08) and 0.27 (0.08) based on LASSO, 0.36 (0.09) and 0.32 (0.09) for MLP, 0.37 (0.08) and 0.30 (0.09) for CNN, and 0.35 (0.09) and 0.31 (0.09) based on GBLUP, respectively. Additionally, we observed a lower reranking of top selected individuals based on the MLP versus CNN methods compared with the other approaches for both MREF and MFAIL. Although the deep learning methods showed slightly higher accuracies than GBLUP, the results may not be sufficient to justify their use over traditional methods due to their higher computational demand and the difficulty of performing genomic prediction for nongenotyped individuals using deep learning procedures. Overall, this study provides insights into the potential feasibility of using deep learning methods to enhance genomic prediction accuracy for behavioral traits in livestock.
Further research is needed to determine their practical applicability to large dairy cattle breeding programs.
Affiliation(s)
- Victor B Pedrosa
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
- Shi-Yi Chen
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907; Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Leonardo S Gloria
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
- Jarrod S Doucette
  - Agriculture Information Technology (AgIT), Purdue University, West Lafayette, IN 47907
- Guilherme J M Rosa
  - Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI, 53706
- Luiz F Brito
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907.
2
Dieng I, Gardunia B, Covarrubias-Pazaran G, Gemenet DC, Trognitz B, Ofodile S, Fowobaje K, Ntukidem S, Shah T, Imoro S, Tripathi L, Mushoriwa H, Mbabazi R, Salvo S, Derera J. Q&A: Methods for estimating genetic gain in sub-Saharan Africa and achieving improved gains. The Plant Genome 2024:e20471. [PMID: 38923724 DOI: 10.1002/tpg2.20471] [Received: 12/21/2023] [Revised: 04/17/2024] [Accepted: 04/19/2024] [Indexed: 06/28/2024]
Abstract
Regular measurement of realized genetic gain allows plant breeders to assess and review the effectiveness of their strategies, allocate resources efficiently, and make informed decisions throughout the breeding process. Realized genetic gain estimation requires separating genetic trends from nongenetic trends using the linear mixed model (LMM) on historical multi-environment trial data. The LMM, accounting for the year effect, experimental designs, and heterogeneous residual variances, estimates best linear unbiased estimators of genotypes and regresses them on their years of origin. An illustrative example of estimating realized genetic gain was provided by analyzing historical data on fresh cassava (Manihot esculenta Crantz) yield in West Africa (https://github.com/Biometrics-IITA/Estimating-Realized-Genetic-Gain). This approach can serve as a model applicable to other crops and regions. Modernization of breeding programs is necessary to maximize the rate of genetic gain. This can be achieved by adopting genomics to enable faster breeding, accurate selection, and improved traits through genomic selection and gene editing. Tracking operational costs, establishing robust, digitalized data management and analytics systems, and developing effective varietal selection processes based on customer insights are also crucial for success. Capacity building and collaboration of breeding programs and institutions also play a significant role in accelerating genetic gains.
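The regression step described in this abstract can be sketched in Python. This is only an illustration under stated assumptions: the genotype BLUEs are taken as already estimated by a mixed model (that stage is not shown), the function name `genetic_gain_slope` and all numbers are hypothetical, and a plain unweighted least-squares fit stands in for whatever weighting the authors' repository may use.

```python
from statistics import mean


def genetic_gain_slope(years, blues):
    """Ordinary least-squares fit of genotype BLUEs on year of origin.

    Returns (slope, intercept); the slope is the estimated realized
    genetic gain in trait units per year.
    """
    if len(years) != len(blues) or len(years) < 2:
        raise ValueError("need two or more paired (year, BLUE) values")
    year_mean = mean(years)
    blue_mean = mean(blues)
    sxx = sum((y - year_mean) ** 2 for y in years)
    if sxx == 0:
        raise ValueError("all genotypes share the same year of origin")
    sxy = sum((y - year_mean) * (b - blue_mean) for y, b in zip(years, blues))
    slope = sxy / sxx
    intercept = blue_mean - slope * year_mean
    return slope, intercept


# Hypothetical BLUEs (e.g., yield in t/ha) for five genotypes, NOT data from the paper.
years = [2000, 2005, 2010, 2015, 2020]
blues = [20.0, 21.0, 22.5, 23.0, 24.5]

slope, intercept = genetic_gain_slope(years, blues)
# Express the gain relative to the fitted value in the first year of origin.
baseline = intercept + slope * min(years)
percent_gain = 100.0 * slope / baseline
print(f"gain per year: {slope:.3f}, relative gain: {percent_gain:.2f}% per year")
```

Expressing the slope as a percentage of the fitted first-year value is one common convention; other baselines (for example the overall mean) give different percentages from the same slope.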
Affiliation(s)
- Ibnou Dieng
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Dorcus C Gemenet
  - EiB-CIMMYT c/o ICRAF House United Nations Avenue, Nairobi, Kenya
- Sam Ofodile
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Kayode Fowobaje
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Solomon Ntukidem
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Trushar Shah
  - IITA c/o International Livestock Research Institute (ILRI), Nairobi, Kenya
- Simon Imoro
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Leena Tripathi
  - IITA c/o International Livestock Research Institute (ILRI), Nairobi, Kenya
- Hapson Mushoriwa
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- John Derera
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
3
Chen C, Bhuiyan SA, Ross E, Powell O, Dinglasan E, Wei X, Atkin F, Deomano E, Hayes B. Genomic prediction for sugarcane diseases including hybrid Bayesian-machine learning approaches. Frontiers in Plant Science 2024; 15:1398903. [PMID: 38751840 PMCID: PMC11095127 DOI: 10.3389/fpls.2024.1398903] [Received: 03/11/2024] [Accepted: 04/15/2024] [Indexed: 05/18/2024]
Abstract
Sugarcane smut and Pachymetra root rots are two serious diseases of sugarcane, with susceptible infected crops losing over 30% of yield. A heritable component to both diseases has been demonstrated, suggesting selection could improve disease resistance. Genomic selection could accelerate gains even further, enabling early selection of resistant seedlings for breeding and clonal propagation. In this study we evaluated four types of algorithms for genomic predictions of clonal performance for disease resistance. These algorithms were: Genomic best linear unbiased prediction (GBLUP), including extensions to model dominance and epistasis, Bayesian methods including BayesC and BayesR, Machine learning methods including random forest, multilayer perceptron (MLP), modified convolutional neural network (CNN) and attention networks designed to capture epistasis across the genome-wide markers. Simple hybrid methods, that first used BayesR/GWAS to identify a subset of 1000 markers with moderate to large marginal additive effects, then used attention networks to derive predictions from these effects and their interactions, were also developed and evaluated. The hypothesis for this approach was that using a subset of markers more likely to have an effect would enable better estimation of interaction effects than when there were an extremely large number of possible interactions, especially with our limited data set size. To evaluate the methods, we applied both random five-fold cross-validation and a structured PCA based cross-validation that separated 4702 sugarcane clones (that had disease phenotypes and genotyped for 26k genome wide SNP markers) by genomic relationship. The Bayesian methods (BayesR and BayesC) gave the highest accuracy of prediction, followed closely by hybrid methods with attention networks. 
The hybrid methods with attention networks gave the lowest variation in accuracy of prediction across validation folds (and lowest MSE), which may be a criterion worth considering in practical breeding programs. This suggests that hybrid methods incorporating the attention mechanism could be useful for genomic prediction of clonal performance, particularly where non-additive effects may be important.
Affiliation(s)
- Chensong Chen
  - Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Shamsul A. Bhuiyan
  - Sugar Research Australia, Woodford, QLD, Australia
  - Queensland Micro- and Nanotechnology Centre, Griffith University, Nathan, QLD, Australia
- Elizabeth Ross
  - Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Owen Powell
  - Center for Crop Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Eric Dinglasan
  - Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Xianming Wei
  - Sugar Research Australia, Indooroopilly, QLD, Australia
- Emily Deomano
  - Sugar Research Australia, Indooroopilly, QLD, Australia
- Ben Hayes
  - Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
4
Mota LFM, Giannuzzi D, Pegolo S, Sturaro E, Gianola D, Negrini R, Trevisi E, Ajmone Marsan P, Cecchinato A. Genomic prediction of blood biomarkers of metabolic disorders in Holstein cattle using parametric and nonparametric models. Genet Sel Evol 2024; 56:31. [PMID: 38684971 PMCID: PMC11057143 DOI: 10.1186/s12711-024-00903-9] [Received: 05/15/2023] [Accepted: 04/12/2024] [Indexed: 05/02/2024]
Abstract
BACKGROUND Metabolic disturbances adversely impact productive and reproductive performance of dairy cattle due to changes in endocrine status and immune function, which increase the risk of disease. This may occur in the post-partum phase, but also throughout lactation, with sub-clinical symptoms. Recently, increased attention has been directed towards improved health and resilience in dairy cattle, and genomic selection (GS) could be a helpful tool for selecting animals that are more resilient to metabolic disturbances throughout lactation. Hence, we evaluated the genomic prediction of serum biomarkers levels for metabolic distress in 1353 Holsteins genotyped with the 100K single nucleotide polymorphism (SNP) chip assay. The GS was evaluated using parametric models best linear unbiased prediction (GBLUP), Bayesian B (BayesB), elastic net (ENET), and nonparametric models, gradient boosting machine (GBM) and stacking ensemble (Stack), which combines ENET and GBM approaches. RESULTS The results show that the Stack approach outperformed other methods with a relative difference (RD), calculated as an increment in prediction accuracy, of approximately 18.0% compared to GBLUP, 12.6% compared to BayesB, 8.7% compared to ENET, and 4.4% compared to GBM. The highest RD in prediction accuracy between other models with respect to GBLUP was observed for haptoglobin (hapto) from 17.7% for BayesB to 41.2% for Stack; for Zn from 9.8% (BayesB) to 29.3% (Stack); for ceruloplasmin (CuCp) from 9.3% (BayesB) to 27.9% (Stack); for ferric reducing antioxidant power (FRAP) from 8.0% (BayesB) to 40.0% (Stack); and for total protein (PROTt) from 5.7% (BayesB) to 22.9% (Stack). Using a subset of top SNPs (1.5k) selected from the GBM approach improved the accuracy for GBLUP from 1.8 to 76.5%. However, for the other models reductions in prediction accuracy of 4.8% for ENET (average of 10 traits), 5.9% for GBM (average of 21 traits), and 6.6% for Stack (average of 16 traits) were observed. 
CONCLUSIONS Our results indicate that the Stack approach was more accurate in predicting metabolic disturbances than GBLUP, BayesB, ENET, and GBM and seemed to be competitive for predicting complex phenotypes with various degrees of mode of inheritance, i.e. additive and non-additive effects. Selecting markers based on GBM improved accuracy of GBLUP.
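The relative difference (RD) used in this abstract can be illustrated with a short Python sketch. The abstract describes RD only as an increment in prediction accuracy, so the formula below (percentage change relative to the reference model) is an assumption, and the function name and accuracy values are hypothetical rather than taken from the paper.

```python
def relative_difference(acc_model, acc_reference):
    """Percentage increment in prediction accuracy over a reference model.

    Assumed definition: RD = 100 * (acc_model - acc_reference) / acc_reference.
    """
    if acc_reference == 0:
        raise ValueError("reference accuracy must be non-zero")
    return 100.0 * (acc_model - acc_reference) / acc_reference


# Hypothetical accuracies, chosen only to illustrate the arithmetic.
acc_gblup = 0.400
acc_stack = 0.472

rd_stack_vs_gblup = relative_difference(acc_stack, acc_gblup)
print(f"RD of Stack vs GBLUP: {rd_stack_vs_gblup:.1f}%")
```

Because RD is a ratio, a large percentage can correspond to a small absolute gain when the reference accuracy is low, which is worth keeping in mind when reading values such as the reported 76.5% improvement for GBLUP.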
Affiliation(s)
- Lucio F M Mota
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy.
- Diana Giannuzzi
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Sara Pegolo
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy.
- Enrico Sturaro
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Daniel Gianola
  - Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, 53706, USA
- Riccardo Negrini
  - Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Erminio Trevisi
  - Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
  - Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Paolo Ajmone Marsan
  - Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
  - Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Alessio Cecchinato
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
5
Hong JK, Kim YM, Cho ES, Lee JB, Kim YS, Park HB. Application of deep learning with bivariate models for genomic prediction of sow lifetime productivity-related traits. Anim Biosci 2024; 37:622-630. [PMID: 38228129 PMCID: PMC10915216 DOI: 10.5713/ab.23.0264] [Received: 07/14/2023] [Revised: 08/31/2023] [Accepted: 11/03/2023] [Indexed: 01/18/2024]
Abstract
OBJECTIVE Pig breeders cannot obtain phenotypic information at the time of selection for sow lifetime productivity (SLP). They would benefit from obtaining genetic information on candidate sows. Genomic data interpreted using deep learning (DL) techniques could contribute to the genetic improvement of SLP to maximize farm profitability because DL models capture nonlinear genetic effects such as dominance and epistasis more efficiently than conventional genomic prediction methods based on linear models. This study aimed to investigate the usefulness of DL for the genomic prediction of two SLP-related traits: lifetime number of litters (LNL) and lifetime pig production (LPP). METHODS Two bivariate DL models, convolutional neural network (CNN) and local convolutional neural network (LCNN), were compared with conventional bivariate linear models (i.e., genomic best linear unbiased prediction, Bayesian ridge regression, Bayes A, and Bayes B). Phenotype and pedigree data were collected from 40,011 sows that had husbandry records. Among these, 3,652 pigs were genotyped using the PorcineSNP60K BeadChip. RESULTS The best predictive correlation for LNL was obtained with CNN (0.28), followed by LCNN (0.26) and conventional linear models (approximately 0.21). For LPP, the best predictive correlation was also obtained with CNN (0.29), followed by LCNN (0.27) and conventional linear models (approximately 0.25). A similar trend was observed with the mean squared error of prediction for the SLP traits. CONCLUSION This study provides an example of a CNN that can outperform linear model-based genomic prediction approaches when the nonlinear interaction components are important, because LNL and LPP exhibited strong epistatic interaction components. Additionally, our results suggest that applying bivariate DL models could also contribute to the prediction accuracy by utilizing the genetic correlation between LNL and LPP.
Affiliation(s)
- Joon-Ki Hong
  - Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Yong-Min Kim
  - Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Eun-Seok Cho
  - Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Jae-Bong Lee
  - Korea Zoonosis Research Institute, Jeonbuk National University, Iksan 54531, Korea
- Young-Sin Kim
  - Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Hee-Bok Park
  - Department of Animal Resources Science, Kongju National University, Yesan 32439, Korea
  - Resource Science Research Institute, Kongju National University, Yesan 32439, Korea
6
Zhou W, Yan Z, Zhang L. A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction. Sci Rep 2024; 14:5905. [PMID: 38467662 PMCID: PMC10928191 DOI: 10.1038/s41598-024-55243-x] [Received: 08/03/2023] [Accepted: 02/21/2024] [Indexed: 03/13/2024]
Abstract
To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. By using branching data (phenotype) of 1918 soybean accessions and 42 k SNP (Single Nucleotide Polymorphism) polymorphic data (genotype), this study systematically compared 11 non-linear regression AI models, including four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoders regression, and MLP (multilayer perceptron) regression) and seven machine learning models (SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). After evaluation with four metrics, R2 (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), it was found that the SVR, Polynomial Regression, DBN, and Autoencoder outperformed other models and could obtain a better prediction accuracy when they were used for phenotype prediction. In the assessment of deep learning approaches, we exemplified the SVR model, conducting analyses on feature importance and gene ontology (GO) enrichment to provide comprehensive support. After comprehensively comparing four feature importance algorithms, no notable distinction was observed in the feature importance ranking scores across the four algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, but the SHAP value could provide rich information on genes with negative contributions, and SHAP importance was chosen for feature selection.
The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and transitioning from experience-based to data-based breeding.
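The four evaluation metrics named in this abstract (R2, MAE, MSE, and MAPE) follow standard definitions, which the Python sketch below writes out using only the standard library. The function name `regression_metrics` and the phenotype values are hypothetical and not taken from the paper, and the authors may well have used a library implementation instead.

```python
from statistics import mean


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, and MAPE (in percent) for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = mean(abs(e) for e in errors)
    mse = mean(e * e for e in errors)
    # MAPE is undefined when an observed value is zero.
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined for zero observed values")
    mape = 100.0 * mean(abs(e) / abs(t) for e, t in zip(errors, y_true))
    # R2 compares residual variation with variation around the observed mean.
    y_mean = mean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "MAE": mae, "MSE": mse, "MAPE": mape}


# Hypothetical observed and predicted branching counts.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.0, 9.0]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MAPE is sensitive to small observed values, so for count traits that can be zero (such as branch number) it may need a different denominator or be dropped in favor of MAE.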
Affiliation(s)
- Wei Zhou
  - Florida Agricultural and Mechanical University, Tallahassee, FL, 32307, USA.
- Zhengxiao Yan
  - Florida State University, Tallahassee, FL, 32306, USA
- Liting Zhang
  - Florida State University, Tallahassee, FL, 32306, USA
7
Lozada DN, Sandhu KS, Bhatta M. Ridge regression and deep learning models for genome-wide selection of complex traits in New Mexican Chile peppers. BMC Genom Data 2023; 24:80. [PMID: 38110866 PMCID: PMC10726521 DOI: 10.1186/s12863-023-01179-6] [Received: 06/16/2023] [Accepted: 12/05/2023] [Indexed: 12/20/2023]
Abstract
BACKGROUND Genomewide prediction estimates the genomic breeding values of selection candidates, which can be utilized for population improvement and cultivar development. Ridge regression and deep learning-based selection models were implemented for yield and agronomic traits of 204 chile pepper genotypes evaluated in multi-environment trials in New Mexico, USA. RESULTS Accuracy of prediction differed across models under ten-fold cross-validation, where high prediction accuracy was observed for highly heritable traits such as plant height and plant width. No model was superior across traits using 14,922 SNP markers for genomewide selection. Bayesian ridge regression had the highest average accuracy for first pod date (0.77) and total yield per plant (0.33). Multilayer perceptron (MLP) performed best for flowering time (0.76) and plant height (0.73), whereas the genomic BLUP model had the highest accuracy for plant width (0.62). Using a subset of 7,690 SNP loci resulting from grouping markers based on linkage disequilibrium coefficients resulted in improved accuracy for first pod date, ten pod weight, and total yield per plant, even under a relatively small training population size for MLP and random forest models. Genomic and ridge regression BLUP models were sufficient for optimal prediction accuracies for small training population size. Combining phenotypic selection and genomewide selection resulted in improved selection response for yield-related traits, indicating that integrated approaches can result in improved gains achieved through selection. CONCLUSIONS Accuracy values for ridge regression and deep learning prediction models demonstrate the potential of implementing genomewide selection for genetic improvement in chile pepper breeding programs. Ultimately, a large training data set is relevant for improved genomic selection accuracy for the deep learning models.
Affiliation(s)
- Dennis N Lozada
  - Department of Plant and Environmental Sciences, New Mexico State University, Las Cruces, NM, 88003, USA.
  - Chile Pepper Institute, New Mexico State University, Las Cruces, NM, 88003, USA.
8
Martins FB, Aono AH, Moraes ADCL, Ferreira RCU, Vilela MDM, Pessoa-Filho M, Rodrigues-Motta M, Simeão RM, de Souza AP. Genome-wide family prediction unveils molecular mechanisms underlying the regulation of agronomic traits in Urochloa ruziziensis. Frontiers in Plant Science 2023; 14:1303417. [PMID: 38148869 PMCID: PMC10749977 DOI: 10.3389/fpls.2023.1303417] [Received: 09/28/2023] [Accepted: 11/15/2023] [Indexed: 12/28/2023]
Abstract
Tropical forage grasses, particularly those belonging to the Urochloa genus, play a crucial role in cattle production and serve as the main food source for animals in tropical and subtropical regions. The majority of these species are apomictic and tetraploid, highlighting the significance of U. ruziziensis, a sexual diploid species that can be tetraploidized for use in interspecific crosses with apomictic species. As a means to support breeding programs, our study investigates the feasibility of genome-wide family prediction in U. ruziziensis families to predict agronomic traits. Fifty half-sibling families were assessed for green matter yield, dry matter yield, regrowth capacity, leaf dry matter, and stem dry matter across different clippings established in contrasting seasons with varying available water capacity. Genotyping was performed using a genotyping-by-sequencing approach based on DNA samples from family pools. In addition to conventional genomic prediction methods, machine learning and feature selection algorithms were employed to reduce the necessary number of markers for prediction and enhance predictive accuracy across phenotypes. To explore the regulation of agronomic traits, our study evaluated the significance of selected markers for prediction using a tree-based approach, potentially linking these regions to quantitative trait loci (QTLs). In a multiomic approach, genes from the species transcriptome were mapped and correlated to those markers. A gene coexpression network was modeled with gene expression estimates from a diverse set of U. ruziziensis genotypes, enabling a comprehensive investigation of molecular mechanisms associated with these regions. The heritabilities of the evaluated traits ranged from 0.44 to 0.92. A total of 28,106 filtered SNPs were used to predict phenotypic measurements, achieving a mean predictive ability of 0.762. 
By employing feature selection techniques, we could reduce the dimensionality of SNP datasets, revealing potential genotype-phenotype associations. The functional annotation of genes near these markers revealed associations with auxin transport and biosynthesis of lignin, flavonol, and folic acid. Further exploration with the gene coexpression network uncovered associations with DNA metabolism, stress response, and circadian rhythm. These genes and regions represent important targets for expanding our understanding of the metabolic regulation of agronomic traits and offer valuable insights applicable to species breeding. Our work represents an innovative contribution to molecular breeding techniques for tropical forages, presenting a viable marker-assisted breeding approach and identifying target regions for future molecular studies on these agronomic traits.
Affiliation(s)
- Felipe Bitencourt Martins
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Alexandre Hild Aono
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Aline da Costa Lima Moraes
  - Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Marco Pessoa-Filho
  - Embrapa Cerrados, Brazilian Agricultural Research Corporation, Brasília, Brazil
- Rosangela Maria Simeão
  - Embrapa Gado de Corte, Brazilian Agricultural Research Corporation, Campo Grande, Mato Grosso, Brazil
- Anete Pereira de Souza
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
  - Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
9
Chen C, Powell O, Dinglasan E, Ross EM, Yadav S, Wei X, Atkin F, Deomano E, Hayes BJ. Genomic prediction with machine learning in sugarcane, a complex highly polyploid clonally propagated crop with substantial non-additive variation for key traits. The Plant Genome 2023; 16:e20390. [PMID: 37728221 DOI: 10.1002/tpg2.20390] [Received: 12/11/2022] [Revised: 08/01/2023] [Accepted: 08/29/2023] [Indexed: 09/21/2023]
Abstract
Sugarcane has a complex, highly polyploid genome with multi-species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning (ML) methods, which are purportedly able to deal with high levels of complexity in prediction. Here, we investigated deep learning (DL) neural networks, including multilayer networks (MLP) and convolution neural networks (CNN), and an ensemble machine learning approach, random forest (RF), for genomic prediction in sugarcane. The data set used was 2912 sugarcane clones, scored for 26,086 genome wide single nucleotide polymorphism markers, with final assessment trial data for total cane harvested (TCH), commercial cane sugar (CCS), and fiber content (Fiber). The clones in the latest trial (2017) were used as a validation set. We compared prediction accuracy of these methods to genomic best linear unbiased prediction (GBLUP) extended to include dominance and epistatic effects. The prediction accuracies from GBLUP models were up to 0.37 for TCH, 0.43 for CCS, and 0.48 for Fiber, while the optimized ML models had prediction accuracies of 0.35 for TCH, 0.38 for CCS, and 0.48 for Fiber. Both RF and DL neural network models have comparable predictive ability with the additive GBLUP model but are less accurate than the extended GBLUP model.
Affiliation(s)
- Chensong Chen
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Owen Powell
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Eric Dinglasan
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Elizabeth M Ross
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Seema Yadav
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Ben J Hayes
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
10
|
Akutsu H, Na’iem M, Widiyatno, Indrioko S, Sawitri, Purnomo S, Uchiyama K, Tsumura Y, Tani N. Comparing modeling methods of genomic prediction for growth traits of a tropical timber species, Shorea macrophylla. Front Plant Sci 2023; 14:1241908. [PMID: 38023878 PMCID: PMC10644202 DOI: 10.3389/fpls.2023.1241908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 09/13/2023] [Indexed: 12/01/2023]
Abstract
Introduction: Shorea macrophylla is a commercially important tropical tree species grown for timber and oil. It is amenable to plantation forestry due to its fast initial growth. Genomic selection (GS) has been used in tree breeding studies to shorten long breeding cycles but has not previously been applied to S. macrophylla. Methods: To build genomic prediction models for GS, leaves and growth trait data were collected from a half-sib progeny population of S. macrophylla in the Sari Bumi Kusuma forest concession, central Kalimantan, Indonesia. A total of 18,037 SNP markers were identified in two ddRAD-seq libraries. Genomic prediction models based on these SNPs were then generated for diameter at breast height and total height in the 7th year from planting (D7 and H7). Results and discussion: These traits were chosen because of their relatively high narrow-sense genomic heritability and because seven years was considered long enough to assess initial growth. Genomic prediction models were built using 6 methods and their derivatives with the full set of identified SNPs and subsets of 48, 96, and 192 SNPs selected based on the results of a genome-wide association study (GWAS). The GBLUP and RKHS methods gave the highest predictive ability for D7 and H7 with the sets of selected SNPs and showed that D7 has an additive genetic architecture while H7 has an epistatic genetic architecture. LightGBM and CNN1D also achieved high predictive abilities for D7 with 48 and 96 selected SNPs, and for H7 with 96 and 192 selected SNPs, showing that gradient boosting decision trees and deep learning can be useful in genomic prediction. Predictive abilities for H7 were higher when smaller SNP subsets selected by GWAS p-value were used. However, D7 showed the opposite tendency, which might originate from the difference in genetic architecture between primary and secondary growth of the species. This study suggests that GS with GWAS-based SNP selection can be used in breeding for non-cultivated tree species to improve initial growth and reduce genotyping costs for next-generation seedlings.
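The GWAS-based marker selection mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes complete genotypes coded 0/1/2 and replaces a mixed-model GWAS with plain single-marker regression; for that simple case, with the same sample size for every marker, ranking by p-value is equivalent to ranking by absolute correlation with the phenotype. The function name `top_snps_by_marginal_association` is hypothetical.

```python
import numpy as np

def top_snps_by_marginal_association(X_train, y_train, k):
    """Return the indices of the k markers most associated with the trait.

    Simplifying assumption: single-marker linear regression, so ordering
    markers by |Pearson r| matches ordering them by GWAS p-value.
    X_train: (n_individuals, n_markers) genotypes coded 0/1/2.
    y_train: (n_individuals,) phenotypes (assumed not constant).
    """
    Xc = X_train - X_train.mean(axis=0)        # centre each marker
    yc = y_train - y_train.mean()              # centre the phenotype
    marker_norm = np.sqrt((Xc ** 2).sum(axis=0))
    marker_norm[marker_norm == 0] = np.inf     # monomorphic markers get r = 0
    r = (Xc.T @ yc) / (marker_norm * np.sqrt((yc ** 2).sum()))
    return np.argsort(-np.abs(r))[:k]          # strongest associations first
```

In a cross-validation setting, the selection would normally be repeated inside each training fold, since choosing markers with the validation phenotypes in view tends to inflate the apparent predictive ability.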
Affiliation(s)
- Haruto Akutsu
  - Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Mohammad Na’iem
  - Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
- Widiyatno
  - Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
- Sapto Indrioko
  - Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
- Sawitri
  - Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
- Susilo Purnomo
  - PT. Sari Bumi Kusuma, Pontianak, West Kalimantan, Indonesia
- Kentaro Uchiyama
  - Department of Forest Molecular Genetics and Biotechnology, Forestry and Forest Products Research Institute, Tsukuba, Ibaraki, Japan
- Yoshihiko Tsumura
  - Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Naoki Tani
  - Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
  - Forestry Division, Japan International Research Center for Agricultural Sciences, Tsukuba, Ibaraki, Japan
|
11
|
Weber SE, Chawla HS, Ehrig L, Hickey LT, Frisch M, Snowdon RJ. Accurate prediction of quantitative traits with failed SNP calls in canola and maize. Front Plant Sci 2023; 14:1221750. [PMID: 37936929 PMCID: PMC10627008 DOI: 10.3389/fpls.2023.1221750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 10/05/2023] [Indexed: 11/09/2023]
Abstract
In modern plant breeding, genomic selection is becoming the gold standard to select superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs commonly rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection candidates. For this purpose, SNP arrays with moderate to high marker density represent a robust and cost-effective tool to generate reproducible, easy-to-handle, high-throughput genotype data from large-scale breeding populations. However, SNP arrays are prone to technical errors that lead to failed allele calls. To overcome this problem, failed calls are often imputed, based on the assumption that failed SNP calls are purely technical. However, this ignores the biological causes for failed calls (for example, deletions), and there is increasing evidence that gene presence-absence and other kinds of genome structural variants can play a role in phenotypic expression. Because deletions are frequently not in linkage disequilibrium with their flanking SNPs, imputation of missing SNP calls can potentially obscure valuable marker-trait associations. In this study, we analyze published datasets for canola and maize using four parametric and two machine learning models and demonstrate that failed allele calls in genomic prediction are highly predictive for important agronomic traits. We present two statistical pipelines, based on population structure and linkage disequilibrium, that enable the filtering of failed SNP calls that are likely caused by biological reasons. For the population and trait examined, prediction accuracy based on these filtered failed allele calls was competitive with standard SNP-based prediction, underlining the potential value of missing data in genomic prediction approaches. The combination of SNPs with all failed allele calls or the filtered allele calls did not outperform predictions with only SNP-based prediction due to redundancy in genomic relationship estimates.
Affiliation(s)
- Sven E. Weber
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lennard Ehrig
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lee T. Hickey
  - Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
- Matthias Frisch
  - Department of Biometry and Population Genetics, Justus Liebig University, Giessen, Germany
- Rod J. Snowdon
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
|
12
|
Chafai N, Hayah I, Houaga I, Badaoui B. A review of machine learning models applied to genomic prediction in animal breeding. Front Genet 2023; 14:1150596. [PMID: 37745853 PMCID: PMC10516561 DOI: 10.3389/fgene.2023.1150596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 08/22/2023] [Indexed: 09/26/2023] Open
Abstract
The advent of modern genotyping technologies has revolutionized genomic selection in animal breeding. Large marker datasets have shown several drawbacks for traditional genomic prediction methods in terms of flexibility, accuracy, and computational power. Recently, the application of machine learning models in animal breeding has gained a lot of interest due to their tremendous flexibility and their ability to capture patterns in large noisy datasets. Here, we present a general overview of a handful of machine learning algorithms and their application in genomic prediction to provide a meta-picture of their performance in the estimation of genomic estimated breeding values, genotype imputation, and feature selection. Finally, we discuss a potential adoption of machine learning models in genomic prediction in developing countries. The results of the reviewed studies showed that machine learning models have indeed performed well in fitting large noisy data sets and modeling minor nonadditive effects in some of the studies. However, sometimes conventional methods outperformed machine learning models, which confirms that there is no universal method for genomic prediction. In summary, machine learning models have great potential for extracting patterns from single nucleotide polymorphism datasets. Nonetheless, the level of their adoption in animal breeding is still low due to data limitations, complex genetic interactions, a lack of standardization and reproducibility, and the lack of interpretability of machine learning models when trained with biological data. Consequently, there is no remarkable outperformance of machine learning methods compared to traditional methods in genomic prediction. Therefore, more research should be conducted to discover new insights that could enhance livestock breeding programs.
Affiliation(s)
- Narjice Chafai
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Ichrak Hayah
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Isidore Houaga
  - Centre for Tropical Livestock Genetics and Health, The Roslin Institute, Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, United Kingdom
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Bouabid Badaoui
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
  - African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laayoune, Morocco
|
13
|
Mora-Poblete F, Maldonado C, Henrique L, Uhdre R, Scapim CA, Mangolim CA. Multi-trait and multi-environment genomic prediction for flowering traits in maize: a deep learning approach. Front Plant Sci 2023; 14:1153040. [PMID: 37593046 PMCID: PMC10428628 DOI: 10.3389/fpls.2023.1153040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Accepted: 07/12/2023] [Indexed: 08/19/2023]
Abstract
Maize (Zea mays L.), the third most widely cultivated cereal crop in the world, plays a critical role in global food security. To improve the efficiency of selecting superior genotypes in breeding programs, researchers have aimed to identify key genomic regions that impact agronomic traits. In this study, the performance of multi-trait, multi-environment deep learning models was compared to that of Bayesian models (Markov Chain Monte Carlo generalized linear mixed models (MCMCglmm), Bayesian Genomic Genotype-Environment Interaction (BGGE), and Bayesian Multi-Trait and Multi-Environment (BMTME)) in terms of the prediction accuracy of flowering-related traits (Anthesis-Silking Interval: ASI, Female Flowering: FF, and Male Flowering: MF). A tropical maize panel of 258 inbred lines from Brazil was evaluated in three sites (Cambira-2018, Sabaudia-2018, and Iguatemi-2020 and 2021) using approximately 290,000 single nucleotide polymorphisms (SNPs). The results demonstrated a 14.4% increase in prediction accuracy when employing multi-trait models compared to the use of a single trait in a single environment approach. The accuracy of predictions also improved by 6.4% when using a single trait in a multi-environment scheme compared to using multi-trait analysis. Additionally, deep learning models consistently outperformed Bayesian models in both single and multiple trait and environment approaches. A complementary genome-wide association study identified associations with 26 candidate genes related to flowering time traits, and 31 marker-trait associations were identified, accounting for 37%, 37%, and 22% of the phenotypic variation of ASI, FF and MF, respectively. In conclusion, our findings suggest that deep learning models have the potential to significantly improve the accuracy of predictions, regardless of the approach used and provide support for the efficacy of this method in genomic selection for flowering-related traits in tropical maize.
Affiliation(s)
- Carlos Maldonado
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Luma Henrique
  - Department of Agronomy, State University of Maringá, Paraná, Brazil
- Renan Uhdre
  - Department of Agronomy, State University of Maringá, Paraná, Brazil
|
14
|
Lee HJ, Lee JH, Gondro C, Koh YJ, Lee SH. deepGBLUP: joint deep learning networks and GBLUP framework for accurate genomic prediction of complex traits in Korean native cattle. Genet Sel Evol 2023; 55:56. [PMID: 37525091 PMCID: PMC10392020 DOI: 10.1186/s12711-023-00825-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 07/07/2023] [Indexed: 08/02/2023] Open
Abstract
Background: Genomic prediction has become widespread as a valuable tool to estimate genetic merit in animal and plant breeding. Here we develop a novel genomic prediction algorithm, called deepGBLUP, which integrates deep learning networks and a genomic best linear unbiased prediction (GBLUP) framework. The deep learning networks assign marker effects using locally-connected layers and subsequently use them to estimate an initial genomic value through fully-connected layers. The GBLUP framework estimates three genomic values (additive, dominance, and epistasis) by leveraging respective genetic relationship matrices. Finally, deepGBLUP predicts a final genomic value by summing all the estimated genomic values. Results: We compared the proposed deepGBLUP with the conventional GBLUP and Bayesian methods. Extensive experiments demonstrate that the proposed deepGBLUP yields state-of-the-art performance on Korean native cattle data across diverse traits, marker densities, and training sizes. In addition, they show that the proposed deepGBLUP can outperform the previous methods on simulated data across various heritabilities and quantitative trait loci (QTL) effects. Conclusions: We introduced a novel genomic prediction algorithm, deepGBLUP, which successfully integrates deep learning networks and GBLUP framework. Through comprehensive evaluations on the Korean native cattle data and simulated data, deepGBLUP consistently achieved superior performance across various traits, marker densities, training sizes, heritabilities, and QTL effects. Therefore, deepGBLUP is an efficient method to estimate an accurate genomic value. The source code and manual for deepGBLUP are available at https://github.com/gywns6287/deepGBLUP.
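The additive GBLUP component that this abstract builds on can be sketched as follows. This is a generic textbook-style illustration, not the deepGBLUP code from the linked repository. It assumes genotypes coded 0/1/2 with no missing calls, builds an additive genomic relationship matrix in the style of VanRaden's first method, and takes the heritability `h2` as a known input rather than estimating variance components. The function names are hypothetical, and the dominance, epistasis, and deep learning terms that deepGBLUP adds to the final genomic value are omitted.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Additive genomic relationship matrix from 0/1/2 genotypes.

    genotypes: (n_individuals, n_markers) array; allele frequencies are
    estimated from the sample itself (an assumption of this sketch).
    """
    p = genotypes.mean(axis=0) / 2.0           # allele frequency per marker
    Z = genotypes - 2.0 * p                    # centre by twice the frequency
    denom = 2.0 * np.sum(p * (1.0 - p))        # scales G to a pedigree-like range
    return Z @ Z.T / denom

def gblup_predict(G, y, train_idx, test_idx, h2=0.5):
    """Predict phenotypes of test individuals from training individuals.

    Uses g_hat = G_test,train (G_train,train + lambda I)^-1 (y_train - mean),
    with lambda = (1 - h2) / h2 and h2 assumed known.
    """
    lam = (1.0 - h2) / h2
    mu = y[train_idx].mean()
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_vt = G[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + G_vt @ alpha
```

Under the same assumptions, dominance or epistatic genomic values would be obtained by repeating the prediction step with the corresponding relationship matrices, and a combined prediction would sum the component values, as the abstract describes.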
Affiliation(s)
- Hyo-Jun Lee
  - Department of Bio-AI Convergence, Chungnam National University, 305-764, Daejeon, Korea
- Jun Heon Lee
  - Division of Animal and Dairy Science, Chungnam National University, 305-764, Daejeon, Korea
- Cedric Gondro
  - Department of Animal Science, Michigan State University, East Lansing, MI, USA
- Yeong Jun Koh
  - Department of Computer Science and Engineering, Chungnam National University, 305-764, Daejeon, Korea
- Seung Hwan Lee
  - Division of Animal and Dairy Science, Chungnam National University, 305-764, Daejeon, Korea
|
15
|
Bhat JA, Feng X, Mir ZA, Raina A, Siddique KHM. Recent advances in artificial intelligence, mechanistic models, and speed breeding offer exciting opportunities for precise and accelerated genomics-assisted breeding. Physiol Plant 2023; 175:e13969. [PMID: 37401892 DOI: 10.1111/ppl.13969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 06/11/2023] [Accepted: 06/27/2023] [Indexed: 07/05/2023]
Abstract
Given the challenges of population growth and climate change, there is an urgent need to expedite the development of high-yielding stress-tolerant crop cultivars. While traditional breeding methods have been instrumental in ensuring global food security, their efficiency, precision, and labour intensiveness have become increasingly inadequate to address present and future challenges. Fortunately, recent advances in high-throughput phenomics and genomics-assisted breeding (GAB) provide a promising platform for enhancing crop cultivars with greater efficiency. However, several obstacles must be overcome to optimize the use of these techniques in crop improvement, such as the complexity of phenotypic analysis of big image data. In addition, the prevalent use of linear models in genome-wide association studies (GWAS) and genomic selection (GS) fails to capture the nonlinear interactions of complex traits, limiting their applicability for GAB and impeding crop improvement. Recent advances in artificial intelligence (AI) techniques have opened doors to nonlinear modelling approaches in crop breeding, enabling the capture of nonlinear and epistatic interactions in GWAS and GS and thus making this variation available for GAB. While statistical and software challenges persist in AI-based models, they are expected to be resolved soon. Furthermore, recent advances in speed breeding have significantly reduced the time (3-5-fold) required for conventional breeding. Thus, integrating speed breeding with AI and GAB could improve crop cultivar development within a considerably shorter timeframe while ensuring greater accuracy and efficiency. In conclusion, this integrated approach could revolutionize crop breeding paradigms and safeguard food production in the face of population growth and climate change.
Affiliation(s)
- Xianzhong Feng
  - Zhejiang Lab, Hangzhou, China
  - Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
- Zahoor A Mir
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi, India
- Aamir Raina
  - Department of Botany, Faculty of Life Sciences, Aligarh Muslim University, Aligarh, India
- Kadambot H M Siddique
  - The UWA Institute of Agriculture and School of Agriculture & Environment, The University of Western Australia, Perth, Western Australia, Australia
|
16
|
Ortiz R, Reslow F, Vetukuri R, García-Gil MR, Pérez-Rodríguez P, Crossa J. Inbreeding Effects on the Performance and Genomic Prediction for Polysomic Tetraploid Potato Offspring Grown at High Nordic Latitudes. Genes (Basel) 2023; 14:1302. [PMID: 37372482 DOI: 10.3390/genes14061302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 06/18/2023] [Accepted: 06/19/2023] [Indexed: 06/29/2023] Open
Abstract
Inbreeding depression (ID) is caused by increased homozygosity in the offspring after selfing. Although the self-compatible, highly heterozygous, tetrasomic polyploid potato (Solanum tuberosum L.) suffers from ID, some argue that the potential genetic gains from using inbred lines in a sexual propagation system of potato are too large to be ignored. The aim of this research was to assess the effects of inbreeding on potato offspring performance at a high latitude and the accuracy of genomic estimated breeding values (GEBVs) for further use in selection. Four inbred (S1) and two hybrid (F1) offspring and their parents (S0) were used in the experiment, with a field layout of an augmented design with the four S0 replicated in nine incomplete blocks comprising 100 four-plant plots at Umeå (63°49'30″ N 20°15'50″ E), Sweden. S0 was significantly (p < 0.01) better than both S1 and F1 offspring for tuber weight (total and according to five grading sizes), tuber shape and size uniformity, tuber eye depth and reducing sugars in the tuber flesh, while F1 was significantly (p < 0.01) better than S1 for all tuber weight and uniformity traits. Some F1 hybrid offspring (15-19%) had better total tuber yield than the best-performing parent. The GEBV accuracy ranged from -0.3928 to 0.4436. Overall, tuber shape uniformity had the highest GEBV accuracy, while tuber weight traits exhibited the lowest accuracy. The GEBV accuracy for F1 full sibs was higher, on average, than that for S1. Genomic prediction may facilitate eliminating undesired inbred or hybrid offspring for further use in the genetic betterment of potato.
Affiliation(s)
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), SE 23436 Lomma, Sweden
  - Umeå Plant Science Center, SLU Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences (SLU), SE 90183 Umeå, Sweden
- Fredrik Reslow
  - Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), SE 23436 Lomma, Sweden
- Ramesh Vetukuri
  - Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), SE 23436 Lomma, Sweden
- M Rosario García-Gil
  - Umeå Plant Science Center, SLU Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences (SLU), SE 90183 Umeå, Sweden
- José Crossa
  - Colegio de Postgraduados (COLPOS), Montecillos 56230, Edo. de México, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), El Batán, Texcoco 56237, Edo. de México, Mexico
|
17
|
Zhao L, Walkowiak S, Fernando WGD. Artificial Intelligence: A Promising Tool in Exploring the Phytomicrobiome in Managing Disease and Promoting Plant Health. Plants (Basel) 2023; 12:1852. [PMID: 37176910 PMCID: PMC10180744 DOI: 10.3390/plants12091852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 04/25/2023] [Accepted: 04/27/2023] [Indexed: 05/15/2023]
Abstract
There is increasing interest in harnessing the microbiome to improve cropping systems. With the availability of high-throughput and low-cost sequencing technologies, gathering microbiome data is becoming more routine. However, the analysis of microbiome data is challenged by the size and complexity of the data, and the incomplete nature of many microbiome databases. Further, to extract value from microbiome data, they often need to be analyzed in conjunction with other complex data that affect crop health and disease management, such as plant genotype and environmental factors. Artificial intelligence (AI), boosted through deep learning (DL), has achieved significant breakthroughs and is a powerful tool for managing large complex datasets such as the interplay between the microbiome, crop plants, and their environment. In this review, we aim to provide readers with a brief introduction to AI techniques, and we introduce how AI has been applied to areas of microbiome sequencing taxonomy, the functional annotation for microbiome sequences, associating the microbiome community with host traits, designing synthetic communities, genomic selection, field phenotyping, and disease forecasting. At the end of this review, we propose further efforts that are required to fully exploit the power of AI in studying phytomicrobiomes.
Affiliation(s)
- Liang Zhao
  - Department of Plant Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
|
18
|
Artificial Intelligence in Food Safety: A Decade Review and Bibliometric Analysis. Foods 2023; 12:1242. [PMID: 36981168 PMCID: PMC10048131 DOI: 10.3390/foods12061242] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/14/2023] [Revised: 03/06/2023] [Accepted: 03/09/2023] [Indexed: 03/17/2023] Open
Abstract
Artificial Intelligence (AI) technologies have been powerful solutions used to improve food yield, quality, and nutrition, increase safety and traceability while decreasing resource consumption, and eliminate food waste. Compared with several qualitative reviews on AI in food safety, we conducted an in-depth quantitative and systematic review based on the Core Collection database of WoS (Web of Science). To discover the historical trajectory and identify future trends, we analysed the literature concerning AI technologies in food safety from 2012 to 2022 by CiteSpace. In this review, we used bibliometric methods to describe the development of AI in food safety, including performance analysis, science mapping, and network analysis by CiteSpace. Among the 1855 selected articles, China and the United States contributed the most literature, and the Chinese Academy of Sciences released the largest number of relevant articles. Among all the journals in this field, PLoS ONE and Computers and Electronics in Agriculture ranked first and second in terms of annual publications and co-citation frequency. The present character, hot spots, and future research trends of AI technologies in food safety research were determined. Furthermore, based on our analyses, we provide researchers, practitioners, and policymakers with the big picture of research on AI in food safety across the whole process, from precision agriculture to precision nutrition, through 28 enlightening articles.
|
19
|
Yan J, Wang X. Machine learning bridges omics sciences and plant breeding. Trends Plant Sci 2023; 28:199-210. [PMID: 36153276 DOI: 10.1016/j.tplants.2022.08.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 03/08/2022] [Revised: 08/15/2022] [Accepted: 08/23/2022] [Indexed: 06/16/2023]
Abstract
Some of the biological knowledge obtained from fundamental research will be implemented in applied plant breeding. To bridge basic research and breeding practice, machine learning (ML) holds great promise to translate biological knowledge and omics data into precision-designed plant breeding. Here, we review ML for multi-omics analysis in plants, including data dimensionality reduction, inference of gene-regulation networks, and gene discovery and prioritization. These applications will facilitate understanding trait regulation mechanisms and identifying target genes potentially applicable to knowledge-driven molecular design breeding. We also highlight applications of deep learning in plant phenomics and ML in genomic selection-assisted breeding, such as various ML algorithms that model the correlations among genotypes (genes), phenotypes (traits), and environments, to ultimately achieve data-driven genomic design breeding.
Affiliation(s)
- Jun Yan
  - National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
- Xiangfeng Wang
  - National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
|
20
|
Guo T, Li X. Machine learning for predicting phenotype from genotype and environment. Curr Opin Biotechnol 2023; 79:102853. [PMID: 36463837 DOI: 10.1016/j.copbio.2022.102853] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/03/2022] [Revised: 11/01/2022] [Accepted: 11/07/2022] [Indexed: 12/03/2022]
Abstract
Predicting phenotype with genomic and environmental information is critically needed and challenging. Machine learning methods have emerged as powerful tools to make accurate predictions from large and complex biological data. Here, we review the progress of phenotype prediction models enabled or improved by machine learning methods. We categorized the applications into three scenarios: prediction with genotypic information, with environmental information, and with both. In each scenario, we illustrate the practicality of prediction models, the advantages of machine learning, and the challenges of modeling complex relationships. We discuss the promising potential of leveraging machine learning and genetics theories to develop models that can predict phenotype and also interpret the biological consequences of changes in genotype and environment.
Affiliation(s)
- Tingting Guo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Xianran Li
  - USDA, Agricultural Research Service, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA 99164, USA; Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA
|
21
|
Li Z, Gao E, Zhou J, Han W, Xu X, Gao X. Applications of deep learning in understanding gene regulation. Cell Rep Methods 2023; 3:100384. [PMID: 36814848 PMCID: PMC9939384 DOI: 10.1016/j.crmeth.2022.100384] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 05/22/2023]
Abstract
Gene regulation is a central topic in cell biology. Advances in omics technologies and the accumulation of omics data have provided better opportunities for gene regulation studies than ever before. For this reason deep learning, as a data-driven predictive modeling approach, has been successfully applied to this field during the past decade. In this article, we aim to give a brief yet comprehensive overview of representative deep-learning methods for gene regulation. Specifically, we discuss and compare the design principles and datasets used by each method, creating a reference for researchers who wish to replicate or improve existing methods. We also discuss the common problems of existing approaches and prospectively introduce the emerging deep-learning paradigms that will potentially alleviate them. We hope that this article will provide a rich and up-to-date resource and shed light on future research directions in this area.
Collapse
Affiliation(s)
- Zhongxiao Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Elva Gao
- The KAUST School, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Wenkai Han
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xiaopeng Xu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
|
22
|
Jubair S, Domaratzki M. Crop genomic selection with deep learning and environmental data: A survey. Front Artif Intell 2023; 5:1040295. [PMID: 36703955 PMCID: PMC9871498 DOI: 10.3389/frai.2022.1040295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 12/22/2022] [Indexed: 01/12/2023] Open
Abstract
Machine learning techniques for crop genomic selections, especially for single-environment plants, are well-developed. These machine learning models, which use dense genome-wide markers to predict phenotype, routinely perform well on single-environment datasets, especially for complex traits affected by multiple markers. On the other hand, machine learning models for predicting crop phenotype, especially deep learning models, using datasets that span different environmental conditions, have only recently emerged. Models that can accept heterogeneous data sources, such as temperature, soil conditions and precipitation, are natural choices for modeling GxE in multi-environment prediction. Here, we review emerging deep learning techniques that incorporate environmental data directly into genomic selection models.
Affiliation(s)
- Sheikh Jubair
- Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
- *Correspondence: Sheikh Jubair
- Mike Domaratzki
- Department of Computer Science, University of Western Ontario, London, ON, Canada
|
23
|
Wang K, Abid MA, Rasheed A, Crossa J, Hearne S, Li H. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. MOLECULAR PLANT 2023; 16:279-293. [PMID: 36366781 DOI: 10.1016/j.molp.2022.11.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/06/2022] [Revised: 09/28/2022] [Accepted: 11/08/2022] [Indexed: 06/16/2023]
Abstract
Genomic prediction is an effective way to accelerate the rate of agronomic trait improvement in plants. Traditional methods typically use linear regression models with clear assumptions; such methods are unable to capture the complex relationships between genotypes and phenotypes. Non-linear models (e.g., deep neural networks) have been proposed as a superior alternative to linear models because they can capture complex non-additive effects. Here we introduce a deep learning (DL) method, deep neural network genomic prediction (DNNGP), for integration of multi-omics data in plants. We trained DNNGP on four datasets and compared its performance with methods built with five classic models: genomic best linear unbiased prediction (GBLUP); two methods based on a machine learning (ML) framework, light gradient boosting machine (LightGBM) and support vector regression (SVR); and two methods based on a DL framework, deep learning genomic selection (DeepGS) and deep learning genome-wide association study (DLGWAS). DNNGP is novel in five ways. First, it can be applied to a variety of omics data to predict phenotypes. Second, the multilayered hierarchical structure of DNNGP dynamically learns features from raw data, avoiding overfitting and improving the convergence rate using a batch normalization layer and early stopping and rectified linear activation (rectified linear unit) functions. Third, when small datasets were used, DNNGP produced results that are competitive with results from the other five methods, showing greater prediction accuracy than the other methods when large-scale breeding data were used. Fourth, the computation time required by DNNGP was comparable with that of commonly used methods, up to 10 times faster than DeepGS. Fifth, hyperparameters can easily be batch tuned on a local machine. Compared with GBLUP, LightGBM, SVR, DeepGS and DLGWAS, DNNGP is superior to these existing widely used genomic selection (GS) methods. 
Moreover, DNNGP can generate robust assessments from diverse datasets, including omics data, and quickly incorporate complex and large datasets into usable models, making it a promising and practical approach for straightforward integration into existing GS platforms.
Affiliation(s)
- Kelin Wang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Awais Rasheed
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Department of Plant Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, Texcoco, D.F. 06600, Mexico
- Sarah Hearne
- International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, Texcoco, D.F. 06600, Mexico
- Huihui Li
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China.
|
24
|
Nazzicari N, Biscarini F. Stacked kinship CNN vs. GBLUP for genomic predictions of additive and complex continuous phenotypes. Sci Rep 2022; 12:19889. [PMID: 36400808 PMCID: PMC9674857 DOI: 10.1038/s41598-022-24405-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022] Open
Abstract
Deep learning is impacting many fields of data science with often spectacular results. However, its application to whole-genome predictions in plant and animal science or in human biology has been rather limited, with mostly underwhelming results. While most works focus on exploring alternative network architectures, in this study we propose an innovative representation of marker genotype data and tested it against the GBLUP (Genomic BLUP) benchmark with linear and nonlinear phenotypes. From publicly available cattle SNP genotype data, different types of genomic kinship matrices are stacked together in a 3D pile from where 2D grayscale slices are extracted and fed to a deep convolutional neural network (DNN). We simulated nine phenotype scenarios with combinations of additivity, dominance and epistasis, and compared the DNN to GBLUP-A (computed using only the additive kinship matrix) and GBLUP-optim (additive, dominance, and epistasis kinship matrices, as needed). Results varied depending on the accuracy metric employed, with DNN performing better in terms of root mean squared error (1-12% lower than GBLUP-A; 1-9% lower than GBLUP-optim) but worse in terms of Pearson's correlation (0.505 for DNN compared to 0.672 and 0.669 of GBLUP-A and GBLUP-optim for fully additive case; 0.274 for DNN, 0.279 for GBLUP-A, and 0.477 for GBLUP-optim for fully dominant case). The proposed approach offers a basis to explore further the application of DNN to tabular data in whole-genome predictions.
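The abstract above reports that the model ranking depended on the metric: lower root mean squared error for the DNN, but lower Pearson correlation. A minimal sketch of how the two metrics can disagree is given below. The toy vectors and the names `close_but_misranked` and `well_ranked_but_biased` are illustrative assumptions, not data or code from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy phenotypes and two hypothetical sets of predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
close_but_misranked = [2.0, 1.0, 3.0, 5.0, 4.0]       # small errors, some rank swaps
well_ranked_but_biased = [3.0, 5.0, 7.0, 9.0, 11.0]   # perfect ranking, wrong scale

for name, pred in [("close_but_misranked", close_but_misranked),
                   ("well_ranked_but_biased", well_ranked_but_biased)]:
    print(name, round(rmse(observed, pred), 3), round(pearson(observed, pred), 3))
```

RMSE penalizes errors of scale and location, whereas Pearson correlation only reflects linear agreement, so a method can win on one metric and lose on the other.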
Affiliation(s)
- Nelson Nazzicari
- CREA Council for Agricultural Research and Analysis of Agricultural Economics, Research Centre for Animal Production and Aquaculture, Viale Piacenza 29, 26900 Lodi, Italy
- Filippo Biscarini
- CNR: National Research Council, Institute of Agricultural Biology and Biotechnology, Via Bassini 15, Milan, 20133 Italy
|
25
|
Perez BC, Bink MCAM, Svenson KL, Churchill GA, Calus MPL. Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence. G3 (BETHESDA, MD.) 2022; 12:jkac258. [PMID: 36161485 PMCID: PMC9635642 DOI: 10.1093/g3journal/jkac258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Accepted: 09/07/2022] [Indexed: 06/16/2023]
Abstract
Recent developments allowed generating multiple high-quality 'omics' data that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we have assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using the best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model named GTCBLUP that aims to remove between-omics-layer covariance from predictors, whereas its counterpart GTBLUP does not do that. While gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed the best linear unbiased prediction models for most traits. Models leveraging gene transcripts captured higher proportions of the phenotypic variance for almost all traits when these were measured closer to the moment of measuring gene transcripts in the liver. In most cases, the combination of layers was not able to outperform the best single-omics models to predict phenotypes. Using only gene transcripts, the gradient boosting machine model was able to outperform best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model was not able to produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for prediction of phenotypes and using the GTCBLUP for prediction of breeding values.
Affiliation(s)
- Bruno C Perez
- Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Marco C A M Bink
- Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Mario P L Calus
- Corresponding author: Animal Breeding and Genomics, Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands.
|
26
|
Mbo Nkoulou LF, Ngalle HB, Cros D, Adje COA, Fassinou NVH, Bell J, Achigan-Dako EG. Perspective for genomic-enabled prediction against black sigatoka disease and drought stress in polyploid species. FRONTIERS IN PLANT SCIENCE 2022; 13:953133. [PMID: 36388523 PMCID: PMC9650417 DOI: 10.3389/fpls.2022.953133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Accepted: 09/28/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection (GS) in plant breeding is explored as a promising tool to solve the problems related to biotic and abiotic threats. Polyploid plants like bananas (Musa spp.) face the problems of drought and black sigatoka disease (BSD), which restrict their production. Conventional plant breeding is experiencing difficulties, particularly phenotyping costs and long generation intervals. To overcome these difficulties, GS in plant breeding is explored as an alternative with great potential for reducing costs and time in the selection process. So far, GS has not had the same success in polyploid plants as in diploid plants because of the complexity of their genomes. In this review, we present the main constraints to the application of GS in polyploid plants and the prospects for overcoming these constraints. Particular emphasis is placed on breeding for BSD and drought, two major threats to banana production, with banana used in this review as a model polyploid plant. It emerges that the difficulty in obtaining good-quality markers in polyploids is the first challenge of GS in polyploid plants, because the main tools used were developed for diploid species. In addition, there is the major challenge of mastering genetic interactions such as dominance and epistasis effects, as well as genotype-by-environment interaction, which are very common in polyploid plants. To get around these challenges, we present bioinformatics tools as well as artificial intelligence approaches, including machine learning. Furthermore, a scheme for applying GS to banana for BSD and drought is proposed. This review is of paramount importance for breeding programs that seek to reduce the selection cycle of polyploids despite the complexity of their genomes.
Affiliation(s)
- Luther Fort Mbo Nkoulou
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- Institute of Agricultural Research for Development, Centre de Recherche Agricole de Mbalmayo (CRAM), Mbalmayo, Cameroon
- Hermine Bille Ngalle
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- David Cros
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, Montpellier, France
- Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, University of Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Institut Agro, Montpellier, France
- Charlotte O. A. Adje
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Nicodeme V. H. Fassinou
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Joseph Bell
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- Enoch G. Achigan-Dako
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
|
27
|
A divide-and-conquer approach for genomic prediction in rubber tree using machine learning. Sci Rep 2022; 12:18023. [PMID: 36289298 PMCID: PMC9605989 DOI: 10.1038/s41598-022-20416-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2022] [Accepted: 09/13/2022] [Indexed: 01/20/2023] Open
Abstract
Rubber tree (Hevea brasiliensis) is the main feedstock for commercial rubber; however, its long vegetative cycle has hindered the development of more productive varieties via breeding programs. With the availability of H. brasiliensis genomic data, several linkage maps with associated quantitative trait loci have been constructed and suggested as a tool for marker-assisted selection. Nonetheless, novel genomic strategies are still needed, and genomic selection (GS) may facilitate rubber tree breeding programs aimed at reducing the required cycles for performance assessment. Even though such a methodology has already been shown to be a promising tool for rubber tree breeding, increased model predictive capabilities and practical application are still needed. Here, we developed a novel machine learning-based approach for predicting rubber tree stem circumference based on molecular markers. Through a divide-and-conquer strategy, we propose a neural network prediction system with two stages: (1) subpopulation prediction and (2) phenotype estimation. This approach yielded higher accuracies than traditional statistical models in a single-environment scenario. By delivering large accuracy improvements, our methodology represents a powerful tool for use in Hevea GS strategies. Therefore, the incorporation of machine learning techniques into rubber tree GS represents an opportunity to build more robust models and optimize Hevea breeding programs.
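The two-stage idea in the abstract above (subpopulation prediction, then phenotype estimation) can be illustrated with a deliberately simplified sketch. The study uses neural networks for both stages. The version below swaps in a nearest-centroid classifier and a per-subpopulation mean purely as stand-ins, and it assumes that stage 2 is conditioned on the stage 1 label, which is one possible reading of the abstract. All function names and data are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length genotype vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_two_stage(genotypes, groups, phenotypes):
    """Store, per subpopulation, a genotype centroid and a mean phenotype."""
    by_group = {}
    for genotype, label, value in zip(genotypes, groups, phenotypes):
        by_group.setdefault(label, []).append((genotype, value))
    model = {}
    for label, members in by_group.items():
        rows = [g for g, _ in members]
        values = [v for _, v in members]
        model[label] = (centroid(rows), sum(values) / len(values))
    return model

def predict_two_stage(model, genotype):
    """Stage 1: pick the nearest subpopulation. Stage 2: return its estimate."""
    label = min(model, key=lambda k: squared_distance(model[k][0], genotype))
    return label, model[label][1]

# Hypothetical toy data: four individuals, three markers, two subpopulations.
genotypes = [[0, 0, 1], [0, 1, 1], [2, 2, 1], [2, 1, 2]]
groups = ["A", "A", "B", "B"]
phenotypes = [10.0, 12.0, 20.0, 22.0]

model = fit_two_stage(genotypes, groups, phenotypes)
print(predict_two_stage(model, [2, 2, 2]))
```

In a realistic setting each stage would be a trained model evaluated under cross-validation; the sketch only shows how the output of the first stage routes an individual to a subpopulation-specific estimator.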
|
28
|
Kim KW, Nawade B, Nam J, Chu SH, Ha J, Park YJ. Development of an inclusive 580K SNP array and its application for genomic selection and genome-wide association studies in rice. FRONTIERS IN PLANT SCIENCE 2022; 13:1036177. [PMID: 36352876 PMCID: PMC9637963 DOI: 10.3389/fpls.2022.1036177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2022] [Accepted: 09/30/2022] [Indexed: 06/16/2023]
Abstract
Rice is a globally cultivated crop and is a staple food source for more than half of the world's population. Various single-nucleotide polymorphism (SNP) arrays have been developed and utilized as standard genotyping methods for rice breeding research. Considering the importance of SNP arrays with more inclusive genetic information for GWAS and genomic selection, we integrated SNPs from eight different data resources: resequencing data from the Korean World Rice Collection (KRICE) of 475 accessions, 3,000 rice genome project (3 K-RGP) data, 700 K high-density rice array, Affymetrix 44 K SNP array, QTARO, Reactome, and plastid and GMO information. The collected SNPs were filtered and selected based on the breeder's interest, covering all key traits or research areas to develop an integrated array system representing inclusive genomic polymorphisms. A total of 581,006 high-quality SNPs were synthesized with an average distance of 200 bp between adjacent SNPs, generating a 580 K Axiom Rice Genotyping Chip (580K_KNU chip). Further validation of this array on 4,720 genotypes revealed robust and highly efficient genotyping. This has also been demonstrated in genome-wide association studies (GWAS) and genomic selection (GS) of three traits: culm length, heading date, and panicle length. Several significantly associated SNPs (cut-off: -log10 p-value > 7.0) were detected in GWAS, and the GS predictabilities for the three traits were more than 0.5 in both rrBLUP and convolutional neural network (CNN) models. The Axiom 580 K Genotyping array will provide a cost-effective genotyping platform and accelerate rice GWAS and GS studies.
Affiliation(s)
- Kyu-Won Kim
- Center for Crop Breeding on Omics and Artificial Intelligence, Kongju National University, Yesan, South Korea
- Bhagwat Nawade
- Center for Crop Breeding on Omics and Artificial Intelligence, Kongju National University, Yesan, South Korea
- Jungrye Nam
- Center for Crop Breeding on Omics and Artificial Intelligence, Kongju National University, Yesan, South Korea
- Sang-Ho Chu
- Center for Crop Breeding on Omics and Artificial Intelligence, Kongju National University, Yesan, South Korea
- Jungmin Ha
- Department of Plant Science, Gangneung-Wonju National University, Gangneung, South Korea
- Yong-Jin Park
- Center for Crop Breeding on Omics and Artificial Intelligence, Kongju National University, Yesan, South Korea
- Department of Plant Resources, College of Industrial Sciences, Kongju National University, Yesan, South Korea
|
29
|
Zhang F, Kang J, Long R, Li M, Sun Y, He F, Jiang X, Yang C, Yang X, Kong J, Wang Y, Wang Z, Zhang Z, Yang Q. Application of machine learning to explore the genomic prediction accuracy of fall dormancy in autotetraploid alfalfa. HORTICULTURE RESEARCH 2022; 10:uhac225. [PMID: 36643744 PMCID: PMC9832841 DOI: 10.1093/hr/uhac225] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/15/2022] [Accepted: 09/25/2022] [Indexed: 06/17/2023]
Abstract
Fall dormancy (FD) is an essential trait to overcome winter damage and for alfalfa (Medicago sativa) cultivar selection. The plant regrowth height after autumn clipping is an indirect way to evaluate FD. Transcriptomics, proteomics, and quantitative trait locus mapping have revealed crucial genes correlated with FD; however, these genes cannot predict alfalfa FD very well. Here, we conducted genomic prediction of FD using whole-genome SNP markers based on machine learning-related methods, including support vector machine (SVM) regression, and regularization-related methods, such as Lasso and ridge regression. The results showed that using SVM regression with linear kernel and the top 3000 genome-wide association study (GWAS)-associated markers achieved the highest prediction accuracy for FD of 64.1%. For plant regrowth height, the prediction accuracy was 59.0% using the 3000 GWAS-associated markers and the SVM linear model. This was better than the results using whole-genome markers (25.0%). Therefore, the method we explored for alfalfa FD prediction outperformed the other models, such as Lasso and ElasticNet. The study suggests the feasibility of using machine learning to predict FD with GWAS-associated markers, and the GWAS-associated markers combined with machine learning would benefit FD-related traits as well. Application of the methodology may provide potential targets for FD selection, which would accelerate genetic research and molecular breeding of alfalfa with optimized FD.
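The marker-preselection step described in the abstract above (keeping a top-ranked subset of markers before fitting a predictor) can be sketched as follows. This is only an illustration under stated assumptions: absolute Pearson correlation stands in for the GWAS ranking used in the study, the data are simulated with dosages 0 to 4 to mimic an autotetraploid, and `top_k_markers` is a hypothetical helper. Ranking markers on the training individuals only is an assumption made here to avoid leaking test information into the selection; the abstract does not state how this was handled.

```python
import math
import random

def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 for a monomorphic marker."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))

def top_k_markers(genotypes, phenotypes, k):
    """Return the sorted indices of the k markers most associated with the trait."""
    n_markers = len(genotypes[0])
    scored = []
    for j in range(n_markers):
        column = [row[j] for row in genotypes]
        scored.append((abs_correlation(column, phenotypes), j))
    scored.sort(reverse=True)
    return sorted(j for _, j in scored[:k])

# Simulated data (hypothetical): 200 individuals, 20 markers, two causal markers.
random.seed(0)
genotypes = [[random.randint(0, 4) for _ in range(20)] for _ in range(200)]
phenotypes = [2.0 * row[3] - 1.5 * row[11] + random.gauss(0.0, 0.5)
              for row in genotypes]

# Rank markers on the training individuals only, then subset the held-out rows.
train_genotypes, train_phenotypes = genotypes[:150], phenotypes[:150]
selected = top_k_markers(train_genotypes, train_phenotypes, k=2)
test_reduced = [[row[j] for j in selected] for row in genotypes[150:]]
print(selected)
```

The reduced marker matrix would then be passed to the chosen predictor, which in the study was support vector regression with a linear kernel.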
Affiliation(s)
- Fan Zhang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, USA, 99163
- Junmei Kang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Ruicai Long
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Mingna Li
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Yan Sun
- Department of Turf Science and Engineering, College of Grassland Science and Technology, China Agricultural University, Beijing, China, 100193
- Fei He
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Xueqian Jiang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Changfu Yang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Xijiang Yang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Jie Kong
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Yiwen Wang
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia, 3052
- Zhen Wang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, 100193
- Zhiwu Zhang
- Corresponding author: Zhiwu Zhang (Phone (Office): 509-335-2899, Fax: 509-335-8674) or Qingchuan Yang (Phone: 010-62815996, Fax: 010-62815996)
- Qingchuan Yang
- Corresponding author: Zhiwu Zhang (Phone (Office): 509-335-2899, Fax: 509-335-8674) or Qingchuan Yang (Phone: 010-62815996, Fax: 010-62815996)
|
30
|
Genomic prediction through machine learning and neural networks for traits with epistasis. Comput Struct Biotechnol J 2022; 20:5490-5499. [PMID: 36249559 PMCID: PMC9547190 DOI: 10.1016/j.csbj.2022.09.029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/12/2022] [Revised: 09/20/2022] [Accepted: 09/20/2022] [Indexed: 11/22/2022] Open
Abstract
Performance of machine learning and neural networks in genomic analysis. Impact of heritability and QTL number on the performance of machine learning methods. Machine learning models in genomic analyses. Neural networks can perform better for complex quantitative traits.
Genome-wide selection (GWS) is one of the contributions of molecular genetics to breeding. Machine learning (ML) and artificial neural network (ANN) methods are nonparametric and can develop more accurate and parsimonious models for GWS analysis. Multivariate Adaptive Regression Splines (MARS) is considered one of the most flexible ML methods, automatically modeling nonlinearities and interactions of the predictor variables. This study aimed to evaluate and compare methods based on ANN, ML (including MARS), and G-BLUP for GWS. An F2 population of 1000 individuals, genotyped for 4010 SNP markers, was simulated together with twelve traits from a model considering epistatic effects, with QTL numbers ranging from eight to 480 and heritability (h2) of 0.3, 0.5, or 0.8. Variation in heritability and number of QTLs affects the performance of the methods. For quantitative traits (40, 80, 120, 240, and 480 QTLs), the highest R2 was observed for the Radial Basis Function network (RBF) and G-BLUP, followed by Random Forest (RF), Bagging (BA), and Boosting (BO). RF and BA also showed better results for traits with h2 of 0.3, with R2 values of 16.51% and 16.30%, respectively, while MARS methods showed better results for oligogenic traits, with R2 values ranging from 39.12% to 43.20% at h2 of 0.5 and from 59.92% to 78.56% at h2 of 0.8. Non-additive MARS methods also showed high R2 for traits with high heritability and 240 QTLs or more. ANN and ML methods are powerful tools for predicting genetic values in traits with epistatic effects, across different degrees of heritability and QTL numbers.
|
31
|
Cortés AJ, López-Hernández F, Blair MW. Genome–Environment Associations, an Innovative Tool for Studying Heritable Evolutionary Adaptation in Orphan Crops and Wild Relatives. Front Genet 2022; 13:910386. [PMID: 35991553 PMCID: PMC9389289 DOI: 10.3389/fgene.2022.910386] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/01/2022] [Accepted: 05/30/2022] [Indexed: 11/23/2022] Open
Abstract
Leveraging innovative tools to speed up prebreeding and discovery of genotypic sources of adaptation from landraces, crop wild relatives, and orphan crops is a key prerequisite to accelerate genetic gain of abiotic stress tolerance in annual crops such as legumes and cereals, many of which are still orphan species despite advances in major row crops. Here, we review a novel, interdisciplinary approach to combine ecological climate data with evolutionary genomics under the paradigm of a new field of study: genome–environment associations (GEAs). We first exemplify how GEA utilizes in situ georeferencing from genotypically characterized, gene bank accessions to pinpoint genomic signatures of natural selection. We later discuss the necessity to update the current GEA models to predict both regional- and local- or micro-habitat–based adaptation with mechanistic ecophysiological climate indices and cutting-edge GWAS-type genetic association models. Furthermore, to account for polygenic evolutionary adaptation, we encourage the community to start gathering genomic estimated adaptive values (GEAVs) for genomic prediction (GP) and multi-dimensional machine learning (ML) models. The latter two should ideally be weighted by de novo GWAS-based GEA estimates and optimized for a scalable marker subset. We end the review by envisioning avenues to make adaptation inferences more robust through the merging of high-resolution data sources, such as environmental remote sensing and summary statistics of the genomic site frequency spectrum, with the epigenetic molecular functionality responsible for plastic inheritance in the wild. Ultimately, we believe that coupling evolutionary adaptive predictions with innovations in ecological genomics such as GEA will help capture hidden genetic adaptations to abiotic stresses based on crop germplasm resources to assist responses to climate change. 
“I shall endeavor to find out how nature’s forces act upon one another, and in what manner the geographic environment exerts its influence on animals and plants. In short, I must find out about the harmony in nature” Alexander von Humboldt—Letter to Karl Freiesleben, June 1799.
Affiliation(s)
- Andrés J. Cortés
- Corporacion Colombiana de Investigacion Agropecuaria AGROSAVIA, C.I. La Selva, Rionegro, Colombia
- *Correspondence: Andrés J. Cortés; Matthew W. Blair
- Felipe López-Hernández
- Corporacion Colombiana de Investigacion Agropecuaria AGROSAVIA, C.I. La Selva, Rionegro, Colombia
- Matthew W. Blair
- Department of Agricultural & Environmental Sciences, Tennessee State University, Nashville, TN, United States
- *Correspondence: Andrés J. Cortés; Matthew W. Blair
|
32
|
A joint learning approach for genomic prediction in polyploid grasses. Sci Rep 2022; 12:12499. [PMID: 35864135 PMCID: PMC9304331 DOI: 10.1038/s41598-022-16417-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/20/2022] [Accepted: 07/11/2022] [Indexed: 12/20/2022] Open
Abstract
Poaceae, among the most abundant plant families, includes many economically important polyploid species, such as forage grasses and sugarcane (Saccharum spp.). These species have elevated genomic complexities and limited genetic resources, hindering the application of marker-assisted selection strategies. Currently, the most promising approach for increasing genetic gains in plant breeding is genomic selection. However, due to the polyploid nature of these species, more accurate models for incorporating genomic selection into breeding schemes are needed. This study aims to develop a machine learning method by using a joint learning approach to predict complex traits from genotypic data. Biparental populations of sugarcane and two species of forage grasses (Urochloa decumbens, Megathyrsus maximus) were genotyped, and several quantitative traits were measured. High-quality markers were used to predict several traits in different cross-validation scenarios. By combining classification and regression strategies, we developed a predictive system with promising results. Compared with traditional genomic prediction methods, the proposed strategy achieved accuracy improvements exceeding 50%. Our results suggest that the developed methodology could be implemented in breeding programs, helping reduce breeding cycles and increase genetic gains.
33
Danilevicz MF, Gill M, Anderson R, Batley J, Bennamoun M, Bayer PE, Edwards D. Plant Genotype to Phenotype Prediction Using Machine Learning. Front Genet 2022; 13:822173. [PMID: 35664329 PMCID: PMC9159391 DOI: 10.3389/fgene.2022.822173] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/25/2021] [Accepted: 03/07/2022] [Indexed: 12/13/2022] Open
Abstract
Genomic prediction tools support crop breeding based on statistical methods, such as the genomic best linear unbiased prediction (GBLUP). However, these tools are not designed to capture non-linear relationships within multi-dimensional datasets, or deal with high dimension datasets such as imagery collected by unmanned aerial vehicles. Machine learning (ML) algorithms have the potential to surpass the prediction accuracy of current tools used for genotype to phenotype prediction, due to their capacity to autonomously extract data features and represent their relationships at multiple levels of abstraction. This review addresses the challenges of applying statistical and machine learning methods for predicting phenotypic traits based on genetic markers, environment data, and imagery for crop breeding. We present the advantages and disadvantages of explainable model structures, discuss the potential of machine learning models for genotype to phenotype prediction in crop breeding, and the challenges, including the scarcity of high-quality datasets, inconsistent metadata annotation and the requirements of ML models.
Affiliation(s)
- Monica F. Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mitchell Gill
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Robyn Anderson
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mohammed Bennamoun
- School of Physics, Mathematics and Computing, University of Western Australia, Perth, WA, Australia
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- *Correspondence: David Edwards
34
Wang X, Shi S, Wang G, Luo W, Wei X, Qiu A, Luo F, Ding X. Using machine learning to improve the accuracy of genomic prediction of reproduction traits in pigs. J Anim Sci Biotechnol 2022; 13:60. [PMID: 35578371 PMCID: PMC9112588 DOI: 10.1186/s40104-022-00708-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/15/2021] [Accepted: 03/13/2022] [Indexed: 12/02/2022] Open
Abstract
Background: Recently, machine learning (ML) has become attractive in genomic prediction, but its superiority in genomic prediction over conventional (ss)GBLUP methods and the choice of optimal ML methods need to be investigated. Results: In this study, 2566 Chinese Yorkshire pigs with reproduction trait records were genotyped with the GenoBaits Porcine SNP 50 K and PorcineSNP50 panels. Four ML methods, including support vector regression (SVR), kernel ridge regression (KRR), random forest (RF) and Adaboost.R2, were implemented. Through 20 replicates of fivefold cross-validation (CV) and one prediction for younger individuals, the utility of ML methods in genomic prediction was explored. In CV, ML methods significantly outperformed genomic BLUP (GBLUP), single-step GBLUP (ssGBLUP) and the Bayesian method BayesHE, improving genomic prediction accuracy over GBLUP, ssGBLUP, and BayesHE by 19.3%, 15.0% and 20.8%, respectively. In addition, ML methods yielded smaller mean squared error (MSE) and mean absolute error (MAE) in all scenarios. ssGBLUP yielded an improvement of 3.8% on average in accuracy compared to that of GBLUP, and the accuracy of BayesHE was close to that of GBLUP. In genomic prediction of younger individuals, RF and Adaboost.R2_KRR performed better than GBLUP and BayesHE, while ssGBLUP performed comparably with RF. ssGBLUP yielded slightly higher accuracy and lower MSE than Adaboost.R2_KRR in the prediction of total number of piglets born, whereas for number of piglets born alive, Adaboost.R2_KRR performed significantly better than ssGBLUP. Among ML methods, Adaboost.R2_KRR consistently performed well in our study. Our findings also demonstrated that optimal hyperparameters are useful for ML methods. After tuning hyperparameters in CV and in predicting genomic outcomes of younger individuals, the average improvement was 14.3% and 21.8% over those using default hyperparameters, respectively.
Conclusion: Our findings demonstrated that ML methods had better overall prediction performance than conventional genomic selection methods, and could be new options for genomic prediction. Among ML methods, Adaboost.R2_KRR consistently performed well in our study, and tuning hyperparameters is necessary for ML methods. The optimal hyperparameters depend on the character of traits, datasets etc.
Affiliation(s)
- Xue Wang
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Shaolei Shi
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Guijiang Wang
- Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang, Hebei, China
- Wenxue Luo
- Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang, Hebei, China
- Xia Wei
- Zhangjiakou Dahao Heshan New Agricultural Development Co., Ltd, Zhangjiakou, Hebei, China
- Ao Qiu
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Fei Luo
- Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang, Hebei, China
- Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
35
Mathew B, Hauptmann A, Léon J, Sillanpää MJ. NeuralLasso: Neural Networks Meet Lasso in Genomic Prediction. Front Plant Sci 2022; 13:800161. [PMID: 35574107 PMCID: PMC9100816 DOI: 10.3389/fpls.2022.800161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/22/2021] [Accepted: 03/18/2022] [Indexed: 06/15/2023]
Abstract
Prediction of complex traits based on genome-wide marker information is of central importance for both animal and plant breeding. Numerous models have been proposed for the prediction of complex traits, and considerable effort is still being devoted to improving their prediction accuracy, because various genetic factors such as additive, dominance and epistatic effects can influence it. Recently, machine learning (ML) methods have been widely applied for prediction in both animal and plant breeding programs. In this study, we propose a new algorithm for genomic prediction which is based on neural networks, but incorporates classical elements of LASSO. Our new method is able to account for local epistasis (higher-order interaction between neighboring markers) in the prediction. We compare the prediction accuracy of our new method with the most commonly used prediction methods, such as BayesA, BayesB, Bayesian Lasso (BL), genomic BLUP and Elastic Net (EN), using the heterogeneous stock mouse and rice field data sets.
Affiliation(s)
- Boby Mathew
- Bayer CropScience, Monheim am Rhein, Germany
- Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
- Andreas Hauptmann
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
- Department of Computer Science, University College London, London, United Kingdom
- Jens Léon
- Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
- Mikko J. Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
36
Genome-Enabled Prediction Methods Based on Machine Learning. Methods Mol Biol 2022; 2467:189-218. [PMID: 35451777 DOI: 10.1007/978-1-0716-2205-6_7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/18/2022]
Abstract
Growth of artificial intelligence and machine learning (ML) methodology has been explosive in recent years. In this class of procedures, computers get knowledge from sets of experiences and provide forecasts or classification. In genome-wide based prediction (GWP), many ML studies have been carried out. This chapter provides a description of main semiparametric and nonparametric algorithms used in GWP in animals and plants. Thirty-four ML comparative studies conducted in the last decade were used to develop a meta-analysis through a Thurstonian model, to evaluate algorithms with the best predictive qualities. It was found that some kernel, Bayesian, and ensemble methods displayed greater robustness and predictive ability. However, the type of study and data distribution must be considered in order to choose the most appropriate model for a given problem.
37
Edger PP, Iorizzo M, Bassil NV, Benevenuto J, Ferrão LFV, Giongo L, Hummer K, Lawas LMF, Leisner CP, Li C, Munoz PR, Ashrafi H, Atucha A, Babiker EM, Canales E, Chagné D, DeVetter L, Ehlenfeldt M, Espley RV, Gallardo K, Günther CS, Hardigan M, Hulse-Kemp AM, Jacobs M, Lila MA, Luby C, Main D, Mengist MF, Owens GL, Perkins-Veazie P, Polashock J, Pottorff M, Rowland LJ, Sims CA, Song GQ, Spencer J, Vorsa N, Yocca AE, Zalapa J. There and back again; historical perspective and future directions for Vaccinium breeding and research studies. Hortic Res 2022; 9:uhac083. [PMID: 35611183 PMCID: PMC9123236 DOI: 10.1093/hr/uhac083] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/19/2022] [Accepted: 03/22/2022] [Indexed: 06/02/2023]
Abstract
The genus Vaccinium L. (Ericaceae) contains a wide diversity of culturally and economically important berry crop species. Consumer demand and scientific research in blueberry (Vaccinium spp.) and cranberry (Vaccinium macrocarpon) have increased worldwide over the crops' relatively short domestication history (~100 years). Other species, including bilberry (Vaccinium myrtillus), lingonberry (Vaccinium vitis-idaea), and ohelo berry (Vaccinium reticulatum) are largely still harvested from the wild but with crop improvement efforts underway. Here, we present a review article on these Vaccinium berry crops on topics that span taxonomy to genetics and genomics to breeding. We highlight the accomplishments made thus far for each of these crops, along their journey from the wild, and propose research areas and questions that will require investments by the community over the coming decades to guide future crop improvement efforts. New tools and resources are needed to underpin the development of superior cultivars that are not only more resilient to various environmental stresses and higher yielding, but also produce fruit that continue to meet a variety of consumer preferences, including fruit quality and health related traits.
Affiliation(s)
- Patrick P Edger
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- MSU AgBioResearch, Michigan State University, East Lansing, MI, 48824, USA
- Massimo Iorizzo
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC USA
- Department of Horticultural Science, North Carolina State University, Raleigh, NC USA
- Nahla V Bassil
- USDA-ARS, National Clonal Germplasm Repository, Corvallis, OR 97333, USA
- Juliana Benevenuto
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Luis Felipe V Ferrão
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Lara Giongo
- Fondazione Edmund Mach - Research and Innovation Centre, Italy
- Kim Hummer
- USDA-ARS, National Clonal Germplasm Repository, Corvallis, OR 97333, USA
- Lovely Mae F Lawas
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
- Courtney P Leisner
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
- Changying Li
- Phenomics and Plant Robotics Center, College of Engineering, University of Georgia, Athens, USA
- Patricio R Munoz
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Hamid Ashrafi
- Department of Horticultural Science, North Carolina State University, Raleigh, NC USA
- Amaya Atucha
- Department of Horticulture, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Ebrahiem M Babiker
- USDA-ARS Southern Horticultural Laboratory, Poplarville, MS 39470-0287, USA
- Elizabeth Canales
- Department of Agricultural Economics, Mississippi State University, Mississippi State, MS 39762, USA
- David Chagné
- The New Zealand Institute for Plant and Food Research Limited (PFR), Palmerston North, New Zealand
- Lisa DeVetter
- Department of Horticulture, Washington State University Northwestern Washington Research and Extension Center, Mount Vernon, WA, 98221, USA
- Mark Ehlenfeldt
- SEBS, Plant Biology, Rutgers University, New Brunswick NJ 01019 USA
- Richard V Espley
- The New Zealand Institute for Plant and Food Research Limited (PFR), Palmerston North, New Zealand
- Karina Gallardo
- School of Economic Sciences, Washington State University, Puyallup, WA 98371, USA
- Catrin S Günther
- The New Zealand Institute for Plant and Food Research Limited (PFR), Palmerston North, New Zealand
- Michael Hardigan
- USDA-ARS, Horticulture Crops Research Unit, Corvallis, OR 97333, USA
- Amanda M Hulse-Kemp
- USDA-ARS, Genomics and Bioinformatics Research Unit, Raleigh, NC 27695, USA
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- MacKenzie Jacobs
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48823, USA
- Mary Ann Lila
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC USA
- Claire Luby
- USDA-ARS, Horticulture Crops Research Unit, Corvallis, OR 97333, USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA, 99163, USA
- Molla F Mengist
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC USA
- Department of Horticultural Science, North Carolina State University, Raleigh, NC USA
- James Polashock
- SEBS, Plant Biology, Rutgers University, New Brunswick NJ 01019 USA
- Marti Pottorff
- Plants for Human Health Institute, North Carolina State University, Kannapolis, NC USA
- Lisa J Rowland
- USDA-ARS, Genetic Improvement of Fruits and Vegetables Laboratory, Beltsville, MD 20705, USA
- Charles A Sims
- Food Science and Human Nutrition Department, University of Florida, Gainesville, FL 32611, USA
- Guo-qing Song
- Plant Biotechnology Resource and Outreach Center, Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA
- Jessica Spencer
- Department of Horticultural Science, North Carolina State University, Raleigh, NC USA
- Nicholi Vorsa
- SEBS, Plant Biology, Rutgers University, New Brunswick NJ 01019 USA
- Alan E Yocca
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Juan Zalapa
- USDA-ARS, VCRU, Department of Horticulture, University of Wisconsin-Madison, Madison, WI 53706, USA
38
Perez BC, Bink MCAM, Svenson KL, Churchill GA, Calus MPL. Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice. G3 (Bethesda) 2022; 12:6528848. [PMID: 35166767 PMCID: PMC8982369 DOI: 10.1093/g3journal/jkac039] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/15/2021] [Accepted: 01/29/2022] [Indexed: 12/14/2022]
Abstract
We compared the performance of linear (GBLUP, BayesB, and elastic net) methods to a nonparametric tree-based ensemble (gradient boosting machine) method for genomic prediction of complex traits in mice. The dataset used contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as a validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence of a relevant portion of phenotypic variance being explained by epistatic effects. For some of the traits, fitting a subset of top markers selected from a gradient boosting machine model into linear and gradient boosting machine models improved prediction accuracy. Our results indicate that gradient boosting machine is more strongly affected by data size and decreased connectedness between reference and validation sets than the linear models. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method to predict complex traits with assumed epistatic effects.
Affiliation(s)
- Bruno C Perez
- Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Marco C A M Bink
- Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Mario P L Calus
- Wageningen University & Research, Animal Breeding and Genomics, 6700 AH Wageningen, The Netherlands
39
Galli G, Sabadin F, Yassue RM, Galves C, Carvalho HF, Crossa J, Montesinos-López OA, Fritsche-Neto R. Automated Machine Learning: A Case Study of Genomic "Image-Based" Prediction in Maize Hybrids. Front Plant Sci 2022; 13:845524. [PMID: 35321444 PMCID: PMC8936805 DOI: 10.3389/fpls.2022.845524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2021] [Accepted: 02/03/2022] [Indexed: 06/14/2023]
Abstract
Machine learning methods such as multilayer perceptrons (MLP) and Convolutional Neural Networks (CNN) have emerged as promising methods for genomic prediction (GP). In this context, we assess the performance of MLP and CNN on regression and classification tasks in a case study with maize hybrids. The genomic information was provided to the MLP as a relationship matrix and to the CNN as "genomic images." In the regression task, the machine learning models were compared with GBLUP. Under the classification task, MLP and CNN were compared. In this case, the traits (plant height and grain yield) were discretized so as to create balanced (moderate selection intensity) and unbalanced (extreme selection intensity) datasets for further evaluations. An automatic hyperparameter search for MLP and CNN was performed, and the best models were reported. For both task types, several metrics were calculated under a validation scheme to assess the effect of the prediction method and other variables. Overall, MLP and CNN presented results competitive with GBLUP. We also bring new insights on automated machine learning for genomic prediction and its implications for plant breeding.
Affiliation(s)
- Giovanni Galli
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Felipe Sabadin
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
- Rafael Massahiro Yassue
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Cassia Galves
- Department of Food Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Roberto Fritsche-Neto
- Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- International Rice Research Institute (IRRI), Los Baños, Philippines
40
Abstract
Consumers often regard heirloom fruit varieties grown in the garden as more flavorful than commercial varieties purchased at the grocery store. While plant breeders have historically focused on improving producer-oriented traits such as yield, consumer-oriented traits such as flavor have regularly been neglected. This is, in part, due to the difficulty associated with measuring the sensory perceptions of flavor. Here, we combine fruit chemical and consumer sensory panel information to train machine learning models that can predict how flavorful a fruit will be from its chemistry. By increasing the throughput of flavor evaluations, these models will help plant breeders to integrate flavor earlier in the breeding pipeline and aid in the design of varieties with exceptional flavor profiles.
Although they are staple foods in cuisines globally, many commercial fruit varieties have become progressively less flavorful over time. Due to the cost and difficulty associated with flavor phenotyping, breeding programs have long been challenged in selecting for this complex trait. To address this issue, we leveraged targeted metabolomics of diverse tomato and blueberry accessions and their corresponding consumer panel ratings to create statistical and machine learning models that can predict sensory perceptions of fruit flavor. Using these models, a breeding program can assess flavor ratings for a large number of genotypes, previously limited by the low throughput of consumer sensory panels. The ability to predict consumer ratings of liking, sweet, sour, umami, and flavor intensity was evaluated by a 10-fold cross-validation, and the accuracies of 18 different models were assessed. The prediction accuracies were high for most attributes and ranged from 0.87 for sourness intensity in blueberry using XGBoost to 0.46 for overall liking in tomato using linear regression.
Further, the best-performing models were used to infer the flavor compounds (sugars, acids, and volatiles) that contribute most to each flavor attribute. We found that the variance decomposition of overall liking score estimates that 42% and 56% of the variance was explained by volatile organic compounds in tomato and blueberry, respectively. We expect that these models will enable an earlier incorporation of flavor as breeding targets and encourage selection and release of more flavorful fruit varieties.
41
Nguyen Ba AN, Lawrence KR, Rego-Costa A, Gopalakrishnan S, Temko D, Michor F, Desai MM. Barcoded Bulk QTL mapping reveals highly polygenic and epistatic architecture of complex traits in yeast. eLife 2022; 11:73983. [PMID: 35147078 PMCID: PMC8979589 DOI: 10.7554/elife.73983] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/16/2021] [Accepted: 02/11/2022] [Indexed: 11/25/2022] Open
Abstract
Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits.
Affiliation(s)
- Alex N Nguyen Ba
- Department of Organismic and Evolutionary Biology, Harvard University
- Artur Rego-Costa
- Department of Organismic and Evolutionary Biology, Harvard University
- Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University
42
Sandhu KS, Patil SS, Aoun M, Carter AH. Multi-Trait Multi-Environment Genomic Prediction for End-Use Quality Traits in Winter Wheat. Front Genet 2022; 13:831020. [PMID: 35173770 PMCID: PMC8841657 DOI: 10.3389/fgene.2022.831020] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/07/2021] [Accepted: 01/06/2022] [Indexed: 11/13/2022] Open
Abstract
Soft white wheat is a wheat class used in foreign and domestic markets to make various end products requiring specific quality attributes. Due to the associated cost, time, and amount of seed needed, phenotyping for end-use quality traits is delayed until later generations. Previously, we explored the potential of using genomic selection (GS) for selecting superior genotypes earlier in the breeding program. Breeders typically measure multiple traits across various locations, which opens an avenue for exploring multi-trait-based GS models. This study's main objective was to explore the potential of using multi-trait GS models for predicting seven different end-use quality traits using cross-validation, independent prediction, and across-location predictions in a wheat breeding program. The population used consisted of 666 soft white wheat genotypes planted for 5 years at two locations in Washington, United States. We optimized and compared the performances of four uni-trait- and multi-trait-based GS models, namely, Bayes B, genomic best linear unbiased prediction (GBLUP), multilayer perceptron (MLP), and random forests. The prediction accuracies for multi-trait GS models were 5.5 and 7.9% superior to uni-trait models for the within-environment and across-location predictions, respectively. Multi-trait machine and deep learning models outperformed GBLUP and Bayes B for across-location predictions, but their advantages diminished when the genotype by environment component was included in the model. The highest improvement in prediction accuracy, 35%, was obtained for flour protein content with the multi-trait MLP model. This study showed the potential of using multi-trait-based GS models to enhance prediction accuracy by using information from previously phenotyped traits. It would assist in speeding up the breeding cycle time in a cost-friendly manner.
Affiliation(s)
- Karansher S. Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- Shruti Sunil Patil
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
- Meriem Aoun
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- Arron H. Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
43
Montesinos-López OA, Montesinos-López A, Mosqueda-González BA, Bentley AR, Lillemo M, Varshney RK, Crossa J. A New Deep Learning Calibration Method Enhances Genome-Based Prediction of Continuous Crop Traits. Front Genet 2021; 12:798840. [PMID: 34976026 PMCID: PMC8718701 DOI: 10.3389/fgene.2021.798840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Accepted: 11/18/2021] [Indexed: 11/13/2022] Open
Abstract
Genomic selection (GS) has the potential to revolutionize predictive plant breeding. A reference population is phenotyped and genotyped to train a statistical model that is used to perform genome-enabled predictions of new individuals that were only genotyped. In this vein, deep neural networks, a type of machine learning model, have been widely adopted for use in GS studies; because they are not parametric methods, they are more adept at capturing nonlinear patterns. However, the training process for deep neural networks is very challenging due to the numerous hyper-parameters that need to be tuned, especially when imperfect tuning can result in biased predictions. In this paper we propose a simple method for calibrating (adjusting) the prediction of continuous response variables resulting from deep learning applications. We evaluated the proposed deep learning calibration method (DL_M2) using four crop breeding data sets and its performance was compared with the standard deep learning method (DL_M1), as well as the standard genomic Best Linear Unbiased Predictor (GBLUP). While the GBLUP was the most accurate model overall, the proposed deep learning calibration method (DL_M2) helped increase the genome-enabled prediction performance in all data sets when compared with the traditional DL method (DL_M1). Taken together, we provide evidence for extending the use of the proposed calibration method to evaluate its potential and consistency for predicting performance in the context of GS applied to plant breeding.
Affiliation(s)
- Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
- *Correspondence: Abelardo Montesinos-López; Rajeev K. Varshney; José Crossa
- Brandon A. Mosqueda-González
- Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Esq. Miguel Othón de Mendizábal, Mexico City, Mexico
- Alison R. Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Morten Lillemo
- Department of Plant Sciences, Norwegian University of Life Sciences, IHA/CIGENE, As, Norway
- Rajeev K. Varshney
- Centre of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Murdoch University, Perth, WA, Australia
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Colegio de Postgraduados, Montecillo, Mexico
|
44
|
Gardiner LJ, Krishna R. Bluster or Lustre: Can AI Improve Crops and Plant Health? PLANTS (BASEL, SWITZERLAND) 2021; 10:plants10122707. [PMID: 34961177 PMCID: PMC8707749 DOI: 10.3390/plants10122707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/03/2021] [Revised: 11/24/2021] [Accepted: 12/06/2021] [Indexed: 06/14/2023]
Abstract
In a changing climate where future food security is a growing concern, researchers are exploring new methods and technologies in the effort to meet ambitious crop yield targets. The application of Artificial Intelligence (AI), including Machine Learning (ML) methods, has been proposed as a potential mechanism to support this. This review explores current research in the area to convey the state of the art in how AI/ML have been used to advance research, gain insights, and generally enable progress. We address the question: Can AI improve crops and plant health? We further discriminate the bluster from the lustre by identifying the key challenges that AI has been shown to address, balanced with the potential issues with its usage and the key requisites for its success. Overall, we hope to raise awareness and, as a result, promote the use of AI-related approaches where they can have appropriate impact to improve practices in agricultural and plant sciences.
|
45
|
Washburn JD, Cimen E, Ramstein G, Reeves T, O'Briant P, McLean G, Cooper M, Hammer G, Buckler ES. Predicting phenotypes from genetic, environment, management, and historical data using CNNs. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:3997-4011. [PMID: 34448888 DOI: 10.1007/s00122-021-03943-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/28/2021] [Accepted: 08/18/2021] [Indexed: 06/13/2023]
Abstract
Convolutional Neural Networks (CNNs) can perform similarly to or better than standard genomic prediction methods when sufficient genetic, environmental, and management data are provided. Predicting phenotypes from genetic (G), environmental (E), and management (M) conditions is a long-standing challenge with implications for agriculture, medicine, and conservation. Most methods reduce the factors in a dataset (feature engineering) in a subjective and potentially oversimplified manner. Deep neural networks such as Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN) can overcome this by allowing the data itself to determine which factors are most important. CNN models were developed for predicting agronomic yield from a combination of replicated trials and historical yield survey data. The results were more accurate than standard methods when tested on held-out G, E, and M data (r = 0.50 vs. r = 0.43), and performed slightly worse than standard methods when only G was held out (r = 0.74 vs. r = 0.80). Pre-training on historical data increased accuracy compared to trial data alone. Saliency map analysis indicated the CNN has "learned" to prioritize many factors of known agricultural importance.
Affiliation(s)
- Jacob D Washburn
- United States Department of Agriculture, Agricultural Research Service, Columbia, MO, 65211, USA.
- Emre Cimen
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Computational Intelligence and Optimization Laboratory, Industrial Engineering Department, Eskisehir Technical University, Eskisehir, Turkey
- Guillaume Ramstein
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000, Aarhus, Denmark
- Timothy Reeves
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Patrick O'Briant
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Greg McLean
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- Mark Cooper
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- Graeme Hammer
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Department of Agriculture, Agricultural Research Service, Ithaca, NY, 14850, USA
|
46
|
Strategies to Increase Prediction Accuracy in Genomic Selection of Complex Traits in Alfalfa ( Medicago sativa L.). Cells 2021; 10:cells10123372. [PMID: 34943880 PMCID: PMC8699225 DOI: 10.3390/cells10123372] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 11/19/2021] [Accepted: 11/24/2021] [Indexed: 12/27/2022] Open
Abstract
Agronomic traits such as biomass yield and abiotic stress tolerance are genetically complex and challenging to improve through conventional breeding approaches. Genomic selection (GS) is an alternative approach in which genome-wide markers are used to determine the genomic estimated breeding value (GEBV) of individuals in a population. In alfalfa (Medicago sativa L.), previous results indicated that low to moderate prediction accuracy values (<70%) were obtained for complex traits, such as yield and abiotic stress resistance. There is a need to increase the prediction accuracy in order to employ GS in breeding programs. In this paper we reviewed different statistical models and their applications in polyploid crops, such as alfalfa and potato. Specifically, we used empirical data on alfalfa yield under salt stress to investigate approaches that use DNA marker importance values derived from machine learning models, and genome-wide association study (GWAS) marker-trait association scores based on different GWASpoly models, in weighted GBLUP analyses. This approach increased prediction accuracies from 50% to more than 80% for alfalfa yield under salt stress. Finally, we extended the weighted GBLUP approach to potato, analyzed 13 phenotypic traits, and obtained similar results. This is the first report on alfalfa to use variable importance and GWAS-assisted approaches to increase the prediction accuracy of GS, thus helping to select superior alfalfa lines based on their GEBVs.
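The weighted GBLUP idea mentioned above can be illustrated by how marker weights enter the genomic relationship matrix. The sketch below is a simplified illustration, not the authors' implementation: it centers marker dosages per marker, multiplies each marker's contribution by a non-negative weight (for example an importance value or a GWAS score), and scales by the sum of the weights. That scaling is an assumption made for readability; it differs from the allele-frequency scaling commonly used in practice, and polyploid dosage coding is not handled. The function name `weighted_grm` and the toy data are hypothetical.

```python
def weighted_grm(markers, weights):
    """Weighted genomic relationship matrix G = Z D Z' / sum(weights).

    markers: one list of marker dosages per individual
    weights: one non-negative weight per marker (the diagonal of D)
    """
    n, m = len(markers), len(weights)
    col_means = [sum(row[j] for row in markers) / n for j in range(m)]
    # Center each marker column so relationships are relative to the sample.
    z = [[row[j] - col_means[j] for j in range(m)] for row in markers]
    scale = sum(weights)
    return [
        [sum(weights[j] * z[i][j] * z[k][j] for j in range(m)) / scale
         for k in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): four individuals, two markers.
toy_markers = [[0, 0], [2, 0], [0, 2], [2, 2]]

g_uniform = weighted_grm(toy_markers, [1, 1])   # ordinary, unweighted case
g_weighted = weighted_grm(toy_markers, [3, 1])  # first marker upweighted
```

With the first marker upweighted, individuals 0 and 2, which share the same dosage at that marker, become more related in `g_weighted` than in `g_uniform`, which is how informative markers gain influence on the predicted breeding values.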
|
47
|
Westhues CC, Mahone GS, da Silva S, Thorwarth P, Schmidt M, Richter JC, Simianer H, Beissinger TM. Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks. FRONTIERS IN PLANT SCIENCE 2021; 12:699589. [PMID: 34880880 PMCID: PMC8647909 DOI: 10.3389/fpls.2021.699589] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/23/2021] [Accepted: 10/15/2021] [Indexed: 05/26/2023]
Abstract
The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. 
An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.
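The four prediction problems listed in the abstract differ only in what is held out (a year or a site) and in whether the test set is restricted to genotypes never seen in training. A minimal sketch of such a split follows; the record layout and the function name `split_records` are assumptions made for illustration and do not come from the G2F data or the authors' code.

```python
def split_records(records, group_key, held_out, only_new_genotypes=False):
    """Hold out one year or site; optionally keep only unseen test genotypes."""
    train = [r for r in records if r[group_key] != held_out]
    test = [r for r in records if r[group_key] == held_out]
    if only_new_genotypes:
        seen = {r["genotype"] for r in train}
        test = [r for r in test if r["genotype"] not in seen]
    return train, test


# Toy multi-environment trial records (hypothetical).
records = [
    {"genotype": "G1", "year": 2014, "site": "A"},
    {"genotype": "G2", "year": 2014, "site": "B"},
    {"genotype": "G1", "year": 2015, "site": "A"},
    {"genotype": "G3", "year": 2015, "site": "B"},
]

# (1) tested and new genotypes in a new year
train1, test1 = split_records(records, "year", 2015)
# (2) only unobserved genotypes in a new year
train2, test2 = split_records(records, "year", 2015, only_new_genotypes=True)
# (3) tested and new genotypes in a new site
train3, test3 = split_records(records, "site", "B")
# (4) only unobserved genotypes in a new site
train4, test4 = split_records(records, "site", "B", only_new_genotypes=True)
```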
Affiliation(s)
- Cathy C. Westhues
- Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, Goettingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
- Sofia da Silva
- Kleinwanzlebener Saatzucht (KWS) SAAT SE, Einbeck, Germany
- Malthe Schmidt
- Kleinwanzlebener Saatzucht (KWS) SAAT SE, Einbeck, Germany
- Henner Simianer
- Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
- Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, Goettingen, Germany
- Timothy M. Beissinger
- Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, Goettingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
|
48
|
Ubbens J, Parkin I, Eynck C, Stavness I, Sharpe AG. Deep neural networks for genomic prediction do not estimate marker effects. THE PLANT GENOME 2021; 14:e20147. [PMID: 34596363 DOI: 10.1002/tpg2.20147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/26/2021] [Accepted: 07/09/2021] [Indexed: 06/13/2023]
Abstract
Genomic prediction is a promising technology for advancing both plant and animal breeding, with many different prediction models evaluated in the literature. It has been suggested that the ability of powerful nonlinear models, such as deep neural networks, to capture complex epistatic effects between markers offers advantages for genomic prediction. However, these methods tend not to outperform classical linear methods, leaving it an open question why this capacity to model nonlinear effects does not seem to result in better predictive capability. In this work, we propose the theory that, because of a previously described principle called shortcut learning, deep neural networks tend to base their predictions on overall genetic relatedness rather than on the effects of particular markers such as epistatic effects. Using several datasets of crop plants [lentil (Lens culinaris Medik.), wheat (Triticum aestivum L.), and Brassica carinata A. Braun], we demonstrate the network's indifference to the values of the markers by showing that the same network, provided with only the locations of matches between markers for two individuals, is able to perform prediction to the same level of accuracy.
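To make the phrase "only the locations of matches between markers for two individuals" concrete, the sketch below builds a binary mask that records where two genotype vectors agree while discarding the marker values themselves. The exact encoding used in the paper is not given in the abstract, so the function name `match_mask` and the 0/1/2 genotype coding are assumptions for illustration only.

```python
def match_mask(genotype_a, genotype_b):
    """Return 1 where the two individuals carry the same marker value, else 0."""
    if len(genotype_a) != len(genotype_b):
        raise ValueError("genotype vectors must have the same length")
    return [1 if a == b else 0 for a, b in zip(genotype_a, genotype_b)]


# Toy genotypes (hypothetical), coded as 0/1/2 allele counts.
ind_a = [0, 1, 2, 2, 0]
ind_b = [0, 2, 2, 1, 0]

mask = match_mask(ind_a, ind_b)
shared_fraction = sum(mask) / len(mask)
```

The mask says nothing about which allele is present at a matching position, only that the two individuals agree there, so any accuracy obtained from it alone would reflect overall relatedness rather than marker effects.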
Affiliation(s)
- Jordan Ubbens
- Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
- Isobel Parkin
- Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
- Christina Eynck
- Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
- Ian Stavness
- Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
- Andrew G Sharpe
- Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
|
49
|
Bayer PE, Petereit J, Danilevicz MF, Anderson R, Batley J, Edwards D. The application of pangenomics and machine learning in genomic selection in plants. THE PLANT GENOME 2021; 14:e20112. [PMID: 34288550 DOI: 10.1002/tpg2.20112] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/14/2021] [Accepted: 05/01/2021] [Indexed: 05/10/2023]
Abstract
Genomic selection approaches have increased the speed of plant breeding, leading to growing crop yields over the last decade. However, climate change is impacting current and future yields, resulting in the need to further accelerate breeding efforts to cope with these changing conditions. Here we present approaches to accelerate plant breeding and incorporate nonadditive effects in genomic selection by applying state-of-the-art machine learning approaches. These approaches are made more powerful by the inclusion of pangenomes, which represent the entire genome content of a species. Understanding the strengths and limitations of machine learning methods, compared with more traditional genomic selection efforts, is paramount to the successful application of these methods in crop breeding. We describe examples of genomic selection and pangenome-based approaches in crop breeding, discuss machine learning-specific challenges, and highlight the potential for the application of machine learning in genomic selection. We believe that careful implementation of machine learning approaches will support crop improvement to help counter the adverse outcomes of climate change on crop production.
Affiliation(s)
- Philipp E Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jakob Petereit
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Monica Furaste Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Robyn Anderson
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
|
50
|
Sandhu K, Patil SS, Pumphrey M, Carter A. Multitrait machine- and deep-learning models for genomic selection using spectral information in a wheat breeding program. THE PLANT GENOME 2021; 14:e20119. [PMID: 34482627 DOI: 10.1002/tpg2.20119] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/05/2021] [Accepted: 05/18/2021] [Indexed: 06/13/2023]
Abstract
Prediction of breeding values is central to plant breeding and has been revolutionized by the adoption of genomic selection (GS). Use of machine- and deep-learning algorithms applied to complex traits in plants can improve prediction accuracies. Because of the tremendous increase in data collected in breeding programs and the slow rate of genetic gain, the potential of artificial intelligence for analyzing these data needs to be explored. The main objectives of this study include optimization of multitrait (MT) machine- and deep-learning models for predicting grain yield and grain protein content in wheat (Triticum aestivum L.) using spectral information. This study compares the performance of four machine- and deep-learning-based unitrait (UT) and MT models with traditional genomic best linear unbiased predictor (GBLUP) and Bayesian models. The dataset consisted of 650 recombinant inbred lines (RILs) from a spring wheat breeding program grown for three years (2014-2016), and spectral data were collected at heading and grain filling stages. The MT-GS models performed 0 to 28.5% and -0.04 to 15% better than the UT-GS models. Random forest and multilayer perceptron were the best-performing machine- and deep-learning models for predicting both traits. The four Bayesian models explored gave similar accuracies, which were lower than those of the machine- and deep-learning-based models, and required more computational time. Green normalized difference vegetation index (GNDVI) best predicted grain protein content in seven out of the nine MT-GS models. Overall, this study concluded that machine- and deep-learning-based MT-GS models increased prediction accuracy and should be employed in large-scale breeding programs.
Affiliation(s)
- Karansher Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
- Shruti Sunil Patil
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA
- Michael Pumphrey
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
- Arron Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
|