1
Novielli P, Romano D, Pavan S, Losciale P, Stellacci AM, Diacono D, Bellotti R, Tangaro S. Explainable artificial intelligence for genotype-to-phenotype prediction in plant breeding: a case study with a dataset from an almond germplasm collection. Frontiers in Plant Science 2024; 15:1434229. [PMID: 39319003; PMCID: PMC11420924; DOI: 10.3389/fpls.2024.1434229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 08/13/2024] [Indexed: 09/26/2024]
Abstract
Background: Advances in DNA sequencing revolutionized plant genomics and significantly contributed to the study of genetic diversity. However, predicting phenotypes from genomic data remains a challenge, particularly in the context of plant breeding. Despite significant progress, accurately predicting phenotypes from high-dimensional genomic data remains a challenge, particularly in identifying the key genetic factors influencing these predictions. This study aims to bridge this gap by integrating explainable artificial intelligence (XAI) techniques with advanced machine learning models. This approach is intended to enhance both the predictive accuracy and interpretability of genotype-to-phenotype models, thereby improving their reliability and supporting more informed breeding decisions.
Results: This study compares several ML methods for genotype-to-phenotype prediction, using data available from an almond germplasm collection. After preprocessing and feature selection, regression models are employed to predict almond shelling fraction. Best predictions were obtained by the Random Forest method (correlation = 0.727 ± 0.020, R² = 0.511 ± 0.025, and RMSE = 7.746 ± 0.199). Notably, the application of the SHAP (SHapley Additive exPlanations) values algorithm to explain the results highlighted several genomic regions associated with the trait, including one, having the highest feature importance, located in a gene potentially involved in seed development.
Conclusions: Employing explainable artificial intelligence algorithms enhances model interpretability, identifying genetic polymorphisms associated with the shelling percentage. These findings underscore XAI's efficacy in predicting phenotypic traits from genomic data, highlighting its significance in optimizing crop production for sustainable agriculture.
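The three scores quoted in this abstract (correlation, R², and RMSE) measure different things, and a short worked example may help when comparing them across entries. The following is a minimal sketch in plain Python, not the authors' code; the function names and the toy values are assumptions introduced here for illustration.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the trait."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Toy values (hypothetical): predictions are perfectly correlated with the
# observations but shifted by a constant offset of 1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
scores = (pearson_r(observed, predicted),
          r_squared(observed, predicted),
          rmse(observed, predicted))
```

In this toy case the correlation is perfect while R² is far lower, because a constant bias leaves correlation untouched but inflates the residual sum of squares. This is one reason abstracts such as the one above report all three figures.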
Affiliation(s)
- Pierfrancesco Novielli
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Donato Romano
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Stefano Pavan
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Pasquale Losciale
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Anna Maria Stellacci
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Domenico Diacono
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Roberto Bellotti
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
  - Dipartimento Interateneo di Fisica “M. Merlin”, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Sabina Tangaro
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
2
Xie Z, Weng L, He J, Feng X, Xu X, Ma Y, Bai P, Kong Q. PNNGS, a multi-convolutional parallel neural network for genomic selection. Frontiers in Plant Science 2024; 15:1410596. [PMID: 39290743; PMCID: PMC11405342; DOI: 10.3389/fpls.2024.1410596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Accepted: 08/19/2024] [Indexed: 09/19/2024]
Abstract
Genomic selection (GS) can accomplish breeding faster than phenotypic selection. Improving prediction accuracy is the key to promoting GS. To improve the GS prediction accuracy and stability, we introduce parallel convolution to deep learning for GS and call it a parallel neural network for genomic selection (PNNGS). In PNNGS, information passes through convolutions of different kernel sizes in parallel. The convolutions in each branch are connected with residuals. Four different Lp loss functions train PNNGS. Through experiments, the optimal number of parallel paths for rice, sunflower, wheat, and maize is found to be 4, 6, 4, and 3, respectively. Phenotype prediction is performed on 24 cases through ridge-regression best linear unbiased prediction (RRBLUP), random forests (RF), support vector regression (SVR), deep neural network genomic prediction (DNNGP), and PNNGS. Serial DNNGP and parallel PNNGS outperform the other three algorithms. On average, PNNGS prediction accuracy is 0.031 larger than DNNGP prediction accuracy, indicating that parallelism can improve the GS model. Plants are divided into clusters through principal component analysis (PCA) and K-means clustering algorithms. The sample sizes of different clusters vary greatly, indicating that this is unbalanced data. Through stratified sampling, the prediction stability and accuracy of PNNGS are improved. When the training samples are reduced in small clusters, the prediction accuracy of PNNGS decreases significantly. Increasing the sample size of small clusters is critical to improving the prediction accuracy of GS.
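The stratified sampling step mentioned in this abstract can be illustrated with a small sketch: given a cluster label per plant (for example from PCA followed by K-means), the test set is drawn cluster by cluster so that small clusters are still represented. This is a plain-Python illustration under stated assumptions, not the PNNGS implementation; `stratified_split`, `test_fraction`, and the toy cluster sizes are hypothetical names and values introduced here.

```python
import random
from collections import defaultdict


def stratified_split(cluster_labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test, sampling within each cluster.

    cluster_labels: one label per sample (assumed sortable, e.g. ints).
    Returns two sorted lists of indices: (train, test).
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    train, test = [], []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        rng.shuffle(members)
        if len(members) > 1:
            # Keep at least one sample on each side for every cluster.
            n_test = round(len(members) * test_fraction)
            n_test = min(max(n_test, 1), len(members) - 1)
        else:
            # A singleton cluster can only go to training.
            n_test = 0
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Toy unbalanced data (hypothetical): three clusters of very different size.
labels = [0] * 80 + [1] * 15 + [2] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
test_counts = {c: sum(1 for i in test_idx if labels[i] == c) for c in (0, 1, 2)}
```

With a purely random split, the smallest cluster could easily be missing from the test set (or from training); sampling per cluster avoids that, which is the property the abstract relies on when it discusses unbalanced data.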
Affiliation(s)
- Zhengchao Xie
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Lin Weng
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Jingjing He
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Xianzhong Feng
  - Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
- Xiaogang Xu
  - School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, China
- Yinxing Ma
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Panpan Bai
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
- Qihui Kong
  - Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China
3
Patarroyo C, Dupas S, Restrepo S. A machine learning algorithm for the automatic classification of Phytophthora infestans genotypes into clonal lineages. Applications in Plant Sciences 2024; 12:e11603. [PMID: 39360191; PMCID: PMC11443441; DOI: 10.1002/aps3.11603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 02/20/2024] [Accepted: 02/26/2024] [Indexed: 10/04/2024]
Abstract
Premise: The prompt categorization of Phytophthora infestans isolates into described clonal lineages is a key tool for the management of its associated disease, potato late blight. New isolates of this pathogen are currently classified by comparing their microsatellite genotypes with characterized clonal lineages, but an automated classification tool would greatly improve this process. Here, we developed a flexible machine learning-based classifier for P. infestans genotypes.
Methods: The performance of different machine learning algorithms in classifying P. infestans genotypes into its clonal lineages was preliminarily evaluated with decreasing amounts of training data. The four best algorithms were then evaluated using all collected genotypes.
Results: mlpML, cforest, nnet, and AdaBag performed best in the preliminary test, correctly classifying almost 100% of the genotypes. AdaBag performed significantly better than the others when tested using the complete data set (Tukey HSD P < 0.001). This algorithm was then implemented in a web application for the automated classification of P. infestans genotypes, which is freely available at https://github.com/cpatarroyo/genotypeclas.
Discussion: We developed a gradient boosting-based tool to automatically classify P. infestans genotypes into its clonal lineages. This could become a valuable resource for the prompt identification of clonal lineages spreading into new regions.
Affiliation(s)
- Camilo Patarroyo
  - Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - Université Paris-Saclay, CNRS, IRD, UMR Évolution, Génomes, Comportement et Écologie, Gif-sur-Yvette 91198, France
- Stéphane Dupas
  - Université Paris-Saclay, CNRS, IRD, UMR Évolution, Génomes, Comportement et Écologie, Gif-sur-Yvette 91198, France
- Silvia Restrepo
  - Department of Food and Chemical Engineering, Universidad de los Andes, Bogotá, Colombia
4
Khatibi SMH, Ali J. Harnessing the power of machine learning for crop improvement and sustainable production. Frontiers in Plant Science 2024; 15:1417912. [PMID: 39188546; PMCID: PMC11346375; DOI: 10.3389/fpls.2024.1417912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Accepted: 07/15/2024] [Indexed: 08/28/2024]
Abstract
Crop improvement and production domains encounter large amounts of expanding data with multi-layer complexity that forces researchers to use machine-learning approaches to establish predictive and informative models to understand the sophisticated mechanisms underlying these processes. All machine-learning approaches aim to fit models to target data; nevertheless, it should be noted that a wide range of specialized methods might initially appear confusing. The principal objective of this study is to offer researchers an explicit introduction to some of the essential machine-learning approaches and their applications, comprising the most modern and utilized methods that have gained widespread adoption in crop improvement or similar domains. This article explicitly explains how different machine-learning methods could be applied for given agricultural data, highlights newly emerging techniques for machine-learning users, and lays out technical strategies for agri/crop research practitioners and researchers.
Affiliation(s)
- Jauhar Ali
  - Rice Breeding Platform, International Rice Research Institute, Los Baños, Laguna, Philippines
5
Shen Z, Shen E, Yang K, Fan Z, Zhu QH, Fan L, Ye CY. BreedingAIDB: A database integrating crop genome-to-phenotype paired data with machine learning tools applicable to breeding. Plant Communications 2024; 5:100894. [PMID: 38571312; PMCID: PMC11287151; DOI: 10.1016/j.xplc.2024.100894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 03/04/2024] [Accepted: 04/02/2024] [Indexed: 04/05/2024]
Affiliation(s)
- Zijie Shen
  - Hainan Institute, Zhejiang University, Sanya 572025, China; Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Enhui Shen
  - Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Kun Yang
  - Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Zuoqian Fan
  - Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Qian-Hao Zhu
  - CSIRO Agriculture and Food, Canberra, ACT 2601, Australia
- Longjiang Fan
  - Hainan Institute, Zhejiang University, Sanya 572025, China; Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Chu-Yu Ye
  - Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
6
Hu H, Li R, Zhao J, Batley J, Edwards D. Technological Development and Advances for Constructing and Analyzing Plant Pangenomes. Genome Biol Evol 2024; 16:evae081. [PMID: 38669452; PMCID: PMC11058698; DOI: 10.1093/gbe/evae081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/09/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024]
Abstract
A pangenome captures the genomic diversity for a species, derived from a collection of genetic sequences of diverse populations. Advances in sequencing technologies have given rise to three primary methods for pangenome construction and analysis: de novo assembly and comparison, reference genome-based iterative assembly, and graph-based pangenome construction. Each method presents advantages and challenges in processing varying amounts and structures of DNA sequencing data. With the emergence of high-quality genome assemblies and advanced bioinformatic tools, the graph-based pangenome is emerging as an advanced reference for exploring the biological and functional implications of genetic variations.
Affiliation(s)
- Haifei Hu
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
- Risheng Li
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
  - College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Junliang Zhao
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
- Jacqueline Batley
  - School of Biological Sciences, University of Western Australia, Perth, WA, Australia
- David Edwards
  - School of Biological Sciences, University of Western Australia, Perth, WA, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
7
Zhou W, Yan Z, Zhang L. A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction. Sci Rep 2024; 14:5905. [PMID: 38467662; PMCID: PMC10928191; DOI: 10.1038/s41598-024-55243-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 02/21/2024] [Indexed: 03/13/2024]
Abstract
To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. By using branching data (phenotype) of 1918 soybean accessions and 42k SNP (single nucleotide polymorphism) data (genotype), this study systematically compared 11 non-linear regression AI models, including four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoder regression, and MLP (multilayer perceptron) regression) and seven machine learning models (SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). After evaluation with four metrics, R² (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), it was found that SVR, Polynomial Regression, DBN, and Autoencoder outperformed the other models and achieved better accuracy in phenotype prediction. In the assessment of deep learning approaches, we exemplified the SVR model, conducting analyses on feature importance and gene ontology (GO) enrichment to provide comprehensive support. After comprehensively comparing four feature importance algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, no notable distinction was observed in their feature importance ranking scores, but the SHAP value could provide rich information on genes with negative contributions, and SHAP importance was chosen for feature selection.
The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and transitioning from experience-based to data-based breeding.
Affiliation(s)
- Wei Zhou
  - Florida Agricultural and Mechanical University, Tallahassee, FL, 32307, USA
- Zhengxiao Yan
  - Florida State University, Tallahassee, FL, 32306, USA
- Liting Zhang
  - Florida State University, Tallahassee, FL, 32306, USA
8
Magon G, De Rosa V, Martina M, Falchi R, Acquadro A, Barcaccia G, Portis E, Vannozzi A, De Paoli E. Boosting grapevine breeding for climate-smart viticulture: from genetic resources to predictive genomics. Frontiers in Plant Science 2023; 14:1293186. [PMID: 38148866; PMCID: PMC10750425; DOI: 10.3389/fpls.2023.1293186] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/13/2023] [Accepted: 11/27/2023] [Indexed: 12/28/2023]
Abstract
The multifaceted nature of climate change is increasing the urgency to select resilient grapevine varieties, or generate new, fitter cultivars, to withstand a multitude of new challenging conditions. The attainment of this goal is hindered by the limiting pace of traditional breeding approaches, which require decades to result in new selections. On the other hand, marker-assisted breeding has proved useful when it comes to traits governed by one or few genes with great effects on the phenotype, but its efficacy is still restricted for complex traits controlled by many loci. On these premises, innovative strategies are emerging which could help guide selection, taking advantage of the genetic diversity within the Vitis genus in its entirety. Multiple germplasm collections are also available as a source of genetic material for the introgression of alleles of interest via adapted and pioneering transformation protocols, which present themselves as promising tools for future applications on a notably recalcitrant species such as grapevine. Genome editing intersects both these strategies, not only by being an alternative to obtain focused changes in a relatively rapid way, but also by supporting a fine-tuning of new genotypes developed with other methods. A review on the state of the art concerning the available genetic resources and the possibilities of use of innovative techniques in aid of selection is presented here to support the production of climate-smart grapevine genotypes.
Affiliation(s)
- Gabriele Magon
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), Laboratory of Plant Genetics and Breeding, University of Padova, Agripolis, Viale dell’Università 16, Legnaro, Italy
- Valeria De Rosa
  - Department of Agricultural, Food, Environmental and Animal Sciences (DI4A), University of Udine, Via delle Scienze, 206, Udine, Italy
- Matteo Martina
  - Department of Agricultural, Forest and Food Sciences (DISAFA), Plant Genetics, University of Torino, Largo P. Braccini 2, Grugliasco, Italy
- Rachele Falchi
  - Department of Agricultural, Food, Environmental and Animal Sciences (DI4A), University of Udine, Via delle Scienze, 206, Udine, Italy
- Alberto Acquadro
  - Department of Agricultural, Forest and Food Sciences (DISAFA), Plant Genetics, University of Torino, Largo P. Braccini 2, Grugliasco, Italy
- Gianni Barcaccia
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), Laboratory of Plant Genetics and Breeding, University of Padova, Agripolis, Viale dell’Università 16, Legnaro, Italy
- Ezio Portis
  - Department of Agricultural, Forest and Food Sciences (DISAFA), Plant Genetics, University of Torino, Largo P. Braccini 2, Grugliasco, Italy
- Alessandro Vannozzi
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), Laboratory of Plant Genetics and Breeding, University of Padova, Agripolis, Viale dell’Università 16, Legnaro, Italy
- Emanuele De Paoli
  - Department of Agricultural, Food, Environmental and Animal Sciences (DI4A), University of Udine, Via delle Scienze, 206, Udine, Italy
9
Heinrich F, Lange TM, Kircher M, Ramzan F, Schmitt AO, Gültas M. Exploring the potential of incremental feature selection to improve genomic prediction accuracy. Genet Sel Evol 2023; 55:78. [PMID: 37946104; PMCID: PMC10634161; DOI: 10.1186/s12711-023-00853-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Accepted: 11/02/2023] [Indexed: 11/12/2023]
Abstract
Background: The ever-increasing availability of high-density genomic markers in the form of single nucleotide polymorphisms (SNPs) enables genomic prediction, i.e. the inference of phenotypes based solely on genomic data, in the field of animal and plant breeding, where it has become an important tool. However, given the limited number of individuals, the abundance of variables (SNPs) can reduce the accuracy of prediction models due to overfitting or irrelevant SNPs. Feature selection can help to reduce the number of irrelevant SNPs and increase the model performance. In this study, we investigated an incremental feature selection approach based on ranking the SNPs according to the results of a genome-wide association study that we combined with random forest as a prediction model, and we applied it on several animal and plant datasets.
Results: Applying our approach to different datasets yielded a wide range of outcomes, i.e. from a substantial increase in prediction accuracy in a few cases to minor improvements when only a fraction of the available SNPs were used. Compared with models using all available SNPs, our approach was able to achieve comparable performances with a considerably reduced number of SNPs in several cases. Our approach showcased state-of-the-art efficiency and performance while having a faster computation time.
Conclusions: The results of our study suggest that our incremental feature selection approach has the potential to improve prediction accuracy substantially. However, this gain seems to depend on the genomic data used. Even for datasets where the number of markers is smaller than the number of individuals, feature selection may still increase the performance of the genomic prediction. Our approach is implemented in R and is available at https://github.com/FelixHeinrich/GP_with_IFS/.
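The general shape of incremental feature selection, as described in this abstract, can be sketched as a loop over growing prefixes of a ranked marker list. The sketch below is in Python rather than the authors' R implementation and does not reproduce it: the ranking (from a GWAS in the paper) and the scoring model (random forest in the paper) are replaced by a caller-supplied list and a caller-supplied `evaluate` function, and all names are hypothetical.

```python
def incremental_feature_selection(ranked_features, evaluate, step=1):
    """Evaluate growing prefixes of a ranked feature list; keep the best.

    ranked_features: features ordered from most to least relevant
                     (assumed to come from some upstream ranking).
    evaluate:        callable taking a list of features and returning a
                     score where higher is better (e.g. CV accuracy).
    step:            how many features to add per iteration.
    Returns (best_subset, best_score).
    """
    n = len(ranked_features)
    sizes = list(range(step, n + 1, step))
    if n > 0 and (not sizes or sizes[-1] != n):
        sizes.append(n)  # always evaluate the full feature set as well

    best_subset, best_score = [], float("-inf")
    for k in sizes:
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy stand-in for a model score (hypothetical): two markers are truly
# informative and every included marker carries a small penalty.
informative = {"snp_a", "snp_b"}


def toy_score(subset):
    return sum(1 for f in subset if f in informative) - 0.1 * len(subset)


ranking = ["snp_a", "snp_b", "snp_c", "snp_d"]
selected, score = incremental_feature_selection(ranking, toy_score)
```

In practice `evaluate` would wrap model training and validation, so its cost dominates; a larger `step` trades resolution in the selected subset size for fewer model fits.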
Affiliation(s)
- Felix Heinrich
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075, Göttingen, Germany
- Thomas Martin Lange
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075, Göttingen, Germany
- Magdalena Kircher
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany
- Faisal Ramzan
  - Institute of Animal and Dairy Sciences, University of Agriculture Faisalabad, Jail Road, 38000, Faisalabad, Pakistan
- Armin Otto Schmitt
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075, Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Mehmet Gültas
  - Center for Integrated Breeding Research (CiBreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
  - Faculty of Agriculture, South Westphalia University of Applied Sciences, 59494, Soest, Germany
10
Mostafa S, Mondal D, Panjvani K, Kochian L, Stavness I. Explainable deep learning in plant phenotyping. Front Artif Intell 2023; 6:1203546. [PMID: 37795496; PMCID: PMC10546035; DOI: 10.3389/frai.2023.1203546] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/10/2023] [Accepted: 08/25/2023] [Indexed: 10/06/2023]
Abstract
The increasing human population and variable weather conditions, due to climate change, pose a threat to the world's food security. To improve global food security, we need to provide breeders with tools to develop crop cultivars that are more resilient to extreme weather conditions and provide growers with tools to more effectively manage biotic and abiotic stresses in their crops. Plant phenotyping, the measurement of a plant's structural and functional characteristics, has the potential to inform, improve and accelerate both breeders' selections and growers' management decisions. To improve the speed, reliability and scale of plant phenotyping procedures, many researchers have adopted deep learning methods to estimate phenotypic information from images of plants and crops. Despite the successful results of these image-based phenotyping studies, the representations learned by deep learning models remain difficult to interpret, understand, and explain. For this reason, deep learning models are still considered to be black boxes. Explainable AI (XAI) is a promising approach for opening the deep learning model's black box and providing plant scientists with image-based phenotypic information that is interpretable and trustworthy. Although various fields of study have adopted XAI to advance their understanding of deep learning models, it has yet to be well-studied in the context of plant phenotyping research. In this review article, we reviewed existing XAI studies in plant shoot phenotyping, as well as related domains, to help plant researchers understand the benefits of XAI and make it easier for them to integrate XAI into their future studies. An elucidation of the representations within a deep learning model can help researchers explain the model's decisions, relate the features detected by the model to the underlying plant physiology, and enhance the trustworthiness of image-based phenotypic information used in food production systems.
Affiliation(s)
- Sakib Mostafa
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Debajyoti Mondal
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Karim Panjvani
  - Global Institute for Food Security, University of Saskatchewan, Saskatoon, SK, Canada
- Leon Kochian
  - Global Institute for Food Security, University of Saskatchewan, Saskatoon, SK, Canada
- Ian Stavness
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
11
Jones D, Fornarelli R, Derbyshire M, Gibberd M, Barker K, Hane J. The pursuit of genetic gain in agricultural crops through the application of machine-learning to genomic prediction. Front Genet 2023; 14:1186782. [PMID: 37614817; PMCID: PMC10443705; DOI: 10.3389/fgene.2023.1186782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 07/24/2023] [Indexed: 08/25/2023]
Abstract
Current practice in agriculture applies genomic prediction to assist crop breeding in the analysis of genetic marker data. Genomic selection methods typically use linear mixed models, but using machine-learning may provide further potential for improved selection accuracy, or may provide additional information. Here we describe SelectML, an automated pipeline for testing and comparing the performance of a range of linear mixed model and machine-learning-based genomic selection methods. We demonstrate the use of SelectML on an in silico-generated marker dataset which simulated a randomly-sampled (mixed) and an unevenly-sampled (unbalanced) population, comparing the relative performance of various methods included in SelectML on the two datasets. Although machine-learning based methods performed similarly overall to linear mixed models, they performed worse on the mixed dataset and marginally better on the unbalanced dataset, being more affected than linear mixed models by the imposed sampling bias. SelectML can assist in the training, comparison, and selection of genomic selection models, and is available from https://github.com/darcyabjones/selectml.
Affiliation(s)
- Darcy Jones
  - Centre for Crop and Disease Management, Curtin University, Perth, WA, Australia
- Roberta Fornarelli
  - Centre for Crop and Disease Management, Curtin University, Perth, WA, Australia
  - Curtin Institute for Computation, Curtin University, Perth, WA, Australia
- Mark Derbyshire
  - Centre for Crop and Disease Management, Curtin University, Perth, WA, Australia
- Mark Gibberd
  - Centre for Crop and Disease Management, Curtin University, Perth, WA, Australia
- Kathryn Barker
  - Curtin Institute for Computation, Curtin University, Perth, WA, Australia
- James Hane
  - Centre for Crop and Disease Management, Curtin University, Perth, WA, Australia
12
Salman Z, Muhammad A, Piran MJ, Han D. Crop-saving with AI: latest trends in deep learning techniques for plant pathology. Frontiers in Plant Science 2023; 14:1224709. [PMID: 37600194; PMCID: PMC10433211; DOI: 10.3389/fpls.2023.1224709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 08/22/2023]
Abstract
Plant diseases pose a major threat to agricultural production and the food supply chain, as they expose plants to potentially disruptive pathogens that can affect the lives of those who depend on them. Deep learning has been applied in a range of fields such as object detection, autonomous vehicles, and fraud detection. Several researchers have tried to implement deep learning techniques in precision agriculture. However, the approaches they have chosen for disease detection and identification have both pros and cons. In this survey, we attempt to capture the significant advancements in machine-learning-based disease detection. We discuss prevalent datasets and techniques that have been employed and highlight emerging approaches being used for plant disease detection. By exploring these advancements, we aim to present a comprehensive overview of the prominent approaches in precision agriculture, along with their associated challenges and potential improvements. This paper delves into the challenges associated with implementation and briefly discusses future trends. Overall, this paper presents a bird's-eye view of plant disease datasets, deep learning techniques, their accuracies, and the challenges associated with them. Our insights will serve as a valuable resource for researchers and practitioners in the field. We hope that this survey will inform and inspire future research efforts, ultimately leading to improved precision agriculture practices and enhanced crop health management.
Affiliation(s)
- Dongil Han
  - Department of Computer Science and Engineering, Sejong University, Seoul, Republic of Korea
|
13
|
He W, Ye Z, Li M, Yan Y, Lu W, Xing G. Extraction of soybean plant trait parameters based on SfM-MVS algorithm combined with GRNN. FRONTIERS IN PLANT SCIENCE 2023; 14:1181322. [PMID: 37560031 PMCID: PMC10407792 DOI: 10.3389/fpls.2023.1181322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 07/06/2023] [Indexed: 08/11/2023]
Abstract
Soybean is an important grain and oil crop worldwide and is rich in nutritional value. Phenotypic morphology plays an important role in the selection and breeding of excellent soybean varieties to achieve high yield. Currently, mainstream manual phenotypic measurement suffers from strong subjectivity, high labor intensity, and slow speed. To address these problems, a three-dimensional (3D) reconstruction method for soybean plants based on structure from motion (SfM) was proposed. First, the 3D point cloud of a soybean plant was reconstructed from multi-view images obtained by a smartphone based on the SfM algorithm. Second, low-pass filtering, Gaussian filtering, ordinary least squares (OLS) plane fitting, and Laplacian smoothing were used in combination to automatically segment the point cloud data into individual plants, stems, and leaves. Finally, eleven morphological traits, such as plant height, minimum bounding box volume per plant, leaf projection area, leaf projection length and width, and leaf tilt information, were accurately and nondestructively measured by the proposed leaf phenotype measurement (LPM) algorithm. Moreover, Support Vector Machine (SVM), Back Propagation Neural Network (BP), and Generalized Regression Neural Network (GRNN) prediction models were established to predict and identify soybean plant varieties. The results indicated that, compared with manual measurement, the root mean square errors (RMSE) of plant height, leaf length, and leaf width were 0.9997, 0.2357, and 0.2666 cm, the mean absolute percentage errors (MAPE) were 2.7013%, 1.4706%, and 1.8669%, and the coefficients of determination (R²) were 0.9775, 0.9785, and 0.9487, respectively. The accuracy of predicting plant species according to the six leaf parameters was highest when using GRNN, reaching 0.9211, and the RMSE was 18.3263.
Based on the phenotypic traits of plants, the differences between C3, 47-6 and W82 soybeans were analyzed genetically, and because C3 was an insect-resistant line, the trait parameters (minimum box volume per plant, number of leaves, minimum size of single leaf box, leaf projection area). The results show that the proposed method can effectively extract the 3D phenotypic structure information of soybean plants and leaves without loss, and that it has potential for use in other plants with dense leaves.
Affiliation(s)
- Wei He
  - College of Engineering, Nanjing Agricultural University, Nanjing, China
- Zhihao Ye
  - Soybean Research Institute, Ministry of Agriculture and Rural Affairs (MARA) National Center for Soybean Improvement, Ministry of Agriculture and Rural Affairs (MARA) Key Laboratory of Biology and Genetic Improvement of Soybean, National Key Laboratory for Crop Genetics & Germplasm Enhancement and Utilization, Jiangsu Collaborative Innovation Center for Modern Crop Production, College of Agriculture, Nanjing Agricultural University, Nanjing, China
- Mingshuang Li
  - College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, China
- Yulu Yan
  - Soybean Research Institute, Ministry of Agriculture and Rural Affairs (MARA) National Center for Soybean Improvement, Ministry of Agriculture and Rural Affairs (MARA) Key Laboratory of Biology and Genetic Improvement of Soybean, National Key Laboratory for Crop Genetics & Germplasm Enhancement and Utilization, Jiangsu Collaborative Innovation Center for Modern Crop Production, College of Agriculture, Nanjing Agricultural University, Nanjing, China
- Wei Lu
  - College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, China
- Guangnan Xing
  - Soybean Research Institute, Ministry of Agriculture and Rural Affairs (MARA) National Center for Soybean Improvement, Ministry of Agriculture and Rural Affairs (MARA) Key Laboratory of Biology and Genetic Improvement of Soybean, National Key Laboratory for Crop Genetics & Germplasm Enhancement and Utilization, Jiangsu Collaborative Innovation Center for Modern Crop Production, College of Agriculture, Nanjing Agricultural University, Nanjing, China
|
14
|
Abstract
Over the past decade, advances in plant genotyping have been critical in enabling the identification of genetic diversity, in understanding evolution, and in dissecting important traits in both crops and native plants. The widespread popularity of single-nucleotide polymorphisms (SNPs) has prompted significant improvements to SNP-based genotyping, including SNP arrays, genotyping by sequencing, and whole-genome resequencing. More recent approaches, including genotyping structural variants, utilizing pangenomes to capture species-wide genetic diversity and exploiting machine learning to analyze genotypic data sets, are pushing the boundaries of what plant genotyping can offer. In this chapter, we highlight these innovations and discuss how they will accelerate and advance future genotyping efforts.
|
15
|
Jubair S, Domaratzki M. Crop genomic selection with deep learning and environmental data: A survey. Front Artif Intell 2023; 5:1040295. [PMID: 36703955 PMCID: PMC9871498 DOI: 10.3389/frai.2022.1040295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 12/22/2022] [Indexed: 01/12/2023]
Abstract
Machine learning techniques for crop genomic selections, especially for single-environment plants, are well-developed. These machine learning models, which use dense genome-wide markers to predict phenotype, routinely perform well on single-environment datasets, especially for complex traits affected by multiple markers. On the other hand, machine learning models for predicting crop phenotype, especially deep learning models, using datasets that span different environmental conditions, have only recently emerged. Models that can accept heterogeneous data sources, such as temperature, soil conditions and precipitation, are natural choices for modeling GxE in multi-environment prediction. Here, we review emerging deep learning techniques that incorporate environmental data directly into genomic selection models.
Affiliation(s)
- Sheikh Jubair
  - Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
- Mike Domaratzki
  - Department of Computer Science, University of Western Ontario, London, ON, Canada
|
16
|
Tirnaz S, Zandberg J, Thomas WJW, Marsh J, Edwards D, Batley J. Application of crop wild relatives in modern breeding: An overview of resources, experimental and computational methodologies. FRONTIERS IN PLANT SCIENCE 2022; 13:1008904. [PMID: 36466237 PMCID: PMC9712971 DOI: 10.3389/fpls.2022.1008904] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Accepted: 10/25/2022] [Indexed: 06/01/2023]
Abstract
Global agricultural industries are under pressure to meet future food demand; however, the existing crop genetic diversity might not be sufficient to meet this expectation. Advances in genome sequencing technologies and the availability of reference genomes for over 300 plant species reveal the hidden genetic diversity in crop wild relatives (CWRs), which could have significant impacts on crop improvement. There are many ex situ and in situ resources around the world holding rare and valuable wild species, many of which carry agronomically important traits, and it is crucial for users to be aware of their availability. Here we aim to explore the available ex situ and in situ resources such as genebanks, botanical gardens, national parks, conservation hotspots and inventories holding CWR accessions. In addition, we highlight the advances in availability and use of CWR genomic resources, such as their contribution to pangenome construction and to introducing novel genes into crops. We also discuss the potential and challenges of modern breeding experimental approaches (e.g. de novo domestication, genome editing and speed breeding) used in CWRs and the use of computational (e.g. machine learning) approaches that could speed up utilization of CWR species in breeding programs towards crop adaptability and yield improvement.
|
17
|
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOLECULAR PLANT 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Xingping Zhang
  - Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Huihui Li
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
- Hongjian Zheng
  - CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
- Jianan Zhang
  - MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
- Michael S Olsen
  - CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Rajeev K Varshney
  - State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
- Boddupalli M Prasanna
  - CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Qian Qian
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
|
18
|
Zandberg JD, Fernandez CT, Danilevicz MF, Thomas WJW, Edwards D, Batley J. The Global Assessment of Oilseed Brassica Crop Species Yield, Yield Stability and the Underlying Genetics. PLANTS (BASEL, SWITZERLAND) 2022; 11:2740. [PMID: 36297764 PMCID: PMC9610009 DOI: 10.3390/plants11202740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 10/08/2022] [Accepted: 10/09/2022] [Indexed: 06/16/2023]
Abstract
The global demand for oilseeds is increasing along with the human population. The family of Brassicaceae crops is no exception, typically harvested as a valuable source of oil, rich in beneficial molecules important for human health. The global capacity for improving Brassica yield has steadily risen over the last 50 years, with the major crop Brassica napus (rapeseed, canola) production increasing to ~72 Gt in 2020. In contrast, the production of Brassica mustard crops has fluctuated, rarely improving in farming efficiency. The drastic increase in global yield of B. napus is largely due to the demand for a stable source of cooking oil. Furthermore, with the adoption of highly efficient farming techniques, yield enhancement programs, breeding programs, the integration of high-throughput phenotyping technology and establishing the underlying genetics, B. napus yields have increased by >450 fold since 1978. Yield stability has been improved with new management strategies targeting diseases and pests, as well as by understanding the complex interaction of environment, phenotype and genotype. This review assesses the global yield and yield stability of agriculturally important oilseed Brassica species and discusses how contemporary farming and genetic techniques have driven improvements.
Affiliation(s)
- Jaco D. Zandberg
  - School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
- Monica F. Danilevicz
  - School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
- William J. W. Thomas
  - School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
- David Edwards
  - Center for Applied Bioinformatics, School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
- Jacqueline Batley
  - School of Biological Sciences, University of Western Australia, Perth, WA 6009, Australia
|