51
Ubbens J, Parkin I, Eynck C, Stavness I, Sharpe AG. Deep neural networks for genomic prediction do not estimate marker effects. THE PLANT GENOME 2021; 14:e20147. [PMID: 34596363 DOI: 10.1002/tpg2.20147] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/26/2021] [Accepted: 07/09/2021] [Indexed: 06/13/2023]
Abstract
Genomic prediction is a promising technology for advancing both plant and animal breeding, with many different prediction models evaluated in the literature. It has been suggested that the ability of powerful nonlinear models, such as deep neural networks, to capture complex epistatic effects between markers offers advantages for genomic prediction. However, these methods tend not to outperform classical linear methods, leaving it an open question why this capacity to model nonlinear effects does not seem to result in better predictive capability. In this work, we propose the theory that, because of a previously described principle called shortcut learning, deep neural networks tend to base their predictions on overall genetic relatedness rather than on the effects of particular markers such as epistatic effects. Using several datasets of crop plants [lentil (Lens culinaris Medik.), wheat (Triticum aestivum L.), and Brassica carinata A. Braun], we demonstrate the network's indifference to the values of the markers by showing that the same network, provided with only the locations of matches between markers for two individuals, is able to perform prediction to the same level of accuracy.
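The "locations of matches" input can be made concrete with a minimal Python sketch. It assumes markers are coded as 0/1/2 allele dosages and that a "match" means two individuals carry the same genotype call at a marker. The function names are hypothetical, and the encoding used by Ubbens et al. may differ. The fraction of matching markers (a crude identity-by-state similarity) is computed alongside it to show why such an input still carries overall genetic relatedness, even though the marker values themselves are discarded.

```python
from typing import List, Sequence


def match_vector(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Return 1 where two individuals share the same genotype call, else 0.

    Assumes markers coded as allele dosages (0, 1, 2); the marker values
    themselves are discarded and only the positions of matches are kept.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")
    return [1 if x == y else 0 for x, y in zip(a, b)]


def ibs_fraction(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of matching markers, a crude identity-by-state similarity."""
    matches = match_vector(a, b)
    return sum(matches) / len(matches)


if __name__ == "__main__":
    target = [0, 1, 2, 2, 0]
    reference = [0, 2, 2, 1, 0]
    print(match_vector(target, reference))  # [1, 0, 1, 0, 1]
    print(ibs_fraction(target, reference))  # 0.6
```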
Affiliation(s)
- Jordan Ubbens
- Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
- Isobel Parkin
- Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
- Christina Eynck
- Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
- Ian Stavness
- Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada
- Andrew G Sharpe
- Global Institute for Food Security (GIFS), University of Saskatchewan, Saskatoon, SK, S7N 0W9, Canada

52
Bayer PE, Petereit J, Danilevicz MF, Anderson R, Batley J, Edwards D. The application of pangenomics and machine learning in genomic selection in plants. THE PLANT GENOME 2021; 14:e20112. [PMID: 34288550 DOI: 10.1002/tpg2.20112] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/14/2021] [Accepted: 05/01/2021] [Indexed: 05/10/2023]
Abstract
Genomic selection approaches have increased the speed of plant breeding, leading to increased crop yields over the last decade. However, climate change is affecting current and future yields, creating a need to further accelerate breeding efforts to cope with these changing conditions. Here we present approaches to accelerate plant breeding and incorporate nonadditive effects in genomic selection by applying state-of-the-art machine learning approaches. These approaches are made more powerful by the inclusion of pangenomes, which represent the entire genome content of a species. Understanding the strengths and limitations of machine learning methods, compared with more traditional genomic selection efforts, is paramount to the successful application of these methods in crop breeding. We describe examples of genomic selection and pangenome-based approaches in crop breeding, discuss machine learning-specific challenges, and highlight the potential for the application of machine learning in genomic selection. We believe that careful implementation of machine learning approaches will support crop improvement to help counter the adverse effects of climate change on crop production.
Affiliation(s)
- Philipp E Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jakob Petereit
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Monica Furaste Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Robyn Anderson
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia

53
Sandhu K, Patil SS, Pumphrey M, Carter A. Multitrait machine- and deep-learning models for genomic selection using spectral information in a wheat breeding program. THE PLANT GENOME 2021; 14:e20119. [PMID: 34482627 DOI: 10.1002/tpg2.20119] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 03/05/2021] [Accepted: 05/18/2021] [Indexed: 06/13/2023]
Abstract
Prediction of breeding values is central to plant breeding and has been revolutionized by the adoption of genomic selection (GS). Machine- and deep-learning algorithms applied to complex traits in plants can improve prediction accuracies. Given the tremendous increase in data collected in breeding programs and the slow rate of increase in genetic gain, there is a need to explore the potential of artificial intelligence for analyzing these data. The main objective of this study was to optimize multitrait (MT) machine- and deep-learning models for predicting grain yield and grain protein content in wheat (Triticum aestivum L.) using spectral information. This study compares the performance of four machine- and deep-learning-based unitrait (UT) and MT models with traditional genomic best linear unbiased predictor (GBLUP) and Bayesian models. The dataset consisted of 650 recombinant inbred lines (RILs) from a spring wheat breeding program grown for three years (2014–2016), and spectral data were collected at the heading and grain filling stages. The MT-GS models outperformed the UT-GS models by 0–28.5% and by -0.04 to 15%. Random forest and multilayer perceptron were the best-performing machine- and deep-learning models for predicting both traits. The four Bayesian models explored gave similar accuracies, which were lower than those of the machine- and deep-learning-based models, and required more computational time. The green normalized difference vegetation index (GNDVI) best predicted grain protein content in seven of the nine MT-GS models. Overall, this study concluded that machine- and deep-learning-based MT-GS models increased prediction accuracy and should be employed in large-scale breeding programs.
Affiliation(s)
- Karansher Sandhu
- Department of Crop and Soil Sciences, WA State University, Pullman, WA, 99164, USA
- Shruti Sunil Patil
- School of Electrical Engineering and Computer Science, WA State University, Pullman, WA, 99164, USA
- Michael Pumphrey
- Department of Crop and Soil Sciences, WA State University, Pullman, WA, 99164, USA
- Arron Carter
- Department of Crop and Soil Sciences, WA State University, Pullman, WA, 99164, USA

54
Vu NT, Phuc TH, Oanh KTP, Sang NV, Trang TT, Nguyen NH. Accuracies of genomic predictions for disease resistance of striped catfish to Edwardsiella ictaluri using artificial intelligence algorithms. G3-GENES GENOMES GENETICS 2021; 12:6408442. [PMID: 34788431 PMCID: PMC8727988 DOI: 10.1093/g3journal/jkab361] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/05/2021] [Accepted: 10/10/2021] [Indexed: 02/04/2023]
Abstract
Assessments of genomic prediction accuracy using artificial intelligence (AI) algorithms (i.e., machine and deep learning methods) are currently unavailable or very limited in aquaculture species. The principal aim of this study was to examine the predictive performance of these new methods for disease resistance to Edwardsiella ictaluri in a population of striped catfish Pangasianodon hypophthalmus and to compare them with four common methods, i.e., pedigree-based best linear unbiased prediction (PBLUP), genomic best linear unbiased prediction (GBLUP), single-step GBLUP (ssGBLUP), and a nonlinear Bayesian approach (namely BayesR). Our analyses using machine learning (i.e., ML-KAML) and deep learning (i.e., DL-MLP and DL-CNN) together with the four common methods (PBLUP, GBLUP, ssGBLUP, and BayesR) were conducted for two main disease resistance traits (i.e., survival status coded as 0 and 1, and survival time, i.e., the number of days the animals remained alive after the challenge test) in a pedigree consisting of 560 individual animals (490 offspring and 70 parents) genotyped for 14,154 single nucleotide polymorphisms (SNPs). The results using 6,470 SNPs after quality control showed that machine learning methods outperformed PBLUP, GBLUP, and ssGBLUP, increasing the prediction accuracies for both traits by 9.1–15.4%. However, the prediction accuracies obtained from machine learning methods were comparable to those estimated using BayesR. Imputation of missing genotypes using AlphaFamImpute increased the prediction accuracies by 5.3–19.2% in all the methods and data used. On the other hand, there were non-significant decreases (0.3–5.6%) in the prediction accuracies for both survival status and survival time when multivariate models were used in comparison with univariate analyses.
Interestingly, the genomic prediction accuracies based only on highly significant SNPs (P < 0.00001; 318–400 SNPs for survival status and 1,362–1,589 SNPs for survival time) were somewhat lower (by 0.3–15.6%) than those obtained from the whole set of 6,470 SNPs. In most of our analyses, the accuracies of genomic prediction were somewhat higher for survival time than for survival status (0/1 data). It is concluded that, although there are prospects for applying genomic selection to increase disease resistance to E. ictaluri in striped catfish breeding programs, these methods should be further evaluated in independent families/populations as more data accumulate in future generations, to avoid possible biases in the genetic parameter estimates and prediction accuracies for the disease resistance traits studied in this population of striped catfish P. hypophthalmus.
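GBLUP, one of the baselines compared above, can be sketched in dependency-free Python. This is a simplified illustration under stated assumptions, not the software used by Vu et al. Genotypes are coded 0/1/2. The genomic relationship matrix follows VanRaden's first method. The variance ratio lam = sigma_e^2 / sigma_g^2 is supplied by the caller rather than estimated (for example by REML). The overall mean is approximated by the training-set average instead of being solved jointly in the mixed model equations. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def vanraden_g(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """G = ZZ' / (2 * sum p(1 - p)) for an individuals x markers 0/1/2 matrix."""
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each marker
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each marker by twice its allele frequency
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gblup_predict(genotypes: Sequence[Sequence[int]], y: Sequence[float],
                  train: Sequence[int], test: Sequence[int],
                  lam: float) -> List[float]:
    """Predict test phenotypes as mu + G[test, train] (G[train, train] + lam I)^-1 (y_train - mu)."""
    g = vanraden_g(genotypes)
    mu = sum(y[i] for i in train) / len(train)  # simplification: training mean
    lhs = [[g[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(lhs, [y[i] - mu for i in train])
    return [mu + sum(g[t][j] * a for j, a in zip(train, alpha)) for t in test]


if __name__ == "__main__":
    # Hypothetical toy data: 5 individuals x 4 SNPs
    geno = [[0, 1, 2, 1], [1, 1, 2, 0], [2, 0, 1, 1], [0, 2, 2, 1], [1, 1, 0, 2]]
    pheno = [10.2, 11.0, 9.1, 10.8, 9.7]
    print(gblup_predict(geno, pheno, train=[0, 1, 2, 3], test=[4], lam=1.0))
```

Because G is positive semidefinite, adding lam > 0 to its diagonal keeps the system solvable. As lam grows, predictions shrink toward the training mean, which is the usual ridge-type behaviour of GBLUP.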
Affiliation(s)
- Nguyen Thanh Vu
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia
- Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia
- Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
- Tran Huu Phuc
- Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
- Kim Thi Phuong Oanh
- Institute of Genome Research, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Nguyen Van Sang
- Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
- Trinh Thi Trang
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia
- Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia
- Vietnam National University of Agriculture, Gia Lam 131000, Vietnam
- Nguyen Hong Nguyen
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia
- Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia

55
Amas J, Anderson R, Edwards D, Cowling W, Batley J. Status and advances in mining for blackleg (Leptosphaeria maculans) quantitative resistance (QR) in oilseed rape (Brassica napus). TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:3123-3145. [PMID: 34104999 PMCID: PMC8440254 DOI: 10.1007/s00122-021-03877-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/02/2021] [Accepted: 05/29/2021] [Indexed: 05/04/2023]
Abstract
KEY MESSAGE: Quantitative resistance (QR) loci discovered through genetic and genomic analyses are abundant in the Brassica napus genome, providing an opportunity to use them to enhance blackleg resistance. QR has long been used to manage blackleg in Brassica napus (canola, oilseed rape), even before major resistance genes (R-genes) were extensively explored in breeding programmes. In contrast to R-gene-mediated qualitative resistance, QR reduces blackleg symptoms rather than completely eliminating the disease. As a polygenic trait, QR is controlled by numerous genes with modest effects, which exerts less selection pressure on the pathogen to evolve; hence, it is more durable than R-gene-mediated resistance. Furthermore, combining QR with major R-genes has been shown to enhance disease resistance in important crops, including oilseed rape. For these reasons, there has been renewed interest among breeders in using QR in crop improvement. However, the mechanisms governing QR are largely unknown, limiting its deployment. Advances in genomics are facilitating the dissection of the genetic and molecular underpinnings of QR, resulting in the discovery of several loci and genes that could potentially be deployed to enhance blackleg resistance. Here, we summarize the efforts undertaken to identify blackleg QR loci in oilseed rape using linkage and association analysis. We update the knowledge on the possible mechanisms governing QR and the advances in searching for the underlying genes. Lastly, we lay out strategies to accelerate the genetic improvement of blackleg QR in oilseed rape using improved phenotyping approaches and genomic prediction tools.
Affiliation(s)
- Junrey Amas
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6001 Australia
- Robyn Anderson
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6001 Australia
- David Edwards
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6001 Australia
- Wallace Cowling
- School of Agriculture and Environment and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009 Australia
- Jacqueline Batley
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6001 Australia

56
Ahmar S, Ballesta P, Ali M, Mora-Poblete F. Achievements and Challenges of Genomics-Assisted Breeding in Forest Trees: From Marker-Assisted Selection to Genome Editing. Int J Mol Sci 2021; 22:10583. [PMID: 34638922 PMCID: PMC8508745 DOI: 10.3390/ijms221910583] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/03/2021] [Revised: 09/26/2021] [Accepted: 09/27/2021] [Indexed: 12/23/2022]
Abstract
Forest tree breeding efforts have focused mainly on improving traits of economic importance, selecting trees suited to new environments, or generating trees that are more resilient to biotic and abiotic stressors. This review describes various methods of genomics-assisted forest tree selection and the main technological challenges and achievements in research at the genomic level. Because of the long rotation time of forest plantations and the resulting long generation times needed to complete a breeding cycle, it has been necessary to combine advanced techniques with traditional breeding, allowing the use of more precise methods for determining the genetic architecture of traits of interest, such as genome-wide association studies (GWASs) and genomic selection (GS). The main factors that determine the accuracy of genomic prediction models are also addressed. In turn, the introduction of genome editing, especially clustered regularly interspaced short palindromic repeats and CRISPR-associated protein 9 (CRISPR/Cas9), opens the door to new possibilities in forest trees. CRISPR/Cas9 is a highly efficient genome editing technique that has been used to implement targeted changes at specific sites in the genomes of forest trees. However, forest trees still lack an efficient transformation method, and the number of genotypes amenable to CRISPR/Cas9 is limited. This challenge could be addressed by using the newly developed GRF-GIF technique together with speed breeding.
Affiliation(s)
- Sunny Ahmar
- Institute of Biological Sciences, University of Talca, 1 Poniente 1141, Talca 3460000, Chile
- Paulina Ballesta
- The National Fund for Scientific and Technological Development, Av. del Agua 3895, Talca 3460000, Chile
- Mohsin Ali
- Department of Forestry and Range Management, University of Agriculture Faisalabad, Faisalabad 38000, Pakistan
- Freddy Mora-Poblete
- Institute of Biological Sciences, University of Talca, 1 Poniente 1141, Talca 3460000, Chile

57

58
Ferrão LFV, Amadeu RR, Benevenuto J, de Bem Oliveira I, Munoz PR. Genomic Selection in an Outcrossing Autotetraploid Fruit Crop: Lessons From Blueberry Breeding. FRONTIERS IN PLANT SCIENCE 2021; 12:676326. [PMID: 34194453 PMCID: PMC8236943 DOI: 10.3389/fpls.2021.676326] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/05/2021] [Accepted: 05/12/2021] [Indexed: 05/17/2023]
Abstract
Blueberry (Vaccinium corymbosum and hybrids) is a specialty crop with expanding production and consumption worldwide. The blueberry breeding program at the University of Florida (UF) has greatly contributed to expanding production areas by developing low-chilling cultivars better adapted to subtropical and Mediterranean climates. The breeding program has historically focused on recurrent phenotypic selection. Because blueberry is an autopolyploid, outcrossing, perennial crop with a long juvenile phase, its breeding cycles are costly and time consuming, which results in low genetic gains per unit of time. Motivated by the use of molecular markers for more accurate selection in the early stages of breeding, we performed pioneering genomic selection studies and optimized its implementation in the blueberry breeding program. We have also addressed some complexities of sequence-based genotyping and model parametrization for an autopolyploid crop, providing empirical contributions that can be extended to other polyploid species. Here we revisit some of our previous genomic selection studies and show, for the first time, its application in an independent validation set. Our contribution is threefold: (i) we summarize previous results on the relevance of model parametrizations, such as diploid or polyploid methods, and the inclusion of dominance effects; (ii) we assess the importance of sequence depth of coverage and genotype dosage calling steps; and (iii) we demonstrate the real impact of genomic selection on breeding decisions by using an independent validation set. Altogether, we propose a strategy for using genomic selection in blueberry, with the potential to be applied to other polyploid species of a similar background.
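Genotype dosage calling, step (ii) above, is where sequencing depth matters in an autotetraploid. The sketch below is a deliberately naive maximum-likelihood caller, not the method used by Ferrão et al. It assumes a binomial read model with a symmetric sequencing error rate and a flat prior over dosages 0 to 4. It ignores the allelic bias and overdispersion that dedicated polyploid genotype callers model. The function names and the default error rate are illustrative assumptions.

```python
import math
from typing import List, Tuple


def dosage_likelihoods(alt_reads: int, depth: int, ploidy: int = 4,
                       error: float = 0.01) -> List[float]:
    """Binomial likelihood of `alt_reads` alternative reads out of `depth`, per dosage.

    Dosage d (0..ploidy) is the number of alternative-allele copies; a read
    shows the alternative allele with probability d/ploidy, corrupted by a
    symmetric sequencing error rate.
    """
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    likelihoods = []
    for d in range(ploidy + 1):
        frac = d / ploidy
        p_alt = frac * (1 - error) + (1 - frac) * error
        likelihoods.append(math.comb(depth, alt_reads)
                           * p_alt ** alt_reads
                           * (1 - p_alt) ** (depth - alt_reads))
    return likelihoods


def call_dosage(alt_reads: int, depth: int, ploidy: int = 4,
                error: float = 0.01) -> Tuple[int, float]:
    """Return (most likely dosage, its posterior probability under a flat prior)."""
    likelihoods = dosage_likelihoods(alt_reads, depth, ploidy, error)
    total = sum(likelihoods)
    best = max(range(len(likelihoods)), key=likelihoods.__getitem__)
    return best, likelihoods[best] / total


if __name__ == "__main__":
    # Same 50% allele ratio at two depths: both calls are dosage 2, but the
    # posterior probability is much lower at depth 4 than at depth 60.
    print(call_dosage(2, 4))
    print(call_dosage(30, 60))
```

At low depth, neighbouring dosages (here 1 and 3) keep substantial likelihood, which is one way to see why depth of coverage affects dosage-based genomic prediction in polyploids.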
Affiliation(s)
- Luís Felipe V. Ferrão
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Rodrigo R. Amadeu
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Juliana Benevenuto
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Ivone de Bem Oliveira
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Hortifrut North America, Inc., Estero, FL, United States
- Patricio R. Munoz
- Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States

59
Yang B, Xu Y. Applications of deep-learning approaches in horticultural research: a review. HORTICULTURE RESEARCH 2021; 8:123. [PMID: 34059657 PMCID: PMC8167084 DOI: 10.1038/s41438-021-00560-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/03/2020] [Revised: 03/13/2021] [Accepted: 03/22/2021] [Indexed: 05/24/2023]
Abstract
Deep learning is a promising multifunctional tool for processing images and other big data. By assimilating large amounts of heterogeneous data, deep-learning technology provides reliable prediction results for complex and uncertain phenomena. Recently, it has been increasingly used by horticultural researchers to make sense of the large datasets produced during planting and postharvest processes. In this paper, we provided a brief introduction to deep-learning approaches and reviewed 71 recent research works in which deep-learning technologies were applied in the horticultural domain for variety recognition, yield estimation, quality detection, stress phenotyping, growth monitoring, and other tasks. We described in detail the application scenarios reported in the relevant literature, along with the models and frameworks applied, the data used, and the overall performance results. Finally, we discussed the current challenges and future trends of deep learning in horticultural research. The aim of this review is to help researchers fully understand the strengths and possible weaknesses of deep learning when applying it in the horticultural sector. We also hope that this review will encourage researchers to explore significant applications of deep learning in horticultural science and will promote the advancement of intelligent horticulture.
Affiliation(s)
- Biyun Yang
- College of Mechanical and Electronic Engineering, Fujian Agriculture and Forestry University, 350002, Fuzhou, China
- Yong Xu
- College of Mechanical and Electronic Engineering, Fujian Agriculture and Forestry University, 350002, Fuzhou, China
- Institute of Machine Learning and Intelligent Science, Fujian University of Technology, 33 Xuefu South Road, 350118, Fuzhou, China

60
Cortés AJ, López-Hernández F. Harnessing Crop Wild Diversity for Climate Change Adaptation. Genes (Basel) 2021; 12:783. [PMID: 34065368 PMCID: PMC8161384 DOI: 10.3390/genes12050783] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 03/29/2021] [Revised: 04/28/2021] [Accepted: 05/19/2021] [Indexed: 12/20/2022]
Abstract
Warming and drought are reducing global crop production, with the potential to substantially worsen global malnutrition. As with the green revolution in the last century, plant genetics may offer concrete opportunities to increase yield and crop adaptability. However, the speed at which this threat is unfolding requires new strategies to meet global food demand. In this review, we highlight major recent 'big data' developments from both empirical and theoretical genomics that may speed up the identification, conservation, and breeding of exotic and elite crop varieties with the potential to feed humans. We first emphasize the major bottlenecks to capturing and using novel sources of variation in abiotic stress (i.e., heat and drought) tolerance. We argue that the adaptation of crop wild relatives to dry environments could be informative on how plant phenotypes may react to a drier climate, because natural selection has already tested more options than humans ever will. Because isolated pockets of cryptic diversity may still persist in remote semi-arid regions, we encourage new habitat-based, population-guided collections for genebanks. We then discuss how to systematically study abiotic stress tolerance in these collections of crop wild relatives and landraces using georeferencing and extensive environmental data. Once the genes underlying adaptive tolerance traits are uncovered, this natural variation has the potential to be introgressed into elite cultivars. However, unlocking adaptive genetic variation hidden in related wild species and early landraces remains a major challenge for complex traits that, like abiotic stress tolerance, are polygenic (i.e., regulated by many low-effect genes). Therefore, we conclude by surveying modern analytical approaches that will serve to overcome this issue.
Specifically, genomic prediction, machine learning, and multi-trait gene editing all offer innovative alternatives to speed up more accurate pre-breeding and breeding efforts toward increased crop adaptability and yield, while matching future global food demands in the face of increased heat and drought. For these 'big data' approaches to succeed, we advocate a trans-disciplinary approach with open-source data and long-term funding. The recent developments and perspectives discussed throughout this review ultimately aim to contribute to increased crop adaptability and yield in the face of heat waves and drought events.
Affiliation(s)
- Andrés J. Cortés
- Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia
- Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Sede Medellín, Medellín 050034, Colombia
- Felipe López-Hernández
- Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia

61
Zingaretti LM, Monfort A, Pérez-Enciso M. Automatic Fruit Morphology Phenome and Genetic Analysis: An Application in the Octoploid Strawberry. PLANT PHENOMICS (WASHINGTON, D.C.) 2021; 2021:9812910. [PMID: 34056620 PMCID: PMC8139333 DOI: 10.34133/2021/9812910] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/09/2020] [Accepted: 04/20/2021] [Indexed: 06/01/2023]
Abstract
Automating phenotype measurement will contribute decisively to increasing plant breeding efficiency. Among phenotypes, morphological traits are relevant in many fruit breeding programs, as appearance influences consumer preference. These traits are often obtained manually or semiautomatically. Yet fruit morphology evaluation can be enhanced using fully automated procedures, and digital images provide a cost-effective opportunity for this purpose. Here, we present an automated pipeline for comprehensive phenomic and genetic analysis of morphology traits extracted from internal and external strawberry (Fragaria × ananassa) images. The pipeline segments, classifies, and labels the images and extracts conformation features, including linear (area, perimeter, height, width, circularity, shape descriptor, ratio between height and width) and multivariate (Fourier elliptical components and Generalized Procrustes) statistics. Internal color patterns are obtained using an autoencoder to smooth out the image. In addition, we develop a variational autoencoder to automatically detect the most likely number of underlying shapes. Bayesian modeling is employed to estimate both additive and dominance effects for all traits. As expected, conformational traits are clearly heritable. Interestingly, dominance variance is higher than the additive component for most of the traits. Overall, we show that fruit shape and color can be quickly and automatically evaluated and are moderately heritable. Although we study strawberry images, the algorithm can be applied to other fruits, as shown in the GitHub repository.
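Several of the linear descriptors listed above can be computed directly from a segmented fruit outline. The sketch below assumes the contour is already available as an ordered list of (x, y) pixel coordinates. It uses the shoelace formula for area and defines circularity as 4πA/P², which equals 1 for a perfect circle. Definitions of circularity, height and width vary between tools, so these are assumptions rather than the exact formulas of the pipeline described above. The function name is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def shape_descriptors(contour: List[Point]) -> Dict[str, float]:
    """Linear shape descriptors of a closed polygonal outline.

    `contour` is an ordered list of (x, y) vertices; the polygon is closed
    implicitly (the last vertex connects back to the first).
    """
    if len(contour) < 3:
        raise ValueError("a contour needs at least three points")
    twice_area = 0.0
    perimeter = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0            # shoelace term
        perimeter += math.hypot(x1 - x0, y1 - y0)  # edge length
    area = abs(twice_area) / 2
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    width = max(xs) - min(xs)   # horizontal extent of the bounding box
    height = max(ys) - min(ys)  # vertical extent of the bounding box
    return {
        "area": area,
        "perimeter": perimeter,
        "width": width,
        "height": height,
        "height_width_ratio": height / width,
        "circularity": 4 * math.pi * area / perimeter ** 2,  # 1.0 for a circle
    }


if __name__ == "__main__":
    # Hypothetical outline: a 40 x 30 pixel rectangle
    print(shape_descriptors([(0, 0), (40, 0), (40, 30), (0, 30)]))
```

Heights and widths here come from an axis-aligned bounding box. A real pipeline would usually first rotate each fruit to a canonical orientation so that these descriptors are comparable across images.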
Collapse
Affiliation(s)
- Laura M. Zingaretti
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193 Bellaterra, Barcelona, Spain
- Amparo Monfort
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193 Bellaterra, Barcelona, Spain
- Institut de Recerca i Tecnologia Agroalimentàries (IRTA), 08193 Barcelona, Spain
- Miguel Pérez-Enciso
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193 Bellaterra, Barcelona, Spain
- ICREA, Passeig de Lluís Companys 23, 08010 Barcelona, Spain

62
Han J, Gondro C, Reid K, Steibel JP. Heuristic hyperparameter optimization of deep learning models for genomic prediction. G3-GENES GENOMES GENETICS 2021; 11:6129776. [PMID: 33993261 PMCID: PMC8495939 DOI: 10.1093/g3journal/jkab032] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/24/2020] [Accepted: 01/23/2021] [Indexed: 11/17/2022]
Abstract
There is growing interest among quantitative geneticists and animal breeders in the use of deep learning (DL) for genomic prediction. However, the performance of DL is affected by hyperparameters that are typically set manually by users. These hyperparameters do not simply specify the architecture of the model; they are also critical for the efficacy of the optimization and model-fitting process. To date, most DL approaches used for genomic prediction have concentrated on identifying suitable hyperparameters by exploring discrete options from a subset of the hyperparameter space. Enlarging the hyperparameter search space with continuous hyperparameters is a daunting combinatorial problem. To deal with this problem, we propose using differential evolution (DE) to perform an efficient search of arbitrarily complex hyperparameter spaces in DL models, and we apply this to the specific case of genomic prediction of livestock phenotypes. This approach was evaluated on two pig and cattle datasets with real genotypes and simulated phenotypes (N = 7,539 animals and M = 48,541 markers) and one real dataset (N = 910 individuals and M = 28,916 markers). Hyperparameters were evaluated using cross-validation. We compared the predictive performance of DL models using hyperparameters optimized by DE against DL models with “best practice” hyperparameters selected from published studies and baseline DL models with randomly specified hyperparameters. Models optimized using DE showed a clear improvement in predictive performance across all three datasets. DE-optimized hyperparameters also resulted in DL models with less overfitting and less variation in predictive performance over repeated retraining compared with non-optimized DL models.
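The core of differential evolution can be sketched in a few lines of Python using the classic DE/rand/1/bin scheme. This is a generic illustration, not the implementation or settings of Han et al. The population size, mutation factor `f` and crossover rate `cr` are common textbook defaults. The objective `toy_cv_loss` is a hypothetical stand-in for a cross-validated prediction error, with an optimum invented for illustration. In a real search, each objective call would train and cross-validate a DL model.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    f: float = 0.8,
    cr: float = 0.9,
    generations: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `objective` over a box with DE/rand/1/bin.

    The population is updated in place, so improvements become available
    to later individuals within the same generation.
    """
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, none equal to the target vector i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensures at least one mutated coordinate
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if k == j_rand or rng.random() < cr:
                    v = pop[a][k] + f * (pop[b][k] - pop[c][k])  # mutation
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][k]  # inherit from the target (binomial crossover)
                trial.append(v)
            score = objective(trial)
            if score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_cv_loss(params: List[float]) -> float:
    """Hypothetical stand-in for cross-validated error.

    params[0] = log10(learning rate), params[1] = dropout rate; the optimum
    at (-3, 0.3) is invented for illustration.
    """
    log_lr, dropout = params
    return (log_lr + 3) ** 2 + (dropout - 0.3) ** 2


if __name__ == "__main__":
    best, loss = differential_evolution(toy_cv_loss, [(-5.0, -1.0), (0.0, 0.8)])
    print(best, loss)
```

Searching the learning rate on a log scale is a common convention, assumed here. Integer-valued hyperparameters (for example, the number of layers) could be handled by rounding inside the objective.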
Collapse
Affiliation(s)
- Junjie Han
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Cedric Gondro
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Kenneth Reid
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824, USA
63
Montesinos-López OA, Montesinos-López A, Pérez-Rodríguez P, Barrón-López JA, Martini JWR, Fajardo-Flores SB, Gaytan-Lugo LS, Santana-Mancilla PC, Crossa J. A review of deep learning applications for genomic selection. BMC Genomics 2021; 22:19. [PMID: 33407114 PMCID: PMC7789712 DOI: 10.1186/s12864-020-07319-x] [Citation(s) in RCA: 89] [Impact Index Per Article: 29.7] [Received: 07/18/2020] [Accepted: 12/10/2020] [Indexed: 11/24/2022]
Abstract
BACKGROUND Several conventional genomic Bayesian (or non-Bayesian) prediction methods have been proposed, including the standard additive genetic effect model, for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. DL methods are nonparametric models that provide the flexibility to adapt to complicated associations between data and output, including very complex patterns. MAIN BODY We review the applications of DL methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods, including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared with traditional genomic prediction approaches, as well as current trends in DL applications. CONCLUSIONS The main requirement for using DL is high-quality, sufficiently large training data. Based on the current literature on GS in plant and animal breeding, we did not find clear superiority of DL in terms of prediction power compared with conventional genome-based prediction models. Nevertheless, there is clear evidence that DL algorithms capture nonlinear patterns more efficiently than conventional genome-based models. DL algorithms can integrate data from different sources, as is usually needed in GS-assisted breeding, and show the ability to improve prediction accuracy for large plant breeding datasets. It is important to apply DL to large training-testing datasets.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, Mexico.
- José Alberto Barrón-López
- Department of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina s/n La Molina, 15024, Lima, Peru
- Johannes W R Martini
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, Mexico
- Laura S Gaytan-Lugo
- School of Mechanical and Electrical Engineering, Universidad de Colima, 28040, Colima, Colima, Mexico
- José Crossa
- Colegio de Postgraduados, CP 56230, Montecillos, Edo. de México, Mexico.
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, Mexico.
64
Sandhu KS, Lozada DN, Zhang Z, Pumphrey MO, Carter AH. Deep Learning for Predicting Complex Traits in Spring Wheat Breeding Program. FRONTIERS IN PLANT SCIENCE 2021; 11:613325. [PMID: 33469463 PMCID: PMC7813801 DOI: 10.3389/fpls.2020.613325] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/02/2020] [Accepted: 11/30/2020] [Indexed: 05/12/2023]
Abstract
Genomic selection (GS) is transforming the field of plant breeding, and models that improve prediction accuracy for complex traits are needed. Analytical methods for complex datasets traditionally used in other disciplines represent an opportunity for improving prediction accuracy in GS. Deep learning (DL) is a branch of machine learning (ML) that focuses on training models with densely connected artificial neural networks. The objective of this research was to evaluate the potential of DL models in the Washington State University spring wheat breeding program. We compared the performance of two DL algorithms, namely multilayer perceptron (MLP) and convolutional neural network (CNN), with ridge regression best linear unbiased predictor (rrBLUP), a commonly used GS model. The dataset consisted of 650 recombinant inbred lines (RILs) from a spring wheat nested association mapping (NAM) population planted in the 2014-2016 growing seasons. We predicted five different quantitative traits with varying genetic architecture using cross-validations (CVs), independent validations, and different sets of SNP markers. Hyperparameters for the DL models were optimized by minimizing the root mean square error in the training set, and model overfitting was avoided using dropout and regularization. DL models gave 0 to 5% higher prediction accuracy than the rrBLUP model under both cross and independent validations for all five traits used in this study. Furthermore, MLP produced 5% higher prediction accuracy than CNN for grain yield and grain protein content. Altogether, DL approaches obtained better prediction accuracy for each trait and should be incorporated into a plant breeder's toolkit for use in large-scale breeding programs.
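rrBLUP, the linear baseline in this comparison, amounts to ridge regression of phenotypes on marker genotypes, and prediction accuracy is usually reported as the Pearson correlation between observed and predicted values in the validation fold. The sketch below assumes markers coded 0/1/2, a fixed shrinkage parameter `lam` (rrBLUP proper derives it from variance components as λ = σ²_e / σ²_u, where σ²_u is the marker-effect variance), and a small made-up dataset; the helper names are hypothetical and this is not the study's pipeline.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(x, y, lam):
    """Ridge on centred markers: (X'X + lam*I) beta = X'(y - mean(y))."""
    n, p = len(x), len(x[0])
    col_means = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in x]
    xtx = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(xc[i][j] * (y[i] - y_mean) for i in range(n)) for j in range(p)]
    return col_means, y_mean, solve(xtx, xty)


def predict(model, x):
    """Unpenalised intercept (training mean) plus centred marker effects."""
    col_means, y_mean, beta = model
    return [y_mean + sum(b * (v - m) for b, v, m in zip(beta, row, col_means))
            for row in x]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def cross_validate(x, y, lam, k=5, seed=1):
    """Mean k-fold prediction accuracy (Pearson r, observed vs predicted)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    accs = []
    for test in folds:
        train = [i for i in idx if i not in test]
        model = fit_ridge([x[i] for i in train], [y[i] for i in train], lam)
        pred = predict(model, [x[i] for i in test])
        accs.append(pearson([y[i] for i in test], pred))
    return sum(accs) / k


# Hypothetical toy data: 100 lines, 20 markers coded 0/1/2, additive trait.
rng = random.Random(42)
markers = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(100)]
true_effects = [rng.gauss(0, 1) for _ in range(20)]
phenotype = [sum(e * g for e, g in zip(true_effects, row)) + rng.gauss(0, 1)
             for row in markers]
accuracy = cross_validate(markers, phenotype, lam=1.0)
print(round(accuracy, 3))
```

The same `cross_validate` loop could score an MLP or CNN by swapping out `fit_ridge` and `predict`, which is the kind of like-for-like comparison the abstract reports.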
Affiliation(s)
- Karansher S. Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- Dennis N. Lozada
- Department of Plant and Environmental Sciences, New Mexico State University, Las Cruces, NM, United States
- Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- Michael O. Pumphrey
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- Arron H. Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
65
Maldonado C, Mora-Poblete F, Contreras-Soto RI, Ahmar S, Chen JT, do Amaral Júnior AT, Scapim CA. Genome-Wide Prediction of Complex Traits in Two Outcrossing Plant Species Through Deep Learning and Bayesian Regularized Neural Network. FRONTIERS IN PLANT SCIENCE 2020; 11:593897. [PMID: 33329658 PMCID: PMC7728740 DOI: 10.3389/fpls.2020.593897] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/11/2020] [Accepted: 10/27/2020] [Indexed: 05/25/2023]
Abstract
Genomic selection models were investigated to predict several complex traits in breeding populations of Zea mays L. and Eucalyptus globulus Labill. For this, the following Machine Learning (ML) methods were implemented: (i) Deep Learning (DL) and (ii) Bayesian Regularized Neural Network (BRNN), each in combination with different hyperparameters. These ML methods were also compared with Genomic Best Linear Unbiased Prediction (GBLUP) and different Bayesian regression models [Bayes A, Bayes B, Bayes Cπ, Bayesian Ridge Regression, Bayesian LASSO, and Reproducing Kernel Hilbert Space (RKHS)]. DL models using Rectified Linear Units as the activation function had higher predictive ability values, which varied from 0.27 (pilodyn penetration of 6-year-old eucalypt trees) to 0.78 (flowering-related traits of maize). Moreover, the larger mini-batch size (100%) yielded significantly higher predictive ability for wood-related traits than the smaller mini-batch size (10%). In the BRNN method, on the other hand, the one- and two-layer architectures that used only the pureline function gave better predictions, with values ranging from 0.21 (pilodyn penetration) to 0.71 (flowering traits). A significant increase in predictive ability was observed for DL in comparison with the other genomic prediction methods (Bayesian alphabet models, GBLUP, RKHS, and BRNN). Another important finding was the usefulness of DL models (through an iterative algorithm) as an SNP detection strategy for genome-wide association studies. The results of this study confirm the importance of DL for genome-wide analyses and crop/tree improvement strategies, which holds promise for accelerating breeding progress.
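GBLUP, one of the baselines compared above, predicts breeding values from a genomic relationship matrix G between individuals rather than from individual marker effects. The abstract does not say how G was built; a common choice, assumed here, is VanRaden's first method: with genotypes coded 0/1/2 and allele frequencies p_j, Z = M − 2p and G = ZZ′ / (2 Σ_j p_j(1 − p_j)). A minimal pure-Python sketch with made-up genotypes:

```python
def genomic_relationship(m):
    """VanRaden-style G = ZZ' / (2 * sum_j p_j (1 - p_j)) for 0/1/2 genotypes.

    Rows of m are individuals, columns are markers. Allele frequencies p_j
    are estimated from the sample itself (an assumption; base-population
    frequencies are sometimes used instead).
    """
    n, p = len(m), len(m[0])
    freqs = [sum(row[j] for row in m) / (2.0 * n) for j in range(p)]
    # Centre each marker column by twice its allele frequency.
    z = [[row[j] - 2.0 * freqs[j] for j in range(p)] for row in m]
    scale = 2.0 * sum(f * (1.0 - f) for f in freqs)
    return [[sum(z[a][j] * z[b][j] for j in range(p)) / scale
             for b in range(n)] for a in range(n)]


# Hypothetical genotypes: 3 individuals x 4 markers.
genotypes = [
    [0, 1, 2, 1],
    [0, 1, 2, 2],
    [2, 1, 0, 1],
]
g = genomic_relationship(genotypes)
for row in g:
    print([round(v, 3) for v in row])
```

Because Z is centred with frequencies estimated from the same sample, each row of G sums to zero. In GBLUP, G then enters the mixed model y = 1μ + g + e with g ~ N(0, Gσ²_g), which is the linear, relatedness-based predictor the DL and BRNN models were measured against.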
Affiliation(s)
- Carlos Maldonado
- Instituto de Ciencias Agroalimentarias, Animales y Ambientales, Universidad de O’Higgins, San Fernando, Chile
- Rodrigo Iván Contreras-Soto
- Instituto de Ciencias Agroalimentarias, Animales y Ambientales, Universidad de O’Higgins, San Fernando, Chile
- Sunny Ahmar
- Institute of Biological Sciences, University of Talca, Talca, Chile
- College of Plant Sciences and Technology, Huazhong Agricultural University, Wuhan, China
- Jen-Tsung Chen
- Department of Life Sciences, National University of Kaohsiung, Kaohsiung, Taiwan
- Antônio Teixeira do Amaral Júnior
- Laboratório de Melhoramento Genético Vegetal, Universidade Estadual do Norte Fluminense Darcy Ribeiro/CCTA, Campos dos Goytacazes, Brazil