1
Soria Bretones C, Roncero Parra C, Cascón J, Borja AL, Mateo Sotos J. Automatic identification of schizophrenia employing EEG records analyzed with deep learning algorithms. Schizophr Res 2023; 261:36-46. [PMID: 37690170 DOI: 10.1016/j.schres.2023.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 07/24/2023] [Accepted: 09/04/2023] [Indexed: 09/12/2023]
Abstract
Electroencephalography (EEG) is a method of detecting and analyzing electrical activity in the brain. This electrical activity can be recorded and processed to aid in the clinical diagnosis of mental disorders. In this study, a novel system for classifying schizophrenia patients from EEG recordings is presented. The developed algorithm decomposes the EEG signals into a system of radial basis functions using the method of fuzzy means. This decomposition helps to obtain the information from the various EEG electrodes and allows healthy controls to be separated from patients with schizophrenia. The proposed method has been compared with classical machine learning algorithms such as K-Nearest Neighbor, AdaBoost, Support Vector Machine, and Bayesian Linear Discriminant Analysis. The results show that the proposed method obtains the highest values in terms of balanced accuracy, recall, precision, and F1 score, close to 93% in all cases. The model developed in this study can be implemented in brain activity analysis systems that help in the prediction of patients with schizophrenia.
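The four metrics named in this abstract can all be derived from a binary confusion matrix. The sketch below is only an illustration of those standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute balanced accuracy, recall, precision and F1 from confusion-matrix counts.

    tp/fn count patients classified correctly/incorrectly,
    tn/fp count healthy controls classified correctly/incorrectly.
    """
    recall = tp / (tp + fn)                      # sensitivity: share of patients detected
    specificity = tn / (tn + fp)                 # share of controls correctly rejected
    precision = tp / (tp + fp)                   # share of positive calls that are correct
    balanced_accuracy = (recall + specificity) / 2
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "balanced_accuracy": balanced_accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

Balanced accuracy averages recall and specificity, so it is not inflated when the two classes have different sizes.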
Affiliation(s)
- Carlos Roncero Parra
  - Departamento de Sistema Informáticos, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
- Joaquín Cascón
  - Departamento de Ingeniería Eléctrica, Electrónica, Automática y Comunicaciones, Universidad de Castilla-La Mancha, 02071 Albacete, Spain; Expert Group in Medical Analysis, Instituto de Tecnología, Construcción y Telecomunicaciones, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
- Alejandro L Borja
  - Departamento de Ingeniería Eléctrica, Electrónica, Automática y Comunicaciones, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
- Jorge Mateo Sotos
  - Departamento de Ingeniería Eléctrica, Electrónica, Automática y Comunicaciones, Universidad de Castilla-La Mancha, 02071 Albacete, Spain; Expert Group in Medical Analysis, Instituto de Tecnología, Construcción y Telecomunicaciones, Universidad de Castilla-La Mancha, 16071 Cuenca, Spain
2
Onogi A. A Bayesian model for genomic prediction using metabolic networks. Bioinformatics Advances 2023; 3:vbad106. [PMID: 39131740 PMCID: PMC11312854 DOI: 10.1093/bioadv/vbad106] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/12/2023] [Revised: 07/26/2023] [Accepted: 08/10/2023] [Indexed: 08/13/2024]
Abstract
Motivation: Genomic prediction is now an essential technique in breeding and medicine, and it is interesting to see how omics data can be used to improve prediction accuracy. Previous work proposed a metabolic network-based method for biomass prediction in Arabidopsis; however, the method consists of multiple steps that possibly degrade prediction accuracy. Results: We proposed a Bayesian model that integrates all steps and jointly infers all fluxes of reactions related to biomass production. The proposed model showed higher accuracies than the compared methods on both simulated and real data. The findings support the previous excellent idea that metabolic network information can be used for prediction. Availability and implementation: All R and Stan scripts to reproduce the results of this study are available at https://github.com/Onogi/MetabolicModeling.
Affiliation(s)
- Akio Onogi
  - Department of Life Sciences, Faculty of Agriculture, Ryukoku University, Otsu, Shiga 520-2194, Japan
3
Alves AAC, Fernandes AFA, Lopes FB, Breen V, Hawken R, Gianola D, Rosa GJDM. (Quasi) multitask support vector regression with heuristic hyperparameter optimization for whole-genome prediction of complex traits: a case study with carcass traits in broilers. G3 (Bethesda, Md.) 2023; 13:jkad109. [PMID: 37216670 PMCID: PMC10411556 DOI: 10.1093/g3journal/jkad109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 03/13/2023] [Accepted: 04/24/2023] [Indexed: 05/24/2023]
Abstract
This study investigates nonlinear kernels for multitrait (MT) genomic prediction using support vector regression (SVR) models. We assessed the predictive ability delivered by single-trait (ST) and MT models for 2 carcass traits (CT1 and CT2) measured in purebred broiler chickens. The MT models also included information on indicator traits measured in vivo [Growth and feed efficiency trait (FE)]. We proposed an approach termed (quasi) multitask SVR (QMTSVR), with hyperparameter optimization performed via genetic algorithm. ST and MT Bayesian shrinkage and variable selection models [genomic best linear unbiased predictor (GBLUP), BayesC (BC), and reproducing kernel Hilbert space (RKHS) regression] were employed as benchmarks. MT models were trained using 2 validation designs (CV1 and CV2), which differ if the information on secondary traits is available in the testing set. Models' predictive ability was assessed with prediction accuracy (ACC; i.e. the correlation between predicted and observed values, divided by the square root of phenotype accuracy), standardized root-mean-squared error (RMSE*), and inflation factor (b). To account for potential bias in CV2-style predictions, we also computed a parametric estimate of accuracy (ACCpar). Predictive ability metrics varied according to trait, model, and validation design (CV1 or CV2), ranging from 0.71 to 0.84 for ACC, 0.78 to 0.92 for RMSE*, and between 0.82 and 1.34 for b. The highest ACC and smallest RMSE* were achieved with QMTSVR-CV2 in both traits. We observed that for CT1, model/validation design selection was sensitive to the choice of accuracy metric (ACC or ACCpar). Nonetheless, the higher predictive accuracy of QMTSVR over MTGBLUP and MTBC was replicated across accuracy metrics, besides the similar performance between the proposed method and the MTRKHS model. 
Results showed that the proposed approach is competitive with conventional MT Bayesian regression models using either Gaussian or spike-slab multivariate priors.
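The three predictive-ability metrics described in this abstract can be sketched as follows. ACC follows the abstract's wording (correlation divided by the square root of phenotype accuracy). Two details are assumptions not stated in the abstract: RMSE* is taken here as the RMSE divided by the sample standard deviation of the observed values, and b as the slope of the regression of observed on predicted values. The function and argument names are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def predictive_ability(observed, predicted, phenotype_accuracy):
    """Return (ACC, RMSE*, b) for one trait under the assumptions stated above."""
    # ACC: correlation scaled by the square root of phenotype accuracy.
    acc = pearson(predicted, observed) / math.sqrt(phenotype_accuracy)

    # RMSE*: assumed here to be RMSE standardized by the SD of the observations.
    rmse = math.sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))
    rmse_std = rmse / stdev(observed)

    # b: assumed here to be the slope of observed regressed on predicted.
    mp, mo = mean(predicted), mean(observed)
    cov_po = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    b = cov_po / var_p
    return acc, rmse_std, b


# Hypothetical data: predictions are exactly twice the observations.
acc, rmse_std, b = predictive_ability(
    observed=[1.0, 2.0, 3.0, 4.0, 5.0],
    predicted=[2.0, 4.0, 6.0, 8.0, 10.0],
    phenotype_accuracy=0.64,
)
```

Under this reading, b below 1 indicates predictions that are more dispersed than the observations (inflation), and b above 1 indicates the opposite.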
Affiliation(s)
- Vivian Breen
  - Cobb-Vantress Inc., Siloam Springs, AR 72761, USA
- Daniel Gianola
  - Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
4
Qu J, Runcie D, Cheng H. Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits. Genetics 2023; 223:6931802. [PMID: 36529897 PMCID: PMC9991502 DOI: 10.1093/genetics/iyac183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 05/06/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022]
Abstract
Large-scale phenotype data are expected to increase the accuracy of genome-wide prediction and the power of genome-wide association analyses. However, genomic analyses of high-dimensional, highly correlated traits are challenging. We developed a method for implementing high-dimensional Bayesian multivariate regression to simultaneously analyze genetic variants underlying thousands of traits. As a demonstration, we implemented the BayesC prior in the R package MegaLMM. Applied to Genomic Prediction, MegaBayesC effectively integrated hyperspectral reflectance data from 620 hyperspectral wavelengths to improve the accuracy of genetic value prediction on grain yield in a wheat dataset. Applied to Genome-Wide Association Studies, we used simulations to show that MegaBayesC can accurately estimate the effect sizes of QTL across a range of genetic architectures and causes of correlations among traits. To apply MegaBayesC to a realistic scenario involving whole-genome marker data, we developed a 2-stage procedure involving a preliminary step of candidate marker selection prior to multivariate regression. We then used MegaBayesC to identify genetic associations with flowering time in Arabidopsis thaliana, leveraging expression data from 20,843 genes. MegaBayesC selected 15 single nucleotide polymorphisms as important for flowering time, with 13 located within 100 kb of known flowering-time related genes, a higher validation rate than achieved by a single-stage analysis using only the flowering time data itself. These results demonstrate that MegaBayesC can efficiently and effectively leverage high-dimensional phenotypes in genetic analyses.
Affiliation(s)
- Jiayi Qu
  - Department of Animal Science, University of California Davis, Davis, CA 95616, USA
- Daniel Runcie
  - Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
- Hao Cheng
  - Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
5
Štrbac L, Pracner D, Šaran M, Janković D, Trivunović S, Ivković M, Tarjan L, Dedović N. Mathematical Modeling and Software Tools for Breeding Value Estimation Based on Phenotypic, Pedigree and Genomic Information of Holstein Friesian Cattle in Serbia. Animals (Basel) 2023; 13:ani13040597. [PMID: 36830383 PMCID: PMC9951744 DOI: 10.3390/ani13040597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 01/22/2023] [Accepted: 01/26/2023] [Indexed: 02/11/2023]
Abstract
In this paper, six univariate and two multivariate best linear unbiased prediction (BLUP) models were tested for the estimation of breeding values (BV) in Holstein Friesian cattle in Serbia. Two univariate models were formed using the numerator relationship matrix (NRM), four using the genomic relationship matrix (GRM). Multivariate models contained only an NRM. Two cases were studied, the first when only first lactations were observed, and the second when all lactations were observed using a repeatability model. A total of 6041 animals were included, and of them, 2565 had data on milk yield (MY), milk fat yield (FY), milk fat content (FC), milk protein yield (PY) and milk protein content (PC). Finally, out of those 2565 cows, 1491 were genotyped. A higher accuracy of BV was obtained when using a combination of NRM and GRM compared to NRM alone in univariate analysis, while multivariate analysis with repeated measures gave the highest accuracy with all 6041 animals. When only genotyped animals were observed, the highest accuracy of the estimated BV was calculated by the ssGBLUPp model, and the lowest by the univariate BLUP model. In conclusion, the current breeding programs in Serbia should be changed to use multivariate analysis with repeated measurements until the optimal size of the reference population, which must include genotyping data on both bulls and cows, is reached.
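The numerator relationship matrix (NRM) mentioned in this abstract is built from pedigree records. One standard construction is the tabular method, sketched below; it is shown only to illustrate what an NRM contains and is not claimed to be the software used in the study. The function name and the toy pedigree are hypothetical, and animals are assumed to be indexed so that parents always precede their offspring.

```python
def numerator_relationship_matrix(pedigree):
    """Build the NRM with the tabular method.

    pedigree: list of (sire, dam) tuples, one per animal, where each entry is
    the index of the parent or None if unknown. Parents must appear before
    their offspring in the list.
    """
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Diagonal: 1 plus the inbreeding coefficient, which is half the
        # additive relationship between the two parents (0 if a parent is unknown).
        inbreeding = 0.5 * a[sire][dam] if sire is not None and dam is not None else 0.0
        a[i][i] = 1.0 + inbreeding
        # Off-diagonal: average of the relationships between animal j and i's parents.
        for j in range(i):
            rel = 0.0
            if sire is not None:
                rel += 0.5 * a[j][sire]
            if dam is not None:
                rel += 0.5 * a[j][dam]
            a[i][j] = a[j][i] = rel
    return a


# Toy pedigree: two unrelated founders (0, 1), two full sibs (2, 3),
# and one offspring of the full sibs (4).
nrm = numerator_relationship_matrix(
    [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
)
```

In the toy pedigree the full sibs share a relationship of 0.5, and their offspring has a diagonal of 1.25, reflecting an inbreeding coefficient of 0.25.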
Affiliation(s)
- Ljuba Štrbac
  - Faculty of Agriculture, University of Novi Sad, 21000 Novi Sad, Serbia
- Doni Pracner
  - Faculty of Science, University of Novi Sad, 21000 Novi Sad, Serbia
- Momčilo Šaran
  - Faculty of Agriculture, University of Novi Sad, 21000 Novi Sad, Serbia
- Dobrila Janković
  - Faculty of Agriculture, University of Novi Sad, 21000 Novi Sad, Serbia
- Mirko Ivković
  - Faculty of Agriculture, University of Novi Sad, 21000 Novi Sad, Serbia
- Laslo Tarjan
  - Faculty of Technical Science, University of Novi Sad, 21000 Novi Sad, Serbia
- Nebojša Dedović
  - Faculty of Agriculture, University of Novi Sad, 21000 Novi Sad, Serbia
6
Jubair S, Domaratzki M. Crop genomic selection with deep learning and environmental data: A survey. Front Artif Intell 2023; 5:1040295. [PMID: 36703955 PMCID: PMC9871498 DOI: 10.3389/frai.2022.1040295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 12/22/2022] [Indexed: 01/12/2023]
Abstract
Machine learning techniques for crop genomic selections, especially for single-environment plants, are well-developed. These machine learning models, which use dense genome-wide markers to predict phenotype, routinely perform well on single-environment datasets, especially for complex traits affected by multiple markers. On the other hand, machine learning models for predicting crop phenotype, especially deep learning models, using datasets that span different environmental conditions, have only recently emerged. Models that can accept heterogeneous data sources, such as temperature, soil conditions and precipitation, are natural choices for modeling GxE in multi-environment prediction. Here, we review emerging deep learning techniques that incorporate environmental data directly into genomic selection models.
Affiliation(s)
- Sheikh Jubair
  - Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada (correspondence: Sheikh Jubair)
- Mike Domaratzki
  - Department of Computer Science, University of Western Ontario, London, ON, Canada
7
Multi-Trait Genomic Prediction Improves Accuracy of Selection among Doubled Haploid Lines in Maize. Int J Mol Sci 2022; 23:ijms232314558. [PMID: 36498886 PMCID: PMC9735914 DOI: 10.3390/ijms232314558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 11/11/2022] [Accepted: 11/19/2022] [Indexed: 11/24/2022]
Abstract
Recent advances in maize doubled haploid (DH) technology have enabled the development of large numbers of DH lines quickly and efficiently. However, testing all possible hybrid crosses among DH lines is a challenge. Phenotyping haploid progenitors created during the DH process could accelerate the selection of DH lines. Based on phenotypic and genotypic data of a DH population and its corresponding haploids, we compared phenotypes and estimated genetic correlations between the two populations, compared genomic prediction accuracy of multi-trait models against conventional univariate models within the DH population, and evaluated whether incorporating phenotypic data from haploid lines into a multi-trait model could better predict performance of DH lines. We found significant phenotypic differences between DH and haploid lines for nearly all traits; however, their genetic correlations between populations were moderate to strong. Furthermore, a multi-trait model taking into account genetic correlations between traits in the single-environment trial or genetic covariances in multi-environment trials can significantly increase genomic prediction accuracy. However, integrating information of haploid lines did not further improve our prediction. Our findings highlight the superiority of multi-trait models in predicting performance of DH lines in maize breeding, but do not support the routine phenotyping and selection on haploid progenitors of DH lines.
8
Li Z, Liu S, Conaty W, Zhu QH, Moncuquet P, Stiller W, Wilson I. Genomic prediction of cotton fibre quality and yield traits using Bayesian regression methods. Heredity (Edinb) 2022; 129:103-112. [PMID: 35523950 PMCID: PMC9338257 DOI: 10.1038/s41437-022-00537-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2021] [Revised: 04/05/2022] [Accepted: 04/07/2022] [Indexed: 01/26/2023]
Abstract
Genomic selection or genomic prediction (GP) has increasingly become an important molecular breeding technology for crop improvement. GP aims to utilise genome-wide marker data to predict genomic breeding value for traits of economic importance. Though GP studies have been widely conducted in various crop species such as wheat and maize, its application in cotton, an essential renewable textile fibre crop, is still significantly underdeveloped. We aim to develop a new GP-based breeding system that can improve the efficiency of our cotton breeding program. This article presents a GP study on cotton fibre quality and yield traits using 1385 breeding lines from the Commonwealth Scientific and Industrial Research Organisation (CSIRO, Australia) cotton breeding program which were genotyped using a high-density SNP chip that generated 12,296 informative SNPs. The aim of this study was twofold: (1) to identify the models and data sources (i.e. genomic and pedigree) that produce the highest prediction accuracies; and (2) to assess the effectiveness of GP as a selection tool in the CSIRO cotton breeding program. The prediction analyses were conducted under various scenarios using different Bayesian predictive models. Results highlighted that the model combining genomic and pedigree information resulted in the best cross validated prediction accuracies: 0.76 for fibre length, 0.65 for fibre strength, and 0.64 for lint yield. Overall, this work represents the largest scale genomic selection studies based on cotton breeding trial data. Prediction accuracies reported in our study indicate the potential of GP as a breeding tool for cotton. The study highlighted the importance of incorporating pedigree and environmental factors in GP models to optimise the prediction performance.
Affiliation(s)
- Zitong Li
  - CSIRO Agriculture & Food, GPO Box 1600, Canberra, ACT, 2601, Australia
- Shiming Liu
  - CSIRO Agriculture & Food, Locked Bag 59, Narrabri, NSW, 2390, Australia
- Warren Conaty
  - CSIRO Agriculture & Food, Locked Bag 59, Narrabri, NSW, 2390, Australia
- Qian-Hao Zhu
  - CSIRO Agriculture & Food, GPO Box 1600, Canberra, ACT, 2601, Australia
- Warwick Stiller
  - CSIRO Agriculture & Food, Locked Bag 59, Narrabri, NSW, 2390, Australia
- Iain Wilson
  - CSIRO Agriculture & Food, GPO Box 1600, Canberra, ACT, 2601, Australia
9
Ni P, Anche MT, Ruan Y, Dang D, Morales N, Li L, Liu M, Wang S, Robbins KR. Genomic Prediction Strategies for Dry-Down-Related Traits in Maize. Frontiers in Plant Science 2022; 13:930429. [PMID: 35845649 PMCID: PMC9280646 DOI: 10.3389/fpls.2022.930429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 05/23/2022] [Indexed: 06/15/2023]
Abstract
For efficient mechanical harvesting, low grain moisture content at harvest time is essential. Dry-down rate (DR), which refers to the reduction in grain moisture content after the plants enter physiological maturity, is one of the main factors affecting the amount of moisture in the kernels. Dry-down rate is estimated using kernel moisture content at physiological maturity and at harvest time; however, measuring kernel water content at physiological maturity, which is sometimes referred to as kernel water content at black layer formation (BWC), is time-consuming and resource-demanding. Therefore, inferring BWC from other correlated and easier-to-measure traits could improve the efficiency of breeding efforts for dry-down-related traits. In this study, multi-trait genomic prediction models were used to estimate genetic correlations between BWC and water content at harvest time (HWC) and flowering time (FT). The results show that there is moderate-to-high genetic correlation between the traits (0.24-0.66), which supports the use of multi-trait genomic prediction models. To investigate genomic prediction strategies, several cross-validation scenarios representing possible implementations of genomic prediction were evaluated. The results indicate that, in most scenarios, the use of multi-trait genomic prediction models substantially increases prediction accuracy. Furthermore, the inclusion of historical records for correlated traits can improve prediction accuracy, even when the target trait is not measured on all the plots in the training set.
Affiliation(s)
- Pengzun Ni
  - Shenyang Key Laboratory of Maize Genomic Selection Breeding, Liaoning Province Research Center of Plant Genetic Engineering Technology, College of Biological Science and Technology, Shenyang Agricultural University, Shenyang, China
  - Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States
  - College of Agronomy, Shenyang Agricultural University, Shenyang, China
- Mahlet Teka Anche
  - Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States
- Yanye Ruan
  - Shenyang Key Laboratory of Maize Genomic Selection Breeding, Liaoning Province Research Center of Plant Genetic Engineering Technology, College of Biological Science and Technology, Shenyang Agricultural University, Shenyang, China
- Dongdong Dang
  - Shenyang Key Laboratory of Maize Genomic Selection Breeding, Liaoning Province Research Center of Plant Genetic Engineering Technology, College of Biological Science and Technology, Shenyang Agricultural University, Shenyang, China
- Nicolas Morales
  - Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States
- Lingyue Li
  - Shenyang Key Laboratory of Maize Genomic Selection Breeding, Liaoning Province Research Center of Plant Genetic Engineering Technology, College of Biological Science and Technology, Shenyang Agricultural University, Shenyang, China
- Meiling Liu
  - Shenyang Key Laboratory of Maize Genomic Selection Breeding, Liaoning Province Research Center of Plant Genetic Engineering Technology, College of Biological Science and Technology, Shenyang Agricultural University, Shenyang, China
- Shu Wang
  - College of Agronomy, Shenyang Agricultural University, Shenyang, China
- Kelly R. Robbins
  - Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States
10
Bartholomé J, Prakash PT, Cobb JN. Genomic Prediction: Progress and Perspectives for Rice Improvement. Methods Mol Biol 2022; 2467:569-617. [PMID: 35451791 DOI: 10.1007/978-1-0716-2205-6_21] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/14/2023]
Abstract
Genomic prediction can be a powerful tool to achieve greater rates of genetic gain for quantitative traits if thoroughly integrated into a breeding strategy. In rice as in other crops, the interest in genomic prediction is very strong with a number of studies addressing multiple aspects of its use, ranging from the more conceptual to the more practical. In this chapter, we review the literature on rice (Oryza sativa) and summarize important considerations for the integration of genomic prediction in breeding programs. The irrigated breeding program at the International Rice Research Institute is used as a concrete example on which we provide data and R scripts to reproduce the analysis but also to highlight practical challenges regarding the use of predictions. The adage "To someone with a hammer, everything looks like a nail" describes a common psychological pitfall that sometimes plagues the integration and application of new technologies to a discipline. We have designed this chapter to help rice breeders avoid that pitfall and appreciate the benefits and limitations of applying genomic prediction, as it is not always the best approach nor the first step to increasing the rate of genetic gain in every context.
Affiliation(s)
- Jérôme Bartholomé
  - CIRAD, UMR AGAP Institut, Montpellier, France
  - AGAP Institut, Univ Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
  - Rice Breeding Platform, International Rice Research Institute, Manila, Philippines
11
Jubair S, Tucker JR, Henderson N, Hiebert CW, Badea A, Domaratzki M, Fernando WGD. GPTransformer: A Transformer-Based Deep Learning Method for Predicting Fusarium Related Traits in Barley. Frontiers in Plant Science 2021; 12:761402. [PMID: 34975945 PMCID: PMC8716695 DOI: 10.3389/fpls.2021.761402] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Accepted: 11/23/2021] [Indexed: 05/27/2023]
Abstract
Fusarium head blight (FHB) incited by Fusarium graminearum Schwabe is a devastating disease of barley and other cereal crops worldwide. Fusarium head blight is associated with trichothecene mycotoxins such as deoxynivalenol (DON), which contaminates grains, making them unfit for malting or animal feed industries. While genetically resistant cultivars offer the best economic and environmentally responsible means to mitigate disease, parent lines with adequate resistance are limited in barley. Resistance breeding based upon quantitative genetic gains has been slow to date, due to intensive labor requirements of disease nurseries. The production of a high-throughput genome-wide molecular marker assembly for barley permits use in development of genomic prediction models for traits of economic importance to this crop. A diverse panel consisting of 400 two-row spring barley lines was assembled to focus on Canadian barley breeding programs. The panel was evaluated for FHB and DON content in three environments and over 2 years. Moreover, it was genotyped using an Illumina Infinium High-Throughput Screening (HTS) iSelect custom beadchip array of single nucleotide polymorphic molecular markers (50 K SNP), where over 23 K molecular markers were polymorphic. Genomic prediction has been demonstrated to successfully reduce FHB and DON content in cereals using various statistical models. Herein, we have studied an alternative method based on machine learning and compare it with a statistical approach. The bi-allelic SNPs represented pairs of alleles and were encoded in two ways: as categorical (-1, 0, 1) or using Hardy-Weinberg probability frequencies. This was followed by selecting essential genomic markers for phenotype prediction. Subsequently, a Transformer-based deep learning algorithm was applied to predict FHB and DON. Apart from the Transformer method, a Residual Fully Connected Neural Network (RFCNN) was also applied. 
Pearson correlation coefficients were calculated to compare true vs. predicted outputs. Models which included all markers generally showed marginal improvement in prediction. Hardy-Weinberg encoding generally improved correlation for FHB (6.9%) and DON (9.6%) for the Transformer network. This study suggests the potential of the Transformer based method as an alternative to the popular BLUP model for genomic prediction of complex traits such as FHB or DON, having performed equally or better than existing machine learning and statistical methods.
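The two SNP encodings mentioned in this abstract can be illustrated for a single bi-allelic marker. The abstract does not spell out the Hardy-Weinberg encoding, so the sketch below rests on an assumption: each genotype is replaced by its expected frequency under Hardy-Weinberg equilibrium, computed from the allele frequency observed in the sample. Function names and the example genotypes are hypothetical.

```python
def encode_categorical(allele_counts):
    """Map allele counts 0/1/2 to the categorical codes -1/0/1."""
    return [c - 1 for c in allele_counts]


def encode_hardy_weinberg(allele_counts):
    """Replace each genotype with its expected Hardy-Weinberg frequency.

    Assumption (not confirmed by the abstract): with p the sample frequency of
    the counted allele and q = 1 - p, genotypes 0/1/2 map to q^2, 2pq, p^2.
    """
    p = sum(allele_counts) / (2 * len(allele_counts))
    q = 1.0 - p
    genotype_frequency = {0: q * q, 1: 2 * p * q, 2: p * p}
    return [genotype_frequency[c] for c in allele_counts]


# Hypothetical marker genotyped in four lines (copies of the counted allele).
marker = [0, 1, 1, 2]
categorical = encode_categorical(marker)
hardy_weinberg = encode_hardy_weinberg(marker)
```

The categorical coding is the same for every marker, whereas the frequency-based coding depends on each marker's allele frequency, so rare and common genotypes receive different numeric values.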
Affiliation(s)
- Sheikh Jubair
  - Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
- James R. Tucker
  - Department of Plant Science, University of Manitoba, Winnipeg, MB, Canada
  - Brandon Research and Development Centre, Agriculture and Agri-Food Canada, Brandon, MB, Canada
- Nathan Henderson
  - Brandon Research and Development Centre, Agriculture and Agri-Food Canada, Brandon, MB, Canada
- Colin W. Hiebert
  - Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB, Canada
- Ana Badea
  - Department of Plant Science, University of Manitoba, Winnipeg, MB, Canada
  - Brandon Research and Development Centre, Agriculture and Agri-Food Canada, Brandon, MB, Canada
- Michael Domaratzki
  - Department of Computer Science, University of Western Ontario, London, ON, Canada
12
Hu H, Campbell MT, Yeats TH, Zheng X, Runcie DE, Covarrubias-Pazaran G, Broeckling C, Yao L, Caffe-Treml M, Gutiérrez LA, Smith KP, Tanaka J, Hoekenga OA, Sorrells ME, Gore MA, Jannink JL. Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2021. [PMID: 34643760 DOI: 10.25739/8p1e-0931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/15/2023]
Abstract
Integration of multi-omics data improved prediction accuracies of oat agronomic and seed nutritional traits in multi-environment trials and distantly related populations in addition to the single-environment prediction. Multi-omics prediction has been shown to be superior to genomic prediction with genome-wide DNA-based genetic markers (G) for predicting phenotypes. However, most of the existing studies were based on historical datasets from one environment; therefore, they were unable to evaluate the efficiency of multi-omics prediction in multi-environment trials and distantly related populations. To fill those gaps, we designed a systematic experiment to collect omics data and evaluate 17 traits in two oat breeding populations planted in single and multiple environments. In the single-environment trial, transcriptomic BLUP (T), metabolomic BLUP (M), G + T, G + M, and G + T + M models showed greater prediction accuracy than GBLUP for 5, 10, 11, 17, and 17 traits, respectively, and metabolites generally performed better than transcripts when combined with SNPs. In the multi-environment trial, multi-trait models with omics data outperformed both counterpart multi-trait GBLUP models and single-environment omics models, and the highest prediction accuracy was achieved when modeling genetic covariance as an unstructured covariance model. We also demonstrated that omics data can be used to prioritize loci from one population with omics data to improve genomic prediction in a distantly related population using a two-kernel linear model that accommodated both likely causal loci with large effect and loci that explain little or no phenotypic variance. We propose that the two-kernel linear model is superior to most genomic prediction models that assume each variant is equally likely to affect the trait and can be used to improve prediction accuracy for any trait with prior knowledge of genetic architecture.
Collapse
Affiliation(s)
- Haixiao Hu
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA.
| | - Malachy T Campbell
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Trevor H Yeats
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Xuying Zheng
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Daniel E Runcie
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616, USA
| | - Giovanny Covarrubias-Pazaran
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, El Batán, 56130, Texcoco, Edo. de México, México
| | - Corey Broeckling
- Proteomics and Metabolomics Facility, Colorado State University, C130 Microbiology, 2021 Campus Delivery, Fort Collins, CO, 80521, USA
| | - Linxing Yao
- Proteomics and Metabolomics Facility, Colorado State University, C130 Microbiology, 2021 Campus Delivery, Fort Collins, CO, 80521, USA
| | - Melanie Caffe-Treml
- Department of Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD, 57007, USA
| | - Lucía Gutiérrez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Kevin P Smith
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
| | - James Tanaka
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Owen A Hoekenga
- Cayuga Genetics Consulting Group LLC, Ithaca, NY, 14850, USA
| | - Mark E Sorrells
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Michael A Gore
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Jean-Luc Jannink
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- USDA-ARS, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, 14853, USA
| |
|
13
|
Hu H, Campbell MT, Yeats TH, Zheng X, Runcie DE, Covarrubias-Pazaran G, Broeckling C, Yao L, Caffe-Treml M, Gutiérrez LA, Smith KP, Tanaka J, Hoekenga OA, Sorrells ME, Gore MA, Jannink JL. Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations. Theor Appl Genet 2021; 134:4043-4054. [PMID: 34643760 PMCID: PMC8580906 DOI: 10.1007/s00122-021-03946-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/02/2021] [Accepted: 09/05/2021] [Indexed: 05/26/2023]
Abstract
Integration of multi-omics data improved prediction accuracies of oat agronomic and seed nutritional traits in multi-environment trials and distantly related populations in addition to the single-environment prediction. Multi-omics prediction has been shown to be superior to genomic prediction with genome-wide DNA-based genetic markers (G) for predicting phenotypes. However, most of the existing studies were based on historical datasets from one environment; therefore, they were unable to evaluate the efficiency of multi-omics prediction in multi-environment trials and distantly related populations. To fill those gaps, we designed a systematic experiment to collect omics data and evaluate 17 traits in two oat breeding populations planted in single and multiple environments. In the single-environment trial, transcriptomic BLUP (T), metabolomic BLUP (M), G + T, G + M, and G + T + M models showed greater prediction accuracy than GBLUP for 5, 10, 11, 17, and 17 traits, respectively, and metabolites generally performed better than transcripts when combined with SNPs. In the multi-environment trial, multi-trait models with omics data outperformed both counterpart multi-trait GBLUP models and single-environment omics models, and the highest prediction accuracy was achieved when modeling genetic covariance as an unstructured covariance model. We also demonstrated that omics data can be used to prioritize loci from one population with omics data to improve genomic prediction in a distantly related population using a two-kernel linear model that accommodated both likely causal loci with large effects and loci that explain little or no phenotypic variance. We propose that the two-kernel linear model is superior to most genomic prediction models that assume each variant is equally likely to affect the trait and can be used to improve prediction accuracy for any trait with prior knowledge of genetic architecture.
Affiliation(s)
- Haixiao Hu
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA.
| | - Malachy T Campbell
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Trevor H Yeats
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Xuying Zheng
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Daniel E Runcie
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616, USA
| | - Giovanny Covarrubias-Pazaran
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, El Batán, 56130, Texcoco, Edo. de México, México
| | - Corey Broeckling
- Proteomics and Metabolomics Facility, Colorado State University, C130 Microbiology, 2021 Campus Delivery, Fort Collins, CO, 80521, USA
| | - Linxing Yao
- Proteomics and Metabolomics Facility, Colorado State University, C130 Microbiology, 2021 Campus Delivery, Fort Collins, CO, 80521, USA
| | - Melanie Caffe-Treml
- Department of Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD, 57007, USA
| | - Lucía Gutiérrez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Kevin P Smith
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
| | - James Tanaka
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Owen A Hoekenga
- Cayuga Genetics Consulting Group LLC, Ithaca, NY, 14850, USA
| | - Mark E Sorrells
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Michael A Gore
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
| | - Jean-Luc Jannink
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- USDA-ARS, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, 14853, USA
| |
|
14
|
Schrauf MF, de los Campos G, Munilla S. Comparing Genomic Prediction Models by Means of Cross Validation. Front Plant Sci 2021; 12:734512. [PMID: 34868117 PMCID: PMC8639521 DOI: 10.3389/fpls.2021.734512] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/01/2021] [Accepted: 10/26/2021] [Indexed: 06/13/2023]
Abstract
In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated from the data (so-called "hyper-parameters"). When the focus is placed on predictions, most of these decisions are made so as to optimize predictive accuracy. Here we discuss and illustrate, using publicly available crop datasets, the use of cross validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data by either minimizing REML or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software package are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that the paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders.
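The paired comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code or their statistical test: ridge regression stands in for a genomic prediction model, the penalty `lam` plays the role of a hyper-parameter, and the function names and the crude interval check against an equivalence margin are all assumptions made for illustration. The key point is that both candidates are scored on the same folds, so the fold-level differences can be analyzed directly.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Ridge regression without intercept (hypothetical stand-in for a marker model)."""
    p = X_train.shape[1]
    beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(p), X_train.T @ y_train)
    return X_test @ beta

def paired_kfold(X, y, lam_a, lam_b, k=5, seed=0):
    """Return per-fold accuracy differences (A minus B) using identical folds for both models."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    diffs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        acc = []
        for lam in (lam_a, lam_b):
            pred = ridge_fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
            # Accuracy measured as the correlation between predicted and observed values.
            acc.append(np.corrcoef(pred, y[test_idx])[0, 1])
        diffs.append(acc[0] - acc[1])
    return np.array(diffs)

def within_margin(diffs, margin):
    """Rough check (an assumption, not the paper's test): mean difference
    plus or minus two standard errors lies inside (-margin, +margin)."""
    mean = diffs.mean()
    se = diffs.std(ddof=1) / np.sqrt(len(diffs))
    return bool(mean - 2 * se > -margin and mean + 2 * se < margin)
```

Because fold-level accuracies share training data, the standard error here is only approximate; the sketch is meant to show the pairing, not to replace a proper equivalence test.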
Affiliation(s)
- Matías F. Schrauf
- Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
- Animal Breeding & Genomics, Wageningen Livestock Research, Wageningen University & Research, Wageningen, Netherlands
| | - Gustavo de los Campos
- Departments of Epidemiology, Biostatistics, Statistics, and Probability, Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, United States
| | - Sebastián Munilla
- Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
- Instituto de Investigaciones en Producción Animal (INPA), CONICET-Universidad de Buenos Aires, Buenos Aires, Argentina
| |
|
15
|
Montesinos-López A, Runcie DE, Ibba MI, Pérez-Rodríguez P, Montesinos-López OA, Crespo LA, Bentley AR, Crossa J. Multi-trait genomic-enabled prediction enhances accuracy in multi-year wheat breeding trials. G3 (Bethesda) 2021; 11:6332007. [PMID: 34568924 PMCID: PMC8496321 DOI: 10.1093/g3journal/jkab270] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/01/2021] [Accepted: 07/25/2021] [Indexed: 11/14/2022]
Abstract
Implementing genomic-based prediction models in genomic selection requires an understanding of the measures for evaluating prediction accuracy from different models and methods using multi-trait data. In this study, we compared prediction accuracy using six large multi-trait wheat data sets (quality and grain yield). The data were used to predict 1 year (testing) from the previous year (training) to assess prediction accuracy using four different prediction models. The results indicated that the conventional Pearson’s correlation between observed and predicted values underestimated the true correlation value, whereas the corrected Pearson’s correlation calculated by fitting a bivariate model was higher than the Pearson’s correlation divided by the square root of the heritability across traits, by 2.53–11.46%. Across the datasets, the corrected Pearson’s correlation was higher than the uncorrected by 5.80–14.01%. Overall, we found that for grain yield the prediction performance was higher with a multi-trait model than with a single-trait model. The higher the absolute genetic correlation between traits, the greater the benefits of multi-trait models for increasing the genomic-enabled prediction accuracy of traits.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico
| | - Daniel E Runcie
- Department of Plant Sciences, College of Agricultural & Environmental Sciences, University of California Davis, Davis CA 95616, USA
| | - Maria Itria Ibba
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, México
| | | | | | - Leonardo A Crespo
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, México
| | - Alison R Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, México
| | - José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, México.,Colegio de Postgraduados (COLPOS), Montecillos, Edo. de México, México
| |
|
16
|
Abstract
Tradeoffs among plant traits help maintain relative fitness under unpredictable conditions and maximize reproductive success. However, modifying tradeoffs is a breeding challenge since many genes of minor effect are involved. The intensive crosstalk and fine-tuning between growth and defense responsive phytohormones via transcription factors optimizes growth, reproduction, and stress tolerance. There are regulating genes in grain crops that deploy diverse functions to overcome tradeoffs, e.g., miR-156-IPA1 regulates crosstalk between growth and defense to achieve high disease resistance and yield, while OsALDH2B1 loss of function causes imbalance among defense, growth, and reproduction in rice. GNI-A1 regulates seed number and weight in wheat by suppressing distal florets and altering assimilate distribution of proximal seeds in spikelets. Knocking out ABA-induced transcription repressors (AITRs) enhances abiotic stress adaptation without fitness cost in Arabidopsis. Deploying AITRs homologs in grain crops may facilitate breeding. This knowledge suggests overcoming tradeoffs through breeding may expose new ones.
Affiliation(s)
| | | | - Rodomiro Ortiz
- Swedish University of Agricultural Sciences (SLU), Alnarp, Sweden
| |
|
17
|
Brault C, Doligez A, Cunff L, Coupel-Ledru A, Simonneau T, Chiquet J, This P, Flutre T. Harnessing multivariate, penalized regression methods for genomic prediction and QTL detection of drought-related traits in grapevine. G3 (Bethesda) 2021; 11:6325507. [PMID: 34544146 PMCID: PMC8496232 DOI: 10.1093/g3journal/jkab248] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/11/2021] [Accepted: 07/02/2021] [Indexed: 11/13/2022]
Abstract
Viticulture has to cope with climate change and to decrease pesticide inputs, while maintaining yield and wine quality. Breeding is a key lever to meet this challenge, and genomic prediction a promising tool to accelerate breeding programs. Multivariate methods are potentially more accurate than univariate ones. Moreover, some prediction methods also provide marker selection, thus allowing quantitative trait loci (QTLs) detection and the identification of positional candidate genes. To study both genomic prediction and QTL detection for drought-related traits in grapevine, we applied several methods, interval mapping (IM) as well as univariate and multivariate penalized regression, in a bi-parental progeny. With a dense genetic map, we simulated two traits under four QTL configurations. We recommend the penalized regression method Elastic Net (EN) for genomic prediction, and controlling the marginal False Discovery Rate on EN-selected markers to prioritize the QTLs. Indeed, penalized methods were more powerful than IM for QTL detection across various genetic architectures. Multivariate prediction did not perform better than its univariate counterpart, despite strong genetic correlation between traits. Using 14 traits measured in semi-controlled conditions under different watering conditions, penalized regression methods proved very efficient for intra-population prediction whatever the genetic architecture of the trait, with predictive abilities reaching 0.68. Compared to a previous study on the same traits, these methods applied on a denser map found new QTLs controlling traits linked to drought tolerance and provided relevant candidate genes. Overall, these findings provide a strong evidence base for implementing genomic prediction in grapevine breeding.
Affiliation(s)
- Charlotte Brault
- Institut Français de la Vigne et du Vin, Montpellier F-34398, France.,UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier F-34398, France.,UMT Geno-Vigne®, IFV-INRAE-Institut Agro, Montpellier F-34398, France
| | - Agnès Doligez
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier F-34398, France.,UMT Geno-Vigne®, IFV-INRAE-Institut Agro, Montpellier F-34398, France
| | - Le Cunff
- Institut Français de la Vigne et du Vin, Montpellier F-34398, France.,UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier F-34398, France.,UMT Geno-Vigne®, IFV-INRAE-Institut Agro, Montpellier F-34398, France
| | - Aude Coupel-Ledru
- LEPSE, Univ Montpellier, INRAE, Institut Agro, Montpellier 34000, France
| | - Thierry Simonneau
- LEPSE, Univ Montpellier, INRAE, Institut Agro, Montpellier 34000, France
| | | | - Patrice This
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier F-34398, France.,UMT Geno-Vigne®, IFV-INRAE-Institut Agro, Montpellier F-34398, France
| | - Timothée Flutre
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette 91190, France
| |
|
18
|
de Sousa K, van Etten J, Poland J, Fadda C, Jannink JL, Kidane YG, Lakew BF, Mengistu DK, Pè ME, Solberg SØ, Dell'Acqua M. Data-driven decentralized breeding increases prediction accuracy in a challenging crop production environment. Commun Biol 2021; 4:944. [PMID: 34413464 PMCID: PMC8376984 DOI: 10.1038/s42003-021-02463-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/11/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
Crop breeding must embrace the broad diversity of smallholder agricultural systems to ensure food security to the hundreds of millions of people living in challenging production environments. This need can be addressed by combining genomics, farmers' knowledge, and environmental analysis into a data-driven decentralized approach (3D-breeding). We tested this idea as a proof-of-concept by comparing a durum wheat (Triticum durum Desf.) decentralized trial distributed as incomplete blocks in 1,165 farmer-managed fields across the Ethiopian highlands with a benchmark representing genomic prediction applied to conventional breeding. We found that 3D-breeding could double the prediction accuracy of the benchmark. 3D-breeding could identify genotypes with enhanced local adaptation providing superior productive performance across seasons. We propose this decentralized approach to leverage the diversity in farmer fields and complement conventional plant breeding to enhance local adaptation in challenging crop production environments.
Affiliation(s)
- Kauê de Sousa
- Department of Agricultural Sciences, Inland Norway University of Applied Sciences, Hamar, Norway
- Digital Inclusion, Bioversity International, Montpellier, France
| | - Jacob van Etten
- Digital Inclusion, Bioversity International, Montpellier, France
| | - Jesse Poland
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
| | - Carlo Fadda
- Biodiversity for Food and Agriculture, Bioversity International, Nairobi, Kenya
| | - Jean-Luc Jannink
- College of Agriculture and Life Sciences, Cornell University, Ithaca, NY, USA
- Agricultural Research Service, United States Department of Agriculture, Ithaca, NY, USA
| | - Yosef Gebrehawaryat Kidane
- Biodiversity for Food and Agriculture, Bioversity International, Nairobi, Kenya
- Institute of Life Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
| | - Basazen Fantahun Lakew
- Biodiversity for Food and Agriculture, Bioversity International, Nairobi, Kenya
- Ethiopian Biodiversity Institute, Addis Ababa, Ethiopia
| | - Dejene Kassahun Mengistu
- Biodiversity for Food and Agriculture, Bioversity International, Nairobi, Kenya
- Institute of Life Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
| | - Mario Enrico Pè
- Institute of Life Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
| | - Svein Øivind Solberg
- Department of Agricultural Sciences, Inland Norway University of Applied Sciences, Hamar, Norway
| | - Matteo Dell'Acqua
- Institute of Life Sciences, Scuola Superiore Sant'Anna, Pisa, Italy.
| |
|
19
|
Runcie DE, Qu J, Cheng H, Crawford L. MegaLMM: Mega-scale linear mixed models for genomic predictions with thousands of traits. Genome Biol 2021; 22:213. [PMID: 34301310 PMCID: PMC8299638 DOI: 10.1186/s13059-021-02416-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/02/2020] [Accepted: 06/23/2021] [Indexed: 12/21/2022]
Abstract
Large-scale phenotype data can enhance the power of genomic prediction in plant and animal breeding, as well as human genetics. However, the statistical foundation of multi-trait genomic prediction is based on the multivariate linear mixed effect model, a tool notorious for its fragility when applied to more than a handful of traits. We present MegaLMM, a statistical framework and associated software package for mixed model analyses of a virtually unlimited number of traits. Using three examples with real plant data, we show that MegaLMM can leverage thousands of traits at once to significantly improve genetic value prediction accuracy.
Affiliation(s)
- Daniel E. Runcie
- Department of Plant Sciences, University of California Davis, Davis, CA USA
| | - Jiayi Qu
- Department of Plant Sciences, University of California Davis, Davis, CA USA
| | - Hao Cheng
- Department of Plant Sciences, University of California Davis, Davis, CA USA
| | | |
|
20
|
Arouisse B, Theeuwen TPJM, van Eeuwijk FA, Kruijer W. Improving Genomic Prediction Using High-Dimensional Secondary Phenotypes. Front Genet 2021; 12:667358. [PMID: 34108993 PMCID: PMC8181460 DOI: 10.3389/fgene.2021.667358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 04/14/2021] [Indexed: 11/17/2022]
Abstract
In the past decades, genomic prediction has had a large impact on plant breeding. Given the current advances of high-throughput phenotyping and sequencing technologies, it is increasingly common to observe a large number of traits, in addition to the target trait of interest. This raises the important question whether these additional or “secondary” traits can be used to improve genomic prediction for the target trait. With only a small number of secondary traits, this is known to be the case, given sufficiently high heritabilities and genetic correlations. Here we focus on the more challenging situation with a large number of secondary traits, which is increasingly common since the arrival of high-throughput phenotyping. In this case, secondary traits are usually incorporated through additional relatedness matrices. This approach is however infeasible when secondary traits are not measured on the test set, and cannot distinguish between genetic and non-genetic correlations. An alternative direction is to extend the classical selection indices using penalized regression. So far, penalized selection indices have not been applied in a genomic prediction setting, and require plot-level data in order to reliably estimate genetic correlations. Here we aim to overcome these limitations, using two novel approaches. Our first approach relies on a dimension reduction of the secondary traits, using either penalized regression or random forests (LS-BLUP/RF-BLUP). We then compute the bivariate GBLUP with the dimension reduction as secondary trait. For simulated data (with available plot-level data), we also use bivariate GBLUP with the penalized selection index as secondary trait (SI-BLUP). In our second approach (GM-BLUP), we follow existing multi-kernel methods but replace secondary traits by their genomic predictions, with the advantage that genomic prediction is also possible when secondary traits are only measured on the training set. 
For most of our simulated data, SI-BLUP was most accurate, often closely followed by RF-BLUP or LS-BLUP. In real datasets, involving metabolites in Arabidopsis and transcriptomics in maize, no method could substantially improve over univariate prediction when secondary traits were only available on the training set. LS-BLUP and RF-BLUP were most accurate when secondary traits were available also for the test set.
Affiliation(s)
- Bader Arouisse
- Biometris, Wageningen University and Research, Wageningen, Netherlands
| | - Tom P J M Theeuwen
- Laboratory of Genetics, Wageningen University and Research, Wageningen, Netherlands
| | | | - Willem Kruijer
- Biometris, Wageningen University and Research, Wageningen, Netherlands
| |
|
21
|
Michel S, Wagner C, Nosenko T, Steiner B, Samad-Zamini M, Buerstmayr M, Mayer K, Buerstmayr H. Merging Genomics and Transcriptomics for Predicting Fusarium Head Blight Resistance in Wheat. Genes (Basel) 2021; 12:114. [PMID: 33477759 PMCID: PMC7832326 DOI: 10.3390/genes12010114] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/17/2020] [Revised: 01/14/2021] [Accepted: 01/16/2021] [Indexed: 01/13/2023]
Abstract
Genomic selection with genome-wide distributed molecular markers has evolved into a well-implemented tool in many breeding programs during the last decade. The resistance against Fusarium head blight (FHB) in wheat is probably one of the most thoroughly studied systems within this framework. Aside from the genome, other biological strata like the transcriptome have likewise shown some potential in predictive breeding strategies but have not yet been investigated for the FHB-wheat pathosystem. The aims of this study were thus to compare the potential of genomic with transcriptomic prediction, and to assess the merit of blending incomplete transcriptomic with complete genomic data by the single-step method. A substantial advantage of gene expression data over molecular markers was observed for the prediction of FHB resistance in the studied diversity panel of breeding lines and released cultivars. An increase in prediction ability was likewise found for the single-step predictions, although this can mostly be attributed to an increased accuracy among the RNA-sequenced genotypes. The usage of transcriptomics can thus be seen as a complement to already established predictive breeding pipelines with pedigree and genomic data, particularly once more cost-efficient multiplexing techniques for RNA-sequencing become accessible.
Affiliation(s)
- Sebastian Michel
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria; (C.W.); (B.S.); (M.S.-Z.); (M.B.); (H.B.)
| | - Christian Wagner
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria; (C.W.); (B.S.); (M.S.-Z.); (M.B.); (H.B.)
| | - Tetyana Nosenko
- PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany; (T.N.); (K.M.)
- Research Unit Environmental Simulation (EUS) at the Institute of Biochemical Plant Pathology (BIOP), Helmholtz Zentrum München, 85764 Neuherberg, Germany
| | - Barbara Steiner
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria; (C.W.); (B.S.); (M.S.-Z.); (M.B.); (H.B.)
| | - Mina Samad-Zamini
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria; (C.W.); (B.S.); (M.S.-Z.); (M.B.); (H.B.)
- Saatzucht Edelhof GmbH, 3910 Zwettl, Austria
| | - Maria Buerstmayr
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria; (C.W.); (B.S.); (M.S.-Z.); (M.B.); (H.B.)
| | - Klaus Mayer
- PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany; (T.N.); (K.M.)
| | - Hermann Buerstmayr
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria; (C.W.); (B.S.); (M.S.-Z.); (M.B.); (H.B.)
| |
|
22
|
Brunes LC, Baldi F, Lopes FB, Narciso MG, Lobo RB, Espigolan R, Costa MFO, Magnabosco CU. Genomic prediction ability for feed efficiency traits using different models and pseudo-phenotypes under several validation strategies in Nelore cattle. Animal 2020; 15:100085. [PMID: 33573965 DOI: 10.1016/j.animal.2020.100085] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/27/2020] [Revised: 09/09/2020] [Accepted: 09/15/2020] [Indexed: 10/22/2022]
Abstract
There is growing interest in improving feed efficiency (FE) traits in cattle. Genomic selection was proposed to improve these traits since they are difficult and expensive to measure. To date, few studies have examined the implementation of genomic selection for FE traits in indicine cattle under different scenarios of pseudo-phenotypes, models, and validation strategies on a large commercial scale. Thus, the aim was to evaluate the feasibility of implementing genomic selection for FE traits in Nelore cattle, applying different models and pseudo-phenotypes under several validation strategies. Phenotypic and genotypic information from 4 329 and 3 467 animals were used, respectively, which were tested for residual feed intake, DM intake, feed efficiency, feed conversion ratio, residual BW gain, and residual intake and BW gain. Six prediction methods were used: single-step genomic best linear unbiased prediction, Bayes A, Bayes B, Bayes Cπ, Bayesian least absolute shrinkage and selection operator (BLASSO), and Bayes R. Phenotypes adjusted for fixed effects (Y*), estimated breeding value (EBV), and deregressed EBV (DEBV) were used as pseudo-phenotypes. The validation approaches used were: (1) random: the data were randomly divided into ten subsets and validation was done on one subset at a time; (2) age: the partition into training and testing sets was based on year of birth, and testing animals were born after 2016; and (3) EBV accuracy: the data were split into two groups, with animals with accuracy above 0.45 forming the training set and those below 0.45 the validation set. In the analyses that used Y* as pseudo-phenotype, prediction ability (PA) was obtained by dividing the correlation between pseudo-phenotype and genomic EBV (GEBV) by the square root of the heritability of the trait. When EBV and DEBV were used as the pseudo-phenotype, the simple correlation of this quantity with the GEBV was considered as PA.
The prediction methods showed similar results for PA and bias. Random cross-validation presented higher PA (0.17) than EBV accuracy (0.14) and age (0.13). The PA was higher for Y* than for EBV and DEBV (30.0 and 34.3%, respectively). Random validation presented the highest PA and is indicated for use in populations composed mainly of young animals and traits with few generations of data recording. For high heritability traits, the validation can be done by age, enabling the prediction of the next-generation genetic merit. These results can help breeders identify the genomic approaches that are more viable for genomic prediction of FE-related traits.
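The two definitions of prediction ability (PA) given in this abstract can be written down as a short sketch. The function name and argument names are hypothetical and the example values below are made up for illustration; only the rule itself (divide the correlation by the square root of heritability for Y*, use the plain correlation for EBV or DEBV) comes from the abstract.

```python
import numpy as np

def prediction_ability(pseudo_phenotype, gebv, heritability=None):
    """Correlation between pseudo-phenotype and GEBV.

    If heritability is given (the Y* case), the correlation is divided by
    sqrt(heritability); otherwise (EBV or DEBV as pseudo-phenotype) the
    plain correlation is returned.
    """
    r = np.corrcoef(pseudo_phenotype, gebv)[0, 1]
    if heritability is None:
        return r
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    return r / np.sqrt(heritability)

# Illustrative (made-up) values, not data from the study.
y_star = np.array([0.2, -0.1, 0.4, 0.0, 0.3])
gebv = np.array([0.1, 0.0, 0.2, -0.1, 0.1])
pa_adjusted = prediction_ability(y_star, gebv, heritability=0.25)
pa_plain = prediction_ability(y_star, gebv)
```

With a heritability of 0.25 the adjusted value is twice the plain correlation, which shows why PA values computed from Y* and from EBV or DEBV are not directly on the same scale.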
Affiliation(s)
- L C Brunes
- Animal Science Department, Goiás Federal University, 74690-900 Goiânia, GO, Brazil; Embrapa Rice and Beans, GO-462, km 12, 75375-000 Santo Antônio de Goiás, GO, Brazil.
| | - F Baldi
- Animal Science Department, São Paulo State University - Júlio de Mesquita Filho (UNESP), Prof. Paulo Donato Castelane, 14884-900 Jaboticabal, SP, Brazil
| | - F B Lopes
- Cobb-Vantress, Inc., 72761 Siloam Springs, AR, USA
| | - M G Narciso
- Embrapa Rice and Beans, GO-462, km 12, 75375-000 Santo Antônio de Goiás, GO, Brazil
| | - R B Lobo
- National Association of Breeders and Researchers, 14020-230 Ribeirão Preto, Brazil
| | - R Espigolan
- Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of Sao Paulo, 13635-900 Pirassununga, SP, Brazil
| | - M F O Costa
- Embrapa Rice and Beans, GO-462, km 12, 75375-000 Santo Antônio de Goiás, GO, Brazil
| | - C U Magnabosco
- Embrapa Cerrados, BR-020, 18 Sobradinho, 70770-901 Brasilia, DF, Brazil
| |
|