1
Crossa J, Montesinos-Lopez OA, Costa-Neto G, Vitale P, Martini JWR, Runcie D, Fritsche-Neto R, Montesinos-Lopez A, Pérez-Rodríguez P, Gerard G, Dreisigacker S, Crespo-Herrera L, Pierre CS, Lillemo M, Cuevas J, Bentley A, Ortiz R. Machine learning algorithms translate big data into predictive breeding accuracy. Trends in Plant Science 2025; 30:167-184. [PMID: 39462718 DOI: 10.1016/j.tplants.2024.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 08/23/2024] [Accepted: 09/23/2024] [Indexed: 10/29/2024]
Abstract
Statistical machine learning (ML) extracts patterns from extensive genomic, phenotypic, and environmental data. ML algorithms automatically identify relevant features and use cross-validation to ensure robust models and improve prediction reliability in new lines. Furthermore, ML analyses of genotype-by-environment (G×E) interactions can offer insights into the genetic factors that affect performance in specific environments. By leveraging historical breeding data, ML streamlines strategies and automates analyses to reveal genomic patterns. In this review we examine the transformative impact of big data, including multi-trait genomics, phenomics, and environmental covariables, on genomic-enabled prediction in plant breeding. We discuss how big data and ML are revolutionizing the field by enhancing prediction accuracy, deepening our understanding of G×E interactions, and optimizing breeding strategies through the analysis of extensive and diverse datasets.
Affiliation(s)
- José Crossa
- Louisiana State University, College of Agriculture, Baton Rouge, LA, USA; Colegio de Postgraduados, Montecillos, CP 56230, Estado de México, Mexico; International Maize and Wheat Improvement Center (CIMMYT), Carretera México- Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico; Department of Statistics and Operations Research and Distinguished Scientist Fellowship Program, King Saud University, Riyadh 11451, Saudi Arabia
- Paolo Vitale
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México- Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Daniel Runcie
- Department of Plant Sciences, University of California Davis, Davis, CA, USA
- Abelardo Montesinos-Lopez
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430 Guadalajara, Jalisco, Mexico
- Guillermo Gerard
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México- Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Susanna Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México- Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Leonardo Crespo-Herrera
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México- Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Carolina Saint Pierre
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México- Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Morten Lillemo
- Norwegian University of Life Science (NMBU), Department of Plant Science, Ås, Norway
- Jaime Cuevas
- Universidad de Quintana Roo, Chetumal, Quintana Roo, 77019, Mexico
- Alison Bentley
- Australian National University, Research School of Biology, Canberra, NSW, Australia.
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), PO Box 190 Sundsvagen 10, SE 23422 Lomma, Sweden.
2
Nova A, Bourguiba-Hachemi S, Vince N, Gourraud PA, Bernardinelli L, Fazia T. Disentangling Multiple Sclerosis heterogeneity in the French territory among genetic and environmental factors via Bayesian heritability analysis. Mult Scler Relat Disord 2024; 88:105730. [PMID: 38880029 DOI: 10.1016/j.msard.2024.105730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 06/07/2024] [Accepted: 06/12/2024] [Indexed: 06/18/2024]
Abstract
BACKGROUND: This study aimed to investigate the factors contributing to the variability of Multiple Sclerosis (MS) among individuals born and residing in France. Geographical variation in MS prevalence has been observed in France, but the role of genetic and environmental factors in explaining this heterogeneity has not yet been elucidated. METHODS: We performed a heritability analysis on a cohort of 403 trios with an MS-affected proband in the French population. This sample was retrieved from the REFGENSEP register of MS cases collected in 23 French hospital centers from 1992 to 2017. Our objective was to quantify the proportion of MS liability variability explained by genetic variability, sex, shared environment effects, region of birth, and year of birth. We further considered gene × environment (G×E) interaction effects between genetic variability and region of birth. We implemented a Bayesian liability threshold model to obtain posterior distributions for the parameters of interest, adjusting for ascertainment bias. RESULTS: Our analysis revealed that G×E interaction effects between genetic variability and region of birth represent the primary significant explanatory factor for MS liability variability in French individuals (29% [95% CI: 5%; 53%]), suggesting that additive genetic effects are modified by environmental factors associated with the region of birth. The individual contributions of genetic variability and region of birth explained, respectively, ≈15% and ≈16% of MS variability, highlighting a significantly higher MS liability in individuals born in the Northern regions compared to the Southern region. Overall, the joint contribution of genetic variability, region of birth, and their interaction was estimated to explain 65% [95% CI: 35%; 92%] of MS liability variability. The remaining proportion of MS variability is attributed to environmental exposures associated with the year of birth, shared within the same household, and specific to individuals. CONCLUSION: Overall, our analysis highlighted the interaction between genetic variability and environmental exposures linked to the region of birth as the main factor explaining MS variability within individuals born and residing in France. Among the environmental exposures prevalent in the Northern regions, and potentially interacting with genetic variability, lower vitamin D levels due to reduced sun exposure, higher obesity prevalence, and higher pollution levels represent the main factors influencing MS risk. These findings emphasize the importance of accounting for environmental factors linked to geographical location when investigating MS risk factors, and of further exploring the influence of G×E interactions in modifying genetic risk.
Affiliation(s)
- Andrea Nova
- Department of Brain and Behavioral Sciences, University of Pavia, Via Agostino Bassi 21, Pavia 27100, Italy.
- Sonia Bourguiba-Hachemi
- UMR 1064, Center for Research in Transplantation and Translational Immunology, Nantes Université, CHU Nantes, INSERM, Nantes F-44000, France
- Nicolas Vince
- UMR 1064, Center for Research in Transplantation and Translational Immunology, Nantes Université, CHU Nantes, INSERM, Nantes F-44000, France
- Pierre-Antoine Gourraud
- UMR 1064, Center for Research in Transplantation and Translational Immunology, Nantes Université, CHU Nantes, INSERM, Nantes F-44000, France
- Luisa Bernardinelli
- Department of Brain and Behavioral Sciences, University of Pavia, Via Agostino Bassi 21, Pavia 27100, Italy
- Teresa Fazia
- Department of Brain and Behavioral Sciences, University of Pavia, Via Agostino Bassi 21, Pavia 27100, Italy
3
Xu Y, Zhang Y, Cui Y, Zhou K, Yu G, Yang W, Wang X, Li F, Guan X, Zhang X, Yang Z, Xu S, Xu C. GA-GBLUP: leveraging the genetic algorithm to improve the predictability of genomic selection. Brief Bioinform 2024; 25:bbae385. [PMID: 39101500 PMCID: PMC11299030 DOI: 10.1093/bib/bbae385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Revised: 07/03/2024] [Accepted: 07/24/2024] [Indexed: 08/06/2024] Open
Abstract
Genomic selection (GS) has emerged as an effective technology to accelerate crop hybrid breeding by enabling early selection prior to phenotype collection. Genomic best linear unbiased prediction (GBLUP) is a robust method that has been routinely used in GS breeding programs. However, GBLUP assumes that markers contribute equally to the total genetic variance, which may not be the case. In this study, we developed a novel GS method called GA-GBLUP that leverages the genetic algorithm (GA) to select markers related to the target trait. To improve predictability, we defined four fitness functions for optimization (AIC, BIC, R², and HAT), and to reduce model dimension, we binned adjacent markers based on the principle of linkage disequilibrium. The results demonstrate that the GA-GBLUP model, equipped with the R² and HAT fitness functions, produces much higher predictability than GBLUP for most traits in rice and maize datasets, particularly for traits with low heritability. Moreover, we have developed a user-friendly R package, GAGBLUP, for GS, and the package is freely available on CRAN (https://CRAN.R-project.org/package=GAGBLUP).
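Note on the equal-contribution assumption: as a minimal sketch, the standard GBLUP formulation can be written as below, assuming a VanRaden-type genomic relationship matrix. Here W is the centered marker matrix, p_j the allele frequency of marker j, and S a hypothetical selected marker subset; none of these symbols come from the abstract, and how the GAGBLUP package actually builds its relationship matrix is not stated there.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{1}\mu + \mathbf{g} + \mathbf{e}, \qquad
\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_g^2), \quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2), \\
\mathbf{G} &= \frac{\mathbf{W}\mathbf{W}^{\top}}{2\sum_{j=1}^{m} p_j (1 - p_j)}, \qquad
\mathbf{G}_S = \frac{\mathbf{W}_S\mathbf{W}_S^{\top}}{2\sum_{j \in S} p_j (1 - p_j)} .
\end{aligned}
```

Because every column of W enters G with the same weight, all m markers share one effect variance. Restricting the columns to a subset S, as a marker-selection step would, amounts to giving the excluded markers zero weight.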
Affiliation(s)
- Yang Xu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
- Yuxiang Zhang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
- Yanru Cui
- College of Agronomy, Hebei Agricultural University, Baoding, Hebei 071001, China
- Kai Zhou
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
- Guangning Yu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
- Wenyan Yang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
- Xin Wang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
- Furong Li
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
- Xiusheng Guan
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
- Xuecai Zhang
- Global Maize Program, International Maize and Wheat Improvement Centre, Texcoco 56237, Mexico
- Zefeng Yang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, United States
- Chenwu Xu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu 225009, China
4
Chen C, Bhuiyan SA, Ross E, Powell O, Dinglasan E, Wei X, Atkin F, Deomano E, Hayes B. Genomic prediction for sugarcane diseases including hybrid Bayesian-machine learning approaches. Frontiers in Plant Science 2024; 15:1398903. [PMID: 38751840 PMCID: PMC11095127 DOI: 10.3389/fpls.2024.1398903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 04/15/2024] [Indexed: 05/18/2024]
Abstract
Sugarcane smut and Pachymetra root rot are two serious diseases of sugarcane, with susceptible infected crops losing over 30% of yield. A heritable component to both diseases has been demonstrated, suggesting selection could improve disease resistance. Genomic selection could accelerate gains even further, enabling early selection of resistant seedlings for breeding and clonal propagation. In this study we evaluated four types of algorithms for genomic prediction of clonal performance for disease resistance. These algorithms were genomic best linear unbiased prediction (GBLUP), including extensions to model dominance and epistasis; Bayesian methods, including BayesC and BayesR; and machine learning methods, including random forest, multilayer perceptron (MLP), modified convolutional neural network (CNN), and attention networks designed to capture epistasis across the genome-wide markers. Simple hybrid methods, which first used BayesR/GWAS to identify a subset of 1000 markers with moderate to large marginal additive effects and then used attention networks to derive predictions from these effects and their interactions, were also developed and evaluated. The hypothesis for this approach was that using a subset of markers more likely to have an effect would enable better estimation of interaction effects than when there is an extremely large number of possible interactions, especially with our limited data set size. To evaluate the methods, we applied both random five-fold cross-validation and a structured PCA-based cross-validation that separated 4702 sugarcane clones (which had disease phenotypes and were genotyped for 26k genome-wide SNP markers) by genomic relationship. The Bayesian methods (BayesR and BayesC) gave the highest accuracy of prediction, followed closely by hybrid methods with attention networks. The hybrid methods with attention networks gave the lowest variation in accuracy of prediction across validation folds (and the lowest MSE), which may be a criterion worth considering in practical breeding programs. This suggests that hybrid methods incorporating the attention mechanism could be useful for genomic prediction of clonal performance, particularly where non-additive effects may be important.
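Note on the two validation schemes: the difference between random and relationship-structured cross-validation can be illustrated with a small sketch. It assumes each clone already has a score on the first principal component of the marker data, and it groups clones with similar scores into the same fold. The function names and the use of a single component are hypothetical; the abstract does not describe how the study formed its PCA-based folds.

```python
import random


def random_folds(n_items, k, seed=0):
    """Assign item indices to k folds at random, with near-equal fold sizes."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [[] for _ in range(k)]
    for pos, i in enumerate(idx):
        folds[pos % k].append(i)
    return folds


def structured_folds(pc1_scores, k):
    """Put items with similar PC1 scores in the same fold (assumed scheme)."""
    order = sorted(range(len(pc1_scores)), key=lambda i: pc1_scores[i])
    size, extra = divmod(len(order), k)
    folds, start = [], 0
    for f in range(k):
        end = start + size + (1 if f < extra else 0)
        folds.append(order[start:end])
        start = end
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


if __name__ == "__main__":
    scores = [0.9, -1.2, 0.1, 0.5, -0.3, 1.4]  # made-up PC1 scores
    for train, test in train_test_splits(structured_folds(scores, 3)):
        print("train:", train, "test:", test)
```

With structured folds, the held-out clones are genomically distant from most of the training clones, so accuracy is expected to be lower and closer to what prediction of unrelated material would give.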
Affiliation(s)
- Chensong Chen
- Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Shamsul A. Bhuiyan
- Sugar Research Australia, Woodford, QLD, Australia
- Queensland Micro- and Nanotechnology Centre, Griffith University, Nathan, QLD, Australia
- Elizabeth Ross
- Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Owen Powell
- Center for Crop Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Eric Dinglasan
- Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Xianming Wei
- Sugar Research Australia, Indooroopilly, QLD, Australia
- Emily Deomano
- Sugar Research Australia, Indooroopilly, QLD, Australia
- Ben Hayes
- Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
5
Cuevas J, González-Diéguez D, Dreisigacker S, Martini JWR, Crespo-Herrera L, Lozano-Ramirez N, Singh PK, He X, Huerta J, Crossa J. Modeling within and between Sub-Genomes Epistasis of Synthetic Hexaploid Wheat for Genome-Enabled Prediction of Diseases. Genes (Basel) 2024; 15:262. [PMID: 38540321 PMCID: PMC10970072 DOI: 10.3390/genes15030262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 02/12/2024] [Accepted: 02/18/2024] [Indexed: 06/15/2024] Open
Abstract
Common wheat (Triticum aestivum) is a hexaploid crop comprising three diploid sub-genomes labeled A, B, and D. The objective of this study is to investigate whether there is a discernible influence pattern from the D sub-genome with epistasis in genomic models for wheat diseases. Four genomic statistical models were employed; two models considered the linear genomic relationship of the lines. The first model (G) utilized all molecular markers, while the second model (ABD) utilized three matrices representing the A, B, and D sub-genomes. The remaining two models incorporated epistasis, one (GI) using all markers and the other (ABDI) considering markers in sub-genomes A, B, and D, including inter- and intra-sub-genome interactions. The data utilized pertained to three diseases: tan spot (TS), septoria nodorum blotch (SNB), and spot blotch (SB), for synthetic hexaploid wheat (SHW) lines. The results (variance components) indicate that epistasis makes a substantial contribution to explaining genomic variation, accounting for approximately 50% in SNB and SB and only 29% for TS. In this contribution of epistasis, the influence of intra- and inter-sub-genome interactions of the D sub-genome is crucial, being close to 50% in TS and higher in SNB (60%) and SB (60%). This increase in explaining genomic variation is reflected in an enhancement of predictive ability from the G model (additive) to the ABDI model (additive and epistasis) by 9%, 5%, and 1% for SNB, SB, and TS, respectively. These results, in line with other studies, underscore the significance of the D sub-genome in disease traits and suggest a potential application to be explored in the future regarding the selection of parental crosses based on sub-genomes.
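Note on the ABDI model structure: one common way to write a model with additive sub-genome effects plus intra- and inter-sub-genome epistasis uses Hadamard products of the sub-genome relationship matrices. The sketch below assumes that construction. The symbols G_A, G_B, G_D (sub-genome genomic relationship matrices) and the variance components are illustrative and are not taken from the abstract.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{1}\mu + \mathbf{g}_A + \mathbf{g}_B + \mathbf{g}_D
  + \sum_{k \le l} \mathbf{i}_{kl} + \mathbf{e},
  \qquad k, l \in \{A, B, D\}, \\
\mathbf{g}_k &\sim N(\mathbf{0}, \mathbf{G}_k \sigma_k^2), \qquad
\mathbf{i}_{kl} \sim N\!\left(\mathbf{0}, (\mathbf{G}_k \circ \mathbf{G}_l)\, \sigma_{kl}^2\right), \qquad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2).
\end{aligned}
```

Under this sketch there are six interaction terms: three intra-sub-genome (A∘A, B∘B, D∘D) and three inter-sub-genome (A∘B, A∘D, B∘D). The share of epistatic variance attributed to the D sub-genome would then be the sum of the D∘D, A∘D, and B∘D components divided by the sum of all six.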
Affiliation(s)
- Jaime Cuevas
- Departamento de Energía, Universidad Autónoma del Estado de Quintana Roo, Chetumal 77019, Quintana Roo, Mexico
- David González-Diéguez
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, Texcoco 56237, Edo. de México, Mexico
- Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, Texcoco 56237, Edo. de México, Mexico
- Johannes W. R. Martini
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, Texcoco 56237, Edo. de México, Mexico
- Leo Crespo-Herrera
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, Texcoco 56237, Edo. de México, Mexico
- Nerida Lozano-Ramirez
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, Texcoco 56237, Edo. de México, Mexico
- Pawan K. Singh
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, Texcoco 56237, Edo. de México, Mexico
- Xinyao He
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, Texcoco 56237, Edo. de México, Mexico
- Julio Huerta
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, Texcoco 56237, Edo. de México, Mexico
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, Texcoco 56237, Edo. de México, Mexico
- Colegio de Postgraduados (COLPOS), Montecillos 56230, Edo. de México, Mexico
6
Chen C, Powell O, Dinglasan E, Ross EM, Yadav S, Wei X, Atkin F, Deomano E, Hayes BJ. Genomic prediction with machine learning in sugarcane, a complex highly polyploid clonally propagated crop with substantial non-additive variation for key traits. The Plant Genome 2023; 16:e20390. [PMID: 37728221 DOI: 10.1002/tpg2.20390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 08/01/2023] [Accepted: 08/29/2023] [Indexed: 09/21/2023]
Abstract
Sugarcane has a complex, highly polyploid genome with multi-species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning (ML) methods, which are purportedly able to deal with high levels of complexity in prediction. Here, we investigated deep learning (DL) neural networks, including multilayer networks (MLP) and convolution neural networks (CNN), and an ensemble machine learning approach, random forest (RF), for genomic prediction in sugarcane. The data set used was 2912 sugarcane clones, scored for 26,086 genome wide single nucleotide polymorphism markers, with final assessment trial data for total cane harvested (TCH), commercial cane sugar (CCS), and fiber content (Fiber). The clones in the latest trial (2017) were used as a validation set. We compared prediction accuracy of these methods to genomic best linear unbiased prediction (GBLUP) extended to include dominance and epistatic effects. The prediction accuracies from GBLUP models were up to 0.37 for TCH, 0.43 for CCS, and 0.48 for Fiber, while the optimized ML models had prediction accuracies of 0.35 for TCH, 0.38 for CCS, and 0.48 for Fiber. Both RF and DL neural network models have comparable predictive ability with the additive GBLUP model but are less accurate than the extended GBLUP model.
Affiliation(s)
- Chensong Chen
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Owen Powell
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Eric Dinglasan
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Elizabeth M Ross
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Seema Yadav
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Ben J Hayes
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
7
Liang Z, Prakapenka D, Da Y. Comparison of the Accuracy of Epistasis and Haplotype Models for Genomic Prediction of Seven Human Phenotypes. Biomolecules 2023; 13:1478. [PMID: 37892160 PMCID: PMC10604971 DOI: 10.3390/biom13101478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 09/26/2023] [Accepted: 09/28/2023] [Indexed: 10/29/2023] Open
Abstract
The accuracy of predicting seven human phenotypes of 3657-7564 individuals using global epistasis effects was evaluated and compared to the accuracy of haplotype genomic prediction using 380,705 SNPs and 10-fold cross-validation studies. The seven human phenotypes were the normality transformed high density lipoproteins (HDL), low density lipoproteins (LDL), total cholesterol (TC), triglycerides (TG), weight (WT), and the original phenotypic observations of height (HTo) and body mass index (BMIo). Fourth-order epistasis effects virtually had no contribution to the phenotypic variances, and third-order epistasis effects did not affect the prediction accuracy. Without haplotype effects in the prediction model, pairwise epistasis effects improved the prediction accuracy over the SNP models for six traits, with accuracy increases of 2.41%, 3.85%, 0.70%, 0.97%, 0.62% and 0.93% for HDL, LDL, TC, HTo, WT and BMIo respectively. However, none of the epistasis models had higher prediction accuracy than the haplotype models we previously reported. The epistasis model for TG decreased the prediction accuracy by 2.35% relative to the accuracy of the SNP model. The integrated models with epistasis and haplotype effects had slightly higher prediction accuracy than the haplotype models for two traits, HDL and BMIo. These two traits were the only traits where additive × dominance effects increased the prediction accuracy. These results indicated that haplotype effects containing local high-order epistasis effects had a tendency to be more important than global pairwise epistasis effects for the seven human phenotypes, and that the genetic mechanism of HDL and BMIo was more complex than that of the other traits.
Affiliation(s)
- Yang Da
- Department of Animal Science, University of Minnesota, Saint Paul, MN 55108, USA
8
El Hanafi S, Jiang Y, Kehel Z, Schulthess AW, Zhao Y, Mascher M, Haupt M, Himmelbach A, Stein N, Amri A, Reif JC. Genomic predictions to leverage phenotypic data across genebanks. Frontiers in Plant Science 2023; 14:1227656. [PMID: 37701801 PMCID: PMC10493331 DOI: 10.3389/fpls.2023.1227656] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/23/2023] [Accepted: 08/07/2023] [Indexed: 09/14/2023]
Abstract
Genome-wide prediction is a powerful tool in breeding. Initial results suggest that genome-wide approaches are also promising for enhancing the use of the genebank material: predicting the performance of plant genetic resources can unlock their hidden potential and fill the information gap in genebanks across the world and, hence, underpin prebreeding programs. As a proof of concept, we evaluated the power of across-genebank prediction for extensive germplasm collections relying on historical data on flowering/heading date, plant height, and thousand kernel weight of 9,344 barley (Hordeum vulgare L.) plant genetic resources from the German Federal Ex situ Genebank for Agricultural and Horticultural Crops (IPK) and of 1,089 accessions from the International Center for Agriculture Research in the Dry Areas (ICARDA) genebank. Based on prediction abilities for each trait, three scenarios for predictive characterization were compared: 1) a benchmark scenario, where test and training sets only contain ICARDA accessions, 2) across-genebank predictions using IPK as training and ICARDA as test set, and 3) integrated genebank predictions that include IPK with 30% of ICARDA accessions as a training set to predict the rest of ICARDA accessions. Within the population of ICARDA accessions, prediction abilities were low to moderate, which was presumably caused by a limited number of accessions used to train the model. Interestingly, ICARDA prediction abilities were boosted up to ninefold by using training sets composed of IPK plus 30% of ICARDA accessions. Pervasive genotype × environment interactions (GEIs) can become a potential obstacle to train robust genome-wide prediction models across genebanks. This suggests that the potential adverse effect of GEI on prediction ability was counterbalanced by the augmented training set with certain connectivity to the test set. 
Therefore, across-genebank predictions hold the promise to improve the curation of the world's genebank collections and contribute significantly to the long-term development of traditional genebanks toward biodigital resource centers.
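Note on the three prediction scenarios: the training and test sets described in the abstract can be sketched as follows. The function name, the random choice of the 30% ICARDA subset, and the reuse of that same split for the benchmark scenario are assumptions made for illustration; the abstract does not say how the subsets were drawn or how the benchmark was partitioned.

```python
import random


def build_scenarios(ipk_ids, icarda_ids, icarda_train_fraction=0.3, seed=0):
    """Return {scenario: (train_ids, test_ids)} for the three scenarios."""
    shuffled = list(icarda_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(icarda_train_fraction * len(shuffled))
    icarda_train, icarda_test = shuffled[:n_train], shuffled[n_train:]
    return {
        # 1) benchmark: ICARDA accessions only (split size is an assumption)
        "benchmark": (icarda_train, icarda_test),
        # 2) across-genebank: train on IPK, test on all ICARDA accessions
        "across_genebank": (list(ipk_ids), list(icarda_ids)),
        # 3) integrated: IPK plus 30% of ICARDA, test on the remaining ICARDA
        "integrated": (list(ipk_ids) + icarda_train, icarda_test),
    }


if __name__ == "__main__":
    ipk = [f"IPK_{i}" for i in range(9344)]       # placeholder accession IDs
    icarda = [f"ICARDA_{i}" for i in range(1089)]  # placeholder accession IDs
    for name, (train, test) in build_scenarios(ipk, icarda).items():
        print(name, len(train), len(test))
```

The integrated scenario is the only one in which the training set both is large and shares accessions' genebank of origin with the test set, which is the connectivity the abstract credits for the gain in prediction ability.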
Affiliation(s)
- Samira El Hanafi
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Yong Jiang
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Zakaria Kehel
- International Center for Agricultural Research in Dry Areas (ICARDA), Rabat, Morocco
- Albert W. Schulthess
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Yusheng Zhao
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Max Haupt
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Axel Himmelbach
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Center for Integrated Breeding Research (CiBreed), Georg-August-University, Göttingen, Germany
- Ahmed Amri
- International Center for Agricultural Research in Dry Areas (ICARDA), Rabat, Morocco
- Jochen C. Reif
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
9
Liang Z, Prakapenka D, Parker Gaddis KL, VandeHaar MJ, Weigel KA, Tempelman RJ, Koltes JE, Santos JEP, White HM, Peñagaricano F, Baldwin VI RL, Da Y. Impact of epistasis effects on the accuracy of predicting phenotypic values of residual feed intake in U.S. Holstein cows. Front Genet 2022; 13:1017490. [PMID: 36386803 PMCID: PMC9664219 DOI: 10.3389/fgene.2022.1017490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 09/27/2022] [Indexed: 11/06/2022] Open
Abstract
The impact of genomic epistasis effects on the accuracy of predicting the phenotypic values of residual feed intake (RFI) in U.S. Holstein cows was evaluated using 6215 Holstein cows and 78,964 SNPs. Two SNP models and seven epistasis models were initially evaluated. Heritability estimates and the accuracy of predicting the RFI phenotypic values from 10-fold cross-validation studies identified the model with SNP additive effects and additive × additive (A×A) epistasis effects (A + A×A model) as the best prediction model. Under the A + A×A model, additive heritability was 0.141 and A×A heritability was 0.263, which consisted of 0.260 inter-chromosome A×A heritability and 0.003 intra-chromosome A×A heritability, showing that inter-chromosome A×A effects were responsible for the accuracy increases due to A×A. Under the SNP additive model (A-only model), the additive heritability was 0.171. In the 10 validation populations, the average accuracy for predicting the RFI phenotypic values was 0.246 (range 0.197-0.333) under the A + A×A model and 0.231 (range 0.188-0.319) under the A-only model. The average increase in the accuracy of predicting the RFI phenotypic values by the A + A×A model over the A-only model was 6.49% (range 3.02-14.29%). The results of this study showed that A×A epistasis effects had a positive impact on the accuracy of predicting the RFI phenotypic values when combined with additive effects in the prediction model.
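Note on the reported figures: the numbers in this abstract can be checked with a few lines of arithmetic. The sketch below computes the relative gain from the two average accuracies and the decomposition of the A×A heritability. The function name is hypothetical. The abstract reports the average of per-fold gains, so agreement with the ratio of the two averages is a consistency check rather than the authors' calculation.

```python
def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Average prediction accuracies reported in the abstract
acc_a_plus_axa = 0.246
acc_a_only = 0.231
gain = relative_gain_percent(acc_a_plus_axa, acc_a_only)
print(f"relative gain: {gain:.2f}%")

# A×A heritability as the sum of inter- and intra-chromosome parts
h2_axa = 0.260 + 0.003
# Total genomic heritability under the A + A×A model
h2_total = 0.141 + h2_axa
print(f"A×A heritability: {h2_axa:.3f}, total: {h2_total:.3f}")
```

The total of 0.404 under the A + A×A model compares with an additive heritability of 0.171 under the A-only model.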
Affiliation(s)
- Zuoxiang Liang
- Department of Animal Science, University of Minnesota, Saint Paul, MN, United States
- Dzianis Prakapenka
- Department of Animal Science, University of Minnesota, Saint Paul, MN, United States
- Michael J. VandeHaar
- Department of Animal Science, Michigan State University, East Lansing, MI, United States
- Kent A. Weigel
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, United States
- Robert J. Tempelman
- Department of Animal Science, Michigan State University, East Lansing, MI, United States
- James E. Koltes
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Heather M. White
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, United States
- Francisco Peñagaricano
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, United States
- Ransom L. Baldwin VI
- Animal Genomics and Improvement Laboratory, ARS, USDA, Beltsville, MD, United States
- Yang Da
- Department of Animal Science, University of Minnesota, Saint Paul, MN, United States. *Correspondence: Yang Da
10
Da Y, Liang Z, Prakapenka D. Multifactorial methods integrating haplotype and epistasis effects for genomic estimation and prediction of quantitative traits. Front Genet 2022; 13:922369. [PMID: 36313431 PMCID: PMC9614238 DOI: 10.3389/fgene.2022.922369] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/17/2022] [Accepted: 09/12/2022] [Indexed: 11/19/2022]
Abstract
The rapid growth in genomic selection data provides unprecedented opportunities to discover and utilize complex genetic effects for improving phenotypes, but the methodology is lacking. Epistasis effects are interaction effects, and haplotype effects may contain local high-order epistasis effects. Multifactorial methods with SNP, haplotype, and epistasis effects up to the third-order are developed to investigate the contributions of global low-order and local high-order epistasis effects to the phenotypic variance and the accuracy of genomic prediction of quantitative traits. These methods include genomic best linear unbiased prediction (GBLUP) with associated reliability for individuals with and without phenotypic observations, including a computationally efficient GBLUP method for large validation populations, and genomic restricted maximum likelihood estimation (GREML) of the variance and associated heritability using a combination of EM-REML and AI-REML iterative algorithms. These methods were developed for two models, Model-I with 10 effect types and Model-II with 13 effect types, including intra- and inter-chromosome pairwise epistasis effects that replace the pairwise epistasis effects of Model-I. GREML heritability estimate and GBLUP effect estimate for each effect of an effect type are derived, except for third-order epistasis effects. The multifactorial models evaluate each effect type based on the phenotypic values adjusted for the remaining effect types and can use more effect types than separate models of SNP, haplotype, and epistasis effects, providing a methodology capability to evaluate the contributions of complex genetic effects to the phenotypic variance and prediction accuracy and to discover and utilize complex genetic effects for improving the phenotypes of quantitative traits.
Affiliation(s)
- Yang Da
- Department of Animal Science, University of Minnesota, Saint Paul, MN, United States
11
Varona L, Legarra A, Toro MA, Vitezica ZG. Genomic Prediction Methods Accounting for Nonadditive Genetic Effects. Methods Mol Biol 2022; 2467:219-243. [PMID: 35451778 DOI: 10.1007/978-1-0716-2205-6_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
The use of genomic information for prediction of future phenotypes or breeding values for the candidates to selection has become a standard over the last decade. However, most procedures for genomic prediction only consider the additive (or substitution) effects associated with polymorphic markers. Nevertheless, the implementation of models that consider nonadditive genetic variation may be interesting because they (1) may increase the ability of prediction, (2) can be used to define mate allocation procedures in plant and animal breeding schemes, and (3) can be used to benefit from nonadditive genetic variation in crossbreeding or purebred breeding schemes. This study reviews the available methods for incorporating nonadditive effects into genomic prediction procedures and their potential applications in predicting future phenotypic performance, mate allocation, and crossbred and purebred selection. Finally, a brief outline of some future research lines is also proposed.
Affiliation(s)
- Luis Varona
- Departamento de Anatomía, Embriología y Genética Animal, Universidad de Zaragoza, Zaragoza, Spain.
- Instituto Agroalimentario de Aragón (IA2), Zaragoza, Spain.
- Miguel A Toro
- Dpto. Producción Agraria, ETS Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
12
Abstract
In this chapter, we discuss the motivation for integrating other types of omics data into genomic prediction methods. We give an overview of literature investigating the performance of omics-enhanced predictions, and highlight potential pitfalls when applying these methods in breeding. We emphasize that the statistical methods available for genomic data can be transferred to the general omics case. However, when using a framework of omic relationship matrices, the standardization of the variables may be more relevant than it is for a genomic relationship matrix based on single-nucleotide polymorphisms.
Affiliation(s)
- Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico.
- Ning Gao
- School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico
13
Vojgani E, Pook T, Martini JWR, Hölker AC, Mayer M, Schön CC, Simianer H. Accounting for epistasis improves genomic prediction of phenotypes with univariate and bivariate models across environments. Theor Appl Genet 2021; 134:2913-2930. [PMID: 34115154 PMCID: PMC8354961 DOI: 10.1007/s00122-021-03868-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/08/2020] [Accepted: 05/24/2021] [Indexed: 06/12/2023]
Abstract
The accuracy of genomic prediction of phenotypes can be increased by including the top-ranked pairwise SNP interactions into the prediction model. We compared the predictive ability of various prediction models for a maize dataset derived from 910 doubled haploid lines from two European landraces (Kemater Landmais Gelb and Petkuser Ferdinand Rot), which were tested at six locations in Germany and Spain. The compared models were Genomic Best Linear Unbiased Prediction (GBLUP) as an additive model, Epistatic Random Regression BLUP (ERRBLUP) accounting for all pairwise SNP interactions, and selective Epistatic Random Regression BLUP (sERRBLUP) accounting for a selected subset of pairwise SNP interactions. These models have been compared in both univariate and bivariate statistical settings for predictions within and across environments. Our results indicate that modeling all pairwise SNP interactions into the univariate/bivariate model (ERRBLUP) is not superior in predictive ability to the respective additive model (GBLUP). However, incorporating only a selected subset of interactions with the highest effect variances in univariate/bivariate sERRBLUP can increase predictive ability significantly compared to the univariate/bivariate GBLUP. Overall, bivariate models consistently outperform univariate models in predictive ability. Across all studied traits, locations and landraces, the increase in prediction accuracy from univariate GBLUP to univariate sERRBLUP ranged from 5.9 to 112.4 percent, with an average increase of 47 percent. For bivariate models, the change ranged from -0.3 to +27.9 percent comparing the bivariate sERRBLUP to the bivariate GBLUP, with an average increase of 11 percent. This considerable increase in predictive ability achieved by sERRBLUP may be of interest for "sparse testing" approaches in which only a subset of the lines/hybrids of interest is observed at each location.
Affiliation(s)
- Elaheh Vojgani
- Center for Integrated Breeding Research, Animal Breeding and Genetics Group, University of Goettingen, Goettingen, Germany.
- Torsten Pook
- Center for Integrated Breeding Research, Animal Breeding and Genetics Group, University of Goettingen, Goettingen, Germany
- Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of Mexico, Mexico
- Armin C Hölker
- Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Manfred Mayer
- Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Chris-Carolin Schön
- Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Henner Simianer
- Center for Integrated Breeding Research, Animal Breeding and Genetics Group, University of Goettingen, Goettingen, Germany
14
Bayat A, Hosking B, Jain Y, Hosking C, Kodikara M, Reti D, Twine NA, Bauer DC. Fast and accurate exhaustive higher-order epistasis search with BitEpi. Sci Rep 2021; 11:15923. [PMID: 34354094 PMCID: PMC8342486 DOI: 10.1038/s41598-021-94959-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/19/2021] [Accepted: 07/20/2021] [Indexed: 01/03/2023]
Abstract
Complex genetic diseases may be modulated by a large number of epistatic interactions affecting a polygenic phenotype. Identifying these interactions is difficult due to computational complexity, especially in the case of higher-order interactions where more than two genomic variants are involved. In this paper, we present BitEpi, a fast and accurate method to test all possible combinations of up to four bi-allelic variants (i.e., Single Nucleotide Variants, or SNVs for short). BitEpi introduces a novel bitwise algorithm that is 1.7 and 56 times faster than established software for 3-SNV and 4-SNV search, respectively. The novel entropy statistic used in BitEpi is 44% more accurate in identifying interactive SNVs and incorporates p-value-based significance testing. We demonstrate BitEpi on real-world data of 4900 samples and 87,000 SNPs. We also present EpiExplorer to visualize the potentially large number of individual and interacting SNVs in an interactive Cytoscape graph. EpiExplorer uses various visual elements to facilitate the discovery of true biological events in a complex polygenic environment.
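To illustrate why bitwise operations suit exhaustive interaction searches, the sketch below counts samples for every genotype combination of a set of SNVs using integer bitmasks, a bitwise AND, and a population count. This is a generic illustration of the idea, not BitEpi's actual algorithm or data layout; the function names and the toy genotypes are hypothetical.

```python
from itertools import product


def genotype_masks(genotypes):
    """Encode one SNV as three integer bitmasks, one per genotype (0, 1, 2).

    Bit i of masks[g] is set when sample i carries genotype g.
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def contingency_table(snv_masks):
    """Count samples for every genotype combination of the given SNVs.

    snv_masks must contain at least one SNV (as returned by genotype_masks).
    """
    table = {}
    for combo in product(range(3), repeat=len(snv_masks)):
        # AND-ing the masks keeps only samples matching every genotype in combo.
        bits = snv_masks[0][combo[0]]
        for masks, g in zip(snv_masks[1:], combo[1:]):
            bits &= masks[g]
        table[combo] = popcount(bits)
    return table


# Toy data (hypothetical): genotypes of six samples at two SNVs.
snv_a = [0, 1, 2, 0, 1, 2]
snv_b = [0, 1, 2, 1, 1, 0]
table = contingency_table([genotype_masks(snv_a), genotype_masks(snv_b)])
print(table[(1, 1)])  # samples heterozygous at both SNVs
```

Each combination costs one AND per SNV plus one population count, regardless of how the samples are distributed; a case/control analysis would keep separate masks per group, which is left out here for brevity.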
Affiliation(s)
- Arash Bayat
- Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia; The Kinghorn Cancer Centre, Darlinghurst, NSW, 2010, Australia
- Brendan Hosking
- Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
- Yatish Jain
- Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia; Department of Biomedical Sciences, Macquarie University, Macquarie Park, NSW, 2113, Australia
- Cameron Hosking
- Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
- Milindi Kodikara
- Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia
- Daniel Reti
- Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia; Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, 2113, Australia
- Natalie A Twine
- Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia; Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, 2113, Australia
- Denis C Bauer
- Transformations Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, 2113, Australia; Department of Biomedical Sciences, Macquarie University, Macquarie Park, NSW, 2113, Australia; Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, 2113, Australia
15
Onogi A, Watanabe T, Ogino A, Kurogi K, Togashi K. Genomic prediction with non-additive effects in beef cattle: stability of variance component and genetic effect estimates against population size. BMC Genomics 2021; 22:512. [PMID: 34233617 PMCID: PMC8262069 DOI: 10.1186/s12864-021-07792-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/14/2021] [Accepted: 06/10/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Genomic prediction is now an essential technology for genetic improvement in animal and plant breeding. Whereas emphasis has been placed on predicting the breeding values, the prediction of non-additive genetic effects has also been of interest. In this study, we assessed the potential of genomic prediction using non-additive effects for phenotypic prediction in Japanese Black, a beef cattle breed. In addition, we examined the stability of variance component and genetic effect estimates against population size by subsampling with different sample sizes.
RESULTS: Records of six carcass traits, namely, carcass weight, rib eye area, rib thickness, subcutaneous fat thickness, yield rate and beef marbling score, for 9850 animals were used for analyses. As the non-additive genetic effects, dominance, additive-by-additive, additive-by-dominance and dominance-by-dominance effects were considered. The covariance structures of these genetic effects were defined using genome-wide SNPs. Using single-trait animal models with different combinations of genetic effects, it was found that 12.6-19.5% of phenotypic variance were occupied by the additive-by-additive variance, whereas little dominance variance was observed. In cross-validation, adding the additive-by-additive effects had little influence on predictive accuracy and bias. Subsampling analyses showed that estimation of the additive-by-additive effects was highly variable when phenotypes were not available. On the other hand, the estimates of the additive-by-additive variance components were less affected by reduction of the population size.
CONCLUSIONS: The six carcass traits of Japanese Black cattle showed moderate or relatively high levels of additive-by-additive variance components, although incorporating the additive-by-additive effects did not improve the predictive accuracy. Subsampling analysis suggested that estimation of the additive-by-additive effects was highly reliant on the phenotypic values of the animals to be estimated, as supported by low off-diagonal values of the relationship matrix. On the other hand, estimates of the additive-by-additive variance components were relatively stable against reduction of the population size compared with the estimates of the corresponding genetic effects.
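The remark about low off-diagonal values can be illustrated with a small simulation. One common way to build an additive-by-additive relationship matrix from genome-wide SNPs is the Hadamard (element-wise) product of the additive genomic relationship matrix with itself, rescaled to a mean diagonal of 1. The abstract does not say which construction the authors used, so the sketch below is an assumption, and the simulated genotypes (unrelated animals, independent SNPs) are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_animals, n_snps = 50, 500

# Simulated genotypes coded 0/1/2 (assumption: independent SNPs, unrelated animals).
freqs = rng.uniform(0.1, 0.9, size=n_snps)
M = rng.binomial(2, freqs, size=(n_animals, n_snps)).astype(float)

# Additive genomic relationship matrix, centred with observed allele frequencies.
p = M.mean(axis=0) / 2
Z = M - 2 * p
G = Z @ Z.T / (2 * np.sum(p * (1 - p)))

# Additive-by-additive relationships approximated by the Hadamard product G * G,
# rescaled so that the mean diagonal element equals 1.
GG = G * G
GG /= np.mean(np.diag(GG))

# Compare the typical size of off-diagonal elements in the two matrices.
off = ~np.eye(n_animals, dtype=bool)
print("mean |off-diagonal| of G:    ", np.abs(G[off]).mean())
print("mean |off-diagonal| of G * G:", np.abs(GG[off]).mean())
```

Because off-diagonal elements of G are typically well below 1 in absolute value, squaring them shrinks them further, so an animal without its own phenotype borrows little additive-by-additive information from others.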
Affiliation(s)
- Akio Onogi
- Department of Plant Life Science, Faculty of Agriculture, Ryukoku University, 1-5, Yokotani, Seta, Oe-cho, Shiga, 520-2194, Otsu, Japan.
- Toshio Watanabe
- Maebashi Institute of Animal Science, Livestock Improvement Association of Japan, Inc, 371-0121, Maebashi, Japan
- Atsushi Ogino
- Maebashi Institute of Animal Science, Livestock Improvement Association of Japan, Inc, 371-0121, Maebashi, Japan
- Kazuhito Kurogi
- Cattle Breeding Department, Livestock Improvement Association of Japan, Inc, 135-0041, Tokyo, Japan
- Kenji Togashi
- Maebashi Institute of Animal Science, Livestock Improvement Association of Japan, Inc, 371-0121, Maebashi, Japan
16
Zhang J, Liu F, Reif JC, Jiang Y. On the use of GBLUP and its extension for GWAS with additive and epistatic effects. G3: Genes, Genomes, Genetics 2021; 11:6237487. [PMID: 33871030 PMCID: PMC8495923 DOI: 10.1093/g3journal/jkab122] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/15/2021] [Accepted: 04/04/2021] [Indexed: 11/29/2022]
Abstract
Genomic best linear unbiased prediction (GBLUP) is the most widely used model for genome-wide predictions. Interestingly, it is also possible to perform genome-wide association studies (GWAS) based on GBLUP. Although the estimated marker effects in GBLUP are shrunken and the conventional test based on such effects has low power, it was observed that a modified test statistic can be produced and that the result of the test was identical to that of a standard GWAS model. Later, a mathematical proof was given for the special case that there is no fixed covariate in GBLUP. Since then, the new approach has been called “GWAS by GBLUP”. Nevertheless, covariates such as environmental and subpopulation effects are very common in GBLUP. Thus, it is necessary to confirm the equivalence in the general case. Recently, the concept was generalized to GWAS for epistatic effects and the new approach was termed rapid epistatic mixed-model association analysis (REMMA) because it greatly improved the computational efficiency. However, the relationship between REMMA and the standard GWAS model has not been investigated. In this study, we first provided a general mathematical proof of the equivalence between “GWAS by GBLUP” and the standard GWAS model for additive effects. Then, we compared REMMA with the standard GWAS model for epistatic effects by a theoretical investigation and by empirical data analyses. We hypothesized that the similarity of the two models is influenced by the relative contribution of additive and epistatic effects to the phenotypic variance, which was verified by empirical and simulation studies.
Affiliation(s)
- Jie Zhang
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
- Fang Liu
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
- Jochen C Reif
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
- Yong Jiang
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany