1
Chen C, Powell O, Dinglasan E, Ross EM, Yadav S, Wei X, Atkin F, Deomano E, Hayes BJ. Genomic prediction with machine learning in sugarcane, a complex highly polyploid clonally propagated crop with substantial non-additive variation for key traits. THE PLANT GENOME 2023; 16:e20390. [PMID: 37728221 DOI: 10.1002/tpg2.20390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 08/01/2023] [Accepted: 08/29/2023] [Indexed: 09/21/2023]
Abstract
Sugarcane has a complex, highly polyploid genome with multi-species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning (ML) methods, which are purportedly able to deal with high levels of complexity in prediction. Here, we investigated deep learning (DL) neural networks, including multilayer networks (MLP) and convolution neural networks (CNN), and an ensemble machine learning approach, random forest (RF), for genomic prediction in sugarcane. The data set used was 2912 sugarcane clones, scored for 26,086 genome wide single nucleotide polymorphism markers, with final assessment trial data for total cane harvested (TCH), commercial cane sugar (CCS), and fiber content (Fiber). The clones in the latest trial (2017) were used as a validation set. We compared prediction accuracy of these methods to genomic best linear unbiased prediction (GBLUP) extended to include dominance and epistatic effects. The prediction accuracies from GBLUP models were up to 0.37 for TCH, 0.43 for CCS, and 0.48 for Fiber, while the optimized ML models had prediction accuracies of 0.35 for TCH, 0.38 for CCS, and 0.48 for Fiber. Both RF and DL neural network models have comparable predictive ability with the additive GBLUP model but are less accurate than the extended GBLUP model.
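As a rough illustration of the additive GBLUP baseline and the hold-out validation this abstract describes, the sketch below builds a genomic relationship matrix and scores predictions by their correlation with observed phenotypes. It is not the authors' pipeline: the simulated data, the diploid 0/1/2 marker coding (sugarcane is polyploid), the VanRaden-style relationship matrix, the fixed heritability `h2`, and all function names are assumptions made for illustration, and the dominance and epistatic extensions are omitted.

```python
import numpy as np

def vanraden_g(M):
    """Additive genomic relationship matrix from 0/1/2 genotype counts (assumed coding)."""
    p = M.mean(axis=0) / 2.0                      # estimated allele frequencies
    Z = M - 2.0 * p                               # center each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train_idx, test_idx, h2=0.5):
    """Predict test-set values as mu + G[test, train] (G[train, train] + lam*I)^-1 (y - mu)."""
    lam = (1.0 - h2) / h2                         # residual-to-genetic variance ratio (assumed known)
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_st = G[np.ix_(test_idx, train_idx)]
    mu = y_train.mean()
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_st @ alpha

# Simulated stand-in data; sizes are arbitrary and unrelated to the study.
rng = np.random.default_rng(0)
n, m = 300, 500
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
g = M @ rng.normal(0.0, 1.0, m)
g = (g - g.mean()) / g.std()                      # true genetic values, unit variance
y = g + rng.normal(0.0, 1.0, n)                   # phenotypes with heritability near 0.5

G = vanraden_g(M)
train_idx = np.arange(0, 240)                     # earlier "trials"
test_idx = np.arange(240, 300)                    # held-out "latest trial"
pred = gblup_predict(G, y[train_idx], train_idx, test_idx)

# Prediction accuracy here means the Pearson correlation of predicted and observed values.
accuracy = np.corrcoef(pred, y[test_idx])[0, 1]
print(f"hold-out prediction accuracy: {accuracy:.2f}")
```

Holding out a whole block of individuals, rather than random folds, mirrors the abstract's use of the latest trial as the validation set.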
Affiliation(s)
- Chensong Chen
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Owen Powell
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Eric Dinglasan
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Elizabeth M Ross
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Seema Yadav
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Ben J Hayes
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
2
Ganaparthi VR, Rennberger G, Wechter P, Levi A, Branham SE. Genome-Wide Association Mapping and Genomic Prediction of Fusarium Wilt Race 2 Resistance in the USDA Citrullus amarus Collection. PLANT DISEASE 2023; 107:3836-3842. [PMID: 37386705 DOI: 10.1094/pdis-02-23-0400-re] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
Fusarium wilt caused by Fusarium oxysporum f. sp. niveum (Fon) race 2 is a serious disease in watermelon and can reduce yields by 80%. Genome-wide association studies (GWAS) are a valuable tool in dissecting the genetic basis of traits. Citrullus amarus accessions (n = 120) from the USDA germplasm collection were genotyped with whole-genome resequencing, resulting in 2,126,759 single nucleotide polymorphic (SNP) markers that were utilized for GWAS. Three models were used for GWAS with the R package GAPIT. Mixed linear model (MLM) analysis did not identify any significant marker associations. FarmCPU identified four quantitative trait nucleotides (QTN) on three different chromosomes (i.e., chromosomes 1, 5, and 9), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK) identified one QTN on chromosome 10 as significantly associated with Fon race 2 resistance. FarmCPU identified four QTN that explained 60% of Fon race 2 resistance, and the single QTN from BLINK explained 27%. Relevant candidate genes were found within the linkage disequilibrium (LD) blocks of these significant SNPs, including genes encoding aquaporins, expansins, 2S albumins, and glutathione S-transferases which have been shown to be involved in imparting resistance to Fusarium spp. Genomic predictions (GP) for Fon race 2 resistance using all 2,126,759 SNPs resulted in a mean prediction accuracy of 0.08 with five-fold cross-validation employing genomic best linear unbiased prediction (gBLUP) or ridge-regression best linear unbiased prediction (rrBLUP). Mean prediction accuracy with gBLUP leave-one-out cross-validation was 0.48. Thus, along with identifying genomic regions associated with Fon race 2 resistance among the accessions, this study observed prediction accuracies that were strongly influenced by population size.
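The contrast between five-fold and leave-one-out cross-validation reported here comes down to how many individuals are available for training in each round (96 versus 119 of 120). The sketch below shows one way to set up both schemes around a ridge-regression (rrBLUP-style) predictor. It is only an illustration under stated assumptions: the data are simulated, the shrinkage parameter `lam` is taken as known, accuracy is computed as one pooled correlation (the abstract does not say how its mean accuracy was obtained), and every function name is hypothetical.

```python
import numpy as np

def rrblup_predict(X_train, y_train, X_test, lam):
    """Ridge-regression prediction from marker genotypes, solved in its n x n dual form."""
    col_mean = X_train.mean(axis=0)               # center markers using training data only
    Z_train = X_train - col_mean
    Z_test = X_test - col_mean
    mu = y_train.mean()
    K = Z_train @ Z_train.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_train - mu)
    marker_effects = Z_train.T @ alpha            # equals (Z'Z + lam*I)^-1 Z'(y - mu)
    return mu + Z_test @ marker_effects

def make_folds(n, k, rng):
    """Randomly split indices 0..n-1 into k near-equal folds (k = n gives leave-one-out)."""
    return np.array_split(rng.permutation(n), k)

def cross_validate(X, y, folds, lam):
    """Return out-of-fold predictions for every individual."""
    pred = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred[test_idx] = rrblup_predict(X[train], y[train], X[test_idx], lam)
    return pred

# Simulated stand-in data: 120 individuals as in the abstract, far fewer markers.
rng = np.random.default_rng(1)
n, m = 120, 400
X = rng.binomial(2, 0.4, size=(n, m)).astype(float)
y = X @ rng.normal(0.0, 0.1, m) + rng.normal(0.0, 1.0, n)
lam = 100.0                                       # residual variance / marker-effect variance in this simulation

five_fold = make_folds(n, 5, rng)
loo = make_folds(n, n, rng)
pred_5 = cross_validate(X, y, five_fold, lam)
pred_loo = cross_validate(X, y, loo, lam)

r_5 = np.corrcoef(pred_5, y)[0, 1]
r_loo = np.corrcoef(pred_loo, y)[0, 1]
print(f"five-fold: {r_5:.2f}  leave-one-out: {r_loo:.2f}")
```

With a population this small, each additional training individual matters, which is consistent with the abstract's remark that accuracy was strongly influenced by population size.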
Affiliation(s)
- Patrick Wechter
- Coastal Research and Education Center, Clemson University, Charleston, SC
- Amnon Levi
- U.S. Vegetable Laboratory, USDA-ARS, Charleston, SC 29414
- Sandra E Branham
- Coastal Research and Education Center, Clemson University, Charleston, SC
3
Tanaka R, Wu D, Li X, Tibbs-Cortes LE, Wood JC, Magallanes-Lundback M, Bornowski N, Hamilton JP, Vaillancourt B, Li X, Deason NT, Schoenbaum GR, Buell CR, DellaPenna D, Yu J, Gore MA. Leveraging prior biological knowledge improves prediction of tocochromanols in maize grain. THE PLANT GENOME 2023; 16:e20276. [PMID: 36321716 DOI: 10.1002/tpg2.20276] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
With an essential role in human health, tocochromanols are mostly obtained by consuming seed oils; however, the vitamin E content of the most abundant tocochromanols in maize (Zea mays L.) grain is low. Several large-effect genes with cis-acting variants affecting messenger RNA (mRNA) expression are mostly responsible for tocochromanol variation in maize grain, with other relevant associated quantitative trait loci (QTL) yet to be fully resolved. Leveraging existing genomic and transcriptomic information for maize inbreds could improve prediction when selecting for higher vitamin E content. Here, we first evaluated a multikernel genomic best linear unbiased prediction (MK-GBLUP) approach for modeling known QTL in the prediction of nine tocochromanol grain phenotypes (12-21 QTL per trait) within and between two panels of 1,462 and 242 maize inbred lines. On average, MK-GBLUP models improved predictive abilities by 7.0-13.6% when compared with GBLUP. In a second approach with a subset of 545 lines from the larger panel, the highest average improvement in predictive ability relative to GBLUP was achieved with a multi-trait GBLUP model (15.4%) that had a tocochromanol phenotype and transcript abundances in developing grain for a few large-effect candidate causal genes (1-3 genes per trait) as multiple response variables. Taken together, our study illustrates the enhancement of prediction models when informed by existing biological knowledge pertaining to QTL and candidate causal genes.
Affiliation(s)
- Ryokei Tanaka
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Di Wu
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Xiaowei Li
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Joshua C Wood
- Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Nolan Bornowski
- Dep. of Plant Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- John P Hamilton
- Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Brieanne Vaillancourt
- Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Xianran Li
- USDA ARS, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA, 99164, USA
- Nicholas T Deason
- Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- C Robin Buell
- Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Dean DellaPenna
- Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- Jianming Yu
- Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
- Michael A Gore
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
4
Feldmann MJ, Covarrubias-Pazaran G, Piepho HP. Complex traits and candidate genes: estimation of genetic variance components across multiple genetic architectures. G3 (BETHESDA, MD.) 2023; 13:jkad148. [PMID: 37405459 PMCID: PMC10468314 DOI: 10.1093/g3journal/jkad148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 06/09/2023] [Accepted: 06/12/2023] [Indexed: 07/06/2023]
Abstract
Large-effect loci (those statistically significant loci discovered by genome-wide association studies or linkage mapping) associated with key traits segregate amidst a background of minor, often undetectable, genetic effects in wild and domesticated plants and animals. Accurately attributing mean differences and variance explained to the correct components in the linear mixed model analysis is vital for selecting superior progeny and parents in plant and animal breeding, gene therapy, and medical genetics in humans. Marker-assisted prediction and its successor, genomic prediction, have many advantages for selecting superior individuals and understanding disease risk. However, these two approaches are less often integrated to study complex traits with different genetic architectures. This simulation study demonstrates that the average semivariance can be applied to models incorporating Mendelian, oligogenic, and polygenic terms simultaneously and yields accurate estimates of the variance explained for all relevant variables. Our previous research focused on large-effect loci and polygenic variance separately. This work aims to synthesize and expand the average semivariance framework to various genetic architectures and the corresponding mixed models. This framework independently accounts for the effects of large-effect loci and the polygenic genetic background and is universally applicable to genetics studies in humans, plants, animals, and microbes.
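The abstract does not define the average semivariance. Assuming the usual definition (half the mean variance of all pairwise differences among n random effects with covariance matrix V, which can be written as tr(VP)/(n-1) with P the centering matrix), the following sketch checks numerically that the two forms agree. The kernel `K`, the variance component `sigma2_g`, and the function names are hypothetical stand-ins, not values or code from the study.

```python
import numpy as np

def average_semivariance(V):
    """Trace form: tr(V P) / (n - 1), where P = I - J/n is the centering matrix."""
    n = V.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return np.trace(V @ P) / (n - 1)

def average_semivariance_pairwise(V):
    """Pairwise form: mean over i < j of 0.5 * var(u_i - u_j)."""
    n = V.shape[0]
    d = np.diag(V)
    diff_var = d[:, None] + d[None, :] - 2.0 * V  # var(u_i - u_j) = V_ii + V_jj - 2 V_ij
    upper = np.triu_indices(n, k=1)               # each unordered pair once
    return 0.5 * diff_var[upper].mean()

# Hypothetical relationship kernel and variance component for illustration.
rng = np.random.default_rng(2)
A = rng.normal(size=(50, 200))
K = A @ A.T / 200.0
sigma2_g = 1.5

asv_trace = average_semivariance(sigma2_g * K)
asv_pairs = average_semivariance_pairwise(sigma2_g * K)
print(f"trace form: {asv_trace:.4f}  pairwise form: {asv_pairs:.4f}")
```

Because the quantity depends only on differences between individuals, it is unaffected by adding a constant to every element of V, which is one reason it can be compared across model terms with differently scaled kernels.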
Affiliation(s)
- Mitchell J Feldmann
- Department of Plant Sciences, University of California Davis, One Shields Ave, Davis, CA 95616, USA
- Giovanny Covarrubias-Pazaran
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, El Batán, 56130 Texcoco, Edo. de México, México
- Hans-Peter Piepho
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart 70599, Germany
5
Morales L, Ametz C, Dallinger HG, Löschenberger F, Neumayer A, Zimmerl S, Buerstmayr H. Comparison of linear and semi-parametric models incorporating genomic, pedigree, and associated loci information for the prediction of resistance to stripe rust in an Austrian winter wheat breeding program. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:23. [PMID: 36692839 PMCID: PMC9873752 DOI: 10.1007/s00122-023-04249-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 11/11/2022] [Indexed: 06/17/2023]
Abstract
We used a historical dataset on stripe rust resistance across 11 years in an Austrian winter wheat breeding program to evaluate genomic and pedigree-based linear and semi-parametric prediction methods. Stripe rust (yellow rust) is an economically important foliar disease of wheat (Triticum aestivum L.) caused by the fungus Puccinia striiformis f. sp. tritici. Resistance to stripe rust is controlled by both qualitative (R-genes) and quantitative (small- to medium-effect quantitative trait loci, QTL) mechanisms. Genomic and pedigree-based prediction methods can accelerate selection for quantitative traits such as stripe rust resistance. Here we tested linear and semi-parametric models incorporating genomic, pedigree, and QTL information for cross-validated, forward, and pairwise prediction of adult plant resistance to stripe rust across 11 years (2008-2018) in an Austrian winter wheat breeding program. Semi-parametric genomic modeling had the greatest predictive ability and genetic variance overall, but differences between models were small. Including QTL as covariates improved predictive ability in some years where highly significant QTL had been detected via genome-wide association analysis. Predictive ability was moderate within years (cross-validated) but poor in cross-year frameworks.
Affiliation(s)
- Laura Morales
- Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, University of Natural Resources and Life Sciences Vienna, Tulln, Austria
- Hermann Gregor Dallinger
- Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, University of Natural Resources and Life Sciences Vienna, Tulln, Austria
- Saatzucht Donau GmbH and CoKG, Probstdorf, Austria
- Simone Zimmerl
- Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, University of Natural Resources and Life Sciences Vienna, Tulln, Austria
- Hermann Buerstmayr
- Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, University of Natural Resources and Life Sciences Vienna, Tulln, Austria
6
Ramirez-Diaz J, Cenadelli S, Bornaghi V, Bongioni G, Montedoro SM, Achilli A, Capelli C, Rincon JC, Milanesi M, Passamonti MM, Colli L, Barbato M, Williams JL, Marsan PA. Identification of genomic regions associated with total and progressive sperm motility in Italian Holstein bulls. J Dairy Sci 2023; 106:407-420. [PMID: 36400619 DOI: 10.3168/jds.2021-21700] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/13/2021] [Accepted: 08/10/2022] [Indexed: 11/17/2022]
Abstract
Sperm motility is directly related to the ability of sperm to move through the female reproductive tract to reach the ovum. Sperm motility is a complex trait that is influenced by environmental and genetic factors and is associated with male fertility, oocyte penetration rate, and reproductive success of cattle. In this study we carried out a GWAS in Italian Holstein bulls to identify candidate regions and genes associated with variations in progressive and total motility (PM and TM, respectively). After quality control, the final data set consisted of 5,960 records from 949 bulls having semen collected in 10 artificial insemination stations and genotyped at 412,737 SNPs (call rate >95%; minor allele frequency >5%). (Co)variance components were estimated using single trait mixed models, and associations between SNPs and phenotypes were assessed using a genomic BLUP approach. Ten windows that explained the greatest percentage of genetic variance were located on Bos taurus autosomes 1, 2, 4, 6, 7, 23, and 26 for TM and Bos taurus autosomes 1, 2, 4, 6, 8, 16, 23, and 26 for PM. A total of 150 genes for TM and 72 genes for PM were identified within these genomic regions. Gene Ontology enrichment analyses identified significant Gene Ontology terms involved with energy homeostasis, membrane functions, sperm-egg interactions, protection against oxidative stress, olfactory receptors, and immune system. There was significant enrichment of quantitative trait loci for fertility, calving ease, immune response, feed intake, and carcass weight within the candidate windows. These results contribute to understanding the architecture of the genetic control of sperm motility and may aid in the development of strategies to identify subfertile bulls and improve reproductive success.
Affiliation(s)
- J Ramirez-Diaz
- Department of Animal Sciences, Food and Nutrition (DIANA), Università Cattolica del Sacro Cuore, Piacenza, Italy 29122; Institute of Agricultural Biology and Biotechnology (IBBA), Consiglio Nazionale di Ricerca, Milano, Italy
- S Cenadelli
- Institute Lazzaro Spallanzani, Rivolta d'Adda (CR), Cremona, Italy
- V Bornaghi
- Institute Lazzaro Spallanzani, Rivolta d'Adda (CR), Cremona, Italy
- G Bongioni
- Institute Lazzaro Spallanzani, Rivolta d'Adda (CR), Cremona, Italy
- S M Montedoro
- Institute Lazzaro Spallanzani, Rivolta d'Adda (CR), Cremona, Italy
- A Achilli
- Department of Biology and Biotechnology, Università degli Studi di Pavia, Pavia, Italy
- C Capelli
- Department of Chemical, Life and Environmental Sustainability Sciences, Università degli Studi di Parma, Parma, Italy
- J C Rincon
- Department of Animal Science, Universidad Nacional de Colombia, Palmira, Valle del Cauca, Colombia
- M Milanesi
- Department for Innovation in Biological, Agri-food and Forestry Systems (DIBAF), Università degli Studi della Tuscia, Viterbo, Italy
- M M Passamonti
- Department of Animal Sciences, Food and Nutrition (DIANA), Università Cattolica del Sacro Cuore, Piacenza, Italy 29122
- L Colli
- Department of Animal Sciences, Food and Nutrition (DIANA), Università Cattolica del Sacro Cuore, Piacenza, Italy 29122
- M Barbato
- Department of Animal Sciences, Food and Nutrition (DIANA), Università Cattolica del Sacro Cuore, Piacenza, Italy 29122
- J L Williams
- Department of Animal Sciences, Food and Nutrition (DIANA), Università Cattolica del Sacro Cuore, Piacenza, Italy 29122
- P Ajmone Marsan
- Department of Animal Sciences, Food and Nutrition (DIANA), Università Cattolica del Sacro Cuore, Piacenza, Italy 29122
7
Duan J, Zhang J, Liu L, Wen Y. A guidance of model selection for genomic prediction based on linear mixed models for complex traits. Front Genet 2022; 13:1017380. [PMID: 36276959 PMCID: PMC9581223 DOI: 10.3389/fgene.2022.1017380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 09/20/2022] [Indexed: 11/27/2022]
Abstract
Brain imaging outcomes are important for Alzheimer's disease (AD) detection, and their prediction based on both genetic and demographic risk factors can facilitate the ongoing prevention and treatment of AD. Existing studies have identified numerous significantly AD-associated SNPs. However, how to make the best use of them for prediction analyses remains unknown. In this research, we first explored the relationship between genetic architecture and prediction accuracy of linear mixed models via visualizing the Manhattan plots generated based on the data obtained from the Wellcome Trust Case Control Consortium, and then constructed prediction models for eleven AD-related brain imaging outcomes using data from United Kingdom Biobank and Alzheimer's Disease Neuroimaging Initiative studies. We found that the simple Manhattan plots can be informative for the selection of prediction models. For traits that do not exhibit any significant signals from the Manhattan plots, the simple genomic best linear unbiased prediction (gBLUP) model is recommended due to its robust and accurate prediction performance as well as its computational efficiency. For diseases and traits that show spiked signals on the Manhattan plots, the latent Dirichlet process regression is preferred, as it can flexibly accommodate both the oligogenic and omnigenic models. For the prediction of AD-related traits, the Manhattan plots suggest their polygenic nature, and gBLUP has achieved robust performance for all these traits. We found that for these AD-related traits, genetic factors themselves only explain a very small proportion of the heritability, and the well-known AD risk factors can substantially improve the prediction model.
Affiliation(s)
- Jiefang Duan
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Jiayu Zhang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Long Liu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Yalu Wen
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China; Department of Statistics, University of Auckland, Auckland, New Zealand