51
Estimating genetic variance contributed by a quantitative trait locus: A random model approach. PLoS Comput Biol 2022; 18:e1009923. [PMID: 35275920 PMCID: PMC8942241 DOI: 10.1371/journal.pcbi.1009923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 03/23/2022] [Accepted: 02/13/2022] [Indexed: 11/20/2022] Open
Abstract
Detecting quantitative trait loci (QTL) and estimating QTL variances (represented by the squared QTL effects) are two main goals of QTL mapping and genome-wide association studies (GWAS). However, estimated QTL variances have problems that have not attracted much attention from the QTL mapping community. Estimated QTL variances are usually biased upwards because estimation is tied to significance tests, a phenomenon called the Beavis effect. However, estimated variances of QTL without significance tests can also be biased upwards, which cannot be explained by the Beavis effect; rather, this bias arises because QTL variances are often estimated as the squares of the estimated QTL effects. The parameters are the QTL effects, and the estimated QTL variances are obtained by squaring the estimated QTL effects. This square transformation fails to incorporate the errors of the estimated QTL effects, and the consequence is bias in the estimated QTL variances. To correct the bias, we can either reformulate the QTL model by treating the QTL effect as random and directly estimate the QTL variance (as a variance component) or adjust the estimate by taking into account the error of the estimated QTL effect. A moment method of estimation has been proposed to correct the bias. The method has been validated via Monte Carlo simulation studies and applied to QTL mapping for the 10-week-body-weight trait from an F2 mouse population.
One of the goals of QTL mapping and GWAS is to quantify the size of a QTL, which is measured by the QTL variance or the proportion of trait variance explained by the QTL. The effect of a QTL appears in a linear or linear mixed model as a regression coefficient and is defined as a fixed effect. The estimated QTL variance in conventional QTL mapping studies is the square of the estimated QTL effect, which is a biased estimate of the QTL variance. An unbiased estimate of the QTL variance should be obtained by (1) treating the QTL effect as random and estimating the variance of the random effect or (2) adjusting the squared estimated QTL effect by the squared estimation error. We proved that the two methods are identical. We further proved that the usual R2 (goodness of fit) in regression analysis is equivalent to the biased QTL heritability while the adjusted R2 is equivalent to the bias corrected QTL heritability.
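The bias described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes a single-marker regression with hypothetical genotype codes (-1, 0, 1) and a hypothetical true effect of 0.5, and it relies on the identity E[b_hat^2] = b^2 + Var(b_hat), so subtracting the squared standard error from the squared estimate should remove the upward bias. The function name `simulate_bias` and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_bias(true_effect=0.5, n=100, residual_sd=1.0, reps=4000, seed=1):
    """Average the naive (b_hat^2) and corrected (b_hat^2 - se^2) estimates of b^2."""
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(reps):
        # Hypothetical F2-like genotype codes with frequencies 1/4, 1/2, 1/4.
        z = [rng.choice((-1, 0, 0, 1)) for _ in range(n)]
        y = [true_effect * zi + rng.gauss(0.0, residual_sd) for zi in z]
        z_bar = statistics.fmean(z)
        y_bar = statistics.fmean(y)
        szz = sum((zi - z_bar) ** 2 for zi in z)
        szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
        b_hat = szy / szz  # least-squares estimate of the QTL effect
        rss = sum(
            (yi - y_bar - b_hat * (zi - z_bar)) ** 2 for zi, yi in zip(z, y)
        )
        var_b_hat = rss / (n - 2) / szz  # squared standard error of b_hat
        naive.append(b_hat ** 2)
        corrected.append(b_hat ** 2 - var_b_hat)
    return statistics.fmean(naive), statistics.fmean(corrected)


if __name__ == "__main__":
    naive_mean, corrected_mean = simulate_bias()
    print("true b^2:", 0.5 ** 2)
    print("mean naive estimate:", naive_mean)
    print("mean corrected estimate:", corrected_mean)
```

Under these assumptions the naive average should sit above the true value of 0.25 by roughly the sampling variance of the estimated effect, while the corrected average should be close to 0.25.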
52
Yang L, Qu Q, Hao Z, Sha K, Li Z, Li S. Powerful Identification of Large Quantitative Trait Loci Using Genome-wide R/glmnet-Based Regression. J Hered 2022; 113:472-478. [PMID: 35134967 DOI: 10.1093/jhered/esac006] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/11/2021] [Accepted: 02/02/2022] [Indexed: 11/14/2022] Open
Abstract
R/glmnet has been successfully applied to jointly map multiple quantitative trait loci in linkage analysis, with statistical inference for candidate quantitative trait loci with non-zero genetic effects carried out using R/lm for normally distributed traits, R/glm for discrete traits, and R/coxph for survival times. In this study, we extended R/glmnet to genome-wide association studies by means of parallel computation. A multi-locus genome-wide association study for high-throughput single nucleotide polymorphisms was implemented in the "Multi-Runking" software written within the R workspace. This software can better detect common and large quantitative trait nucleotides, and estimate them more accurately, than genome-wide mixed model analysis of one single nucleotide polymorphism at a time and linear mixed models-least absolute shrinkage and selection operator. Its applicability and utility were demonstrated by multi-locus genome-wide association studies for simulated and real normally distributed traits, binary traits, and survival times.
Affiliation(s)
- Li'ang Yang
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Qiannan Qu
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Zhiyu Hao
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin 150030, China
- Ke Sha
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Ziyu Li
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Shuling Li
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
53
Bonnett D, Li Y, Crossa J, Dreisigacker S, Basnet B, Pérez-Rodríguez P, Alvarado G, Jannink JL, Poland J, Sorrells M. Response to Early Generation Genomic Selection for Yield in Wheat. Frontiers in Plant Science 2022; 12:718611. [PMID: 35087542 PMCID: PMC8787636 DOI: 10.3389/fpls.2021.718611] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/01/2021] [Accepted: 10/22/2021] [Indexed: 06/14/2023]
Abstract
We investigated increasing genetic gain for grain yield using early generation genomic selection (GS). A training set of 1,334 elite wheat breeding lines tested over three field seasons was used to generate Genomic Estimated Breeding Values (GEBVs) for grain yield under irrigated conditions applying markers and three different prediction methods: (1) Genomic Best Linear Unbiased Predictor (GBLUP), (2) GBLUP with the imputation of missing genotypic data by Ridge Regression BLUP (rrGBLUP_imp), and (3) Reproducing Kernel Hilbert Space (RKHS), also known as Gaussian Kernel (GK). F2 GEBVs were generated for 1,924 individuals from 38 biparental cross populations between 21 parents selected from the training set. Results showed that F2 GEBVs from the different methods were not correlated. Experiment 1 consisted of selecting F2s with the highest average GEBVs and advancing them to form genomically selected bulks and make intercross populations aiming to combine favorable alleles for yield. F4:6 lines were derived from genomically selected bulks, intercrosses, and conventional breeding methods, with similar numbers from each. Field-testing in Experiment 1 did not find any difference in yield between genomic and conventional selection. Experiment 2 compared the predictive ability of the different GEBV calculation methods in F2 using a set of single plant-derived F2:4 lines from randomly selected F2 plants. Grain yield results from Experiment 2 showed a significant positive correlation between observed yields of F2:4 lines and predicted yield GEBVs of F2 single plants from GK (predictive ability of 0.248, P < 0.001) and GBLUP (0.195, P < 0.01) but no correlation with rrGBLUP_imp. Results demonstrate the potential for the application of GS in early generations of wheat breeding and the importance of using the appropriate statistical model for GEBV calculation, which may not be the same as the best model for inbreds.
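For readers unfamiliar with the Gaussian kernel (GK) named in this abstract, one common form is K_ij = exp(-h * d_ij^2 / q), where d_ij is the Euclidean distance between the marker vectors of lines i and j. The sketch below is a minimal illustration, not the authors' implementation: the bandwidth `h`, the scaling constant `q` (taken here as the median squared distance), and the function name `gaussian_kernel` are all assumptions.

```python
import math
import statistics


def gaussian_kernel(markers, h=1.0):
    """Return the n x n matrix K with K[i][j] = exp(-h * d_ij^2 / q).

    markers: list of equal-length numeric marker vectors, one per line.
    q is the median off-diagonal squared distance (an assumed scaling).
    """
    n = len(markers)
    # Squared Euclidean distance between every pair of marker vectors.
    d2 = [
        [sum((a - b) ** 2 for a, b in zip(markers[i], markers[j])) for j in range(n)]
        for i in range(n)
    ]
    off_diag = [d2[i][j] for i in range(n) for j in range(i + 1, n)]
    q = statistics.median(off_diag) if off_diag else 1.0
    if q == 0:
        q = 1.0  # all lines identical: avoid division by zero
    return [[math.exp(-h * d2[i][j] / q) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical genotypes coded 0/1/2 for three lines and three markers.
    toy = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
    for row in gaussian_kernel(toy):
        print(row)
```

Identical lines receive a kernel value of 1, and the value decays toward 0 as marker profiles diverge; larger `h` makes that decay faster.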
Affiliation(s)
- David Bonnett
  - International Maize and Wheat Improvement Center, Texcoco, Mexico
  - BASF Wheat Breeding, Sabin, MN, United States
- Yongle Li
  - School of Agriculture, Food and Wine, Faculty of Sciences, The University of Adelaide, Adelaide, SA, Australia
- Jose Crossa
  - International Maize and Wheat Improvement Center, Texcoco, Mexico
  - Colegio de Postgraduados, Texcoco, Mexico
- Bhoja Basnet
  - International Maize and Wheat Improvement Center, Texcoco, Mexico
- G. Alvarado
  - International Maize and Wheat Improvement Center, Texcoco, Mexico
- J. L. Jannink
  - USDA-ARS, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, United States
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Jesse Poland
  - Department of Plant Pathology, Kansas State University, Manhattan, KS, United States
- Mark Sorrells
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
54
Selga C, Reslow F, Pérez-Rodríguez P, Ortiz R. The power of genomic estimated breeding values for selection when using a finite population size in genetic improvement of tetraploid potato. G3 (Bethesda, Md.) 2022; 12:6407142. [PMID: 34849763 PMCID: PMC8728039 DOI: 10.1093/g3journal/jkab362] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/10/2021] [Accepted: 10/08/2021] [Indexed: 12/02/2022]
Abstract
Potato breeding relies heavily on visual phenotypic scoring for clonal selection. Obtaining robust phenotypic data can be labor intensive and expensive, especially in the early cycles of a potato breeding program where the number of genotypes is very large. We have investigated the power of genomic estimated breeding values (GEBVs) for selection from a limited population size in potato breeding. We collected genotypic data from 669 tetraploid potato clones from all cycles of a potato breeding program, as well as phenotypic data for eight important breeding traits. The genotypes were partitioned into a training and a test population distinguished by cycle of selection in the breeding program. GEBVs for seven traits were predicted for individuals from the first stage of the breeding program (T1), which had not undergone any selection, or individuals selected at least once in the field (T2). An additional approach in which GEBVs were predicted within and across full-sib families from unselected material (T1) was tested for four breeding traits. GEBVs were obtained by using a Bayesian Ridge Regression model estimating single marker effects and phenotypic data from individuals at later stages of selection of the breeding program. Our results suggest that, for most traits included in this study, information from individuals from later stages of selection cannot be utilized to make selections based on GEBVs in earlier clonal generations. Predictions of GEBVs across full-sib families yielded prediction accuracies as low as those across generations. The most promising approach for selection using GEBVs was found to be making predictions within full-sib families.
Affiliation(s)
- Catja Selga
  - Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Lomma SE-23422, Sweden
  - Corresponding author:
- Fredrik Reslow
  - Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Lomma SE-23422, Sweden
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), Lomma SE-23422, Sweden
55
Varona L, Legarra A, Toro MA, Vitezica ZG. Genomic Prediction Methods Accounting for Nonadditive Genetic Effects. Methods Mol Biol 2022; 2467:219-243. [PMID: 35451778 DOI: 10.1007/978-1-0716-2205-6_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/14/2023]
Abstract
The use of genomic information to predict future phenotypes or breeding values of selection candidates has become standard over the last decade. However, most procedures for genomic prediction only consider the additive (or substitution) effects associated with polymorphic markers. Nevertheless, implementing models that consider nonadditive genetic variation may be of interest because they (1) may increase predictive ability, (2) can be used to define mate allocation procedures in plant and animal breeding schemes, and (3) can be used to benefit from nonadditive genetic variation in crossbreeding or purebred breeding schemes. This study reviews the available methods for incorporating nonadditive effects into genomic prediction procedures and their potential applications in predicting future phenotypic performance, mate allocation, and crossbred and purebred selection. Finally, a brief outline of some future research lines is also proposed.
Affiliation(s)
- Luis Varona
  - Departamento de Anatomía, Embriología y Genética Animal, Universidad de Zaragoza, Zaragoza, Spain.
  - Instituto Agroalimentario de Aragón (IA2), Zaragoza, Spain.
- Miguel A Toro
  - Dpto. Producción Agraria, ETS Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
56
Martini JWR, Gao N, Crossa J. Incorporating Omics Data in Genomic Prediction. Methods Mol Biol 2022; 2467:341-357. [PMID: 35451782 DOI: 10.1007/978-1-0716-2205-6_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this chapter, we discuss the motivation for integrating other types of omics data into genomic prediction methods. We give an overview of literature investigating the performance of omics-enhanced predictions, and highlight potential pitfalls when applying these methods in breeding. We emphasize that the statistical methods available for genomic data can be transferred to the general omics case. However, when using a framework of omic relationship matrices, the standardization of the variables may be more relevant than it is for a genomic relationship matrix based on single-nucleotide polymorphisms.
Affiliation(s)
- Johannes W R Martini
  - International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico.
- Ning Gao
  - School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico
57
Covarrubias-Pazaran G. Overview of Major Computer Packages for Genomic Prediction of Complex Traits. Methods Mol Biol 2022; 2467:157-187. [PMID: 35451776 DOI: 10.1007/978-1-0716-2205-6_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Genomic prediction models are showing their power to increase the rate of genetic gain by boosting all the elements of the breeder's equation. Insight into the factors associated with the successful implementation of this prediction model is increasing with time but the technology has reached a stage of acceptance. Most genomic prediction models require specialized computer packages based mainly on linear models and related methods. The number of computer packages has exploded in recent years given the interest in this technology. In this chapter, we explore the main computer packages available to fit these models; we also review the special features, strengths, and weaknesses of the methods behind the most popular computer packages.
Affiliation(s)
- Giovanny Covarrubias-Pazaran
  - Centro Internacional de Mejoramiento de Maiz y Trigo (CIMMYT), Texcoco, Mexico.
  - Excellence in Breeding Platform (EiB), Texcoco, Mexico.
58
Montesinos-López OA, Montesinos-López A, Hernandez-Suarez CM, Barrón-López JA, Crossa J. Deep-learning power and perspectives for genomic selection. The Plant Genome 2021; 14:e20122. [PMID: 34309215 DOI: 10.1002/tpg2.20122] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/11/2021] [Accepted: 05/24/2021] [Indexed: 06/13/2023]
Abstract
Deep learning (DL) is revolutionizing the development of artificial intelligence systems. For example, before 2015, humans were better than artificial machines at classifying images and solving many problems of computer vision (related to object localization and detection using images), but nowadays, artificial machines have surpassed the ability of humans in this specific task. This is just one example of how the application of these models has surpassed human abilities and the performance of other machine-learning algorithms. For this reason, DL models have been adopted for genomic selection (GS). In this article we provide insight about the power of DL in solving complex prediction tasks and how combining GS and DL models can accelerate the revolution provoked by GS methodology in plant breeding. Furthermore, we will mention some trends of DL methods, emphasizing some areas of opportunity to really exploit the DL methodology in GS; however, we are aware that considerable research is required to be able not only to use the existing DL in conjunction with GS, but to adapt and develop DL methods that take the peculiarities of breeding inputs and GS into consideration.
Affiliation(s)
- Abelardo Montesinos-López
  - Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, 44430, México
- José Alberto Barrón-López
  - Department of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina s/n La Molina, Lima, 15024, Perú
- José Crossa
  - Colegio de Postgraduados, Montecillos, Edo. de México, 56230, México
  - Biometrics and Statistics Unit, Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Edo. De, Mexico DF, 52640, Mexico
59
Lell M, Reif J, Zhao Y. Optimizing the setup of multienvironmental hybrid wheat yield trials for boosting the selection capability. The Plant Genome 2021; 14:e20150. [PMID: 34541826 DOI: 10.1002/tpg2.20150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/13/2021] [Accepted: 07/22/2021] [Indexed: 06/13/2023]
Abstract
The accuracy of genomic prediction increases with increasing heritability, and thus the challenge of optimizing the design of multienvironment yield trials under a limited budget arises. With this in mind, we aimed to find the best of several options to sparsely distribute a fixed number of plots across different environments to increase the accuracy of hybrid performance prediction. We used a comprehensive published genomic and phenotypic data set of 1,604 winter wheat (Triticum aestivum L.) hybrids and compared several commonly used biometric models for phenotypic data analysis in a resampling study to identify the one that most accurately estimated the hybrid performance in different imbalanced trials. Our results showed that when using information about genotypic relationships, genotypic values were more strongly associated with the reference values than when this information was ignored. In addition, a balanced environmental sampling resulted in an adequate characterization of each environment and increased the accuracy for estimating the hybrid performance. One promising design involved dividing the genotypes into equally sized subgroups that were tested in a subset of environments, with the constraint that the subgroups overlapped with respect to the environments. This scenario appears to be particularly appropriate, as it provided both high accuracies in the estimates of genotypic values and had low variability resulting from the data sample used. Thus, we were able to clearly demonstrate the utility for optimizing the design of multienvironment hybrid wheat yield trials in times of genomic selection.
Affiliation(s)
- Moritz Lell
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, D-06466, Germany
- Jochen Reif
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, D-06466, Germany
- Yusheng Zhao
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, D-06466, Germany
60
Wang Z, Cheng H. Single-Trait and Multiple-Trait Genomic Prediction From Multi-Class Bayesian Alphabet Models Using Biological Information. Front Genet 2021; 12:717457. [PMID: 34707638 PMCID: PMC8542848 DOI: 10.3389/fgene.2021.717457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2021] [Accepted: 08/23/2021] [Indexed: 11/13/2022] Open
Abstract
Genomic prediction has been widely used in multiple areas and various genomic prediction methods have been developed. The majority of these methods, however, focus on statistical properties and ignore the abundant useful biological information like genome annotation or previously discovered causal variants. Therefore, to improve prediction performance, several methods have been developed to incorporate biological information into genomic prediction, mostly in single-trait analysis. A commonly used method to incorporate biological information is allocating molecular markers into different classes based on the biological information and assigning separate priors to molecular markers in different classes. It has been shown that such methods can achieve higher prediction accuracy than conventional methods in some circumstances. However, these methods mainly focus on single-trait analysis, and available priors of these methods are limited. Thus, in both single-trait and multiple-trait analysis, we propose the multi-class Bayesian Alphabet methods, in which multiple Bayesian Alphabet priors, including RR-BLUP, BayesA, BayesB, BayesCπ, and Bayesian LASSO, can be used for markers allocated to different classes. The superior performance of the multi-class Bayesian Alphabet in genomic prediction is demonstrated using both real and simulated data. The software tool JWAS offers open-source routines to perform these analyses.
Affiliation(s)
- Zigui Wang
  - Department of Animal Science, University of California, Davis, Davis, CA, United States
- Hao Cheng
  - Department of Animal Science, University of California, Davis, Davis, CA, United States
61
Zhu S, Guo T, Yuan C, Liu J, Li J, Han M, Zhao H, Wu Y, Sun W, Wang X, Wang T, Liu J, Tiambo CK, Yue Y, Yang B. Evaluation of Bayesian alphabet and GBLUP based on different marker density for genomic prediction in Alpine Merino sheep. G3 (Bethesda, Md.) 2021; 11:6310012. [PMID: 34849779 PMCID: PMC8527494 DOI: 10.1093/g3journal/jkab206] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/28/2021] [Accepted: 06/01/2021] [Indexed: 01/20/2023]
Abstract
Marker density, the heritability of the trait, and the statistical model adopted are critical to the accuracy of genomic prediction (GP) or selection (GS). If the potential of GP is to be fully utilized to optimize the effect of breeding and selection, it is essential to study these factors not only in simulated data but also in real data, so that their impact on GP accuracy can be understood more clearly and intuitively. Herein, we studied the GP of six wool traits of sheep using two classes of models: Bayesian Alphabet (BayesA, BayesB, BayesCπ, and Bayesian LASSO) and genomic best linear unbiased prediction (GBLUP). We adopted fivefold cross-validation to evaluate accuracy based on the genotyping data of Alpine Merino sheep (n = 821). The main aim was to study the influence and interaction of different models and marker densities on GP accuracy. The cross-validation results showed that the GP accuracy of the six traits was between 0.28 and 0.60. We showed that the accuracy of GP could be improved by increasing the marker density, and that this improvement is closely related to the model adopted and the heritability of the trait. Moreover, at both marker densities, the GBLUP model predicted better for traits with low heritability, while the advantage of Bayesian Alphabet became more obvious as heritability increased; different GP models are therefore appropriate for different traits. These findings indicate the importance of applying appropriate models for GP and would assist in further exploring the optimization of GP.
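The fivefold cross-validation mentioned in this abstract can be sketched as follows. This is a generic illustration rather than the authors' pipeline: the accuracy is taken here as the Pearson correlation between predicted and observed values in each held-out fold (the paper may scale it differently), and the names `k_fold_indices`, `pearson`, `cross_validated_accuracy` and the pluggable `fit_predict` callable are hypothetical.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_validated_accuracy(features, phenotypes, fit_predict, k=5, seed=0):
    """Mean held-out correlation over k folds.

    fit_predict(train_x, train_y, test_x) must return predictions for test_x;
    any model (e.g. a GBLUP or Bayesian fit) could be wrapped this way.
    """
    n = len(phenotypes)
    accuracies = []
    for test_idx in k_fold_indices(n, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        predictions = fit_predict(
            [features[i] for i in train_idx],
            [phenotypes[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        accuracies.append(pearson(predictions, [phenotypes[i] for i in test_idx]))
    return sum(accuracies) / k


if __name__ == "__main__":
    # With n = 821 animals, fivefold splitting gives folds of 165 or 164.
    print([len(fold) for fold in k_fold_indices(821)])
```

Each animal appears in exactly one held-out fold, so every phenotype is predicted once from a model that never saw it.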
Affiliation(s)
- Shaohua Zhu
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Tingting Guo
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Chao Yuan
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Jianbin Liu
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Jianye Li
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Mei Han
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Hongchang Zhao
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Yi Wu
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Weibo Sun
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Xijun Wang
  - Gansu Provincial Sheep Breeding Technology Extension Station, Sunan 734400, China
- Tianxiang Wang
  - Gansu Provincial Sheep Breeding Technology Extension Station, Sunan 734400, China
- Jigang Liu
  - Gansu Provincial Sheep Breeding Technology Extension Station, Sunan 734400, China
- Christian Keambou Tiambo
  - Centre for Tropical Livestock Genetics and Health (CTLGH), International Livestock Research Institute, Nairobi 00100, Kenya
- Yaojing Yue
  - Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Bohui Yang
  - Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
62
Ahmar S, Ballesta P, Ali M, Mora-Poblete F. Achievements and Challenges of Genomics-Assisted Breeding in Forest Trees: From Marker-Assisted Selection to Genome Editing. Int J Mol Sci 2021; 22:10583. [PMID: 34638922 PMCID: PMC8508745 DOI: 10.3390/ijms221910583] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/03/2021] [Revised: 09/26/2021] [Accepted: 09/27/2021] [Indexed: 12/23/2022] Open
Abstract
Forest tree breeding efforts have focused mainly on improving traits of economic importance, selecting trees suited to new environments or generating trees that are more resilient to biotic and abiotic stressors. This review describes various methods of forest tree selection assisted by genomics and the main technological challenges and achievements in research at the genomic level. Due to the long rotation time of a forest plantation and the resulting long generation times necessary to complete a breeding cycle, combining advanced techniques with traditional breeding has been necessary, allowing the use of more precise methods for determining the genetic architecture of traits of interest, such as genome-wide association studies (GWASs) and genomic selection (GS). The main factors that determine the accuracy of genomic prediction models are also addressed. In turn, the introduction of genome editing opens the door to new possibilities in forest trees, especially clustered regularly interspaced short palindromic repeats and CRISPR-associated protein 9 (CRISPR/Cas9). It is a highly efficient and effective genome editing technique that has been used to implement targeted changes at specific places in the genome of a forest tree. However, forest trees still lack a transformation method, and the number of genotypes available for CRISPR/Cas9 remains insufficient. This challenge could be addressed with the use of the newly developing technique GRF-GIF with speed breeding.
Affiliation(s)
- Sunny Ahmar
  - Institute of Biological Sciences, University of Talca, 1 Poniente 1141, Talca 3460000, Chile
- Paulina Ballesta
  - The National Fund for Scientific and Technological Development, Av. del Agua 3895, Talca 3460000, Chile
- Mohsin Ali
  - Department of Forestry and Range Management, University of Agriculture Faisalabad, Faisalabad 38000, Pakistan
- Freddy Mora-Poblete
  - Institute of Biological Sciences, University of Talca, 1 Poniente 1141, Talca 3460000, Chile
63
Shalizi MN, Cumbie WP, Isik F. Genomic prediction for fusiform rust disease incidence in a large cloned population of Pinus taeda. G3 (Bethesda, Md.) 2021; 11:6325506. [PMID: 34544145 PMCID: PMC8496308 DOI: 10.1093/g3journal/jkab235] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/04/2021] [Accepted: 06/30/2021] [Indexed: 04/12/2023]
Abstract
In this study, 723 Pinus taeda L. (loblolly pine) clonal varieties genotyped with 16920 SNP markers were used to evaluate genomic selection for fusiform rust disease caused by the fungus Cronartium quercuum f. sp. fusiforme. The 723 clonal varieties were from five full-sib families. They were a subset of a larger population (1831 clonal varieties), field-tested across 26 locations in the southeast US. Ridge regression, Bayes B, and Bayes Cπ models were implemented to study marker-trait associations and estimate predictive ability for selection. A cross-validation scenario based on a random sampling of 80% of the clonal varieties for the model building had higher (0.71-0.76) prediction accuracies of genomic estimated breeding values compared with family and within-family cross-validation scenarios. Random sampling within families for model training to predict genomic estimated breeding values of the remaining progenies within each family produced accuracies between 0.38 and 0.66. Using four families out of five for model training was not successful. The results showed the importance of genetic relatedness between the training and validation sets. Bayesian whole-genome regression models detected three QTL with large effects on the disease outcome, explaining 54% of the genetic variation in the trait. The significance of QTL was validated with GWAS while accounting for the population structure and polygenic effect. The odds of disease incidence for heterozygous AB genotypes were 10.7 and 12.1 times greater than the homozygous AA genotypes for SNP11965 and SNP6347 loci, respectively. Genomic selection for fusiform rust disease incidence could be effective in P. taeda breeding. Markers with large effects could be fit as fixed covariates to increase the prediction accuracies, provided that their effects are validated further.
Affiliation(s)
- Mohammad Nasir Shalizi
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC 27695-8002, USA
- Fikret Isik (corresponding author)
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC 27695-8002, USA
64
Rios EF, Andrade MHML, Resende MFR, Kirst M, de Resende MDV, de Almeida Filho JE, Gezan SA, Munoz P. Genomic prediction in family bulks using different traits and cross-validations in pine. G3-GENES GENOMES GENETICS 2021; 11:6321952. [PMID: 34544139 PMCID: PMC8496210 DOI: 10.1093/g3journal/jkab249] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/09/2021] [Accepted: 07/02/2021] [Indexed: 11/13/2022]
Abstract
Genomic prediction integrates statistical, genomic, and computational tools to improve the estimation of breeding values and increase genetic gain. Due to the broad diversity in mating systems, breeding schemes, propagation methods, and unit of selection, no universal genomic prediction approach can be applied in all crops. In a genome-wide family prediction (GWFP) approach, the family is the basic unit of selection. We tested GWFP in two loblolly pine (Pinus taeda L.) datasets: a breeding population composed of 63 full-sib families (5–20 individuals per family), and a simulated population with the same pedigree structure. In both populations, phenotypic and genomic data were pooled at the family level in silico. Marker effects were estimated to compute genomic estimated breeding values (GEBV) at the individual and family (GWFP) levels. Fewer than six individuals per family produced inaccurate estimates of family phenotypic performance and allele frequency. Tested across different scenarios, GWFP predictive ability was higher than that for GEBV in both populations. Validation sets composed of families with similar phenotypic mean and variance as the training population yielded predictions consistently higher and more accurate than other validation sets. Results revealed potential for applying GWFP in breeding programs whose selection unit is the family, and for systems where families can serve as training sets. The GWFP approach is well suited for crops that are routinely genotyped and phenotyped at the plot level, but it can be extended to other breeding programs. Higher predictive ability obtained with GWFP would motivate the application of genomic prediction in these situations.
Affiliation(s)
- Esteban F Rios
- Agronomy Department, University of Florida, Gainesville, FL 32611, USA
- Marcio F R Resende
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Matias Kirst
- School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611, USA
- Marcos D V de Resende
- EMBRAPA Café/Department of Statistics, Federal University of Viçosa, Avenida PH Rolfs S/N, Viçosa 36570-000, Brazil
- Patricio Munoz
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
65
Feldmann MJ, Piepho HP, Bridges WC, Knapp SJ. Average semivariance yields accurate estimates of the fraction of marker-associated genetic variance and heritability in complex trait analyses. PLoS Genet 2021; 17:e1009762. [PMID: 34437540 PMCID: PMC8425577 DOI: 10.1371/journal.pgen.1009762] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/31/2021] [Revised: 09/08/2021] [Accepted: 08/09/2021] [Indexed: 12/15/2022] Open
Abstract
The development of genome-informed methods for identifying quantitative trait loci (QTL) and studying the genetic basis of quantitative variation in natural and experimental populations has been driven by advances in high-throughput genotyping. For many complex traits, the underlying genetic variation is caused by the segregation of one or more ‘large-effect’ loci, in addition to an unknown number of loci with effects below the threshold of statistical detection. The large-effect loci segregating in populations are often necessary but not sufficient for predicting quantitative phenotypes. They are, nevertheless, important enough to warrant deeper study and direct modelling in genomic prediction problems. We explored the accuracy of statistical methods for estimating the fraction of marker-associated genetic variance (p) and heritability (H_M^2) for large-effect loci underlying complex phenotypes. We found that commonly used statistical methods overestimate p and H_M^2. The source of the upward bias was traced to inequalities between the expected values of variance components in the numerators and denominators of these parameters. Algebraic solutions for bias-correcting estimates of p and H_M^2 were found that only depend on the degrees of freedom and are constant for a given study design. We discovered that average semivariance methods, which have heretofore not been used in complex trait analyses, yielded unbiased estimates of p and H_M^2, in addition to best linear unbiased predictors of the additive and dominance effects of the underlying loci. The cryptic bias problem described here is unrelated to selection bias, although both cause the overestimation of p and H_M^2. The solutions we described are predicted to more accurately describe the contributions of large-effect loci to the genetic variation underlying complex traits of medical, biological, and agricultural importance.
Quantifying the contributions of individual genes to the phenotypic variation observed for genetically complex traits has been an ongoing and important challenge in biology, medicine, and agriculture. While many genes have statistically undetectable effects, those with large effects often warrant in-depth study and can be important predictors of complex phenotypes such as disease risk in humans or disease resistance in domesticated plants and animals. The genes identified through associations with genetic markers in complex trait analyses typically account for a fraction of the heritable variation, a genetic parameter we called ‘marker heritability’. We discovered that textbook statistical methods systematically overestimate marker heritability and thus overestimate the contributions of specific genes to the phenotypic variation observed for complex traits in natural and experimental populations. We describe the source of the upward bias, validate our findings through computer simulation, describe methods for bias-correcting estimates of marker heritability, and illustrate their application through empirical examples. The statistical methods we describe supply investigators with more accurate estimates of the contributions of specific genes or networks of interacting genes to the heritable variation observed in complex trait studies.
Affiliation(s)
- Mitchell J. Feldmann
- Department of Plant Sciences, University of California, Davis, California, United States of America
- Hans-Peter Piepho
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
- William C. Bridges
- Department of Mathematical Sciences, Clemson University, Clemson, South Carolina, United States of America
- Steven J. Knapp (corresponding author)
- Department of Plant Sciences, University of California, Davis, California, United States of America
66
McGaugh SE, Lorenz AJ, Flagel LE. The utility of genomic prediction models in evolutionary genetics. Proc Biol Sci 2021; 288:20210693. [PMID: 34344180 PMCID: PMC8334854 DOI: 10.1098/rspb.2021.0693] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/23/2021] [Accepted: 07/15/2021] [Indexed: 12/25/2022] Open
Abstract
Variation in complex traits is the result of contributions from many loci of small effect. Based on this principle, genomic prediction methods are used to make predictions of breeding value for an individual using genome-wide molecular markers. Genomic prediction models have been used in plant and animal breeding for almost two decades to increase rates of genetic improvement and reduce the length of artificial selection experiments. However, evolutionary genomics studies have been slow to incorporate this technique to select individuals for breeding in a conservation context or to learn more about the genetic architecture of traits, the genetic value of missing individuals or microevolution of breeding values. Here, we outline the utility of genomic prediction and provide an overview of the methodology. We highlight opportunities to apply genomic prediction in evolutionary genetics of wild populations and the best practices when using these methods on field-collected phenotypes.
Affiliation(s)
- Suzanne E. McGaugh
- Ecology, Evolution, and Behavior, University of Minnesota, 140 Gortner Lab, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
- Aaron J. Lorenz
- Agronomy and Plant Genetics, University of Minnesota, 411 Borlaug Hall, 1991 Upper Buford Circle, Saint Paul, MN 55108, USA
- Lex E. Flagel
- Plant and Microbial Biology, University of Minnesota, 140 Gortner Lab, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
- Bayer Crop Science, 700 W Chesterfield Parkway, Chesterfield, MO 63017, USA
67
Pérez-Enciso M, Zingaretti LM, Ramayo-Caldas Y, de Los Campos G. Opportunities and limits of combining microbiome and genome data for complex trait prediction. Genet Sel Evol 2021; 53:65. [PMID: 34362312 PMCID: PMC8344190 DOI: 10.1186/s12711-021-00658-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/25/2021] [Accepted: 07/20/2021] [Indexed: 12/12/2022] Open
Abstract
Background Analysis and prediction of complex traits using microbiome data combined with host genomic information is a topic of utmost interest. However, numerous questions remain to be answered: how useful can the microbiome be for complex trait prediction? Are estimates of microbiability reliable? Can the underlying biological links between the host’s genome, microbiome, and phenome be recovered? Methods Here, we address these issues by (i) developing a novel simulation strategy that uses real microbiome and genotype data as inputs, and (ii) using variance-component approaches (Bayesian Reproducing Kernel Hilbert Space (RKHS) and Bayesian variable selection methods (Bayes C)) to quantify the proportion of phenotypic variance explained by the genome and the microbiome. The proposed simulation approach can mimic genetic links between the microbiome and genotype data by a permutation procedure that retains the distributional properties of the data. Results Using real genotype and rumen microbiota abundances from dairy cattle, simulation results suggest that microbiome data can significantly improve the accuracy of phenotype predictions, regardless of whether some microbiota abundances are under direct genetic control by the host or not. This improvement depends logically on the microbiome being stable over time. Overall, random-effects linear methods appear robust for variance components estimation, in spite of the typically highly leptokurtic distribution of microbiota abundances. The predictive performance of Bayes C was higher but more sensitive to the number of causative effects than RKHS. Accuracy with Bayes C depended, in part, on the number of microorganisms’ taxa that influence the phenotype. 
Conclusions While we conclude that, overall, genome-microbiome-links can be characterized using variance component estimates, we are less optimistic about the possibility of identifying the causative host genetic effects that affect microbiota abundances, which would require much larger sample sizes than are typically available for genome-microbiome-phenome studies. The R code to replicate the analyses is in https://github.com/miguelperezenciso/simubiome.
Affiliation(s)
- Miguel Pérez-Enciso
- ICREA, Passeig de Lluís Companys 23, 08010, Barcelona, Spain
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193, Bellaterra, Barcelona, Spain
- Dept. of Epidemiology & Biostatistics, and Dept. of Statistics & Probability, Michigan State University, East Lansing, MI, 48824, USA
- Laura M Zingaretti
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193, Bellaterra, Barcelona, Spain
- Dept. of Epidemiology & Biostatistics, and Dept. of Statistics & Probability, Michigan State University, East Lansing, MI, 48824, USA
- Yuliaxis Ramayo-Caldas
- Animal Breeding and Genetics Program, Institute for Research and Technology in Food and Agriculture (IRTA), Torre Marimon, 08140, Caldes de Montbui, Barcelona, Spain
- Gustavo de Los Campos
- Dept. of Epidemiology & Biostatistics, and Dept. of Statistics & Probability, Michigan State University, East Lansing, MI, 48824, USA
68
Hao Z, Gao J, Song Y, Yang R, Liu D. Genome-wide hierarchical mixed model association analysis. Brief Bioinform 2021; 22:6342938. [PMID: 34368830 PMCID: PMC8575042 DOI: 10.1093/bib/bbab306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/03/2021] [Revised: 07/05/2021] [Accepted: 07/17/2021] [Indexed: 11/14/2022] Open
Abstract
In genome-wide mixed model association analysis, we stratified the genomic mixed model into two hierarchies to estimate genomic breeding values (GBVs) using genomic best linear unbiased prediction and statistically infer the association of GBVs with each SNP using generalized least squares. The hierarchical mixed model (Hi-LMM) can correct confounders effectively with polygenic effects as residuals for association tests, preventing potential false-negative errors produced with genome-wide rapid association using mixed model and regression or an efficient mixed-model association expedited (EMMAX). Meanwhile, the Hi-LMM achieves the same statistical power as the exact mixed model association and the same computing efficiency as EMMAX. When the GBVs have been estimated precisely, the Hi-LMM can detect more quantitative trait nucleotides (QTNs) than existing methods. In particular, joint association analysis is straightforward under the Hi-LMM framework and improves the statistical power of detecting QTNs.
Affiliation(s)
- Zhiyu Hao
- Institute of Animal Husbandry, Heilongjiang Academy of Agricultural Sciences
- Jin Gao
- Wuxi Fisheries College, Nanjing Agricultural University
- Yuxin Song
- Wuxi Fisheries College, Nanjing Agricultural University
- Runqing Yang (corresponding author)
- Research Center for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing 100141, People's Republic of China
- Di Liu (corresponding author)
- Institute of Animal Husbandry, Heilongjiang Academy of Agricultural Sciences, Harbin 150086, People's Republic of China
69
Li H, Zhu B, Xu L, Wang Z, Xu L, Zhou P, Gao H, Guo P, Chen Y, Gao X, Zhang L, Gao H, Cai W, Xu L, Li J. Genomic Prediction Using LD-Based Haplotypes Inferred From High-Density Chip and Imputed Sequence Variants in Chinese Simmental Beef Cattle. Front Genet 2021; 12:665382. [PMID: 34394182 PMCID: PMC8358323 DOI: 10.3389/fgene.2021.665382] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/08/2021] [Accepted: 06/30/2021] [Indexed: 01/05/2023] Open
Abstract
A haplotype is defined as a combination of alleles at adjacent loci belonging to the same chromosome that can be transmitted as a unit. In this study, we used both the Illumina BovineHD chip (HD chip) and imputed whole-genome sequence (WGS) data to explore haploblocks and assess haplotype effects, and the haploblocks were defined based on different LD thresholds. The accuracies of genomic prediction (GP) for dressing percentage (DP), meat percentage (MP), and rib eye roll weight (RERW) based on haplotypes were investigated and compared for both data sets in Chinese Simmental beef cattle. The accuracies of GP using the entire imputed WGS data were lower than those using the HD chip data in all cases. For DP and MP, the accuracy of GP using haploblock approaches exceeded that of the individual single nucleotide polymorphism (SNP) approach (GBLUP_In_Block) at specific LD levels. Hotelling’s test confirmed that GP using LD-based haplotypes from WGS data can significantly increase the accuracies of GP for RERW, compared with the individual SNP approach (∼1.4 and 1.9% for GHBLUP and GHBLUP+GBLUP, respectively). We found that the accuracies using the haploblock approach varied with different LD thresholds. The LD threshold r² ≥ 0.5 was optimal for most scenarios. Our results suggested that the LD-based haploblock approach can improve the accuracy of genomic prediction for carcass traits using both HD chip and imputed WGS data under the optimal LD thresholds in Chinese Simmental beef cattle.
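To make the idea of LD-threshold haploblocks concrete, here is a minimal Python sketch that groups adjacent SNPs into blocks whenever the r² between neighbouring markers reaches a threshold such as 0.5. It is an assumption-laden illustration and not the procedure used in the study: r² is approximated by the squared Pearson correlation of unphased 0/1/2 genotype dosages, blocks are built by a simple greedy scan over consecutive SNPs, and the function names `r2` and `haploblocks` are hypothetical.

```python
import numpy as np


def r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    Returns 0.0 if either marker is monomorphic, where correlation is undefined.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    if g1.std() == 0.0 or g2.std() == 0.0:
        return 0.0
    return float(np.corrcoef(g1, g2)[0, 1] ** 2)


def haploblocks(G, threshold=0.5):
    """Greedily group adjacent SNP columns of G into blocks.

    G is an (individuals x SNPs) dosage matrix with SNPs in map order.
    A SNP joins the current block if its r2 with the previous SNP is at
    least `threshold`; otherwise it starts a new block.
    """
    G = np.asarray(G)
    n_snps = G.shape[1]
    if n_snps == 0:
        return []
    blocks = [[0]]
    for j in range(1, n_snps):
        if r2(G[:, j - 1], G[:, j]) >= threshold:
            blocks[-1].append(j)
        else:
            blocks.append([j])
    return blocks
```

Raising the threshold produces more, shorter blocks and lowering it produces fewer, longer ones, which is one reason prediction accuracy can vary with the LD threshold as the abstract describes.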
Affiliation(s)
- Hongwei Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Bo Zhu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Centre of Beef Cattle Genetic Evaluation, Beijing, China
- Ling Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Zezhao Wang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lei Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Peinuo Zhou
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Han Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Peng Guo
- College of Computer and Information Engineering, Tianjin Agricultural University, Tianjin, China
- Yan Chen
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Huijiang Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Centre of Beef Cattle Genetic Evaluation, Beijing, China
- Wentao Cai
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- National Centre of Beef Cattle Genetic Evaluation, Beijing, China
70
Zhou G, Zhu Q, Mao Y, Chen G, Xue L, Lu H, Shi M, Zhang Z, Song X, Zhang H, Hao D. Multi-Locus Genome-Wide Association Study and Genomic Selection of Kernel Moisture Content at the Harvest Stage in Maize. FRONTIERS IN PLANT SCIENCE 2021; 12:697688. [PMID: 34305987 PMCID: PMC8299107 DOI: 10.3389/fpls.2021.697688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/20/2021] [Accepted: 06/16/2021] [Indexed: 05/26/2023]
Abstract
Kernel moisture content at the harvest stage (KMC) is an important trait that affects the mechanical harvesting of maize grain, and the identification of genetic loci for KMC is beneficial for maize molecular breeding. In this study, we performed a multi-locus genome-wide association study (ML-GWAS) to identify quantitative trait nucleotides (QTNs) for KMC using an association mapping panel of 251 maize inbred lines that were genotyped with an Affymetrix CGMB56K SNP Array and phenotypically evaluated in three environments. Ninety-eight QTNs for KMC were detected using six ML-GWAS models (mrMLM, FASTmrMLM, FASTmrEMMA, PLARmEB, PKWmEB, and ISIS EM-BLASSO). Eleven of these QTNs were considered to be stable, as they were detected by at least four ML-GWAS models under a uniformed environment or in at least two environments and BLUP using the same ML-GWAS model. With qKMC5.6 removed, the remaining 10 stable QTNs explained <10% of the phenotypic variation, suggesting that KMC is mainly controlled by multiple minor-effect genetic loci. A total of 63 candidate genes were predicted from the 11 stable QTNs, and 10 candidate genes were highly expressed in the kernel at different time points after pollination. High prediction accuracy was achieved when the KMC-associated QTNs were included as fixed effects in genomic selection, and the best strategy was to integrate all KMC QTNs identified by all six ML-GWAS models. These results further our understanding of the genetic architecture of KMC and highlight the potential of genomic selection for KMC in maize breeding.
Affiliation(s)
- Guangfei Zhou
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
- Jiangsu Collaborative Innovation Centre for Modern Crop Production, Nanjing, China
- Qiuli Zhu
- Jiangsu Nantong Crop Cultivation Technique Direction Station, Nantong, China
- Yuxiang Mao
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
- Guoqing Chen
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
- Jiangsu Collaborative Innovation Centre for Modern Crop Production, Nanjing, China
- Lin Xue
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
- Jiangsu Collaborative Innovation Centre for Modern Crop Production, Nanjing, China
- Huhua Lu
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
- Mingliang Shi
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
- Zhenliang Zhang
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
- Xudong Song
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
- Huimin Zhang
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
- Derong Hao
- Department of Food Crops, Jiangsu Yanjiang Institute of Agricultural Science, Nantong, China
71
Wu D, Tanaka R, Li X, Ramstein GP, Cu S, Hamilton JP, Buell CR, Stangoulis J, Rocheford T, Gore MA. High-resolution genome-wide association study pinpoints metal transporter and chelator genes involved in the genetic control of element levels in maize grain. G3-GENES GENOMES GENETICS 2021; 11:6156830. [PMID: 33677522 PMCID: PMC8759812 DOI: 10.1093/g3journal/jkab059] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/06/2020] [Accepted: 02/21/2021] [Indexed: 12/18/2022]
Abstract
Despite its importance to plant function and human health, the genetics underpinning element levels in maize grain remain largely unknown. Through a genome-wide association study in the maize Ames panel of nearly 2,000 inbred lines that was imputed with ∼7.7 million SNP markers, we investigated the genetic basis of natural variation for the concentration of 11 elements in grain. Novel associations were detected for the metal transporter genes rte2 (rotten ear2) and irt1 (iron-regulated transporter1) with boron and nickel, respectively. We also further resolved loci that were previously found to be associated with one or more of five elements (copper, iron, manganese, molybdenum, and/or zinc), with two metal chelator and five metal transporter candidate causal genes identified. The nas5 (nicotianamine synthase5) gene involved in the synthesis of nicotianamine, a metal chelator, was found associated with both zinc and iron and suggests a common genetic basis controlling the accumulation of these two metals in the grain. Furthermore, moderate predictive abilities were obtained for the 11 elemental grain phenotypes with two whole-genome prediction models: Bayesian Ridge Regression (0.33–0.51) and BayesB (0.33–0.53). Of the two models, BayesB, with its greater emphasis on large-effect loci, showed ∼4–10% higher predictive abilities for nickel, molybdenum, and copper. Altogether, our findings contribute to an improved genotype-phenotype map for grain element accumulation in maize.
Affiliation(s)
- Di Wu
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Ryokei Tanaka
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Xiaowei Li
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Suong Cu
- College of Science and Engineering, Flinders University, Bedford Park, SA 5042, Australia
- John P Hamilton
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- C Robin Buell
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- James Stangoulis
- College of Science and Engineering, Flinders University, Bedford Park, SA 5042, Australia
- Torbert Rocheford
- Department of Agronomy, Purdue University, West Lafayette, IN 47907, USA
- Michael A Gore
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
72
Mora-Poblete F, Ballesta P, Lobos GA, Molina-Montenegro M, Gleadow R, Ahmar S, Jiménez-Aspee F. Genome-wide association study of cyanogenic glycosides, proline, sugars, and pigments in Eucalyptus cladocalyx after 18 consecutive dry summers. PHYSIOLOGIA PLANTARUM 2021; 172:1550-1569. [PMID: 33511661 DOI: 10.1111/ppl.13349] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/04/2020] [Revised: 01/07/2021] [Accepted: 01/20/2021] [Indexed: 06/12/2023]
Abstract
Natural variation of cyanogenic glycosides, soluble sugars, proline, and nondestructive optical sensing of pigments (chlorophyll, flavonols, and anthocyanins) was examined in ex situ natural populations of Eucalyptus cladocalyx F. Muell. grown under dry environmental conditions in the southern Atacama Desert, Chile. After 18 consecutive dry seasons, considerable plant-to-plant phenotypic variation for all the traits was observed in the field. For example, leaf hydrogen cyanide (HCN) concentrations varied from 0 (two acyanogenic individuals) to 1.54 mg cyanide g⁻¹ DW. A subsequent genome-wide association study revealed associations with several genes with a known function in plants. HCN content was associated robustly with genes encoding Cytochrome P450 proteins, and with genes involved in the detoxification mechanism of HCN in cells (β-cyanoalanine synthase and cyanoalanine nitrilase). Another important finding was that sugars, proline, and pigment content were linked to genes involved in transport, biosynthesis, and/or catabolism. Estimates of genomic heritability (based on haplotypes) ranged between 0.46 and 0.84 (HCN and proline content, respectively). Proline and soluble sugars had the highest predictive ability of genomic prediction models (PA = 0.65 and PA = 0.71, respectively). PA values for HCN content and flavonols were relatively moderate, with estimates ranging from 0.44 to 0.50. These findings provide new understanding of the genetic architecture of cyanogenic capacity, and other key complex traits in cyanogenic E. cladocalyx.
Affiliation(s)
- Paulina Ballesta
- Institute of Biological Sciences, Universidad de Talca, Talca, Chile
- Gustavo A Lobos
- Plant Breeding and Phenomic Center, Faculty of Agricultural Sciences, Universidad de Talca, Talca, Chile
- Marco Molina-Montenegro
- Institute of Biological Sciences, Universidad de Talca, Talca, Chile
- Centro de Estudios Avanzados en Zonas Áridas (CEAZA), Facultad de Ciencias del Mar, Universidad Católica del Norte, Coquimbo, Chile
- Roslyn Gleadow
- School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
- Sunny Ahmar
- Institute of Biological Sciences, Universidad de Talca, Talca, Chile
- College of Plant Sciences and Technology, Huazhong Agricultural University, Wuhan, China
- Felipe Jiménez-Aspee
- Department of Food Biofunctionality, Institute of Nutritional Sciences, University of Hohenheim, Stuttgart, Germany
- Departamento de Ciencias Básicas Biomédicas, Facultad de Ciencias de la Salud, Universidad de Talca, Talca, Chile
73
Bayesian ridge regression shows the best fit for SSR markers in Psidium guajava among Bayesian models. Sci Rep 2021; 11:13639. [PMID: 34211058 PMCID: PMC8249379 DOI: 10.1038/s41598-021-93120-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/28/2021] [Accepted: 06/14/2021] [Indexed: 11/25/2022] Open
Abstract
Markers are an important tool in plant breeding, which can improve conventional phenotypic breeding, generating more accurate information and resulting in better decision making. This study aimed to apply and compare the fit of different Bayesian models, BRR, BayesA, BayesB, BayesB (setting the value from very low to π = 10⁻⁵), BayesC and Bayesian Lasso (LASSO), for predictions of the genomic genetic values of productivity and quality traits of a guava population. The models were fitted for the traits fruit mass, pulp mass, soluble solids content, fruit number, and production per plant in the genomic prediction with SSR markers, obtained through the CTAB extraction method with 200 primers. The Bayesian ridge regression model showed the best results for all traits and was chosen to predict the individual’s genomic values according to the cross-validation data. A good stabilization of the Markov and Monte Carlo chains was observed with the mean values close to the observed phenotypic means. Heritabilities showed good predictive accuracy. The model showed strong correlations between some traits, allowing indirect selection.
|
74
|
Ferrão LFV, Amadeu RR, Benevenuto J, de Bem Oliveira I, Munoz PR. Genomic Selection in an Outcrossing Autotetraploid Fruit Crop: Lessons From Blueberry Breeding. FRONTIERS IN PLANT SCIENCE 2021; 12:676326. [PMID: 34194453 PMCID: PMC8236943 DOI: 10.3389/fpls.2021.676326] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/05/2021] [Accepted: 05/12/2021] [Indexed: 05/17/2023]
Abstract
Blueberry (Vaccinium corymbosum and hybrids) is a specialty crop with expanding production and consumption worldwide. The blueberry breeding program at the University of Florida (UF) has greatly contributed to expanding production areas by developing low-chilling cultivars better adapted to subtropical and Mediterranean climates of the globe. The breeding program has historically focused on recurrent phenotypic selection. As an autopolyploid, outcrossing, perennial, long juvenile phase crop, blueberry breeding cycles are costly and time consuming, which results in low genetic gains per unit of time. Motivated by applying molecular markers for a more accurate selection in the early stages of breeding, we performed pioneering genomic selection studies and optimization for its implementation in the blueberry breeding program. We have also addressed some complexities of sequence-based genotyping and model parametrization for an autopolyploid crop, providing empirical contributions that can be extended to other polyploid species. We herein revisited some of our previous genomic selection studies and showed for the first time its application in an independent validation set. In this paper, our contribution is three-fold: (i) summarize previous results on the relevance of model parametrizations, such as diploid or polyploid methods, and inclusion of dominance effects; (ii) assess the importance of sequence depth of coverage and genotype dosage calling steps; (iii) demonstrate the real impact of genomic selection on leveraging breeding decisions by using an independent validation set. Altogether, we propose a strategy for using genomic selection in blueberry, with the potential to be applied to other polyploid species of a similar background.
Affiliation(s)
- Luís Felipe V. Ferrão
  - Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Rodrigo R. Amadeu
  - Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Juliana Benevenuto
  - Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Ivone de Bem Oliveira
  - Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
  - Hortifrut North America, Inc., Estero, FL, United States
- Patricio R. Munoz
  - Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
|
75
|
McGowan MT, Zhang Z, Ficklin SP. Chromosomal characteristics of salt stress heritable gene expression in the rice genome. BMC Genom Data 2021; 22:17. [PMID: 34044788 PMCID: PMC8162008 DOI: 10.1186/s12863-021-00970-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/11/2021] [Accepted: 05/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gene expression is potentially an important heritable quantitative trait that mediates between genetic variation and higher-level complex phenotypes through time- and condition-dependent regulatory interactions. Therefore, we sought to explore both the genomic and condition-specific characteristics of gene expression heritability within the context of chromosomal structure. RESULTS Heritability was estimated for biological gene expression using a diverse, 84-line, Oryza sativa (rice) population under optimal and salt-stressed conditions. Overall, 5936 genes were found to have heritable expression regardless of condition and 1377 genes were found to have heritable expression only during salt stress. These genes with salt-specific heritable expression are enriched for functional terms associated with response to stimulus and transcription factor activity. Additionally, we discovered that highly and lowly expressed genes, and genes with heritable expression, are distributed differently along the chromosomes in patterns that follow previously identified high-throughput chromosomal conformation capture (Hi-C) A/B chromatin compartments. Furthermore, multiple genomic hotspots enriched for genes with salt-specific heritability were identified on chromosomes 1, 4, 6, and 8. These hotspots were found to contain genes functionally enriched for transcriptional regulation and to overlap with a previously identified major QTL for salt tolerance in rice. CONCLUSIONS Investigating the heritability of traits, and in particular gene expression traits, is important for developing a basic understanding of how regulatory networks behave across a population. This work provides insights into spatial patterns of heritable gene expression at the chromosomal level.
Affiliation(s)
- Matthew T McGowan
  - Molecular Plant Sciences Program, Washington State University, French Ad 324G, Pullman, WA, 99164, USA
- Zhiwu Zhang
  - Molecular Plant Sciences Program, Washington State University, French Ad 324G, Pullman, WA, 99164, USA
  - Department of Crops and Soils, Washington State University, 105 Johnson Hall, Pullman, WA, 99164, USA
- Stephen P Ficklin
  - Molecular Plant Sciences Program, Washington State University, French Ad 324G, Pullman, WA, 99164, USA
  - Department of Horticulture, Washington State University, 149 Johnson Hall, Pullman, WA, 99164, USA
|
76
|
Puglisi D, Delbono S, Visioni A, Ozkan H, Kara İ, Casas AM, Igartua E, Valè G, Piero ARL, Cattivelli L, Tondelli A, Fricano A. Genomic Prediction of Grain Yield in a Barley MAGIC Population Modeling Genotype per Environment Interaction. FRONTIERS IN PLANT SCIENCE 2021; 12:664148. [PMID: 34108982 PMCID: PMC8183822 DOI: 10.3389/fpls.2021.664148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/04/2021] [Accepted: 04/26/2021] [Indexed: 06/12/2023]
Abstract
Multi-parent Advanced Generation Inter-crosses (MAGIC) lines have mosaic genomes that are generated by shuffling the genetic material of the founder parents following pre-defined crossing schemes. In cereal crops, these experimental populations have been extensively used to investigate the genetic bases of several traits and dissect the genetic bases of epistasis. In plants, genomic prediction models are usually fitted using either diverse panels of mostly unrelated accessions or individuals of biparental families, and several empirical analyses have been conducted to evaluate the predictive ability of models fitted to these populations using different traits. In this paper, we constructed, genotyped and evaluated a barley MAGIC population of 352 individuals developed with a diverse set of eight founder parents showing contrasting phenotypes for grain yield. We combined phenotypic and genotypic information of this MAGIC population to fit several genomic prediction models, which were cross-validated to conduct empirical analyses aimed at examining the predictive ability of these models when varying the sizes of training populations. Moreover, several methods to optimize the composition of the training population were also applied to this MAGIC population and cross-validated to estimate the resulting predictive ability. Finally, extensive phenotypic data generated in field trials organized across an ample range of water regimes and climatic conditions in the Mediterranean were used to fit and cross-validate multi-environment genomic prediction models including G×E interaction, using both genomic best linear unbiased prediction and reproducing kernel Hilbert space along with a non-linear Gaussian Kernel.
Overall, our empirical analyses showed that genomic prediction models trained with a limited number of MAGIC lines can be used to predict grain yield with values of predictive ability that vary from 0.25 to 0.60 and that, beyond QTL mapping and analysis of epistatic effects, MAGIC populations might be used to successfully fit genomic prediction models. We concluded that for grain yield, the single-environment genomic prediction models examined in this study are equivalent in terms of predictive ability while, in general, multi-environment models that explicitly split marker effects into main and environment-specific effects outperform simpler multi-environment models.
Affiliation(s)
- Damiano Puglisi
  - Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, Catania, Italy
- Stefano Delbono
  - Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
- Andrea Visioni
  - Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, Avenue Hafiane Cherkaoui, Rabat, Morocco
- Hakan Ozkan
  - Department of Field Crops, Faculty of Agriculture, University of Cukurova, Adana, Turkey
- İbrahim Kara
  - Bahri Dagdas International Agricultural Research Institute, Konya, Turkey
- Ana M. Casas
  - Aula Dei Experimental Station (EEAD-CSIC), Spanish Research Council, Zaragoza, Spain
- Ernesto Igartua
  - Aula Dei Experimental Station (EEAD-CSIC), Spanish Research Council, Zaragoza, Spain
- Giampiero Valè
  - DiSIT, Dipartimento di Scienze e Innovazione Tecnologica, Università del Piemonte Orientale, Vercelli, Italy
- Angela Roberta Lo Piero
  - Dipartimento di Agricoltura, Alimentazione e Ambiente (Di3A), Università di Catania, Catania, Italy
- Luigi Cattivelli
  - Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
- Alessandro Tondelli
  - Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
- Agostino Fricano
  - Council for Agricultural Research and Economics–Research Centre for Genomics and Bioinformatics, Fiorenzuola d’Arda, Italy
|
77
|
Rice BR, Lipka AE. Diversifying maize genomic selection models. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2021; 41:33. [PMID: 37309328 PMCID: PMC10236107 DOI: 10.1007/s11032-021-01221-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/16/2020] [Accepted: 03/07/2021] [Indexed: 06/14/2023]
Abstract
Genomic selection (GS) is one of the most powerful tools available for maize breeding. Its use of genome-wide marker data to estimate breeding values translates to increased genetic gains with fewer breeding cycles. In this review, we cover the history of GS and highlight particular milestones during its adaptation to maize breeding. We discuss how GS can be applied to developing superior maize inbreds and hybrids. Additionally, we characterize refinements in GS models that could enable the encapsulation of non-additive genetic effects, genotype by environment interactions, and multiple levels of the biological hierarchy, all of which could ultimately result in more accurate predictions of breeding values. Finally, we suggest the stages in a maize breeding program where it would be beneficial to apply GS. Given the current sophistication of high-throughput phenotypic, genotypic, and other -omic level data currently available to the maize community, now is the time to explore the implications of their incorporation into GS models and thus ensure that genetic gains are being achieved as quickly and efficiently as possible.
Affiliation(s)
- Brian R. Rice
  - Department of Crop Sciences, University of Illinois, Urbana, IL USA
|
78
|
Simeão RM, Resende MDV, Alves RS, Pessoa-Filho M, Azevedo ALS, Jones CS, Pereira JF, Machado JC. Genomic Selection in Tropical Forage Grasses: Current Status and Future Applications. FRONTIERS IN PLANT SCIENCE 2021; 12:665195. [PMID: 33995461 PMCID: PMC8120112 DOI: 10.3389/fpls.2021.665195] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/07/2021] [Accepted: 04/06/2021] [Indexed: 05/06/2023]
Abstract
The world population is expected to be larger and wealthier over the next few decades and will require more animal products, such as milk and beef. Tropical regions have great potential to meet this growing global demand, where pasturelands play a major role in supporting increased animal production. Better forage is required in consonance with improved sustainability, as the planted area should not increase and larger areas cultivated with one or a few forage species should be avoided. Although conventional tropical forage breeding has successfully released well-adapted and high-yielding cultivars over the last few decades, genetic gains from these programs have been low in view of the growing food demand worldwide. To guarantee their future impact on livestock production, breeding programs should leverage genotyping, phenotyping, and envirotyping strategies to increase genetic gains. Genomic selection (GS) and genome-wide association studies play a primary role in this process, with the advantage of increasing genetic gain due to greater selection accuracy, reduced cycle time, and increased number of individuals that can be evaluated. This strategy provides solutions to bottlenecks faced by conventional breeding methods, including long breeding cycles and difficulties in evaluating complex traits. Initial results from implementing GS in tropical forage grasses (TFGs) are promising, with notable improvements over phenotypic selection alone. However, the practical impact of GS in TFG breeding programs remains unclear. The development of appropriately sized training populations is essential for the evaluation and validation of selection markers based on estimated breeding values. Large panels of single-nucleotide polymorphism markers in different tropical forage species are required for multiple application targets at a reduced cost.
In this context, this review highlights the current challenges, achievements, availability, and development of genomic resources and statistical methods for the implementation of GS in TFGs. Additionally, the prediction accuracies from recent experiments and the potential to harness diversity from genebanks are discussed. Although GS in TFGs is still incipient, the advances in genomic tools and statistical models will speed up its implementation in the foreseeable future. All TFG breeding programs should be prepared for these changes.
Affiliation(s)
- Rodrigo S. Alves
  - Instituto Nacional de Ciência e Tecnologia do Café, Universidade Federal de Viçosa, Viçosa, Brazil
- Chris S. Jones
  - International Livestock Research Institute, Nairobi, Kenya
|
79
|
de Sousa DR, do Nascimento AV, Lôbo RNB. Prediction of genomic breeding values of milk traits in Brazilian Saanen goats. J Anim Breed Genet 2021; 138:541-551. [PMID: 33861884 DOI: 10.1111/jbg.12550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2020] [Revised: 03/17/2021] [Accepted: 03/22/2021] [Indexed: 11/28/2022]
Abstract
The study's objective was to compare the prediction ability of genomic methods for the traits milk yield, milk composition and somatic cell count of Brazilian Saanen goats. Nine hundred forty goats, genotyped with an Axiom_OviCap (Caprine) panel, a customized Affymetrix array with 62,557 single nucleotide polymorphisms (SNPs), were used for the genomic selection analyses. The genomic methods studied to estimate the effects of SNPs and direct genomic values (DGV) were as follows: (a) genomic BLUP (GBLUP), (b) Bayes Cπ and (c) Bayesian Lasso (BLASSO). Estimated breeding values (EBV) and deregressed estimated breeding values (dEBV) were used as response variables for the genomic predictions. The prediction ability was assessed by Pearson's correlation between DGV and response variables (EBV and dEBV). Regression coefficients of the response variables on the DGV were obtained to verify if the genomic predictions were biased. In addition, the mean square error of prediction (MSE) was used as a measure of verification of model fit to the data. The means of prediction accuracy, when EBV was used as a response variable, were 0.68, 0.68 and 0.67 for GBLUP, Bayes Cπ and BLASSO, respectively. With dEBV, the mean prediction accuracy was 0.50 for all models. The averages of the EBV regression coefficients on DGV were 1.08 for all models (GBLUP, Bayes Cπ and BLASSO), higher than those obtained for the regression coefficient of dEBV on DGV, which presented values of 1.05, 1.05 and 1.08 for GBLUP, Bayes Cπ and BLASSO, respectively. None of the methods stood out in terms of prediction ability; however, the GBLUP method was the most appropriate for estimating the DGV, in a slightly more reliable and less biased way, besides presenting the lowest computational cost. In the context of the present study, EBV was the preferred response variable considering the genomic prediction accuracy, although dEBV presented lower bias.
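The three evaluation measures named in this abstract (Pearson's correlation between DGV and the response, the regression coefficient of the response on DGV, and the MSE) can be illustrated with a minimal sketch. The function name `prediction_metrics` and the toy inputs are hypothetical and not taken from the paper; the MSE here is assumed to be the mean squared difference between response and DGV, which may differ from the authors' exact definition.

```python
from statistics import mean


def prediction_metrics(dgv, response):
    """Return (pearson_r, slope, mse) for a response (e.g. EBV) against DGV.

    Hypothetical helper for illustration only:
    - pearson_r: prediction ability (correlation between DGV and response)
    - slope: regression coefficient of the response on DGV (1.0 = no bias)
    - mse: mean squared difference between response and DGV (assumed definition)
    """
    n = len(dgv)
    if n != len(response) or n < 2:
        raise ValueError("dgv and response must have equal length >= 2")
    mean_x, mean_y = mean(dgv), mean(response)
    sxx = sum((x - mean_x) ** 2 for x in dgv)
    syy = sum((y - mean_y) ** 2 for y in response)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dgv, response))
    pearson_r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    mse = sum((y - x) ** 2 for x, y in zip(dgv, response)) / n
    return pearson_r, slope, mse
```

A slope above 1, such as the 1.05 to 1.08 values reported above, would indicate that the DGV are slightly under-dispersed relative to the response.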
Affiliation(s)
- André Vieira do Nascimento
  - Faculty of Agricultural and Veterinary Sciences of Jaboticabal. Animal Sciences Department I, São Paulo State University "Júlio de Mesquita Filho", Jaboticabal, Brazil
- Raimundo Nonato Braga Lôbo
  - Animal Sciences Department, Federal University of Ceará, Fortaleza, Brazil
  - Brazilian Agricultural Research Corporation - EMBRAPA, Embrapa Caprinos e Ovinos, Estrada Sobral/Groaíras, Sobral, Brazil
  - National Council for Scientific and Technological Development - CNPq, Lago Sul, Brazil
|
80
|
Mota LFM, Pegolo S, Baba T, Peñagaricano F, Morota G, Bittante G, Cecchinato A. Evaluating the performance of machine learning methods and variable selection methods for predicting difficult-to-measure traits in Holstein dairy cattle using milk infrared spectral data. J Dairy Sci 2021; 104:8107-8121. [PMID: 33865589 DOI: 10.3168/jds.2020-19861] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/02/2020] [Accepted: 03/05/2021] [Indexed: 12/11/2022]
Abstract
Fourier-transform infrared (FTIR) spectroscopy is a powerful high-throughput phenotyping tool for predicting traits that are expensive and difficult to measure in dairy cattle. Calibration equations are often developed using standard methods, such as partial least squares (PLS) regression. Methods that employ penalization, rank-reduction, and variable selection, as well as being able to model the nonlinear relations between phenotype and FTIR, might offer improvements in predictive ability and model robustness. This study aimed to compare the predictive ability of 2 machine learning methods, namely random forest (RF) and gradient boosting machine (GBM), and penalized regression against PLS regression for predicting 3 phenotypes differing in terms of biological meaning and relationships with milk composition (i.e., phenotypes measurable directly and not directly in milk, reflecting different biological processes which can be captured using milk spectra) in Holstein-Friesian cattle under 2 cross-validation scenarios. The data set comprised phenotypic information from 471 Holstein-Friesian cows, and 3 target phenotypes were evaluated: (1) body condition score (BCS), (2) blood β-hydroxybutyrate (BHB, mmol/L), and (3) κ-casein expressed as a percentage of nitrogen (κ-CN, % N). The data set was split considering 2 cross-validation scenarios: samples-out random in which the population was randomly split into 10-folds (8-folds for training and 1-fold for validation and testing); and herd/date-out in which the population was randomly assigned to training (70% herd), validation (10%), and testing (20% herd) based on the herd and date in which the samples were collected. The random grid search was performed using the training subset for the hyperparameter optimization and the validation set was used for the generalization of prediction error. The trained model was then used to assess the final prediction in the testing subset. 
The grid search for penalized regression showed that the elastic net (EN) was the best regularization, with an increase in predictive ability of 5%. The performance of PLS (standard model) was compared against 2 machine learning techniques and penalized regression using 2 cross-validation scenarios. Machine learning methods showed a greater predictive ability for BCS (0.63 for GBM and 0.61 for RF), BHB (0.80 for GBM and 0.79 for RF), and κ-CN (0.81 for GBM and 0.80 for RF) in samples-out cross-validation. Considering a herd/date-out cross-validation, these values were 0.58 (GBM and RF) for BCS, 0.73 (GBM and RF) for BHB, and 0.77 (GBM and RF) for κ-CN. The GBM model tended to outperform other methods in predictive ability by around 4%, 1%, and 7% for EN, RF, and PLS, respectively. The prediction accuracies of the GBM and RF models were similar, and differed statistically from the PLS model in samples-out random cross-validation. Although machine learning techniques outperformed PLS in herd/date-out cross-validation, no significant differences were observed in terms of predictive ability due to the large standard deviation observed for predictions. Overall, GBM achieved the highest accuracy of FTIR-based prediction of the different phenotypic traits across the cross-validation scenarios. These results indicate that GBM is a promising method for obtaining more accurate FTIR-based predictions for different phenotypes in dairy cattle.
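The herd/date-out scenario described in this abstract keeps all samples from one herd/date group in the same subset. A minimal sketch of such a group-wise split follows; the function name `herd_out_split`, the group labels, and the rounding of group counts are assumptions made for illustration and are not the authors' implementation.

```python
import random


def herd_out_split(herds, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole groups (e.g. herd/date labels) to train/validation/test.

    `herds` holds one group label per sample. Returns a dict mapping each
    subset name to the list of sample indices it contains. Hypothetical
    sketch: fractions apply to the number of groups, not samples.
    """
    unique = sorted(set(herds))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    assignment = {}
    for position, herd in enumerate(unique):
        if position < n_train:
            assignment[herd] = "train"
        elif position < n_train + n_val:
            assignment[herd] = "validation"
        else:
            assignment[herd] = "test"
    split = {"train": [], "validation": [], "test": []}
    for index, herd in enumerate(herds):
        split[assignment[herd]].append(index)
    return split
```

Because no group is shared between subsets, predictive ability measured on the test subset reflects performance on herds and dates not seen during training, which is consistent with the lower values reported for this scenario than for the samples-out random split.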
Affiliation(s)
- Lucio F M Mota
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
- Sara Pegolo
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
- Toshimi Baba
  - Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg 24061
- Gota Morota
  - Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg 24061
- Giovanni Bittante
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
- Alessio Cecchinato
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
|
81
|
Roudbar MA, Mousavi SF, Ardestani SS, Lopes FB, Momen M, Gianola D, Khatib H. Prediction of biological age and evaluation of genome-wide dynamic methylomic changes throughout human aging. G3-GENES GENOMES GENETICS 2021; 11:6214518. [PMID: 33826720 PMCID: PMC8495934 DOI: 10.1093/g3journal/jkab112] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/22/2021] [Accepted: 03/29/2021] [Indexed: 11/14/2022]
Abstract
The use of DNA methylation signatures to predict chronological age and aging rate is of interest in many fields, including disease prevention and treatment, forensics, and anti-aging medicine. Although a large number of methylation markers are significantly associated with age, most age-prediction methods use a few markers selected based on either previously published studies or datasets containing methylation information. Here, we implemented reproducing kernel Hilbert spaces (RKHS) regression and a ridge regression model in a Bayesian framework that utilized phenotypic and methylation profiles simultaneously to predict chronological age. We used over 450,000 CpG sites from the whole blood of a large cohort of 4,409 human individuals with a range of 10-101 years of age. Models were fitted using adjusted and un-adjusted methylation measurements for cell heterogeneity. Un-adjusted methylation scores delivered a significantly higher prediction accuracy than adjusted methylation data, with a correlation between age and predicted age of 0.98 and a root-mean-square error (RMSE) of 3.54 years in un-adjusted data, and 0.90 (correlation) and 7.16 (RMSE) years in adjusted data. Reducing the number of predictors (CpG sites) through subset selection improved predictive power with a correlation of 0.98 and an RMSE of 2.98 years in the RKHS model. We found distinct global methylation patterns, with a significant increase in the proportion of methylated cytosines in CpG islands and a decreased proportion in other CpG types, including CpG shore, shelf, and open sea (p < 5e-06). Epigenetic drift seemed to be a widespread phenomenon as more than 97% of the age-associated methylation sites had heteroscedasticity. Apparent methylomic aging rate (AMAR) had a sex-specific pattern, with an increase in AMAR in females with age related to males.
Affiliation(s)
- Mahmoud Amiri Roudbar
  - Department of Animal Science, Safiabad-Dezful Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education & Extension Organization (AREEO), Dezful, Iran
- Seyedeh Fatemeh Mousavi
  - Department of Animal Science, Faculty of Agriculture Engineering, University of Kurdistan, Sanandaj, Iran
- Siavash Salek Ardestani
  - Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS B2N 5E3, Canada
- Fernando Brito Lopes
  - Department of Animal Sciences, Sao Paulo State University, Julio de Mesquita Filho (UNESP), Prof. Paulo Donato Castelane, Jaboticabal, SP, 14884-900, Brazil
- Mehdi Momen
  - Department of Surgical Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Daniel Gianola
  - Department of Animal and Dairy Sciences, University of Wisconsin-Madison, 53706, Madison, WI, USA
- Hasan Khatib
  - Department of Animal and Dairy Sciences, University of Wisconsin-Madison, 53706, Madison, WI, USA
|
82
|
Hai Y, Wen Y. A Bayesian linear mixed model for prediction of complex traits. Bioinformatics 2021; 36:5415-5423. [PMID: 33331865 PMCID: PMC8016495 DOI: 10.1093/bioinformatics/btaa1023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/26/2020] [Revised: 11/24/2020] [Accepted: 11/27/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Accurate disease risk prediction is essential for precision medicine. Existing models either assume that diseases are caused by groups of predictors with small-to-moderate effects or a few isolated predictors with large effects. Their performance can be sensitive to the underlying disease mechanisms, which are usually unknown in advance. RESULTS We developed a Bayesian linear mixed model (BLMM), where genetic effects were modelled using a hybrid of the sparsity regression and linear mixed model with multiple random effects. The parameters in BLMM were inferred through a computationally efficient variational Bayes algorithm. The proposed method can resemble the shape of the true effect size distributions, captures the predictive effects from both common and rare variants, and is robust against various disease models. Through extensive simulations and the application to a whole-genome sequencing dataset obtained from the Alzheimer's Disease Neuroimaging Initiatives, we have demonstrated that BLMM has better prediction performance than existing methods and can detect variables and/or genetic regions that are predictive. AVAILABILITY AND IMPLEMENTATION The R-package is available at https://github.com/yhai943/BLMM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yang Hai
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
- Yalu Wen
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
|
83
|
Campbell MT, Hu H, Yeats TH, Brzozowski LJ, Caffe-Treml M, Gutiérrez L, Smith KP, Sorrells ME, Gore MA, Jannink JL. Improving Genomic Prediction for Seed Quality Traits in Oat (Avena sativa L.) Using Trait-Specific Relationship Matrices. Front Genet 2021; 12:643733. [PMID: 33868378 PMCID: PMC8044359 DOI: 10.3389/fgene.2021.643733] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/18/2020] [Accepted: 03/04/2021] [Indexed: 11/13/2022] Open
Abstract
The observable phenotype is the manifestation of information that is passed along different organization levels (transcriptional, translational, and metabolic) of a biological system. The widespread use of various omic technologies (RNA-sequencing, metabolomics, etc.) has provided plant geneticists and breeders with a wealth of information on pertinent intermediate molecular processes that may help explain variation in conventional traits such as yield, seed quality, and fitness, among others. A major challenge is effectively using these data to help predict the genetic merit of new, unobserved individuals for conventional agronomic traits. Trait-specific genomic relationship matrices (TGRMs) model the relationships between individuals using genome-wide markers (SNPs) and place greater emphasis on markers that are most relevant to the trait compared to conventional genomic relationship matrices. Given that these approaches define relationships based on putative causal loci, it is expected that these approaches should improve predictions for related traits. In this study we evaluated the use of TGRMs to accommodate information on intermediate molecular phenotypes (referred to as endophenotypes) and to predict an agronomic trait, total lipid content, in oat seed. Nine fatty acids were quantified in a panel of 336 oat lines. Marker effects were estimated for each endophenotype, and were used to construct TGRMs. A multikernel TGRM model (MK-TGRM-BLUP) was used to predict total seed lipid content in an independent panel of 210 oat lines. The MK-TGRM-BLUP approach significantly improved predictions for total lipid content when compared to a conventional genomic BLUP (gBLUP) approach.
Given that the MK-TGRM-BLUP approach leverages information on the nine fatty acids to predict genetic values for total lipid content in unobserved individuals, we compared the MK-TGRM-BLUP approach to a multi-trait gBLUP (MT-gBLUP) approach that jointly fits phenotypes for fatty acids and total lipid content. The MK-TGRM-BLUP approach significantly outperformed MT-gBLUP. Collectively, these results highlight the utility of using TGRM to accommodate information on endophenotypes and improve genomic prediction for a conventional agronomic trait.
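As a rough illustration of what "placing greater emphasis on relevant markers" can mean, the sketch below builds a weighted relationship matrix of the form G = Z D Z' / sum(d), where Z is the column-centered genotype matrix and D is a diagonal matrix of marker weights. This particular weighting scheme, the function name `trait_specific_grm`, and the toy data are assumptions made for illustration; the abstract does not state the exact formula the authors used.

```python
def trait_specific_grm(genotypes, weights):
    """Weighted genomic relationship matrix G = Z D Z' / sum(weights).

    genotypes: list of rows (individuals) of allele counts per marker.
    weights:   one non-negative weight per marker (for example, squared
               marker effects estimated for an endophenotype; this choice
               is an assumption, not taken from the paper).
    """
    n = len(genotypes)
    m = len(weights)
    # Center each marker column so relationships are relative to the panel mean.
    col_means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    z = [[row[j] - col_means[j] for j in range(m)] for row in genotypes]
    total = sum(weights)
    # Each entry is a weighted cross-product of two individuals' centered genotypes.
    return [
        [sum(z[i][k] * weights[k] * z[j][k] for k in range(m)) / total
         for j in range(n)]
        for i in range(n)
    ]


# Toy example: two individuals, two markers, all weight on the first marker.
G = trait_specific_grm([[0, 2], [2, 0]], [1.0, 0.0])
print(G)
```

With equal weights this reduces to an unweighted cross-product matrix, so the weights are the only place where trait-specific information enters.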
Affiliation(s)
- Malachy T. Campbell
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Haixiao Hu
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Trevor H. Yeats
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Lauren J. Brzozowski
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Melanie Caffe-Treml
- Seed Technology Lab 113, Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD, United States
- Lucía Gutiérrez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI, United States
- Kevin P. Smith
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN, United States
- Mark E. Sorrells
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Michael A. Gore
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Jean-Luc Jannink
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- R.W. Holley Center for Agriculture & Health, US Department of Agriculture, Agricultural Research Service, Ithaca, NY, United States
84
Tehseen MM, Kehel Z, Sansaloni CP, Lopes MDS, Amri A, Kurtulus E, Nazari K. Comparison of Genomic Prediction Methods for Yellow, Stem, and Leaf Rust Resistance in Wheat Landraces from Afghanistan. PLANTS 2021; 10:plants10030558. [PMID: 33809650 PMCID: PMC8001917 DOI: 10.3390/plants10030558] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/25/2021] [Revised: 02/28/2021] [Accepted: 03/13/2021] [Indexed: 11/16/2022]
Abstract
Wheat rust diseases, including yellow rust (Yr; also known as stripe rust) caused by Puccinia striiformis Westend. f. sp. tritici, leaf rust (Lr) caused by Puccinia triticina Eriks., and stem rust (Sr) caused by Puccinia graminis Pers. f. sp. tritici, are major threats to wheat production all around the globe. Durable resistance to wheat rust diseases can be achieved through genomic-assisted prediction of resistant accessions to increase genetic gain per unit time. Genomic prediction (GP) is a promising technology that uses genomic markers to estimate genomic estimated breeding values (GEBVs) for selecting resistant plant genotypes and accumulating favorable alleles for adult plant resistance (APR) to wheat rust diseases. To evaluate GP, we compared the predictive ability of nine different parametric, semi-parametric and Bayesian models, including Genomic Best Linear Unbiased Prediction (GBLUP), Ridge Regression (RR), Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net (EN), Bayesian Ridge Regression (BRR), Bayesian A (BA), Bayesian B (BB), Bayesian C (BC) and the Reproducing Kernel Hilbert Space (RKHS) model, to estimate GEBVs for APR to yellow, leaf and stem rust of wheat in a panel of 363 bread wheat landraces of Afghanistan origin. Based on five-fold cross validation, the mean predictive abilities were 0.33, 0.30, 0.38, and 0.33 for Yr (2016), Yr (2017), Lr, and Sr, respectively. No single model outperformed the rest of the models for all traits. LASSO and EN showed the lowest predictive ability in four of the five traits. GBLUP and RR gave similar predictive abilities, and the Bayesian models likewise did not differ significantly from each other. We also investigated the effect of the number of genotypes and the markers used in the analysis on the predictive ability of the GP model. The predictive ability was highest with 1000 markers, and there was a linear trend between the predictive ability and the size of the training population. The results of the study are encouraging, confirming the feasibility of GP to be effectively applied in breeding programs for resistance to all three wheat rust diseases.
Affiliation(s)
- Zakaria Kehel
- International Center for Agricultural Research in the Dry Areas (ICARDA), ICARDA-PreBreeding & Genebank Operations, Biodiversity and Crop Improvement Program, P.O. Box 10000 Rabat, Morocco
- Carolina P. Sansaloni
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, Texcoco C.P. 56237, Mexico
- Marta da Silva Lopes
- Sustainable Field Crops Programme, IRTA (Institute for Food and Agricultural Research and Technology), 25198 Lleida, Spain
- Ahmed Amri
- International Center for Agricultural Research in the Dry Areas (ICARDA), ICARDA-PreBreeding & Genebank Operations, Biodiversity and Crop Improvement Program, P.O. Box 10000 Rabat, Morocco
- Ezgi Kurtulus
- International Center for Agricultural Research in the Dry Areas (ICARDA), Biodiversity and Crop Improvement Program, Regional Cereal Rust Research Center (RCRRC), P.O. Box 35661 Menemen, Izmir, Turkey
- Kumarse Nazari
- International Center for Agricultural Research in the Dry Areas (ICARDA), Biodiversity and Crop Improvement Program, Regional Cereal Rust Research Center (RCRRC), P.O. Box 35661 Menemen, Izmir, Turkey
85
Pérez-Enciso M, Steibel JP. Phenomes: the current frontier in animal breeding. Genet Sel Evol 2021; 53:22. [PMID: 33673800 PMCID: PMC7934239 DOI: 10.1186/s12711-021-00618-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/08/2020] [Accepted: 02/22/2021] [Indexed: 12/13/2022] Open
Abstract
Improvements in genomic technologies have outpaced the most optimistic predictions, allowing industry-scale application of genomic selection. However, only marginal gains in genetic prediction accuracy can now be expected by increasing marker density up to sequence, unless causative mutations are identified. We argue that some of the most scientifically disruptive and industry-relevant challenges relate to ‘phenomics’ instead of ‘genomics’. Thanks to developments in sensor technology and artificial intelligence, there is a wide range of analytical tools that are already available and many more will be developed. We can now address some of the pressing societal demands on the industry, such as animal welfare concerns or efficiency in the use of resources. From the statistical and computational point of view, phenomics raises two important issues that require further work: penalization and dimension reduction. This will be complicated by the inherent heterogeneity and ‘missingness’ of the data. Overall, we can expect that precision livestock technologies will make it possible to collect hundreds of traits on a continuous basis from large numbers of animals. Perhaps the main revolution will come from redesigning animal breeding schemes to explicitly allow for high-dimensional phenomics. In the meantime, phenomics data will certainly improve our knowledge of the biological basis of phenotypes.
Affiliation(s)
- Miguel Pérez-Enciso
- ICREA, Passeig de Lluís Companys 23, 08010, Barcelona, Spain; Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Bellaterra, 08193, Barcelona, Spain
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, 48824, USA
86
Tibbs Cortes L, Zhang Z, Yu J. Status and prospects of genome-wide association studies in plants. THE PLANT GENOME 2021; 14:e20077. [PMID: 33442955 DOI: 10.1002/tpg2.20077] [Citation(s) in RCA: 127] [Impact Index Per Article: 42.3] [Received: 08/05/2020] [Accepted: 11/18/2020] [Indexed: 05/22/2023]
Abstract
Genome-wide association studies (GWAS) have developed into a powerful and ubiquitous tool for the investigation of complex traits. In large part, this was fueled by advances in genomic technology, enabling us to examine genome-wide genetic variants across diverse genetic materials. The development of the mixed model framework for GWAS dramatically reduced the number of false positives compared with naïve methods. Building on this foundation, many methods have since been developed to increase computational speed or improve statistical power in GWAS. These methods have allowed the detection of genomic variants associated with either traditional agronomic phenotypes or biochemical and molecular phenotypes. In turn, these associations enable applications in gene cloning and in accelerated crop breeding through marker assisted selection or genetic engineering. Current topics of investigation include rare-variant analysis, synthetic associations, optimizing the choice of GWAS model, and utilizing GWAS results to advance knowledge of biological processes. Ongoing research in these areas will facilitate further advances in GWAS methods and their applications.
Affiliation(s)
- Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
- Jianming Yu
- Department of Agronomy, Iowa State University, Ames, IA, 50010, USA
87
88
Han J, Gondro C, Reid K, Steibel JP. Heuristic hyperparameter optimization of deep learning models for genomic prediction. G3-GENES GENOMES GENETICS 2021; 11:6129776. [PMID: 33993261 PMCID: PMC8495939 DOI: 10.1093/g3journal/jkab032] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/24/2020] [Accepted: 01/23/2021] [Indexed: 11/17/2022]
Abstract
There is a growing interest among quantitative geneticists and animal breeders in the use of deep learning (DL) for genomic prediction. However, the performance of DL is affected by hyperparameters that are typically manually set by users. These hyperparameters do not simply specify the architecture of the model; they are also critical for the efficacy of the optimization and model-fitting process. To date, most DL approaches used for genomic prediction have concentrated on identifying suitable hyperparameters by exploring discrete options from a subset of the hyperparameter space. Enlarging the hyperparameter optimization search space with continuous hyperparameters is a daunting combinatorial problem. To deal with this problem, we propose using differential evolution (DE) to perform an efficient search of arbitrarily complex hyperparameter spaces in DL models, and we apply this to the specific case of genomic prediction of livestock phenotypes. This approach was evaluated on two pig and cattle datasets with real genotypes and simulated phenotypes (N = 7,539 animals and M = 48,541 markers) and one real dataset (N = 910 individuals and M = 28,916 markers). Hyperparameters were evaluated using cross-validation. We compared the predictive performance of DL models using hyperparameters optimized by DE against DL models with “best practice” hyperparameters selected from published studies and baseline DL models with randomly specified hyperparameters. Optimized models using DE showed a clear improvement in predictive performance across all three datasets. DE optimized hyperparameters also resulted in DL models with less overfitting and less variation in predictive performance over repeated retraining compared to non-optimized DL models.
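The idea of searching a continuous hyperparameter space with differential evolution can be sketched in a few lines. The code below is a generic DE/rand/1/bin loop, not the authors' implementation; the objective `toy_validation_loss` is a hypothetical stand-in for a cross-validated loss, and the two hyperparameters (log10 learning rate and dropout rate), their bounds, and all default settings are assumptions chosen only for illustration.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimize `objective` over box `bounds` with a basic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Start from a uniformly random population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_validation_loss(params):
    """Hypothetical stand-in for a cross-validated loss of a DL model."""
    log10_lr, dropout = params
    return (log10_lr + 3.0) ** 2 + (dropout - 0.2) ** 2


best_params, best_loss = differential_evolution(
    toy_validation_loss, [(-5.0, -1.0), (0.0, 0.5)]
)
print(best_params, best_loss)
```

In a real application each call to the objective would train and cross-validate a model, so the population size and number of generations directly set the computational budget.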
Affiliation(s)
- Junjie Han
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA; Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Cedric Gondro
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Kenneth Reid
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824, USA
89
Farooq M, van Dijk ADJ, Nijveen H, Aarts MGM, Kruijer W, Nguyen TP, Mansoor S, de Ridder D. Prior Biological Knowledge Improves Genomic Prediction of Growth-Related Traits in Arabidopsis thaliana. Front Genet 2021; 11:609117. [PMID: 33552126 PMCID: PMC7855462 DOI: 10.3389/fgene.2020.609117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/22/2020] [Accepted: 12/21/2020] [Indexed: 01/11/2023] Open
Abstract
Prediction of growth-related complex traits is highly important for crop breeding. Photosynthesis efficiency and biomass are direct indicators of overall plant performance and therefore even minor improvements in these traits can result in significant breeding gains. Crop breeding for complex traits has been revolutionized by technological developments in genomics and phenomics. Capitalizing on the growing availability of genomics data, genome-wide marker-based prediction models allow for efficient selection of the best parents for the next generation without the need for phenotypic information. Until now such models mostly predict the phenotype directly from the genotype and fail to make use of relevant biological knowledge. It is an open question to what extent the use of such biological knowledge is beneficial for improving genomic prediction accuracy and reliability. In this study, we explored the use of publicly available biological information for genomic prediction of photosynthetic light use efficiency (Φ PSII ) and projected leaf area (PLA) in Arabidopsis thaliana. To explore the use of various types of knowledge, we mapped genomic polymorphisms to Gene Ontology (GO) terms and transcriptomics-based gene clusters, and applied these in a Genomic Feature Best Linear Unbiased Predictor (GFBLUP) model, which is an extension to the traditional Genomic BLUP (GBLUP) benchmark. Our results suggest that incorporation of prior biological knowledge can improve genomic prediction accuracy for both Φ PSII and PLA. The improvement achieved depends on the trait, type of knowledge and trait heritability. Moreover, transcriptomics offers complementary evidence to the Gene Ontology for improvement when used to define functional groups of genes. In conclusion, prior knowledge about trait-specific groups of genes can be directly translated into improved genomic prediction.
Affiliation(s)
- Muhammad Farooq
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Punjab, Pakistan
- Aalt D. J. van Dijk
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Biometris, Wageningen University, Wageningen, Netherlands
- Harm Nijveen
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Mark G. M. Aarts
- Laboratory of Genetics, Wageningen University, Wageningen, Netherlands
- Willem Kruijer
- Biometris, Wageningen University, Wageningen, Netherlands
- Thu-Phuong Nguyen
- Laboratory of Genetics, Wageningen University, Wageningen, Netherlands
- Shahid Mansoor
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Punjab, Pakistan
- Dick de Ridder
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
90
Montesinos-López OA, Montesinos-López A, Pérez-Rodríguez P, Barrón-López JA, Martini JWR, Fajardo-Flores SB, Gaytan-Lugo LS, Santana-Mancilla PC, Crossa J. A review of deep learning applications for genomic selection. BMC Genomics 2021; 22:19. [PMID: 33407114 PMCID: PMC7789712 DOI: 10.1186/s12864-020-07319-x] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 07/18/2020] [Accepted: 12/10/2020] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Several conventional genomic Bayesian (or non-Bayesian) prediction methods have been proposed, including the standard additive genetic effect model for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. DL methods are nonparametric models that provide the flexibility to adapt to complicated associations between data and output and to very complex patterns. MAIN BODY We review the applications of deep learning (DL) methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods, including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared to traditional genomic prediction approaches as well as the current trends in DL applications. CONCLUSIONS The main requirement for using DL is high-quality and sufficiently large training data. Based on the current literature on GS in plant and animal breeding, we did not find clear superiority of DL in terms of prediction power compared to conventional genome-based prediction models. Nevertheless, there is clear evidence that DL algorithms capture nonlinear patterns more efficiently than conventional genome-based models. Deep learning algorithms are able to integrate data from different sources, as is usually needed in GS-assisted breeding, and they show the ability to improve prediction accuracy for large plant breeding data. It is important to apply DL to large training-testing data sets.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, Mexico.
- José Alberto Barrón-López
- Department of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina s/n La Molina, 15024, Lima, Peru
- Johannes W R Martini
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, Mexico
- Laura S Gaytan-Lugo
- School of Mechanical and Electrical Engineering, Universidad de Colima, 28040, Colima, Colima, Mexico
- José Crossa
- Colegio de Postgraduados, CP 56230, Montecillos, Edo. de México, Mexico.
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, Mexico.
91
Dissecting the Genetic Architecture of Biofuel-Related Traits in a Sorghum Breeding Population. G3-GENES GENOMES GENETICS 2020; 10:4565-4577. [PMID: 33051261 PMCID: PMC7718745 DOI: 10.1534/g3.120.401582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
In sorghum [Sorghum bicolor (L.) Moench], hybrid cultivars for the biofuel industry are desired. Along with selection based on testcross performance, evaluation of the breeding population per se is also important for the success of hybrid breeding. In addition to additive genetic effects, non-additive (i.e., dominance and epistatic) effects are expected to contribute to the performance of early generations. Unfortunately, studies on early generations in sorghum breeding programs are limited. In this study, we analyzed a breeding population for bioenergy sorghum, which was previously developed based on testcross performance, to compare genomic selection models both trained on and evaluated for the per se performance of the 3rd generation S0 individuals. Of over 200 ancestral inbred accessions in the base population, only 13 founders contributed to the 3rd generation as progenitors. Compared to the founders, the performances of the population per se were improved for target traits. The total genetic variance within the S0 generation progenies themselves for all traits was mainly additive, although non-additive variances contributed to each trait to some extent. For genomic selection, linear regression models explicitly considering all genetic components showed a higher predictive ability than other linear and non-linear models. Although the number and effect distribution of underlying loci was different among the traits, the influence of priors for marker effects was relatively small. These results indicate the importance of considering non-additive effects for dissecting the genetic architecture of early breeding generations and predicting the performance per se.
92
A Multiple-Trait Bayesian Variable Selection Regression Method for Integrating Phenotypic Causal Networks in Genome-Wide Association Studies. G3-GENES GENOMES GENETICS 2020; 10:4439-4448. [PMID: 33020191 PMCID: PMC7718731 DOI: 10.1534/g3.120.401618] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
Bayesian regression methods that incorporate different mixture priors for marker effects are used in multi-trait genomic prediction. These methods can also be extended to genome-wide association studies (GWAS). In multiple-trait GWAS, incorporating the underlying causal structures among traits is essential for comprehensively understanding the relationship between genotypes and traits of interest. Therefore, we develop a GWAS methodology, SEM-Bayesian alphabet, which, by applying the structural equation model (SEM), can be used to incorporate causal structures into multi-trait Bayesian regression methods. SEM-Bayesian alphabet provides a more comprehensive understanding of the genotype-phenotype mapping than multi-trait GWAS by performing GWAS based on indirect, direct and overall marker effects. The superior performance of SEM-Bayesian alphabet was demonstrated by comparing its GWAS results with other similar multi-trait GWAS methods on real and simulated data. The software tool JWAS offers open-source routines to perform these analyses.
93
Maldonado C, Mora-Poblete F, Contreras-Soto RI, Ahmar S, Chen JT, do Amaral Júnior AT, Scapim CA. Genome-Wide Prediction of Complex Traits in Two Outcrossing Plant Species Through Deep Learning and Bayesian Regularized Neural Network. FRONTIERS IN PLANT SCIENCE 2020; 11:593897. [PMID: 33329658 PMCID: PMC7728740 DOI: 10.3389/fpls.2020.593897] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/11/2020] [Accepted: 10/27/2020] [Indexed: 05/25/2023]
Abstract
Genomic selection models were investigated to predict several complex traits in breeding populations of Zea mays L. and Eucalyptus globulus Labill. For this, the following Machine Learning (ML) methods were implemented: (i) Deep Learning (DL) and (ii) Bayesian Regularized Neural Network (BRNN), both in combination with different hyperparameters. These ML methods were also compared with Genomic Best Linear Unbiased Prediction (GBLUP) and different Bayesian regression models [Bayes A, Bayes B, Bayes Cπ, Bayesian Ridge Regression, Bayesian LASSO, and Reproducing Kernel Hilbert Space (RKHS)]. DL models using Rectified Linear Units as the activation function had higher predictive ability values, which varied from 0.27 (pilodyn penetration of 6-year-old eucalypt trees) to 0.78 (flowering-related traits of maize). Moreover, the larger mini-batch size (100%) had a significantly higher predictive ability for wood-related traits than the smaller mini-batch size (10%). On the other hand, in the BRNN method, the architectures of one and two layers that used only the pureline function showed better prediction results, with values ranging from 0.21 (pilodyn penetration) to 0.71 (flowering traits). A significant increase in predictive ability was observed for DL in comparison with other methods of genomic prediction (Bayesian alphabet models, GBLUP, RKHS, and BRNN). Another important finding was the usefulness of DL models (through an iterative algorithm) as an SNP detection strategy for genome-wide association studies. The results of this study confirm the importance of DL for genome-wide analyses and crop/tree improvement strategies, which holds promise for accelerating breeding progress.
Affiliation(s)
- Carlos Maldonado
- Instituto de Ciencias Agroalimentarias, Animales y Ambientales, Universidad de O’Higgins, San Fernando, Chile
- Rodrigo Iván Contreras-Soto
- Instituto de Ciencias Agroalimentarias, Animales y Ambientales, Universidad de O’Higgins, San Fernando, Chile
- Sunny Ahmar
- Institute of Biological Sciences, University of Talca, Talca, Chile
- College of Plant Sciences and Technology, Huazhong Agricultural University, Wuhan, China
- Jen-Tsung Chen
- Department of Life Sciences, National University of Kaohsiung, Kaohsiung, Taiwan
- Antônio Teixeira do Amaral Júnior
- Laboratório de Melhoramento Genético Vegetal, Universidade Estadual do Norte Fluminense Darcy Ribeiro/CCTA, Campos dos Goytacazes, Brazil
94
Alves AAC, Espigolan R, Bresolin T, Costa RM, Fernandes Júnior GA, Ventura RV, Carvalheiro R, Albuquerque LG. Genome-enabled prediction of reproductive traits in Nellore cattle using parametric models and machine learning methods. Anim Genet 2020; 52:32-46. [PMID: 33191532 DOI: 10.1111/age.13021] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Accepted: 10/13/2020] [Indexed: 12/31/2022]
Abstract
This study aimed to assess the predictive ability of different machine learning (ML) methods for genomic prediction of reproductive traits in Nellore cattle. The studied traits were age at first calving (AFC), scrotal circumference (SC), early pregnancy (EP) and stayability (STAY). The numbers of genotyped animals and SNP markers available were 2342 and 321 419 (AFC), 4671 and 309 486 (SC), 2681 and 319 619 (STAY) and 3356 and 319 108 (EP). The predictive abilities of support vector regression (SVR), Bayesian regularized artificial neural network (BRANN) and random forest (RF) were compared with results obtained using parametric models (genomic best linear unbiased predictor, GBLUP, and Bayesian least absolute shrinkage and selection operator, BLASSO). A 5-fold cross-validation strategy was performed and the average prediction accuracy (ACC) and mean squared errors (MSE) were computed. The ACC was defined as the linear correlation between predicted and observed breeding values for categorical traits (EP and STAY) and as the correlation between predicted and observed adjusted phenotypes divided by the square root of the estimated heritability for continuous traits (AFC and SC). The average ACC varied from low to moderate depending on the trait and model under consideration, ranging between 0.56 and 0.63 (AFC), 0.27 and 0.36 (SC), 0.57 and 0.67 (EP), and 0.52 and 0.62 (STAY). SVR provided slightly better accuracies than the parametric models for all traits, increasing the prediction accuracy for AFC by around 6.3 and 4.8% compared with GBLUP and BLASSO, respectively. Likewise, there was an increase of 8.3% for SC, 4.5% for EP and 4.8% for STAY, comparing SVR with both GBLUP and BLASSO. In contrast, RF and BRANN did not present competitive predictive ability compared with the parametric models. The results indicate that SVR is a suitable method for genome-enabled prediction of reproductive traits in Nellore cattle. Further, the optimal kernel bandwidth parameter in the SVR model was trait-dependent; thus, fine-tuning this hyperparameter in the training phase is crucial.
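The accuracy definition for continuous traits given in this abstract (correlation between predicted values and adjusted phenotypes, divided by the square root of the heritability) is easy to make concrete. The sketch below is a minimal illustration with made-up numbers; the function names `pearson` and `accuracy_continuous` and the example values are assumptions, not taken from the study.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def accuracy_continuous(predicted, adjusted_phenotype, h2):
    """ACC for a continuous trait: cor(predicted, adjusted phenotype) / sqrt(h2)."""
    return pearson(predicted, adjusted_phenotype) / math.sqrt(h2)


# Made-up validation fold: correlation is 0.5, heritability is 0.64.
acc = accuracy_continuous([1, 2, 3], [1, 3, 2], h2=0.64)
print(acc)
```

Dividing by the square root of the heritability rescales the correlation with phenotypes toward a correlation with the underlying genetic values, which is why this form is used when true breeding values are not observed.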
Affiliation(s)
- A A C Alves
- Department of Animal Science, School of Agricultural and Veterinary Sciences, Sao Paulo State University (UNESP), Jaboticabal, 14884-900, Brazil
- R Espigolan
- Department of Animal Science, School of Agricultural and Veterinary Sciences, Sao Paulo State University (UNESP), Jaboticabal, 14884-900, Brazil
- T Bresolin
- Department of Animal Science, School of Agricultural and Veterinary Sciences, Sao Paulo State University (UNESP), Jaboticabal, 14884-900, Brazil
- R M Costa
- Department of Exact Sciences, School of Agricultural and Veterinary Sciences, Sao Paulo State University (UNESP), Jaboticabal, 14884-900, Brazil
- G A Fernandes Júnior
- Department of Animal Science, School of Agricultural and Veterinary Sciences, Sao Paulo State University (UNESP), Jaboticabal, 14884-900, Brazil
- R V Ventura
- Department of Animal Nutrition and Production, School of Veterinary Medicine and Animal Science, University of Sao Paulo (USP), Pirassununga, 13635-900, Brazil
- R Carvalheiro
- Department of Animal Science, School of Agricultural and Veterinary Sciences, Sao Paulo State University (UNESP), Jaboticabal, 14884-900, Brazil; National Council of Technological and Scientific Development (CNPq), Brasília, 71605-001, Brazil
- L G Albuquerque
- Department of Animal Science, School of Agricultural and Veterinary Sciences, Sao Paulo State University (UNESP), Jaboticabal, 14884-900, Brazil; National Council of Technological and Scientific Development (CNPq), Brasília, 71605-001, Brazil
95
Comparison of long-term effects of genomic selection index and genomic selection using different Bayesian methods. Livest Sci 2020. [DOI: 10.1016/j.livsci.2020.104207] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
96
Pincot DDA, Hardigan MA, Cole GS, Famula RA, Henry PM, Gordon TR, Knapp SJ. Accuracy of genomic selection and long-term genetic gain for resistance to Verticillium wilt in strawberry. THE PLANT GENOME 2020; 13:e20054. [PMID: 33217217 DOI: 10.1002/tpg2.20054] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/30/2019] [Revised: 07/03/2020] [Accepted: 07/21/2020] [Indexed: 05/17/2023]
Abstract
Verticillium wilt, a soil-borne disease caused by the fungal pathogen Verticillium dahliae, threatens strawberry (Fragaria × ananassa) production worldwide. The development of resistant cultivars has been a persistent challenge, in part because the genetics of resistance is complex. The heritability of resistance and genetic gains in breeding for resistance to this pathogen have not been well documented. To elucidate the genetics, assess long-term genetic gains, and estimate the accuracy of genomic selection for resistance to Verticillium wilt, we analyzed a genetically diverse population of elite and exotic germplasm accessions (n = 984), including 245 cultivars developed since 1854. We observed a full range of phenotypes, from highly susceptible to highly resistant: < 3% were classified as highly resistant, whereas > 50% were classified as moderately to highly susceptible. Broad-sense heritability estimates ranged from 0.70-0.76, whereas narrow-sense genomic heritability estimates ranged from 0.33-0.45. We found that genetic gains in breeding for resistance to Verticillium wilt have been negative over the last 165 years (mean resistance has decreased over time). We identified several highly resistant accessions that might harbor favorable alleles that are either rare or non-existent in modern populations. We did not observe the segregation of large-effect loci. The accuracy of genomic predictions ranged from 0.38-0.53 among years and whole-genome regression methods. We show that genomic selection has promise for increasing genetic gains and accelerating the development of resistant cultivars in strawberry by shortening selection cycles and enabling selection in early developmental stages without phenotyping.
Affiliation(s)
- Dominique D A Pincot
  - Department of Plant Sciences, University of California, One Shields Avenue, Davis, CA, 95616, USA
- Michael A Hardigan
  - Department of Plant Sciences, University of California, One Shields Avenue, Davis, CA, 95616, USA
- Glenn S Cole
  - Department of Plant Sciences, University of California, One Shields Avenue, Davis, CA, 95616, USA
- Randi A Famula
  - Department of Plant Sciences, University of California, One Shields Avenue, Davis, CA, 95616, USA
- Peter M Henry
  - United States Department of Agriculture, 1636 E. Alisal Street, Salinas, CA, 93905, USA
- Thomas R Gordon
  - Department of Plant Pathology, University of California, One Shields Avenue, Davis, CA, 95616, USA
- Steven J Knapp
  - Department of Plant Sciences, University of California, One Shields Avenue, Davis, CA, 95616, USA
|
97
|
Ren W, Liang Z, He S, Xiao J. Hybrid of Restricted and Penalized Maximum Likelihood Method for Efficient Genome-Wide Association Study. Genes (Basel) 2020; 11(11):1286. [PMID: 33138126 PMCID: PMC7692801 DOI: 10.3390/genes11111286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/08/2020] [Revised: 10/26/2020] [Accepted: 10/27/2020] [Indexed: 11/16/2022] Open
Abstract
In genome-wide association studies, linear mixed models (LMMs) have been widely used to explore the molecular mechanisms of complex traits. However, typical association approaches suffer from several important drawbacks: estimating variance components in LMMs for large numbers of individuals is computationally slow, and a single-locus model handles complex confounding poorly and loses statistical power. To address these issues, we propose an efficient two-stage method based on a hybrid of restricted and penalized maximum likelihood, named HRePML. First, we performed restricted maximum likelihood (REML) on a single-locus LMM to remove unrelated markers, where spectral decomposition of the covariance matrix was used to estimate variance components quickly. Second, we carried out penalized maximum likelihood (PML) on a multi-locus LMM for markers with reasonably large effects. To validate the effectiveness of HRePML, we conducted a series of simulation studies and real data analyses. Our method always had the highest average statistical power compared with the multi-locus mixed model (MLMM), fixed and random model circulating probability unification (FarmCPU), and genome-wide efficient mixed model association (GEMMA). More importantly, HRePML can provide more accurate estimates of marker effects. HRePML also identified 41 previously reported genes associated with development traits in Arabidopsis, more than were detected by the other methods.
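The spectral-decomposition step mentioned in the abstract can be illustrated with a generic sketch. This is not the HRePML code: it assumes the standard model y = Xβ + g + e with Var(g) = σ_g² K and Var(e) = σ_e² I, and all function names, the grid of candidate ratios, and the grid search itself are hypothetical choices made for illustration. The kinship matrix K is eigendecomposed once; after rotating y and X into its eigenbasis the covariance is diagonal, so each evaluation of the restricted likelihood at a ratio delta = σ_e²/σ_g² costs only O(n) plus a small p × p solve.

```python
import numpy as np

def rotate(y, X, K):
    """Eigendecompose K once and rotate y and X into its eigenbasis."""
    s, U = np.linalg.eigh(K)            # K = U diag(s) U^T
    return s, U.T @ y, U.T @ X

def profiled_reml(delta, s, y_rot, X_rot):
    """Restricted log-likelihood at delta = sigma_e^2 / sigma_g^2,
    with the fixed effects and sigma_g^2 profiled out."""
    n, p = X_rot.shape
    w = 1.0 / (s + delta)               # inverse of the diagonal covariance
    XtWX = X_rot.T @ (w[:, None] * X_rot)
    beta = np.linalg.solve(XtWX, X_rot.T @ (w * y_rot))
    resid = y_rot - X_rot @ beta
    rss = np.sum(w * resid ** 2)
    sigma_g2 = rss / (n - p)            # REML estimate given delta
    _, logdet_xtwx = np.linalg.slogdet(XtWX)
    ll = -0.5 * ((n - p) * np.log(2.0 * np.pi * sigma_g2)
                 + np.sum(np.log(s + delta))
                 + logdet_xtwx
                 + (n - p))
    return ll, beta, sigma_g2

def fit_variance_components(y, X, K, grid=np.logspace(-3, 3, 61)):
    """Grid search over delta (an illustrative choice, not a tuned optimizer)."""
    s, y_rot, X_rot = rotate(y, X, K)
    fits = [profiled_reml(d, s, y_rot, X_rot) for d in grid]
    best = int(np.argmax([f[0] for f in fits]))
    ll, beta, sigma_g2 = fits[best]
    return {"delta": grid[best], "sigma_g2": sigma_g2,
            "sigma_e2": grid[best] * sigma_g2, "beta": beta, "loglik": ll}
```

In a single-locus scan, the eigendecomposition would be reused for every marker and only the rotated design matrix would change, which is where the speed-up comes from.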
Affiliation(s)
- Wenlong Ren
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
- Zhikai Liang
  - Plant and Microbial Biology Department, University of Minnesota, Saint Paul, MN 55108, USA
- Shu He
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
- Jing Xiao
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
|
98
|
Guo J, Khan J, Pradhan S, Shahi D, Khan N, Avci M, Mcbreen J, Harrison S, Brown-Guedira G, Murphy JP, Johnson J, Mergoum M, Esten Mason R, Ibrahim AMH, Sutton R, Griffey C, Babar MA. Multi-Trait Genomic Prediction of Yield-Related Traits in US Soft Wheat under Variable Water Regimes. Genes (Basel) 2020; 11(11):1270. [PMID: 33126620 PMCID: PMC7716228 DOI: 10.3390/genes11111270] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/08/2020] [Revised: 10/23/2020] [Accepted: 10/26/2020] [Indexed: 11/16/2022] Open
Abstract
The performance of genomic prediction (GP) on genetically correlated traits can be improved through an interdependence multi-trait model under a multi-environment context. In this study, a panel of 237 soft facultative wheat (Triticum aestivum L.) lines was evaluated to compare single- and multi-trait models for predicting grain yield (GY), harvest index (HI), spike fertility (SF), and thousand grain weight (TGW). The panel was phenotyped in two locations and two years in Florida under drought and moderate drought stress conditions, while the genotyping was performed using 27,957 genotyping-by-sequencing (GBS) single nucleotide polymorphism (SNP) markers. Five predictive models were compared: Multi-environment Genomic Best Linear Unbiased Predictor (MGBLUP), Bayesian Multi-trait Multi-environment (BMTME), Bayesian Multi-output Regressor Stacking (BMORS), Single-trait Multi-environment Deep Learning (SMDL), and Multi-trait Multi-environment Deep Learning (MMDL). Across environments, the multi-trait statistical model (BMTME) was superior to the multi-trait DL model for prediction accuracy in most scenarios, but the DL models were comparable to the statistical models for response to selection. The multi-trait model also showed 5 to 22% more genetic gain than the single-trait model across environments, as reflected by the response to selection. Overall, these results suggest that multi-trait genomic prediction can be an efficient strategy for economically important yield-component-related traits in soft wheat.
Affiliation(s)
- Jia Guo
  - Department of Agronomy, University of Florida, Gainesville, FL 32611, USA
- Jahangir Khan
  - Department of Agronomy, University of Florida, Gainesville, FL 32611, USA
- Sumit Pradhan
  - Department of Agronomy, University of Florida, Gainesville, FL 32611, USA
- Dipendra Shahi
  - Department of Agronomy, University of Florida, Gainesville, FL 32611, USA
- Naeem Khan
  - Department of Agronomy, University of Florida, Gainesville, FL 32611, USA
- Muhsin Avci
  - Department of Agronomy, University of Florida, Gainesville, FL 32611, USA
- Jordan Mcbreen
  - Department of Agronomy, University of Florida, Gainesville, FL 32611, USA
- Stephen Harrison
  - School of Plant Environment and Soil Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Joseph Paul Murphy
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27607, USA
- Jerry Johnson
  - Department of Crop and Soil Sciences, University of Georgia, Griffin, GA 32223, USA
- Mohamed Mergoum
  - Department of Crop and Soil Sciences, University of Georgia, Griffin, GA 32223, USA
- Richard Esten Mason
  - Department of Crop Soil and Environmental Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Amir M. H. Ibrahim
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Russel Sutton
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Carl Griffey
  - School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Md Ali Babar
  - Department of Agronomy, University of Florida, Gainesville, FL 32611, USA
|
99
|
Doekes HP, Bijma P, Veerkamp RF, de Jong G, Wientjes YCJ, Windig JJ. Inbreeding depression across the genome of Dutch Holstein Friesian dairy cattle. Genet Sel Evol 2020; 52:64. [PMID: 33115403 PMCID: PMC7594306 DOI: 10.1186/s12711-020-00583-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/24/2020] [Accepted: 10/09/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Inbreeding depression refers to the decrease in mean performance due to inbreeding. Inbreeding depression is caused by an increase in homozygosity and reduced expression of (on average) favourable dominance effects. Dominance effects and allele frequencies differ across loci, and consequently inbreeding depression is expected to differ along the genome. In this study, we investigated differences in inbreeding depression across the genome of Dutch Holstein Friesian cattle, by estimating dominance effects and effects of regions of homozygosity (ROH).
METHODS: Genotype (75 k) and phenotype data of 38,792 cows were used. For nine yield, fertility and udder health traits, GREML models were run to estimate genome-wide inbreeding depression and estimate additive, dominance and ROH variance components. For this purpose, we introduced a ROH-based relationship matrix. Additive, dominance and ROH effects per SNP were obtained through back-solving. In addition, a single SNP GWAS was performed to identify significant additive, dominance or ROH associations.
RESULTS: Genome-wide inbreeding depression was observed for all yield, fertility and udder health traits. For example, a 1% increase in genome-wide homozygosity was associated with a decrease in 305-d milk yield of approximately 99 kg. For yield traits only, including dominance and ROH effects in the GREML model resulted in a better fit (P < 0.05) than a model with only additive effects. After correcting for the effect of genome-wide homozygosity, dominance and ROH variance explained less than 1% of the phenotypic variance for all traits. Furthermore, dominance and ROH effects were distributed evenly along the genome. The most notable region with a favourable dominance effect for yield traits was on chromosome 5, but overall few regions with large favourable dominance effects and significant dominance associations were detected. No significant ROH-associations were found.
CONCLUSIONS: Inbreeding depression was distributed quite equally along the genome and was well captured by genome-wide homozygosity. These findings suggest that, based on 75 k SNP data, there is little benefit of accounting for region-specific inbreeding depression in selection schemes.
Affiliation(s)
- Harmen P Doekes
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
  - Centre for Genetic Resources the Netherlands, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Piter Bijma
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Roel F Veerkamp
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Gerben de Jong
  - Cooperation CRV, Wassenaarweg 20, 6843 NW, Arnhem, The Netherlands
- Yvonne C J Wientjes
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Jack J Windig
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
  - Centre for Genetic Resources the Netherlands, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
|
100
|
Ren D, An L, Li B, Qiao L, Liu W. Efficient weighting methods for genomic best linear-unbiased prediction (BLUP) adapted to the genetic architectures of quantitative traits. Heredity (Edinb) 2020; 126:320-334. [PMID: 32980863 DOI: 10.1038/s41437-020-00372-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/21/2020] [Revised: 09/12/2020] [Accepted: 09/13/2020] [Indexed: 11/09/2022] Open
Abstract
Genomic best linear-unbiased prediction (GBLUP) assumes equal variance for all marker effects, which is suitable for traits that conform to the infinitesimal model. For traits controlled by major genes, Bayesian methods with shrinkage priors or genome-wide association study (GWAS) methods can be used to identify causal variants effectively. The information from Bayesian/GWAS methods can be used to construct the weighted genomic relationship matrix (G). However, it remains unclear which methods perform best for traits varying in genetic architecture. Therefore, we developed several methods to optimize the performance of weighted GBLUP and compared them with other available methods using simulated and real data sets. First, two types of methods (marker effects with local shrinkage or normal prior) were used to obtain test statistics and estimates for each marker effect. Second, three weighted G matrices were constructed based on the marker information from the first step: (1) the genomic-feature-weighted G, (2) the estimated marker-variance-weighted G, and (3) the absolute value of the estimated marker-effect-weighted G. Following the above process, six different weighted GBLUP methods (local shrinkage/normal-prior GF/EV/AEWGBLUP) were proposed for genomic prediction. Analyses with both simulated and real data demonstrated that these options offer flexibility for optimizing the weighted GBLUP for traits with a broad spectrum of genetic architectures. The advantage of weighting methods over GBLUP in terms of accuracy was trait dependent, ranging from 14.8% to marginal for simulated traits and from 44% to marginal for real traits. Local-shrinkage prior EVWGBLUP is superior for traits mainly controlled by loci of large effect. Normal-prior AEWGBLUP performs well for traits mainly controlled by loci of moderate effect. For traits controlled by some loci with large effects (explaining 25-50% of the genetic variance) and a range of loci with small effects, GFWGBLUP has advantages. In conclusion, the optimal weighted GBLUP method for genomic selection should take both the genetic architecture and the number of QTLs of traits into consideration carefully.
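A weighted genomic relationship matrix of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a VanRaden-style G computed from genotypes coded 0/1/2, rescales the marker weights to mean 1, and treats squared estimated effects as a stand-in for "estimated marker variance". The function names `weighted_g` and `effect_weights` are hypothetical.

```python
import numpy as np

def weighted_g(M, weights=None):
    """Weighted genomic relationship matrix G = Z D Z^T / (2 * sum p(1-p)).

    M: n x m genotype matrix coded 0/1/2 (assumed coding).
    weights: length-m non-negative marker weights; None gives unweighted G.
    """
    M = np.asarray(M, dtype=float)
    m = M.shape[1]
    p = M.mean(axis=0) / 2.0            # allele frequencies from the sample
    Z = M - 2.0 * p                     # centre each marker column
    if weights is None:
        d = np.ones(m)
    else:
        d = np.asarray(weights, dtype=float)
        d = d * m / d.sum()             # rescale weights to mean 1
    scale = 2.0 * np.sum(p * (1.0 - p))
    return (Z * d) @ Z.T / scale        # Z D Z^T with D = diag(d)

def effect_weights(effects, kind="abs"):
    """Turn estimated marker effects into weights (illustrative choices)."""
    effects = np.asarray(effects, dtype=float)
    if kind == "abs":
        return np.abs(effects)          # absolute-effect weighting
    if kind == "squared":
        return effects ** 2             # assumed proxy for marker variance
    raise ValueError("kind must be 'abs' or 'squared'")
```

With `weights=None`, or with any constant weight vector, the function returns the ordinary unweighted G, so standard GBLUP is the special case of equal marker weights.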
Affiliation(s)
- Duanyang Ren
  - College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
- Lixia An
  - College of Information, Shanxi Agricultural University, Taigu, China
- Baojun Li
  - College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
- Liying Qiao
  - College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
- Wenzhong Liu
  - College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
|