1
|
Genomic prediction through machine learning and neural networks for traits with epistasis. Comput Struct Biotechnol J 2022; 20:5490-5499. [PMID: 36249559 PMCID: PMC9547190 DOI: 10.1016/j.csbj.2022.09.029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/12/2022] [Revised: 09/20/2022] [Accepted: 09/20/2022] [Indexed: 11/22/2022] Open
Abstract
Performance of machine learning and neural networks in genomic analysis. Impact of heritability and QTL number on the performance of machine learning methods. Machine learning models in genomic analyses. Neural networks can perform better for complex quantitative traits.
Genome-wide selection (GWS) is one of the contributions of molecular genetics to breeding. Machine learning (ML) and artificial neural network (ANN) methods are non-parametric and can develop more accurate and parsimonious models for GWS analysis. Multivariate Adaptive Regression Splines (MARS) is considered one of the most flexible ML methods, automatically modeling nonlinearities and interactions of the predictor variables. This study aimed to evaluate and compare methods based on ANN, ML (including MARS), and G-BLUP for GWS. An F2 population of 1000 individuals genotyped for 4010 SNP markers was simulated, along with twelve traits from a model considering epistatic effects, with QTL numbers ranging from eight to 480 and heritability (h2) of 0.3, 0.5 or 0.8. Variation in heritability and number of QTL affects the performance of the methods. For quantitative traits (40, 80, 120, 240, and 480 QTLs), the highest R2 was observed for the Radial Basis Function network (RBF) and G-BLUP, followed by Random Forest (RF), Bagging (BA), and Boosting (BO). RF and BA also showed better results for traits with h2 of 0.3, with R2 values of 16.51% and 16.30%, respectively, while MARS methods showed better results for oligogenic traits, with R2 values ranging from 39.12% to 43.20% at h2 of 0.5 and from 59.92% to 78.56% at h2 of 0.8. Non-additive MARS methods also showed high R2 for traits with high heritability and 240 QTLs or more. ANN and ML methods are powerful tools for predicting genetic values of traits with epistatic effects, across different degrees of heritability and QTL numbers.
|
2
|
Yang CJ, Ladejobi O, Mott R, Powell W, Mackay I. Analysis of historical selection in winter wheat. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:3005-3023. [PMID: 35864201 PMCID: PMC9482581 DOI: 10.1007/s00122-022-04163-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Accepted: 06/22/2022] [Indexed: 06/15/2023]
Abstract
KEY MESSAGE Modeling of the distribution of allele frequency over year of variety release identifies major loci involved in historical breeding of winter wheat. Winter wheat is a major crop with a rich selection history in the modern era of crop breeding. Genetic gains across economically important traits like yield have been well characterized and are the major force driving its production. Winter wheat is also an excellent model for analyzing historical genetic selection. As a proof of concept, we analyze two major collections of winter wheat varieties that were bred in Western Europe from 1916 to 2010, namely the Triticeae Genome (TG) and WAGTAIL panels, which include 333 and 403 varieties, respectively. We develop and apply a selection mapping approach, Regression of Alleles on Years (RALLY), in these panels, as well as in simulated populations. RALLY maps loci under sustained historical selection by using a simple logistic model to regress allele counts on years of variety release. To control for drift-induced allele frequency change, we develop a hybrid approach of genomic control and delta control. Within the TG panel, we identify 22 significant RALLY quantitative selection loci (QSLs) and estimate the local heritabilities for 12 traits across these QSLs. By correlating predicted marker effects with RALLY regression estimates, we show that alleles whose frequencies have increased over time are heavily biased toward conferring positive yield effect, but negative effects in flowering time, lodging, plant height and grain protein content. Altogether, our results (1) demonstrate the use of RALLY to identify selected genomic regions while controlling for drift, and (2) reveal key patterns in the historical selection in winter wheat and guide its future breeding.
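As an illustration of the regression step described in this abstract, the sketch below fits a binomial logistic regression of allele counts on year of variety release for a single marker. It is a minimal sketch under stated assumptions, not the authors' RALLY implementation: the function name `fit_allele_on_year`, the `ploidy` parameter, the decade scaling, and the simulated data are all hypothetical, and the genomic-control and delta-control corrections for drift are not included.

```python
import numpy as np

def fit_allele_on_year(years, allele_counts, ploidy=2, n_iter=50):
    """Binomial logistic regression of allele counts on release year.

    Fitted by Newton-Raphson (IRLS). Returns (intercept, slope), where the
    slope is the change in log-odds of the allele per decade (assumed scaling).
    """
    years = np.asarray(years, dtype=float)
    y = np.asarray(allele_counts, dtype=float)
    x = (years - years.mean()) / 10.0            # centre and scale to decades
    X = np.column_stack([np.ones_like(x), x])    # design matrix: intercept + year
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))    # fitted allele frequency
        score = X.T @ (y - ploidy * p)           # gradient of the log-likelihood
        weights = ploidy * p * (1.0 - p)         # binomial variance weights
        info = X.T @ (X * weights[:, None])      # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Hypothetical data: one marker whose allele frequency rises over time.
rng = np.random.default_rng(1)
years = rng.integers(1916, 2011, size=2000)
true_p = 1.0 / (1.0 + np.exp(-(0.2 + 0.3 * (years - 1963) / 10.0)))
counts = rng.binomial(2, true_p)
intercept, slope = fit_allele_on_year(years, counts)
print(f"log-odds change per decade: {slope:.3f}")
```

A positive slope indicates an allele whose frequency increased with year of release; in a genome scan this fit would be repeated per marker before any correction for drift.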
Affiliation(s)
- Chin Jian Yang
  - Scotland's Rural College (SRUC), Kings Buildings, West Mains Road, Edinburgh, EH9 3JG, UK
- Olufunmilayo Ladejobi
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- Richard Mott
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- Wayne Powell
  - Scotland's Rural College (SRUC), Kings Buildings, West Mains Road, Edinburgh, EH9 3JG, UK
- Ian Mackay
  - Scotland's Rural College (SRUC), Kings Buildings, West Mains Road, Edinburgh, EH9 3JG, UK
  - IMplant Consultancy Ltd, Chelmsford, UK
|
3
|
|
4
|
Assessing the performance of a novel method for genomic selection: rrBLUP-method6. J Genet 2021. [DOI: 10.1007/s12041-021-01275-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
5
|
de Sousa DR, do Nascimento AV, Lôbo RNB. Prediction of genomic breeding values of milk traits in Brazilian Saanen goats. J Anim Breed Genet 2021; 138:541-551. [PMID: 33861884 DOI: 10.1111/jbg.12550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2020] [Revised: 03/17/2021] [Accepted: 03/22/2021] [Indexed: 11/28/2022]
Abstract
The study's objective was to compare the prediction ability of genomic methods for the traits milk yield, milk composition and somatic cell count in Brazilian Saanen goats. Nine hundred forty goats, genotyped with an Axiom_OviCap (Caprine) panel, an Affymetrix customized array with 62,557 single nucleotide polymorphisms (SNPs), were used for the genomic selection analyses. The genomic methods studied to estimate the effects of SNPs and direct genomic values (DGV) were as follows: (a) genomic BLUP (GBLUP), (b) Bayes Cπ and (c) Bayesian Lasso (BLASSO). Estimated breeding values (EBV) and deregressed estimated breeding values (dEBV) were used as response variables for the genomic predictions. The prediction ability was assessed by Pearson's correlation between DGV and response variables (EBV and dEBV). Regression coefficients of the response variables on the DGV were obtained to verify whether the genomic predictions were biased. In addition, the mean square error of prediction (MSE) was used as a measure of model fit to the data. The means of prediction accuracy, when EBV was used as a response variable, were 0.68, 0.68 and 0.67 for GBLUP, Bayes Cπ and BLASSO, respectively. With dEBV, the mean prediction accuracy was 0.50 for all models. The averages of the EBV regression coefficients on DGV were 1.08 for all models (GBLUP, Bayes Cπ and BLASSO), higher than those obtained for the regression coefficient of dEBV on DGV, which presented values of 1.05, 1.05 and 1.08 for GBLUP, Bayes Cπ and BLASSO, respectively. None of the methods stood out in terms of prediction ability; however, the GBLUP method was the most appropriate for estimating the DGV, in a slightly more reliable and less biased way, besides presenting the lowest computational cost. In the context of the present study, EBV was the preferred response variable in terms of genomic prediction accuracy, although dEBV showed lower bias.
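The three validation criteria named in this abstract (Pearson's correlation, the regression coefficient of the response on the DGV, and the MSE) can be sketched as follows. This is a minimal illustration, not the study's code: the function name `prediction_metrics` and the toy numbers are hypothetical.

```python
import numpy as np

def prediction_metrics(response, dgv):
    """Return (correlation, regression slope, MSE) for a validation set.

    response: EBV or dEBV used as the response variable
    dgv:      direct genomic values predicted by the model
    """
    response = np.asarray(response, dtype=float)
    dgv = np.asarray(dgv, dtype=float)
    # Prediction ability: Pearson correlation between DGV and the response.
    r = np.corrcoef(dgv, response)[0, 1]
    # Bias check: slope of the regression of the response on DGV
    # (a slope of 1 indicates no inflation or deflation of the DGV).
    slope = np.cov(response, dgv, ddof=1)[0, 1] / np.var(dgv, ddof=1)
    # Mean square error of prediction.
    mse = np.mean((response - dgv) ** 2)
    return r, slope, mse

# Toy values only, chosen so the arithmetic is easy to follow.
r, slope, mse = prediction_metrics([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(r, slope, mse)
```

A slope above 1, as reported in the abstract (1.05 to 1.08), means the DGV are slightly less dispersed than the response variable.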
Affiliation(s)
- André Vieira do Nascimento
  - Faculty of Agricultural and Veterinary Sciences of Jaboticabal. Animal Sciences Department I, São Paulo State University "Júlio de Mesquita Filho", Jaboticabal, Brazil
- Raimundo Nonato Braga Lôbo
  - Animal Sciences Department, Federal University of Ceará, Fortaleza, Brazil
  - Brazilian Agricultural Research Corporation - EMBRAPA, Embrapa Caprinos e Ovinos, Estrada Sobral/Groaíras, Sobral, Brazil
  - National Council for Scientific and Technological Development - CNPq, Lago Sul, Brazil
|
6
|
Comparison of long-term effects of genomic selection index and genomic selection using different Bayesian methods. Livest Sci 2020. [DOI: 10.1016/j.livsci.2020.104207] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
|
7
|
Karaman E, Lund MS, Su G. Multi-trait single-step genomic prediction accounting for heterogeneous (co)variances over the genome. Heredity (Edinb) 2020; 124:274-287. [PMID: 31641237 PMCID: PMC6972913 DOI: 10.1038/s41437-019-0273-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/03/2019] [Revised: 09/05/2019] [Accepted: 09/06/2019] [Indexed: 11/23/2022] Open
Abstract
Widely used genomic prediction models may not properly account for heterogeneous (co)variance structure across the genome. Models such as BayesA and BayesB assume locus-specific variances, which are highly influenced by the prior for the (co)variance of single nucleotide polymorphism (SNP) effects, regardless of the size of the data. Models such as BayesC or GBLUP assume a common (co)variance for a proportion (BayesC) or all (GBLUP) of the SNP effects. In this study, we propose a multi-trait Bayesian whole genome regression method (BayesN0), which is based on grouping a number of predefined SNPs to account for heterogeneous (co)variance structure across the genome. This model was also implemented in single-step Bayesian regression (ssBayesN0). For practical implementation, we considered multi-trait single-step SNPBLUP models, using (co)variance estimates from BayesN0 or ssBayesN0. Genotype data were simulated using haplotypes on the first five chromosomes of 2200 Danish Holstein cattle, and phenotypes were simulated for two traits with heritabilities of 0.1 or 0.4, assuming 200 quantitative trait loci (QTL). We compared prediction accuracy from different prediction models and different region sizes (one SNP, 100 SNPs, one chromosome or whole genome). In general, the highest accuracies were obtained when 100 adjacent SNPs were grouped together. The ssBayesN0 improved accuracies over BayesN0, and using (co)variance estimates from ssBayesN0 generally yielded higher accuracies than using (co)variance estimates from BayesN0, for the 100 SNPs region size. Our results suggest that it could be a good strategy to estimate (co)variance components from ssBayesN0, and then to use those estimates in genomic prediction using multi-trait single-step SNPBLUP, in routine genomic evaluations.
Affiliation(s)
- Emre Karaman
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Mogens S Lund
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
|
8
|
Abolhassani Targhi MV, Asgari Jafarabadi G, Aminafshar M, Emam Jomeh Kashan N. Comparison of non-parametric methods in genomic evaluation of discrete traits. GENE REPORTS 2019. [DOI: 10.1016/j.genrep.2019.100379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
9
|
Accurate genomic prediction of Coffea canephora in multiple environments using whole-genome statistical models. Heredity (Edinb) 2018; 122:261-275. [PMID: 29941997 DOI: 10.1038/s41437-018-0105-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/15/2018] [Revised: 05/23/2018] [Accepted: 05/30/2018] [Indexed: 11/08/2022] Open
Abstract
Genomic selection has been proposed as the standard method to predict breeding values in animal and plant breeding. Although some crops have benefited from this methodology, studies in Coffea are still emerging. To date, there have been no studies describing how well genomic prediction models work across populations and environments for different complex traits in coffee. Considering that predictive models are based on biological and statistical assumptions, their performance is expected to vary depending on how well these assumptions align with the true genetic architecture of the phenotype. To investigate this, we used data from two recurrent selection populations of Coffea canephora, evaluated in two locations, and single nucleotide polymorphisms identified by Genotyping-by-Sequencing. In particular, we evaluated the performance of 13 statistical approaches to predict three important traits in coffee: production of coffee beans, leaf rust incidence and yield of green beans. Analyses were performed for predictions within-environment, across locations and across populations to assess the reliability of genomic selection. Overall, differences in the prediction accuracy of the competing models were small, although the Bayesian methods showed a modest improvement over other methods, at the cost of more computation time. As expected, predictive accuracy for within-environment analysis was, on average, higher than for predictions across locations and across populations. Our results support the potential of genomic selection to reshape traditional plant breeding schemes. In practice, we expect to increase the genetic gain per unit of time by reducing the cycle length of recurrent selection in coffee.
|
10
|
Kasnavi SA, Aminafshar M, Shariati MM, Emam Jomeh Kashan N, Honarvar M. The effect of kernel selection on genome wide prediction of discrete traits by Support Vector Machine. GENE REPORTS 2018. [DOI: 10.1016/j.genrep.2018.04.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 10/17/2022]
|
11
|
Du C, Wei J, Wang S, Jia Z. Genomic selection using principal component regression. Heredity (Edinb) 2018; 121:12-23. [PMID: 29713089 DOI: 10.1038/s41437-018-0078-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/16/2017] [Revised: 03/17/2018] [Accepted: 03/21/2018] [Indexed: 01/02/2023] Open
Abstract
Many statistical methods are available for genomic selection (GS) through which genetic values of quantitative traits are predicted for plants and animals using whole-genome SNP data. A large number of predictors with far fewer subjects becomes a major computational challenge in GS. Principal components regression (PCR) and its derivative, i.e., partial least squares regression (PLSR), provide a solution through dimensionality reduction. In this study, we show that PCR can perform better than PLSR in cross validation. PCR often requires extracting more components than PLSR to achieve the maximum predictive ability and thus may be associated with a higher computational cost. However, application of the HAT method (a strategy of describing the relationship between the fitted and observed response variables with a hat matrix) to PCR circumvents conventional cross validation in testing predictive ability, resulting in substantially improved computational efficiency over PLSR, where cross validation is mandatory. Advantages of PCR over PLSR are illustrated with a simulated trait of a hypothetical population and four agronomical traits of a rice population. The benefit of using PCR in genomic selection is further demonstrated in an effort to predict 1000 metabolomic traits and 24,973 transcriptomic traits in the same rice population.
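The hat-matrix shortcut mentioned in this abstract can be illustrated with the standard least-squares identity that a leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix. The sketch below applies that identity to a regression on the first k principal component scores. It is an assumption-laden illustration rather than the authors' implementation: the function name `pcr_hat_loo` is hypothetical, and the scores are computed once from all genotypes and held fixed, which is not the same as recomputing components within each fold.

```python
import numpy as np

def pcr_hat_loo(X, y, k):
    """Leave-one-out predictions for PCR using the hat-matrix identity.

    X: (n, p) marker matrix, y: (n,) phenotype, k: number of components
    (k must not exceed the rank of the centred X).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                       # centre each marker
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Uk = U[:, :k]                                 # orthonormal PC scores
    # Regression on intercept + Uk has hat matrix 11'/n + Uk Uk',
    # because the score columns are orthonormal and orthogonal to 1.
    h = 1.0 / n + np.sum(Uk ** 2, axis=1)         # diagonal of the hat matrix
    fitted = y.mean() + Uk @ (Uk.T @ (y - y.mean()))
    loo_residual = (y - fitted) / (1.0 - h)       # predicted (PRESS) residuals
    return y - loo_residual

# Hypothetical data with many more markers than individuals.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))
y = X[:, :3].sum(axis=1) + rng.normal(size=30)
pred = pcr_hat_loo(X, y, k=5)
print(np.corrcoef(pred, y)[0, 1])
```

The design choice here is that a single SVD and one fit replace n separate refits, which is where the computational saving described in the abstract comes from.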
Affiliation(s)
- Caroline Du
  - Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521, USA
- Julong Wei
  - Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521, USA
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Shibo Wang
  - Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521, USA
- Zhenyu Jia
  - Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521, USA
|
12
|
Prediction and association mapping of agronomic traits in maize using multiple omic data. Heredity (Edinb) 2017; 119:174-184. [PMID: 28590463 DOI: 10.1038/hdy.2017.27] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 02/07/2017] [Revised: 05/03/2017] [Accepted: 05/05/2017] [Indexed: 02/06/2023] Open
Abstract
Genomic selection holds great promise to accelerate plant breeding via early selection before phenotypes are measured, and it offers major advantages over marker-assisted selection for highly polygenic traits. In addition to genomic data, metabolome and transcriptome are increasingly receiving attention as new data sources for phenotype prediction. We used data available from maize as a model to compare the predictive abilities of three different omic data sources using eight representative methods for six traits. We found that the best linear unbiased prediction overall performs better than other methods across different traits and different omic data, and genomic prediction performs better than transcriptomic and metabolomic predictions. For the same maize data, we also conducted a genome-wide association study, transcriptome-wide association studies and metabolome-wide association studies for the six agronomic traits using both the genome-wide efficient mixed model association (GEMMA) method and a modified least absolute shrinkage and selection operator (LASSO) method. The new LASSO method has the ability to perform statistical tests. Simulation studies show that the modified LASSO performs better than GEMMA in terms of high power and low Type 1 error.
|
13
|
Wang C, Li X, Qian R, Su G, Zhang Q, Ding X. Bayesian methods for jointly estimating genomic breeding values of one continuous and one threshold trait. PLoS One 2017; 12:e0175448. [PMID: 28410429 PMCID: PMC5391971 DOI: 10.1371/journal.pone.0175448] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/04/2016] [Accepted: 03/27/2017] [Indexed: 01/29/2023] Open
Abstract
Genomic selection has become a useful tool for animal and plant breeding. Currently, genomic evaluation is usually carried out using a single-trait model. However, a multi-trait model has the advantage of using information on the correlated traits, leading to more accurate genomic prediction. To date, joint genomic prediction for a continuous and a threshold trait using a multi-trait model is scarce and needs more attention. Based on the previously proposed methods BayesCπ for single continuous trait and BayesTCπ for single threshold trait, we developed a novel method based on a linear-threshold model, i.e., LT-BayesCπ, for joint genomic prediction of a continuous trait and a threshold trait. Computing procedures of LT-BayesCπ using Markov Chain Monte Carlo algorithm were derived. A simulation study was performed to investigate the advantages of LT-BayesCπ over BayesCπ and BayesTCπ with regard to the accuracy of genomic prediction on both traits. Factors affecting the performance of LT-BayesCπ were addressed. The results showed that, in all scenarios, the accuracy of genomic prediction obtained from LT-BayesCπ was significantly increased for the threshold trait compared to that from single trait prediction using BayesTCπ, while the accuracy for the continuous trait was comparable with that from single trait prediction using BayesCπ. The proposed LT-BayesCπ could be a method of choice for joint genomic prediction of one continuous and one threshold trait.
Affiliation(s)
- Chonglong Wang
  - Department of Pig Genetics and Breeding, Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei, China
- Xiujin Li
  - Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou, Guangdong, PR China
- Rong Qian
  - Department of Pig Genetics and Breeding, Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei, China
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
- Qin Zhang
  - Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Xiangdong Ding
  - Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
|
14
|
Iheshiulor OOM, Woolliams JA, Yu X, Wellmann R, Meuwissen THE. Within- and across-breed genomic prediction using whole-genome sequence and single nucleotide polymorphism panels. Genet Sel Evol 2016; 48:15. [PMID: 26895843 PMCID: PMC4759725 DOI: 10.1186/s12711-016-0193-1] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 04/14/2015] [Accepted: 01/29/2016] [Indexed: 11/21/2022] Open
Abstract
Background Currently, genomic prediction in cattle is largely based on panels of about 54k single nucleotide polymorphisms (SNPs). However, with the decreasing costs of and current advances in next-generation sequencing technologies, whole-genome sequence (WGS) data on large numbers of individuals is within reach. Availability of such data provides new opportunities for genomic selection, which need to be explored. Methods This simulation study investigated how much predictive ability is gained by using WGS data under scenarios with QTL (quantitative trait loci) densities ranging from 45 to 132 QTL/Morgan and heritabilities ranging from 0.07 to 0.30, compared to different SNP densities, with emphasis on divergent dairy cattle breeds with small populations. The relative performances of best linear unbiased prediction (SNP-BLUP) and of a variable selection method with a mixture of two normal distributions (MixP) were also evaluated. Genomic predictions were based on within-population, across-population, and multi-breed reference populations. Results The use of WGS data for within-population predictions resulted in small to large increases in accuracy for low to moderately heritable traits. Depending on heritability of the trait, and on SNP and QTL densities, accuracy increased by up to 31%. The advantage of WGS data was more pronounced (7 to 92% increase in accuracy depending on trait heritability, SNP and QTL densities, and time of divergence between populations) with a combined reference population and when using MixP. While MixP outperformed SNP-BLUP at 45 QTL/Morgan, SNP-BLUP was as good as MixP when QTL density increased to 132 QTL/Morgan. Conclusions Our results show that genomic predictions in numerically small cattle populations would benefit from a combination of WGS data, a multi-breed reference population, and a variable selection method.
Affiliation(s)
- Oscar O M Iheshiulor
  - Department of Animal and Aquaculture Sciences, Norwegian University of Life Sciences, 1432, Ås, Norway
- John A Woolliams
  - Department of Animal and Aquaculture Sciences, Norwegian University of Life Sciences, 1432, Ås, Norway
  - The Roslin Institute (Edinburgh), Royal (DICK) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK
- Xijiang Yu
  - Department of Animal and Aquaculture Sciences, Norwegian University of Life Sciences, 1432, Ås, Norway
- Robin Wellmann
  - Institute of Animal Husbandry and Animal Breeding, University of Hohenheim, 70593, Stuttgart, Germany
- Theo H E Meuwissen
  - Department of Animal and Aquaculture Sciences, Norwegian University of Life Sciences, 1432, Ås, Norway
|
15
|
van den Berg S, Calus MPL, Meuwissen THE, Wientjes YCJ. Across population genomic prediction scenarios in which Bayesian variable selection outperforms GBLUP. BMC Genet 2015; 16:146. [PMID: 26698836 PMCID: PMC4690391 DOI: 10.1186/s12863-015-0305-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 08/17/2015] [Accepted: 12/10/2015] [Indexed: 12/21/2022] Open
Abstract
Background The use of information across populations is an attractive approach to increase the accuracy of genomic prediction for numerically small populations. However, accuracies of across population genomic prediction, in which reference and selection individuals are from different populations, are currently disappointing. It has been shown for within population genomic prediction that Bayesian variable selection models outperform GBLUP models when the number of QTL underlying the trait is low. Therefore, our objective was to identify across population genomic prediction scenarios in which Bayesian variable selection models outperform GBLUP in terms of prediction accuracy. In this study, high density genotype information of 1033 Holstein Friesian, 105 Groningen White Headed, and 147 Meuse-Rhine-Yssel cows was used. Phenotypes were simulated using two changing variables: (1) the number of QTL underlying the trait (3000, 300, 30, 3), and (2) the correlation between allele substitution effects of QTL across populations, i.e. the genetic correlation of the simulated trait between the populations (1.0, 0.8, 0.4). Results The accuracy obtained by the Bayesian variable selection model depended on the number of QTL underlying the trait, with a higher accuracy when the number of QTL was lower. This trend was more pronounced for across population genomic prediction than for within population genomic prediction. It was shown that Bayesian variable selection models have an advantage over GBLUP when the number of QTL underlying the simulated trait was small. This advantage disappeared when the number of QTL underlying the simulated trait was large. The point where the accuracy of Bayesian variable selection and GBLUP became similar was approximately the point where the number of QTL was equal to the number of independent chromosome segments (Me) across the populations.
Conclusion Bayesian variable selection models outperform GBLUP when the number of QTL underlying the trait is smaller than Me. Across populations, Me is considerably larger than within populations. So, it is more likely to find a number of QTL underlying a trait smaller than Me across populations than within population. Therefore Bayesian variable selection models can help to improve the accuracy of across population genomic prediction.
Affiliation(s)
- S van den Berg
  - Animal Breeding and Genomics Centre, Wageningen University, 6700 AH, Wageningen, The Netherlands
- M P L Calus
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700 AH, Wageningen, The Netherlands
- T H E Meuwissen
  - Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, P. O. Box 5003, 1432, Ås, Norway
- Y C J Wientjes
  - Animal Breeding and Genomics Centre, Wageningen University, 6700 AH, Wageningen, The Netherlands
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700 AH, Wageningen, The Netherlands
|
16
|
Liu H, Meuwissen THE, Sørensen AC, Berg P. Upweighting rare favourable alleles increases long-term genetic gain in genomic selection programs. Genet Sel Evol 2015; 47:19. [PMID: 25886296 PMCID: PMC4367977 DOI: 10.1186/s12711-015-0101-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 03/07/2014] [Accepted: 01/29/2015] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND The short-term impact of using different genomic prediction (GP) models in genomic selection has been intensively studied, but their long-term impact is poorly understood. Furthermore, long-term genetic gain of genomic selection is expected to improve by using Jannink's weighting (JW) method, in which rare favourable marker alleles are upweighted in the selection criterion. In this paper, we extend the JW method by including an additional parameter to decrease the emphasis on rare favourable alleles over the time horizon, with the purpose of further improving the long-term genetic gain. We call this new method dynamic weighting (DW). The paper explores the long-term impact of different GP models with or without weighting methods. METHODS Different selection criteria were tested by simulating a population of 500 animals with truncation selection of five males and 50 females. Selection criteria included unweighted and weighted genomic estimated breeding values using the JW or DW methods, for which ridge regression (RR) and Bayesian lasso (BL) were used to estimate marker effects. The impacts of these selection criteria were compared under three genetic architectures, i.e. varying numbers of QTL for the trait and for two time horizons of 15 (TH15) or 40 (TH40) generations. RESULTS For unweighted GP, BL resulted in up to 21.4% higher long-term genetic gain and 23.5% lower rate of inbreeding under TH40 than RR. For weighted GP, DW resulted in 1.3 to 5.5% higher long-term gain compared to unweighted GP. JW, however, showed a 6.8% lower long-term genetic gain relative to unweighted GP when BL was used to estimate the marker effects. Under TH40, both DW and JW obtained significantly higher genetic gain than unweighted GP. With DW, the long-term genetic gain was increased by up to 30.8% relative to unweighted GP, and also increased by 8% relative to JW, although at the expense of a lower short-term gain. 
CONCLUSIONS Irrespective of the number of QTL simulated, BL is superior to RR in maintaining genetic variance and therefore results in higher long-term genetic gain. Moreover, DW is a promising method with which high long-term genetic gain can be expected within a fixed time frame.
Affiliation(s)
- Huiming Liu
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, P. O. Box 50, 8830, Tjele, Denmark.
- Theo H E Meuwissen
- Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, P. O. Box 5003, 1432, Ås, Norway.
- Anders C Sørensen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, P. O. Box 50, 8830, Tjele, Denmark.
- Peer Berg
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, P. O. Box 50, 8830, Tjele, Denmark.
- Nordic Genetic Resource Center, P. O. Box 115, 1431, Ås, Norway.
17
Wientjes YCJ, Veerkamp RF, Bijma P, Bovenhuis H, Schrooten C, Calus MPL. Empirical and deterministic accuracies of across-population genomic prediction. Genet Sel Evol 2015; 47:5. [PMID: 25885467 PMCID: PMC4320472 DOI: 10.1186/s12711-014-0086-0] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 04/14/2014] [Accepted: 12/17/2014] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Differences in linkage disequilibrium and in allele substitution effects of QTL (quantitative trait loci) may hinder genomic prediction across populations. Our objective was to develop a deterministic formula to estimate the accuracy of across-population genomic prediction, for which reference individuals and selection candidates are from different populations, and to investigate the impact of differences in allele substitution effects across populations and of the number of QTL underlying a trait on the accuracy. METHODS A deterministic formula to estimate the accuracy of across-population genomic prediction was derived based on selection index theory. Moreover, accuracies were deterministically predicted using a formula based on population parameters and empirically calculated using simulated phenotypes and a GBLUP (genomic best linear unbiased prediction) model. Phenotypes of 1033 Holstein-Friesian, 105 Groninger White Headed and 147 Meuse-Rhine-Yssel cows were simulated by sampling 3000, 300, 30 or 3 QTL from the available high-density SNP (single nucleotide polymorphism) information of three chromosomes, assuming a correlation of 1.0, 0.8, 0.6, 0.4, or 0.2 between allele substitution effects across breeds. The simulated heritability was set to 0.95 to resemble the heritability of deregressed proofs of bulls. RESULTS Accuracies estimated with the deterministic formula based on selection index theory were similar to empirical accuracies for all scenarios, while accuracies predicted with the formula based on population parameters overestimated empirical accuracies by ~25 to 30%. When the between-breed genetic correlation differed from 1, i.e. allele substitution effects differed across breeds, empirical and deterministic accuracies decreased in proportion to the genetic correlation. Using a multi-trait model, it was possible to accurately estimate the genetic correlation between the breeds based on phenotypes and high-density genotypes. 
The number of QTL underlying the simulated trait did not affect the accuracy. CONCLUSIONS The deterministic formula based on selection index theory estimated the accuracy of across-population genomic predictions well. The deterministic formula using population parameters overestimated the across-population genomic accuracy, but may still be useful because of its simplicity. Both formulas could accommodate genetic correlations between populations lower than 1. The number of QTL underlying a trait did not affect the accuracy of across-population genomic prediction using a GBLUP method.
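As a rough illustration of a formula "based on population parameters", the sketch below uses the widely cited Daetwyler-style expression for expected accuracy, scaled by the between-population genetic correlation. This is an assumption for illustration: the abstract does not state the exact formula, and the parameter name `m_e` (number of independent chromosome segments) and the example values are hypothetical.

```python
import math


def expected_accuracy(n_ref, h2, m_e, r_g=1.0):
    """Expected accuracy of genomic prediction.

    n_ref : number of reference individuals
    h2    : heritability of the trait
    m_e   : assumed number of independent chromosome segments
    r_g   : genetic correlation between reference and candidate populations
    """
    return r_g * math.sqrt(n_ref * h2 / (n_ref * h2 + m_e))


# Illustrative values only, not taken from the study.
within = expected_accuracy(n_ref=1000, h2=0.5, m_e=500)
across = expected_accuracy(n_ref=1000, h2=0.5, m_e=500, r_g=0.6)
```

In this form the accuracy scales linearly with `r_g`, which matches the abstract's statement that accuracies decreased in proportion to the genetic correlation.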
Affiliation(s)
- Yvonne C J Wientjes
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700, AH, Wageningen, The Netherlands.
- Animal Breeding and Genomics Centre, Wageningen University, 6700, AH, Wageningen, The Netherlands.
- Roel F Veerkamp
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700, AH, Wageningen, The Netherlands.
- Animal Breeding and Genomics Centre, Wageningen University, 6700, AH, Wageningen, The Netherlands.
- Piter Bijma
- Animal Breeding and Genomics Centre, Wageningen University, 6700, AH, Wageningen, The Netherlands.
- Henk Bovenhuis
- Animal Breeding and Genomics Centre, Wageningen University, 6700, AH, Wageningen, The Netherlands.
- Mario P L Calus
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700, AH, Wageningen, The Netherlands.
18
Guo G, Zhao F, Wang Y, Zhang Y, Du L, Su G. Comparison of single-trait and multiple-trait genomic prediction models. BMC Genet 2014; 15:30. [PMID: 24593261 PMCID: PMC3975852 DOI: 10.1186/1471-2156-15-30] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 11/19/2013] [Accepted: 02/26/2014] [Indexed: 11/30/2022] Open
Abstract
Background In this study, a single-trait genomic model (STGM) is compared with a multiple-trait genomic model (MTGM) for genomic prediction using conventional estimated breeding values (EBVs) calculated using conventional single-trait and multiple-trait linear mixed models as the response variables. Three scenarios with and without missing data were simulated: no missing data, 90% missing data in a trait with high heritability, and 90% missing data in a trait with low heritability. The simulated genome had a length of 500 cM with 5000 equally spaced single nucleotide polymorphism markers and 300 randomly distributed quantitative trait loci (QTL). The true breeding values of each trait were determined using 200 of the QTLs, and the remaining 100 QTLs were assumed to affect both the high (trait I with heritability of 0.3) and the low (trait II with heritability of 0.05) heritability traits. The genetic correlation between traits I and II was 0.5, and the residual correlation was zero. Results The results showed that when there were no missing records, MTGM and STGM gave the same reliability for the genomic predictions for trait I while, for trait II, MTGM performed better than STGM. When there were missing records for one of the two traits, MTGM performed much better than STGM. In general, the difference in reliability of genomic EBVs predicted using the EBV response variables estimated from either the multiple-trait or single-trait models was relatively small for the trait without missing data. However, for the trait with missing data, the EBV response variable obtained from the multiple-trait model gave a more reliable genomic prediction than the EBV response variable from the single-trait model. Conclusions These results indicate that MTGM performed better than STGM for the trait with low heritability and for the trait with a limited number of records.
Even when the EBV response variable was obtained using the multiple-trait model, the genomic prediction using MTGM was more reliable than the prediction using the STGM.
Affiliation(s)
- Lixin Du
- National Center for Molecular Genetics and Breeding of Animal, Institute of Animal Sciences, Chinese academy of Agricultural Sciences, Beijing 100193, China.
20
Tribout T, Larzul C, Phocas F. Economic aspects of implementing genomic evaluations in a pig sire line breeding scheme. Genet Sel Evol 2013; 45:40. [PMID: 24127883 PMCID: PMC3840607 DOI: 10.1186/1297-9686-45-40] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 03/07/2013] [Accepted: 09/05/2013] [Indexed: 01/31/2023] Open
Abstract
Background Replacing pedigree-based BLUP evaluations by genomic evaluations in pig breeding schemes can result in greater selection accuracy and genetic gains, especially for traits with limited phenotypes. However, this methodological change would generate additional costs. The objective of this study was to determine whether additional expenditures would be more profitably devoted to implementing genomic evaluations or to increasing phenotyping capacity while retaining traditional evaluations. Methods Stochastic simulation was used to simulate a population with 1050 breeding females and 50 boars that was selected for 10 years for a breeding goal with two uncorrelated traits with heritabilities of 0.4. The reference breeding scheme was based on phenotyping 13 770 candidates per year for trait 1 and 270 sibs of candidates per year for trait 2, with selection based on pedigree-based BLUP estimated breeding values. Increased expenditures were allocated to either increasing the phenotyping capacity for trait 2 while maintaining traditional evaluations, or to implementing genomic selection. The genomic scheme was based on two training populations: one for trait 2, consisting of phenotyped sibs of the candidates whose number increased from 1000 to 3430 over time, and one for trait 1, consisting of the selection candidates. Several genomic scenarios were tested, where the size of the training population for trait 1, and the number of genotyped candidates pre-selected based on their parental estimated breeding value, varied. Results Both approaches resulted in higher genetic trends for the population breeding goal and lower rates of inbreeding compared to the reference scheme. However, even a very marked increase in phenotyping capacity for trait 2 could not match improvements achieved with genomic selection when the number of genotyped candidates was large. 
Genotyping just a limited number of pre-selected candidates significantly reduced the extra costs, while preserving most of the benefits in terms of genetic trends and inbreeding. Implementing genomic evaluations was the most efficient approach when major expenditure was possible, whereas increasing phenotypes was preferable when limited resources were available. Conclusions Economic decisions on implementing genomic evaluations in a pig nucleus population must take account of population characteristics, phenotyping and genotyping costs, and available funds.
Affiliation(s)
- Thierry Tribout
- INRA, UMR1313 Génétique Animale et Biologie Intégrative, F-78350, Jouy-en-Josas, France.
21
de Los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MPL. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 2013; 193:327-45. [PMID: 22745228 PMCID: PMC3567727 DOI: 10.1534/genetics.112.143313] [Citation(s) in RCA: 471] [Impact Index Per Article: 42.8] [Received: 05/18/2012] [Accepted: 06/11/2012] [Indexed: 11/18/2022] Open
Abstract
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
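A minimal sketch of one such "large-p with small-n" regression is ridge regression on marker genotypes, shown here with NumPy. It is only an illustration of the general idea: the simulated data, the penalty value, and the function name are assumptions, and the review covers many other parametric methods.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Ridge estimates of marker effects when markers (p) outnumber
    individuals (n). Uses the dual form X'(XX' + lam*I)^-1 y, which
    solves an n x n system instead of a p x p one."""
    Xc = X - X.mean(axis=0)          # centre genotypes per marker
    yc = y - y.mean()                # centre phenotypes
    n = Xc.shape[0]
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)
    return Xc.T @ alpha


# Simulated toy data (assumption): 50 individuals, 500 markers coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 50, 500
X = rng.integers(0, 3, size=(n, p)).astype(float)
true_beta = np.zeros(p)
true_beta[:10] = rng.normal(size=10)  # only 10 markers have an effect
y = X @ true_beta + rng.normal(size=n)

beta_hat = ridge_marker_effects(X, y, lam=100.0)

# Predicted genomic values for the same individuals.
gebv = (X - X.mean(axis=0)) @ beta_hat
```

The dual form gives the same estimates as the usual (X'X + lam*I)^-1 X'y solution, but is cheaper whenever the number of markers is much larger than the number of phenotyped individuals.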
Affiliation(s)
- Gustavo de Los Campos
- Department of Biostatistics, School of Public Health, University of Alabama, Birmingham, AL 35294, USA.
22
Colombani C, Legarra A, Fritz S, Guillaume F, Croiseau P, Ducrocq V, Robert-Granié C. Application of Bayesian least absolute shrinkage and selection operator (LASSO) and BayesCπ methods for genomic selection in French Holstein and Montbéliarde breeds. J Dairy Sci 2012; 96:575-91. [PMID: 23127905 DOI: 10.3168/jds.2011-5225] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Received: 12/05/2011] [Accepted: 09/14/2012] [Indexed: 11/19/2022]
Abstract
Recently, the amount of available single nucleotide polymorphism (SNP) marker data has considerably increased in dairy cattle breeds, both for research purposes and for application in commercial breeding and selection programs. Bayesian methods are currently used in the genomic evaluation of dairy cattle to handle very large sets of explanatory variables with a limited number of observations. In this study, we applied 2 Bayesian methods, BayesCπ and Bayesian least absolute shrinkage and selection operator (LASSO), to 2 genotyped and phenotyped reference populations consisting of 3,940 Holstein bulls and 1,172 Montbéliarde bulls with approximately 40,000 polymorphic SNP. We compared the accuracy of the Bayesian methods for the prediction of 3 traits (milk yield, fat content, and conception rate) with pedigree-based BLUP, genomic BLUP, partial least squares (PLS) regression, and sparse PLS regression, a variable selection PLS variant. The results showed that the correlations between observed and predicted phenotypes were similar in BayesCπ (with or without pedigree information) and Bayesian LASSO for most traits, regardless of breed. In the Holstein breed, Bayesian methods led to higher correlations than other approaches for fat content and were similar to genomic BLUP for milk yield and to genomic BLUP and PLS regression for the conception rate. In the Montbéliarde breed, no method dominated the others, except BayesCπ for fat content. The better performance of the Bayesian methods for fat content in Holstein and Montbéliarde breeds is probably due to the effect of the DGAT1 gene. The SNP identified by the BayesCπ, Bayesian LASSO, and sparse PLS regression methods, based on their effect on the different traits of interest, were located at almost the same position on the genome.
As the Bayesian methods resulted in regressions of direct genomic values on daughter trait deviations closer to 1 than for the other methods tested in this study, Bayesian methods are suggested for genomic evaluations of French dairy cattle.
Affiliation(s)
- C Colombani
- INRA, UR631-SAGA, BP 52627, 31326 Castanet-Tolosan Cedex, France
23
Dekkers JCM. Application of genomics tools to animal breeding. Curr Genomics 2012; 13:207-12. [PMID: 23115522 PMCID: PMC3382275 DOI: 10.2174/138920212800543057] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 09/27/2011] [Revised: 10/17/2011] [Accepted: 10/27/2011] [Indexed: 01/01/2023] Open
Abstract
The main goal in animal breeding is to select individuals that have high breeding values for traits of interest as parents to produce the next generation and to do so as quickly as possible. To date, most programs rely on statistical analysis of large databases with phenotypes on breeding populations by linear mixed model methodology to estimate breeding values on selection candidates. However, there is a long history of research on the use of genetic markers to identify quantitative trait loci and their use in marker-assisted selection but with limited implementation in practical breeding programs. The advent of high-density SNP genotyping, combined with novel statistical methods for the use of these data to estimate breeding values, has resulted in the recent extensive application of genomic or whole-genome selection in dairy cattle, and research to implement genomic selection in other livestock species is underway. The high-density SNP data also provide opportunities to detect QTL and to uncover the genetic architecture of quantitative traits, in terms of the distribution of the size of genetic effects that contribute to trait differences in a population. Results show that this genetic architecture differs between traits but that for most traits, over 50% of the genetic variation resides in genomic regions with small effects that are of the order of magnitude that is expected under a highly polygenic model of inheritance.
Affiliation(s)
- Jack C M Dekkers
- Department of Animal Science, Iowa State University, Ames, IA 50011, USA
24
Colombani C, Croiseau P, Fritz S, Guillaume F, Legarra A, Ducrocq V, Robert-Granié C. A comparison of partial least squares (PLS) and sparse PLS regressions in genomic selection in French dairy cattle. J Dairy Sci 2012; 95:2120-31. [PMID: 22459857 DOI: 10.3168/jds.2011-4647] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 06/22/2011] [Accepted: 12/09/2011] [Indexed: 01/25/2023]
Abstract
Genomic selection involves computing a prediction equation from the estimated effects of a large number of DNA markers based on a limited number of genotyped animals with phenotypes. The number of observations is much smaller than the number of independent variables, and the challenge is to find methods that perform well in this context. Partial least squares regression (PLS) and sparse PLS were used with a reference population of 3,940 genotyped and phenotyped French Holstein bulls and 39,738 polymorphic single nucleotide polymorphism markers. Partial least squares regression reduces the number of variables by projecting independent variables onto latent structures. Sparse PLS combines variable selection and modeling in a one-step procedure. Correlations between observed phenotypes and phenotypes predicted by PLS and sparse PLS were similar, but sparse PLS highlighted some genome regions more clearly. Both PLS and sparse PLS were more accurate than pedigree-based BLUP and generally provided lower correlations between observed and predicted phenotypes than did genomic BLUP. Furthermore, PLS and sparse PLS required similar computing time to genomic BLUP for the study of 6 traits.
Affiliation(s)
- C Colombani
- INRA, UR631-SAGA, BP 52627, 31326 Castanet-Tolosan Cedex, France.
25
Simulated data for genomic selection and genome-wide association studies using a combination of coalescent and gene drop methods. G3-GENES GENOMES GENETICS 2012; 2:425-7. [PMID: 22540033 PMCID: PMC3337470 DOI: 10.1534/g3.111.001297] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 09/30/2011] [Accepted: 11/09/2011] [Indexed: 11/18/2022]
Abstract
An approach is described for simulating sequence, genotype, and phenotype data to study genomic selection and genome-wide association studies (GWAS). The simulation method, implemented in a software package called AlphaDrop, can be used to simulate genomic data and phenotypes with flexibility in terms of the historical population structure, recent pedigree structure, distribution of quantitative trait loci effects, and with sequence and single nucleotide polymorphism-phased alleles and genotypes. Ten replicates of a representative scenario used to study genomic selection in livestock were generated and have been made publicly available. The simulated data sets were structured to encompass a spectrum of additive quantitative trait loci effect distributions, relationship structures, and single nucleotide polymorphism chip densities.
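The gene drop step mentioned in the title can be sketched in a few lines. This is a simplified illustration, not AlphaDrop's interface: founder haplotypes are given directly (in the described approach they would come from a coalescent simulation), loci are treated as unlinked, and all names below are hypothetical.

```python
import random


def gamete(parent, rng):
    """Build one gamete: at each locus pick one of the parent's two
    alleles at random (assumption: loci are unlinked, no recombination
    map is modelled)."""
    hap_a, hap_b = parent
    return [rng.choice((a, b)) for a, b in zip(hap_a, hap_b)]


def gene_drop(founders, pedigree, seed=0):
    """Drop founder alleles through a pedigree.

    founders : dict id -> (paternal haplotype, maternal haplotype)
    pedigree : list of (child, sire, dam), parents listed before offspring
    """
    rng = random.Random(seed)
    genomes = dict(founders)
    for child, sire, dam in pedigree:
        genomes[child] = (gamete(genomes[sire], rng),
                          gamete(genomes[dam], rng))
    return genomes


# Toy founders with four biallelic loci (alleles coded 0/1).
founders = {
    "s1": ([0, 0, 0, 0], [1, 1, 1, 1]),   # heterozygous at every locus
    "d1": ([0, 1, 0, 1], [0, 1, 0, 1]),   # homozygous at every locus
}
pedigree = [("c1", "s1", "d1"), ("c2", "s1", "d1"), ("g1", "c1", "c2")]
genomes = gene_drop(founders, pedigree)
```

Because the dam is homozygous at every locus, each of her offspring must carry her haplotype unchanged, which gives a simple sanity check on the inheritance logic.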
26
Bastiaansen JWM, Coster A, Calus MPL, van Arendonk JAM, Bovenhuis H. Long-term response to genomic selection: effects of estimation method and reference population structure for different genetic architectures. Genet Sel Evol 2012; 44:3. [PMID: 22273519 PMCID: PMC3305523 DOI: 10.1186/1297-9686-44-3] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 02/25/2011] [Accepted: 01/24/2012] [Indexed: 11/18/2022] Open
Abstract
Background Genomic selection has become an important tool in the genetic improvement of animals and plants. The objective of this study was to investigate the impacts of breeding value estimation method, reference population structure, and trait genetic architecture, on long-term response to genomic selection without updating marker effects. Methods Three methods were used to estimate genomic breeding values: a BLUP method with relationships estimated from genome-wide markers (GBLUP), a Bayesian method (BM), and a partial least squares regression method (PLSR). A shallow (individuals from one generation) or deep reference population (individuals from five generations) was used with each method. The effects of the different selection approaches were compared under four different genetic architectures for the trait under selection. Selection was based on one of the three genomic breeding values, on pedigree BLUP breeding values, or performed at random. Selection continued for ten generations. Results Differences in long-term selection response were small. For a genetic architecture with a very small number of three to four quantitative trait loci (QTL), the Bayesian method achieved a response that was 0.05 to 0.1 genetic standard deviation higher than other methods in generation 10. For genetic architectures with approximately 30 to 300 QTL, PLSR (shallow reference) or GBLUP (deep reference) had an average advantage of 0.2 genetic standard deviation over the Bayesian method in generation 10. GBLUP resulted in 0.6% and 0.9% less inbreeding than PLSR and BM and on average a one third smaller reduction of genetic variance. Responses in early generations were greater with the shallow reference population while long-term response was not affected by reference population structure. Conclusions The ranking of estimation methods was different with than without selection.
Under selection, applying GBLUP led to lower inbreeding and a smaller reduction of genetic variance while a similar response to selection was achieved. The reference population structure had a limited effect on long-term accuracy and response. Use of a shallow reference population, most closely related to the selection candidates, gave early benefits while in later generations, when marker effects were not updated, the estimation of marker effects based on a deeper reference population did not pay off.
Affiliation(s)
- John W M Bastiaansen
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands.
27
Abstract
Hierarchical mixed effects models have been demonstrated to be powerful for predicting genomic merit of livestock and plants, on the basis of high-density single-nucleotide polymorphism (SNP) marker panels, and their use is being increasingly advocated for genomic predictions in human health. Two particularly popular approaches, labeled BayesA and BayesB, are based on specifying all SNP-associated effects to be independent of each other. BayesB extends BayesA by allowing a large proportion of SNP markers to be associated with null effects. We further extend these two models to specify SNP effects as being spatially correlated due to the chromosomally proximal effects of causal variants. These two models, which we dub ante-BayesA and ante-BayesB, respectively, are based on a first-order nonstationary antedependence specification between SNP effects. In a simulation study involving 20 replicate data sets, each analyzed at six different SNP marker densities with average LD levels ranging from r² = 0.15 to 0.31, the antedependence methods had significantly (P < 0.01) higher accuracies than their corresponding classical counterparts at higher LD levels (r² > 0.24) with differences exceeding 3%. A cross-validation study was also conducted on the heterogeneous stock mice data resource (http://mus.well.ox.ac.uk/mouse/HS/) using 6-week body weights as the phenotype. The antedependence methods increased cross-validation prediction accuracies by up to 3.6% compared to their classical counterparts (P < 0.001). Finally, we applied our method to other benchmark data sets and demonstrated that the antedependence methods were more accurate than their classical counterparts for genomic predictions, even for individuals several generations beyond the training data.
28
Moser G, Khatkar MS, Hayes BJ, Raadsma HW. Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers. Genet Sel Evol 2010; 42:37. [PMID: 20950478 PMCID: PMC2964565 DOI: 10.1186/1297-9686-42-37] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Received: 05/26/2010] [Accepted: 10/16/2010] [Indexed: 08/26/2023] Open
Abstract
Background At the current price, the use of high-density single nucleotide polymorphism (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI). Methods Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subsets of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length. Results RR and PLSR performed very similarly to predict DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls.
Conclusions Accurate genomic evaluation of the broader bull and cow population can be achieved with a single genotyping assay containing ~3,000 to 5,000 evenly spaced SNP.
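The "evenly spaced" selection strategy described in the Methods can be sketched as follows: split the map into intervals of equal length and keep, within each interval, the SNP with the largest absolute score (for example a regression coefficient). The function name, the toy positions, and the scores are illustrative assumptions, not the study's data or code.

```python
def pick_evenly_spaced(positions, scores, n_intervals):
    """Return the index of the top-ranked SNP (largest |score|) in each
    of n_intervals equal-length intervals along the map. Intervals with
    no SNP contribute nothing."""
    lo, hi = min(positions), max(positions)
    width = (hi - lo) / n_intervals or 1.0   # guard against zero width
    best = {}
    for idx, pos in enumerate(positions):
        # The last position falls exactly on the upper bound, so clamp it.
        interval = min(int((pos - lo) / width), n_intervals - 1)
        current = best.get(interval)
        if current is None or abs(scores[idx]) > abs(scores[current]):
            best[interval] = idx
    return [best[i] for i in sorted(best)]


# Toy example: six SNP on one chromosome, three intervals.
positions = [1.0, 2.0, 4.5, 6.0, 9.0, 10.0]      # map positions
coefs = [0.1, -0.4, 0.2, 0.05, -0.3, 0.25]       # e.g. regression coefficients
subset = pick_evenly_spaced(positions, coefs, n_intervals=3)
```

Replacing `coefs` with minor allele frequencies gives the frequency-based variant of the same strategy, since only the ranking criterion within each interval changes.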
Affiliation(s)
- Gerhard Moser
- Dairy Futures Cooperative Research Centre, Australia.