1
Wolc A, Arango J, Settar P, Fulton JE, O’Sullivan NP, Dekkers JCM, Fernando R, Garrick DJ. Mixture models detect large effect QTL better than GBLUP and result in more accurate and persistent predictions. J Anim Sci Biotechnol 2016; 7:7. [PMID: 26870325 PMCID: PMC4750167 DOI: 10.1186/s40104-016-0066-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 10/09/2015] [Accepted: 01/27/2016] [Indexed: 11/24/2022]
Abstract
BACKGROUND Accurate evaluation of SNP effects is important for genome wide association studies and for genomic prediction. The genetic architecture of quantitative traits differs widely, with some traits exhibiting few if any quantitative trait loci (QTL) with large effects, while other traits have one or several easily detectable QTL with large effects. METHODS Body weight in broilers and egg weight in layers are two examples of traits that have QTL of large effect. A commonly used method for genome wide association studies is to fit a mixture model such as BayesB that assumes some known proportion of SNP effects are zero. In contrast, the most commonly used method for genomic prediction is known as GBLUP, which involves fitting an animal model to phenotypic data with the variance-covariance or genomic relationship matrix among the animals being determined by genome wide SNP genotypes. Genotypes at each SNP are typically weighted equally in determining the genomic relationship matrix for GBLUP. We used the equivalent marker effects model formulation of GBLUP for this study. We compare these two classes of models using egg weight data collected over 8 generations from 2,324 animals genotyped with a 42 K SNP panel. RESULTS Using data from the first 7 generations, both BayesB and GBLUP found the largest QTL in a similar well-recognized QTL region, but this QTL was estimated to account for 24 % of genetic variation with BayesB and less than 1 % with GBLUP. When predicting phenotypes in generation 8 BayesB accounted for 36 % of the phenotypic variation and GBLUP for 25 %. When using only data from any one generation, the same QTL was identified with BayesB in all but one generation but never with GBLUP. Predictions of phenotypes in generations 2 to 7 based on only 295 animals from generation 1 accounted for 10 % phenotypic variation with BayesB but only 6 % with GBLUP. 
Predicting phenotype using only the marker effects in the 1 Mb region that accounted for the largest effect on egg weight from generation 1 data alone accounted for almost 8 % of variation using BayesB but had no predictive power with GBLUP. CONCLUSIONS In the presence of large effect QTL, BayesB did a better job of QTL detection and its genomic predictions were more accurate and persistent than those from GBLUP.
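The equal weighting of SNPs in the genomic relationship matrix can be illustrated with a small sketch. It assumes the widely used construction G = ZZ' / (2 Σ p_j (1 − p_j)), where Z holds 0/1/2 genotype codes centered by twice the allele frequency. The abstract does not state which construction the authors used, and the function name and toy genotypes below are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = Z Z' / (2 * sum_j p_j (1 - p_j)) from 0/1/2 genotype codes.

    Assumptions: every SNP gets the same weight, and allele frequencies are
    estimated from the genotyped animals themselves.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals)
             for j in range(n_snps)]
    # Center each genotype by its expected value 2p.
    z = [[row[j] - 2.0 * freqs[j] for j in range(n_snps)] for row in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic")
    return [[sum(z[a][j] * z[b][j] for j in range(n_snps)) / scale
             for b in range(n_animals)]
            for a in range(n_animals)]


# Toy data (hypothetical): 3 animals, 4 SNPs.
toy_genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 0],
]
G = genomic_relationship_matrix(toy_genotypes)
```

Because every SNP enters the sum with the same weight, a single large-effect QTL contributes no more to G than any other marker, which is consistent with the small share of variance GBLUP assigned to the QTL region in the results above.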
Affiliation(s)
- Anna Wolc
- Department of Animal Science, Iowa State University, 225D Kildee Hall, Ames, IA 50011 USA
- Hy-Line International, Dallas Center, IA USA
- Jack C. M. Dekkers
- Department of Animal Science, Iowa State University, 225D Kildee Hall, Ames, IA 50011 USA
- Rohan Fernando
- Department of Animal Science, Iowa State University, 225D Kildee Hall, Ames, IA 50011 USA
- Dorian J. Garrick
- Department of Animal Science, Iowa State University, 225D Kildee Hall, Ames, IA 50011 USA
2
Waldmann P, Mészáros G, Gredler B, Fuerst C, Sölkner J. Evaluation of the lasso and the elastic net in genome-wide association studies. Front Genet 2013; 4:270. [PMID: 24363662 PMCID: PMC3850240 DOI: 10.3389/fgene.2013.00270] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Received: 06/19/2013] [Accepted: 11/18/2013] [Indexed: 01/23/2023]
Abstract
The number of publications performing genome-wide association studies (GWAS) has increased dramatically. Penalized regression approaches have been developed to overcome the challenges caused by the high dimensional data, but these methods are relatively new in the GWAS field. In this study we have compared the statistical performance of two methods (the least absolute shrinkage and selection operator—lasso and the elastic net) on two simulated data sets and one real data set from a 50 K genome-wide single nucleotide polymorphism (SNP) panel of 5570 Fleckvieh bulls. The first simulated data set displays moderate to high linkage disequilibrium between SNPs, whereas the second simulated data set from the QTLMAS 2010 workshop is biologically more complex. We used cross-validation to find the optimal value of regularization parameter λ with both minimum MSE and minimum MSE + 1SE of minimum MSE. The optimal λ values were used for variable selection. Based on the first simulated data, we found that the minMSE in general picked up too many SNPs. At minMSE + 1SE, the lasso didn't acquire any false positives, but selected too few correct SNPs. The elastic net provided the best compromise between few false positives and many correct selections when the penalty weight α was around 0.1. However, in our simulation setting, this α value didn't result in the lowest minMSE + 1SE. The number of selected SNPs from the QTLMAS 2010 data was after correction for population structure 82 and 161 for the lasso and the elastic net, respectively. In the Fleckvieh data set after population structure correction lasso and the elastic net identified from 1291 to 1966 important SNPs for milk fat content, with major peaks on chromosomes 5, 14, 15, and 20. Hence, we can conclude that it is important to analyze GWAS data with both the lasso and the elastic net and an alternative tuning criterion to minimum MSE is needed for variable selection.
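The two tuning criteria compared in this abstract can be sketched as follows. The sketch assumes the usual reading of "minimum MSE + 1SE": the largest λ whose cross-validated MSE is still within one standard error of the minimum. The function name and the cross-validation numbers are hypothetical and are not taken from the study.

```python
def select_lambdas(lambdas, cv_mse, cv_se):
    """Return (lambda at minimum CV MSE, lambda chosen by the one-SE rule)."""
    best = min(range(len(lambdas)), key=lambda i: cv_mse[i])
    threshold = cv_mse[best] + cv_se[best]
    # Largest penalty whose CV error is still within one SE of the minimum.
    one_se = max(lam for lam, mse in zip(lambdas, cv_mse) if mse <= threshold)
    return lambdas[best], one_se


# Hypothetical cross-validation summary, not values from the study.
lambdas = [0.01, 0.05, 0.10, 0.20, 0.40]
cv_mse = [1.30, 1.12, 1.05, 1.09, 1.25]
cv_se = [0.06, 0.05, 0.05, 0.05, 0.07]
lam_min, lam_1se = select_lambdas(lambdas, cv_mse, cv_se)
```

A larger λ shrinks more coefficients to zero, so the one-SE choice selects fewer SNPs than the minimum-MSE choice, in line with the pattern the abstract reports for the first simulated data set.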
Affiliation(s)
- Patrik Waldmann
- Division of Livestock Sciences, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences Vienna, Austria; Division of Statistics, Department of Computer and Information Science, Linköping University, Linköping, Sweden
- Gábor Mészáros
- Division of Livestock Sciences, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences Vienna, Austria
- Johann Sölkner
- Division of Livestock Sciences, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences Vienna, Austria
3
Schurink A, Wolc A, Ducro BJ, Frankena K, Garrick DJ, Dekkers JCM, van Arendonk JAM. Genome-wide association study of insect bite hypersensitivity in two horse populations in the Netherlands. Genet Sel Evol 2012; 44:31. [PMID: 23110538 PMCID: PMC3524047 DOI: 10.1186/1297-9686-44-31] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 06/19/2012] [Accepted: 10/19/2012] [Indexed: 01/09/2023]
Abstract
Background Insect bite hypersensitivity is a common allergic disease in horse populations worldwide. Insect bite hypersensitivity is affected by both environmental and genetic factors. However, little is known about genes contributing to the genetic variance associated with insect bite hypersensitivity. Therefore, the aim of our study was to identify and quantify genomic associations with insect bite hypersensitivity in Shetland pony mares and Icelandic horses in the Netherlands. Methods Data on 200 Shetland pony mares and 146 Icelandic horses were collected according to a matched case–control design. Cases and controls were matched on various factors (e.g. region, sire) to minimize effects of population stratification. Breed-specific genome-wide association studies were performed using 70 k single nucleotide polymorphisms genotypes. Bayesian variable selection method Bayes-C with a threshold model implemented in GenSel software was applied. A 1 Mb non-overlapping window approach that accumulated contributions of adjacent single nucleotide polymorphisms was used to identify associated genomic regions. Results The percentage of variance explained by all single nucleotide polymorphisms was 13% in Shetland pony mares and 28% in Icelandic horses. The 20 non-overlapping windows explaining the largest percentages of genetic variance were found on nine chromosomes in Shetland pony mares and on 14 chromosomes in Icelandic horses. Overlap in identified associated genomic regions between breeds would suggest interesting candidate regions to follow-up on. Such regions common to both breeds (within 15 Mb) were found on chromosomes 3, 7, 11, 20 and 23. Positional candidate genes within 2 Mb from the associated windows were identified on chromosome 20 in both breeds. Candidate genes are within the equine lymphocyte antigen class II region, which evokes an immune response by recognizing many foreign molecules. 
Conclusions The genome-wide association study identified several genomic regions associated with insect bite hypersensitivity in Shetland pony mares and Icelandic horses. On chromosome 20, associated genomic regions in both breeds were within 2 Mb from the equine lymphocyte antigen class II region. Increased knowledge on insect bite hypersensitivity associated genes will contribute to our understanding of its biology, enabling more efficient selection, therapy and prevention to decrease insect bite hypersensitivity prevalence.
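The 1 Mb non-overlapping window approach described in the Methods can be sketched roughly as below. This is a simplification under stated assumptions: it sums hypothetical per-SNP variance contributions within each window, whereas the software used in the study may compute window variance differently (for example, by accounting for covariance between adjacent SNPs). All names and numbers are illustrative.

```python
from collections import defaultdict

WINDOW_BP = 1_000_000  # 1 Mb window size, as stated in the abstract


def window_shares(snps, window_bp=WINDOW_BP):
    """Sum per-SNP variance contributions within non-overlapping windows.

    snps: iterable of (chromosome, position_bp, variance_contribution).
    Returns {(chromosome, window_index): percent of the summed contributions}.
    """
    totals = defaultdict(float)
    for chrom, pos, contribution in snps:
        totals[(chrom, pos // window_bp)] += contribution
    grand_total = sum(totals.values())
    if grand_total == 0.0:
        raise ValueError("no variance to distribute across windows")
    return {key: 100.0 * value / grand_total for key, value in totals.items()}


def top_windows(shares, n=20):
    """Windows ranked by share of variance, largest first."""
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical SNPs: (chromosome, position in bp, variance contribution).
toy_snps = [
    (20, 32_100_000, 0.30),
    (20, 32_900_000, 0.20),
    (20, 33_050_000, 0.10),
    (3, 5_400_000, 0.40),
]
shares = window_shares(toy_snps)
ranked = top_windows(shares, n=2)
```

Accumulating over a window lets several adjacent SNPs in linkage disequilibrium with the same causal variant count together, rather than splitting the signal across individually weak markers.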
Affiliation(s)
- Anouk Schurink
- Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, Wageningen, 6700 AH, the Netherlands
4
Zeng J, Pszczola M, Wolc A, Strabel T, Fernando RL, Garrick DJ, Dekkers JCM. Genomic breeding value prediction and QTL mapping of QTLMAS2011 data using Bayesian and GBLUP methods. BMC Proc 2012; 6 Suppl 2:S7. [PMID: 22640755 PMCID: PMC3363161 DOI: 10.1186/1753-6561-6-s2-s7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
BACKGROUND The goal of this study was to apply Bayesian and GBLUP methods to predict genomic breeding values (GEBV), map QTL positions and explore the genetic architecture of the trait simulated for the 15th QTL-MAS workshop. METHODS Three methods with models considering dominance and epistasis inheritances were used to fit the data: (i) BayesB with a proportion π = 0.995 of SNPs assumed to have no effect, (ii) BayesCπ, where π is considered as unknown, and (iii) GBLUP, which directly fits animal genetic effects using a genomic relationship matrix. RESULTS BayesB, BayesCπ and GBLUP with various fitted models detected 6, 5, and 4 out of 8 simulated QTL, respectively. All five additive QTL were detected by Bayesian methods. When two QTL were in either coupling or repulsion phase, GBLUP only detected one of them and missed the other. In addition, GBLUP yielded more false positives. One imprinted QTL was detected by BayesB and GBLUP despite that only additive gene action was assumed. This QTL was missed by BayesCπ. None of the methods found two simulated additive-by-additive epistatic QTL. Variance components estimation correctly detected no evidence for dominance gene-action. Bayesian methods predicted additive genetic merit more accurately than GBLUP, and similar accuracies were observed between BayesB and BayesCπ. CONCLUSIONS Bayesian methods and GBLUP mapped QTL to similar chromosome regions but Bayesian methods gave fewer false positives. Bayesian methods can be superior to GBLUP in GEBV prediction when genomic architecture is unknown.
Affiliation(s)
- Jian Zeng
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
- Marcin Pszczola
- Department of Genetics and Animal Breeding, Poznan University of Life Sciences, Poznan, Poland
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Lelystad, The Netherlands
- Anna Wolc
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
- Department of Genetics and Animal Breeding, Poznan University of Life Sciences, Poznan, Poland
- Tomasz Strabel
- Department of Genetics and Animal Breeding, Poznan University of Life Sciences, Poznan, Poland
- Rohan L Fernando
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
- Dorian J Garrick
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
- Jack CM Dekkers
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, USA
5
Demeure O, Filangi O, Elsen JM, Le Roy P. Comparison of the analyses of the XVth QTLMAS common dataset II: QTL analysis. BMC Proc 2012; 6 Suppl 2:S2. [PMID: 22640591 PMCID: PMC3363156 DOI: 10.1186/1753-6561-6-s2-s2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
BACKGROUND The QTLMAS XVth dataset consisted of the pedigrees, marker genotypes and quantitative trait performances of 2,000 phenotyped animals with a half-sib family structure. The trait was regulated by 8 QTL which display additive, imprinting or epistatic effects. This paper aims at comparing the QTL mapping results obtained by six participants of the workshop. METHODS Different regression, GBLUP, LASSO and Bayesian methods were applied for QTL detection. The results of these methods are compared based on the number of correctly mapped QTL, the number of false positives, the accuracy of the QTL location and the estimation of the QTL effect. RESULTS All the simulated QTL, except the interacting QTL on Chr5, were identified by the participants. Depending on the method, 3 to 7 out of the 8 QTL were identified. The distance to the real location and the accuracy of the QTL effect varied to a large extent depending on the methods and complexity of the simulated QTL. CONCLUSIONS While all methods were fairly efficient in detecting QTL with additive effects, it was clear that for non-additive situations, such as parent-of-origin effects or interactions, the BayesC method gave the best results by detecting 7 out of the 8 simulated QTL, with only two false positives and a good precision (less than 1 cM away on average). Indeed, if LASSO could detect QTL even in complex situations, it was associated with too many false positive results to allow for efficient GWAS. GENMIX, a method based on the phylogenies of local haplotypes, also appeared as a promising approach, which however showed a few more false positives when compared with the BayesC method.
Affiliation(s)
- Olivier Demeure
- INRA, UMR1348 PEGASE, Domaine de la Prise, 35590 Saint-Gilles, France.
6
Calus MP, Mulder HA, Veerkamp RF. Estimating genomic breeding values and detecting QTL using univariate and bivariate models. BMC Proc 2011; 5 Suppl 3:S5. [PMID: 21624175 PMCID: PMC3103204 DOI: 10.1186/1753-6561-5-s3-s5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
Abstract
Background Genomic selection is particularly beneficial for difficult or expensive to measure traits. Since multi-trait selection is an important tool to deal with such cases, an important question is what the added value is of multi-trait genomic selection. Methods The simulated dataset, including a quantitative and binary trait, was analyzed with four univariate and bivariate linear models to predict breeding values for juvenile animals. Two models estimated variance components with REML using a numerator (A), or SNP based relationship matrix (G). Two SNP based Bayesian models included one (BayesA) or two distributions (BayesC) for estimated SNP effects. The bivariate BayesC model sampled QTL probabilities for each SNP conditional on both traits. Genotypes were permuted 2,000 times against phenotypes and pedigree, to obtain significance thresholds for posterior QTL probabilities. Genotypes were permuted rather than phenotypes, to retain relationships between pedigree and phenotypes, such that polygenic effects could still be estimated. Results Correlations between estimated breeding values (EBV) of different SNP based models, for juvenile animals, were greater than 0.93 (0.87) for the quantitative (binary) trait. Estimated genetic correlation was 0.71 (0.66) for model G (A). Accuracies of breeding values of SNP based models were for both traits highest for BayesC and lowest for G. Accuracies of breeding values of bivariate models were up to 0.08 higher than for univariate models. The bivariate BayesC model detected 14 out of 32 QTL for the quantitative trait, and 8 out of 22 for the binary trait. Conclusions Accuracy of EBV clearly improved for both traits using bivariate compared to univariate models. BayesC achieved highest accuracies of EBV and was also one of the methods that found most QTL. Permuting genotypes against phenotypes and pedigree in BayesC provided an effective way to derive significance thresholds for posterior QTL probabilities.
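The idea of permuting genotypes against phenotypes to obtain a significance threshold can be sketched as below. Two assumptions are made for illustration only: the test statistic is the absolute SNP-phenotype correlation (a stand-in for the posterior QTL probabilities from BayesC, which would require refitting the model on every permuted data set), and the threshold is taken from the distribution of the genome-wide maximum. The abstract does not specify either detail, and all names and data are hypothetical.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(genotypes, phenotypes, n_perm=200, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of the maximum statistic under permutation.

    Whole genotype rows are shuffled across animals, so the phenotype order
    (and anything tied to it, such as pedigree) is left untouched.
    """
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    rows = list(genotypes)  # shallow copy; the caller's list is not reordered
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(rows)
        maxima.append(max(
            abs_correlation([row[j] for row in rows], phenotypes)
            for j in range(n_snps)
        ))
    maxima.sort()
    index = min(len(maxima) - 1, int((1.0 - alpha) * len(maxima)))
    return maxima[index]


# Hypothetical toy data: 30 animals, 10 SNPs coded 0/1/2.
_rng = random.Random(0)
toy_genotypes = [[_rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(30)]
toy_phenotypes = [_rng.gauss(0.0, 1.0) for _ in range(30)]
threshold = permutation_threshold(toy_genotypes, toy_phenotypes)
```

Shuffling genotype rows rather than phenotypes mirrors the rationale given in the abstract: the link between pedigree and phenotypes is preserved, so polygenic effects could still be estimated in each permuted data set.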
Affiliation(s)
- Mario PL Calus
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Lelystad, Netherlands.