1
Herry F, Hérault F, Lecerf F, Lagoutte L, Doublet M, Picard-Druet D, Bardou P, Varenne A, Burlot T, Le Roy P, Allais S. Restriction site-associated DNA sequencing technologies as an alternative to low-density SNP chips for genomic selection: a simulation study in layer chickens. BMC Genomics 2023; 24:271. [PMID: 37208589 DOI: 10.1186/s12864-023-09321-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 04/18/2023] [Indexed: 05/21/2023]
Abstract
BACKGROUND To reduce the cost of genomic selection, a low-density (LD) single nucleotide polymorphism (SNP) chip can be used in combination with imputation for genotyping selection candidates instead of using a high-density (HD) SNP chip. Next-generation sequencing (NGS) techniques have been increasingly used in livestock species but remain expensive for routine use for genomic selection. An alternative and cost-efficient solution is to use restriction site-associated DNA sequencing (RADseq) techniques to sequence only a fraction of the genome using restriction enzymes. From this perspective, use of RADseq techniques followed by an imputation step on HD chip as alternatives to LD chips for genomic selection was studied in a pure layer line. RESULTS Genome reduction and sequencing fragments were identified on reference genome using four restriction enzymes (EcoRI, TaqI, AvaII and PstI) and a double-digest RADseq (ddRADseq) method (TaqI-PstI). The SNPs contained in these fragments were detected from the 20X sequence data of the individuals in our population. Imputation accuracy on HD chip with these genotypes was assessed as the mean correlation between true and imputed genotypes. Several production traits were evaluated using single-step GBLUP methodology. The impact of imputation errors on the ranking of the selection candidates was assessed by comparing a genomic evaluation based on ancestry using true HD or imputed HD genotyping. The relative accuracy of genomic estimated breeding values (GEBVs) was investigated by considering the GEBVs estimated on offspring as a reference. With AvaII or PstI and ddRADseq with TaqI and PstI, more than 10 K SNPs were detected in common with the HD SNP chip, resulting in an imputation accuracy greater than 0.97. The impact of imputation errors on genomic evaluation of the breeders was reduced, with a Spearman correlation greater than 0.99. Finally, the relative accuracy of GEBVs was equivalent. 
CONCLUSIONS RADseq approaches can be interesting alternatives to low-density SNP chips for genomic selection. With more than 10 K SNPs in common with the SNPs of the HD SNP chip, good imputation and genomic evaluation results can be obtained. However, with real data, heterogeneity between individuals with missing data must be considered.
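The accuracy measure named in this abstract, the mean correlation between true and imputed genotypes, can be sketched as follows. This is a minimal illustration and not the authors' pipeline. It assumes genotypes coded as 0/1/2 allele counts and one Pearson correlation per individual, averaged over individuals (the abstract does not say whether the mean is taken over individuals or over SNPs). The function names and toy genotypes are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if one is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant genotype vector
    return sxy / sqrt(sxx * syy)


def mean_imputation_accuracy(true_genotypes, imputed_genotypes):
    """Average the per-individual correlations, skipping undefined ones."""
    correlations = []
    for true_row, imputed_row in zip(true_genotypes, imputed_genotypes):
        r = pearson(true_row, imputed_row)
        if r is not None:
            correlations.append(r)
    return sum(correlations) / len(correlations)


# Toy data (made up): two individuals, six SNPs, one imputation error in the second.
true_hd = [[0, 1, 2, 1, 0, 2], [2, 2, 1, 0, 1, 0]]
imputed_hd = [[0, 1, 2, 1, 0, 2], [2, 1, 1, 0, 1, 0]]
accuracy = mean_imputation_accuracy(true_hd, imputed_hd)
```

Averaging per individual makes it easy to spot candidates that are imputed poorly; averaging per SNP would instead highlight problematic markers.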
Affiliation(s)
- Florian Herry
- NOVOGEN, 5 rue des Compagnons, Secteur du Vau Ballier, Plédran, 22960, France
- PEGASE, INRAE, Institut Agro, Saint-Gilles, 35590, France
- Philippe Bardou
- SIGENAE, GenPhySE, Université de Toulouse, INRA, ENVT, 24 chemin de Borde-Rouge - Auzeville Tolosane, Castanet Tolosan, 31326, France
- Amandine Varenne
- NOVOGEN, 5 rue des Compagnons, Secteur du Vau Ballier, Plédran, 22960, France
- Thierry Burlot
- NOVOGEN, 5 rue des Compagnons, Secteur du Vau Ballier, Plédran, 22960, France
- Pascale Le Roy
- PEGASE, INRAE, Institut Agro, Saint-Gilles, 35590, France
- Sophie Allais
- PEGASE, INRAE, Institut Agro, Saint-Gilles, 35590, France.
2
Genotyping, the Usefulness of Imputation to Increase SNP Density, and Imputation Methods and Tools. Methods in Molecular Biology (Clifton, N.J.) 2022; 2467:113-138. [PMID: 35451774 DOI: 10.1007/978-1-0716-2205-6_4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
Abstract
Imputation has become standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection and genome-wide association studies: a large number of samples can be genotyped at lower density (and lower cost) and then imputed up to denser marker panels or to sequence level, using information from a limited reference population. Most genotype imputation algorithms use information from relatives and population linkage disequilibrium. A number of imputation software packages were developed originally for human genetics and, more recently, for animal and plant genetics, taking into account pedigree information and very sparse SNP arrays or genotyping-by-sequencing data. In comparison to human populations, the population structures of farmed species and their limited effective sizes allow accurate imputation of high-density genotypes or sequences from very low-density SNP panels and a limited set of reference individuals. Whatever the imputation method, the imputation accuracy, measured by the correct imputation rate or the correlation between true and imputed genotypes, increases with the relatedness of the individual to be imputed to its more densely genotyped ancestors and with its own genotype density. Increasing the imputation accuracy raises the genomic selection accuracy whatever the genomic evaluation method. For given marker densities, the most important factors affecting imputation accuracy are clearly the size of the reference population and the relationship between individuals in the reference and target populations.
3
Impact of Marker Pruning Strategies Based on Different Measurements of Marker Distance on Genomic Prediction in Dairy Cattle. Animals (Basel) 2021; 11:ani11071992. [PMID: 34359120 PMCID: PMC8300388 DOI: 10.3390/ani11071992] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/06/2021] [Revised: 06/27/2021] [Accepted: 06/28/2021] [Indexed: 11/16/2022]
Abstract
Simple Summary The usefulness of genomic prediction (GP) has been widely demonstrated by breeding analyses in livestock, plant and aquatic populations. It is well known that ‘marker density’ is a critical factor affecting the accuracy of GP; however, how to properly measure ‘marker density’ in GP is yet to be determined. With population-level whole-genome sequence data or high-density single nucleotide polymorphism (SNP) data available, this question can be answered more convincingly. In this study, we investigated and discussed the impact of four ‘marker density’ measures that reflect genetic or physical distances between SNPs on the accuracy of GP in a German Holstein dairy cattle population. Our results showed that the degree of variation of physical distance between adjacent SNPs had significant effects on the accuracy of GP, while the genetic distance between SNPs had no relationship with the accuracy of GP. Therefore, for studies based on high-density SNP data, the default strategy of pruning SNPs based on genetic distance is detrimental to heritability estimation and genomic prediction. The results extend the community's knowledge of ‘marker density’ and provide useful suggestions for the application of and research on genomic prediction.
Abstract With the availability of high-density single-nucleotide polymorphism (SNP) data and the development of genotype imputation methods, high-density panel-based genomic prediction (GP) has become possible in livestock breeding. It is generally considered that the genomic estimated breeding value (GEBV) accuracy increases with the marker density, yet studies have shown that the GEBV accuracy does not increase, or even decreases, when high-density panels are used. Therefore, in addition to the SNP number, other measurements of ‘marker density’ seem to have an impact on the GEBV accuracy, and exploring the relationship between the GEBV accuracy and the measurements of ‘marker density’ based on high-density SNP or whole-genome sequence data is important for the field of GP. In this study, we constructed different SNP panels with given SNP numbers (e.g., 1 k) by using the physical distance (PhyD), genetic distance (GenD) and random distance (RanD) between SNPs, respectively, based on the high-density SNP data of a German Holstein dairy cattle population. There are therefore three different panels at each SNP number level. These panels were used to construct GP models to predict fat percentage, milk yield and somatic cell score. Meanwhile, the mean ($\bar{d}$) and variance ($\sigma_d^2$) of the physical distance between SNPs and the mean ($\overline{r^2}$) and variance ($\sigma_{r^2}^2$) of the genetic distance between SNPs in each panel were used as marker density-related measurements, and their influence on the GEBV accuracy was investigated. At the same SNP number level, the $\bar{d}$ of all panels is basically the same, but $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ differ. Therefore, we only investigated the effects of $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ on the GEBV accuracy. The results showed that at a given SNP number level, the GEBV accuracy was negatively correlated with $\sigma_d^2$, but not with $\overline{r^2}$ or $\sigma_{r^2}^2$. Compared with GenD and RanD, the $\sigma_d^2$ of panels constructed by PhyD is smaller. The low and moderate-density panels (< 50 k) constructed by RanD or GenD have a large $\sigma_d^2$, which is not conducive to genomic prediction. The GEBV accuracy of the low and moderate-density panels constructed by PhyD is 3.8–34.8% higher than that of the low and moderate-density panels constructed by RanD and GenD. Panels with 20–30 k SNPs constructed by PhyD can achieve the same or slightly higher GEBV accuracy than high-density SNP panels for all three traits. In summary, the smaller the degree of variation of physical distance between adjacent SNPs, the higher the GEBV accuracy. The low and moderate-density panels constructed by physical distance are beneficial to genomic prediction, while pruning high-density SNP data based on genetic distance is detrimental to genomic prediction. The results provide suggestions for the development of SNP panels and for research on genomic prediction based on whole-genome sequence data.
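The two physical-spacing measures discussed in this abstract, the mean and the variance of the distance between adjacent SNPs, can be computed as in the sketch below. It is illustrative only: it assumes SNP positions on a single chromosome given in base pairs and uses the population variance, which the abstract does not specify. The function names and positions are hypothetical.

```python
from statistics import mean, pvariance


def adjacent_distances(positions):
    """Distances between neighbouring SNPs after sorting by position."""
    ordered = sorted(positions)
    return [right - left for left, right in zip(ordered, ordered[1:])]


def spacing_summary(positions):
    """Return (mean, variance) of the physical distance between adjacent SNPs."""
    distances = adjacent_distances(positions)
    return mean(distances), pvariance(distances)


# Two toy panels (made-up positions) with the same number of SNPs and the same span.
evenly_spaced = [0, 100, 200, 300, 400]
clustered = [0, 10, 20, 30, 400]

even_mean, even_var = spacing_summary(evenly_spaced)
clustered_mean, clustered_var = spacing_summary(clustered)
```

Both toy panels have the same mean distance (100) but very different variances (0 versus 24300), which mirrors the abstract's point that panels with equal SNP numbers share the mean spacing and differ in its variance.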
4
Budhlakoti N, Rai A, Mishra DC. Statistical Approach for Improving Genomic Prediction Accuracy through Efficient Diagnostic Measure of Influential Observation. Sci Rep 2020; 10:8408. [PMID: 32439883 PMCID: PMC7242349 DOI: 10.1038/s41598-020-65323-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/16/2019] [Accepted: 04/28/2020] [Indexed: 11/22/2022]
Abstract
The predictive performance of genomic prediction methods may be adversely affected by the presence of outliers. In agricultural science, an outlier may arise from wrong data imputation, an outlying response, or a series of trials over time or location. Although several statistical procedures for identifying outliers already exist in the literature, identifying true outliers remains a challenge, especially in high-dimensional genomic data. Here we propose an efficient approach for detecting outliers in high-dimensional genomic data, which uses p-value-based combination methods to produce a single p-value for outlier detection. The robustness of our approach was tested on simulated data using evaluation measures such as precision and recall. On real data, detecting the outliers and handling them accordingly with the proposed approach significantly improved the performance of genomic prediction.
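To make the idea of combining several p-values into a single one concrete, the sketch below uses Fisher's method. This is an assumption chosen for illustration: the abstract does not say which combination rules the authors used. Fisher's statistic, minus twice the sum of the log p-values, follows a chi-square distribution with 2k degrees of freedom under the null hypothesis when the k p-values are independent; for even degrees of freedom the tail probability has a closed form, so no external library is needed. The function name is hypothetical.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(p_values)
    statistic = -2.0 * sum(log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function for 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return exp(-half) * total


# Toy example (made-up values): three diagnostics for one observation.
combined = fisher_combined_p([0.04, 0.10, 0.03])
```

An observation would then be flagged as a candidate outlier when the combined p-value falls below a chosen threshold; the threshold and the handling of flagged records are left to the analyst.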
Affiliation(s)
- Neeraj Budhlakoti
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, 110012, New Delhi, India
- Anil Rai
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, 110012, New Delhi, India
- D C Mishra
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, 110012, New Delhi, India.
5
Interest of using imputation for genomic evaluation in layer chicken. Poult Sci 2020; 99:2324-2336. [PMID: 32359567 PMCID: PMC7597443 DOI: 10.1016/j.psj.2020.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2019] [Revised: 12/27/2019] [Accepted: 01/01/2020] [Indexed: 11/21/2022]
Abstract
With the availability of the 600K Affymetrix Axiom high-density (HD) single nucleotide polymorphism (SNP) chip, genomic selection has been implemented in broiler and layer chicken. However, the cost of this SNP chip is too high to genotype all selection candidates. A solution is to develop a low-density SNP chip, at a lower price, and to impute all missing markers. But to routinely implement this solution, the impact of imputation on genomic evaluation accuracy must be studied. It is also interesting to study the consequences of the use of low-density SNP chips in genomic evaluation accuracy. In this perspective, the interest of using imputation in genomic selection was studied in a pure layer line. Two low-density SNP chip designs were compared: an equidistant methodology and a methodology based on linkage disequilibrium. Egg weight, egg shell color, egg shell strength, and albumen height were evaluated with single-step genomic best linear unbiased prediction methodology. The impact of imputation errors or the absence of imputation on the ranking of the male selection candidates was assessed with a genomic evaluation based on ancestry. Thus, genomic estimated breeding values (GEBV) obtained with imputed HD genotypes or low-density genotypes were compared with GEBV obtained with the HD SNP chip. The relative accuracy of GEBV was also investigated by considering as reference GEBV estimated on the offspring. A limited reordering of the breeders, selected on a multitrait index, was observed. Spearman correlations between GEBV on HD genotypes and GEBV on low-density genotypes (with or without imputation) were always higher than 0.94 with more than 3K SNP. For the genetically closer, top 150 individuals for a specific trait, with imputation, the reordering was reduced with correlation higher than 0.94 with more than 3K SNP. 
Without imputation, the correlations remained lower than 0.85 with less than 3K and 16K SNP for equidistant and linkage disequilibrium methodology, respectively. The differences in GEBV correlations between both methodologies were never significant. The conclusions were the same for all studied traits.
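The reordering of candidates described in this abstract is measured with Spearman correlations between GEBV obtained from different genotype sets. A minimal sketch of that statistic follows, assuming two lists of GEBV for the same candidates in the same order; ties receive average ranks. The function names and the GEBV values are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Toy GEBV for five candidates: the top two swap places after imputation.
gebv_hd = [1.2, 0.4, -0.3, 0.9, -1.1]
gebv_imputed = [1.0, 0.5, -0.4, 1.05, -1.0]
rho = spearman(gebv_hd, gebv_imputed)
```

In this toy case a single swap among five candidates gives a correlation of 0.9, which shows how sensitive the statistic is to reordering in small groups such as a top-ranked subset.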
6
Liu T, Luo C, Ma J, Wang Y, Shu D, Su G, Qu H. High-Throughput Sequencing With the Preselection of Markers Is a Good Alternative to SNP Chips for Genomic Prediction in Broilers. Front Genet 2020; 11:108. [PMID: 32174971 PMCID: PMC7056902 DOI: 10.3389/fgene.2020.00108] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/12/2019] [Accepted: 01/30/2020] [Indexed: 11/13/2022]
Abstract
The choice of a genetic marker genotyping platform is important for genomic prediction in livestock and poultry. High-throughput sequencing can produce more genetic markers, but the genotype quality is lower than that obtained with single nucleotide polymorphism (SNP) chips. The aim of this study was to compare the accuracy of genomic prediction between high-throughput sequencing and SNP chips in broilers. In this study, we developed a new SNP marker screening method, the pre-marker-selection (PMS) method, to determine whether an SNP marker can be used for genomic prediction. We also compared it with a method that preselects markers based on results from genome-wide association studies (GWAS). With the two methods, we analysed body weight at the 12th week (BW) and feed conversion ratio (FCR) in a local broiler population. A total of 395 birds were selected from the F2 generation of the population, and 10X specific-locus amplified fragment sequencing (SLAF-seq) and the Illumina Chicken 60K SNP Beadchip were used for genotyping. The genomic best linear unbiased prediction method (GBLUP) was used to predict the genomic breeding values. The accuracy of genomic prediction was validated by the leave-one-out cross-validation method. Without SNP marker screening, the accuracies of the genomic estimated breeding value (GEBV) of BW and FCR were 0.509 and 0.249, respectively, when using SLAF-seq, and the accuracies were 0.516 and 0.232, respectively, when using the SNP chip. With SNP marker screening by the PMS method, the accuracies of GEBV of the two traits were 0.671 and 0.499, respectively, when using SLAF-seq, and 0.605 and 0.422, respectively, when using the SNP chip. Our SNP marker screening method led to an increase of prediction accuracy by 0.089-0.250. With SNP marker screening by the GWAS method, the accuracies of genomic prediction for the two traits were also improved, but the gains in accuracy were smaller than the gains with the PMS method for all traits.
The results from this study indicate that our PMS method can improve the accuracy of GEBV, and that more accurate genomic prediction can be obtained from an increased number of genomic markers when using high-throughput sequencing in local broiler populations. Due to its lower genotyping cost, high-throughput sequencing could be a good alternative to SNP chips for genomic prediction in breeding programmes of local broiler populations.
Affiliation(s)
- Tianfei Liu
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Chenglong Luo
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Jie Ma
- Guangdong Provincial Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Yan Wang
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Dingming Shu
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guosheng Su
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
- Hao Qu
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
7
Hou L, Liang W, Xu G, Huang B, Zhang X, Hu CY, Wang C. Accuracy of genomic prediction using mixed low-density marker panels. Animal Production Science 2020. [DOI: 10.1071/an18503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
Abstract
A low-density single-nucleotide polymorphism (LD-SNP) panel is one effective way to reduce the cost of genomic selection in animal breeding. The present study proposes a new type of LD-SNP panel called the mixed low-density (MLD) panel, which simultaneously considers SNPs with a substantial effect, estimated by Bayes method B (BayesB) from many traits, and an evenly spaced distribution. Simulated and real data were used to compare the imputation accuracy and genomic-selection accuracy of the two types of LD-SNP panels. The genotype imputation results for simulated data showed that the number of quantitative trait loci (QTL) had a limited influence on the imputation accuracy, and only for MLD panels. The evenly spaced (ELD) panel was not affected by the number of QTL. For real data, ELD performed slightly better than MLD when the panel contained 500 or 1000 SNPs. However, this advantage vanished quickly as the density increased. The genomic selection results for simulated data using BayesB showed that MLD performed much better than ELD when the number of QTL was 100. For real data, MLD also outperformed ELD for growth and carcass traits when using BayesB. In conclusion, the MLD strategy is superior to ELD in genomic selection under most situations.
8
Shashkova TI, Martynova EU, Ayupova AF, Shumskiy AA, Ogurtsova PA, Kostyunina OV, Khaitovich PE, Mazin PV, Zinovieva NA. Development of a low-density panel for genomic selection of pigs in Russia. Transl Anim Sci 2019; 4:264-274. [PMID: 32704985 PMCID: PMC6994047 DOI: 10.1093/tas/txz182] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/03/2019] [Accepted: 11/27/2019] [Indexed: 02/07/2023]
Abstract
Genomic selection is routinely used worldwide in agricultural breeding. However, in Russia, it is still not used to its full potential partially due to high genotyping costs. The use of genotypes imputed from the low-density chips (LD-chip) provides a valuable opportunity for reducing the genotyping costs. Pork production in Russia is based on the conventional 3-tier pyramid involving 3 breeds; therefore, the best option would be the development of a single LD-chip that could be used for all of them. Here, we for the first time have analyzed genomic variability in 3 breeds of Russian pigs, namely, Landrace, Duroc, and Large White and generated the LD-chip that can be used in pig breeding with the negligible loss in genotyping quality. We have demonstrated that out of the 3 methods commonly used for LD-chip construction, the block method shows the best results. The imputation quality depends strongly on the presence of close ancestors in the reference population. We have demonstrated that for the animals with both parents genotyped using high-density panels high-quality genotypes (allelic discordance rate < 0.05) could be obtained using a 300 single nucleotide polymorphism (SNP) chip, while in the absence of genotyped ancestors at least 2,000 SNP markers are required. We have shown that imputation quality varies between chromosomes, and it is lower near the chromosome ends and drops with the increase in minor allele frequency. Imputation quality of the individual SNPs correlated well across breeds. Using the same LD-chip, we were able to obtain comparable imputation quality in all 3 breeds, so it may be suggested that a single chip could be used for all of them. Our findings also suggest that the presence of markers with extremely low imputation quality is likely to be explained by wrong mapping of the markers to the chromosomal positions.
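The quality threshold quoted in this abstract is an allelic discordance rate below 0.05. The abstract does not define the measure, so the sketch below assumes a common definition: with genotypes coded as 0/1/2 allele counts, the rate is the number of alleles that differ between true and imputed genotypes divided by the total number of alleles compared. The function name and the genotypes are hypothetical.

```python
def allelic_discordance_rate(true_genotypes, imputed_genotypes):
    """Share of alleles that differ between true and imputed 0/1/2 genotypes."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    # Each genotype carries two alleles; |true - imputed| counts the mismatched ones.
    mismatched = sum(abs(t - i) for t, i in zip(true_genotypes, imputed_genotypes))
    return mismatched / (2 * len(true_genotypes))


# Toy example (made up): ten SNPs, one heterozygote imputed where a homozygote is true.
true_calls = [0, 1, 2, 2, 1, 0, 1, 2, 0, 1]
imputed_calls = [0, 1, 2, 1, 1, 0, 1, 2, 0, 1]
rate = allelic_discordance_rate(true_calls, imputed_calls)
```

Under this definition a homozygote imputed as the opposite homozygote counts as two discordant alleles, so the measure penalises gross errors more than a simple genotype mismatch rate would.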
Affiliation(s)
- Asiya F Ayupova
- Skolkovo Institute of Science and Technology, Moscow, Russia
- Olga V Kostyunina
- Ernst Federal Science Center for Animal Husbandry, Dubrovitsy, Moscow Oblast, Russia
- Pavel V Mazin
- Skolkovo Institute of Science and Technology, Moscow, Russia; Computer Science Department, National Research University Higher School of Economics, Moscow, Russia
- Natalia A Zinovieva
- Ernst Federal Science Center for Animal Husbandry, Dubrovitsy, Moscow Oblast, Russia
9
Abolhassani Targhi MV, Asgari Jafarabadi G, Aminafshar M, Emam Jomeh Kashan N. The effect of genotype imputation and some important factors on the accuracy of genomic prediction and its persistency over time. Gene Reports 2019. [DOI: 10.1016/j.genrep.2019.100425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
10
Genotype imputation from various low-density SNP panels and its impact on accuracy of genomic breeding values in pigs. Animal 2018; 12:2235-2245. [PMID: 29706144 DOI: 10.1017/s175173111800085x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/05/2022]
Abstract
The uptake of genomic selection (GS) by the swine industry is still limited by the costs of genotyping. A feasible alternative to overcome this challenge is to genotype animals using an affordable low-density (LD) single nucleotide polymorphism (SNP) chip panel followed by accurate imputation to a high-density panel. Therefore, the main objective of this study was to screen incremental densities of LD panels in order to systematically identify one that balances the tradeoffs among imputation accuracy, prediction accuracy of genomic estimated breeding values (GEBVs), and genotype density (directly associated with genotyping costs). Genotypes using the Illumina Porcine60K BeadChip were available for 1378 Duroc (DU), 2361 Landrace (LA) and 3192 Yorkshire (YO) pigs. In addition, pseudo-phenotypes (de-regressed estimated breeding values) for five economically important traits were provided for the analysis. The reference population for genotyping imputation consisted of 931 DU, 1631 LA and 2103 YO animals and the remainder individuals were included in the validation population of each breed. A LD panel of 3000 evenly spaced SNPs (LD3K) yielded high imputation accuracy rates: 93.78% (DU), 97.07% (LA) and 97.00% (YO) and high correlations (>0.97) between the predicted GEBVs using the actual 60 K SNP genotypes and the imputed 60 K SNP genotypes for all traits and breeds. The imputation accuracy was influenced by the reference population size as well as the amount of parental genotype information available in the reference population. However, parental genotype information became less important when the LD panel had at least 3000 SNPs. The correlation of the GEBVs directly increased with an increase in imputation accuracy. When genotype information for both parents was available, a panel of 300 SNPs (imputed to 60 K) yielded GEBV predictions highly correlated (⩾0.90) with genomic predictions obtained based on the true 60 K panel, for all traits and breeds. 
For a small reference population that contains no parents of the animals to be imputed, a panel at least as dense as the LD3K is recommended; when both parents are in the reference population, a panel as small as the LD300 might be a feasible option. These findings are of great importance for the development of LD panels for swine in order to reduce genotyping costs, increase the uptake of GS and, therefore, optimize the profitability of the swine industry.
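The evenly spaced low-density panels evaluated in this abstract can be pictured with the sketch below, which picks, for each of n equally spaced target positions, the nearest SNP on a denser map. This is only one plausible way to build such a panel and is not taken from the paper. It assumes positions on a single chromosome in base pairs; the function name and positions are hypothetical.

```python
from bisect import bisect_left


def evenly_spaced_panel(positions, n_snps):
    """Pick up to n_snps positions closest to equally spaced targets along the map."""
    if n_snps < 2:
        raise ValueError("n_snps must be at least 2")
    ordered = sorted(positions)
    if n_snps >= len(ordered):
        return ordered
    start, end = ordered[0], ordered[-1]
    step = (end - start) / (n_snps - 1)
    chosen = []
    for k in range(n_snps):
        target = start + k * step
        idx = bisect_left(ordered, target)
        # Compare the neighbours on either side of the target position.
        candidates = [c for c in (idx - 1, idx) if 0 <= c < len(ordered)]
        best = min(candidates, key=lambda c: abs(ordered[c] - target))
        if ordered[best] not in chosen:  # the same SNP may be nearest to two targets
            chosen.append(ordered[best])
    return chosen


# Toy dense map (made-up positions) reduced to a three-SNP panel.
dense_map = [0, 5, 10, 49, 52, 90, 100]
panel = evenly_spaced_panel(dense_map, 3)
```

Because a SNP can be the nearest neighbour of two targets in sparse regions, the function may return fewer SNPs than requested; a production design would also balance SNP counts across chromosomes and filter on minor allele frequency.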
11
Raoul J, Swan AA, Elsen JM. Using a very low-density SNP panel for genomic selection in a breeding program for sheep. Genet Sel Evol 2017; 49:76. [PMID: 29065868 PMCID: PMC5655911 DOI: 10.1186/s12711-017-0351-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/19/2017] [Accepted: 10/17/2017] [Indexed: 01/11/2023]
Abstract
Background Building an efficient reference population for genomic selection is an issue when the recorded population is small and phenotypes are poorly informed, which is often the case in sheep breeding programs. Using stochastic simulation, we evaluated a genomic design based on a reference population with medium-density genotypes [around 45 K single nucleotide polymorphisms (SNPs)] of dams that were imputed from very low-density genotypes (≤ 1000 SNPs). Methods A population under selection for a maternal trait was simulated using real genotypes. Genetic gains realized from classical selection and genomic selection designs were compared. Genomic selection scenarios that differed in reference population structure (whether or not dams were included in the reference) and genotype quality (medium-density or imputed to medium-density from very low-density) were evaluated. Results The genomic design increased genetic gain by 26% when the reference population was based on sire medium-density genotypes and by 54% when the reference population included both sire and dam medium-density genotypes. When medium-density genotypes of male candidates and dams were replaced by imputed genotypes from very low-density SNP genotypes (1000 SNPs), the increase in gain was 22% for the sire reference population and 42% for the sire and dam reference population. The rate of increase in inbreeding was lower (from − 20 to − 34%) for the genomic design than for the classical design regardless of the genomic scenario. Conclusions We show that very low-density genotypes of male candidates and dams combined with an imputation process result in a substantial increase in genetic gain for small sheep breeding programs.
Affiliation(s)
- Jérôme Raoul
- Institut de l'Elevage, Castanet-Tolosan, France; GenPhySE, INRA, Castanet-Tolosan, France
- Andrew A Swan
- Animal Genetics and Breeding Unit, University of New England, Armidale, Australia
12
Zhang Z, Xu ZQ, Luo YY, Zhang HB, Gao N, He JL, Ji CL, Zhang DX, Li JQ, Zhang XQ. Whole genomic prediction of growth and carcass traits in a Chinese quality chicken population. J Anim Sci 2017; 95:72-80. [PMID: 28177394 DOI: 10.2527/jas.2016.0823] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
By incorporating high-density markers into breeding value prediction models, the whole genomic prediction (WGP) method can effectively accelerate genetic improvement in livestock breeding. However, the performance of WGP varies across species and populations and is affected by the underlying genetic architecture. In particular, very little is known about the performance of WGP for many chicken breeds. Here we estimate the genetic parameters and evaluate the performance of WGP for 18 growth and carcass traits in a Chinese quality chicken population. In total, 435 chickens were systematically phenotyped and genotyped using a 600K genotyping array. Two variance component estimation scenarios, 3 breeding value prediction methods, and 2 validation procedures were compared. The results showed that the heritability of these 18 traits was medium to high (ranging from 0.28 to 0.60) and that deviations existed between the heritability estimated from pedigrees and markers. Compared with conventional breeding methods, WGP could potentially increase the selection accuracy by 20% or more depending on the prediction model used, the trait under consideration, and the genetic connectedness between the training and validation individuals. Our results showed the potential of implementing genomic selection in small breeding herds.
13
Wolc A, Kranis A, Arango J, Settar P, Fulton J, O'Sullivan N, Avendano A, Watson K, Hickey J, de los Campos G, Fernando R, Garrick D, Dekkers J. Implementation of genomic selection in the poultry industry. Anim Front 2016. [DOI: 10.2527/af.2016-0004] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 01/15/2023]
Affiliation(s)
- A. Wolc
- Department of Animal Science, Iowa State University, Ames, IA, USA
- Hy-Line International, Dallas Center, IA, USA
- A. Kranis
- Aviagen Limited, Newbridge, Midlothian, UK
- The Roslin Institute, R(D)SVS, University of Edinburgh, Edinburgh, Midlothian, UK
- J. Arango
- Hy-Line International, Dallas Center, IA, USA
- P. Settar
- Hy-Line International, Dallas Center, IA, USA
- J.E. Fulton
- Hy-Line International, Dallas Center, IA, USA
- J.M. Hickey
- The Roslin Institute, R(D)SVS, University of Edinburgh, Edinburgh, Midlothian, UK
- G. de los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA
- R.L. Fernando
- Department of Animal Science, Iowa State University, Ames, IA, USA
- D.J. Garrick
- Department of Animal Science, Iowa State University, Ames, IA, USA
- J.C.M. Dekkers
- Department of Animal Science, Iowa State University, Ames, IA, USA
| |
Collapse
|
14
|
Moghaddar N, Gore KP, Daetwyler HD, Hayes BJ, van der Werf JHJ. Accuracy of genotype imputation based on random and selected reference sets in purebred and crossbred sheep populations and its effect on accuracy of genomic prediction. Genet Sel Evol 2015; 47:97. [PMID: 26694131 PMCID: PMC4688977 DOI: 10.1186/s12711-015-0175-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 01/28/2015] [Accepted: 11/30/2015] [Indexed: 02/02/2023] Open
Abstract
Background The objectives of this study were to investigate the accuracy of genotype imputation from low (12k) to medium (50k Illumina-Ovine) SNP (single nucleotide polymorphism) densities in purebred and crossbred Merino sheep based on a random or selected reference set and to evaluate the impact of using imputed genotypes on accuracy of genomic prediction. Methods Imputation validation sets were composed of random purebred or crossbred Merinos, while imputation reference sets were of variable sizes and included random purebred or crossbred Merinos or a group of animals that were selected based on high genetic relatedness to animals in the validation set. The Beagle software program was used for imputation and accuracy of imputation was assessed based on the Pearson correlation coefficient between observed and imputed genotypes. Genomic evaluation was performed based on genomic best linear unbiased prediction and its accuracy was evaluated as the Pearson correlation coefficient between genomic estimated breeding values using either observed (12k/50k) or imputed genotypes with varying levels of imputation accuracy and accurate estimated breeding values based on progeny-tests. Results Imputation accuracy increased as the size of the reference set increased. However, accuracy was higher for purebred Merinos that were imputed from other purebred Merinos (on average 0.90 to 0.95 based on 1000 to 3000 animals) than from crossbred Merinos (0.78 to 0.87 based on 1000 to 3000 animals) or from non-Merino purebreds (on average 0.50). The imputation accuracy for crossbred Merinos based on 1000 to 3000 other crossbred Merino ranged from 0.86 to 0.88. Considerably higher imputation accuracy was observed when a selected reference set with a high genetic relationship to target animals was used vs. a random reference set of the same size (0.96 vs. 0.88, respectively). 
Accuracy of genomic prediction based on 50k genotypes imputed with high accuracy (0.88 to 0.99) decreased only slightly (0.0 to 0.67 % across traits) compared to using observed 50k genotypes. Accuracy of genomic prediction based on observed 12k genotypes was higher than accuracy based on lowly accurate (0.62 to 0.86) imputed 50k genotypes.
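The accuracy measure used in this abstract, the Pearson correlation between observed and imputed genotypes, can be illustrated with a minimal sketch. The function name, the 0/1/2 genotype coding, and the toy matrices are assumptions made for illustration; they are not taken from the study or from the Beagle software.

```python
import numpy as np

def imputation_accuracy(observed, imputed):
    """Pearson correlation between observed and imputed genotypes.

    Both inputs are animals x SNPs arrays coded as allele counts (0, 1, 2).
    All genotypes are pooled into one vector before correlating
    (an illustrative choice; per-SNP or per-animal versions also exist).
    """
    observed = np.asarray(observed, dtype=float).ravel()
    imputed = np.asarray(imputed, dtype=float).ravel()
    return float(np.corrcoef(observed, imputed)[0, 1])

# Hypothetical toy data: 3 animals, 4 SNPs.
observed = np.array([[0, 1, 2, 1],
                     [2, 2, 0, 1],
                     [1, 0, 1, 2]])
imputed = observed.copy()
imputed[0, 1] = 2  # one wrongly imputed genotype

print(imputation_accuracy(observed, imputed))
```

With perfect imputation the function returns 1; each wrongly imputed genotype pulls the correlation below 1.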
Affiliation(s)
- Nasir Moghaddar
  - Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.
  - School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia.
- Klint P Gore
  - Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.
  - Animal Genetics & Breeding Unit (AGBU), University of New England, Armidale, NSW, 2351, Australia.
- Hans D Daetwyler
  - Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.
  - Biosciences Research Division, Department of Economic Development, Jobs, Transport and Resources, Bundoora, VIC, Australia.
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia.
- Ben J Hayes
  - Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.
  - Biosciences Research Division, Department of Economic Development, Jobs, Transport and Resources, Bundoora, VIC, Australia.
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia.
- Julius H J van der Werf
  - Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.
  - School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia.
|
15
|
Bolormaa S, Gore K, van der Werf JHJ, Hayes BJ, Daetwyler HD. Design of a low-density SNP chip for the main Australian sheep breeds and its effect on imputation and genomic prediction accuracy. Anim Genet 2015; 46:544-56. [DOI: 10.1111/age.12340] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Accepted: 07/03/2015] [Indexed: 12/20/2022]
Affiliation(s)
- S. Bolormaa
  - AgriBio; Centre for AgriBioscience; DEDJTR; Bundoora VIC 3083 Australia
  - Cooperative Research Centre for Sheep Industry Innovation; Armidale NSW 2351 Australia
- K. Gore
  - School of Environmental and Rural Science; University of New England; Armidale NSW 2351 Australia
- J. H. J. van der Werf
  - Cooperative Research Centre for Sheep Industry Innovation; Armidale NSW 2351 Australia
  - School of Environmental and Rural Science; University of New England; Armidale NSW 2351 Australia
- B. J. Hayes
  - AgriBio; Centre for AgriBioscience; DEDJTR; Bundoora VIC 3083 Australia
  - Cooperative Research Centre for Sheep Industry Innovation; Armidale NSW 2351 Australia
  - School of Applied Systems Biology; La Trobe University; Bundoora VIC 3086 Australia
- H. D. Daetwyler
  - AgriBio; Centre for AgriBioscience; DEDJTR; Bundoora VIC 3083 Australia
  - Cooperative Research Centre for Sheep Industry Innovation; Armidale NSW 2351 Australia
  - School of Applied Systems Biology; La Trobe University; Bundoora VIC 3086 Australia
|
16
|
Heidaritabar M, Calus MPL, Vereijken A, Groenen MAM, Bastiaansen JWM. Accuracy of imputation using the most common sires as reference population in layer chickens. BMC Genet 2015; 16:101. [PMID: 26282557 PMCID: PMC4539854 DOI: 10.1186/s12863-015-0253-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/10/2014] [Accepted: 07/10/2015] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Genotype imputation has become a standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection (GS) and genome-wide association studies (GWAS). We assessed accuracies of imputing 60K genotype data from lower density single nucleotide polymorphism (SNP) panels using a small set of the most common sires in a population of 2140 white layer chickens. Several factors affecting imputation accuracy were investigated, including the size of the reference population, the level of the relationship between the reference and validation populations, and minor allele frequency (MAF) of the SNP being imputed. RESULTS The accuracy of imputation was assessed with different scenarios using 22 and 62 carefully selected reference animals (Ref(22) and Ref(62)). Animal-specific imputation accuracy corrected for gene content was moderate on average (~ 0.80) in most scenarios and low in the 3K to 60K scenario. Maximum average accuracies were 0.90 and 0.93 for the most favourable scenario for Ref(22) and Ref(62) respectively, when SNPs were masked independent of their MAF. SNPs with low MAF were more difficult to impute, and the larger reference population considerably improved the imputation accuracy for these rare SNPs. When Ref(22) was used for imputation, the average imputation accuracy decreased by 0.04 when validation population was two instead of one generation away from the reference and increased again by 0.05 when validation was three generations away. Selecting the reference animals from the most common sires, compared with random animals from the population, considerably improved imputation accuracy for low MAF SNPs, but gave only limited improvement for other MAF classes. 
The allelic R² measure from Beagle software was found to be a good predictor of imputation reliability (correlation ~ 0.8) when the density of validation panel was very low (3K) and the MAF of the SNP and the size of the reference population were not extremely small. CONCLUSIONS Even with a very small number of animals in the reference population, reasonable accuracy of imputation can be achieved. Selecting a set of the most common sires, rather than selecting random animals for the reference population, improves the imputation accuracy of rare alleles, which may be a benefit when imputing with whole genome re-sequencing data.
Affiliation(s)
- Marzieh Heidaritabar
  - Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
- Mario P L Calus
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
- Addie Vereijken
  - Hendrix Genetics Research, Technology and Services B.V., P.O. Box 114, 5830 AC, Boxmeer, the Netherlands.
- Martien A M Groenen
  - Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
- John W M Bastiaansen
  - Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
|
17
|
Abstract
Quality control filtering of single-nucleotide polymorphisms (SNPs) is a key step when analyzing genomic data. Here we present a practical method to identify low-quality SNPs, meaning markers whose genotypes are wrongly assigned for a large proportion of individuals, by estimating the heritability of gene content at each marker, where gene content is the number of copies of a particular reference allele in a genotype of an animal (0, 1, or 2). If there is no mutation at the marker, gene content has an additive heritability of 1 by construction. The method uses restricted maximum likelihood (REML) to estimate heritability of gene content at each SNP and also builds a likelihood-ratio test statistic to test for zero error variance in genotyping. As a by-product, estimates of the allele frequencies of markers at the base population are obtained. Using simulated data with 10% permutation error (4% actual error) in genotyping, the method had a specificity of 0.96 (4% of correct markers are rejected) and a sensitivity of 0.99 (1% of wrong markers are accepted) if markers with heritability lower than 0.975 are discarded. Checking of Mendelian errors resulted in a lower sensitivity (0.84) for the same simulation. The proposed method is further illustrated with a real data set with genotypes from 3534 animals genotyped for 50,433 markers from the Illumina PorcineSNP60 chip and a pedigree of 6473 individuals; those markers underwent very little quality control. A total of 4099 markers with P-values lower than 0.01 were discarded based on our method, with associated estimates of heritability as low as 0.12. Contrary to other techniques, our method uses all information in the population simultaneously, can be used in any population with markers and pedigree recordings, and is simple to implement using standard software for REML estimation. Scripts for its use are provided.
|
18
|
Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications. Animal 2014; 8:1743-53. [PMID: 25045914 DOI: 10.1017/s1751731114001803] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 11/06/2022] Open
Abstract
In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals that are genotyped with low-density SNP panels. The objective of this paper is to review different measures of correctness of imputation, and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and imputation error rates, which count the number of incorrectly imputed alleles, are commonly used measures of imputation correctness. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than imputation error rates, because imputation accuracy does not depend on minor allele frequency (MAF), whereas imputation error rate depends on MAF. Therefore imputation accuracy can be better compared across loci with different MAF. Imputation accuracy depends on the ability to identify the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals with genotypes at the high-density panel, the SNP density on the low- and high-density panel, the MAF of the imputed SNP and whether imputed SNP are located at the end of a chromosome or not. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used that are computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.
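The two measures contrasted in this abstract can be sketched as follows. This is only an illustration under stated assumptions: allele frequencies are estimated from the true genotypes, centring and scaling use 2p and sqrt(2p(1-p)), the error rate is taken as the number of wrong alleles divided by the total number of alleles, and all function names and toy data are hypothetical rather than taken from the review.

```python
import numpy as np

def individual_accuracy(true_g, imputed_g):
    """Per-animal correlation between centred and scaled genotypes.

    Inputs are animals x SNPs arrays of allele counts (0, 1, 2) or dosages.
    Assumption: allele frequency p is estimated from the true genotypes,
    and each SNP is centred by 2p and scaled by sqrt(2p(1-p)).
    """
    true_g = np.asarray(true_g, dtype=float)
    imputed_g = np.asarray(imputed_g, dtype=float)
    p = true_g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)  # monomorphic SNPs cannot be scaled
    scale = np.sqrt(2 * p[keep] * (1 - p[keep]))
    t = (true_g[:, keep] - 2 * p[keep]) / scale
    m = (imputed_g[:, keep] - 2 * p[keep]) / scale
    return np.array([np.corrcoef(t[i], m[i])[0, 1] for i in range(t.shape[0])])

def allele_error_rate(true_g, imputed_g):
    """Share of incorrectly imputed alleles (two alleles per genotype)."""
    true_g = np.asarray(true_g)
    imputed_g = np.asarray(imputed_g)
    return float(np.abs(true_g - imputed_g).sum() / (2 * true_g.size))

# Hypothetical toy data: 4 animals, 5 SNPs.
true_g = np.array([[0, 1, 2, 1, 0],
                   [1, 1, 0, 2, 1],
                   [2, 0, 1, 1, 2],
                   [1, 2, 1, 0, 1]])
imputed_g = true_g.copy()
imputed_g[0, 2] = 1  # one allele imputed wrongly for the first animal

print(individual_accuracy(true_g, imputed_g))
print(allele_error_rate(true_g, imputed_g))
```

In this toy case only the first animal has an accuracy below 1, while the error rate pools all animals into a single number, which is one reason the individual-specific measure is easier to relate to a given candidate's genomic prediction.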
|
19
|
Wellmann R, Preuß S, Tholen E, Heinkel J, Wimmers K, Bennewitz J. Genomic selection using low density marker panels with application to a sire line in pigs. Genet Sel Evol 2013; 45:28. [PMID: 23895218 PMCID: PMC3750593 DOI: 10.1186/1297-9686-45-28] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 07/26/2012] [Accepted: 07/05/2013] [Indexed: 12/13/2022] Open
Abstract
Background Genomic selection has become a standard tool in dairy cattle breeding. However, for other animal species, implementation of this technology is hindered by the high cost of genotyping. One way to reduce the routine costs is to genotype selection candidates with an SNP (single nucleotide polymorphism) panel of reduced density. This strategy is investigated in the present paper. Methods are proposed for the approximation of SNP positions, for selection of SNPs to be included in the low-density panel, for genotype imputation, and for the estimation of the accuracy of genomic breeding values. The imputation method was developed for a situation in which selection candidates are genotyped with an SNP panel of reduced density but have high-density genotyped sires. The dams of selection candidates are not genotyped. The methods were applied to a sire line pig population with 895 German Piétrain boars genotyped with the PorcineSNP60 BeadChip. Results Genotype imputation error rates were 0.133 for a 384 marker panel, 0.079 for a 768 marker panel, and 0.022 for a 3000 marker panel. Error rates for markers with approximated positions were slightly larger. Availability of high-density genotypes for close relatives of the selection candidates reduced the imputation error rate. The estimated decrease in the accuracy of genomic breeding values due to imputation errors was 3% for the 384 marker panel and negligible for larger panels, provided that at least one parent of the selection candidates was genotyped at high-density. Genomic breeding values predicted from deregressed breeding values with low reliabilities were more strongly correlated with the estimated BLUP breeding values than with the true breeding values. This was not the case when a shortened pedigree was used to predict BLUP breeding values, in which the parents of the individuals genotyped at high-density were considered unknown. 
Conclusions Genomic selection with imputation from very low- to high-density marker panels is a promising strategy for the implementation of genomic selection at acceptable costs. A panel size of 384 markers can be recommended for selection candidates of a pig breeding program if at least one parent is genotyped at high-density, but this appears to be the lower bound.
Affiliation(s)
- Robin Wellmann
  - Institute of Animal Husbandry and Animal Breeding, University of Hohenheim, D-70599 Stuttgart, Germany.
|