1
Dadousis C, Ablondi M, Cipolat-Gotet C, van Kaam JT, Finocchiaro R, Marusi M, Cassandro M, Sabbioni A, Summer A. Genomic inbreeding coefficients using imputation genotypes: assessing the effect of ancestral genotyping in Holstein-Friesian dairy cows. J Dairy Sci 2024:S0022-0302(24)00545-9. [PMID: 38490541 DOI: 10.3168/jds.2024-24042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 02/14/2024] [Indexed: 03/17/2024]
Abstract
The objective of this study was to assess how including or excluding the genotypes of a cow's parents when imputing single nucleotide polymorphisms (SNP) affects the estimation of genomic inbreeding coefficients of cows. Imputation (i.e., genotyped plus imputed) genotypes from 68,127 Italian Holstein dairy cows registered in the Italian National Association of Holstein, Brown and Jersey Breeders (ANAFIBJ) were analyzed. Cows were genotyped with the HD Illumina Infinium BovineHD BeadChip and GeneSeek Genomic Profiler HD-150K, and the MD GeneSeek Genomic Profiler 3, GeneSeek Genomic Profiler 4, GeneSeek MD and the Labogena MD. To assess differences among estimators, genomic inbreeding coefficients were estimated with 4 PLINK v1.9 estimators (F, Fhat1, Fhat2, Fhat3), 2 genomic relationship matrix (grm) based estimators (Fgrm and Fgrm2, the latter also including pedigree information) and one estimator based on runs of homozygosity (ROH; FROH). Assuming that the correct genomic inbreeding coefficients should be those estimated from genotyped SNP, a comparison of the genomic inbreeding coefficients estimated either with the genotyped SNP or the SNP after imputation was made. The presence or absence of genotypic information from the sire, dam and maternal grandsire during the imputation was investigated. Genomic inbreeding coefficients estimated with genotyped SNP or SNP after imputation were consistent for F, Fhat3, Fgrm2 and FROH when at least one of the parents was genotyped. Biased (mainly higher) genomic inbreeding coefficients from imputation SNP, compared with what was expected based on actual genotype data, were observed in cows that were genotyped with MD SNP panels whose SNP were poorly represented in the selected imputation SNP data set and that did not have their parents genotyped. For cows genotyped with MD panels, the estimators Fhat1, Fhat2 and Fgrm provided higher genomic inbreeding coefficients from imputation SNP even with both parents and the maternal grandsire genotyped. Overall, FROH was the most robust estimator, followed by F and Fhat3. Our findings suggest that SNP selection, parental genotyping and the choice of estimator should be considered when designing imputation strategies in dairy cattle for estimating genomic inbreeding with imputation SNP. For computing genomic inbreeding coefficients, it is advisable to have at least one parent genotyped and to use an ROH-based estimator.
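The abstract names the estimators but does not give their formulas. As a rough illustration only, the sketch below assumes that the F estimator is the familiar method-of-moments excess-homozygosity statistic, (observed homozygotes - expected homozygotes) / (called SNP - expected homozygotes). The function name, the input layout and the omission of any small-sample correction are assumptions made for this example, not details taken from the study.

```python
def moment_inbreeding(genotypes, allele_freqs):
    """Excess-homozygosity inbreeding coefficient for one animal.

    genotypes    -- allele counts per SNP (0, 1 or 2; None if missing)
    allele_freqs -- frequency of the counted allele at each SNP
    Both inputs are hypothetical; no small-sample correction is applied.
    """
    observed_hom = 0      # homozygous genotypes actually seen
    expected_hom = 0.0    # homozygotes expected under random mating
    n_called = 0          # SNP with a non-missing genotype
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue
        n_called += 1
        if g != 1:
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n_called == expected_hom:
        raise ValueError("no polymorphic SNP with a called genotype")
    return (observed_hom - expected_hom) / (n_called - expected_hom)


# Compare the coefficient from genotyped SNP with the one from imputation SNP.
f_genotyped = moment_inbreeding([0, 2, 1, 2], [0.5, 0.5, 0.5, 0.5])
f_imputed = moment_inbreeding([0, 2, 2, 2], [0.5, 0.5, 0.5, 0.5])
bias = f_imputed - f_genotyped  # positive when imputation inflates F
```

A positive `bias` corresponds to the "mainly higher" coefficients the abstract reports for imputation SNP.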
Affiliation(s)
- Christos Dadousis
  - Department of Veterinary Science, University of Parma, 43126 Parma, Italy
- Michela Ablondi
  - Department of Veterinary Science, University of Parma, 43126 Parma, Italy
- Jan-Thijs van Kaam
  - Associazione Nazionale Allevatori della Razza Frisona, Bruna e Jersey Italiana (ANAFIBJ), 26100 Cremona, Italy
- Raffaella Finocchiaro
  - Associazione Nazionale Allevatori della Razza Frisona, Bruna e Jersey Italiana (ANAFIBJ), 26100 Cremona, Italy
- Maurizio Marusi
  - Associazione Nazionale Allevatori della Razza Frisona, Bruna e Jersey Italiana (ANAFIBJ), 26100 Cremona, Italy
- Martino Cassandro
  - Associazione Nazionale Allevatori della Razza Frisona, Bruna e Jersey Italiana (ANAFIBJ), 26100 Cremona, Italy
- Alberto Sabbioni
  - Department of Veterinary Science, University of Parma, 43126 Parma, Italy
- Andrea Summer
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, 35020 Legnaro (PD), Italy
2
Kriaridou C, Tsairidou S, Fraslin C, Gorjanc G, Looseley ME, Johnston IA, Houston RD, Robledo D. Evaluation of low-density SNP panels and imputation for cost-effective genomic selection in four aquaculture species. Front Genet 2023; 14:1194266. [PMID: 37252666 PMCID: PMC10213886 DOI: 10.3389/fgene.2023.1194266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2023] [Accepted: 04/26/2023] [Indexed: 05/31/2023] Open
Abstract
Genomic selection can accelerate genetic progress in aquaculture breeding programmes, particularly for traits measured on siblings of selection candidates. However, it is not widely implemented in most aquaculture species, and remains expensive due to high genotyping costs. Genotype imputation is a promising strategy that can reduce genotyping costs and facilitate the broader uptake of genomic selection in aquaculture breeding programmes. Genotype imputation can predict ungenotyped SNPs in populations genotyped at a low density (LD), using a reference population genotyped at a high density (HD). In this study, we used datasets of four aquaculture species (Atlantic salmon, turbot, common carp and Pacific oyster), phenotyped for different traits, to investigate the efficacy of genotype imputation for cost-effective genomic selection. The four datasets had been genotyped at HD, and eight LD panels (300-6,000 SNPs) were generated in silico. SNPs were selected to be: (i) evenly distributed according to physical position, (ii) selected to minimise the linkage disequilibrium between adjacent SNPs, or (iii) randomly selected. Imputation was performed with three different software packages (AlphaImpute2, FImpute v.3 and findhap v.4). The results revealed that FImpute v.3 was faster and achieved higher imputation accuracies. Imputation accuracy increased with increasing panel density for both SNP selection methods, reaching correlations greater than 0.95 in the three fish species and 0.80 in Pacific oyster. In terms of genomic prediction accuracy, the LD and the imputed panels performed similarly, reaching values very close to the HD panels, except in the Pacific oyster dataset, where the LD panel performed better than the imputed panel. In the fish species, when LD panels were used for genomic prediction without imputation, selection of markers based on either physical or genetic distance (instead of randomly) resulted in a high prediction accuracy, whereas imputation achieved near maximal prediction accuracy independently of the LD panel, showing higher reliability. Our results suggest that, in fish species, well-selected LD panels may achieve near maximal genomic selection prediction accuracy, and that the addition of imputation will result in maximal accuracy independently of the LD panel. These strategies represent effective and affordable methods to incorporate genomic selection into most aquaculture settings.
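The in silico design described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it builds an LD panel by taking the SNP nearest the centre of equally wide physical bins, and it scores imputation as the Pearson correlation between true and imputed allele counts. The function names and the binning rule are hypothetical and are not the procedure used by the authors or by any of the imputation packages named in the abstract.

```python
import math


def evenly_spaced_panel(positions, n_select):
    """Return indices of SNP nearest the centre of n_select equal-width bins.

    positions -- physical positions on one chromosome, sorted ascending.
    Fewer than n_select indices may come back if two bins share a nearest SNP.
    """
    start, end = positions[0], positions[-1]
    width = (end - start) / n_select
    chosen = []
    for b in range(n_select):
        centre = start + (b + 0.5) * width
        idx = min(range(len(positions)), key=lambda i: abs(positions[i] - centre))
        if idx not in chosen:
            chosen.append(idx)
    return chosen


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Ten HD SNP spaced 10 units apart; keep two of them as the LD panel.
hd_positions = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
ld_panel = evenly_spaced_panel(hd_positions, 2)

# Imputation accuracy for one animal: true versus imputed allele counts.
true_genotypes = [0, 1, 2, 1, 0, 2]
imputed_genotypes = [0, 1, 2, 2, 0, 2]
accuracy = pearson(true_genotypes, imputed_genotypes)
```

In practice the masked SNP would be imputed by one of the packages compared in the study; the correlation step is what turns their output into the accuracies quoted above.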
Affiliation(s)
- Christina Kriaridou
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Smaragda Tsairidou
  - Global Academy of Agriculture and Food Systems, University of Edinburgh, Edinburgh, United Kingdom
- Clémence Fraslin
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Ross D. Houston
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
  - Benchmark Genetics, Penicuik, United Kingdom
- Diego Robledo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
3
Dadousis C, Ablondi M, Cipolat-Gotet C, van Kaam JT, Finocchiaro R, Marusi M, Cassandro M, Sabbioni A, Summer A. Genomic inbreeding coefficients using imputed genotypes: assessing differences among SNP panels in Holstein-Friesian dairy cows. Front Vet Sci 2023; 10:1142476. [PMID: 37187928 PMCID: PMC10180025 DOI: 10.3389/fvets.2023.1142476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 03/31/2023] [Indexed: 05/17/2023] Open
Abstract
The objective of this study was to evaluate the effect of imputation of single nucleotide polymorphisms (SNP) on the estimation of genomic inbreeding coefficients. Imputed genotypes of 68,127 Italian Holstein dairy cows were analyzed. Cows were initially genotyped with two high density (HD) SNP panels, namely the Illumina Infinium BovineHD BeadChip (678 cows; 777,962 SNP) and the Genomic Profiler HD-150K (641 cows; 139,914 SNP), and four medium density (MD) panels: GeneSeek Genomic Profiler 3 (10,679 cows; 26,151 SNP), GeneSeek Genomic Profiler 4 (33,394 cows; 30,113 SNP), GeneSeek MD (12,030 cows; 47,850 SNP) and the Labogena MD (10,705 cows; 41,911 SNP). After imputation, all cows had genomic information on 84,445 SNP. Seven genomic inbreeding estimators were tested: (i) four PLINK v1.9 estimators (F, Fhat1, Fhat2, Fhat3), (ii) two genomic relationship matrix (grm) estimators [VanRaden's 1st method, but with observed allele frequencies (Fgrm), and VanRaden's 3rd method, which is allele-frequency free and pedigree dependent (Fgrm2)], and (iii) a runs of homozygosity (ROH)-based estimator (Froh). Genomic inbreeding coefficients of each SNP panel were compared with genomic inbreeding coefficients derived from the 84,445 imputation SNP. Coefficients of the HD SNP panels were consistent between genotyped and imputed SNP (Pearson correlations ~99%), while variability across SNP panels and estimators was observed in the MD SNP panels, with Labogena MD providing, on average, more consistent estimates. The robustness of Labogena MD can be partly explained by the fact that 97.85% of the SNP of this panel are included in the 84,445 SNP selected by ANAFIBJ for routine genomic imputations, while this percentage for the other MD SNP panels varied between 55 and 60%. The runs of homozygosity estimator was the most robust. Genomic inbreeding estimates using imputation SNP are influenced by the number of SNP of the panel that are included in the imputed SNP set, and the performance of genomic inbreeding estimators depends on the imputation.
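The ROH-based estimator is commonly defined as the fraction of the genome covered by runs of homozygosity. The sketch below illustrates that definition with a deliberately naive run detector: it treats any stretch of consecutive homozygous SNP as a candidate run and keeps it if it passes a SNP-count and a length threshold. The function name, the thresholds and the detector itself are assumptions for this example; the sliding-window ROH detection and the parameter settings used in the study are not described in the abstract and are not reproduced here.

```python
def froh(genotypes, positions, genome_length, min_snps=50, min_length_bp=1_000_000):
    """Fraction of genome_length covered by runs of homozygous SNP.

    genotypes -- allele counts (0, 1 or 2) along one chromosome, no missing calls
    positions -- matching physical positions in base pairs, sorted ascending
    Thresholds are illustrative defaults, not values from the study.
    """
    covered = 0
    run_start = None  # index of the first SNP in the current homozygous run
    # A sentinel heterozygote at the end closes a run that reaches the last SNP.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            n_snps = i - run_start
            length = positions[i - 1] - positions[run_start]
            if n_snps >= min_snps and length >= min_length_bp:
                covered += length
            run_start = None
    return covered / genome_length


# Toy chromosome: one run of four homozygous SNP, then a short run of two.
example = froh(
    genotypes=[0, 0, 2, 2, 1, 0, 2],
    positions=[0, 100, 200, 300, 400, 500, 600],
    genome_length=1000,
    min_snps=3,
    min_length_bp=200,
)
```

Because a run is only broken by a heterozygous call, a single imputation error inside a long run splits it, which is one way the imputation step can feed through to this estimator.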
Affiliation(s)
- Christos Dadousis
  - Department of Veterinary Science, University of Parma, Parma, Italy
  - *Correspondence: Christos Dadousis
- Michela Ablondi
  - Department of Veterinary Science, University of Parma, Parma, Italy
- Jan-Thijs van Kaam
  - Associazione Nazionale Allevatori della Razza Frisona Bruna e Jersey Italiana (ANAFIBJ), Cremona, Italy
- Raffaella Finocchiaro
  - Associazione Nazionale Allevatori della Razza Frisona Bruna e Jersey Italiana (ANAFIBJ), Cremona, Italy
- Maurizio Marusi
  - Associazione Nazionale Allevatori della Razza Frisona Bruna e Jersey Italiana (ANAFIBJ), Cremona, Italy
- Martino Cassandro
  - Associazione Nazionale Allevatori della Razza Frisona Bruna e Jersey Italiana (ANAFIBJ), Cremona, Italy
  - Department of Agronomy, Food, Natural Resources, Animals, and Environment, University of Padova, Legnaro, Italy
- Alberto Sabbioni
  - Department of Veterinary Science, University of Parma, Parma, Italy
- Andrea Summer
  - Department of Veterinary Science, University of Parma, Parma, Italy
4
Dadousis C, Ablondi M, Cipolat-Gotet C, van Kaam JT, Marusi M, Cassandro M, Sabbioni A, Summer A. Genomic inbreeding coefficients using imputed genotypes: Assessing different estimators in Holstein-Friesian dairy cows. J Dairy Sci 2022; 105:5926-5945. [DOI: 10.3168/jds.2021-21125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2021] [Accepted: 03/08/2022] [Indexed: 11/19/2022]
5
Joukhadar R, Daetwyler HD. Data Integration, Imputation, and Meta-analysis for Genome-Wide Association Studies. Methods Mol Biol 2022; 2481:173-183. [PMID: 35641765 DOI: 10.1007/978-1-0716-2237-7_11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Growing genomic and phenotypic datasets require different groups around the world to collaborate and integrate these valuable resources to maximize their benefit and increase reference population sizes for genomic prediction and genome-wide association studies (GWAS). However, different studies use different genotyping techniques, which requires a synchronizing step for the genotyped variants, called "imputation", before the datasets are combined. Optimally, different GWAS datasets can be analysed within a meta-analysis, which uses summary statistics instead of the actual data. This chapter describes the general principles for genotypic imputation and meta-GWAS analysis with a description of study designs and command lines required for such analyses.
Affiliation(s)
- Reem Joukhadar
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Hans D Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
6
Zhang Z, Ma P, Zhang Z, Wang Z, Wang Q, Pan Y. The construction of a haplotype reference panel using extremely low coverage whole genome sequences and its application in genome-wide association studies and genomic prediction in Duroc pigs. Genomics 2021; 114:340-350. [PMID: 34929285 DOI: 10.1016/j.ygeno.2021.12.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/22/2021] [Revised: 10/11/2021] [Accepted: 12/15/2021] [Indexed: 12/30/2022]
Abstract
Extremely low coverage whole genome sequencing (lcWGS) is an economical technique to obtain high-density single nucleotide polymorphisms (SNPs). Here, we explored the feasibility of constructing a haplotype reference panel (lcHRP) using lcWGS and evaluated the effects of lcHRP through a genome-wide association study (GWAS) and genomic prediction in pigs. A total of 297 and 974 Duroc pigs were genotyped using lcWGS and a 50 K SNP array, respectively. We obtained 19,306,498 SNPs using lcWGS with an accuracy of 0.984. With the help of lcHRP, the accuracy of imputation from the SNP array to lcWGS was 0.922. Compared to the SNP array findings, those from the imputation-based GWAS identified more signals across four traits. With the integration of the top 1% imputation-based GWAS findings as genomic features, the accuracies of genomic prediction were improved by 6.0% to 13.2%. This study showed the great potential of lcWGS in the molecular breeding of pigs.
Affiliation(s)
- Zhe Zhang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China
- Peipei Ma
  - Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Zhenyang Zhang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China
- Zhen Wang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China
- Qishan Wang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China
- Yuchun Pan
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China
  - Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
7
Money D, Wilson D, Jenko J, Whalen A, Thorn S, Gorjanc G, Hickey JM. Extending long-range phasing and haplotype library imputation algorithms to large and heterogeneous datasets. Genet Sel Evol 2020; 52:38. [PMID: 32640985 PMCID: PMC7346379 DOI: 10.1186/s12711-020-00558-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/23/2018] [Accepted: 06/26/2020] [Indexed: 12/12/2022] Open
Abstract
Background: We describe the latest improvements to the long-range phasing (LRP) and haplotype library imputation (HLI) algorithms for successful phasing of both datasets with one million individuals and datasets genotyped using different sets of single nucleotide polymorphisms (SNPs). Previous publicly available implementations of the LRP algorithm implemented in AlphaPhase could not phase large datasets due to the computational cost of defining surrogate parents by exhaustive all-against-all searches. Furthermore, the AlphaPhase implementations of LRP and HLI were not designed to deal with large amounts of missing data that are inherent when using multiple SNP arrays.
Methods: We developed methods that avoid the need for all-against-all searches by performing LRP on subsets of individuals and then concatenating the results. We also extended LRP and HLI algorithms to enable the use of different sets of markers, including missing values, when determining surrogate parents and identifying haplotypes. We implemented and tested these extensions in an updated version of AlphaPhase, and compared its performance to the software package Eagle2.
Results: A simulated dataset with one million individuals genotyped with the same 6711 SNPs for a single chromosome took less than a day to phase, compared to more than seven days for Eagle2. The percentage of correctly phased alleles at heterozygous loci was 90.2 and 99.9% for AlphaPhase and Eagle2, respectively. A larger dataset with one million individuals genotyped with 49,579 SNPs for a single chromosome took AlphaPhase 23 days to phase, with 89.9% of alleles at heterozygous loci phased correctly. The phasing accuracy was generally lower for datasets with different sets of markers than with one set of markers. For a simulated dataset with three sets of markers, 1.5% of alleles at heterozygous positions were phased incorrectly, compared to 0.4% with one set of markers.
Conclusions: The improved LRP and HLI algorithms enable AlphaPhase to quickly and accurately phase very large and heterogeneous datasets. AlphaPhase is an order of magnitude faster than the other tested packages, although Eagle2 showed a higher level of phasing accuracy. The speed gain will make phasing achievable for very large genomic datasets in livestock, enabling more powerful breeding and genetics research and application.
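The accuracy figures above are percentages of correctly phased alleles at heterozygous loci. One plausible way to compute such a figure for a single individual is sketched below; it is an assumption about the metric, not the scoring code used in the paper. The function name and data layout are hypothetical, and the sketch scores the better of the two possible haplotype labellings because the order in which a phasing program reports the two haplotypes is arbitrary.

```python
def phasing_accuracy(true_haps, phased_haps):
    """Percentage of heterozygous loci phased consistently with the truth.

    true_haps, phased_haps -- pairs (hap_a, hap_b) of equal-length 0/1 lists.
    Loci that are homozygous in the true haplotypes carry no phase
    information and are ignored.
    """
    true_a, true_b = true_haps
    phased_a, phased_b = phased_haps
    het_loci = [i for i in range(len(true_a)) if true_a[i] != true_b[i]]
    if not het_loci:
        raise ValueError("no heterozygous loci to score")
    # Loci agreeing with the truth as labelled, and with the labels swapped.
    same = sum(1 for i in het_loci
               if phased_a[i] == true_a[i] and phased_b[i] == true_b[i])
    swapped = sum(1 for i in het_loci
                  if phased_a[i] == true_b[i] and phased_b[i] == true_a[i])
    return 100.0 * max(same, swapped) / len(het_loci)


# Four heterozygous loci, one of them phased the wrong way round.
truth = ([0, 1, 0, 1, 1], [1, 0, 1, 0, 1])
phased = ([0, 1, 0, 0, 1], [1, 0, 1, 1, 1])
accuracy_percent = phasing_accuracy(truth, phased)
```

Averaging this value over all simulated individuals would give a dataset-level percentage comparable in form to the 90.2% and 99.9% quoted in the Results.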
Affiliation(s)
- Daniel Money
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- David Wilson
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Janez Jenko
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Andrew Whalen
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Steve Thorn
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- John M Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
8
Rowan TN, Hoff JL, Crum TE, Taylor JF, Schnabel RD, Decker JE. A multi-breed reference panel and additional rare variants maximize imputation accuracy in cattle. Genet Sel Evol 2019; 51:77. [PMID: 31878893 PMCID: PMC6933688 DOI: 10.1186/s12711-019-0519-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 06/13/2019] [Accepted: 12/16/2019] [Indexed: 01/08/2023] Open
Abstract
Background: During the last decade, the use of common-variant array-based single nucleotide polymorphism (SNP) genotyping in the beef and dairy industries has produced an astounding amount of medium-to-low density genomic data. Although low-density assays work well in the context of genomic prediction, they are less useful for detecting and mapping causal variants and the effects of rare variants are not captured. The objective of this project was to maximize the accuracies of genotype imputation from medium- and low-density assays to the marker set obtained by combining two high-density research assays (~850,000 SNPs), the Illumina BovineHD and the GGP-F250 assays, which contains a large proportion of rare and potentially functional variants and for which the assay design is described here. This 850 K SNP set is useful for both imputation to sequence-level genotypes and direct downstream analysis.
Results: We found that a large multi-breed composite imputation reference panel that includes 36,131 samples with either BovineHD and/or GGP-F250 genotypes significantly increased imputation accuracy compared with a within-breed reference panel, particularly at variants with low minor allele frequencies. Individual animal imputation accuracies were maximized when more genetically similar animals were represented in the composite reference panel, particularly with complete 850 K genotypes. The addition of rare variants from the GGP-F250 assay to our composite reference panel significantly increased the imputation accuracy of rare variants that are exclusively present on the BovineHD assay. In addition, we show that an assay marker density of 50 K SNPs balances cost and accuracy for imputation to 850 K.
Conclusions: Using high-density genotypes on all available individuals in a multi-breed reference panel maximized imputation accuracy for tested cattle populations. Admixed animals or those from breeds with a limited representation in the composite reference panel were still imputed at high accuracy, which is expected to further increase as the reference panel expands. We anticipate that the addition of rare variants from the GGP-F250 assay will increase the accuracy of imputation to sequence level.
Affiliation(s)
- Troy N Rowan
  - Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
- Jesse L Hoff
  - Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
- Tamar E Crum
  - Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
- Jeremy F Taylor
  - Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
- Robert D Schnabel
  - Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
  - Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
- Jared E Decker
  - Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
  - Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
9
Butty AM, Sargolzaei M, Miglior F, Stothard P, Schenkel FS, Gredler-Grandl B, Baes CF. Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants. Front Genet 2019; 10:510. [PMID: 31214246 PMCID: PMC6554347 DOI: 10.3389/fgene.2019.00510] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/15/2019] [Accepted: 05/10/2019] [Indexed: 11/29/2022] Open
Abstract
Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and discovery of variants associated with traits of interest for the population. To allow for the use of imputed genotypes for genomic analyses, accuracy of imputation must be high. Accuracy of imputation is influenced by multiple factors, such as size and composition of the reference group, and the allele frequency of variants included. Understanding the use of imputed WGSs prior to the generation of the reference population is important, as accurate imputation might be more focused, for instance, on common or on rare variants. The aim of this study was to present and evaluate new methods to select animals for sequencing relying on a previously genotyped population. The Genetic Diversity Index method optimizes the number of unique haplotypes in the future reference population, while the Highly Segregating Haplotype selection method targets haplotype alleles found throughout the majority of the population of interest. First the WGSs of a dairy cattle population were simulated. The simulated sequences mimicked the linkage disequilibrium level and the variants’ frequency distribution observed in currently available Holstein sequences. Then, reference populations of different sizes, in which animals were selected using both novel methods proposed here as well as two other methods presented in previous studies, were created. Finally, accuracies of imputation obtained with different reference populations were compared against each other. The novel methods were found to have overall accuracies of imputation of more than 0.85. Accuracies of imputation of rare variants reached values above 0.50. 
In conclusion, if imputed sequences are to be used for discovery of novel associations between variants and traits of interest in the population, animals carrying novel information should be selected and, consequently, the Genetic Diversity Index method proposed here may be used. If sequences are to be used to impute the overall genotyped population, a reference population consisting of carriers of common haplotypes, selected using the proposed Highly Segregating Haplotype method, is recommended.
Affiliation(s)
- Adrien M Butty
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
- Mehdi Sargolzaei
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
  - Select Sires Inc., Plain City, OH, United States
- Filippo Miglior
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
- Paul Stothard
  - Department of Agricultural, Food & Nutritional Science, University of Alberta, Edmonton, AB, Canada
- Flavio S Schenkel
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
- Birgit Gredler-Grandl
  - Qualitas AG, Zug, Switzerland
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, Netherlands
- Christine F Baes
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
10
Alipour H, Bai G, Zhang G, Bihamta MR, Mohammadi V, Peyghambari SA. Imputation accuracy of wheat genotyping-by-sequencing (GBS) data using barley and wheat genome references. PLoS One 2019; 14:e0208614. [PMID: 30615624 PMCID: PMC6322752 DOI: 10.1371/journal.pone.0208614] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/26/2018] [Accepted: 11/20/2018] [Indexed: 02/04/2023] Open
Abstract
Genotyping-by-sequencing (GBS) provides high SNP coverage and has recently emerged as a popular technology for genetic and breeding applications in bread wheat (Triticum aestivum L.) and many other plant species. Although GBS can discover millions of SNPs, a high rate of missing data is a major concern for many applications. Accurate imputation of those missing data can significantly improve the utility of GBS data. This study compared imputation accuracies among four genome references including three wheat references (Chinese Spring survey sequence, W7984, and IWGSC RefSeq v1.0) and one barley reference genome by comparing imputed data derived from low-depth sequencing to actual data from high-depth sequencing. After imputation, the average number of imputed data points was the highest in the B genome (~48.99%). The D genome had the lowest imputed data points (~15.02%) but the highest imputation accuracy. Among the four reference genomes, IWGSC RefSeq v1.0 reference provided the most imputed data points, but the lowest imputation accuracy for the SNPs with < 10% minor allele frequency (MAF). The W7984 reference, however, provided the highest imputation accuracy for the SNPs with < 10% MAF.
Affiliation(s)
- Hadi Alipour
  - Department of Agronomy, Kansas State University, Manhattan, Kansas, United States of America
  - Department of Plant Breeding and Biotechnology, Faculty of Agriculture, Urmia University, Urmia, Iran
- Guihua Bai
  - USDA-ARS, Hard Winter Wheat Genetics Research Unit, Manhattan, Kansas, United States of America
- Guorong Zhang
  - Department of Agronomy, Kansas State University, Manhattan, Kansas, United States of America
- Mohammad Reza Bihamta
  - Department of Agronomy and Plant Breeding, Faculty of Agriculture, University of Tehran, Karaj, Iran
- Valiollah Mohammadi
  - Department of Agronomy and Plant Breeding, Faculty of Agriculture, University of Tehran, Karaj, Iran
- Seyed Ali Peyghambari
  - Department of Agronomy and Plant Breeding, Faculty of Agriculture, University of Tehran, Karaj, Iran