1
Khan MZ, Dari G, Khan A, Yu Y. Genetic polymorphisms of TRAPPC9 and CD4 genes and their association with milk production and mastitis resistance phenotypic traits in Chinese Holstein. Front Vet Sci 2022; 9:1008497. [PMID: 36213405 PMCID: PMC9540853 DOI: 10.3389/fvets.2022.1008497] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/31/2022] [Accepted: 08/26/2022] [Indexed: 11/13/2022]
Abstract
The present study was designed to evaluate the association of polymorphisms in the bovine trafficking protein particle complex subunit 9 (TRAPPC9) and cluster of differentiation 4 (CD4) genes with milk production and mastitis resistance phenotypic traits in a Chinese Holstein population. Three single nucleotide polymorphisms (SNPs) in bovine TRAPPC9 (SNP1, Chr14:2484891; SNP2, rs110017379; SNP3, Chr14:2525852) and one SNP in CD4 (SNP4, Chr5:104010752) were screened with the Chinese Cow's SNPs Chip-I (CCSC-I) and genotyped in a population of 312 Chinese Holsteins (156 with mastitis, 156 healthy). The results were analyzed using the general linear model in SAS 9.4. Our analysis revealed that milk protein percentage, somatic cell count (SCC), somatic cell score (SCS), and the serum cytokines interleukin 6 (IL-6) and interferon-gamma (IFN-γ) were significantly (P < 0.05) associated with one or more of the identified SNPs in TRAPPC9 and CD4. Furthermore, the expression of CD4 and TRAPPC9 across SNP genotypes was verified by RT-qPCR. The expression analysis showed that the GG genotype at SNP3 of TRAPPC9 and the TT genotype at SNP4 of CD4 had higher expression levels than the other genotypes. The GG genotype at SNP2 and the TT genotype at SNP3 of TRAPPC9 were associated with higher milk SCC and lower IL-6. Altogether, our findings suggest that SNPs in TRAPPC9 and CD4 could be useful genetic markers in selection for improved milk protein and mastitis resistance phenotypic traits in dairy cattle. We propose that the CCSC-I used in the current study be validated in different and larger dairy cattle populations, not only in China but also in other countries. Moreover, our analyses suggest that, in addition to SCC and SCS, associations of genetic markers with serum cytokines (IL-6, IFN-γ) could also be considered when selecting dairy cattle for genetic resistance to mastitis.
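The abstract analyses both SCC and SCS but does not state how SCS was derived. A widely used convention is the linear score SCS = log2(SCC / 100,000) + 3, with SCC in cells/mL. The sketch below assumes that convention; the function name is illustrative and is not taken from the study.

```python
import math


def somatic_cell_score(scc_cells_per_ml: float) -> float:
    """Convert a somatic cell count (cells/mL) to a somatic cell score.

    Assumes the common linear-score convention SCS = log2(SCC / 100,000) + 3;
    the study abstract does not say which transformation was used.
    """
    if scc_cells_per_ml <= 0:
        raise ValueError("SCC must be positive")
    return math.log2(scc_cells_per_ml / 100_000) + 3


# Each doubling of SCC raises the score by one unit.
for scc in (50_000, 100_000, 200_000, 400_000):
    print(f"SCC {scc:>7} cells/mL -> SCS {somatic_cell_score(scc):.1f}")
```

The log transformation brings the right-skewed SCC distribution closer to normal. This is why SCS, rather than raw SCC, is usually the response variable in linear models such as a GLM.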
Affiliation(s)
- Muhammad Zahoor Khan
- Key Laboratory of Animal Genetics, Breeding, and Reproduction, Ministry of Agriculture and National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Faculty of Veterinary and Animal Sciences, Department of Animal Breeding and Genetics, The University of Agriculture, Dera Ismail Khan, Pakistan
- Gerile Dari
- Key Laboratory of Animal Genetics, Breeding, and Reproduction, Ministry of Agriculture and National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Adnan Khan
- Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Ying Yu
- Key Laboratory of Animal Genetics, Breeding, and Reproduction, Ministry of Agriculture and National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- *Correspondence: Ying Yu
2
Nawaz MY, Bernardes PA, Savegnago RP, Lim D, Lee SH, Gondro C. Evaluation of Whole-Genome Sequence Imputation Strategies in Korean Hanwoo Cattle. Animals (Basel) 2022; 12:ani12172265. [PMID: 36077985 PMCID: PMC9454883 DOI: 10.3390/ani12172265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 08/25/2022] [Accepted: 08/30/2022] [Indexed: 11/29/2022]
Abstract
Simple Summary
In this study, we evaluated various imputation strategies for Korean Hanwoo cattle. A large reference panel consisting of many cattle breeds did not improve imputation accuracy compared with a proportionally small purebred Hanwoo reference. This was because the multi-breed reference did not contain animals sufficiently related to the Hanwoo to improve accuracy; although not detrimental, it in effect only added to the computational burden of imputation. When the Hanwoo were removed from the reference, imputation accuracies were low despite the size of the multi-breed reference. These results suggest that additional sequencing efforts are needed for underrepresented breeds, particularly those less genetically related to the main European breeds.
Abstract
This study evaluated the accuracy of sequence imputation in Hanwoo beef cattle using different reference panels: a large multi-breed reference with no Hanwoo (n = 6269), a much smaller purebred Hanwoo reference (n = 88), and both datasets combined (n = 6357). The target animals were 136 cattle that were both sequenced and genotyped with the Illumina BovineSNP50 v2 (50K). The average imputation accuracy measured by the Pearson correlation (R) was 0.695 with the multi-breed reference, 0.876 with the purebred Hanwoo reference, and 0.887 with the combined data; the average concordance rates (CR) were 88.16%, 94.49%, and 94.84%, respectively. The accuracy gains from adding a large multi-breed reference of 6269 samples to only 88 Hanwoo were marginal; however, the concordance rate for heterozygotes decreased from 85% to 82%, and the concordance rate for SNPs fixed in Hanwoo also decreased from 99.98% to 98.73%. Although the multi-breed panel was large, it was not sufficiently representative of the Hanwoo breed for accurate imputation without the Hanwoo animals. Additionally, we evaluated the value of high-density 700K genotypes (n = 991) as an intermediate step in the imputation process.
The differences in imputation accuracy were negligible between a single-step strategy imputing from 50K directly to sequence and a two-step approach (50K-700K-sequence). We also observed that imputed sequence data can be used as a reference panel for imputation (mean R = 0.9650, mean CR = 98.35%). Finally, we identified 31 poorly imputed genomic regions in the Hanwoo genome and showed that imputation accuracies were particularly low at the chromosome ends.
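The abstract reports accuracy as both the Pearson correlation (R) and the concordance rate (CR), including a heterozygote-only CR. Below is a minimal sketch of both metrics for one vector of true and imputed genotypes. It assumes genotypes are coded as allele counts 0/1/2. The abstract does not say whether values were averaged per SNP or per animal, so the functions can be applied either way. The function names are illustrative.

```python
import math
from typing import Sequence


def pearson_r(true: Sequence[float], imputed: Sequence[float]) -> float:
    """Pearson correlation between true and imputed genotypes coded 0/1/2."""
    n = len(true)
    if n != len(imputed) or n < 2:
        raise ValueError("need two equal-length vectors with at least two values")
    mean_t = sum(true) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true, imputed))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either vector is constant
    return cov / math.sqrt(var_t * var_i)


def concordance_rate(true: Sequence[int], imputed: Sequence[int],
                     heterozygotes_only: bool = False) -> float:
    """Share of imputed genotypes identical to the true genotype.

    With heterozygotes_only=True, only positions whose true genotype is 1
    (heterozygous) are scored.
    """
    pairs = [(t, i) for t, i in zip(true, imputed)
             if not heterozygotes_only or t == 1]
    if not pairs:
        return float("nan")
    return sum(t == i for t, i in pairs) / len(pairs)


true_genotypes = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_genotypes = [0, 1, 2, 2, 0, 2, 1, 0]
print(pearson_r(true_genotypes, imputed_genotypes))
print(concordance_rate(true_genotypes, imputed_genotypes))
print(concordance_rate(true_genotypes, imputed_genotypes, heterozygotes_only=True))
```

CR can remain high at nearly fixed SNPs even if every animal is imputed as the common homozygote. This is one reason R and heterozygote-only CR are worth reporting alongside overall CR.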
Affiliation(s)
- Muhammad Yasir Nawaz
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI 48824, USA
- Correspondence: (M.Y.N.); (C.G.)
- Priscila Arrigucci Bernardes
- Department of Animal Science and Rural Development, Federal University of Santa Catarina, Florianopolis 88034-000, SC, Brazil
- Dajeong Lim
- Animal Genome & Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Seung Hwan Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 305764, Korea
- Cedric Gondro
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Correspondence: (M.Y.N.); (C.G.)
3
Mdyogolo S, MacNeil MD, Neser FWC, Scholtz MM, Makgahlela ML. Assessing accuracy of genotype imputation in the Afrikaner and Brahman cattle breeds of South Africa. Trop Anim Health Prod 2022; 54:90. [PMID: 35133512 DOI: 10.1007/s11250-022-03102-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Accepted: 02/01/2022] [Indexed: 11/26/2022]
Abstract
Imputation may be used to rescue genomic data from animals that would otherwise be eliminated because of a lower than desired call rate. The aim of this study was to compare the accuracy of genotype imputation for Afrikaner, Brahman, and Brangus cattle of South Africa using within- and multiple-breed reference populations. A total of 373 Afrikaner, 309 Brahman, and 101 Brangus cattle were genotyped using the GeneSeek Genomic Profiler 150K panel, which contained 141,746 markers. Markers with MAF ≤ 0.02, call rate ≤ 0.95, or deviation from Hardy-Weinberg equilibrium with P ≤ 0.0001 were excluded from the data, as were animals with a call rate ≤ 0.90. The remaining data comprised 99,086 SNPs for 360 Afrikaner, 75,291 SNPs for 288 Brahman, and 97,897 SNPs for 99 Brangus animals. A total of 7986, 7002, and 7000 SNPs were masked in 50 Afrikaner, 50 Brahman, and 30 Brangus animals, respectively, and then imputed using BEAGLE v3 and FImpute v2. Within-breed imputation yielded accuracies ranging from 89.9% to 96.6% for the three breeds, whereas multiple-breed imputation yielded accuracies from 69.21% to 88.35%. The results showed that population homogeneity (for within-breed imputation) and numerical representation (for across-breed imputation) are crucial for improving imputation accuracy.
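The quality-control thresholds above can be sketched as a small filter. The example assumes a few things the abstract does not state. SNP filters are applied before animal call rates are computed. Hardy-Weinberg deviation is tested with a 1-df chi-square test rather than an exact test. Genotypes are coded 0/1/2, with None for missing calls. The function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Genotype = Optional[int]  # copies of the counted allele (0, 1, 2); None = missing call


def snp_stats(calls: List[Genotype]) -> Tuple[float, float, float]:
    """Return (call rate, minor allele frequency, HWE p-value) for one SNP."""
    observed = [g for g in calls if g is not None]
    call_rate = len(observed) / len(calls)
    n = len(observed)
    if n == 0:
        return call_rate, 0.0, 1.0
    p = sum(observed) / (2 * n)           # frequency of the counted allele
    maf = min(p, 1 - p)
    if maf == 0:
        return call_rate, maf, 1.0        # monomorphic: HWE test not informative
    q = 1 - p
    counts = [observed.count(k) for k in (0, 1, 2)]
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return call_rate, maf, math.erfc(math.sqrt(chi2 / 2))


def quality_control(matrix: List[List[Genotype]], maf_min: float = 0.02,
                    snp_call_min: float = 0.95, hwe_p_min: float = 1e-4,
                    animal_call_min: float = 0.90) -> Tuple[List[int], List[int]]:
    """Return indices of (animals, SNPs) kept; matrix[a][s] is animal a at SNP s.

    SNPs are dropped when MAF, call rate or HWE p-value is at or below its
    threshold; animal call rates are then computed on the retained SNPs.
    """
    n_snps = len(matrix[0])
    keep_snps = []
    for s in range(n_snps):
        call_rate, maf, hwe_p = snp_stats([row[s] for row in matrix])
        if maf > maf_min and call_rate > snp_call_min and hwe_p > hwe_p_min:
            keep_snps.append(s)
    keep_animals = []
    for a, row in enumerate(matrix):
        kept = [row[s] for s in keep_snps]
        if kept and sum(g is not None for g in kept) / len(kept) > animal_call_min:
            keep_animals.append(a)
    return keep_animals, keep_snps
```

The chi-square approximation keeps the sketch dependency-free. With small genotype counts or rare alleles, an exact HWE test is the more reliable choice.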
Affiliation(s)
- S Mdyogolo
- Department of Animal Breeding and Genetics, Agricultural Research Council, Irene, South Africa.
- Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa.
- M D MacNeil
- Department of Animal Breeding and Genetics, Agricultural Research Council, Irene, South Africa
- Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa
- Delta G, Miles City, MT, USA
- F W C Neser
- Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa
- M M Scholtz
- Department of Animal Breeding and Genetics, Agricultural Research Council, Irene, South Africa
- Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa
- M L Makgahlela
- Department of Animal Breeding and Genetics, Agricultural Research Council, Irene, South Africa
- Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa
4
Toro Ospina AM, Aguilar I, Vargas de Oliveira MH, Cruz Dos Santos Correia LE, Vercesi Filho AE, Albuquerque LG, de Vasconcelos Silva JAI. Assessing the accuracy of imputation in the Gyr breed using different SNP panels. Genome 2021; 64:893-899. [PMID: 34057850 DOI: 10.1139/gen-2020-0081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
The aim of this study was to evaluate the accuracy of imputation in a Gyr population using two medium-density panels (Bos taurus and Bos indicus) and to test whether including the Nellore breed increases imputation accuracy in the Gyr population. The database consisted of 289 Gyr females from Brazil genotyped with the GGP Bovine LDv4 chip containing 30 000 SNPs and 158 Gyr females from Colombia genotyped with the GGP indicus chip containing 35 000 SNPs. A customized low-density panel of 9109 SNPs (9K) was created to test imputation accuracy in the Gyr populations; 604 Nellore animals with genotypes for the LD SNPs tested in the scenarios were included in the reference population. Four scenarios were tested: LD9K_30KGIR, LD9K_35INDGIR, LD9K_30KGIR_NEL, and LD9K_35INDGIR_NEL. Principal component analysis (PCA) was computed for the genomic matrix, and sample-specific imputation accuracies were calculated using Pearson's correlation (CS) and the concordance rate (CR) for imputed genotypes. The PCA results demonstrated the genomic relationship between the Colombian and Brazilian Gyr populations. CS and CR ranged from 0.88 to 0.94 and from 0.93 to 0.96, respectively. Among the scenarios tested, the highest CS (0.94) was observed for LD9K_30KGIR. These results highlight the importance of the choice of chip for imputation in the Gyr breed. However, SNP variation may reduce imputation accuracy even when a chip designed for the Bos indicus subspecies is used.
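The abstract states only that PCA was computed for the genomic matrix. The following is a minimal pure-Python sketch under explicit assumptions: genotypes are coded 0/1/2, SNP columns are mean-centred but not scaled by allele frequency, and PC1 is taken as the leading eigenvector of the animal-by-animal cross-product matrix, found by power iteration. It is not the authors' software, and the toy data are invented.

```python
import math
import random
from typing import List


def first_pc_scores(genotypes: List[List[int]], iterations: int = 200,
                    seed: int = 1) -> List[float]:
    """PC1 scores (one per animal) from a 0/1/2 genotype matrix.

    SNP columns are mean-centred; PC1 is the leading eigenvector of the
    animal-by-animal cross-product matrix K = Z Z^T, found by power iteration.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[c] for row in genotypes) / n for c in range(m)]
    z = [[row[c] - means[c] for c in range(m)] for row in genotypes]
    k = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:          # no variation left to summarise
            break
        v = [x / norm for x in w]
    return v


# Toy data: three animals from each of two hypothetical populations.
geno = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0],
        [2, 2, 2, 2], [2, 2, 1, 2], [2, 1, 2, 2]]
print([round(s, 3) for s in first_pc_scores(geno)])
```

The sign of a principal component is arbitrary. Only the grouping of animals on the same or opposite sides of zero is interpretable.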
Affiliation(s)
- Ignacio Aguilar
- Instituto Nacional de Investigación Agropecuaria, INIA, Montevideo, Uruguay
- Lucia Galvão Albuquerque
- Faculdade de Ciências Agrárias e Veterinárias - Unesp, CEP 14.884-900, Jaboticabal, São Paulo, Brasil
5
Geibel J, Reimer C, Pook T, Weigend S, Weigend A, Simianer H. How imputation can mitigate SNP ascertainment Bias. BMC Genomics 2021; 22:340. [PMID: 33980139 PMCID: PMC8114708 DOI: 10.1186/s12864-021-07663-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/20/2021] [Accepted: 04/28/2021] [Indexed: 12/30/2022]
Abstract
Background
Population genetic studies based on genotyped single nucleotide polymorphisms (SNPs) are influenced by the non-random selection of the SNPs included on the genotyping arrays used. The resulting bias, relative to whole-genome sequencing (WGS) data, in estimates of allele frequency spectra and of population genetic parameters such as heterozygosity and genetic distances is known as SNP ascertainment bias. Full correction for this bias requires detailed knowledge of the array design process, which is often not available in practice. This study proposes an alternative approach that mitigates the ascertainment bias of a large set of genotyped individuals by using information from a small set of sequenced individuals via imputation, without the need for prior knowledge of the array design.
Results
The strategy was first tested by simulating additional ascertainment bias with a set of 1566 chickens from 74 populations that were genotyped for the positions of the Affymetrix Axiom™ 580 k Genome-Wide Chicken Array. Imputation accuracy was consistently higher for populations used for SNP discovery during the simulated array design process. Reference sets with at least one individual per population in the study set led to a strong correction of ascertainment bias for estimates of expected and observed heterozygosity, Wright's Fixation Index, and Nei's Standard Genetic Distance. In contrast, unbalanced reference sets (with populations overrepresented relative to the study set) introduced a new bias towards the reference populations. Finally, the array genotypes were imputed to WGS using two reference sets, of 74 individuals (one per population) and of 98 individuals (including additional commercial chickens), and compared with a mixture of individually sequenced and pool-sequenced populations. Imputation reduced the slope between heterozygosity estimates from array data and from WGS data from 1.94 to 1.26 with the smaller, balanced reference panel and to 1.44 with the larger but unbalanced reference panel. This generally supported the simulation results but was less favorable, arguing for a larger reference panel when imputing to WGS.
Conclusions
The results highlight the potential of imputation for mitigating SNP ascertainment bias but also underline the need for unbiased reference sets.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-021-07663-6.
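Two of the statistics the study corrects for ascertainment bias, expected heterozygosity and Nei's standard genetic distance, can be computed directly from allele frequencies. The sketch below assumes biallelic SNPs, with px[k] and py[k] giving the frequency of the same allele at locus k in populations X and Y. It uses D = -ln(Jxy / sqrt(Jx * Jy)), with the J terms averaged over loci. The function names are illustrative, and this is not the authors' pipeline.

```python
import math
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Mean expected heterozygosity 2p(1 - p) across biallelic SNPs."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def observed_heterozygosity(genotypes: Sequence[int]) -> float:
    """Share of heterozygous calls (coded 1) among 0/1/2 genotype calls."""
    return sum(g == 1 for g in genotypes) / len(genotypes)


def nei_standard_distance(px: Sequence[float], py: Sequence[float]) -> float:
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    px[k] and py[k] are the frequencies of the same allele at biallelic
    locus k in populations X and Y; J terms are averaged over loci.
    """
    if len(px) != len(py) or not px:
        raise ValueError("need equal, non-empty frequency lists")
    loci = len(px)
    jx = sum(p * p + (1 - p) ** 2 for p in px) / loci
    jy = sum(q * q + (1 - q) ** 2 for q in py) / loci
    jxy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(px, py)) / loci
    if jxy == 0:
        return math.inf  # populations fixed for different alleles at every locus
    return -math.log(jxy / math.sqrt(jx * jy))


print(expected_heterozygosity([0.5, 0.1]))
print(nei_standard_distance([0.9, 0.5], [0.1, 0.5]))
```

When array-based frequencies are computed from ascertained SNPs, these estimates drift away from their WGS counterparts. Comparing array-based and WGS-based values, as in the slopes reported above, quantifies that drift.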
Affiliation(s)
- Johannes Geibel
- Department of Animal Sciences, Animal Breeding and Genetics Group, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Christian Reimer
- Department of Animal Sciences, Animal Breeding and Genetics Group, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Torsten Pook
- Department of Animal Sciences, Animal Breeding and Genetics Group, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Steffen Weigend
- Center for Integrated Breeding Research, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Institute of Farm Animal Genetics, Friedrich-Loeffler-Institut, Höltystrasse 10, 31535, Neustadt-Mariensee, Germany
- Annett Weigend
- Institute of Farm Animal Genetics, Friedrich-Loeffler-Institut, Höltystrasse 10, 31535, Neustadt-Mariensee, Germany
- Henner Simianer
- Department of Animal Sciences, Animal Breeding and Genetics Group, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Center for Integrated Breeding Research, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
6
Gebrehiwot NZ, Aliloo H, Strucken EM, Marshall K, Al Kalaldeh M, Missohou A, Gibson JP. Inference of Ancestries and Heterozygosity Proportion and Genotype Imputation in West African Cattle Populations. Front Genet 2021; 12:584355. [PMID: 33841491 PMCID: PMC8025404 DOI: 10.3389/fgene.2021.584355] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/17/2020] [Accepted: 02/22/2021] [Indexed: 11/24/2022]
Abstract
Several studies have evaluated computational methods that infer haplotypes from population genotype data in European cattle populations. However, little is known about how well these methods perform in African indigenous and crossbred populations. This study investigates (1) global and local ancestry inference, (2) heterozygosity proportion estimation, and (3) genotype imputation in West African indigenous and crossbred cattle populations. Principal component analysis (PCA), ADMIXTURE, and LAMP-LD were used to analyse a medium-density single nucleotide polymorphism (SNP) dataset from Senegalese crossbred cattle. Reference SNP data of East and West African indigenous and crossbred cattle populations were used to investigate the accuracy of imputation from low to medium density and from medium to high density using Minimac v3. The first two principal components differentiated Bos indicus from European Bos taurus, and African Bos taurus from the other breeds. Whether two or three ancestral breeds were assumed for the Senegalese crossbreds, breed proportion estimates from ADMIXTURE and LAMP-LD were highly correlated (r ≥ 0.981). The observed ancestral origin heterozygosity proportion in putative F1 crosses was close to the expected value of 1.0 and clearly differentiated F1 animals from all other crosses. Imputation accuracies in crossbred animals (estimated as the correlation between imputed and real genotypes) ranged from 0.142 to 0.717 for imputation from low to medium density and from 0.478 to 0.899 for imputation from medium to high density. Imputation accuracy was generally higher when the reference data came from the same geographical region as the target population and when crossbred reference data were used to impute crossbred genotypes. The lowest imputation accuracies were observed for indigenous breed genotypes.
This study shows that ancestral origin heterozygosity can be estimated with high accuracy and will be far superior to observed individual heterozygosity for estimating heterosis in African crossbred populations. High imputation accuracy could not be achieved in West African crossbred or indigenous populations using reference data sets from East Africa; population-specific genotyping with high-density SNP assays is required to improve imputation.
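Ancestral origin heterozygosity is the share of loci at which an animal's two haplotypes descend from different ancestral breeds. An F1 of two pure breeds is therefore expected to score 1.0 everywhere. The minimal sketch below assumes that local ancestry calls, such as those LAMP-LD produces, are already available as one pair of labels per locus. The labels and function names are hypothetical. The example shows why this measure separates F1 animals from later crosses that have the same overall breed composition.

```python
from collections import Counter
from typing import Dict, List, Tuple

LocalAncestry = List[Tuple[str, str]]  # per locus: ancestry label of each haplotype


def origin_heterozygosity(calls: LocalAncestry) -> float:
    """Share of loci whose two haplotypes descend from different ancestral breeds."""
    return sum(a != b for a, b in calls) / len(calls)


def breed_proportions(calls: LocalAncestry) -> Dict[str, float]:
    """Genome-wide breed composition from the same local ancestry calls."""
    counts = Counter(label for pair in calls for label in pair)
    total = 2 * len(calls)
    return {breed: n / total for breed, n in counts.items()}


f1 = [("indicine", "taurine")] * 4
f2 = [("taurine", "taurine"), ("indicine", "taurine"),
      ("indicine", "indicine"), ("taurine", "indicine")]
for name, calls in (("F1", f1), ("F2", f2)):
    print(name, origin_heterozygosity(calls), breed_proportions(calls))
```

Both toy animals are 50% indicine. However, only the F1 has every locus heterozygous for ancestral origin, which is the property the study used to identify F1 crosses.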
Affiliation(s)
- Netsanet Z Gebrehiwot
- Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
- Hassan Aliloo
- Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
- Eva M Strucken
- Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
- Karen Marshall
- International Livestock Research Institute and Centre for Tropical Livestock Genetics and Health, Nairobi, Kenya
- Mohammad Al Kalaldeh
- Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
- Ayao Missohou
- L'École Inter-États des Sciences et Médecine Vétérinaires de Dakar (EISMV), Dakar, Senegal
- John P Gibson
- Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
7
Al-Khudhair A, VanRaden PM, Null DJ, Li B. Marker selection and genomic prediction of economically important traits using imputed high-density genotypes for 5 breeds of dairy cattle. J Dairy Sci 2021; 104:4478-4485. [PMID: 33612229 DOI: 10.3168/jds.2020-19260] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/09/2020] [Accepted: 11/22/2020] [Indexed: 11/19/2022]
Abstract
Marker sets used in US dairy genomic predictions were previously expanded by including high-density (HD) or sequence markers with the largest effects, for the Holstein breed only. At that time, non-Holstein breeds lacked enough HD-genotyped animals to form a reference population and thus were not included in the genomic prediction. Recently, the number of non-Holstein animals genotyped with HD panels reached an acceptable level for imputation and marker selection, allowing HD genomic prediction and HD marker selection for Holsteins plus 4 other breeds. Genotypes for 351,461 Holsteins, 347,570 Jerseys, 42,346 Brown Swiss, 9,364 Ayrshires (including Red dairy cattle), and 4,599 Guernseys were imputed to an HD marker list of 643,059 SNPs. The separate HD reference populations included Illumina BovineHD (San Diego, CA) genotypes for 4,012 Holsteins, 407 Jerseys, 181 Brown Swiss, 527 Ayrshires, and 147 Guernseys. The 643,059 variants included the HD SNPs and all 79,254 (80K) genetic markers and QTL used in routine national genomic evaluations. Before imputation, approximately 91 to 97% of genotypes were unknown for each breed; after imputation, 1.1% of Holstein, 3.2% of Jersey, 6.7% of Brown Swiss, 4.8% of Ayrshire, and 4.2% of Guernsey alleles remained unknown because some lower-density haplotypes had no matching HD haplotype. The higher remaining missing rates in non-Holstein breeds are mainly due to fewer HD-genotyped animals in the imputation reference populations. Allele effects for up to 39 traits were estimated separately within each breed using phenotypic reference populations that included up to 6,157 Jersey males and 110,130 Jersey females. Correlations of HD with 80K genomic predictions for young animals averaged 0.986, 0.989, 0.985, 0.992, and 0.978 for the Jersey, Ayrshire, Brown Swiss, Guernsey, and Holstein breeds, respectively.
Correlations were highest for yield traits (about 0.991) and lowest for foot angle and rear legs-side view (0.981 and 0.982, respectively). Some HD effects were more than twice as large as the largest 80K SNP effect, and HD markers had larger effects than nearby 80K markers for many breed-trait combinations. Previous studies selected and included markers with large effects for Holstein traits; the newly selected HD markers should also improve non-Holstein and crossbred genomic predictions and were added to official US genomic predictions in April 2020.
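The abstract does not detail how HD markers were chosen beyond "markers with the largest effects". The sketch below assumes that "largest" means largest absolute estimated allele effect and that allele effects have already been estimated. It also treats a genomic prediction as the sum of allele count times allele effect. Marker names, effects, and function names are hypothetical. This is not the official USDA procedure.

```python
from typing import Dict, List, Set


def expand_marker_set(base: Set[str], hd_effects: Dict[str, float], k: int) -> List[str]:
    """Base markers plus the k HD-only markers with the largest absolute effect."""
    candidates = sorted((snp for snp in hd_effects if snp not in base),
                        key=lambda snp: abs(hd_effects[snp]), reverse=True)
    return sorted(base) + candidates[:k]


def genomic_prediction(allele_counts: Dict[str, int], effects: Dict[str, float],
                       markers: List[str]) -> float:
    """Sum of allele count x allele effect over the chosen markers."""
    return sum(allele_counts[snp] * effects[snp] for snp in markers)


effects = {"a": 0.1, "b": -0.2, "c": 0.5, "d": -0.9, "e": 0.05}
markers = expand_marker_set({"a", "b"}, effects, k=2)
counts = {"a": 2, "b": 0, "c": 0, "d": 1, "e": 2}
print(markers, genomic_prediction(counts, effects, markers))
```

Computing predictions for the same animals with and without the added markers gives the kind of HD-versus-80K comparison summarized by the correlations above.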
Affiliation(s)
- A Al-Khudhair
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
- P M VanRaden
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
- D J Null
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
- B Li
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
8
Aliloo H, Clark SA. The impact of reference composition and genome build on the accuracy of genotype imputation in Australian Angus cattle. Anim Prod Sci 2021. [DOI: 10.1071/an21098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Context
Genotype imputation is an effective method to increase the number of SNP markers available for an animal and thereby increase the power of genome-wide association studies and the accuracy of genomic predictions. It is also key to obtaining a common set of markers for all individuals when the original genotypes come from multiple genotyping platforms. High accuracy of imputed genotypes is crucial to their utility.
Aims
In this study, we propose a method for the construction of a common set of medium density markers for imputation, which relies on keeping as much information as possible. We also investigated the impact of changing marker coordinates on the basis of the new bovine genome assembly, ARS-UCD 1.2, on imputation accuracy.
Methods
In total, 49 754 animals with 45 364 single nucleotide polymorphism (SNP) markers were used in a 10-fold cross-validation to compare four imputation scenarios. The four scenarios were based on two alternative designs for the reference dataset: (1) a traditional reference panel created using the overlapping SNPs from five medium-density arrays and (2) a composite reference panel created by combining SNPs across the five arrays. Each reference dataset was used to test imputation accuracy with SNPs aligned on the basis of two genome assemblies (UMD 3.1 and ARS-UCD 1.2).
Key results
Our results showed that a composite reference panel can achieve higher imputation accuracy than a traditional overlap reference. Incorporating mapping information from the recent genome build slightly improved imputation accuracy, especially for lower-density chips.
Conclusions
Markers with unreliable mapping information and animals with low connectedness to the imputation reference dataset benefited the most from the ARS-UCD 1.2 assembly and the composite reference, respectively.
Implications
The presented method is straightforward and can be used to set up optimal imputation for accurate inference of genotypes in Australian Angus cattle.
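The two reference designs differ in which SNPs they retain. The sketch below assumes the traditional panel is the intersection of the arrays' SNP sets and the composite panel is their union, so that SNPs absent from an animal's own array become imputation targets. It also includes a simple 10-fold split. Array and SNP names are hypothetical, and the striding split is deterministic only for illustration; a real cross-validation would shuffle animals first.

```python
from typing import Dict, List, Set


def overlap_panel(arrays: Dict[str, Set[str]]) -> Set[str]:
    """Traditional reference: only SNPs present on every array."""
    return set.intersection(*arrays.values())


def composite_panel(arrays: Dict[str, Set[str]]) -> Set[str]:
    """Composite reference (assumed here to be the union of all arrays' SNPs)."""
    return set.union(*arrays.values())


def to_impute(array: str, arrays: Dict[str, Set[str]]) -> Set[str]:
    """Composite-panel SNPs missing for an animal genotyped on `array`."""
    return composite_panel(arrays) - arrays[array]


def k_fold(animals: List[str], k: int = 10) -> List[List[str]]:
    """Deterministic k-fold split (every k-th animal); shuffle first in practice."""
    return [animals[i::k] for i in range(k)]


arrays = {"chip_A": {"s1", "s2", "s3"}, "chip_B": {"s2", "s3", "s4"}}
print(sorted(overlap_panel(arrays)), sorted(composite_panel(arrays)))
print(sorted(to_impute("chip_B", arrays)))
```

The overlap panel discards every SNP that is absent from even one array. This is the information loss that the composite design, which keeps as much information as possible, is meant to avoid.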
9
Linkage Disequilibrium-Based Inference of Genome Homology and Chromosomal Rearrangements Between Species. G3 (Bethesda) 2020; 10:2327-2343. [PMID: 32434754 PMCID: PMC7341147 DOI: 10.1534/g3.120.401090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
The aim of this study was to analyze the genomic homology between cattle (Bos taurus) and buffaloes (Bubalus bubalis), to propose a rearrangement of the buffalo genome through linkage disequilibrium analyses of buffalo SNP markers referenced in the cattle genome assembly, and to compare this rearrangement with the buffalo genome assembly. A panel of bovine SNPs (single nucleotide polymorphisms) was used for hierarchical, non-hierarchical, and admixture cluster analyses. The linkage disequilibrium information between markers of a specific buffalo panel was then used to infer chromosomal rearrangements. Haplotype diversity and imputation accuracy of the submetacentric chromosomes were also analyzed. The genomic homology between the species enabled us to use the bovine genome assembly to recreate a buffalo genomic reference by rearranging the submetacentric chromosomes. The centromeres of the submetacentric chromosomes exhibited high linkage disequilibrium and low haplotype diversity, which allowed hypotheses about chromosome evolution: it indicated that buffalo submetacentric chromosomes are centric fusions of ancestral acrocentric chromosomes, and a chronology of the fusions was also suggested. Moreover, a linear regression between the buffalo assembly and the rearranged cattle assembly, together with the imputation accuracy, indicated that the rearrangement of the chromosomes was adequate. When using the bovine reference genome assembly, the buffalo submetacentric chromosomes can be rearranged from SNP positions on the corresponding BTA (Bos taurus chromosome): positions on the shorter BTA (the shorter arm of the buffalo chromosome) are given by (shorter BTA length - SNP position in shorter BTA), and positions on the larger BTA by [shorter BTA length + (larger BTA length - SNP position in larger BTA)]. Finally, the proposed linkage disequilibrium-based method can be applied to elucidate chromosomal rearrangement events in other species, with the possibility of better understanding the evolutionary relationships between their genomes.
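The two coordinate formulas given at the end of the abstract can be written as one small function. The argument names are hypothetical, and positions and lengths are assumed to be in base pairs on the bovine assembly. The function implements the formulas exactly as the abstract states them.

```python
def buffalo_position(snp_pos: int, on_shorter_bta: bool,
                     shorter_bta_len: int, larger_bta_len: int) -> int:
    """Coordinate on a rearranged buffalo submetacentric chromosome.

    Implements the two formulas stated in the abstract:
      SNP on the shorter BTA: shorter_bta_len - snp_pos
      SNP on the larger BTA:  shorter_bta_len + (larger_bta_len - snp_pos)
    """
    if on_shorter_bta:
        if not 0 <= snp_pos <= shorter_bta_len:
            raise ValueError("position outside the shorter BTA")
        return shorter_bta_len - snp_pos
    if not 0 <= snp_pos <= larger_bta_len:
        raise ValueError("position outside the larger BTA")
    return shorter_bta_len + (larger_bta_len - snp_pos)


# Hypothetical chromosome lengths of 100 bp and 300 bp, for illustration only.
print(buffalo_position(10, True, 100, 300), buffalo_position(50, False, 100, 300))
```

Every mapped position falls between 0 and the combined length of the two BTA. SNPs from the shorter BTA occupy the first segment and SNPs from the larger BTA occupy the second.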
10
O'Brien AC, Judge MM, Fair S, Berry DP. High imputation accuracy from informative low-to-medium density single nucleotide polymorphism genotypes is achievable in sheep. J Anim Sci 2019; 97:1550-1567. [PMID: 30722011 DOI: 10.1093/jas/skz043] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/28/2018] [Accepted: 01/30/2019] [Indexed: 12/29/2022]
Abstract
The objective of the present study was to quantify the accuracy of imputing medium-density single nucleotide polymorphism (SNP) genotypes from lower-density panels (384 to 12,000 SNPs) derived using alternative methods for selecting the most informative SNPs. Four selection methods, based on genomic characteristics (i.e., minor allele frequency [MAF] and linkage disequilibrium [LD]), were applied separately within each of five sheep breeds (642 Belclare, 645 Charollais, 715 Suffolk, 440 Texel, and 620 Vendeen). The selection methods evaluated were (i) random selection; (ii) splitting the genome into blocks of equal length and selecting SNPs within each block based on MAF and LD patterns; (iii) equidistant location while optimizing MAF; and (iv) a combination of MAF, distance from already selected SNPs, and weak LD with the SNP(s) already selected. All animals were genotyped on the Illumina OvineSNP50 BeadChip containing 51,135 SNPs, of which 44,040 remained after editing. Within each breed, the youngest 100 animals were used as the validation population and the remaining animals as the reference population. Imputation was undertaken under three conditions: (i) SNPs were selected within a given breed and imputed for all breeds individually; (ii) all breeds were used collectively to select SNPs and were all included in the reference population; and (iii) SNPs were selected for each breed separately and imputation was undertaken for all breeds, but the breed from which the SNPs were selected was excluded from the reference population. Regardless of the SNP selection method, as panel density increased the mean animal allele concordance rate improved at a diminishing rate and its variability decreased. The SNP selection method affected imputation accuracy, although the effect diminished as panel density increased.
Overall, the most accurate SNP selection method for panels with <9,000 SNPs was the one based on MAF and LD patterns within genomic blocks. The mean animal allele concordance rate varied from 0.89 in Texel to 0.97 in Vendeen. Imputation accuracy was greater when SNPs were selected and imputed within each breed individually than when SNPs were selected across all breeds and imputed using a multi-breed reference population. Taken together, the results indicate that accurate genotype imputation to medium density is achievable with low-density panels of at least 6,000 SNPs.
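The abstract reports imputation accuracy as the mean animal allele concordance rate. The sketch below shows one common way to compute it, assuming genotypes are coded as allele counts (0, 1, 2) and that each SNP contributes the number of correctly imputed alleles out of two; the abstract does not restate the exact definition, and the function names are illustrative only.

```python
from typing import Sequence


def allele_concordance(true_genos: Sequence[int], imputed_genos: Sequence[int]) -> float:
    """Proportion of alleles imputed correctly for one animal (assumed definition).

    Genotypes are allele counts (0, 1 or 2). A genotype that is off by one
    (e.g. 0 imputed as 1) still carries one correct allele, so each SNP
    contributes 2 - |true - imputed| correct alleles out of 2.
    """
    if len(true_genos) != len(imputed_genos):
        raise ValueError("genotype vectors must have the same length")
    if not true_genos:
        raise ValueError("no SNPs to compare")
    correct = sum(2 - abs(t - i) for t, i in zip(true_genos, imputed_genos))
    return correct / (2 * len(true_genos))


def mean_animal_concordance(animals: dict[str, tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average the per-animal allele concordance over a validation population.

    animals maps a (hypothetical) animal ID to (true genotypes, imputed genotypes).
    """
    rates = [allele_concordance(t, i) for t, i in animals.values()]
    return sum(rates) / len(rates)
```

Counting alleles rather than whole genotypes gives partial credit to heterozygote errors, which is why allele concordance is usually somewhat higher than genotype concordance on the same data.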
Affiliation(s)
- Aine C O'Brien
- Animal and Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland; Laboratory of Animal Reproduction, Department of Biological Sciences, Faculty of Science and Engineering, University of Limerick, Limerick, Ireland
- Michelle M Judge
- Animal and Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland
- Sean Fair
- Laboratory of Animal Reproduction, Department of Biological Sciences, Faculty of Science and Engineering, University of Limerick, Limerick, Ireland
- Donagh P Berry
- Animal and Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland

11
Pégard M, Rogier O, Bérard A, Faivre-Rampant P, Paslier MCL, Bastien C, Jorge V, Sánchez L. Sequence imputation from low density single nucleotide polymorphism panel in a black poplar breeding population. BMC Genomics 2019; 20:302. [PMID: 30999856 PMCID: PMC6471894 DOI: 10.1186/s12864-019-5660-y] [Received: 08/10/2018] [Accepted: 03/29/2019]
Abstract
Background Genomic selection accuracy increases with high SNP (single nucleotide polymorphism) coverage. However, such gains in coverage come at high cost, preventing their prompt operational implementation by breeders. Low-density panels imputed to higher densities offer a cheaper alternative during the first stages of genomic resource development. Our study is the first to explore imputation in a tree species: black poplar. About 1000 purebred Populus nigra trees from a breeding population were selected and genotyped with a 12K custom Infinium BeadChip. Forty-three of these individuals, corresponding to nodal trees in the pedigree, were fully sequenced (reference), while the remaining majority (target) was imputed from 8K to 1.4 million SNPs using FImpute. Imputation errors were evaluated for each SNP and each individual by leave-one-out cross-validation in the training sample of 43 sequenced trees. Summary statistics such as the Hardy-Weinberg equilibrium exact test p-value, sequencing quality, sequencing depth per site and per individual, minor allele frequency, marker density ratio and SNP information redundancy were calculated. Principal component and Boruta analyses were applied to all these parameters to rank the factors affecting imputation quality. Additionally, we characterized the impact of relatedness between the reference and target populations. Results During the imputation process, we used 7540 SNPs from the chip to impute 1,438,827 SNPs from sequences. At the individual level, imputation accuracy was high, with the proportion of correctly imputed SNPs ranging from 0.84 to 0.99. The variation in accuracy was mostly due to differences in relatedness between individuals. At the SNP level, imputation quality depended on the density of genotyped SNPs and on the original minor allele frequency. Imputation did not appear to increase linkage disequilibrium.
Genotype densification gave a better distribution of markers along the whole genome, and no substantial bias was detected across annotation categories. Conclusions This study shows that it is possible to impute low-density marker panels to whole-genome sequence with good accuracy under certain conditions that could be common to many breeding populations.
Affiliation(s)
- Marie Pégard
- BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Odile Rogier
- BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Aurélie Bérard
- Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000, 2 rue Gaston Crémieux, Evry, 9100, France
- Patricia Faivre-Rampant
- Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000, 2 rue Gaston Crémieux, Evry, 9100, France
- Marie-Christine Le Paslier
- Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000, 2 rue Gaston Crémieux, Evry, 9100, France
- Catherine Bastien
- BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Véronique Jorge
- BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Leopoldo Sánchez
- BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France

12
Gobikrushanth M, Purfield D, Kenneally J, Doyle R, Holden S, Martinez P, Canadas E, Bruinjé T, Colazo M, Ambrose D, Butler S. The relationship between anogenital distance and fertility, and genome-wide associations for anogenital distance in Irish Holstein-Friesian cows. J Dairy Sci 2019; 102:1702-1711. [DOI: 10.3168/jds.2018-15552] [Received: 08/14/2018] [Accepted: 10/15/2018]

13
Ghoreishifar SM, Moradi-Shahrbabak H, Moradi-Shahrbabak M, Nicolazzi EL, Williams JL, Iamartino D, Nejati-Javaremi A. Accuracy of imputation of single-nucleotide polymorphism marker genotypes for water buffaloes (Bubalus bubalis) using different reference population sizes and imputation tools. Livest Sci 2018. [DOI: 10.1016/j.livsci.2018.08.009]

14
He J, Guo Y, Xu J, Li H, Fuller A, Tait RG, Wu XL, Bauck S. Comparing SNP panels and statistical methods for estimating genomic breed composition of individual animals in ten cattle breeds. BMC Genet 2018; 19:56. [PMID: 30092776 PMCID: PMC6085684 DOI: 10.1186/s12863-018-0654-3] [Received: 11/03/2017] [Accepted: 07/11/2018]
Abstract
BACKGROUND SNPs are informative for estimating the genomic breed composition (GBC) of individual animals, but SNPs selected for this purpose were not available on commercial bovine SNP chips before the present study. The primary objective of the present study was to select five common SNP panels for estimating the GBC of individual animals, initially involving 10 cattle breeds (two dairy breeds and eight beef breeds). The performance of the five panels was evaluated using both an admixture model and a linear regression model. Finally, the downstream implications of GBC for genomic prediction accuracy were investigated and discussed in a Santa Gertrudis cattle population. RESULTS There were 15,708 common SNPs across five currently available commercial bovine SNP chips. From this set, four subsets (1,000, 3,000, 5,000, and 10,000 SNPs) were selected by maximizing the average Euclidean distance (AED) of SNP allele frequencies among the ten cattle breeds. For 198 animals presented as Akaushi, the estimated GBC of the Akaushi breed (GBCA) based on the admixture model agreed very well across the five SNP panels, identifying 166 animals with GBCA = 1. Using the same SNP panels, the linear regression approach identified fewer animals with GBCA = 1. Nevertheless, the GBCA estimates from the two models were highly correlated (r = 0.953 to 0.992). In the genomic prediction of a Santa Gertrudis population (and crosses), the predictability of molecular breeding values, computed with SNP effects estimated from 1,225 animals with a Santa Gertrudis GBC (GBCSG) of at least 0.90, decreased for crossbred animals with lower GBCSG. CONCLUSIONS Of the two statistical models used to compute GBC, the admixture model gave more consistent results among the five selected SNP panels than the linear regression model. The availability of these common SNP panels facilitates the identification and estimation of breed composition using currently available bovine SNP chips.
In terms of utility, the 1 K panel is the most cost-effective and can conveniently be included as add-on content in future bovine SNP chips, whereas the 10 K and 16 K SNP panels can be more useful when used on their own for imputation to intermediate- or high-density genotypes.
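The linear regression route to GBC can be pictured as an ordinary least-squares fit of an animal's allele dosages on the allele frequencies of the reference breeds. The sketch below is a minimal, unconstrained version written for illustration: the abstract does not give the exact model, and the function names, the use of normal equations, and the absence of any constraint that the proportions sum to one or stay within [0, 1] are all assumptions.

```python
def solve_linear_system(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the small square system a @ x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: breed allele frequencies are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_gbc(genotypes: list[int], breed_freqs: list[list[float]]) -> list[float]:
    """Unconstrained least-squares breed composition for one animal (hypothetical helper).

    genotypes[j]: count (0, 1 or 2) of the reference allele at SNP j
    breed_freqs[j][b]: frequency of that allele at SNP j in reference breed b
    Assumed model: genotypes[j] / 2 = sum over b of gbc[b] * breed_freqs[j][b] + error
    """
    if len(genotypes) != len(breed_freqs):
        raise ValueError("need one row of breed frequencies per SNP")
    y = [g / 2 for g in genotypes]  # allele frequency within the animal
    k = len(breed_freqs[0])
    # Normal equations (X'X) gbc = X'y, with X = breed_freqs.
    xtx = [[sum(row[p] * row[q] for row in breed_freqs) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yj for row, yj in zip(breed_freqs, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)
```

Because the fit is unconstrained, an estimated proportion can fall slightly below 0 or above 1; whether and how such estimates were truncated or rescaled in the study is not stated in the abstract.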
Affiliation(s)
- Jun He
- Biostatistics and Bioinformatics, Neogen GeneSeek Operations, Lincoln, NE USA
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Yage Guo
- Biostatistics and Bioinformatics, Neogen GeneSeek Operations, Lincoln, NE USA
- College of Education and Human Sciences, University of Nebraska, Lincoln, NE USA
- Jiaqi Xu
- Biostatistics and Bioinformatics, Neogen GeneSeek Operations, Lincoln, NE USA
- Department of Statistics, University of Nebraska, Lincoln, NE USA
- Hao Li
- Biostatistics and Bioinformatics, Neogen GeneSeek Operations, Lincoln, NE USA
- Department of Animal Sciences, University of Wisconsin, Madison, WI USA
- Anna Fuller
- Biostatistics and Bioinformatics, Neogen GeneSeek Operations, Lincoln, NE USA
- Richard G. Tait
- Biostatistics and Bioinformatics, Neogen GeneSeek Operations, Lincoln, NE USA
- Xiao-Lin Wu
- Biostatistics and Bioinformatics, Neogen GeneSeek Operations, Lincoln, NE USA
- Department of Animal Sciences, University of Wisconsin, Madison, WI USA
- Stewart Bauck
- Biostatistics and Bioinformatics, Neogen GeneSeek Operations, Lincoln, NE USA

15
Teissier M, Sanchez MP, Boussaha M, Barbat A, Hoze C, Robert-Granie C, Croiseau P. Use of meta-analyses and joint analyses to select variants in whole genome sequences for genomic evaluation: An application in milk production of French dairy cattle breeds. J Dairy Sci 2018; 101:3126-3139. [PMID: 29428760 DOI: 10.3168/jds.2017-13587] [Received: 07/28/2017] [Accepted: 12/18/2017]
Abstract
As a result of the 1000 Bull Genomes Project, it has become possible to impute millions of variants, many of them potentially causative for traits of interest, for thousands of animals that have been genotyped with medium-density chips. This large source of data opens up promising possibilities for including these variants in genomic evaluations. However, for computational reasons, it is not possible to include all variants in genomic evaluation procedures. One potential approach is to select the most relevant variants based on the results of genome-wide association studies (GWAS); however, identifying causative mutations with this method remains difficult, partly because of poor imputation accuracy for rare variants. To address this problem, this study assesses the ability of different approaches based on multi-breed GWAS (joint and meta-analyses) to identify single-nucleotide polymorphisms (SNP) for use in genomic evaluation in the 3 main French dairy cattle breeds. A total of 6,262 Holstein bulls, 2,434 Montbéliarde bulls, and 2,175 Normande bulls with daughter yield deviations for 5 milk production traits were imputed for 27 million variants. Within-breed and joint (including all 3 breeds) GWAS were performed, and 3 meta-analysis models were tested: fixed effect, random effect, and Z-score. Comparison of within- and multi-breed GWAS results showed that most of the quantitative trait loci identified using within-breed approaches were also found with multi-breed methods. However, the most significant variants identified in each region differed depending on the method used.
To determine which approach highlighted the most predictive SNP for each trait, we used a marker-assisted best linear unbiased prediction (BLUP) model to evaluate lists of SNP generated by the different GWAS methods; each list contained between 25 and 2,000 candidate variants per trait, identified using a single within- or multi-breed GWAS approach. Among all the multi-breed methods tested in this study, variant selection based on meta-analysis (fixed effect) resulted in the most accurate genomic evaluation (+1 to +3 points compared with other multi-breed approaches). However, genomic evaluations were always more accurate when variants were selected using the results of within-breed GWAS. As has generally been found in studies of quantitative trait loci, these results suggest that part of the genetic variance of milk production traits is breed-specific in Holstein, Montbéliarde, and Normande cattle.
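The fixed-effect meta-analysis that performed best can be illustrated with the standard inverse-variance formulation, in which each breed's effect estimate is weighted by the reciprocal of its squared standard error. The abstract names the model but not its implementation, so the sketch below is an assumption for a single variant, and the function names are hypothetical.

```python
import math


def fixed_effect_meta(betas: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Inverse-variance fixed-effect meta-analysis of one variant across breeds.

    Each within-breed GWAS supplies an allele substitution effect (beta) and its
    standard error (se). Breeds are weighted by 1 / se**2, so more precise
    estimates count for more. Returns (pooled beta, pooled se, z statistic).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, pooled / pooled_se


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A random-effect model would additionally estimate a between-breed variance and add it to each breed's sampling variance, while a Z-score meta-analysis typically combines per-breed z statistics weighted by sample size instead of combining effect estimates.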
Affiliation(s)
- M Teissier
- GenPhySE, Université de Toulouse, INRA, INPT, ENVT, 31326 Castanet-Tolosan, France.
- M P Sanchez
- GABI, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- M Boussaha
- GABI, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- A Barbat
- GABI, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- C Hoze
- GABI, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France; Allice, 75012 Paris, France
- C Robert-Granie
- GenPhySE, Université de Toulouse, INRA, INPT, ENVT, 31326 Castanet-Tolosan, France
- P Croiseau
- GABI, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France

16
Larmer SG, Sargolzaei M, Brito LF, Ventura RV, Schenkel FS. Novel methods for genotype imputation to whole-genome sequence and a simple linear model to predict imputation accuracy. BMC Genet 2017; 18:120. [PMID: 29281958 PMCID: PMC5746022 DOI: 10.1186/s12863-017-0588-1] [Received: 08/07/2017] [Accepted: 12/15/2017]
Abstract
Background Accurate imputation plays a major role in genomic studies in livestock industries, where the number of genotyped or sequenced animals is limited by cost. This study explored methods to create an ideal reference population for imputation to next-generation sequencing data in cattle. Methods Methods for clustering animals for imputation were explored, using 1000 Bull Genomes Project sequence data on 1146 animals from a variety of beef and dairy breeds. Imputation from 50 K to 777 K was first carried out to choose an ideal clustering method, using ADMIXTURE or PLINK clustering algorithms with either genotypes or reconstructed haplotypes. Results Because of its efficiency, accuracy and ease of use, clustering with PLINK using haplotypes as quasi-genotypes was chosen as the most advantageous grouping method. Using a clustered population slightly decreased computing time while maintaining accuracy across the population. Although overall accuracy remained the same, accuracy increased slightly for groups of animals in some breeds (primarily purebred beef cattle from breeds with fewer sequenced animals) and decreased slightly for other groups, primarily crossbred animals. However, some animals in each breed were poorly imputed across all methods. When imputed sequences were included in the reference population to aid imputation of poorly imputed animals, a small increase in overall accuracy was observed for nearly every individual in the population. Two models were created to predict imputation accuracy: a complete model using all available information, including Euclidean distances from genotypes and haplotypes, pedigree information, and clustering groups, and a simple model using only breed and a Euclidean distance matrix as predictors.
Both models were successful in predicting imputation accuracy, with correlations between predicted and true imputation accuracy (measured as concordance rate) of 0.87 and 0.83, respectively. Conclusions A clustering methodology can be very useful for subgrouping cattle for efficient genotype imputation. In addition, the accuracy of genotype imputation from medium- to high-density single nucleotide polymorphism (SNP) chip panels to whole-genome sequence can be predicted well using the simple linear model defined in this study.
Affiliation(s)
- Steven G Larmer
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada; The Semex Alliance, 5653 Highway 6 North, Guelph, ON, N1H 6J2, Canada
- Mehdi Sargolzaei
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada; The Semex Alliance, 5653 Highway 6 North, Guelph, ON, N1H 6J2, Canada
- Luiz F Brito
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Ricardo V Ventura
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada; Bringing Intelligence Opportunities, 294 Mill St. East, Elora, ON, N0B 1S0, Canada
- Flávio S Schenkel
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada

17
Oliveira Júnior GA, Chud TCS, Ventura RV, Garrick DJ, Cole JB, Munari DP, Ferraz JBS, Mullart E, DeNise S, Smith S, da Silva MVGB. Genotype imputation in a tropical crossbred dairy cattle population. J Dairy Sci 2017; 100:9623-9634. [PMID: 28987572 DOI: 10.3168/jds.2017-12732] [Received: 02/14/2017] [Accepted: 08/16/2017]
Abstract
The objective of this study was to investigate different strategies for genotype imputation in a population of crossbred Girolando (Gyr × Holstein) dairy cattle. The data set consisted of 478 Girolando, 583 Gyr, and 1,198 Holstein sires genotyped at high density with the Illumina BovineHD (Illumina, San Diego, CA) panel, which includes ∼777K markers. The accuracy of imputation from low (20K) and medium (50K and 70K) densities to HD panel density, and from low density to 50K, was investigated. Seven scenarios using different reference populations (RPop), comprising Girolando, Gyr, and Holstein animals separately or in combination, were tested for imputing the genotypes of 166 randomly chosen Girolando animals. Genotype imputation was performed using FImpute. Imputation accuracy was measured as the correlation between observed and imputed genotypes (CORR) and as the proportion of genotypes that were imputed correctly (CR). This is the first paper on imputation accuracy in a Girolando population. Sample-specific imputation accuracies ranged from 0.38 to 0.97 (CORR) and from 0.49 to 0.96 (CR) for imputation from low and medium densities to HD, and from 0.41 to 0.95 (CORR) and from 0.50 to 0.94 (CR) for imputation from 20K to 50K. The per-animal correlation (CORRanim) exceeded 0.96 (for the 50K and 70K panels) when only Girolando animals were included in RPop (S1). CORRanim was smaller when Gyr (S2) rather than Holstein (S3) animals were used as RPop. The same pattern was observed between S4 (Gyr + Girolando) and S5 (Holstein + Girolando), because the target animals were more closely related to the Holstein population than to the Gyr population. The highest imputation accuracies were observed for scenarios that included Girolando animals in the reference population, whereas using only Gyr animals resulted in low imputation accuracies, suggesting that the haplotypes segregating in the Girolando population had a greater effect on accuracy than the purebred haplotypes.
All chromosomes had similar per-SNP imputation accuracies (CORRsnp) within each scenario. Crossbred animals (Girolando) must be included in the reference population to provide the best imputation accuracies.
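The two accuracy measures used in this study, CR and CORR, can be sketched as follows, assuming genotypes coded as allele counts (0, 1, 2); CORR also accepts fractional dosages. The function names are illustrative, and the study's own software (FImpute output processing) is not reproduced here.

```python
import math
from typing import Sequence


def concordance_rate(observed: Sequence[int], imputed: Sequence[int]) -> float:
    """CR: proportion of genotypes imputed exactly right."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("need two non-empty genotype vectors of equal length")
    return sum(o == i for o, i in zip(observed, imputed)) / len(observed)


def genotype_correlation(observed: Sequence[float], imputed: Sequence[float]) -> float:
    """CORR: Pearson correlation between observed and imputed genotypes."""
    n = len(observed)
    mo = sum(observed) / n
    mi = sum(imputed) / n
    cov = sum((o - mo) * (i - mi) for o, i in zip(observed, imputed))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_i = sum((i - mi) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        return float("nan")  # undefined when either vector has no variation
    return cov / math.sqrt(var_o * var_i)
```

Applied across the SNPs of one animal, these functions give per-animal values (as for CORRanim); applied across animals at one SNP, they give per-SNP values (as for CORRsnp). Unlike CR, CORR is not inflated by correctly imputing the common genotype at low-MAF SNPs.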
Affiliation(s)
- Gerson A Oliveira Júnior
- Departamento de Medicina Veterinária, Universidade de São Paulo (USP), Faculdade de Zootecnia e Engenharia de Alimentos, Pirassununga, SP, 13635-900, Brazil
- Tatiane C S Chud
- Departamento de Ciências Exatas, Universidade Estadual Paulista (Unesp), Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, SP, 14884-900, Brazil
- Ricardo V Ventura
- Beef Improvement Opportunities, Guelph, ON N1K1E5, Canada; Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON N1G2W1, Canada
- Dorian J Garrick
- Department of Animal Science, Iowa State University, Ames 50011-3150
- John B Cole
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD, 20705-2350
- Danísio P Munari
- Departamento de Ciências Exatas, Universidade Estadual Paulista (Unesp), Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, SP, 14884-900, Brazil
- José B S Ferraz
- Departamento de Medicina Veterinária, Universidade de São Paulo (USP), Faculdade de Zootecnia e Engenharia de Alimentos, Pirassununga, SP, 13635-900, Brazil

18
Judge MM, Purfield DC, Sleator RD, Berry DP. The impact of multi-generational genotype imputation strategies on imputation accuracy and subsequent genomic predictions. J Anim Sci 2017; 95:1489-1501. [PMID: 28464096 DOI: 10.2527/jas.2016.1212]
Abstract
The objective of the present study was to quantify, using simulations, the impact of successive generations of genotype imputation on genomic predictions. The impact of using a small reference population of true genotypes versus a larger reference population of imputed genotypes on the accuracy of genomic predictions was also investigated. After construction of a founder population, high-density (HD) genotypes (n = 43,500 single nucleotide polymorphisms, SNP) were simulated across 25 generations (n = 46,800 per generation); a low-density genotype panel (n = 3,000 SNP) was developed from these HD genotypes and then used to impute genotypes under 7 alternative imputation strategies. Both low (0.03) and moderately (0.35) heritable phenotypes were simulated. Direct genomic values (DGV) were estimated using imputed genotypes from the investigated scenarios, and the accuracy of predicting the simulated true breeding values (TBV) was expressed relative to the accuracy obtained when the true genotypes were used. The mean allele concordance rate, and its rate of change per generation, differed between the imputation strategies investigated. Imputation was most accurate when the true HD genotypes of sires and 50% of the dams of the generation being imputed were included in the reference population; the average allele concordance rate for this scenario across generations was 0.9707. The strongest correlation between the TBV and DGV of the last generation was obtained when the reference population included the sequentially imputed HD genotypes of all previous generations plus the true HD genotypes of all sires of the previous generations (0.987 as efficient as when the true genotypes were used in the reference population). With moderate heritability, the correlation between TBV and DGV using a small reference population of accurate genotypes was, on average, 0.07 units stronger than for DGV generated using a larger population of imputed genotypes.
When the heritability was low, the accuracy of genomic predictions benefited from a larger reference population, even if SNP were imputed. The impact on the accuracy of genomic predictions from the accumulation of imputation errors across generations indicates the need to routinely generate HD genotypes on influential animals to reduce the accumulation of imputation errors over generations.

19
Purfield DC, McClure M, Berry DP. Justification for setting the individual animal genotype call rate threshold at eighty-five percent. J Anim Sci 2017; 94:4558-4569. [PMID: 27898963 DOI: 10.2527/jas.2016-0802]
Abstract
Data quality of SNP arrays affects the accuracy and precision of downstream data analyses. One quality control measure often imposed is a threshold on individual animal call rate. Different call rate thresholds have been applied across studies; little is known, however, about the impact of these thresholds on the quality of the genotype data. The objective of the present study was to investigate the effect of different call rate thresholds on the integrity of the genotypes and to quantify the contribution of different factors to the variability in animal call rate. Data included 142,342 samples from 141,591 dairy and beef cattle genotyped on a custom Illumina genotype panel; the number of Illumina SNP on the panel was 14,371. The mean animal call rate across all samples was 99.09%; 487 animals had both a low call rate (<99%) and a subsequent high call rate (≥99%) after resampling and regenotyping. Several factors were associated (P < 0.001) with individual call rate, including animal sex, the sampling herd, the date of genotyping, the genotyping plate, and the plate well. The genotype and allele concordance between the genotypes of the 487 low- and high-call-rate individuals improved at a diminishing rate as mean animal call rate increased. Mean genotype and allele concordance rates of 0.987 and 0.997, respectively, were observed when animal call rate was between 85 and 90%, increasing to 0.998 and 0.999, respectively, when animal call rate was between 95 and <99%. The mean within-animal allele concordance rate of rare variants (i.e., minor allele frequency < 0.05) between low and high genotype call rate animals increased as animal call rate improved; an allele concordance rate of 1.00 was achieved when animal call rate was between 85 and <99%.
The accuracy of imputation of the nonobserved genotypes in the low-call rate animals improved as animal call rate increased; the mean genotype concordance rate of the imputed nonobserved SNP was 0.41 when animal call rate was <40% but increased to 0.95 when animal call rate was between 95 and <99%. Parentage validation, determined by the count of opposing homozygotes in a parent-progeny pair, was unreliable when animal call rate was <85%. Therefore, to ensure the provision of high-quality genotypes while also considering the cost and inconvenience of resampling and regenotyping, we suggest a minimum animal call rate threshold of 85%.
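The two quality checks discussed in this abstract, animal call rate and parentage validation by counting opposing homozygotes, can be sketched as below. Genotypes are assumed to be allele counts (0, 1, 2) with `None` for a failed call; the 85% default comes from the abstract, but how many opposing homozygotes should trigger a parentage rejection is not stated, so no cut-off is assumed. All names are hypothetical.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # allele count 0/1/2, or None for a missing call


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Proportion of SNPs with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def passes_call_rate(genotypes: Sequence[Genotype], threshold: float = 0.85) -> bool:
    """Apply a minimum animal call rate (85% is the threshold suggested in the abstract)."""
    return call_rate(genotypes) >= threshold


def opposing_homozygotes(parent: Sequence[Genotype], progeny: Sequence[Genotype]) -> int:
    """Count SNPs where parent and progeny are opposite homozygotes (0 vs 2).

    A true parent always passes one of its alleles to its offspring, so such
    conflicts indicate a pedigree or genotyping error. SNPs missing in either
    animal cannot be checked and are skipped.
    """
    return sum(
        1
        for p, o in zip(parent, progeny)
        if p is not None and o is not None and {p, o} == {0, 2}
    )
```

Skipping missing calls means that at low call rates fewer SNPs are informative for the parentage check, which is consistent with the abstract's observation that parentage validation became unreliable below an 85% call rate.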

20

21
Judge MM, Kearney JF, McClure MC, Sleator RD, Berry DP. Evaluation of developed low-density genotype panels for imputation to higher density in independent dairy and beef cattle populations. J Anim Sci 2016; 94:949-62. [PMID: 27065257 DOI: 10.2527/jas.2015-0044]
Abstract
The objective of this study was to develop, using alternative algorithms, low-density SNP genotyping panels (384 to 12,000 SNP) that can be accurately imputed to higher-density panels across independent cattle populations. Single nucleotide polymorphisms were selected based on genomic characteristics (i.e., linkage disequilibrium [LD], minor allele frequency [MAF], and genomic distance) in a population of 1,267 Holstein-Friesian animals genotyped on the Illumina Bovine50 BeadChip (54,001 SNP). SNP selection methods included 1) random; 2) equidistant location; 3) a combination of SNP MAF and LD structure while maintaining relatively equal genomic distance between adjacent SNP; 4) a combination of high MAF, genomic distance between selected and candidate SNP, and correlation between genotypes of selected and candidate SNP; and 5) a machine learning algorithm. The panels were validated separately in 1) a population of 750 Holstein-Friesian animals whose genotypes were masked to the lower SNP densities under investigation (1,249 animals with complete genotypes included in the reference population) and 2) a population of 359 Limousin and Charolais cattle with high-density (777,962 SNP) genotypes (1,918 animals with complete genotypes included in the reference population). Irrespective of SNP selection method, imputation accuracy in both populations improved at a diminishing rate as the number of SNP included in the lower-density genotype panel increased. Additionally, the variability in mean imputation accuracy per individual decreased as panel density increased. The SNP selection method had a major impact on the mean allele concordance rate, although its impact diminished as panel density increased. Imputation accuracy for SNP selected using a combination of high SNP MAF, LD structure, and relatively equal genomic distance between SNP outperformed all other selection methods at densities < 12,000 SNP.
Using this method of SNP selection, the correlation between the imputed and actual genotypes for the 3,000 SNP panel was 0.90 and 0.96 when applied to the beef and dairy populations, respectively; the respective correlations for the 6,000 SNP panel were 0.95 and 0.98. It is necessary to include between 3,000 and 6,000 SNP in a low-density panel to achieve adequate imputation accuracy to either medium density (approximately 50,000 SNP in the dairy population) or high density (approximately 700,000 SNP in the beef population) across diverse and independent populations.
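A simplified version of the "equidistant location" idea combined with MAF can be sketched as below: the chromosome is cut into equal-length windows and the highest-MAF SNP is kept in each. This deliberately ignores the LD criterion that the best-performing method also used, so it illustrates the principle rather than the published algorithm; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str
    position: int  # base-pair position on one chromosome
    maf: float     # minor allele frequency in the reference population


def select_equidistant_by_maf(snps: list[Snp], chrom_length: int, n_select: int) -> list[Snp]:
    """Keep at most one SNP per equal-length window, preferring the highest MAF.

    The chromosome is cut into n_select windows of equal length; windows
    without any SNP contribute nothing, so fewer than n_select may be returned.
    """
    if n_select <= 0 or chrom_length <= 0:
        raise ValueError("n_select and chrom_length must be positive")
    window = chrom_length / n_select
    best: dict[int, Snp] = {}
    for snp in snps:
        # Clamp so a SNP sitting exactly at the chromosome end lands in the last window.
        idx = min(int(snp.position // window), n_select - 1)
        if idx not in best or snp.maf > best[idx].maf:
            best[idx] = snp
    return [best[i] for i in sorted(best)]
```

Preferring high-MAF SNPs within evenly spaced windows keeps markers informative (both alleles are reasonably common) while spreading them along the chromosome, which is the combination the abstract reports as most accurate below 12,000 SNP once LD is also taken into account.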

22
Buchanan JW, Woronuk GN, Marquess FL, Lang K, James ST, Deobald H, Welly BT, Van Eenennaam AL. Analysis of validated and population-specific single nucleotide polymorphism parentage panels in pedigreed and commercial beef cattle populations. Can J Anim Sci 2016. [DOI: 10.1139/cjas-2016-0143]
Affiliation(s)
- Justin W. Buchanan: Department of Animal Science, University of California, Davis, CA 95616, USA
- Grant N. Woronuk: Quantum Genetix, 101 Research Drive, Saskatoon, SK S7N 3R3, Canada
- Kevin Lang: Quantum Genetix, 101 Research Drive, Saskatoon, SK S7N 3R3, Canada
- Steven T. James: Quantum Genetix, 101 Research Drive, Saskatoon, SK S7N 3R3, Canada
- Heather Deobald: Quantum Genetix, 101 Research Drive, Saskatoon, SK S7N 3R3, Canada
- Bryan T. Welly: Department of Animal Science, University of California, Davis, CA 95616, USA
23
Yudin NS, Lukyanov KI, Voevoda MI, Kolchanov NA. Application of reproductive technologies to improve dairy cattle genomic selection. ACTA ACUST UNITED AC 2016. [DOI: 10.1134/s207905971603014x]
24
Richardson IW, Berry DP, Wiencko HL, Higgins IM, More SJ, McClure J, Lynn DJ, Bradley DG. A genome-wide association study for genetic susceptibility to Mycobacterium bovis infection in dairy cattle identifies a susceptibility QTL on chromosome 23. Genet Sel Evol 2016; 48:19. [PMID: 26960806 PMCID: PMC4784436 DOI: 10.1186/s12711-016-0197-x] [Received: 07/23/2015] [Accepted: 02/29/2016]
Abstract
Background Bovine tuberculosis (bTB) infection in cattle is a significant economic concern in many countries, with annual costs to the UK and Irish governments of approximately €190 million and €63 million, respectively, for bTB control. The existence of host additive and non-additive genetic components to bTB susceptibility has been established. Methods Two approaches, i.e. single-SNP (single nucleotide polymorphism) regression and a Bayesian method, were applied in genome-wide association studies (GWAS) using high-density SNP genotypes (n = 597,144 SNPs) from 841 dairy artificial insemination (AI) sires. Deregressed estimated breeding values for bTB susceptibility were used as the quantitative dependent variable. Network analysis was performed separately on the quantitative trait loci (QTL) identified as significant in the single-SNP regression and in the Bayesian analyses. In addition, an identity-by-descent analysis was performed on a subset of the most prolific sires in the dataset whose daughters showed contrasting prevalences of bTB infection. Results A significant QTL region was identified on BTA23 (P < 1 × 10⁻⁵, Bayes factor > 10) across all analyses. Sires carrying the minor allele (minor allele frequency = 0.136) for this QTL on BTA23 had estimated breeding values that conferred greater susceptibility to bTB infection than sires homozygous for the major allele. Imputation of the regions flanking this QTL on BTA23 to full sequence indicated that the most significant associations were located within introns of the FKBP5 gene. Conclusions A genomic region on BTA23 that is strongly associated with host susceptibility to bTB infection was identified. This region contains FKBP5, a gene involved in the TNFα/NF-κB signalling pathway, which is a major biological pathway associated with the immune response.
Although no published study has yet validated this region, our approach represents one of the most powerful studies of bTB susceptibility to date.
Electronic supplementary material: The online version of this article (doi:10.1186/s12711-016-0197-x) contains supplementary material, which is available to authorized users.
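The single-SNP regression approach can be pictured as an ordinary least-squares regression of each sire's deregressed EBV on its minor-allele count (0, 1 or 2) at one SNP, repeated for every SNP in the genome-wide scan and compared against a significance threshold. The Python sketch below is a deliberate simplification and is not the model used in the study: it fits no covariates, weights or relationship matrix, it approximates the t distribution with a normal one for the p-value, and `single_snp_regression` is a hypothetical helper name.

```python
import math


def single_snp_regression(genotypes, debv):
    """Regress a phenotype (e.g. deregressed EBVs) on minor-allele count
    (0, 1, 2) at one SNP. Returns (slope, t statistic, two-sided p)."""
    n = len(genotypes)
    if n != len(debv) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 records")
    mean_x = sum(genotypes) / n
    mean_y = sum(debv) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, debv))
    slope = sxy / sxx  # estimated allele substitution effect
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, debv))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        return slope, math.inf, 0.0  # perfect fit: no residual variance
    t = slope / se
    # Two-sided p-value from a normal approximation to the t distribution.
    p = math.erfc(abs(t) / math.sqrt(2))
    return slope, t, p


slope, t, p = single_snp_regression([0, 1, 2, 0, 1, 2],
                                    [1.0, 2.1, 2.9, 1.1, 1.9, 3.0])
print(f"{slope:.2f}")  # 0.95
```

In a real scan this function would be called once per SNP, and the resulting p-values would be compared with a genome-wide threshold.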
Affiliation(s)
- Ian W Richardson: Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland; Animal and Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland.
- Donagh P Berry: Animal and Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland.
- Heather L Wiencko: Animal and Bioscience Research Department, Animal and Grassland Research and Innovation Centre, Teagasc, Grange, Co. Meath, Ireland.
- Isabella M Higgins: UCD Centre for Veterinary Epidemiology and Risk Analysis, UCD School of Veterinary Medicine, University College Dublin, Belfield, Dublin 4, Ireland.
- Simon J More: UCD Centre for Veterinary Epidemiology and Risk Analysis, UCD School of Veterinary Medicine, University College Dublin, Belfield, Dublin 4, Ireland.
- David J Lynn: Animal and Bioscience Research Department, Animal and Grassland Research and Innovation Centre, Teagasc, Grange, Co. Meath, Ireland; South Australian Health and Medical Research Institute, North Terrace, Adelaide, SA, 5000, Australia; School of Medicine, Flinders University, Bedford Park, SA, 5042, Australia.
- Daniel G Bradley: Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland.
25
Boison S, Santos D, Utsunomiya A, Carvalheiro R, Neves H, O’Brien A, Garcia J, Sölkner J, da Silva M. Strategies for single nucleotide polymorphism (SNP) genotyping to enhance genotype imputation in Gyr (Bos indicus) dairy cattle: Comparison of commercially available SNP chips. J Dairy Sci 2015; 98:4969-89. [DOI: 10.3168/jds.2014-9213] [Received: 12/09/2014] [Accepted: 03/22/2015]
26
Ogawa S, Matsuda H, Taniguchi Y, Watanabe T, Takasuga A, Sugimoto Y, Iwaisaki H. Accuracy of imputation of single nucleotide polymorphism marker genotypes from low-density panels in Japanese Black cattle. Anim Sci J 2015; 87:3-12. [PMID: 26032028 DOI: 10.1111/asj.12393] [Received: 11/17/2014] [Accepted: 11/18/2014]
Abstract
Using target and reference populations of fattened steers, the performance of genotype imputation from lower-density marker panels was evaluated in Japanese Black cattle. Population imputation was performed using BEAGLE software. Genotypes for approximately 40,000 single nucleotide polymorphism (SNP) markers from the Illumina BovineSNP50 BeadChip were available, and imputation accuracy was assessed as the average genotype concordance rate while varying the density of equally spaced SNPs and the number of individuals in the reference population. Two additional statistics were also calculated as indicators of imputation performance. Concordance rates tended to be lower for SNPs with higher minor allele frequencies and for SNPs located near the ends of chromosomes. Longer autosomes yielded greater imputation accuracies than shorter ones. Selecting SNPs based on linkage disequilibrium information slightly improved relative imputation accuracy. With 3,000 and 10,000 equally spaced SNPs, imputation accuracies were greater than 90% and approximately 97%, respectively. These results indicate that genotyping with a lower-density SNP chip, combined with imputation based on a reference population genotyped with a higher-density SNP chip, is a cost-effective and valid approach for genomic prediction.
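The masking design described above (keep an equally spaced subset of markers, impute the rest, and score the result by genotype concordance) can be sketched in Python as follows. The helper names are hypothetical, and the imputation step itself (BEAGLE in the study) is not reproduced; only construction of the lower-density panel and the concordance calculation are shown, assuming genotypes are coded 0/1/2.

```python
def equally_spaced_indices(n_markers, n_keep):
    """Indices of n_keep markers spread evenly along an ordered marker map."""
    if not 0 < n_keep <= n_markers:
        raise ValueError("n_keep must be between 1 and n_markers")
    if n_keep == 1:
        return [0]
    step = (n_markers - 1) / (n_keep - 1)
    # Round half up so indices are strictly increasing and end at the last marker.
    return [int(i * step + 0.5) for i in range(n_keep)]


def masked_indices(n_markers, kept):
    """Markers absent from the lower-density panel; these are the ones imputed."""
    kept_set = set(kept)
    return [i for i in range(n_markers) if i not in kept_set]


def genotype_concordance(true_genotypes, imputed_genotypes):
    """Proportion of masked genotypes (coded 0/1/2) imputed exactly right."""
    if not true_genotypes or len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("need two non-empty genotype lists of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


kept = equally_spaced_indices(10, 4)
print(kept)                                  # [0, 3, 6, 9]
print(masked_indices(10, kept))              # [1, 2, 4, 5, 7, 8]
print(genotype_concordance([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75
```

Repeating this for several values of `n_keep` (for example 3,000 and 10,000 out of roughly 40,000) reproduces the shape of the density comparison, given an external imputation tool.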
Affiliation(s)
- Yukio Taniguchi: Graduate School of Agriculture, Kyoto University, Kyoto, Japan
27
Imputation of ungenotyped parental genotypes in dairy and beef cattle from progeny genotypes. Animal 2015; 8:895-903. [PMID: 24840560 DOI: 10.1017/s1751731114000883]
Abstract
The objective of this study was to quantify the accuracy of imputing parental genotypes from the genotypes of their progeny using a family-based and population-based imputation algorithm. Two separate data sets were used: one containing both dairy and beef animals (n=3122) with high-density genotypes (735 151 single nucleotide polymorphisms (SNPs)) and the other containing only dairy animals (n=5489) with medium-density genotypes (51 602 SNPs). Imputation accuracy was evaluated for three genotype density panels representing low (i.e. 6501 SNPs), medium and high density. The full genotypes of sires with genotyped half-sib progeny were masked and subsequently imputed. The number of genotyped half-sib progeny per sire was varied from 4 to 12, and the impact on imputation accuracy was quantified. Up to 157 and 258 sires were used to test the accuracy of imputation in the dairy plus beef data set and the dairy-only data set, respectively. The efficiency and accuracy of imputation were quantified as the proportion of genotypes that could not be imputed, and as both the genotype concordance rate and the allele concordance rate. The median proportion of genotypes per animal that could not be imputed decreased as the number of genotyped half-sib progeny increased; for the medium-density panel, it fell from a median of 0.015 with a half-sib progeny group size of 4 to a median of 0.0014 to 0.0015 with a half-sib progeny group size of 8. The accuracy of imputation across different paternal half-sib progeny group sizes was similar in both data sets.
Concordance rates increased considerably as the number of genotyped half-sib progeny increased from four (mean animal allele concordance rate of 0.94 in both data sets for the medium-density genotype panel) to five (mean animal allele concordance rate of 0.96 in both data sets for the medium-density genotype panel), after which they were relatively stable up to a half-sib progeny group size of eight. In the dairy-only data set, enough sires with paternal half-sib progeny groups of up to 12 were available, and the within-animal mean genotype concordance rates continued to increase up to this group size. Imputation accuracy was lowest for the low-density genotypes, especially with smaller half-sib progeny groups, but the difference in imputation accuracy between density panels diminished as progeny group size increased; the difference between the high- and medium-density genotype panels was relatively small across all half-sib progeny group sizes. Where biological material or genotypes are not available for individual animals, at least five progeny can be genotyped (on either a medium- or high-density genotyping platform) and the parental alleles imputed with, on average, ≥96% accuracy.
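The three summary statistics above count errors differently. Assuming genotypes are coded as allele counts 0, 1 and 2, and that a genotype the algorithm could not impute is recorded as `None` (both assumptions for this sketch), a heterozygote imputed as a homozygote is one wrong genotype but only one wrong allele out of two, so the allele concordance rate is never lower than the genotype concordance rate. `imputation_summary` is a hypothetical helper name, not part of any imputation package.

```python
def imputation_summary(true_genotypes, imputed_genotypes):
    """Return (proportion not imputed, genotype concordance, allele concordance).

    Genotypes are allele counts 0/1/2; None marks a genotype that could not
    be imputed. Concordance rates are computed over imputed genotypes only.
    """
    if not true_genotypes or len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("need two non-empty genotype lists of equal length")
    pairs = list(zip(true_genotypes, imputed_genotypes))
    called = [(t, i) for t, i in pairs if i is not None]
    prop_missing = (len(pairs) - len(called)) / len(pairs)
    if not called:
        return prop_missing, float("nan"), float("nan")
    genotype_cr = sum(t == i for t, i in called) / len(called)
    # Alleles shared by two unphased genotypes coded as counts: 2 - |t - i|.
    allele_cr = sum(2 - abs(t - i) for t, i in called) / (2 * len(called))
    return prop_missing, genotype_cr, allele_cr


summary = imputation_summary([0, 1, 2, 2, 0], [0, 2, 2, None, 2])
print(summary)  # (0.2, 0.5, 0.625)
```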
28
Biscarini F, Nicolazzi EL, Stella A, Boettcher PJ, Gandini G. Challenges and opportunities in genetic improvement of local livestock breeds. Front Genet 2015; 6:33. [PMID: 25763010 PMCID: PMC4340267 DOI: 10.3389/fgene.2015.00033] [Received: 12/02/2014] [Accepted: 01/25/2015]
Abstract
Sufficient genetic variation in livestock populations is necessary both for adaptation to future changes in climate and consumer demand, and for continual genetic improvement of economically important traits. Unfortunately, the current trend is toward reduced genetic variation, both within and across breeds. The latter occurs primarily through the loss of small, local breeds. Inferior production is a key driver of the loss of small breeds, as they are replaced by high-output international transboundary breeds. Selection to improve the productivity of small local breeds is therefore critical for their long-term survival. The objective of this paper is to review the technology options available for the genetic improvement of small local breeds and discuss their feasibility. Most technologies have been developed for high-input breeds and consequently are more favorably applied in that context. Nevertheless, their application in local breeds is not precluded and can yield significant benefits, especially when multiple technologies are applied in close collaboration with farmers and breeders. Breeding strategies that require cooperation and centralized decision-making, such as optimal contribution selection, may in fact be more easily implemented in small breeds.
Affiliation(s)
- Alessandra Stella: Parco Tecnologico Padano, Lodi, Italy; Institute of Agricultural Biology and Biotechnology, National Research Council, Milan, Italy
- Paul J Boettcher: Animal Production and Health Division, Food and Agriculture Organization of the United Nations, Rome, Italy
- Gustavo Gandini: Department of Veterinary Sciences and Public Health, University of Milan, Milan, Italy
29
Piccoli ML, Braccini J, Cardoso FF, Sargolzaei M, Larmer SG, Schenkel FS. Accuracy of genome-wide imputation in Braford and Hereford beef cattle. BMC Genet 2014; 15:157. [PMID: 25543517 PMCID: PMC4300607 DOI: 10.1186/s12863-014-0157-9] [Received: 07/06/2014] [Accepted: 12/18/2014]
Abstract
Background Strategies were investigated for imputing genotypes from the Illumina-Bovine3K, Illumina-BovineLD (6K), BeefLD-GGP (8K), a non-commercial 15K and IndicusLD-GGP (20K) panels to either the Illumina-BovineSNP50 (50K) or the Illumina-BovineHD (777K) SNP panel, as well as for imputing from the 50K, GGP-IndicusHD (90iK) and GGP-BeefHD (90tK) panels to the 777K panel. Imputation of low-density (<50K) genotypes to 777K was carried out in either one or two steps. Imputation of ungenotyped parents (n = 37 sires) with four or more offspring to the 50K panel was also assessed. There were 2,946 Braford, 664 Hereford and 88 Nellore animals, of which 71, 59 and 88, respectively, were genotyped with the 777K panel, while all others had 50K genotypes. The reference population consisted of 2,735 animals for the 50K panel and 175 bulls for the 777K panel. The low-density panels were simulated by masking genotypes in the 50K or 777K panel for animals born in 2011. Analyses were performed using both Beagle and FImpute software. Genotype imputation accuracy was measured by the concordance rate and the allelic R2 between true and imputed genotypes. Results Averaged across all simulated low-density panels, the concordance rate using FImpute was 0.943 for imputation to 50K and 0.921 for imputation to 777K, compared with 0.927 and 0.895 using Beagle. The allelic R2 was 0.912 and 0.866 for imputation to 50K and to 777K using FImpute, respectively, and 0.890 and 0.826 using Beagle. One-step and two-step imputation to 777K produced average concordance rates of 0.806 and 0.892 and allelic R2 of 0.674 and 0.819, respectively. Imputation of low-density panels to 50K, with the exception of 3K, had overall concordance rates greater than 0.940 and allelic R2 greater than 0.919. Ungenotyped animals were imputed to the 50K panel with an average concordance rate of 0.950 using FImpute. Conclusion FImpute was more accurate than Beagle for imputation both to 50K and to 777K. Two-step imputation outperformed one-step imputation to 777K.
Ungenotyped animals that have four or more offspring can have their 50K genotypes accurately inferred using FImpute. All low-density panels except the 3K can be used to impute to the 50K panel using FImpute or Beagle with a high concordance rate and allelic R2.
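Allelic R2 is taken here to be the squared Pearson correlation, across animals, between true and imputed genotypes at one SNP; that definition is an assumption for this sketch, and `allelic_r2` is a hypothetical function name. Unlike the concordance rate, this measure does not reward imputing every animal as the common homozygote at a rare SNP, which is one reason to report both.

```python
def allelic_r2(true_genotypes, imputed_genotypes):
    """Squared Pearson correlation, across animals, between true and imputed
    genotypes (0/1/2 codes or dosages) at a single SNP."""
    n = len(true_genotypes)
    if n < 2 or n != len(imputed_genotypes):
        raise ValueError("need two equal-length lists with at least 2 animals")
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_genotypes) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_genotypes))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_i = sum((i - mean_i) ** 2 for i in imputed_genotypes)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either vector is constant
    return cov * cov / (var_t * var_i)


r2 = allelic_r2([0, 1, 2], [0, 2, 2])
```

For a rare SNP with true genotypes `[0, 0, 0, 0, 1]` imputed as all zeros, the concordance rate is 0.8, whereas `allelic_r2` returns NaN because the imputed genotypes carry no information about the minor allele.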
Affiliation(s)
- Mario L Piccoli: Departamento de Zootecnia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; GenSys Consultores Associados S/S, Porto Alegre, Brazil; Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada.
- José Braccini: Departamento de Zootecnia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; National Council for Scientific and Technological Development, Brasília, Brazil.
- Fernando F Cardoso: Embrapa Southern Region Animal Husbandry, Bagé, Brazil; National Council for Scientific and Technological Development, Brasília, Brazil.
- Mehdi Sargolzaei: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada; The Semex Alliance, Guelph, ON, Canada.
- Steven G Larmer: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada.
- Flávio S Schenkel: Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada.
30
Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle. BMC Genomics 2014; 15:728. [PMID: 25164068 PMCID: PMC4152568 DOI: 10.1186/1471-2164-15-728] [Received: 03/03/2014] [Accepted: 06/18/2014]
Abstract
BACKGROUND The advent of low-cost next-generation sequencing has made it possible to sequence a large number of dairy and beef bulls, which can be used as a reference for imputation of whole-genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high-density SNP marker panel to the whole-genome sequence level. Data comprised 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole-genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Red bulls had previously been genotyped with the bovine high-density SNP panel and were used for validation. We investigated the effect on imputation accuracy of enlarging the reference population by combining data across breeds, and the accuracy and speed of both IMPUTE2 and BEAGLE using either genotype-probability reference data or pre-phased reference data. All analyses were done on bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel. RESULTS A combined-breed reference population led to higher imputation accuracies than a single-breed reference. The highest imputation accuracy for all three test breeds was achieved using BEAGLE with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red, respectively), but IMPUTE2 with un-phased reference data gave similar accuracies for Holstein and Nordic Red. Pre-phasing the reference data led to only a minor decrease in imputation accuracy but greatly reduced computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual).
CONCLUSION Combining reference populations across breeds is a good option to increase the size of the reference data, and in turn the accuracy of imputation, when only a few animals are available. Pre-phasing the reference data only slightly decreases the accuracy but gives substantial improvements in speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy.
31
32
Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications. Animal 2014; 8:1743-53. [PMID: 25045914 DOI: 10.1017/s1751731114001803]
Abstract
In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals genotyped with low-density SNP panels. The objective of this paper is to review different measures of the correctness of imputation and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and imputation error rate, which counts the number of incorrectly imputed alleles, are the most commonly used measures of imputation correctness. Based on the nature of both measures and the results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than the imputation error rate, because imputation accuracy does not depend on minor allele frequency (MAF), whereas the imputation error rate does. Imputation accuracy can therefore be better compared across loci with different MAF. Imputation accuracy depends on the ability to identify the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals genotyped on the high-density panel, the SNP density on the low- and high-density panels, the MAF of the imputed SNP, and whether the imputed SNP is located at the end of a chromosome. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor of the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies be used, computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage be preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.
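The two recommendations above can be sketched in Python. Each SNP is centred by 2p and scaled by √(2p(1−p)), where p is the frequency of the allele coded as 2, before true and imputed genotypes are correlated across SNPs within one animal; the imputed values may be expected gene dosages computed from genotype probabilities rather than most-likely genotypes. The frequency convention, the skipping of monomorphic SNPs, and all function names are assumptions for illustration, not the review's own code.

```python
import math


def most_likely_genotype(probs):
    """Genotype code (0, 1 or 2) with the highest posterior probability."""
    return max(range(3), key=lambda k: probs[k])


def expected_dosage(probs):
    """Expected count of the allele coded as 2: P(het) + 2 * P(hom)."""
    _, p_het, p_hom = probs
    return p_het + 2 * p_hom


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def individual_accuracy(true_g, imputed_g, allele_freqs):
    """Correlation across SNPs within one animal after centring each SNP by 2p
    and scaling by sqrt(2p(1-p)); monomorphic SNPs (p = 0 or 1) are skipped."""
    xs, ys = [], []
    for t, i, p in zip(true_g, imputed_g, allele_freqs):
        if 0 < p < 1:
            sd = math.sqrt(2 * p * (1 - p))
            xs.append((t - 2 * p) / sd)
            ys.append((i - 2 * p) / sd)
    if len(xs) < 2:
        raise ValueError("need at least two polymorphic SNPs")
    return _pearson(xs, ys)


probs = (0.1, 0.3, 0.6)
print(most_likely_genotype(probs))  # 2
dosage = expected_dosage(probs)      # 1.5, keeps the uncertainty the hard call drops
```

Passing dosages such as `dosage` into `individual_accuracy` instead of hard calls is the kind of comparison the second recommendation refers to.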