1
Ahmad SF, Singh A, Deb CK, Panda S, Gaur GK, Dutt T, Mishra BP, Kumar A. Evaluation of imputation possibility from low-density SNP panel in composite Vrindavani cattle. Anim Genet 2023; 54:647-648. [PMID: 37336526 DOI: 10.1111/age.13339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/21/2023]
Affiliation(s)
- Akansha Singh
- ICAR-Indian Veterinary Research Institute, Bareilly, India
- Chandan Kumar Deb
- Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Triveni Dutt
- ICAR-Indian Veterinary Research Institute, Bareilly, India
- Amit Kumar
- ICAR-Indian Veterinary Research Institute, Bareilly, India
2
Lloret-Villas A, Pausch H, Leonard AS. The size and composition of haplotype reference panels impact the accuracy of imputation from low-pass sequencing in cattle. Genet Sel Evol 2023; 55:33. [PMID: 37170101 PMCID: PMC10173671 DOI: 10.1186/s12711-023-00809-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/26/2023] [Accepted: 05/02/2023] [Indexed: 05/13/2023]
Abstract
BACKGROUND Low-pass sequencing followed by sequence variant genotype imputation is an alternative to routine microarray-based genotyping in cattle. However, the impact of haplotype reference panels and their interplay with the coverage of low-pass whole-genome sequencing data have not been sufficiently explored in typical livestock settings where only a small number of reference samples is available. METHODS Sequence variant genotyping accuracy was compared between two variant callers, GATK and DeepVariant, in 50 Brown Swiss cattle with sequencing coverages ranging from 4- to 63-fold. Haplotype reference panels of varying sizes and composition were built with DeepVariant based on 501 individuals from nine breeds. High-coverage sequence data for 24 Brown Swiss cattle were downsampled to between 0.01- and 4-fold to mimic low-pass sequencing. GLIMPSE was used to infer sequence variant genotypes from the low-pass sequencing data using different haplotype reference panels. The accuracy of the sequence variant genotypes inferred from low-pass sequencing data was assessed against sequence variant genotypes called from the high-coverage data. RESULTS DeepVariant was used to establish bovine haplotype reference panels because it outperformed GATK in all evaluations. Within-breed haplotype reference panels imputed sequence variant genotypes from low-pass sequencing more accurately and efficiently than equally sized multibreed haplotype reference panels for all target sample coverages and allele frequencies. F1 scores greater than 0.9, which indicate high harmonic means of recall and precision of called genotypes, were achieved with 0.25-fold sequencing coverage when large breed-specific haplotype reference panels (n = 150) were used.
In the absence of such large within-breed haplotype panels, variant genotyping accuracy from low-pass sequencing could be increased either by adding non-related samples to the haplotype reference panel or by increasing the coverage of the low-pass sequencing data. Sequence variant genotyping from low-pass sequencing was substantially less accurate when the reference panel lacked individuals from the target breed. CONCLUSIONS Variant genotyping is more accurate with DeepVariant than with GATK. DeepVariant is therefore suitable for establishing bovine haplotype reference panels. Medium-sized breed-specific haplotype reference panels and large multibreed haplotype reference panels enable accurate imputation of low-pass sequencing data in a typical cattle breed.
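The F1 score quoted above is the harmonic mean of precision and recall. The following is a minimal sketch of how such a score could be computed for genotypes coded as alternate-allele dosages (0, 1, 2). The counting rules are an assumption made for illustration and are not necessarily the ones used in the study. Under these rules, a non-reference call that exactly matches the high-coverage genotype is a true positive. A non-reference call that does not match is a false positive. A non-reference true genotype that is not recovered exactly is a false negative.

```python
from __future__ import annotations

from typing import Sequence


def genotype_f1(truth: Sequence[int], called: Sequence[int]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for non-reference genotype calls.

    Genotypes are alternate-allele dosages: 0 (hom-ref), 1 (het), 2 (hom-alt).
    Counting rules are an illustrative assumption: a genotype mismatch at a
    non-reference site counts as both a false positive and a false negative.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    tp = fp = fn = 0
    for t, c in zip(truth, called):
        if c != 0 and c == t:
            tp += 1  # non-reference genotype recovered exactly
            continue
        if c != 0:
            fp += 1  # non-reference call that disagrees with the truth
        if t != 0:
            fn += 1  # non-reference truth genotype not recovered exactly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: high-coverage "truth" vs. genotypes imputed from low-pass data.
print(genotype_f1([0, 1, 2, 1, 0, 2], [0, 1, 2, 0, 1, 1]))  # (0.5, 0.5, 0.5)
```

Because the score is a harmonic mean, it stays high only when both precision and recall are high. This is why an F1 above 0.9 is a demanding threshold.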
Affiliation(s)
- Hubert Pausch
- Animal Genomics, ETH Zürich, Universitätstrasse 2, Zürich, 8092, Switzerland
- Alexander S Leonard
- Animal Genomics, ETH Zürich, Universitätstrasse 2, Zürich, 8092, Switzerland
3
Marina H, Pelayo R, Gutiérrez-Gil B, Suárez-Vega A, Esteban-Blanco C, Reverter A, Arranz JJ. Low-density SNP panel for efficient imputation and genomic selection of milk production and technological traits in dairy sheep. J Dairy Sci 2022; 105:8199-8217. [PMID: 36028350 DOI: 10.3168/jds.2021-21601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 05/30/2022] [Indexed: 11/19/2022]
Abstract
The present study aimed to ascertain how different strategies for leveraging genomic information enhance the accuracy of estimated breeding values for milk and cheese-making traits and to evaluate the implementation of a low-density (LowD) SNP chip designed specifically for this purpose. To this end, milk samples from a total of 2,020 dairy ewes from 2 breeds (1,039 Spanish Assaf and 981 Churra) were collected and analyzed to determine 3 milk production and composition traits and 2 traits related to milk coagulation properties and cheese yield. The 2 studied populations were genotyped with a customized 50K Affymetrix SNP chip (Affymetrix Inc.) containing 55,627 SNP markers. Prediction accuracies were obtained using different multitrait methodologies: the BLUP model based on pedigree information; the genomic BLUP (GBLUP) and the BLUP at the SNP level (SNP-BLUP), which are based on genotypic data; and the single-step GBLUP (ssGBLUP), which combines both sources of information. All of these methods were evaluated by cross-validation, comparing predictions of the whole population with those of the test population sets. Additionally, we describe the design of a LowD SNP chip (3K) and its prediction accuracies under each of the methods mentioned previously. Furthermore, the results obtained using the LowD SNP chip were compared with those based on the 50K SNP chip data sets. We conclude that implementing genomic selection through the ssGBLUP model in the current breeding programs would increase the accuracy of the estimated breeding values compared with the BLUP methodology in the Assaf (from 0.19 to 0.39) and Churra (from 0.27 to 0.44) dairy sheep populations. The LowD SNP chip is cost-effective and has proven to be an accurate tool for estimating genomic breeding values for milk and cheese-making traits, microsatellite imputation, and parentage verification.
The results presented here suggest that the routine use of this LowD SNP chip could potentially increase the genetic gains of the breeding selection programs of the 2 Spanish dairy sheep breeds considered here.
Affiliation(s)
- H Marina
- Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- R Pelayo
- Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- B Gutiérrez-Gil
- Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- A Suárez-Vega
- Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- C Esteban-Blanco
- Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- A Reverter
- CSIRO Agriculture & Food, 306 Carmody Rd., St. Lucia, Brisbane, QLD 4067, Australia
- J J Arranz
- Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
4
5
Lashmar SF, Berry DP, Pierneef R, Muchadeyi FC, Visser C. Assessing single-nucleotide polymorphism selection methods for the development of a low-density panel optimized for imputation in South African Drakensberger beef cattle. J Anim Sci 2021; 99:6226920. [PMID: 33860324 DOI: 10.1093/jas/skab118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2020] [Accepted: 04/14/2021] [Indexed: 11/13/2022]
Abstract
A major obstacle to applying genomic selection (GS) to uniquely adapted local breeds in less-developed countries has been the cost of genotyping at high densities of single-nucleotide polymorphisms (SNP). Costs can be reduced by imputing genotypes from lower to higher densities. Locally adapted breeds tend to be admixed and exhibit a high degree of genomic heterogeneity, thus necessitating the optimization of SNP selection for downstream imputation. The aim of this study was to quantify the achievable imputation accuracy for a sample of 1,135 South African (SA) Drakensberger cattle using several custom-derived lower-density panels varying in both SNP density and how the SNP were selected. From a pool of 120,608 genotyped SNP, subsets of SNP were chosen (1) at random, (2) with even genomic dispersion, (3) by maximizing the mean minor allele frequency (MAF), (4) using a combined score of MAF and linkage disequilibrium (LD), (5) using a partitioning-around-medoids (PAM) algorithm, and (6) using a hierarchical LD-based clustering algorithm. Imputation accuracy to higher density improved as SNP density increased; animal-wise imputation accuracy, defined as the within-animal correlation between the imputed and actual alleles, ranged from 0.625 to 0.990 when 2,500 randomly selected SNP were used vs. from 0.918 to 0.999 when 50,000 randomly selected SNP were used. At a panel density of 10,000 SNP, the mean (standard deviation) animal-wise allele concordance rate was 0.976 (0.018) vs. 0.982 (0.014) when the worst (i.e., random) as opposed to the best (i.e., combination of MAF and LD) SNP selection strategy was employed. A difference of 0.071 units was observed between the mean correlation-based accuracy of imputed SNP categorized as low (0.01 < MAF ≤ 0.1) vs. high MAF (0.4 < MAF ≤ 0.5). Greater mean imputation accuracy was achieved for SNP located at the extremes of the autosomes when these regions were populated with more SNP.
These results suggest that genotype imputation can be a practical cost-saving strategy for indigenous breeds such as the SA Drakensberger. Based on the results, a genotyping panel of ~10,000 SNP selected on a combination of MAF and LD would suffice to achieve a <3% imputation error rate for a breed characterized by genomic admixture, provided that these SNP are selected using breed-specific criteria.
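The two animal-wise metrics above can be illustrated with a short sketch. It assumes unphased genotypes coded as allele dosages (0, 1, 2) with no missing values. The within-animal correlation is computed on these dosages. The allele concordance rate counts the matching alleles at each SNP as 2 - |actual - imputed|. This is an illustrative implementation under those assumptions, not the study's code, and the genotype vectors are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def animal_wise_accuracy(actual: Sequence[int], imputed: Sequence[int]) -> tuple[float, float]:
    """Return (correlation, allele concordance rate) for one animal across all SNPs."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("need two non-empty dosage vectors of equal length")
    # At an unphased SNP, 2 - |difference in dosage| alleles agree (0, 1 or 2).
    matching_alleles = sum(2 - abs(a - i) for a, i in zip(actual, imputed))
    return pearson(actual, imputed), matching_alleles / (2 * len(actual))


actual = [0, 1, 2, 1, 0, 2, 1, 0]   # hypothetical genotypes from the higher-density panel
imputed = [0, 1, 2, 1, 0, 2, 1, 1]  # hypothetical genotypes imputed from a lower-density panel
r, concordance = animal_wise_accuracy(actual, imputed)
print(round(r, 3), concordance)
```

A single one-allele error out of 16 alleles gives a concordance of 0.9375 but lowers the correlation more, to about 0.91. Correlation-based accuracy is therefore generally the stricter of the two measures. This is consistent with the abstract reporting both measures.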
Affiliation(s)
- Simon F Lashmar
- Department of Animal Sciences, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Donagh P Berry
- Department of Animal Sciences, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa; Animal and Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland
- Rian Pierneef
- Biotechnology Platform, Agricultural Research Council, Private Bag X5, Onderstepoort 0110, South Africa
- Farai C Muchadeyi
- Biotechnology Platform, Agricultural Research Council, Private Bag X5, Onderstepoort 0110, South Africa
- Carina Visser
- Department of Animal Sciences, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
6
Mancin E, Sosa-Madrid BS, Blasco A, Ibáñez-Escriche N. Genotype Imputation to Improve the Cost-Efficiency of Genomic Selection in Rabbits. Animals (Basel) 2021; 11:ani11030803. [PMID: 33805619 PMCID: PMC8000098 DOI: 10.3390/ani11030803] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/02/2021] [Revised: 03/04/2021] [Accepted: 03/05/2021] [Indexed: 01/19/2023]
Abstract
Simple Summary Genotyping costs are still the major limitation to the uptake of genomic selection by the rabbit meat industry, as a large number of genetic markers are needed to improve the prediction of breeding values from genomic data. In this study, several genotyping strategies were examined through simulation scenarios to identify the most feasible options for implementing genomic selection in rabbit breeding programs. Most scenarios emphasized genotyping candidate animals with a low-density Single Nucleotide Polymorphism (SNP) platform. Imputation accuracies were high for the scenarios with ancestors genotyped at high or medium SNP densities. However, the scenario with male ancestors genotyped at high SNP density and only dams genotyped at medium SNP density was the most economically feasible strategy, taking into account the trade-off among genotyping costs, the accuracy of breeding values and response to selection. The results confirmed that by combining the imputation technique with a careful selection of the animals to be genotyped, it is possible to achieve better performance than Best Linear Unbiased Prediction (BLUP) while reducing genotyping costs. Abstract Genomic selection uses genetic marker information to predict genomic breeding values (gEBVs), and can be a suitable tool for selecting low-heritability traits such as litter size in rabbits. However, genotyping costs in rabbits are still too high to enable genomic prediction in selective breeding programs. One method for decreasing genotyping costs is genotype imputation, where parents are genotyped at high SNP density (HD) and the progeny are genotyped at lower SNP density, followed by imputation to HD. The aim of this study was to identify the imputation strategies that offer the best trade-off between genotyping costs and the accuracy of breeding values for litter size.
A selection process mimicking a commercial rabbit breeding program selecting for litter size was simulated. Two different Quantitative Trait Nucleotide (QTN) models (QTN_5 and QTN_44) were generated 36 times each. From these simulations, seven different scenarios (S1–S7) and a further replicate of the third scenario (S3_A) were created. Each scenario consisted of a different combination of genotyping strategies. In these scenarios, ancestors and progeny were genotyped with a mix of three different platforms, containing 200,000, 60,000, and 600 SNPs at a cost of EUR 100, 50 and 11 per animal, respectively. Imputation accuracy (IA) was measured as the Pearson correlation between the true and imputed genotypes, whilst the accuracy of gEBVs was the correlation between the true and estimated breeding values. The relationships between IA, the accuracy of gEBVs, genotyping costs, and response to selection were examined under each QTN model. QTN_44 gave better genomic prediction results, but the ranking of scenarios was the same under both QTN models. The highest IA (0.99) and accuracy of gEBVs (0.26; QTN_44, and 0.228; QTN_5) were observed in S1, where all ancestors were genotyped at HD and progeny at medium SNP density (MD). Nevertheless, this was the most expensive scenario; in the others, the progeny were genotyped at low SNP density (LD). Scenarios with low average costs presented low IA, particularly when female ancestors were genotyped at LD (S5) or not genotyped (S7). S3_A, which imputed whole genomes, had the lowest accuracy of gEBVs (0.09), even worse than Best Linear Unbiased Prediction (BLUP). The best trade-off between genotyping costs and the accuracy of gEBVs (0.234; QTN_44 and 0.199) was in S6, in which dams were genotyped with MD whilst grand-dams were not genotyped.
However, this relationship would depend mainly on the distribution of QTN and SNP across the genome, which calls for further studies characterizing the rabbit genome in the Spanish lines. In summary, genomic selection with genotype imputation is feasible in the rabbit industry, provided that only genotyping strategies with suitable IA, accuracy of gEBVs, genotyping costs, and response to selection are considered.
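The cost side of this trade-off can be made concrete with simple arithmetic, using the per-animal prices quoted above: EUR 100 for the 200,000-SNP platform, EUR 50 for 60,000 SNPs and EUR 11 for 600 SNPs. The sketch below totals the cost of a scenario from the number of animals genotyped on each platform. The animal counts and the platform labels used as dictionary keys are hypothetical and chosen only for illustration. They are not the population sizes simulated in the study.

```python
from __future__ import annotations

# Per-animal genotyping prices (EUR) quoted in the abstract; "none" = not genotyped.
PLATFORM_COST_EUR = {"HD": 100, "MD": 50, "LD": 11, "none": 0}


def genotyping_cost(animals_per_platform: dict[str, int]) -> int:
    """Total genotyping cost in EUR for a scenario, given animal counts per platform."""
    return sum(PLATFORM_COST_EUR[platform] * n for platform, n in animals_per_platform.items())


# Hypothetical scenario sizes, for illustration only.
progeny_at_md = {"HD": 50, "MD": 2500}                          # sires at HD; 500 dams + 2000 progeny at MD
progeny_at_ld = {"HD": 50, "MD": 500, "LD": 2000, "none": 300}  # progeny at LD; grand-dams not genotyped

print(genotyping_cost(progeny_at_md))  # 130000
print(genotyping_cost(progeny_at_ld))  # 52000
```

Under these assumed counts, genotyping the progeny at LD rather than MD reduces the total from EUR 130,000 to EUR 52,000. Whether that saving is worthwhile depends on the imputation accuracy and gEBV accuracy reported for each scenario.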
Affiliation(s)
- Enrico Mancin
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, viale dell’Università 16, 35020 Legnaro, PD, Italy
- Bolívar Samuel Sosa-Madrid
- Institute for Animal Science and Technology, Universitat Politècnica de València, 46022 Valencia, Spain
- Correspondence: (B.S.S.-M.); (N.I.-E.); Tel.: +34-963877438 (N.I.-E.)
- Agustín Blasco
- Institute for Animal Science and Technology, Universitat Politècnica de València, 46022 Valencia, Spain
- Noelia Ibáñez-Escriche
- Institute for Animal Science and Technology, Universitat Politècnica de València, 46022 Valencia, Spain
7
Kumar H, Panigrahi M, Saravanan KA, Parida S, Bhushan B, Gaur GK, Dutt T, Mishra BP, Singh RK. SNPs with intermediate minor allele frequencies facilitate accurate breed assignment of Indian Tharparkar cattle. Gene 2021; 777:145473. [PMID: 33549713 DOI: 10.1016/j.gene.2021.145473] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/03/2020] [Revised: 01/23/2021] [Accepted: 01/28/2021] [Indexed: 10/22/2022]
Abstract
The Tharparkar cattle breed is widely known for its superior milch qualities and hardiness. This study aimed to develop an ultra-low-density breed-specific single nucleotide polymorphism (SNP) genotype panel to accurately quantify Tharparkar populations in biological samples. We randomly selected and genotyped 72 Tharparkar animals from the Cattle & Buffalo Farm of IVRI, India. These BovineSNP50 BeadChip genotype data were merged with online data from six indigenous cattle breeds and five taurine breeds. We used a combination of pre-selection statistics and the MAF-LD method developed in our laboratory to analyze the genotypic data obtained from 317 individuals of 12 distinct breeds and to identify breed-informative SNPs for the selection of Tharparkar cattle. This methodology identified 63 unique Tharparkar-specific SNPs with near-intermediate allele frequencies. We report several informative SNPs in genes/QTL regions affecting phenotypes or production traits that might differentiate the Tharparkar breed.
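The title's point, that SNPs with intermediate minor allele frequencies (MAF) aid breed assignment, starts from a per-SNP MAF calculation. The sketch below computes MAF from allele dosages and keeps SNPs at or above a cutoff. It does not reproduce the MAF-LD method described above. The 0.3 threshold, the SNP names and the genotypes are hypothetical, and the sketch assumes there are no missing genotype calls.

```python
from __future__ import annotations

from typing import Sequence


def minor_allele_frequency(dosages: Sequence[int]) -> float:
    """MAF from alternate-allele dosages (0, 1, 2) of diploid animals, no missing calls."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)


def intermediate_maf_snps(genotypes: dict[str, Sequence[int]], min_maf: float = 0.3) -> list[str]:
    """Return the IDs of SNPs whose MAF is at least min_maf (a hypothetical cutoff)."""
    return [snp for snp, dosages in genotypes.items() if minor_allele_frequency(dosages) >= min_maf]


# Hypothetical genotypes of four animals at three SNPs.
panel = {
    "snp_a": [0, 1, 2, 1],  # MAF 0.5
    "snp_b": [0, 0, 0, 1],  # MAF 0.125
    "snp_c": [2, 2, 1, 2],  # alternate-allele frequency 0.875, so MAF 0.125
}
print(intermediate_maf_snps(panel))  # ['snp_a']
```

In a real pipeline, this filter would be only a pre-selection step. LD pruning and breed-specific statistics would then decide the final panel.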
Affiliation(s)
- Harshit Kumar
- Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Manjit Panigrahi
- Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- K A Saravanan
- Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Subhashree Parida
- Division of Pharmacology & Toxicology, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Bharat Bhushan
- Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- G K Gaur
- Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- Triveni Dutt
- Livestock Production and Management Section, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- B P Mishra
- Division of Animal Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
- R K Singh
- Division of Animal Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, UP, India
8
Berry DP, Dunne FL, Evans RD, McDermott K, O'Brien AC. Concordance rate in cattle and sheep between genotypes differing in Illumina GenCall quality score. Anim Genet 2021; 52:208-213. [PMID: 33527466 DOI: 10.1111/age.13043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 01/13/2021] [Indexed: 11/30/2022]
Abstract
Proper quality control of data prior to downstream analyses is fundamental to ensuring the integrity of results; quality control of genomic data is no exception. While many quality control metrics for genomic data exist, the objective of the present study was to quantify the genotype and allele concordance rates between called single nucleotide polymorphism (SNP) genotypes differing in GenCall (GC) score; the GC score is a confidence measure assigned to each Illumina genotype call. This objective was achieved using Illumina BeadChip genotype data from 771 cattle (12 428 767 genotypes in total post-editing) and 80 sheep (1 557 360 SNP genotypes in total post-editing), each genotyped in duplicate. The called genotype with the lower associated GC score was compared with the genotype called for the same SNP in the duplicate sample of the same animal with a GC score of >0.90 (assumed to represent the true genotype). The mean genotype concordance rates for GC scores of <0.300, 0.300-0.549, and ≥0.550 in the cattle (sheep in parentheses) were 0.9467 (0.9864), 0.9707 (0.9953), and 0.9994 (0.99997), respectively; the respective allele concordance rates were 0.9730 (0.9930), 0.9849 (0.9976), and 0.9997 (0.99998). Hence, concordance eroded as the GC score of the called genotype decreased, although the impact was not dramatic and was not very noticeable until the GC score fell below 0.55. Moreover, the impact was greater and more consistent in the cattle population than in the sheep population. Furthermore, GC score affected the genotype concordance rate even among SNPs with the same GenTrain value; the GenTrain value is a statistical score that depicts the shape of the genotype clusters and the relative distance between the called genotype clusters.
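The comparison described above can be sketched as follows. Each record pairs the lower-GC call from a duplicated sample with the call for the same SNP from the other replicate. The reference call is used only when its GC score exceeds 0.90, and test calls are binned into the three GC classes reported. The sketch assumes that genotypes are coded as allele dosages (0, 1, 2), and the records are hypothetical. It illustrates the metric definitions and is not the authors' pipeline.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def gc_class(score: float) -> str:
    """Assign a GenCall score to the three classes used in the abstract."""
    if score < 0.300:
        return "<0.300"
    if score < 0.550:
        return "0.300-0.549"
    return ">=0.550"


def concordance_by_gc(records: Iterable[tuple[int, float, int, float]]) -> dict[str, tuple[float, float]]:
    """Return {GC class: (genotype concordance, allele concordance)}.

    Each record is (test_dosage, test_gc, reference_dosage, reference_gc).
    """
    # Per class: [genotypes compared, genotypes identical, alleles matching]
    tallies = defaultdict(lambda: [0, 0, 0])
    for test_g, test_gc, ref_g, ref_gc in records:
        if ref_gc <= 0.90:
            continue  # reference call not trusted as the true genotype
        tally = tallies[gc_class(test_gc)]
        tally[0] += 1
        tally[1] += int(test_g == ref_g)
        tally[2] += 2 - abs(test_g - ref_g)  # unphased allele matches at this SNP
    return {cls: (same / n, alleles / (2 * n)) for cls, (n, same, alleles) in tallies.items()}


records = [
    (1, 0.25, 1, 0.95),
    (0, 0.25, 2, 0.95),
    (2, 0.40, 1, 0.97),
    (1, 0.80, 1, 0.99),
    (1, 0.80, 1, 0.85),  # skipped: reference GC score not above 0.90
]
print(concordance_by_gc(records))
# {'<0.300': (0.5, 0.5), '0.300-0.549': (0.0, 0.5), '>=0.550': (1.0, 1.0)}
```

Allele concordance is never lower than genotype concordance, because a heterozygote/homozygote mismatch still shares one allele. This matches the pattern in the reported rates.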
Affiliation(s)
- D P Berry
- Teagasc, Animal & Grassland Research and Innovation Centre, Moorepark, Fermoy, Co. Cork, P61 P302, Ireland
- F L Dunne
- Teagasc, Animal & Grassland Research and Innovation Centre, Moorepark, Fermoy, Co. Cork, P61 P302, Ireland
- R D Evans
- Irish Cattle Breeding Federation, Highfield House, Shinagh, Bandon, Co. Cork, P72 X050, Ireland
- K McDermott
- Sheep Ireland, Highfield House, Shinagh, Bandon, Co. Cork, P72 X050, Ireland
- A C O'Brien
- Teagasc, Animal & Grassland Research and Innovation Centre, Moorepark, Fermoy, Co. Cork, P61 P302, Ireland
9
Cook SR, Conzemius MG, McCue ME, Ekenstedt KJ. SNP-based heritability and genetic architecture of cranial cruciate ligament rupture in Labrador Retrievers. Anim Genet 2020; 51:824-828. [PMID: 32696518 DOI: 10.1111/age.12978] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Accepted: 06/15/2020] [Indexed: 01/09/2023]
Abstract
Cranial cruciate ligament rupture (CCLR) is one of the leading causes of pelvic limb lameness in dogs. About 6% of Labrador Retrievers suffer from this orthopedic problem. The aim of this study was to determine the heritability of CCLR in this breed using SNP array genotyping data. DNA samples were collected from CCLR-affected dogs (n = 190) and unaffected dogs over the age of 8 years (n = 143). All 333 dogs were either genotyped directly on, or imputed up to, the approximately 710k SNPs of the Affymetrix Axiom CanineHD SNP array. Heritability of CCLR was calculated using multiple methodologies, including linear mixed models, Bayesian models and a model that incorporates LD. The covariates of sex and sterilization status were added to each analysis to assess their impact. Across these models, heritability ranged from 0.550 to 0.886, depending on covariate inclusion. The relatively high heritability of this disease indicates that a substantial genetic component contributes to CCLR in the Labrador Retriever.
Affiliation(s)
- S R Cook
- Department of Basic Medical Sciences, College of Veterinary Medicine, Purdue University, 625 Harrison St, West Lafayette, IN, 47907, USA
- M G Conzemius
- Department of Veterinary Clinical Sciences, College of Veterinary Medicine, University of Minnesota, 1352 Boyd Avenue, St Paul, MN, 55108, USA
- M E McCue
- Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, 1365 Gortner Avenue, St Paul, MN, 55108, USA
- K J Ekenstedt
- Department of Basic Medical Sciences, College of Veterinary Medicine, Purdue University, 625 Harrison St, West Lafayette, IN, 47907, USA
10
Yang W, Yang Y, Zhao C, Yang K, Wang D, Yang J, Niu X, Gong J. Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation. Nucleic Acids Res 2020; 48:D659-D667. [PMID: 31584087 PMCID: PMC6943029 DOI: 10.1093/nar/gkz854] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/14/2019] [Revised: 09/19/2019] [Accepted: 10/01/2019] [Indexed: 12/11/2022]
Abstract
Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is the process of estimating missing genotypes on the basis of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and thus can be widely used in large-scale genome-wide association studies (GWASs) that rely on relatively inexpensive, low-density SNP arrays. However, most species other than humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. In total, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool integrating two popular imputation tools was designed for genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.
Affiliation(s)
- Wenqian Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
- Yanbo Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
- Cecheng Zhao
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
- Kun Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
- Dongyang Wang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
- Jiajun Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
- Xiaohui Niu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
- Jing Gong
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China; College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, P. R. China
11
Concordance rate between copy number variants detected using either high- or medium-density single nucleotide polymorphism genotype panels and the potential of imputing copy number variants from flanking high density single nucleotide polymorphism haplotypes in cattle. BMC Genomics 2020; 21:205. [PMID: 32131735 PMCID: PMC7057620 DOI: 10.1186/s12864-020-6627-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/05/2019] [Accepted: 02/26/2020] [Indexed: 12/01/2022]
Abstract
Background The trading of individual animal genotype information often involves only the exchange of the called genotypes and not necessarily the additional information required to effectively call structural variants. The main aim here was to determine whether it is possible to impute copy number variants (CNVs) using the flanking single nucleotide polymorphism (SNP) haplotype structure in cattle. While this objective was addressed using high-density genotype panels (i.e., 713,162 SNPs), a secondary objective investigated the concordance of CNVs called with this high-density genotype panel with CNVs called from a medium-density panel (i.e., 45,677 SNPs in the present study). This is the first study to compare CNVs called from high-density and medium-density SNP genotypes of the same animals. High- and medium-density genotypes were available on 991 Holstein-Friesian, 1015 Charolais, and 1394 Limousin bulls. The concordance between CNVs called from the medium-density and high-density genotypes was calculated separately for each animal. A subset of CNVs called from the high-density genotypes was selected for imputation. Imputation was carried out separately for each breed using a set of high-density SNPs flanking the midpoint of each CNV. A CNV was deemed to be imputed correctly when the called copy number matched the imputed copy number. Results For 97.0% of CNVs called from the high-density genotypes, the corresponding genomic position in the medium-density genotypes of the same animal did not contain a called CNV. The average accuracy of imputation for CNV deletions was 0.281, with a standard deviation of 0.286. The average accuracy of imputation of the CNV normal state, i.e., the absence of a CNV, was 0.982, with a standard deviation of 0.022. Two CNV duplications were imputed in the Charolais, a single CNV duplication in the Limousin, and a single CNV duplication in the Holstein-Friesian; in all cases, the CNV duplications were incorrectly imputed.
Conclusion The vast majority of CNVs called from the high-density genotypes were not detected using the medium-density genotypes. Furthermore, CNVs cannot be accurately predicted from flanking SNP haplotypes, at least with the imputation algorithms routinely used in cattle and the SNPs currently available on the high-density genotype panel.
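The per-state accuracies reported above follow from the correctness rule stated in the abstract: a CNV is imputed correctly when the imputed copy number equals the called one. The following is a minimal sketch under two assumptions. First, diploid copy numbers are coded so that 0 or 1 is a deletion, 2 is the normal state and 3 or more is a duplication. Second, accuracy is the proportion of calls in each state whose copy number was imputed exactly. The calls are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def cnv_state(copy_number: int) -> str:
    """Map a diploid copy number to a CNV state (assumed coding)."""
    if copy_number < 2:
        return "deletion"
    if copy_number == 2:
        return "normal"
    return "duplication"


def accuracy_by_state(called: Sequence[int], imputed: Sequence[int]) -> dict[str, float]:
    """Proportion of calls in each state whose copy number was imputed exactly."""
    totals = Counter(cnv_state(c) for c in called)
    # Correctness compares raw copy numbers, then is grouped by the called state.
    correct = Counter(cnv_state(c) for c, i in zip(called, imputed) if c == i)
    return {state: correct[state] / n for state, n in totals.items()}


called = [2, 2, 2, 1, 0, 3]   # hypothetical copy numbers called from high-density genotypes
imputed = [2, 2, 1, 2, 0, 2]  # hypothetical imputed copy numbers
print(accuracy_by_state(called, imputed))
```

Grouping by the called state shows why a method can look accurate overall while failing on CNVs. Normal-state calls dominate the counts, so a method that mostly imputes "normal" scores well on that state and poorly on deletions and duplications, as in the results above.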
12
Shashkova TI, Martynova EU, Ayupova AF, Shumskiy AA, Ogurtsova PA, Kostyunina OV, Khaitovich PE, Mazin PV, Zinovieva NA. Development of a low-density panel for genomic selection of pigs in Russia. Transl Anim Sci 2019; 4:264-274. [PMID: 32704985 PMCID: PMC6994047 DOI: 10.1093/tas/txz182] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/03/2019] [Accepted: 11/27/2019] [Indexed: 02/07/2023]
Abstract
Genomic selection is routinely used worldwide in agricultural breeding. However, in Russia, it is still not used to its full potential, partly due to high genotyping costs. The use of genotypes imputed from low-density chips (LD-chips) provides a valuable opportunity for reducing genotyping costs. Pork production in Russia is based on the conventional 3-tier pyramid involving 3 breeds; therefore, the best option would be to develop a single LD-chip that could be used for all of them. Here, for the first time, we analyzed genomic variability in 3 breeds of Russian pigs, namely Landrace, Duroc, and Large White, and generated an LD-chip that can be used in pig breeding with negligible loss in genotyping quality. We demonstrated that of the 3 methods commonly used for LD-chip construction, the block method shows the best results. Imputation quality depends strongly on the presence of close ancestors in the reference population. We demonstrated that for animals with both parents genotyped using high-density panels, high-quality genotypes (allelic discordance rate < 0.05) could be obtained using a 300 single nucleotide polymorphism (SNP) chip, while in the absence of genotyped ancestors at least 2,000 SNP markers are required. We have shown that imputation quality varies between chromosomes, is lower near the chromosome ends, and drops as minor allele frequency increases. Imputation quality of individual SNPs correlated well across breeds. Using the same LD-chip, we obtained comparable imputation quality in all 3 breeds, suggesting that a single chip could be used for all of them. Our findings also suggest that markers with extremely low imputation quality are likely explained by incorrect mapping of the markers to chromosomal positions.
Affiliation(s)
- Asiya F Ayupova
- Skolkovo Institute of Science and Technology, Moscow, Russia
- Olga V Kostyunina
- Ernst Federal Science Center for Animal Husbandry, Dubrovitsy, Moscow Oblast, Russia
- Pavel V Mazin
- Skolkovo Institute of Science and Technology, Moscow, Russia; Computer Science Department, National Research University Higher School of Economics, Moscow, Russia
- Natalia A Zinovieva
- Ernst Federal Science Center for Animal Husbandry, Dubrovitsy, Moscow Oblast, Russia
13
Population structure and breed composition prediction in a multi-breed sheep population using genome-wide single nucleotide polymorphism genotypes. Animal 2019; 14:464-474. [PMID: 31610818 DOI: 10.1017/s1751731119002398] [Indexed: 11/06/2022]
Abstract
Knowledge of the population structure and breed composition of a population is useful for several purposes, including designing optimal (cross)breeding strategies to maximise non-additive genetic effects, maintaining flockbook integrity by authenticating animals being registered, and serving as a quality control measure in the genotyping process. The objectives of the present study were to 1) describe the population structure of 24 sheep breeds, 2) quantify the breed composition of both flockbook-recorded and crossbred animals using single nucleotide polymorphism BLUP (SNP-BLUP), and 3) quantify the accuracy of breed composition prediction from low-density genotype panels containing between 2000 and 6000 SNPs. In total, 9334 autosomal SNPs genotyped on 11 144 flockbook-recorded animals and 1172 crossbred animals were used. The population structure of all breeds was characterised by principal component analysis (PCA) and by the pairwise breed fixation index (Fst). The calibration population for SNP-BLUP comprised 2579 animals, all purebred, with 9 to 500 animals per breed. The remaining 9559 flockbook-recorded animals, composite breeds and crossbred animals formed the test population; three breeds were excluded from breed composition prediction. Breed composition predicted using SNP-BLUP with all 9334 SNPs was treated as the gold standard. The pairwise breed Fst ranged from 0.040 (between the Irish Blackface and Scottish Blackface) to 0.282 (between the Border Leicester and Suffolk). PCA revealed that the Suffolk from Ireland and the Suffolk from New Zealand formed distinct, non-overlapping clusters. In contrast, the Texel from Ireland and the Texel from New Zealand formed integrated, overlapping clusters. Composite breeds such as the Belclare clustered close to their founder breeds (i.e., Finn, Galway, Lleyn and Texel).
When all 9334 SNPs were used to predict breed composition, an animal whose predicted majority breed proportion was ≥0.90 was defined as purebred in the present study. As panel density decreased, the predicted breed proportion threshold used to identify animals as purebred also decreased (from ≥0.85 with 6000 SNPs to ≥0.60 with 2000 SNPs). Overall, the results suggest that the breed composition of purebred and crossbred animals can be determined with SNP-BLUP using ≥5000 SNPs.
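The density-dependent purebred rule described above can be expressed as a simple lookup. In this sketch the three thresholds are the ones quoted in the abstract; breed proportions are assumed to have already been predicted (for example by SNP-BLUP, which is not implemented here), and the function names, breed names and proportions are hypothetical. Densities for which the abstract gives no threshold are rejected rather than interpolated.

```python
"""Sketch: flag animals as purebred from predicted breed proportions.

Thresholds come from the abstract (9334, 6000 and 2000 SNPs). Breed proportions
are assumed to be predicted elsewhere (e.g. SNP-BLUP) and passed in as a dict.
"""
from typing import Dict, Optional, Tuple

# Panel density (number of SNPs) -> minimum majority-breed proportion for "purebred"
PUREBRED_THRESHOLDS: Dict[int, float] = {9334: 0.90, 6000: 0.85, 2000: 0.60}


def majority_breed(proportions: Dict[str, float]) -> Tuple[str, float]:
    """Return the breed with the largest predicted proportion and that proportion."""
    if not proportions:
        raise ValueError("no breed proportions given")
    breed = max(proportions, key=lambda b: proportions[b])
    return breed, proportions[breed]


def classify(proportions: Dict[str, float], n_snps: int) -> Optional[str]:
    """Return the majority breed if the animal counts as purebred, else None."""
    if n_snps not in PUREBRED_THRESHOLDS:
        # The abstract reports no threshold for other densities; do not guess one
        raise ValueError(f"no threshold reported for a {n_snps}-SNP panel")
    breed, share = majority_breed(proportions)
    return breed if share >= PUREBRED_THRESHOLDS[n_snps] else None


if __name__ == "__main__":
    animal = {"Texel": 0.87, "Suffolk": 0.13}  # hypothetical predicted proportions
    for n in (9334, 6000, 2000):
        print(n, classify(animal, n))
```

For example, an animal predicted to be 0.87 Texel would not count as purebred with the full 9334-SNP panel, but would pass the lower thresholds applied to the 6000- and 2000-SNP panels.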