1
Berry DP, Spangler ML. Animal board invited review: Practical applications of genomic information in livestock. Animal 2023; 17:100996. [PMID: 37820404 DOI: 10.1016/j.animal.2023.100996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 09/08/2023] [Accepted: 09/11/2023] [Indexed: 10/13/2023]
Abstract
Access to high-dimensional genomic information in many livestock species is accelerating. This has been greatly aided not only by continual reductions in genotyping costs but also by an expansion in the services available that leverage genomic information to create a greater return on investment. Genomic information on individual animals has many uses including (1) parentage verification and discovery, (2) traceability, (3) karyotyping, (4) sex determination, (5) reporting and monitoring of mutations conferring major effects or congenital defects, (6) better estimating inbreeding of individuals and coancestry among individuals, (7) mating advice, (8) determining breed composition, (9) enabling precision management, and (10) genomic evaluations; genomic evaluations exploit genome-wide genotype information to improve the accuracy of predicting an animal's (and by extension its progeny's) genetic merit. Genomic data also provide a huge resource for research, although the outcomes of this research, if successful, should eventually be realised through one of the ten applications already mentioned. The process for generating a genotype, all the way from sample procurement to identifying erroneous genotypes, is described, as are the steps that should be considered when developing a bespoke genotyping panel for practical application.
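Parentage verification, the first application listed, is commonly based on counting opposing homozygotes: loci where a putative parent and its offspring are homozygous for different alleles, which Mendelian inheritance rules out barring genotyping error or mutation. The sketch below assumes genotypes coded 0/1/2 (copies of one allele) with `None` for missing calls; the function names and the 1% conflict threshold are hypothetical and are not taken from Berry and Spangler.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the counted allele; None = no call


def opposing_homozygotes(parent: Sequence[Genotype],
                         offspring: Sequence[Genotype]) -> tuple[int, int]:
    """Return (opposing homozygous loci, loci called in both animals)."""
    if len(parent) != len(offspring):
        raise ValueError("genotype vectors must cover the same loci")
    conflicts = 0
    compared = 0
    for p, o in zip(parent, offspring):
        if p is None or o is None:
            continue  # skip loci missing in either animal
        compared += 1
        # A parent homozygous for one allele cannot transmit the other allele
        if {p, o} == {0, 2}:
            conflicts += 1
    return conflicts, compared


def parent_is_compatible(parent: Sequence[Genotype],
                         offspring: Sequence[Genotype],
                         max_conflict_rate: float = 0.01) -> bool:
    """Hypothetical decision rule: accept if the conflict rate is at or below the threshold."""
    conflicts, compared = opposing_homozygotes(parent, offspring)
    if compared == 0:
        raise ValueError("no loci genotyped in both animals")
    return conflicts / compared <= max_conflict_rate


# Illustrative data: locus 4 is an opposing homozygote (sire 2, calf 0); locus 5 is skipped
sire = [0, 2, 1, 2, None, 0]
calf = [1, 2, 0, 0, 1, 0]
print(opposing_homozygotes(sire, calf))  # (1, 5)
print(parent_is_compatible(sire, calf))  # False
```

A fixed rate threshold is the simplest possible rule; any real threshold would need to be calibrated against the genotyping error rate and density of the panel in use.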
Affiliation(s)
- D P Berry
- Animal & Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Cork, Ireland.
- M L Spangler
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE, United States
2
Cai Z, Christensen OF, Lund MS, Ostersen T, Sahana G. Large-scale association study on daily weight gain in pigs reveals overlap of genetic factors for growth in humans. BMC Genomics 2022; 23:133. [PMID: 35168569 PMCID: PMC8845347 DOI: 10.1186/s12864-022-08373-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/31/2021] [Accepted: 02/08/2022] [Indexed: 01/10/2023]
Abstract
Background Imputation from genotyping arrays to whole-genome sequence variants using resequenced representative reference populations enhances our ability to map genetic factors affecting complex phenotypes in livestock species. The accumulated knowledge of gene function in humans and laboratory animals can provide a substantial advantage for genomic research in livestock species. Results In this study, 201,388 pigs from three commercial Danish breeds genotyped with low- to medium-density (8.5k to 70k) SNP arrays were imputed to whole-genome sequence variants using a two-step approach. Both imputation steps achieved high accuracy, yielding a total of 26,447,434 markers on 18 autosomes. The average estimated imputation accuracy of markers with minor allele frequency ≥ 0.05 was 0.94. To overcome the memory consumption of running a genome-wide association study (GWAS) for each breed, we performed within-breed subpopulation GWAS followed by within-breed meta-analysis for average daily weight gain (ADG), and then a multi-breed meta-analysis of the GWAS summary statistics. We identified 15 quantitative trait loci (QTL). Using a post-GWAS strategy to prioritize candidate genes that drew on gene ontology, the mammalian phenotype database, differential gene expression analysis of high- and low-feed-efficiency pigs, and the human GWAS catalog for height, obesity, and body mass index, we proposed MRAP2, LEPROT, PMAIP1, ENSSSCG00000036234, BMP2, ELFN1, LIG4 and FAM155A as candidate genes with biological support for ADG in pigs. Conclusion Our post-GWAS analysis strategy helped to identify candidate genes not just by distance to the lead SNP but also by multiple sources of biological evidence. In addition, the identified QTL overlap with genes known for their association with human growth-related traits.
The GWAS with this large data set showed the power to map the genetic factors associated with ADG in pigs and has added to our understanding of the genetics of growth across mammalian species. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08373-3.
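The abstract describes combining GWAS results first across subpopulations within a breed and then across breeds, without naming the meta-analysis model. A common choice for pooling summary statistics is the fixed-effect inverse-variance-weighted (IVW) estimate; the sketch below assumes that model and assumes all effects are expressed relative to the same effect allele. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for a single SNP.

    betas, ses: per-study effect estimates and standard errors, all expressed
    relative to the same effect allele. Returns (beta, se, z, two-sided p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need matching, non-empty betas and ses")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided normal p-value
    return beta, se, z, p


# Two-stage use mirroring the abstract: subpopulations -> breed, then breeds -> multi-breed
breed_a = ivw_meta([0.12, 0.08], [0.05, 0.04])
breed_b = ivw_meta([0.10], [0.06])
combined = ivw_meta([breed_a[0], breed_b[0]], [breed_a[1], breed_b[1]])
```

Under the fixed-effect model the two-stage result equals pooling all subpopulations in one step, which is why staging the analysis saves memory without changing the estimate; a random-effects model would not have this property.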
Affiliation(s)
- Zexi Cai
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark.
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Tage Ostersen
- SEGES Danish Pig Research Centre, Agro Food Park 15, 8200, Aarhus N, Denmark
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
3
Jenkins CA, Schofield EC, Mellersh CS, De Risio L, Ricketts SL. Improving the resolution of canine genome-wide association studies using genotype imputation: A study of two breeds. Anim Genet 2021; 52:703-713. [PMID: 34252218 PMCID: PMC8514152 DOI: 10.1111/age.13117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Revised: 05/07/2021] [Accepted: 06/24/2021] [Indexed: 01/08/2023]
Abstract
Genotype imputation using a reference panel that combines high-density array data and publicly available whole genome sequence consortium variant data is potentially a cost-effective method to increase the density of extant lower-density array datasets. In this study, three datasets (two Border Collie; one Italian Spinone) generated using a legacy array (Illumina CanineHD, 173 662 SNPs) were utilised to assess the feasibility and accuracy of this approach and to gather additional evidence for the efficacy of canine genotype imputation. The cosmopolitan reference panels used to impute genotypes comprised dogs of 158 breeds, mixed breed dogs, wolves and Chinese indigenous dogs, as well as breed-specific individuals genotyped using the Axiom Canine HD array. The two Border Collie reference panels comprised 808 individuals including 79 Border Collies and 426 326 or 426 332 SNPs; and the Italian Spinone reference panel comprised 807 individuals including 38 Italian Spinoni and 476 313 SNPs. A high accuracy for imputation was observed, with the lowest accuracy observed for one of the Border Collie datasets (mean R2 = 0.94) and the highest for the Italian Spinone dataset (mean R2 = 0.97). This study’s findings demonstrate that imputation of a legacy array study set using a reference panel comprising both breed-specific array data and multi-breed variant data derived from whole genomes is effective and accurate. The process of canine genotype imputation, using the valuable growing resource of publicly available canine genome variant datasets alongside breed-specific data, is described in detail to facilitate and encourage use of this technique in canine genetics.
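The abstract summarises accuracy as a mean R². Imputation programs often report an estimated dosage R², whereas masking-based validation commonly uses the squared Pearson correlation between true genotypes and imputed dosages; the abstract does not say which definition was used. The sketch below assumes the latter, computed per SNP and averaged, and excludes SNPs where either vector is constant because the correlation is undefined there. Function names are hypothetical.

```python
from statistics import fmean


def squared_correlation(x, y):
    """Squared Pearson correlation; None if either vector has zero variance."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


def mean_r2(true_genotypes, imputed_dosages):
    """Average per-SNP R^2.

    true_genotypes / imputed_dosages: one vector per SNP, each listing the same
    animals in the same order (true calls 0/1/2, imputed dosages in [0, 2]).
    """
    values = []
    for g, d in zip(true_genotypes, imputed_dosages):
        r2 = squared_correlation(g, d)
        if r2 is not None:  # SNPs with no variation are excluded
            values.append(r2)
    return fmean(values) if values else None


# Illustrative: SNP 1 is imputed perfectly, SNP 2 is monomorphic and therefore skipped
print(mean_r2([[0, 1, 2, 1], [0, 0, 0, 0]], [[0, 1, 2, 1], [0.1, 0, 0, 0]]))  # 1.0
```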
Affiliation(s)
- Christopher A Jenkins
- Department of Veterinary Medicine, Kennel Club Genetics Centre, University of Cambridge, Cambridge, UK; Division of Population Health, Health Services Research & Primary Care, University of Manchester, Manchester, UK
- Ellen C Schofield
- Department of Veterinary Medicine, Kennel Club Genetics Centre, University of Cambridge, Cambridge, UK
- Cathryn S Mellersh
- Department of Veterinary Medicine, Kennel Club Genetics Centre, University of Cambridge, Cambridge, UK
- Luisa De Risio
- Neurology/Neurosurgery Service, Centre for Small Animal Studies, Animal Health Trust, Newmarket, Suffolk, UK
- Sally L Ricketts
- Department of Veterinary Medicine, Kennel Club Genetics Centre, University of Cambridge, Cambridge, UK; Division of Population Health, Health Services Research & Primary Care, University of Manchester, Manchester, UK
4
Han J, Gondro C, Reid K, Steibel JP. Heuristic hyperparameter optimization of deep learning models for genomic prediction. G3-GENES GENOMES GENETICS 2021; 11:6129776. [PMID: 33993261 PMCID: PMC8495939 DOI: 10.1093/g3journal/jkab032] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/24/2020] [Accepted: 01/23/2021] [Indexed: 11/17/2022]
Abstract
There is a growing interest among quantitative geneticists and animal breeders in the use of deep learning (DL) for genomic prediction. However, the performance of DL is affected by hyperparameters that are typically manually set by users. These hyperparameters do not simply specify the architecture of the model; they are also critical for the efficacy of the optimization and model-fitting process. To date, most DL approaches used for genomic prediction have concentrated on identifying suitable hyperparameters by exploring discrete options from a subset of the hyperparameter space. Enlarging the hyperparameter optimization search space with continuous hyperparameters is a daunting combinatorial problem. To deal with this problem, we propose using differential evolution (DE) to perform an efficient search of arbitrarily complex hyperparameter spaces in DL models, and we apply this to the specific case of genomic prediction of livestock phenotypes. This approach was evaluated on two pig and cattle datasets with real genotypes and simulated phenotypes (N = 7,539 animals and M = 48,541 markers) and one real dataset (N = 910 individuals and M = 28,916 markers). Hyperparameters were evaluated using cross-validation. We compared the predictive performance of DL models using hyperparameters optimized by DE against DL models with “best practice” hyperparameters selected from published studies and baseline DL models with randomly specified hyperparameters. Optimized models using DE showed a clear improvement in predictive performance across all three datasets. DE optimized hyperparameters also resulted in DL models with less overfitting and less variation in predictive performance over repeated retraining compared to non-optimized DL models.
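The abstract proposes differential evolution (DE) to search continuous hyperparameter spaces. Below is a minimal DE/rand/1/bin sketch over box bounds. In Han et al.'s setting each objective call would train a deep learning model and return a cross-validated error; here that is replaced by a cheap hypothetical toy loss over two made-up hyperparameters (learning rate, dropout). Population size, F, CR and the generation count are illustrative defaults, not the values used in the study.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimise `objective` over a box with the classic DE/rand/1/bin scheme.

    bounds: list of (low, high) pairs, one per continuous hyperparameter.
    Members are replaced as soon as a better trial is found (an in-place
    variant of the generational update). Returns (best vector, best score).
    """
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 population members")
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialise the population uniformly inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members, none equal to the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Mutation: donor = x_a + F * (x_b - x_c), clipped back into the box
            donor = [
                min(max(pop[a][k] + f * (pop[b][k] - pop[c][k]), bounds[k][0]), bounds[k][1])
                for k in range(dim)
            ]
            # Binomial crossover; index j_rand always takes the donor coordinate
            j_rand = rng.randrange(dim)
            trial = [
                donor[k] if (k == j_rand or rng.random() < cr) else pop[i][k]
                for k in range(dim)
            ]
            # Greedy selection: keep the trial if it is no worse than the target
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_validation_loss(x):
    # Stand-in for a cross-validated error; x = (learning rate, dropout), both hypothetical
    return (x[0] - 0.01) ** 2 + (x[1] - 0.3) ** 2


best, loss = differential_evolution(toy_validation_loss, [(1e-4, 0.1), (0.0, 0.5)],
                                    generations=200)
```

Discrete hyperparameters such as the number of layers are not handled here; one option would be to round the corresponding coordinate inside the objective.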
Affiliation(s)
- Junjie Han
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA; Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Cedric Gondro
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Kenneth Reid
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824, USA
5
Hou L, Liang W, Xu G, Huang B, Zhang X, Hu CY, Wang C. Accuracy of genomic prediction using mixed low-density marker panels. ANIMAL PRODUCTION SCIENCE 2020. [DOI: 10.1071/an18503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
Abstract
A low-density single-nucleotide polymorphism (LD-SNP) panel is one effective way to reduce the cost of genomic selection in animal breeding. The present study proposes a new type of LD-SNP panel, called a mixed low-density (MLD) panel, which simultaneously considers SNPs with substantial effects estimated by Bayes method B (BayesB) across many traits and an evenly spaced distribution. Simulated and real data were used to compare the imputation accuracy and genomic-selection accuracy of two types of LD-SNP panels. For simulated data, genotype imputation results showed that the number of quantitative trait loci (QTL) had only a limited influence on imputation accuracy, and only for MLD panels; the evenly spaced (ELD) panel was not affected by QTL. For real data, ELD performed slightly better than MLD when the panel contained 500 or 1000 SNPs. However, this advantage vanished quickly as density increased. For simulated data, genomic selection using BayesB showed that MLD performed much better than ELD when the number of QTL was 100. For real data, MLD also outperformed ELD for growth and carcass traits when using BayesB. In conclusion, the MLD strategy is superior to ELD for genomic selection in most situations.
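The two panel designs can be sketched as selection rules over an ordered SNP map. The abstract does not give the authors' exact MLD rule, so the split used here is an assumption: take the `n_top` SNPs with the largest absolute estimated effects (for example, BayesB estimates summed across traits, supplied as a plain dict), then fill the remaining slots with SNPs spread evenly by rank along the map. Rank-based spacing is a simplification of spacing by base-pair distance, and all names are hypothetical.

```python
def evenly_spaced(candidates, n):
    """Return n items spread evenly (by rank) across an ordered candidate list."""
    m = len(candidates)
    if n <= 0:
        return []
    if n >= m:
        return list(candidates)
    if n == 1:
        return [candidates[m // 2]]
    # Step (m - 1) / (n - 1) >= 1, so the floored indices are strictly increasing
    return [candidates[i * (m - 1) // (n - 1)] for i in range(n)]


def mixed_panel(snp_ids, abs_effects, n_total, n_top):
    """Hypothetical MLD rule: n_top largest-effect SNPs plus evenly spaced fill-in.

    snp_ids: SNP identifiers in map order; abs_effects: dict id -> |estimated effect|.
    Returns the selected ids in map order.
    """
    if not 0 <= n_top <= n_total <= len(snp_ids):
        raise ValueError("require 0 <= n_top <= n_total <= number of SNPs")
    ranked = sorted(snp_ids, key=lambda s: abs_effects.get(s, 0.0), reverse=True)
    top = set(ranked[:n_top])
    rest = [s for s in snp_ids if s not in top]
    fill = set(evenly_spaced(rest, n_total - n_top))
    return [s for s in snp_ids if s in top or s in fill]


ids = [f"s{i}" for i in range(10)]
print(evenly_spaced(ids, 4))                                   # ELD-style panel
print(mixed_panel(ids, {"s4": 5.0, "s5": 3.0}, n_total=4, n_top=2))  # MLD-style panel
```

Setting `n_top=0` reduces the mixed rule to the evenly spaced panel, which makes the two designs easy to compare at equal density.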
6
Shashkova TI, Martynova EU, Ayupova AF, Shumskiy AA, Ogurtsova PA, Kostyunina OV, Khaitovich PE, Mazin PV, Zinovieva NA. Development of a low-density panel for genomic selection of pigs in Russia. Transl Anim Sci 2019; 4:264-274. [PMID: 32704985 PMCID: PMC6994047 DOI: 10.1093/tas/txz182] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/03/2019] [Accepted: 11/27/2019] [Indexed: 02/07/2023]
Abstract
Genomic selection is routinely used worldwide in agricultural breeding. However, in Russia, it is still not used to its full potential, partly due to high genotyping costs. The use of genotypes imputed from low-density chips (LD-chips) provides a valuable opportunity to reduce genotyping costs. Pork production in Russia is based on the conventional 3-tier pyramid involving 3 breeds; therefore, the best option would be to develop a single LD-chip that could be used for all of them. Here, we analyzed for the first time the genomic variability in 3 breeds of Russian pigs, namely Landrace, Duroc, and Large White, and generated an LD-chip that can be used in pig breeding with negligible loss in genotyping quality. We demonstrated that of the 3 methods commonly used for LD-chip construction, the block method shows the best results. Imputation quality depends strongly on the presence of close ancestors in the reference population. For animals with both parents genotyped using high-density panels, high-quality genotypes (allelic discordance rate < 0.05) could be obtained using a 300 single nucleotide polymorphism (SNP) chip, while in the absence of genotyped ancestors at least 2,000 SNP markers are required. We also showed that imputation quality varies between chromosomes, is lower near chromosome ends, and drops as minor allele frequency increases. Imputation quality of individual SNPs correlated well across breeds. Using the same LD-chip, we obtained comparable imputation quality in all 3 breeds, suggesting that a single chip could be used for all of them. Our findings also suggest that markers with extremely low imputation quality are likely explained by incorrect mapping of the markers to chromosomal positions.
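The abstract uses an allelic discordance rate below 0.05 as its quality criterion. Its exact definition is not given there; the sketch below assumes genotypes coded 0/1/2 and counts, per locus, how many of the two alleles differ between the true and imputed call (|true − imputed|), divided by twice the number of compared genotypes. The function name is hypothetical.

```python
def allelic_discordance_rate(true_genotypes, imputed_genotypes):
    """Fraction of discordant alleles between true and imputed 0/1/2 genotype calls.

    Each genotype carries two alleles, so |true - imputed| is the number of
    mismatched alleles at that locus (0, 1 or 2). Missing calls (None) are skipped.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    mismatched = 0
    compared = 0
    for t, i in zip(true_genotypes, imputed_genotypes):
        if t is None or i is None:
            continue
        mismatched += abs(t - i)
        compared += 1
    if compared == 0:
        raise ValueError("no genotypes to compare")
    return mismatched / (2 * compared)


# 3 mismatched alleles over 4 compared genotypes (8 alleles)
print(allelic_discordance_rate([0, 1, 2, 2, None], [0, 2, 0, 2, 1]))  # 0.375
```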
Affiliation(s)
- Asiya F Ayupova
- Skolkovo Institute of Science and Technology, Moscow, Russia
- Olga V Kostyunina
- Ernst Federal Science Center for Animal Husbandry, Dubrovitsy, Moscow Oblast, Russia
- Pavel V Mazin
- Skolkovo Institute of Science and Technology, Moscow, Russia; Computer Science Department, National Research University Higher School of Economics, Moscow, Russia
- Natalia A Zinovieva
- Ernst Federal Science Center for Animal Husbandry, Dubrovitsy, Moscow Oblast, Russia
7
Yoshida GM, Lhorente JP, Correa K, Soto J, Salas D, Yáñez JM. Genome-Wide Association Study and Cost-Efficient Genomic Predictions for Growth and Fillet Yield in Nile Tilapia (Oreochromis niloticus). G3 (BETHESDA, MD.) 2019; 9:2597-2607. [PMID: 31171566 PMCID: PMC6686944 DOI: 10.1534/g3.119.400116] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 03/09/2019] [Accepted: 06/05/2019] [Indexed: 12/16/2022]
Abstract
Fillet yield (FY) and harvest weight (HW) are economically important traits in Nile tilapia production. Genetic improvement of these traits, especially FY, is lacking because the traits cannot be measured efficiently without sacrificing fish, so selection must rely on information from relatives. However, genomic selection could use genomic information to improve traits that are difficult to measure directly in selection candidates, as is the case for FY. The objectives of this study were: (i) to perform genome-wide association studies (GWAS) to dissect the genetic architecture of FY and HW, (ii) to evaluate the accuracy of genotype imputation and (iii) to assess the accuracy of genomic selection using true and imputed low-density (LD) single nucleotide polymorphism (SNP) panels to determine a cost-effective strategy for practical implementation of genomic information in tilapia breeding programs. The data set consisted of 5,866 phenotyped animals and 1,238 animals (108 parents and 1,130 offspring) genotyped using a 50K SNP panel. The GWAS were performed using all genotyped and phenotyped animals. Genotype imputation was performed from LD panels (LD0.5K, LD1K and LD3K) to the high-density panel (HD), with the parents and 20% of the offspring in the reference set and the remaining 80% in the validation set. In addition, we compared the accuracy of genomic selection using true and imputed genotypes with that of pedigree-based best linear unbiased prediction (PBLUP). The GWAS results support the polygenic nature of FY and HW. Imputation accuracy ranged from 0.90 for LD0.5K to 0.98 for LD3K. Genomic prediction was more accurate than PBLUP estimated breeding values. The use of imputation for genomic selection increased relative accuracy regardless of the trait and LD panel analyzed.
The present results suggest that genotype imputation could be a cost-effective strategy for genomic selection in Nile tilapia breeding programs.
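The imputation design described above (all parents plus 20% of offspring in the reference set, the other 80% of offspring in validation) can be sketched as a seeded random split. Validation animals would then be reduced to an LD panel and imputed back to HD. Animal IDs, the seed, and the function name are hypothetical; the counts below are the ones reported in the abstract.

```python
import random


def split_reference_validation(parents, offspring, ref_fraction=0.2, seed=1):
    """Return (reference, validation) ID lists.

    All parents go to the reference set together with a random ref_fraction of
    the offspring; the remaining offspring form the validation set.
    """
    rng = random.Random(seed)
    shuffled = list(offspring)
    rng.shuffle(shuffled)
    n_ref = round(ref_fraction * len(shuffled))
    reference = list(parents) + shuffled[:n_ref]
    validation = shuffled[n_ref:]
    return reference, validation


parents = [f"P{i}" for i in range(108)]
offspring = [f"O{i}" for i in range(1130)]
ref, val = split_reference_validation(parents, offspring)
print(len(ref), len(val))  # 334 904
```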
Affiliation(s)
- Grazyella M Yoshida
- Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, 8820808 Chile
- Benchmark Genetics Chile, Puerto Montt, Chile
- Jose Soto
- Grupo Acuacorporacion Internacional (GACI), Cañas, Costa Rica
- Diego Salas
- Grupo Acuacorporacion Internacional (GACI), Cañas, Costa Rica
- José M Yáñez
- Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, 8820808 Chile
8
9
Velez-Irizarry D, Casiro S, Daza KR, Bates RO, Raney NE, Steibel JP, Ernst CW. Genetic control of longissimus dorsi muscle gene expression variation and joint analysis with phenotypic quantitative trait loci in pigs. BMC Genomics 2019; 20:3. [PMID: 30606113 PMCID: PMC6319002 DOI: 10.1186/s12864-018-5386-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 07/10/2018] [Accepted: 12/18/2018] [Indexed: 12/21/2022]
Abstract
Background Economically important growth and meat quality traits in pigs are controlled by cascading molecular events occurring during development and continuing throughout the conversion of muscle to meat. However, little is known about the genes and molecular mechanisms involved in this process. Evaluating transcriptomic profiles of skeletal muscle during the initial steps leading to the conversion of muscle to meat can identify key regulators of polygenic phenotypes. In addition, mapping transcript abundance through genome-wide association analysis using high-density marker genotypes allows identification of genomic regions that control gene expression, referred to as expression quantitative trait loci (eQTL). In this study, we perform eQTL analyses to identify potential candidate genes and molecular markers regulating growth and meat quality traits in pigs. Results Messenger RNA transcripts obtained with RNA-seq of longissimus dorsi muscle from 168 F2 animals from a Duroc x Pietrain pig resource population were used to estimate gene expression variation subject to genetic control by mapping eQTL. A total of 339 eQTL were mapped (FDR ≤ 0.01), with 191 exhibiting local-acting regulation. Joint analysis of eQTL with phenotypic QTL (pQTL) segregating in our population revealed 16 genes significantly associated with 21 pQTL for meat quality, carcass composition and growth traits. Ten of these pQTL were for meat quality phenotypes that co-localized with one eQTL on SSC2 (8.8-Mb region) and 11 eQTL on SSC15 (121-Mb region). Biological processes identified for co-localized eQTL genes include calcium signaling (FERM, MRLN, PKP2 and CHRNA9), energy metabolism (SUCLG2 and PFKFB3) and redox homeostasis (NQO1 and CEP128), and results support an important role for activation of the PI3K-Akt-mTOR signaling pathway during the initial conversion of muscle to meat.
Conclusion Co-localization of eQTL with pQTL identified molecular markers significantly associated with both economically important phenotypes and gene transcript abundance. This study reveals candidate genes contributing to variation in pig production traits, and provides new knowledge regarding the genetic architecture of meat quality phenotypes. Electronic supplementary material The online version of this article (10.1186/s12864-018-5386-2) contains supplementary material, which is available to authorized users.
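The eQTL above are reported at FDR ≤ 0.01, but the abstract does not name the FDR procedure. A standard choice is the Benjamini-Hochberg step-up method; the sketch below assumes it and returns BH-adjusted p-values, so a test is significant at level q when its adjusted value is at most q. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    With p sorted ascending and ranks j = 1..m:
        adjusted_(j) = min over k >= j of  p_(k) * m / k
    Starting the running minimum at 1.0 also caps every value at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, p_values[k] * m / rank)
        adjusted[k] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(pvals)
significant = [p <= 0.01 for p in adj]  # FDR threshold from the abstract
```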
Affiliation(s)
- Sebastian Casiro
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA
- Kaitlyn R Daza
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA
- Ronald O Bates
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA
- Nancy E Raney
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, 48824, USA
- Catherine W Ernst
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
10
Chassier M, Barrey E, Robert C, Duluard A, Danvy S, Ricard A. Genotype imputation accuracy in multiple equine breeds from medium- to high-density genotypes. J Anim Breed Genet 2018; 135:420-431. [DOI: 10.1111/jbg.12358] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/16/2018] [Revised: 08/17/2018] [Accepted: 08/24/2018] [Indexed: 01/27/2023]
Affiliation(s)
- Marjorie Chassier
- Unité Mixte de Recherche 1313 Génétique Animale et Biologie Intégrative; Département Sciences du Vivant; Institut National de la Recherche Agronomique; AgroParisTech; Université Paris Saclay; Jouy-en-Josas France
- Eric Barrey
- Unité Mixte de Recherche 1313 Génétique Animale et Biologie Intégrative; Département Sciences du Vivant; Institut National de la Recherche Agronomique; AgroParisTech; Université Paris Saclay; Jouy-en-Josas France
- Céline Robert
- Unité Mixte de Recherche 1313 Génétique Animale et Biologie Intégrative; Département Sciences du Vivant; Institut National de la Recherche Agronomique; AgroParisTech; Université Paris Saclay; Jouy-en-Josas France
- Ecole Nationale Vétérinaire d'Alfort; Maisons Alfort France
- Arnaud Duluard
- Département élevage et santé animale; Le Trot; Paris France
- Sophie Danvy
- Institut Français du Cheval et de l'Equitation; Pôle développement; Innovation et Recherche; Exmes France
- Anne Ricard
- Unité Mixte de Recherche 1313 Génétique Animale et Biologie Intégrative; Département Sciences du Vivant; Institut National de la Recherche Agronomique; AgroParisTech; Université Paris Saclay; Jouy-en-Josas France
- Institut Français du Cheval et de l'Equitation; Pôle développement; Innovation et Recherche; Exmes France
11
Friedrich J, Antolín R, Edwards SM, Sánchez‐Molano E, Haskell MJ, Hickey JM, Wiener P. Accuracy of genotype imputation in Labrador Retrievers. Anim Genet 2018; 49:303-311. [PMID: 29974966 PMCID: PMC6055857 DOI: 10.1111/age.12677] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 04/05/2018] [Indexed: 12/12/2022]
Abstract
The dog is a valuable model species for the genetic analysis of complex traits, and the use of genotype imputation in dogs will be an important tool for future studies. It is of particular interest to analyse the effect of factors like single nucleotide polymorphism (SNP) density of genotyping arrays and relatedness between dogs on imputation accuracy due to the acknowledged genetic and pedigree structure of dog breeds. In this study, we simulated different genotyping strategies based on data from 1179 Labrador Retriever dogs. The study involved 5826 SNPs on chromosome 1 representing the high density (HighD) array; the low-density (LowD) array was simulated by masking different proportions of SNPs on the HighD array. The correlations between true and imputed genotypes for a realistic masking level of 87.5% ranged from 0.92 to 0.97, depending on the scenario used. A correlation of 0.92 was found for a likely scenario (10% of dogs genotyped using HighD, 87.5% of HighD SNPs masked in the LowD array), which indicates that genotype imputation in Labrador Retrievers can be a valuable tool to reduce experimental costs while increasing sample size. Furthermore, we show that genotype imputation can be performed successfully even without pedigree information and with low relatedness between dogs in the reference and validation sets. Based on these results, the impact of genotype imputation was evaluated in a genome-wide association analysis and genomic prediction in Labrador Retrievers.
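The LowD array in this study was simulated by masking a proportion of HighD SNPs (87.5% in the realistic scenario), with a fraction of dogs (10%) keeping full HighD genotypes. The abstract does not say whether SNPs were masked at random or by position; random masking is an assumption here, as are the function name and seed. The same SNP subset is masked for every LowD dog, as it would be on a physical array.

```python
import random


def simulate_lowd(genotypes, mask_fraction=0.875, highd_fraction=0.10, seed=42):
    """genotypes: dict dog_id -> list of 0/1/2 HighD calls (same SNP order for every dog).

    Returns (genotypes with None at masked SNPs for LowD dogs,
             set of dogs kept at HighD, sorted list of masked SNP indices).
    """
    rng = random.Random(seed)
    dogs = sorted(genotypes)
    n_snps = len(genotypes[dogs[0]])
    # Dogs that keep every HighD call (the reference for imputation)
    highd = set(rng.sample(dogs, round(highd_fraction * len(dogs))))
    # One fixed set of masked SNP positions shared by all LowD dogs
    masked_idx = sorted(rng.sample(range(n_snps), round(mask_fraction * n_snps)))
    masked_set = set(masked_idx)
    out = {}
    for dog in dogs:
        calls = genotypes[dog]
        if dog in highd:
            out[dog] = list(calls)
        else:
            out[dog] = [None if k in masked_set else g for k, g in enumerate(calls)]
    return out, highd, masked_idx


toy = {f"d{i}": [i % 3] * 16 for i in range(20)}
masked, highd_dogs, masked_snps = simulate_lowd(toy)
```

With the study's 5826 chromosome-1 SNPs, an 87.5% mask under this rounding hides 5098 SNPs and leaves 728 on the simulated LowD array.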
Affiliation(s)
- J. Friedrich
- Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
- R. Antolín
- Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
- S. M. Edwards
- Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
- E. Sánchez‐Molano
- Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
- M. J. Haskell
- Animal and Veterinary Sciences Group, Scotland's Rural College, Edinburgh, EH9 3JG, UK
- J. M. Hickey
- Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
- P. Wiener
- Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
12
Zhang C, Kemp RA, Stothard P, Wang Z, Boddicker N, Krivushin K, Dekkers J, Plastow G. Genomic evaluation of feed efficiency component traits in Duroc pigs using 80K, 650K and whole-genome sequence variants. Genet Sel Evol 2018; 50:14. [PMID: 29625549 PMCID: PMC5889553 DOI: 10.1186/s12711-018-0387-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/01/2017] [Accepted: 03/27/2018] [Indexed: 12/30/2022]
Abstract
BACKGROUND Increasing marker density was proposed to have potential to improve the accuracy of genomic prediction for quantitative traits; whole-sequence data is expected to give the best accuracy of prediction, since all causal mutations that underlie a trait are expected to be included. However, in cattle and chicken, this assumption is not supported by empirical studies. Our objective was to compare the accuracy of genomic prediction of feed efficiency component traits in Duroc pigs using single nucleotide polymorphism (SNP) panels of 80K, imputed 650K, and whole-genome sequence variants using GBLUP, BayesB and BayesRC methods, with the ultimate purpose to determine the optimal method to increase genetic gain for feed efficiency in pigs. RESULTS Phenotypes of average daily feed intake (ADFI), average daily gain (ADG), ultrasound backfat depth (FAT), and loin muscle depth (LMD) were available for 1363 Duroc boars from a commercial breeding program. Genotype imputation accuracies reached 92.1% from 80K to 650K and 85.6% from 650K to whole-genome sequence variants. Average accuracies across methods and marker densities of genomic prediction of ADFI, FAT, LMD and ADG were 0.40, 0.65, 0.30 and 0.15, respectively. For ADFI and FAT, BayesB outperformed GBLUP, but increasing marker density had little advantage for genomic prediction. For ADG and LMD, GBLUP outperformed BayesB, while BayesRC based on whole-genome sequence data gave the best accuracies and reached up to 0.35 for LMD and 0.25 for ADG. CONCLUSIONS Use of genomic information was beneficial for prediction of ADFI and FAT but not for that of ADG and LMD compared to pedigree-based estimates. BayesB based on 80K SNPs gave the best genomic prediction accuracy for ADFI and FAT, while BayesRC based on whole-genome sequence data performed best for ADG and LMD. 
We suggest that these differences between traits in the effect of marker density and method on accuracy of genomic prediction are mainly due to the underlying genetic architecture of the traits.
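GBLUP, one of the methods compared, relies on a genomic relationship matrix G. The abstract does not say how G was built; a widely used form is VanRaden's (2008) first method, sketched below in pure Python for small data. Estimating allele frequencies from the genotyped animals themselves is an assumption, and missing genotypes are not handled. The diagonal elements minus one are often used as a genomic inbreeding coefficient. Function names are hypothetical.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix following VanRaden (2008), first method.

    genotypes: one list of 0/1/2 allele counts per animal, no missing values.
    G = Z Z' / (2 * sum_j p_j * (1 - p_j)), with Z_ij = M_ij - 2 * p_j.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP, estimated from these animals (assumption)
    p = [sum(animal[j] for animal in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Centre each allele count by its expected value 2p
    z = [[animal[j] - 2 * p[j] for j in range(m)] for animal in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


def genomic_inbreeding(g):
    """Diagonal of G minus one, a commonly used genomic inbreeding coefficient."""
    return [g[i][i] - 1.0 for i in range(len(g))]


G = vanraden_grm([[0, 2], [2, 0]])
print(G)                      # [[2.0, -2.0], [-2.0, 2.0]]
print(genomic_inbreeding(G))  # [1.0, 1.0]
```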
Affiliation(s)
- Chunyan Zhang
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Paul Stothard
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Zhiquan Wang
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Kirill Krivushin
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Jack Dekkers
- Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Graham Plastow
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada.
13
Frischknecht M, Pausch H, Bapst B, Signer-Hasler H, Flury C, Garrick D, Stricker C, Fries R, Gredler-Grandl B. Highly accurate sequence imputation enables precise QTL mapping in Brown Swiss cattle. BMC Genomics 2017; 18:999. [PMID: 29284405 PMCID: PMC5747239 DOI: 10.1186/s12864-017-4390-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/14/2017] [Accepted: 12/15/2017] [Indexed: 01/06/2023]
Abstract
Background Within the last few years a large amount of genomic information has become available in cattle. Densities of genomic information vary from a few thousand variants up to whole genome sequence information. In order to combine genomic information from different sources and infer genotypes for a common set of variants, genotype imputation is required. Results In this study we evaluated the accuracy of imputation from high density chips to whole genome sequence data in Brown Swiss cattle. Using four popular imputation programs (Beagle, FImpute, Impute2, Minimac) and various compositions of reference panels, the accuracy of the imputed sequence variant genotypes was high and differences between the programs and scenarios were small. We imputed sequence variant genotypes for more than 1600 Brown Swiss bulls and performed genome-wide association studies for milk fat percentage at two stages of lactation. We found one and three quantitative trait loci for early and late lactation fat content, respectively. Known causal variants that were imputed from the sequenced reference panel were among the most significantly associated variants of the genome-wide association study. Conclusions Our study demonstrates that whole-genome sequence information can be imputed at high accuracy in cattle populations. Using imputed sequence variant genotypes in genome-wide association studies may facilitate causal variant detection. Electronic supplementary material The online version of this article (10.1186/s12864-017-4390-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mirjam Frischknecht
- Qualitas AG, Chamerstrasse 56a, 6300, Zug, Switzerland; Bern University of Applied Sciences, School of Agricultural, Forest and Food Sciences HAFL, Länggasse 85, 3052, Zollikofen, Switzerland
- Hubert Pausch
- Chair of Animal Breeding, Technische Universität München, Liesel-Beckmann-Str. 1, 85354, Freising, Germany; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; ETH Zurich, Tannenstrasse 1, 8092, Zurich, Switzerland
- Beat Bapst
- Qualitas AG, Chamerstrasse 56a, 6300, Zug, Switzerland
- Heidi Signer-Hasler
- Bern University of Applied Sciences, School of Agricultural, Forest and Food Sciences HAFL, Länggasse 85, 3052, Zollikofen, Switzerland
- Christine Flury
- Bern University of Applied Sciences, School of Agricultural, Forest and Food Sciences HAFL, Länggasse 85, 3052, Zollikofen, Switzerland
- Dorian Garrick
- Institute of Veterinary, Animal & Biomedical Sciences, Massey University, 4442, Palmerston North, New Zealand
- Ruedi Fries
- Chair of Animal Breeding, Technische Universität München, Liesel-Beckmann-Str. 1, 85354, Freising, Germany
14
Casiró S, Velez-Irizarry D, Ernst CW, Raney NE, Bates RO, Charles MG, Steibel JP. Genome-wide association study in an F2 Duroc x Pietrain resource population for economically important meat quality and carcass traits. J Anim Sci 2017; 95:545-558. [PMID: 28380601 DOI: 10.2527/jas.2016.1003] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
Meat quality is essential for consumer acceptance, ultimately affects pork production profitability, and is subject to genetic control. The objective of this study was to map genomic regions associated with economically important meat quality and carcass traits. We performed a genome-wide association (GWA) analysis to map regions associated with 38 meat quality and carcass traits recorded for 948 F2 pigs from the Michigan State University Duroc × Pietrain resource population. The F0, F1, and 336 F2 pigs were genotyped with the Illumina Porcine SNP60 BeadChip, while the remaining F2 pigs were genotyped with the GeneSeek Genomic Profiler for Porcine Low Density (LD) chip and imputed with high accuracy ( = 0.97). Altogether the genomic dataset comprised 1,019 animals and 44,911 SNP. A Gaussian linear mixed model was fitted to estimate the breeding values and the variance components. A linear transformation was performed to estimate the marker effects and variances. Type I error rate was controlled at a false discovery rate (FDR) of 5%. Seven putative QTL found in this study were previously reported in other studies. Two novel QTL associated with tenderness (TEN) were located on SSC3 (135.6-137.5 Mb; FDR < 0.03) and SSC5 (67.3-69.1 Mb; FDR < 0.02). The QTL region identified on SSC15 includes the protein kinase AMP-activated γ3 subunit gene (PRKAG3), which has been associated with 24-h pH (pH24), drip loss (DL) and cook yield (CY). Also, novel candidate genes were identified for TEN in the region on SSC5 (A kinase (PRKA) anchor protein 3, AKAP3) and for tenth-rib backfat thickness (BF10) on SSC1 (carnitine O-acetyltransferase, CRAT). The association of PRKAG3 gene polymorphisms with pork quality traits has been reported for several pig populations. However, there are no SNP for this gene on the chip used, so we genotyped the animals for 2 non-synonymous variants ( and ).
We then performed a GWA conditioning on the genotype of both SNP and was associated with pH24, DL, protein content (PRO) and CY (P < 0.004) and T30N with juiciness, TEN, shear force, pH24, PRO, and CY (P < 0.04). Finally, we performed a GWA conditioning on the genotype of the SNP peak detected in this study, and T30N remained associated only with PRO (P < 0.02). Therefore, in this study we identified 2 novel QTL regions, suggest 2 novel candidate genes, and conclude that other SNP in PRKAG3 or nearby gene(s) explain the observed associations on SSC15 in this population.
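The abstract says the Type I error rate was controlled at a 5% false discovery rate but does not name the procedure. A minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, is shown below; the function name `bh_reject` and the p-values are hypothetical.

```python
def bh_reject(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).

    Returns a list of booleans, True where the hypothesis is rejected
    at the given false discovery rate.
    """
    m = len(p_values)
    # Indices of the hypotheses sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k with p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Hypothetical SNP p-values
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = bh_reject(pvals)
```

Because the procedure is step-up, every hypothesis ranked at or below the largest passing rank is rejected, even if its own p-value sits above its individual threshold.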
15
Chen C, Steibel JP, Tempelman RJ. Genome-Wide Association Analyses Based on Broadly Different Specifications for Prior Distributions, Genomic Windows, and Estimation Methods. Genetics 2017; 206:1791-1806. [PMID: 28637709 PMCID: PMC5560788 DOI: 10.1534/genetics.117.202259] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/26/2017] [Accepted: 06/19/2017] [Indexed: 11/18/2022]
Abstract
A currently popular strategy (EMMAX) for genome-wide association (GWA) analysis infers association for the specific marker of interest by treating its effect as fixed while treating all other marker effects as classical Gaussian random effects. It may be more statistically coherent to specify all markers as sharing the same prior distribution, whether that distribution is Gaussian, heavy-tailed (BayesA), or has variable selection specifications based on a mixture of, say, two Gaussian distributions [stochastic search and variable selection (SSVS)]. Furthermore, all such GWA inference should be formally based on posterior probabilities or test statistics as we present here, rather than merely on point estimates. We compared these three broad categories of priors within a simulation study to investigate the effects of different degrees of skewness for quantitative trait loci (QTL) effects and numbers of QTL using 43,266 SNP marker genotypes from 922 Duroc-Pietrain F2-cross pigs. Genomic regions were based either on single SNP associations, on nonoverlapping windows of various fixed sizes (0.5-3 Mb), or on adaptively determined windows that cluster the genome into blocks based on linkage disequilibrium. We found that SSVS and BayesA led to the best receiver operating characteristic (ROC) curve properties in almost all cases. We also evaluated approximate maximum a posteriori (MAP) approaches to BayesA and SSVS as potential computationally feasible alternatives; however, MAP inferences were not promising, particularly due to their sensitivity to starting values. We determined that it is advantageous to use variable selection specifications based on adaptively constructed genomic window lengths for GWA studies.
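Under an SSVS prior, each marker effect comes from either a narrow "spike" Gaussian or a wide "slab" Gaussian. Given a sampled effect, the conditional probability that it came from the slab has a closed form, and averaging that probability over MCMC draws gives a Rao-Blackwellised estimate of the marker's posterior inclusion probability. The sketch below assumes fixed, illustrative values for the prior inclusion probability `pi` and the two variances; it is not the authors' implementation, and all names are hypothetical.

```python
import math

def log_normal_pdf(x, var):
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - x * x / (2.0 * var)

def logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def slab_probability(beta, pi=0.01, var_spike=1e-4, var_slab=1e-2):
    """P(indicator = 1 | beta) under a two-component (spike-and-slab) Gaussian prior.

    pi, var_spike and var_slab are illustrative assumptions, not values from the study.
    """
    log_slab = math.log(pi) + log_normal_pdf(beta, var_slab)
    log_spike = math.log(1.0 - pi) + log_normal_pdf(beta, var_spike)
    return logistic(log_slab - log_spike)

def inclusion_probability(beta_samples, **prior):
    """Average the slab probability over MCMC draws of one marker effect."""
    return sum(slab_probability(b, **prior) for b in beta_samples) / len(beta_samples)
```

Working on the log scale avoids underflow when an effect lies far in the tail of the narrow spike component.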
Affiliation(s)
- Chunyu Chen
- Department of Animal Science, Michigan State University, East Lansing, Michigan 48824
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, Michigan 48824
- Robert J Tempelman
- Department of Animal Science, Michigan State University, East Lansing, Michigan 48824
16
Duarte JLG, Cantet RJC, Rubio YLB, Bates RO, Ernst CW, Raney NE, Rogberg-Muñoz A, Steibel JP. Refining genomewide association for growth and fat deposition traits in an F2 pig population. J Anim Sci 2017; 94:1387-97. [PMID: 27135998 DOI: 10.2527/jas.2015-0182] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/15/2023]
Abstract
The identification of genomic regions that affect additive genetic variation and contain genes involved in controlling growth and fat deposition has enormous impact in the farm animal industry (e.g., carcass merit and meat quality). Therefore, a genome-wide association study was implemented in an F2 pig population using a 60,000-SNP marker panel for traits related to growth and fat deposition. Estimated genomic EBV were linearly transformed to calculate SNP effects and to identify genomic positions possibly associated with the genetic variability of each trait. Genomic segments were then defined as the markers lying within 1 Mb upstream and downstream of the SNP with the smallest P-value and a false discovery rate < 0.05 for each trait. The significance of each 2-Mb segment was tested using the Bonferroni correction. Significant SNP were detected on SSC2, SSC3, SSC5, and SSC6, but significant 2-Mb segment effects were observed on SSC3 for weight at birth (wt_birth) and on SSC6 for 10th-rib backfat and last-rib backfat measured by ultrasound at different ages. Furthermore, a 6-Mb segment on SSC6 was also considered because the 2-Mb segments for 10 different fat deposition traits overlapped. Although the segment effects for each trait remained significant, the proportion of additive variance explained by this larger segment was slightly smaller for some traits. In general, the results confirm the presence of genetic variability for wt_birth on SSC3 (18.0-20.2 Mb) and for fat deposition traits on SSC6 (133.8-136.0 Mb). Within these regions, the fibrosin (FBRS) and myosin light chain, phosphorylatable, fast skeletal muscle (MYLPF) genes could be considered candidates for the wt_birth signal on SSC3, and the SERPINE1 mRNA-binding protein 1 gene (SERBP1) may be a candidate for the fat deposition trait signals on SSC6.
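The abstract states that GEBV were linearly transformed into SNP effects. Assuming the VanRaden relationship matrix G = ZZ'/k, with Z the centred genotype matrix and k = 2Σp_j(1 - p_j), that transformation is commonly written â = Z'G⁻¹ĝ/k. The sketch below assumes this form (the abstract does not specify G), adds a small ridge `eps` because centring on sample allele frequencies makes G singular, and uses simulated genotypes; all names are hypothetical.

```python
import numpy as np

def snp_effects_from_gebv(M, gebv, eps=1e-6):
    """Back-solve SNP effects from GEBV, assuming G = Z Z' / k (VanRaden method 1).

    M    : (n_animals, n_snps) genotype matrix coded 0/1/2
    gebv : (n_animals,) genomic estimated breeding values
    eps  : small ridge, needed because sample-based centring makes G singular
    """
    p = M.mean(axis=0) / 2.0                 # allele frequencies from the sample
    Z = M - 2.0 * p                          # centre each SNP column
    k = 2.0 * np.sum(p * (1.0 - p))          # VanRaden scaling constant
    G = Z @ Z.T / k
    G_inv_g = np.linalg.solve(G + eps * np.eye(len(gebv)), gebv)
    return Z.T @ G_inv_g / k                 # a_hat = Z' G^{-1} g_hat / k

# Hypothetical example: 30 animals, 500 SNPs, GEBV constructed to lie in the span of G.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(30, 500)).astype(float)
p = M.mean(axis=0) / 2.0
Z = M - 2.0 * p
k = 2.0 * np.sum(p * (1.0 - p))
gebv = (Z @ Z.T / k) @ rng.normal(size=30)
a_hat = snp_effects_from_gebv(M, gebv)
```

Because Zâ = G(G + eps·I)⁻¹ĝ, the back-solved effects reproduce the GEBV up to the ridge term whenever the GEBV lie in the column space of G, which is how the example is constructed.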
17
Furuta T, Ashikari M, Jena KK, Doi K, Reuscher S. Adapting Genotyping-by-Sequencing for Rice F2 Populations. G3 (BETHESDA, MD.) 2017; 7:881-893. [PMID: 28082325 PMCID: PMC5345719 DOI: 10.1534/g3.116.038190] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 10/16/2016] [Accepted: 01/09/2017] [Indexed: 12/30/2022]
Abstract
Rapid and cost-effective genotyping of large mapping populations can be achieved by sequencing a reduced representation of the genome of every individual in a given population, and using that information to generate genetic markers. A customized genotyping-by-sequencing (GBS) pipeline was developed to genotype a rice F2 population from a cross of Oryza sativa ssp. japonica cv. Nipponbare and the African wild rice species O. longistaminata. While most GBS pipelines aim to analyze mainly homozygous populations, we attempted to genotype a highly heterozygous F2 population. We show how species- and population-specific improvements of established protocols can drastically increase sample throughput and genotype quality. Using as few as 50,000 reads for some individuals (134,000 reads on average), we were able to generate up to 8154 informative SNP markers in 1081 F2 individuals. Additionally, the effects of enzyme choice, read coverage, and data postprocessing are evaluated. Using GBS-derived markers, we were able to assemble a genetic map of 1536 cM. To demonstrate the usefulness of our GBS pipeline, we determined quantitative trait loci (QTL) for the number of tillers. We were able to map four QTL to chromosomes 1, 3, 4, and 8, and partially confirm their effects using introgression lines. We provide an example of how to successfully use GBS with heterozygous F2 populations. By using the comparatively low-cost MiSeq platform, we show that the GBS method is flexible and cost-effective, even for smaller laboratories.
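Calling genotypes in a highly heterozygous F2 from low read counts is the difficulty the abstract highlights. The pipeline's own genotype caller is not described here; the sketch below assumes a simple binomial read model with a per-read error rate `err` and Mendelian F2 priors of 1/4, 1/2 and 1/4, and its names are hypothetical.

```python
import math

# Mendelian F2 prior: homozygous for parent A, heterozygous, homozygous for parent B.
F2_PRIOR = {"AA": 0.25, "AB": 0.5, "BB": 0.25}

def call_f2_genotype(n_a, n_b, err=0.01):
    """Call an F2 genotype from counts of reads supporting alleles A and B.

    Returns (genotype, posterior probability of that genotype).
    """
    # Probability that a single read shows allele A under each genotype,
    # allowing for a sequencing error rate err (assumed, 0 < err < 1).
    p_read_a = {"AA": 1.0 - err, "AB": 0.5, "BB": err}
    log_post = {}
    for g, q in p_read_a.items():
        # Binomial log-likelihood; the binomial coefficient is identical for all
        # genotypes and cancels when the posteriors are normalised.
        log_post[g] = math.log(F2_PRIOR[g]) + n_a * math.log(q) + n_b * math.log(1.0 - q)
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total
```

With a single read supporting allele A, `call_f2_genotype(1, 0)` returns the heterozygote with a posterior of only 0.5, which shows why very low coverage leaves many calls in a heterozygous population uncertain.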
Affiliation(s)
- Tomoyuki Furuta
- Bioscience and Biotechnology Center, Nagoya University, 464-8601, Japan
- Motoyuki Ashikari
- Bioscience and Biotechnology Center, Nagoya University, 464-8601, Japan
- Kshirod K Jena
- Plant Breeding Division, International Rice Research Institute, 1301 Manila, Philippines
- Kazuyuki Doi
- Associated Field Science and Research Center, Nagoya University, 470-0151, Japan
- Stefan Reuscher
- Bioscience and Biotechnology Center, Nagoya University, 464-8601, Japan
18
Bernal Rubio YL, Gualdrón Duarte JL, Bates RO, Ernst CW, Nonneman D, Rohrer GA, King DA, Shackelford SD, Wheeler TL, Cantet RJC, Steibel JP. Implementing meta-analysis from genome-wide association studies for pork quality traits. J Anim Sci 2016; 93:5607-17. [PMID: 26641170 DOI: 10.2527/jas.2015-9502] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
Pork quality plays an important role in the meat processing industry. Thus, different methodologies have been implemented to elucidate the genetic architecture of traits affecting meat quality. One of the most common and widely used approaches is to perform genome-wide association (GWA) studies. However, many GWA studies in animal breeding have limited power owing to small sample sizes in animal populations. One alternative is to implement a meta-analysis of GWA (MA-GWA) combining results from independent association studies. The objective of this study was to identify significant genomic regions associated with meat quality traits by performing MA-GWA for 8 different traits in 3 independent pig populations. Results from MA-GWA were used to search for genes possibly associated with the set of evaluated traits. Data from 3 pig data sets (U.S. Meat Animal Research Center, commercial, and Michigan State University Pig Resource Population) were used. A MA was implemented by combining z-scores derived for each SNP in every population and then weighting them using the inverse of the estimated variance of SNP effects. A search for annotated genes retrieved genes previously reported as candidates for shear force (calpain-1 catalytic subunit [CAPN1] and calpastatin [CAST]), as well as for ultimate pH, purge loss, and cook loss (protein kinase, AMP-activated, γ 3 noncatalytic subunit [PRKAG3]). In addition, novel candidate genes were identified for intramuscular fat and cook loss (acyl-CoA synthetase family member 3, mitochondrial [ACSF3]) and for the objective measure of muscle redness, CIE a* (glycogen synthase 1, muscle [GYS1] and ferritin, light polypeptide [FTL]). Thus, implementation of MA-GWA allowed integration of results for economically relevant traits and identified novel genes to be tested as candidates for meat quality traits in pig populations.
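The abstract describes combining per-population SNP z-scores with weights equal to the inverse of the estimated variance of SNP effects, without giving the formula. A minimal sketch, assuming a weighted Stouffer combination Z = Σw_i z_i / √(Σw_i²) with w_i = 1/var_i, is shown below; names and inputs are hypothetical.

```python
import math

def weighted_z_meta(z_scores, effect_variances):
    """Combine one SNP's z-scores across populations (assumed weighted Stouffer form).

    Weights are the inverse of each population's estimated SNP-effect variance.
    Returns the combined z-score and its two-sided p-value.
    """
    weights = [1.0 / v for v in effect_variances]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    z_meta = numerator / math.sqrt(sum(w * w for w in weights))
    # Two-sided standard normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z_meta) / math.sqrt(2.0))
    return z_meta, p_value

# Hypothetical z-scores and effect variances for one SNP in three populations.
z_meta, p_meta = weighted_z_meta([2.1, 1.4, -0.3], [0.02, 0.05, 0.04])
```

Signed z-scores are combined directly, so a population in which the SNP shows the opposite direction of association pulls the combined statistic toward zero.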
19
Sevillano CA, Vandenplas J, Bastiaansen JWM, Calus MPL. Empirical determination of breed-of-origin of alleles in three-breed cross pigs. Genet Sel Evol 2016; 48:55. [PMID: 27491547 PMCID: PMC4973529 DOI: 10.1186/s12711-016-0234-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 02/17/2016] [Accepted: 07/27/2016] [Indexed: 01/01/2023]
Abstract
Background: Although breeding programs for pigs and poultry aim at improving crossbred performance, they mainly use training populations that consist of purebred animals. For some traits, e.g. residual feed intake, the genetic correlation between purebred and crossbred performance is low and thus including crossbred animals in the training population is required. With crossbred animals, the effects of single nucleotide polymorphisms (SNPs) may be breed-specific because linkage disequilibrium patterns between a SNP and a quantitative trait locus (QTL), and allele frequencies and allele substitution effects of a QTL, may differ between breeds. To estimate the breed-specific effects of alleles in a crossbred population, the breed-of-origin of alleles in crossbred animals must be known. This study was aimed at investigating the performance of an approach that assigns breed-of-origin of alleles in real data of three-breed cross pigs. Genotypic data were available for 14,187 purebred, 1354 F1, and 1723 three-breed cross pigs. Results: On average, 93.0% of the alleles of three-breed cross pigs were assigned a breed-of-origin without using pedigree information and 94.6% when pedigree information was used. The assignment percentage could be improved by allowing a percentage (fr) of the copies of a haplotype to be observed in a purebred population different from the assigned breed-of-origin. Changing fr from 0 to 20% increased assignment of breed-of-origin by 0.6 and 0.7% when pedigree information was and was not used, respectively, which indicates the benefit of setting fr to 20%. Conclusions: Breed-of-origin of alleles of three-breed cross pigs can be derived empirically without the need for pedigree information, with 93.7% of the alleles assigned a breed-of-origin. Pedigree information is useful to reduce computation time and can slightly increase the percentage of assignments.
Knowledge of the breed-of-origin of alleles allows the use of models that implement breed-specific effects of SNP alleles in genomic prediction, with the aim of improving selection of purebred animals for crossbred offspring performance.
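The assignment rule in the abstract can be illustrated for a single haplotype: it is assigned to the breed whose purebred population carries it, provided that no more than a fraction fr of its copies are observed in other purebred populations. The sketch below is a simplified, hypothetical reading of that rule based on copy counts; the published method operates on phased haplotypes along the genome and is not reproduced here.

```python
def assign_breed_of_origin(copies_by_breed, fr=0.0):
    """Assign one crossbred haplotype to a breed from its copy counts in purebred panels.

    copies_by_breed : dict mapping breed name -> copies of this haplotype observed
                      in that purebred population (hypothetical input layout)
    fr              : maximum fraction of copies allowed in breeds other than the assigned one
    Returns the assigned breed name, or None when the haplotype stays unassigned.
    """
    total = sum(copies_by_breed.values())
    if total == 0:
        return None  # haplotype never observed in any purebred population
    best = max(copies_by_breed, key=copies_by_breed.get)
    other_fraction = (total - copies_by_breed[best]) / total
    return best if other_fraction <= fr else None
```

Raising `fr` from 0 to 0.2 turns a haplotype with nine copies in breed A and one in breed B from unassigned into assigned, which mirrors the small gain in assignment the abstract reports.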
Affiliation(s)
- Claudia A Sevillano
- Animal Breeding and Genomics Centre, Wageningen University, PO Box 338, 6700 AH, Wageningen, The Netherlands; Topigs Norsvin, PO Box 43, 6640 AA, Beuningen, The Netherlands
- Jeremie Vandenplas
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700 AH, Wageningen, The Netherlands
- John W M Bastiaansen
- Animal Breeding and Genomics Centre, Wageningen University, PO Box 338, 6700 AH, Wageningen, The Netherlands
- Mario P L Calus
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700 AH, Wageningen, The Netherlands
20
Mikhchi A, Honarvar M, Kashan NEJ, Aminafshar M. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation. J Theor Biol 2016; 399:148-58. [PMID: 27049046 DOI: 10.1016/j.jtbi.2016.03.035] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/03/2015] [Revised: 03/06/2016] [Accepted: 03/24/2016] [Indexed: 11/17/2022]
Abstract
Genotype imputation is an important tool for predicting unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ general-purpose machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute un-typed SNPs in parent-offspring trios. The results show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods, whereas TotalBoost performed slightly worse than the other methods. Running times differed between methods: the Extreme Learning Machine (ELM) was always the fastest algorithm, and as the sample size increased, the Radial Basis Function (RBF) required long imputation times. The tested methods can be an alternative for imputing un-typed SNPs when the missing rate of the data is low. However, it is recommended that other machine learning methods be used for imputation.
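K-Nearest Neighbors, one of the eight methods compared, is the simplest to illustrate: a missing genotype is filled with the most common genotype at that SNP among the k reference individuals closest to the target over its observed SNPs. The function below is a hypothetical, minimal version with genotypes coded 0/1/2 and `None` for missing; it ignores the parent-offspring structure the study exploits.

```python
from collections import Counter

def knn_impute(target, reference, k=3):
    """Fill missing (None) genotypes in `target` by majority vote of the k nearest reference rows.

    target    : list of genotypes coded 0, 1, 2 or None (missing)
    reference : list of fully genotyped rows, same length as target
    """
    observed = [j for j, g in enumerate(target) if g is not None]

    def distance(row):
        # Manhattan distance over the SNPs genotyped in the target.
        return sum(abs(row[j] - target[j]) for j in observed)

    neighbours = sorted(reference, key=distance)[:k]
    imputed = list(target)
    for j, g in enumerate(target):
        if g is None:
            votes = Counter(row[j] for row in neighbours)
            imputed[j] = votes.most_common(1)[0][0]
    return imputed

# Hypothetical reference panel (rows = animals, columns = SNPs).
reference = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 1],
]
```

Ties in the vote are resolved by whichever genotype `Counter.most_common` lists first; a production implementation would handle ties explicitly, for example by weighting neighbours by distance.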
Affiliation(s)
- Abbas Mikhchi
- Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran.
- Mahmood Honarvar
- Department of Animal Science, Shahr-e-Qods Branch, Islamic Azad University, Tehran, Iran
- Nasser Emam Jomeh Kashan
- Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Mehdi Aminafshar
- Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran
21
Forneris NS, Steibel JP, Legarra A, Vitezica ZG, Bates RO, Ernst CW, Basso AL, Cantet RJC. A comparison of methods to estimate genomic relationships using pedigree and markers in livestock populations. J Anim Breed Genet 2016; 133:452-462. [PMID: 27135179 DOI: 10.1111/jbg.12217] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/11/2015] [Accepted: 03/30/2016] [Indexed: 12/20/2022]
Abstract
Accurate prediction of breeding values depends on capturing the variability in genome sharing of relatives with the same pedigree relationship. Here, we compare two approaches to set up genomic relationship matrices for precision of genomic relationships (GR) and accuracy of estimated breeding values (GEBV). Real and simulated data (pigs, 60k SNP) were analysed, and GR were estimated using two approaches: (i) identity by state, corrected with either the observed (GVR-O) or the base population (GVR-B) allele frequencies and (ii) identity by descent using linkage analysis (GIBD-L). Estimators were evaluated for precision and empirical bias with respect to true pedigree IBD GR. All three estimators had very low bias. GIBD-L displayed the lowest sampling error and the highest correlation with true genome-shared values. GVR-B approximated GIBD-L's correlation and had lower error than GVR-O. Accuracy of GEBV for selection candidates was significantly higher when GIBD-L was used and identical between GVR-O and GVR-B. In real data, GIBD-L's sampling standard deviation was the closest to the theoretical value for each pedigree relationship. Use of pedigree to calculate GR improved the precision of estimates and the accuracy of GEBV.
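GVR-O and GVR-B differ only in the allele frequencies used to centre the genotypes. The sketch below assumes VanRaden's first method, G = ZZ'/(2Σp_j(1 - p_j)) with Z = M - 2p, and takes the frequencies as an argument; the genotypes and the base-population frequencies of 0.5 are hypothetical placeholders, and the linkage-based GIBD-L is not shown.

```python
import numpy as np

def vanraden_g(M, p):
    """VanRaden method-1 genomic relationship matrix.

    M : (n_animals, n_snps) genotypes coded 0/1/2
    p : (n_snps,) allele frequencies used for centring (observed or base population)
    """
    p = np.asarray(p, dtype=float)
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

# Hypothetical genotypes for 3 animals and 4 SNPs.
M = np.array([[0, 1, 2, 1],
              [1, 1, 2, 0],
              [2, 0, 1, 1]], dtype=float)
G_observed = vanraden_g(M, M.mean(axis=0) / 2.0)    # GVR-O style: observed frequencies
G_base = vanraden_g(M, [0.5, 0.5, 0.5, 0.5])         # GVR-B style: hypothetical base frequencies
```

With observed frequencies every row of G sums to zero, because each column of Z is centred on its own mean; with base-population frequencies that constraint no longer holds.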
Affiliation(s)
- N S Forneris
- Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
- J P Steibel
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
- A Legarra
- INRA, GenPhySE (Génétique, Physiologie et Systèmes d'Elevage), Castanet-Tolosan, France
- Z G Vitezica
- INRA, GenPhySE (Génétique, Physiologie et Systèmes d'Elevage), Castanet-Tolosan, France; INP, ENSAT, GenPhySE (Génétique, Physiologie et Systèmes d'Elevage), Université de Toulouse, Castanet-Tolosan, France
- R O Bates
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
- C W Ernst
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
- A L Basso
- Departamento de Biología Aplicada y Alimentos, Facultad de Agronomía, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
- R J C Cantet
- Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina; INPA-CONICET (Consejo Nacional de Investigaciones Científicas y Técnicas), Buenos Aires, Argentina
22
Sato S, Kikuchi T, Uemoto Y, Mikawa S, Suzuki K. Effect of candidate gene polymorphisms on reproductive traits in a Large White pig population. Anim Sci J 2016; 87:1455-1463. [DOI: 10.1111/asj.12580] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/23/2015] [Revised: 09/24/2015] [Accepted: 11/09/2015] [Indexed: 12/21/2022]
Affiliation(s)
- Shuji Sato
- National Livestock Breeding Center; Nishigo Fukushima Japan
- Satoshi Mikawa
- National Institute of Agrobiological Sciences; Tsukuba Ibaraki Japan
- Keiichi Suzuki
- Graduate School of Agricultural Science; Tohoku University; Sendai Miyagi Japan
23
Bernal Rubio YL, Gualdrón Duarte JL, Bates RO, Ernst CW, Nonneman D, Rohrer GA, King A, Shackelford SD, Wheeler TL, Cantet RJC, Steibel JP. Meta-analysis of genome-wide association from genomic prediction models. Anim Genet 2015; 47:36-48. [PMID: 26607299 PMCID: PMC4738412 DOI: 10.1111/age.12378] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Accepted: 09/11/2015] [Indexed: 12/21/2022]
Abstract
Genome-wide association (GWA) studies based on GBLUP models are a common practice in animal breeding. However, effect sizes of GWA tests are small, so larger sample sizes are required to increase the power to detect rare variants. Because sample sizes are difficult to increase in animal populations, one alternative is to implement a meta-analysis (MA), combining information and results from independent GWA studies. Although this methodology has been used widely in human genetics, its implementation in animal breeding has been limited. Thus, we present methods to implement a MA of GWA, describing the proper approach to compute weights derived from multiple genomic evaluations based on animal-centric GBLUP models. Application to real datasets shows that MA increases the power to detect associations compared with population-level GWA, while allowing population structure and heterogeneity of variance components across populations to be accounted for. Another advantage of MA is that it does not require access to the genotype data needed for a joint analysis. Scripts implementing this approach are distributed; because they consider the sign as well as the strength of association, they account for heterogeneity in association phase between QTL and SNPs. Thus, MA of GWA is an attractive alternative for summarizing results from multiple genomic studies, avoiding restrictions on genotype data sharing, definition of fixed effects and different scales of measurement of evaluated traits.
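The abstract does not give the weighting formulae derived from the GBLUP evaluations. For comparison only, a textbook fixed-effect inverse-variance meta-analysis of per-population SNP effect estimates is sketched below; it is a swapped-in standard technique, not the authors' method, and its names are hypothetical. It illustrates why keeping the sign matters: estimates of opposite sign cancel.

```python
import math

def inverse_variance_meta(effects, std_errors):
    """Fixed-effect meta-analysis of one SNP's effect estimates across populations.

    effects    : signed SNP effect estimates, one per population
    std_errors : their standard errors
    Returns (pooled effect, standard error of the pooled effect, z-statistic).
    """
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, effects)) / total
    se = 1.0 / math.sqrt(total)
    return beta, se, beta / se
```

Two populations with equal precision and effects of +0.2 and -0.2 give a pooled effect of zero, whereas pooling absolute values would wrongly suggest a consistent association.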
Affiliation(s)
- Y L Bernal Rubio
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina; Department of Animal Science, Michigan State University, East Lansing, MI, 48824-1225, USA
- J L Gualdrón Duarte
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina
- R O Bates
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina
- C W Ernst
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina
- D Nonneman
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- G A Rohrer
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- A King
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- S D Shackelford
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- T L Wheeler
- USDA/ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933-0166, USA
- R J C Cantet
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824-1225, USA; Consejo Nacional de Investigaciones Cientificas y Tecnicas - CONICET, Buenos Aires, Argentina
- J P Steibel
- Departamento de Producción Animal, Facultad de Agronomía, UBA, Buenos Aires, 1417, Argentina; Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, 48824-1225, USA
24
Abstract
Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation, was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models, although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit.
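Pseudo-Bayes factors are commonly computed from conditional predictive ordinates (CPO), each estimated as the harmonic mean of an observation's likelihood across MCMC draws; the log pseudo-Bayes factor is then the difference in summed log-CPO between two models. The abstract does not describe the computation, so the sketch below, including its input layout (per-draw log-likelihoods for each observation), is an assumption, and its names are hypothetical.

```python
import math

def log_cpo(loglik_draws):
    """log CPO for one observation from its per-draw log-likelihoods.

    CPO_i = 1 / mean_s(1 / p(y_i | theta_s)); computed with log-sum-exp for stability.
    """
    neg = [-ll for ll in loglik_draws]
    top = max(neg)
    log_mean_inverse = top + math.log(sum(math.exp(v - top) for v in neg)) - math.log(len(neg))
    return -log_mean_inverse

def log_pseudo_bayes_factor(model1, model2):
    """Sum of log CPO under model 1 minus that under model 2 (positive favours model 1).

    Each model is a list with one entry per observation, holding that observation's
    log-likelihood at every MCMC draw.
    """
    return sum(log_cpo(obs) for obs in model1) - sum(log_cpo(obs) for obs in model2)
```

Working with log-likelihoods and log-sum-exp avoids the overflow that a direct harmonic mean of very small likelihoods would cause.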
25
Zhao X, Zhao K, Ren J, Zhang F, Jiang C, Hong Y, Jiang K, Yang Q, Wang C, Ding N, Huang L, Zhang Z, Xing Y. An imputation-based genome-wide association study on traits related to male reproduction in a White Duroc × Erhualian F2 population. Anim Sci J 2015; 87:646-54. [PMID: 26425933 DOI: 10.1111/asj.12468] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 02/18/2015] [Revised: 04/12/2015] [Accepted: 04/27/2015] [Indexed: 01/22/2023]
Abstract
Boar reproductive traits are economically important for the pig industry. Here we conducted a genome-wide association study (GWAS) for 13 reproductive traits measured on 205 F2 boars at day 300, using 60K single nucleotide polymorphism (SNP) data imputed from a reference panel of 1200 pigs in a White Duroc × Erhualian F2 intercross population. We identified 10 significant loci for seven traits on eight pig chromosomes (SSC). Two loci surpassed the genome-wide significance level: one for epididymal weight around 60.25 Mb on SSC7 and one for semen temperature around 43.69 Mb on SSC4. Four of the 10 significant loci were consistent with previously reported quantitative trait loci for boar reproduction traits. We highlighted several interesting candidate genes at these loci, including APN, TEP1, PARP2, SPINK1 and PDE1C. To evaluate imputation accuracy, we further genotyped the nine top GWAS SNPs using PCR restriction fragment length polymorphism or Sanger sequencing. Between imputed and real genotype data we found an average genotype concordance of 91.44%, an allelic concordance of 95.36% and an r² correlation of 0.85. This indicates that our GWAS mapping results based on imputed SNP data are reliable, providing insights into the genetic basis of boar reproductive traits.
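The three validation statistics reported above can be computed from paired true and imputed genotypes. The sketch below assumes genotypes coded 0/1/2 as allele counts, so a genotype that differs from the truth by d allele copies shares 2 - d alleles with it, and takes r² as the squared Pearson correlation; the function name is hypothetical.

```python
def imputation_accuracy(true_geno, imputed_geno):
    """Genotype concordance, allelic concordance and r^2 for genotypes coded 0/1/2.

    r^2 is undefined (division by zero) if either vector is constant.
    """
    n = len(true_geno)
    pairs = list(zip(true_geno, imputed_geno))
    genotype_conc = sum(t == i for t, i in pairs) / n
    # Each genotype carries two alleles; |t - i| of them disagree with the truth.
    allelic_conc = sum(2 - abs(t - i) for t, i in pairs) / (2 * n)
    # Squared Pearson correlation between true and imputed allele counts.
    mt = sum(true_geno) / n
    mi = sum(imputed_geno) / n
    cov = sum((t - mt) * (i - mi) for t, i in pairs)
    vt = sum((t - mt) ** 2 for t in true_geno)
    vi = sum((i - mi) ** 2 for i in imputed_geno)
    r2 = cov * cov / (vt * vi)
    return genotype_conc, allelic_conc, r2
```

Allelic concordance is always at least as high as genotype concordance, because a heterozygote imputed as a homozygote still shares one allele with the truth, which matches the ordering of the two percentages reported above.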
Collapse
Affiliation(s)
- Xueyan Zhao
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Kewei Zhao
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Jun Ren
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Feng Zhang
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Chao Jiang
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Yuan Hong
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Kai Jiang
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Qiang Yang
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Chengbin Wang
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Nengshui Ding
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Lusheng Huang
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Zhiyan Zhang
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
- Yuyun Xing
- Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
26
Heidaritabar M, Calus MPL, Vereijken A, Groenen MAM, Bastiaansen JWM. Accuracy of imputation using the most common sires as reference population in layer chickens. BMC Genet 2015; 16:101. [PMID: 26282557 PMCID: PMC4539854 DOI: 10.1186/s12863-015-0253-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/10/2014] [Accepted: 07/10/2015] [Indexed: 11/24/2022]
Abstract
BACKGROUND Genotype imputation has become standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection (GS) and genome-wide association studies (GWAS). We assessed the accuracy of imputing 60K genotype data from lower-density single nucleotide polymorphism (SNP) panels using a small set of the most common sires in a population of 2140 white layer chickens. Several factors affecting imputation accuracy were investigated, including the size of the reference population, the degree of relationship between the reference and validation populations, and the minor allele frequency (MAF) of the SNP being imputed. RESULTS The accuracy of imputation was assessed in different scenarios using 22 and 62 carefully selected reference animals (Ref(22) and Ref(62)). Animal-specific imputation accuracy corrected for gene content was moderate on average (~0.80) in most scenarios and low in the 3K-to-60K scenario. Maximum average accuracies were 0.90 and 0.93 in the most favourable scenario for Ref(22) and Ref(62), respectively, when SNPs were masked independently of their MAF. SNPs with low MAF were more difficult to impute, and the larger reference population considerably improved imputation accuracy for these rare SNPs. When Ref(22) was used for imputation, the average imputation accuracy decreased by 0.04 when the validation population was two rather than one generation away from the reference, and increased again by 0.05 when it was three generations away. Selecting the reference animals from the most common sires, rather than at random from the population, considerably improved imputation accuracy for low-MAF SNPs but gave only limited improvement for other MAF classes.
The allelic R² measure from the Beagle software was found to be a good predictor of imputation reliability (correlation ~0.8) when the density of the validation panel was very low (3K) and neither the MAF of the SNP nor the size of the reference population was extremely small. CONCLUSIONS Even with a very small number of animals in the reference population, reasonable imputation accuracy can be achieved. Selecting a set of the most common sires, rather than random animals, for the reference population improves the imputation accuracy of rare alleles, which may be a benefit when imputing to whole-genome re-sequencing data.
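The MAF dependence described above is usually summarised by computing a per-SNP accuracy and averaging it within MAF classes. The sketch below does this with the correlation between observed and imputed dosages as the per-SNP accuracy; it is not the gene-content-corrected, animal-specific measure used by the authors, and the MAF class edges and function names are hypothetical choices for illustration.

```python
import numpy as np


def per_snp_correlation(observed, imputed):
    """Pearson correlation between observed and imputed dosages for each SNP."""
    cors = np.full(observed.shape[1], np.nan)
    for j in range(observed.shape[1]):
        # Correlation is undefined when either column is constant.
        if observed[:, j].std() > 0 and imputed[:, j].std() > 0:
            cors[j] = np.corrcoef(observed[:, j], imputed[:, j])[0, 1]
    return cors


def accuracy_by_maf(observed, imputed, edges=(0.0, 0.05, 0.10, 0.25, 0.50)):
    """Mean per-SNP accuracy within MAF classes (lo, hi]; NaN for empty classes."""
    obs = np.asarray(observed, dtype=float)  # animals x SNPs, coded 0/1/2
    imp = np.asarray(imputed, dtype=float)
    p = obs.mean(axis=0) / 2.0
    maf = np.minimum(p, 1.0 - p)
    cors = per_snp_correlation(obs, imp)
    summary = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_class = (maf > lo) & (maf <= hi) & ~np.isnan(cors)
        summary[(lo, hi)] = float(cors[in_class].mean()) if in_class.any() else float("nan")
    return summary


# Simulated example: 3% of genotypes replaced by a random 0/1/2 call.
rng = np.random.default_rng(1)
freqs = rng.uniform(0.01, 0.5, size=200)
observed = rng.binomial(2, freqs, size=(300, 200))
imputed = observed.copy()
errors = rng.random(observed.shape) < 0.03
imputed[errors] = rng.integers(0, 3, size=errors.sum())
for (lo, hi), acc in accuracy_by_maf(observed, imputed).items():
    print(f"MAF ({lo:.2f}, {hi:.2f}]: {acc:.3f}")
```

With the same random error rate in every class, the rare-SNP classes tend to show lower correlations, because a single wrong call carries more weight when the minor allele is scarce.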
Affiliation(s)
- Marzieh Heidaritabar
- Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
- Mario P L Calus
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
- Addie Vereijken
- Hendrix Genetics Research, Technology and Services B.V., P.O. Box 114, 5830 AC, Boxmeer, the Netherlands.
- Martien A M Groenen
- Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
- John W M Bastiaansen
- Animal Breeding and Genomics Centre, Wageningen University, P.O. Box 338, 6700 AH, Wageningen, the Netherlands.
27
Xiang T, Ma P, Ostersen T, Legarra A, Christensen OF. Imputation of genotypes in Danish purebred and two-way crossbred pigs using low-density panels. Genet Sel Evol 2015; 47:54. [PMID: 26122927 PMCID: PMC4486706 DOI: 10.1186/s12711-015-0134-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/28/2014] [Accepted: 06/13/2015] [Indexed: 01/30/2023]
Abstract
Background Genotype imputation is commonly used as an initial step in genomic selection because accurately imputed genotypes can replace actual genotypes at a lower cost without reducing the accuracy of genomic selection. The performance of imputation has rarely been investigated in crossbred animals, particularly in pigs. The extent and pattern of linkage disequilibrium differ between crossbred and purebred animals, which may affect imputation performance. In this study, we first compared different scenarios of imputation from 5K to 8K single nucleotide polymorphisms (SNPs) in genotyped Danish Landrace, Yorkshire and crossbred Landrace-Yorkshire datasets and, second, compared imputation from 8K to 60K SNPs in genotyped purebred and simulated crossbred datasets. All imputations were done using Beagle version 3.3.2. We then investigated the reasons that could explain the observed differences. Results Genotype imputation performs as well in crossbred animals as in purebred animals when both parental breeds are included in the reference population. When the reference population is very large, it is not necessary to combine the two breeds in the reference population to impute the genotypes of purebred animals, because a within-breed reference population can provide a very high level of imputation accuracy (correct rate ≥ 0.99, correlation ≥ 0.95). However, to obtain similar imputation accuracies for crossbred animals, a reference population that combines both parental purebreds is required. Imputation accuracies are higher when a larger proportion of haplotypes is shared between the reference population and the validation (imputed) population. Conclusions The results from both real data and pedigree-based simulated data demonstrate that genotype imputation from low-density to medium-density panels is highly accurate in both purebred and crossbred pigs.
In crossbred pigs, combining the parental purebred animals in the reference population is necessary to obtain high imputation accuracy.
Affiliation(s)
- Tao Xiang
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, DK-8830, Denmark; INRA, UR1388 GenPhySE, CS-52627, Castanet-Tolosan, F-31326, France.
- Peipei Ma
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, DK-8830, Denmark.
- Tage Ostersen
- Pig Research Centre, Danish Agricultural and Food Council, Copenhagen, DK-1609, Denmark.
- Andres Legarra
- INRA, UR1388 GenPhySE, CS-52627, Castanet-Tolosan, F-31326, France.
- Ole F Christensen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, DK-8830, Denmark.
28
Yang W, Chen C, Steibel JP, Ernst CW, Bates RO, Zhou L, Tempelman RJ. A comparison of alternative random regression and reaction norm models for whole genome predictions. J Anim Sci 2015; 93:2678-92. [PMID: 26115256 DOI: 10.2527/jas.2014-8685] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
Whole genome prediction (WGP) based on high-density SNP marker panels is known to improve the accuracy of breeding value (BV) prediction in livestock. However, these accuracies can be compromised when genotype by environment interaction (G×E) exists but is not accounted for. Reaction norm (RN) and random regression (RR) models have proven useful in accounting for G×E in pre-WGP evaluations by modeling BV as linear or higher-order functions of environmental or temporal covariates. We extend these RR/RN models with several alternative specifications for SNP-specific intercepts and linear slopes on environmental covariates. One specification is based on bivariate normality (BVN) of SNP-specific intercepts and slopes, whereas 2 others, IW-BayesA and IW-BayesB, are based on inverted Wishart (IW) extensions and are, respectively, bivariate Student t extensions of the currently popular models without (BayesA) or with (BayesB) variable selection. We highlight alternative specifications based on the square-root-free Cholesky decomposition (CD) of SNP-specific variance-covariance (VCV) matrices, in an attempt to better distinguish environmentally sensitive QTL from environmentally robust QTL. Two CD specifications were considered, with (CD-BayesB) or without (CD-BayesA) variable selection on intercept and slope effects. We compared the 5 models in an RN simulation study. Six scenarios were considered, differing in the overall genetic correlation between SNP-specific intercept and slope effects, in heritability, and in the numbers of environmentally robust versus sensitive QTL. In most scenarios, IW-BayesA had the greatest accuracy, whereas CD-BayesB had the greatest accuracy in low-complexity architectures (i.e., a low number of QTL).
In an RR application to a Duroc × Pietrain resource population at Michigan State University, 5,271 SNP markers and 928 F2 animals with known pedigree were analyzed for backfat thickness at wk 10, 13, 16, 19, and 22. SNP-based RR methods had a 2.5% greater (P < 0.0001) cross-validation accuracy for predicting phenotypes than the SNP-based conventional BayesA/BayesB and/or pedigree-based RR BLUP; however, none of the proposed RR models differed in performance from one another.
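The BVN reaction-norm idea above, with each SNP carrying an intercept and a slope on an environmental covariate x, can be illustrated by simulation. This is a minimal sketch under assumed settings (SNP count, variances, intercept-slope correlation of 0.5, covariate values of -1 and +1 are all hypothetical); it only generates breeding values and does not fit any of the five Bayesian models compared in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n_animals, n_snps = 500, 1000

# Centered genotypes (0/1/2 minus twice the allele frequency).
freqs = rng.uniform(0.05, 0.5, size=n_snps)
Z = rng.binomial(2, freqs, size=(n_animals, n_snps)) - 2.0 * freqs

# Hypothetical BVN prior: SNP intercept and slope variances and their correlation.
var_int, var_slope, rho = 1.0 / n_snps, 0.25 / n_snps, 0.5
cov = rho * np.sqrt(var_int * var_slope)
vcv = np.array([[var_int, cov], [cov, var_slope]])
effects = rng.multivariate_normal(mean=[0.0, 0.0], cov=vcv, size=n_snps)
intercepts, slopes = effects[:, 0], effects[:, 1]


def breeding_values(x):
    """Each animal's BV at covariate x: sum_j z_ij * (intercept_j + slope_j * x)."""
    return Z @ (intercepts + slopes * x)


bv_poor, bv_good = breeding_values(-1.0), breeding_values(1.0)
r = np.corrcoef(bv_poor, bv_good)[0, 1]
print(f"genetic correlation between x = -1 and x = +1: {r:.2f}")
```

Because slopes vary among SNPs, animals re-rank across environments: under these settings the correlation should land near sqrt(0.75 / 1.75) ≈ 0.65 rather than 1, which is the G×E signal that RN/RR models are designed to capture.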
29
A hidden Markov approach for ascertaining cSNP genotypes from RNA sequence data in the presence of allelic imbalance by exploiting linkage disequilibrium. BMC Bioinformatics 2015; 16:61. [PMID: 25887316 PMCID: PMC4351697 DOI: 10.1186/s12859-015-0479-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/03/2014] [Accepted: 01/27/2015] [Indexed: 12/30/2022]
Abstract
BACKGROUND Allele-specific expression (ASE) increases our understanding of the genetic control of gene expression and its links to phenotypic variation. ASE testing is implemented through binomial or beta-binomial tests of the sequence read counts of alternative alleles at a cSNP of interest in heterozygous individuals. This requires prior ascertainment of the cSNP genotypes of all individuals. To meet this need, we propose hidden Markov methods to call SNPs from next-generation RNA sequence data when ASE may be present. RESULTS We propose two hidden Markov models (HMMs), HMM-ASE and HMM-NASE, which do or do not consider ASE, respectively, in order to improve genotyping accuracy. Both HMMs call the genotypes of several SNPs simultaneously, which exploits the dependence among SNPs, and allow for mapping error, which corrects the bias it causes. In addition, HMM-ASE exploits ASE information to further improve genotyping accuracy when ASE is likely to be present. Simulation results indicate that the proposed HMMs achieve very good prediction accuracy in terms of controlling both the false discovery rate (FDR) and the false negative rate (FNR). When ASE is present, HMM-ASE had a lower FNR than HMM-NASE, while both controlled the FDR at a similar level. By exploiting linkage disequilibrium (LD), the proposed methods showed, in a real data application, better sensitivity and similar FDR in calling heterozygous SNPs compared with the VarScan method, and sensitivity and FDR similar to those of the BCFtools and Beagle methods. The resulting genotypes show good properties for the estimation of genetic parameters and ASE ratios. CONCLUSIONS We introduce HMMs that are able to exploit LD and account for ASE and mapping errors to simultaneously call SNPs from next-generation RNA sequence data.
The method can reliably call cSNP genotypes even in the presence of ASE and under low sequencing coverage. As a by-product, the proposed method provides predictions of ASE ratios for heterozygous genotypes, which can then be used for ASE testing.
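The core idea above, calling genotypes at consecutive cSNPs jointly with an HMM whose emissions are read counts and whose heterozygote emission can reflect allelic imbalance, can be sketched with a toy three-state model decoded by the Viterbi algorithm. This is not the authors' HMM-ASE or HMM-NASE: the binomial emissions, the single "stay" transition probability standing in for LD, the error rate and all function names are hypothetical simplifications.

```python
import math

GENOTYPES = ("0/0", "0/1", "1/1")


def log_binom_pmf(k, n, p):
    """log P(k alternative-allele reads out of n | per-read alt probability p)."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1.0 - p)


def viterbi_genotypes(alt_reads, depths, err=0.01, het_ratio=0.5, stay=0.9):
    """Most likely genotype path over consecutive cSNPs in one individual.

    err: probability that a read shows the other allele at a homozygous site.
    het_ratio: expected alt-read fraction at heterozygous sites (0.5 = no ASE).
    stay: probability that adjacent cSNPs share the same genotype state,
          a crude stand-in for linkage disequilibrium.
    """
    alt_prob = (err, het_ratio, 1.0 - err)
    log_trans = [[math.log(stay if a == b else (1.0 - stay) / 2.0) for b in range(3)]
                 for a in range(3)]

    def emission(t, s):
        return log_binom_pmf(alt_reads[t], depths[t], alt_prob[s])

    # Uniform initial distribution over the three genotype states.
    score = [math.log(1.0 / 3.0) + emission(0, s) for s in range(3)]
    backpointers = []
    for t in range(1, len(depths)):
        new_score, pointers = [], []
        for s in range(3):
            candidates = [score[r] + log_trans[r][s] for r in range(3)]
            best = max(range(3), key=candidates.__getitem__)
            pointers.append(best)
            new_score.append(candidates[best] + emission(t, s))
        score = new_score
        backpointers.append(pointers)

    # Trace back the best path from the highest-scoring final state.
    state = max(range(3), key=score.__getitem__)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [GENOTYPES[s] for s in path]


print(viterbi_genotypes(alt_reads=[0, 7, 6, 15], depths=[15, 14, 12, 15]))
# ['0/0', '0/1', '0/1', '1/1']
```

Setting `het_ratio` away from 0.5 (e.g. 0.8) loosely mimics the ASE-aware variant: heterozygotes whose reads are skewed toward one allele are then less likely to be miscalled as homozygous.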
30
Frischknecht M, Neuditschko M, Jagannathan V, Drögemüller C, Tetens J, Thaller G, Leeb T, Rieder S. Imputation of sequence level genotypes in the Franches-Montagnes horse breed. Genet Sel Evol 2014; 46:63. [PMID: 25927638 PMCID: PMC4180851 DOI: 10.1186/s12711-014-0063-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 03/12/2014] [Accepted: 09/11/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND A cost-effective strategy to increase the density of available markers within a population is to sequence a small proportion of the population and impute whole-genome sequence data for the remaining population. Increased densities of typed markers are advantageous for genome-wide association studies (GWAS) and genomic predictions. METHODS We obtained genotypes for 54 602 SNPs (single nucleotide polymorphisms) in 1077 Franches-Montagnes (FM) horses and Illumina paired-end whole-genome sequencing data for 30 FM horses and 14 Warmblood horses. After variant calling, the sequence-derived SNP genotypes (~13 million SNPs) were used for genotype imputation with the software programs Beagle, Impute2 and FImpute. RESULTS The mean imputation accuracy of FM horses using Impute2 was 92.0%. Imputation accuracy using Beagle and FImpute was 74.3% and 77.2%, respectively. In addition, for Impute2 we determined the imputation accuracy of all individual horses in the validation population, which ranged from 85.7% to 99.8%. The subsequent inclusion of Warmblood sequence data further increased the correlation between true and imputed genotypes for most horses, especially for horses with a high level of admixture. The final imputation accuracy of the horses ranged from 91.2% to 99.5%. CONCLUSIONS Using Impute2, the imputation accuracy was higher than 91% for all horses in the validation population, which indicates that direct imputation of 50k SNP-chip data to sequence level genotypes is feasible in the FM population. The individual imputation accuracy depended mainly on the applied software and the level of admixture.
Affiliation(s)
- Mirjam Frischknecht
- Agroscope - Swiss National Stud Farm, 1580, Avenches, Switzerland; Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland; Swiss Competence Center of Animal Breeding and Genetics, University of Bern, Bern University of Applied Sciences HAFL & Agroscope, 3001, Bern, Switzerland; Graduate School for Cellular and Molecular Biology, University of Bern, 3012, Bern, Switzerland.
- Markus Neuditschko
- Agroscope - Swiss National Stud Farm, 1580, Avenches, Switzerland; Swiss Competence Center of Animal Breeding and Genetics, University of Bern, Bern University of Applied Sciences HAFL & Agroscope, 3001, Bern, Switzerland.
- Vidhya Jagannathan
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland; Swiss Competence Center of Animal Breeding and Genetics, University of Bern, Bern University of Applied Sciences HAFL & Agroscope, 3001, Bern, Switzerland.
- Cord Drögemüller
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland; Swiss Competence Center of Animal Breeding and Genetics, University of Bern, Bern University of Applied Sciences HAFL & Agroscope, 3001, Bern, Switzerland.
- Jens Tetens
- Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, 24118, Kiel, Germany.
- Georg Thaller
- Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, 24118, Kiel, Germany.
- Tosso Leeb
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland; Swiss Competence Center of Animal Breeding and Genetics, University of Bern, Bern University of Applied Sciences HAFL & Agroscope, 3001, Bern, Switzerland.
- Stefan Rieder
- Agroscope - Swiss National Stud Farm, 1580, Avenches, Switzerland; Swiss Competence Center of Animal Breeding and Genetics, University of Bern, Bern University of Applied Sciences HAFL & Agroscope, 3001, Bern, Switzerland.
31
Boison S, Neves H, Pérez O’Brien A, Utsunomiya Y, Carvalheiro R, da Silva M, Sölkner J, Garcia J. Imputation of non-genotyped individuals using genotyped progeny in Nellore, a Bos indicus cattle breed. Livest Sci 2014. [DOI: 10.1016/j.livsci.2014.05.033] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/08/2023]
32
Gualdrón Duarte JL, Cantet RJC, Bates RO, Ernst CW, Raney NE, Steibel JP. Rapid screening for phenotype-genotype associations by linear transformations of genomic evaluations. BMC Bioinformatics 2014; 15:246. [PMID: 25038782 PMCID: PMC4112210 DOI: 10.1186/1471-2105-15-246] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 02/03/2014] [Accepted: 07/07/2014] [Indexed: 01/02/2023]
Abstract
Background Currently, association studies are analysed using statistical mixed models, with marker effects estimated by a linear transformation of genomic breeding values. The variances of the marker effects are needed to perform tests of association. However, current approaches to estimating these parameters rely on a prior variance or on a constant estimate of the additive variance. As an alternative, we propose a standardized test of association that uses the variance of each marker effect, since these variances generally differ from one another. Random breeding values from a mixed model including fixed effects and a genomic covariance matrix are linearly transformed to estimate the marker effects. Results The standardized test was neither conservative nor liberal with respect to the type I error rate (false positives), whereas a similar test using the prediction error variance was too conservative. Furthermore, the procedure solves genomic predictions efficiently, and its p-values are virtually identical to those calculated from tests of one marker effect at a time. The standardized test also reduces computing time and memory requirements. The following steps are used to locate genome segments displaying strong association. The marker with the highest −log(p-value) on each chromosome is selected, and the segment is expanded 1 Mb upstream and 1 Mb downstream of that marker. A genomic matrix is calculated using the information from those markers only and is used as the variance-covariance matrix of the segment effects in a model that also includes fixed effects and random genomic breeding values. A likelihood ratio is then calculated to test the segment effect on every chromosome against a reduced model with fixed effects and genomic breeding values only. In a case study with pigs, a significant segment on chromosome 6 explained 11% of the total genetic variance.
Conclusions The standardized test of marker effects using their own variance helps in detecting specific genomic regions involved in the additive variance and in reducing false positives. Moreover, genome scanning of candidate segments can be used in meta-analyses of genome-wide association studies, as it enables the detection of specific genome regions that affect an economically relevant trait when using multiple populations.
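The marker-level part of the procedure above (GBLUP, a linear back-transformation of the genomic breeding values into marker effects, and standardization of each effect by its own standard deviation) can be sketched as follows. Assumptions, not taken from the paper: one record per animal, an overall mean as the only fixed effect, known variance components, a VanRaden-style G = MM'/k with k = 2Σp(1 − p), and a small ridge added to G to keep it invertible. The function name is hypothetical, and the segment-based likelihood-ratio scan is not shown.

```python
import math

import numpy as np


def gblup_marker_tests(y, geno, var_g, var_e, ridge=0.01):
    """GBLUP, then back-solve marker effects and standardize each by its own SD.

    Assumed model: y = 1*mu + g + e, g ~ N(0, G*var_g), e ~ N(0, I*var_e).
    """
    y = np.asarray(y, dtype=float)
    geno = np.asarray(geno, dtype=float)
    n = geno.shape[0]
    p = geno.mean(axis=0) / 2.0
    M = geno - 2.0 * p                       # centered genotypes (n x m)
    k = 2.0 * np.sum(p * (1.0 - p))
    G = M @ M.T / k + ridge * np.eye(n)      # small ridge keeps G invertible
    G_inv = np.linalg.inv(G)

    # Henderson's mixed-model equations with X = 1 and Z = I.
    X = np.ones((n, 1))
    lhs = np.block([[X.T @ X, X.T],
                    [X, np.eye(n) + (var_e / var_g) * G_inv]])
    rhs = np.concatenate([X.T @ y, y])
    C = np.linalg.inv(lhs)
    g_hat = (C @ rhs)[1:]

    pev = C[1:, 1:] * var_e                  # prediction error (co)variance of g
    var_g_hat = G * var_g - pev              # Var(g_hat) = G*var_g - PEV

    T = M.T @ G_inv / k                      # linear map: a_hat = T @ g_hat
    a_hat = T @ g_hat
    var_a_hat = np.einsum("ij,jk,ik->i", T, var_g_hat, T)   # diag(T V T')
    sd = np.sqrt(np.clip(var_a_hat, 0.0, None))             # guard round-off
    z = np.divide(a_hat, sd, out=np.zeros_like(a_hat), where=sd > 0)
    # Two-sided normal p-values: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return a_hat, z, p_values


rng = np.random.default_rng(7)
geno = rng.binomial(2, 0.3, size=(150, 300))
y = 1.5 * geno[:, 5] + rng.normal(0.0, 1.0, size=150)
a_hat, z, p_values = gblup_marker_tests(y, geno, var_g=0.5, var_e=1.0)
print("marker with the largest |z|:", int(np.argmax(np.abs(z))))
```

Only one n × n system is solved, however many markers there are, which is the source of the computing savings the abstract mentions; each marker's variance comes from the same Var(ĝ) matrix.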
Affiliation(s)
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, MI, USA.
33
Accuracy of estimation of genomic breeding values in pigs using low-density genotypes and imputation. G3-GENES GENOMES GENETICS 2014; 4:623-31. [PMID: 24531728 PMCID: PMC4059235 DOI: 10.1534/g3.114.010504] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 11/22/2022]
Abstract
Genomic selection has the potential to increase genetic progress. Imputation of high-density single-nucleotide polymorphism (SNP) genotypes can improve the cost efficiency of genomic estimated breeding value (GEBV) prediction for pig breeding. Consequently, the objectives of this work were to: (1) estimate the accuracy of genomic evaluation and GEBV for three traits in a Yorkshire population and (2) quantify the loss of accuracy of genomic evaluation and GEBV when genotypes were imputed under two scenarios: a high-cost, high-accuracy scenario in which only selection candidates were imputed from a low-density platform, and a low-cost, low-accuracy scenario in which all animals were imputed using a small reference panel of haplotypes. Phenotypes and genotypes obtained with the PorcineSNP60 BeadChip were available for 983 Yorkshire boars. Genotypes of selection candidates were masked and imputed using the tag SNPs of the GeneSeek Genomic Profiler (10K). Imputation was performed with BEAGLE using 128 or 1800 haplotypes as reference panels. GEBV were obtained through an animal-centric ridge regression model using de-regressed breeding values as response variables. Accuracy of genomic evaluation was estimated as the correlation between estimated breeding values and GEBV in a 10-fold cross-validation design. Accuracy of genomic evaluation using observed genotypes was high for all traits (0.65–0.68). Using genotypes imputed from a large reference panel (accuracy: R² = 0.95) did not significantly decrease the accuracy of genomic evaluation, whereas using genotypes imputed from a small reference panel (R² = 0.88) did. Genomic evaluation based on imputed genotypes in selection candidates can be implemented at a fraction of the cost of a genomic evaluation using observed genotypes and still yield virtually the same accuracy.
On the other hand, using a very small reference panel of haplotypes to impute both training animals and selection candidates results in lower accuracy of genomic evaluation.
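The evaluation design above (ridge regression GEBV trained on de-regressed breeding values, 10-fold cross-validation, accuracy as the correlation between EBV and GEBV in each validation fold) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it uses observed genotypes only (no imputation step), a fixed, hypothetical shrinkage parameter `lam`, random fold assignment, and a hypothetical function name. The animal-centric (dual) form solves an n_train × n_train system and gives the same predictions as ridge regression on SNP effects with the same `lam`.

```python
import numpy as np


def ridge_cv_accuracy(geno, response, target, lam=50.0, n_folds=10, seed=0):
    """k-fold cross-validated accuracy of ridge-regression GEBV.

    geno: animals x SNPs (0/1/2); response: training variable (e.g. DEBV);
    target: variable correlated with GEBV in validation (e.g. EBV).
    """
    geno = np.asarray(geno, dtype=float)
    response = np.asarray(response, dtype=float)
    target = np.asarray(target, dtype=float)
    n = geno.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    accuracies = []
    for val in folds:
        train = np.setdiff1d(np.arange(n), val)
        centre = geno[train].mean(axis=0)
        X_tr, X_val = geno[train] - centre, geno[val] - centre
        mu = response[train].mean()
        # Animal-centric (dual) ridge: weights on training animals, not SNPs.
        K_tr = X_tr @ X_tr.T
        alpha = np.linalg.solve(K_tr + lam * np.eye(len(train)), response[train] - mu)
        gebv = (X_val @ X_tr.T) @ alpha
        accuracies.append(float(np.corrcoef(target[val], gebv)[0, 1]))
    return float(np.mean(accuracies)), accuracies


# Simulated example: EBV are noisy true BVs; DEBV add further noise.
rng = np.random.default_rng(3)
geno = rng.binomial(2, 0.3, size=(300, 500))
beta = rng.normal(0.0, 1.0, size=500)
tbv = (geno - geno.mean(axis=0)) @ beta
ebv = tbv + rng.normal(0.0, 5.0, size=300)
debv = ebv + rng.normal(0.0, 5.0, size=300)
mean_acc, fold_acc = ridge_cv_accuracy(geno, debv, ebv)
print(f"mean cross-validated accuracy: {mean_acc:.2f}")
```

To mirror the imputation scenarios, the same function could be run once with observed genotypes and once with imputed genotypes for the validation animals, comparing the two mean accuracies.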
34
Huang Y, Bates RO, Ernst CW, Fix JS, Steibel JP. Estimation of U.S. Yorkshire breed composition using genomic data. J Anim Sci 2014; 92:1395-404. [DOI: 10.2527/jas.2013-6907] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Affiliation(s)
- Y. Huang
- Department of Animal Science, Michigan State University, East Lansing 48824
- R. O. Bates
- Department of Animal Science, Michigan State University, East Lansing 48824
- C. W. Ernst
- Department of Animal Science, Michigan State University, East Lansing 48824
- J. S. Fix
- Smithfield Premium Genetics, Roanoke Rapids, NC 27870
- J. P. Steibel
- Department of Animal Science, Michigan State University, East Lansing 48824
- Department of Fisheries and Wildlife, Michigan State University, East Lansing 48824