1
Tu TC, Lin CJ, Liu MC, Hsu ZT, Chen CF. Comparison of genomic prediction accuracy using different models for egg production traits in Taiwan country chicken. Poult Sci 2024; 103:104063. [PMID: 39098301 DOI: 10.1016/j.psj.2024.104063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 06/20/2024] [Accepted: 07/01/2024] [Indexed: 08/06/2024] Open
Abstract
In local chickens targeted for niche markets, genotyping costs are relatively high due to the small population size and diverse breeding goals. The single-step genomic best linear unbiased prediction (ssGBLUP) model, which combines pedigree and genomic information, has been introduced to increase the accuracy of genomic estimated breeding value (GEBV). Therefore, this model may be more beneficial than the genomic BLUP (GBLUP) model for genomic selection in local chickens. Additionally, the single-step genome-wide association study (ssGWAS) can be used to extend the ssGBLUP model results to animals with available phenotypic information but without genotypic data. In this study, we compared the accuracy of (G)EBVs using the pedigree-based BLUP (PBLUP), GBLUP, and ssGBLUP models. Moreover, we conducted single-SNP GWAS (SNP-GWAS), GBLUP-GWAS, and ssGWAS methods to identify genes associated with egg production traits in the NCHU-G101 chicken to understand the feasibility of using genomic selection in a small population. The average prediction accuracy of (G)EBV for egg production traits using the PBLUP, GBLUP, and ssGBLUP models is 0.536, 0.531, and 0.555, respectively. In total, 22 suggestive- and 5% Bonferroni genome-wide significant-level SNPs for total egg number (EN), average laying rate (LR), average clutch length, and total clutch number are detected using 3 GWAS methods. These SNPs are mapped onto Gallus gallus chromosomes (GGA) 4, 6, 10, 18, and 25 in NCHU-G101 chicken. Furthermore, through SNP-GWAS and ssGWAS methods, we identify 2 genes on GGA4 associated with EN and LR: ENSGALG00000023172 and PPARGC1A. In conclusion, the ssGBLUP model demonstrates superior prediction accuracy, performing on average 3.41% better than the PBLUP model. The implications of our gene results may guide future selection strategies for Taiwan Country chickens. Our results highlight the applicability of the ssGBLUP model for egg production traits selection in a small population, specifically NCHU-G101 chicken in Taiwan.
Affiliation(s)
- Tsung-Che Tu
  - Department of Animal Science, National Chung Hsing University, Taichung 402, Taiwan; Ray Hsing Agricultural Biotechnology Co. Ltd., Yunlin 633, Taiwan
- Chen-Jyuan Lin
  - Department of Animal Science, National Chung Hsing University, Taichung 402, Taiwan
- Ming-Che Liu
  - Ray Hsing Agricultural Biotechnology Co. Ltd., Yunlin 633, Taiwan
- Zhi-Ting Hsu
  - Ray Hsing Agricultural Biotechnology Co. Ltd., Yunlin 633, Taiwan
- Chih-Feng Chen
  - Department of Animal Science, National Chung Hsing University, Taichung 402, Taiwan; The iEGG and Animal Biotechnology Center, National Chung Hsing University, Taichung 402, Taiwan.
2
González-Recio O, López-Catalina A, Peiró-Pastor R, Nieto-Valle A, Castro M, Fernández A. Evaluating the potential of (epi)genotype-by-low pass nanopore sequencing in dairy cattle: a study on direct genomic value and methylation analysis. J Anim Sci Biotechnol 2023; 14:98. [PMID: 37434255 PMCID: PMC10337168 DOI: 10.1186/s40104-023-00896-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/09/2023] [Accepted: 05/17/2023] [Indexed: 07/13/2023] Open
Abstract
BACKGROUND Genotype-by-sequencing has been proposed as an alternative to SNP genotyping arrays in genomic selection to obtain a high density of markers along the genome. It requires a low sequencing depth to be cost effective, which may increase the error at the genotype assignment. Third generation nanopore sequencing technology offers low cost sequencing and the possibility to detect genome methylation, which provides added value to genotype-by-sequencing. The aim of this study was to evaluate the performance of genotype-by-low pass nanopore sequencing for estimating the direct genomic value in dairy cattle, and the possibility to obtain methylation marks simultaneously. RESULTS Latest nanopore chemistry (LSK114 and Q20) achieved a modal base calling accuracy of 99.55%, whereas the previous kit (LSK109) achieved slightly lower accuracy (99.1%). The direct genomic value accuracy from genotype-by-low pass sequencing ranged between 0.79 and 0.99, depending on the trait (milk, fat or protein yield), with a sequencing depth as low as 2× and using the latest chemistry (LSK114). Lower sequencing depth led to biased estimates, yet with high rank correlations. The LSK109 and Q20 achieved lower accuracies (0.57-0.93). More than one million highly reliable methylated sites were obtained, even at low sequencing depth, located mainly in distal intergenic (87%) and promoter (5%) regions. CONCLUSIONS This study showed that the latest nanopore technology is useful in a LowPass sequencing framework to estimate direct genomic values with high reliability. It may provide advantages in populations with no available SNP chip, or when a large density of markers with a wide range of allele frequencies is needed. In addition, low pass sequencing provided nucleotide methylation status of >1 million nucleotides at ≥10×, which is an added value for epigenetic studies.
Affiliation(s)
- Oscar González-Recio
  - Dpt. Mejora Genética Animal, INIA-CSIC, Ctra La Coruña Km 7.5, 28040, Madrid, Spain.
- Ramón Peiró-Pastor
  - Dpt. Mejora Genética Animal, INIA-CSIC, Ctra La Coruña Km 7.5, 28040, Madrid, Spain
- Alicia Nieto-Valle
  - ETSIAAB, Universidad Politécnica de Madrid. Ciudad Universitaria S/N, 28040, Madrid, Spain
- Monica Castro
  - Dpt. Mejora Genética Animal, INIA-CSIC, Ctra La Coruña Km 7.5, 28040, Madrid, Spain
- Almudena Fernández
  - Dpt. Mejora Genética Animal, INIA-CSIC, Ctra La Coruña Km 7.5, 28040, Madrid, Spain
3
Tahir MS, Porto-Neto LR, Reverter-Gomez T, Olasege BS, Sajid MR, Wockner KB, Tan AWL, Fortes MRS. Utility of multi-omics data to inform genomic prediction of heifer fertility traits. J Anim Sci 2022; 100:skac340. [PMID: 36239447 PMCID: PMC9733504 DOI: 10.1093/jas/skac340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2022] [Accepted: 10/12/2022] [Indexed: 12/15/2022] Open
Abstract
Biologically informed single nucleotide polymorphisms (SNPs) impact genomic prediction accuracy of the target traits. Our previous genomics, proteomics, and transcriptomics work identified candidate genes related to puberty and fertility in Brahman heifers. We aimed to test this biological information for capturing heritability and predicting heifer fertility traits in another breed, i.e., Tropical Composite. The SNPs from the identified genes, including a 10-kilobase (kb) region on either side, were selected as the biologically informed SNP set. The SNPs from the rest of the Bos taurus genes, including a 10-kb region on either side, were selected as the biologically uninformed SNP set. The bovine high-density (HD) complete SNP set (628,323 SNPs) was used as a control. Two populations, Tropical Composites (N = 1331) and Brahman (N = 2310), had records for three traits: pregnancy after first mating season (PREG1, binary), first conception score (FCS, score 1 to 3), and rebreeding score (REB, score 1 to 3.5). Using the best linear unbiased prediction method, the effectiveness of each SNP set to predict the traits was tested in two scenarios: a 5-fold cross-validation within Tropical Composites using biological information from Brahman studies, and application of prediction equations from one breed to the other. The accuracy of prediction was calculated as the correlation between genomic estimated breeding values and adjusted phenotypes. Results show that the biologically informed SNP set did not estimate heritabilities significantly better than the control HD complete SNP set in Tropical Composites; however, it captured all the observed genetic variance in PREG1 and FCS when modeled together with the biologically uninformed SNP set. 
In 5-fold cross-validation within Tropical Composites, the biologically informed SNP set performed marginally better (statistically insignificant) in terms of prediction accuracies (PREG1: 0.20, FCS: 0.13, and REB: 0.12) as compared to the HD complete SNP set (PREG1: 0.17, FCS: 0.10, and REB: 0.11) and the biologically uninformed SNP set (PREG1: 0.16, FCS: 0.10, and REB: 0.11). Across-breed use of prediction equations remained a challenge: accuracies by all SNP sets dropped to around zero for all traits. The performance of the biologically informed SNP set was not significantly better than that of the other sets in Tropical Composites. However, results indicate that biological information obtained from Brahman was successful in predicting the fertility traits in the Tropical Composite population.
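The accuracy measure described in this abstract (correlation between genomic estimated breeding values and adjusted phenotypes under 5-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: ridge regression on marker genotypes stands in for the BLUP model, and the function name, the `ridge` shrinkage parameter, and the random fold assignment are hypothetical.

```python
import numpy as np


def cv_prediction_accuracy(genotypes, adj_pheno, k=5, ridge=1.0, seed=0):
    """Mean over k folds of corr(predicted breeding value, adjusted phenotype).

    Assumption: a ridge (SNP-BLUP-like) marker model replaces the BLUP
    software used in the study; `ridge` is an illustrative shrinkage value.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    adj_pheno = np.asarray(adj_pheno, dtype=float)
    n = len(adj_pheno)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)  # random, near-equal folds
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        x_train, y_train = genotypes[train_idx], adj_pheno[train_idx]
        # Centre with training-set statistics only, so no test data leaks in.
        col_means = x_train.mean(axis=0)
        y_mean = y_train.mean()
        xc = x_train - col_means
        # Ridge marker effects: (X'X + ridge * I)^-1 X'(y - mean)
        lhs = xc.T @ xc + ridge * np.eye(xc.shape[1])
        effects = np.linalg.solve(lhs, xc.T @ (y_train - y_mean))
        gebv = (genotypes[test_idx] - col_means) @ effects
        accuracies.append(np.corrcoef(gebv, adj_pheno[test_idx])[0, 1])
    return float(np.mean(accuracies))
```

With real data, `genotypes` would be an animals-by-SNPs matrix coded 0/1/2 and `adj_pheno` the phenotypes adjusted for fixed effects; comparing SNP sets then amounts to calling the function once per set with the same folds.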
Affiliation(s)
- Muhammad S Tahir
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
- Laercio R Porto-Neto
  - Commonwealth Scientific and Industrial Research Organization, St. Lucia, Brisbane 4072, QLD, Australia
- Toni Reverter-Gomez
  - Commonwealth Scientific and Industrial Research Organization, St. Lucia, Brisbane 4072, QLD, Australia
- Babatunde S Olasege
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
- Mirza R Sajid
  - Department of Statistics, University of Gujrat, 50700 Punjab, Pakistan
- Kimberley B Wockner
  - Queensland Department of Agriculture and Fisheries, Brisbane 4072, QLD, Australia
- Andre W L Tan
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
- Marina R S Fortes
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
4
Lee D, Kim Y, Chung Y, Lee D, Seo D, Choi TJ, Lim D, Yoon D, Lee SH. Accuracy of genotype imputation based on reference population size and marker density in Hanwoo cattle. J Anim Sci Technol 2021; 63:1232-1246. [PMID: 34957440 PMCID: PMC8672260 DOI: 10.5187/jast.2021.e117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021] [Indexed: 11/20/2022]
Abstract
Recently, the cattle genome sequence has been completed, followed by the development of a commercial single nucleotide polymorphism (SNP) chip panel in the animal genome industry. To increase the statistical power for detecting quantitative trait loci (QTL), a large number of animals should be genotyped. However, using a high-density chip for many animals increases the genotyping cost. Therefore, statistical inference by genotype imputation (low-density chip to high-density) will be useful in the animal industry. The purpose of this study is to investigate the effect of the reference population size and marker density on the imputation accuracy and to suggest the appropriate number of reference population sets for imputation in Hanwoo cattle. A total of 3,821 Hanwoo cattle were divided into reference and validation populations. The reference sets consisted of 50k (38,916) marker data and different population sizes (500, 1,000, 1,500, 2,000, and 3,600). The validation data consisted of four validation sets (889 in total) with different marker densities (5k [5,000], 10k [10,000], and 15k [15,000]). The accuracy of imputation was calculated by direct comparison of the true genotype and the imputed genotype. In conclusion, when the lowest marker density (5k) was used in the validation set, the imputation accuracy ranged from 0.793 to 0.929, depending on the reference population size. When the highest marker density (15k) was used, the imputation accuracy ranged from 0.904 to 0.967, depending on the reference population size. Moreover, the reference population size should be more than 1,000 to obtain at least 88% imputation accuracy in Hanwoo cattle.
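The "direct comparison of the true genotype and the imputed genotype" mentioned in this abstract can be read as a genotype concordance rate. The sketch below assumes that reading; the function name, the 0/1/2 genotype coding, and the boolean `masked` array marking which calls were hidden before imputation are illustrative assumptions, not details from the paper.

```python
import numpy as np


def imputation_concordance(true_geno, imputed_geno, masked):
    """Share of masked genotype calls (coded 0/1/2) that were imputed correctly.

    Assumption: accuracy is scored only on calls that were hidden from the
    imputation software, identified by the boolean array `masked`.
    """
    true_geno = np.asarray(true_geno)
    imputed_geno = np.asarray(imputed_geno)
    masked = np.asarray(masked, dtype=bool)
    if not masked.any():
        raise ValueError("no masked genotypes to evaluate")
    return float(np.mean(true_geno[masked] == imputed_geno[masked]))


# Toy example: 2 animals x 4 SNPs, with the two middle SNPs masked.
true_calls = [[0, 1, 2, 1], [2, 2, 0, 1]]
imputed_calls = [[0, 1, 1, 1], [2, 0, 0, 1]]
mask = [[False, True, True, False], [False, True, True, False]]
print(imputation_concordance(true_calls, imputed_calls, mask))  # 0.5
```

Scoring only the masked calls avoids inflating the rate with markers that were observed on the low-density panel and therefore never imputed.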
Affiliation(s)
- DooHo Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Yeongkuk Kim
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Yoonji Chung
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Dongjae Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Dongwon Seo
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Tae Jeong Choi
  - National Institute of Animal Science, Cheonan 31000, Korea
- Dajeong Lim
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Wanju 55365, Korea
- Duhak Yoon
  - Department of Animal Science & Biotechnology, Kyungpook National University, Sangju 37224, Korea
- Seung Hwan Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
5
Ferreira CER, Campos GS, Schmidt PI, Sollero BP, Goularte KL, Corcini CD, Gasperin BG, Lucia T, Boligon AA, Cardoso FF. Genome-wide association and genomic prediction for scrotal circumference in Hereford and Braford bulls. Theriogenology 2021; 172:268-280. [PMID: 34303226 DOI: 10.1016/j.theriogenology.2021.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Revised: 07/12/2021] [Accepted: 07/14/2021] [Indexed: 11/19/2022]
Abstract
Scrotal circumference (SC) is widely used as a selection criterion for bulls in breeding programs, since it is easily assessed and correlated with several desirable reproductive traits. The objectives of this study were: to perform a genome-wide association study (GWAS) to identify genomic regions associated with SC adjusted for age (SCa) and for both age and weight (SCaw); to select Tag SNPs from GWAS to construct low-density panel for genomic prediction; and to compare the prediction accuracy of the SC through different methods for Braford and Hereford bulls from the same genetic breeding program. Data of SC from 18,172 bulls (30.4 ± 3.7 cm) and of genotypes from 131 sires and 3,545 animals were used. From GWAS, the top 1% of 1-Mb windows were observed on chromosome (BTA) 2, 20, 7, 8, 15, 3, 16, 27, 6 and 8 for SCa and on BTA 8, 15, 16, 21, 19, 2, 6, 5 and 10 for SCaw, representing 17.4% and 18.8% of the additive genetic variance of SCa and SCaw, respectively. The MeSH analysis was able to translate genomic information providing biological meanings of more specific gene functions related to the SCa and SCaw. The genomic enhancement methods, especially single step GBLUP, that combined phenotype and pedigree data with direct genomic values generated gains in accuracy in relation to pedigree BLUP, suggesting that genomic predictions should be applied to improve genetic gain and to narrow the generation interval compared to traditional methods. The proposed Tag-SNP panels may be useful for lower-cost commercial genomic prediction applications in the future, when the number of bulls in the reference population increases for SC in Hereford and Braford breeds.
Affiliation(s)
- Carlos E R Ferreira
  - ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil.
- Gabriel S Campos
  - Departamento de Zootecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Patricia I Schmidt
  - Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual de São Paulo, Jaboticabal, SP, Brazil
- Karina L Goularte
  - ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Carine D Corcini
  - ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Bernardo G Gasperin
  - ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Thomaz Lucia
  - ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Arione A Boligon
  - Departamento de Zootecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Fernando F Cardoso
  - Departamento de Zootecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas, Pelotas, RS, Brazil; Embrapa Pecuária Sul, Bagé, RS, Brazil
6
Joshi R, Skaarud A, Alvarez AT, Moen T, Ødegård J. Bayesian genomic models boost prediction accuracy for survival to Streptococcus agalactiae infection in Nile tilapia (Oreochromis niloticus). Genet Sel Evol 2021; 53:37. [PMID: 33882834 PMCID: PMC8058985 DOI: 10.1186/s12711-021-00629-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/14/2020] [Accepted: 04/06/2021] [Indexed: 11/10/2022] Open
Abstract
Background Streptococcosis is a major bacterial disease in Nile tilapia that is caused by Streptococcus agalactiae infection, and development of resistant strains of Nile tilapia represents a sustainable approach towards combating this disease. In this study, we performed a controlled disease trial on 120 full-sib families to (i) quantify and characterize the potential of genomic selection for survival to S. agalactiae infection in Nile tilapia, and (ii) identify the best genomic model and the optimal density of single nucleotide polymorphisms (SNPs) for this trait. Methods In total, 40 fish per family (15 fish intraperitoneally injected and 25 fish as cohabitants) were used in the challenge test. Mortalities were recorded every 3 h for 35 days. After quality control, genotypes (50,690 SNPs) and phenotypes (0 for dead and 1 for alive) for 2472 cohabitant fish were available. Genetic parameters were obtained using various genomic selection models (genomic best linear unbiased prediction (GBLUP), BayesB, BayesC, BayesR and BayesS) and a traditional pedigree-based model (PBLUP). The pedigree-based analysis used a deep 17-generation pedigree. Prediction accuracy and bias were evaluated using five replicates of tenfold cross-validation. The genomic models were further analyzed using 10 subsets of SNPs at different densities to explore the effect of pruning and SNP density on predictive accuracy. Results Moderate estimates of heritabilities ranging from 0.15 ± 0.03 to 0.26 ± 0.05 were obtained with the different models. Compared to a pedigree-based model, GBLUP (using all the SNPs) increased prediction accuracy by 15.4%. Furthermore, use of the most appropriate Bayesian genomic selection model and SNP density increased the prediction accuracy up to 71%. The 40 to 50 SNPs with non-zero effects were consistent for all BayesB, BayesC and BayesS models with respect to marker id and/or marker locations. 
Conclusions These results demonstrate the potential of genomic selection for survival to S. agalactiae infection in Nile tilapia. Compared to the PBLUP and GBLUP models, Bayesian genomic models were found to boost the prediction accuracy significantly. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-021-00629-y.
Affiliation(s)
- Rajesh Joshi
  - GenoMar Genetics AS, Tjuvholmen allé 11, 0252, Oslo, Norway.
- Anders Skaarud
  - GenoMar Genetics AS, Tjuvholmen allé 11, 0252, Oslo, Norway
- Thomas Moen
  - AquaGen AS, Sluppen, P.O. Box 1240, 7462, Trondheim, Norway
- Jørgen Ødegård
  - AquaGen AS, Sluppen, P.O. Box 1240, 7462, Trondheim, Norway
7
Aono AH, Costa EA, Rody HVS, Nagai JS, Pimenta RJG, Mancini MC, Dos Santos FRC, Pinto LR, Landell MGDA, de Souza AP, Kuroshu RM. Machine learning approaches reveal genomic regions associated with sugarcane brown rust resistance. Sci Rep 2020; 10:20057. [PMID: 33208862 PMCID: PMC7676261 DOI: 10.1038/s41598-020-77063-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/18/2020] [Accepted: 08/24/2020] [Indexed: 12/18/2022] Open
Abstract
Sugarcane is an economically important crop, but its genomic complexity has hindered advances in molecular approaches for genetic breeding. New cultivars are released based on the identification of interesting traits, and for sugarcane, brown rust resistance is a desirable characteristic due to the large economic impact of the disease. Although marker-assisted selection for rust resistance has been successful, the genes involved are still unknown, and the associated regions vary among cultivars, thus restricting methodological generalization. We used genotyping by sequencing of full-sib progeny to relate genomic regions with brown rust phenotypes. We established a pipeline to identify reliable SNPs in complex polyploid data, which were used for phenotypic prediction via machine learning. We identified 14,540 SNPs, which led to a mean prediction accuracy of 50% when using different models. We also tested feature selection algorithms to increase predictive accuracy, resulting in a reduced dataset with more explanatory power for rust phenotypes. As a result of this approach, we achieved an accuracy of up to 95% with a dataset of 131 SNPs related to brown rust QTL regions and auxiliary genes. Therefore, our novel strategy has the potential to assist studies of the genomic organization of brown rust resistance in sugarcane.
Affiliation(s)
- Alexandre Hild Aono
  - Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, SP, Brazil
- Estela Araujo Costa
  - Instituto de Ciência e Tecnologia (ICT), Universidade Federal de São Paulo (UNIFESP), São José dos Campos, SP, Brazil
- Hugo Vianna Silva Rody
  - Instituto de Ciência e Tecnologia (ICT), Universidade Federal de São Paulo (UNIFESP), São José dos Campos, SP, Brazil
- James Shiniti Nagai
  - Instituto de Ciência e Tecnologia (ICT), Universidade Federal de São Paulo (UNIFESP), São José dos Campos, SP, Brazil
- Ricardo José Gonzaga Pimenta
  - Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, SP, Brazil
- Melina Cristina Mancini
  - Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, SP, Brazil
- Luciana Rossini Pinto
  - Advanced Center of Sugarcane Agrobusiness Technological Research, Agronomic Institute of Campinas (IAC), Ribeirão Preto, SP, Brazil
- Anete Pereira de Souza
  - Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, SP, Brazil.
  - Department of Plant Biology, Institute of Biology (IB), University of Campinas (UNICAMP), Campinas, SP, Brazil.
- Reginaldo Massanobu Kuroshu
  - Instituto de Ciência e Tecnologia (ICT), Universidade Federal de São Paulo (UNIFESP), São José dos Campos, SP, Brazil.
8
Khanal P, Maltecca C, Schwab C, Fix J, Bergamaschi M, Tiezzi F. Modeling host-microbiome interactions for the prediction of meat quality and carcass composition traits in swine. Genet Sel Evol 2020; 52:41. [PMID: 32727371 PMCID: PMC7388461 DOI: 10.1186/s12711-020-00561-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/11/2019] [Accepted: 07/17/2020] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND The objectives of this study were to evaluate genomic and microbial predictions of phenotypes for meat quality and carcass traits in swine, and to evaluate the contribution of host-microbiome interactions to the prediction. Data were collected from Duroc-sired three-way crossbred individuals (n = 1123) that were genotyped with a 60 k SNP chip. Phenotypic information and fecal 16S rRNA microbial sequences at three stages of growth (Wean, Mid-test, and Off-test) were available for all these individuals. We used fourfold cross-validation with animals grouped based on sire relatedness. Five models with three sets of predictors (full, informatively reduced, and randomly reduced) were evaluated. 'Full' included information from all genetic markers and all operational taxonomic units (OTU), while 'informatively reduced' and 'randomly reduced' represented a reduced number of markers and OTU based on significance preselection and random sampling, respectively. The baseline model included the fixed effects of dam line, sex and contemporary group and the random effect of pen. The other four models were constructed by including only genomic information, only microbiome information, both genomic and microbiome information, and microbiome and genomic information and their interaction. RESULTS Inclusion of microbiome information increased predictive ability of phenotype for most traits, in particular when microbiome information collected at a later growth stage was used. Inclusion of microbiome information resulted in higher accuracies and lower mean squared errors for fat-related traits (fat depth, belly weight, intramuscular fat and subjective marbling), objective color measures (Minolta a*, Minolta b* and Minolta L*) and carcass daily gain. Informative selection of markers increased predictive ability but decreasing the number of informatively reduced OTU did not improve model performance. 
The proportion of variation explained by the host-genome-by-microbiome interaction was highest for fat depth (~ 20% at Mid-test and Off-test) and shearing force (~ 20% consistently at Wean, Mid-test and Off-test), although the inclusion of the interaction term did not increase the accuracy of predictions significantly. CONCLUSIONS This study provides novel insight on the use of microbiome information for the phenotypic prediction of meat quality and carcass traits in swine. Inclusion of microbiome information in the model improved predictive ability of phenotypes for fat deposition and color traits whereas including a genome-by-microbiome term did not improve prediction accuracy significantly.
Affiliation(s)
- Piush Khanal
  - Department of Animal Science, North Carolina State University, Raleigh, NC 27695 USA
- Christian Maltecca
  - Department of Animal Science, North Carolina State University, Raleigh, NC 27695 USA
- Justin Fix
  - The Maschhoffs LLC, Carlyle, IL 62231 USA
- Matteo Bergamaschi
  - Department of Animal Science, North Carolina State University, Raleigh, NC 27695 USA
- Francesco Tiezzi
  - Department of Animal Science, North Carolina State University, Raleigh, NC 27695 USA
9
Interest of using imputation for genomic evaluation in layer chicken. Poult Sci 2020; 99:2324-2336. [PMID: 32359567 PMCID: PMC7597443 DOI: 10.1016/j.psj.2020.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2019] [Revised: 12/27/2019] [Accepted: 01/01/2020] [Indexed: 11/21/2022] Open
Abstract
With the availability of the 600K Affymetrix Axiom high-density (HD) single nucleotide polymorphism (SNP) chip, genomic selection has been implemented in broiler and layer chicken. However, the cost of this SNP chip is too high to genotype all selection candidates. A solution is to develop a low-density SNP chip, at a lower price, and to impute all missing markers. But to routinely implement this solution, the impact of imputation on genomic evaluation accuracy must be studied. It is also interesting to study the consequences of the use of low-density SNP chips in genomic evaluation accuracy. In this perspective, the interest of using imputation in genomic selection was studied in a pure layer line. Two low-density SNP chip designs were compared: an equidistant methodology and a methodology based on linkage disequilibrium. Egg weight, egg shell color, egg shell strength, and albumen height were evaluated with single-step genomic best linear unbiased prediction methodology. The impact of imputation errors or the absence of imputation on the ranking of the male selection candidates was assessed with a genomic evaluation based on ancestry. Thus, genomic estimated breeding values (GEBV) obtained with imputed HD genotypes or low-density genotypes were compared with GEBV obtained with the HD SNP chip. The relative accuracy of GEBV was also investigated by considering as reference GEBV estimated on the offspring. A limited reordering of the breeders, selected on a multitrait index, was observed. Spearman correlations between GEBV on HD genotypes and GEBV on low-density genotypes (with or without imputation) were always higher than 0.94 with more than 3K SNP. For the genetically closer, top 150 individuals for a specific trait, with imputation, the reordering was reduced with correlation higher than 0.94 with more than 3K SNP. 
Without imputation, the correlations remained lower than 0.85 with less than 3K and 16K SNP for equidistant and linkage disequilibrium methodology, respectively. The differences in GEBV correlations between both methodologies were never significant. The conclusions were the same for all studied traits.
10
Comparison of the Efficiency of BLUP and GBLUP in Genomic Prediction of Immune Traits in Chickens. Animals (Basel) 2020; 10:ani10030419. [PMID: 32138151 PMCID: PMC7142406 DOI: 10.3390/ani10030419] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/13/2020] [Revised: 03/01/2020] [Accepted: 03/01/2020] [Indexed: 11/17/2022] Open
Abstract
Poultry diseases pose a large threat to poultry production. Selection to improve immune traits is a feasible way to prevent and control avian diseases. The objective of this study was to investigate the efficiency of estimation of genetic parameters for antibody response to avian influenza virus (Ab-AIV), antibody response to Newcastle disease virus (Ab-NDV), sheep red blood cell antibody titer (SRBC), the ratio of heterophils to lymphocytes (H/L), immunoglobulin G (IgG), the spleen immune index (SII), thymus immune index (TII), thymus weight at 100 d (TW) and the spleen weight at 100 d (SW) in Beijing oil chickens, by using the best linear unbiased prediction (BLUP) method and genomic best linear unbiased prediction (GBLUP) method. The phenotypic data used in the two methods were the same and were from 519 individuals. With the BLUP model, Ab-AIV, Ab-NDV, SRBC, H/L, IgG, TII, and TW had low heritability ranging from 0.000 to 0.281, whereas SII and SW had high heritability of 0.631 and 0.573. With the GBLUP model, all individuals were genotyped with Illumina 60K SNP chips, and Ab-AIV, Ab-NDV, SRBC, H/L and IgG had low heritability ranging from 0.000 to 0.266, whereas SII, TII, TW and SW had moderate heritability ranging from 0.300 to 0.472. We compared the prediction accuracy obtained from BLUP and GBLUP through 50 times 5-fold cross-validation (CV), and the results indicated that BLUP provided a slightly higher accuracy of prediction than GBLUP in this population.
|
11
|
Wu XL, Li H, Ferretti R, Simpson B, Walker J, Parham J, Mastro L, Qiu J, Schultz T, Tait RG, Bauck S. A unified local objective function for optimally selecting SNPs on arrays for agricultural genomics applications. Anim Genet 2020; 51:306-310. [PMID: 32004392 DOI: 10.1111/age.12916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/09/2020] [Indexed: 11/28/2022]
Abstract
Over the years, ad-hoc procedures were used for designing SNP arrays, but the procedures and strategies varied considerably case by case. Recently, a multiple-objective, local optimization (MOLO) algorithm was proposed to select SNPs for SNP arrays, which maximizes the adjusted SNP information (E score) under multiple constraints, e.g. on MAF, uniformness of SNP locations (U score), the inclusion of obligatory SNPs and the number and size of gaps. In the MOLO, each chromosome is split into equally spaced segments and local optima are selected as the SNPs having the highest adjusted E score within each segment, conditional on the presence of obligatory SNPs. The computation of the adjusted E score, however, is empirical, and it does not scale well between the uniformness of SNP locations and SNP informativeness. In addition, the MOLO objective function does not accommodate the selection of uniformly distributed SNPs. In the present study, we proposed a unified local function for optimally selecting SNPs, as an amendment to the MOLO algorithm. This new local function takes scalable weights between the uniformness and informativeness of SNPs, which allows the selection of SNPs under varied scenarios. The results showed that the weighting between the U and the E scores led to a higher imputation concordance rate than the U score or E score alone. The results from the evaluation of six commercial bovine SNP chips further confirmed this conclusion.
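The idea of picking one locally optimal SNP per equally spaced segment, with a tunable weight between uniformness and informativeness, can be sketched as below. The abstract does not give the actual U and E score formulas, so both scores here are hypothetical stand-ins (distance to the segment centre and scaled heterozygosity), as are the function name and the weight `w`.

```python
import numpy as np

def select_snps(pos, maf, chrom_len, n_segments, w=0.5):
    """Pick one SNP per equally spaced segment by maximising w*U + (1-w)*E.
    U and E are illustrative stand-ins, not the scores defined in the paper."""
    pos = np.asarray(pos, dtype=float)
    maf = np.asarray(maf, dtype=float)
    seg_len = chrom_len / n_segments
    seg = np.minimum((pos // seg_len).astype(int), n_segments - 1)
    centre = (seg + 0.5) * seg_len
    u = 1.0 - np.abs(pos - centre) / (seg_len / 2)  # 1 at segment centre, 0 at its edge
    e = 4.0 * maf * (1.0 - maf)                     # 1 when MAF = 0.5
    score = w * u + (1.0 - w) * e
    chosen = []
    for s in range(n_segments):
        idx = np.flatnonzero(seg == s)
        if idx.size:                                # skip segments with no SNP
            chosen.append(int(idx[np.argmax(score[idx])]))
    return chosen

# Hypothetical chromosome of length 100 split into two segments.
pos = [10, 25, 40, 60, 75, 90]
maf = [0.5, 0.1, 0.4, 0.05, 0.3, 0.5]
by_position = select_snps(pos, maf, chrom_len=100, n_segments=2, w=1.0)
by_information = select_snps(pos, maf, chrom_len=100, n_segments=2, w=0.0)
balanced = select_snps(pos, maf, chrom_len=100, n_segments=2, w=0.5)
print(by_position, by_information, balanced)
```

Setting `w` to 1 or 0 recovers selection on uniformness or informativeness alone, which are the two extremes the abstract compares against the weighted combination.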
Affiliation(s)
- X-L Wu: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA; Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA
- H Li: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA; Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA
- R Ferretti: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- B Simpson: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- J Walker: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- J Parham: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- L Mastro: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- J Qiu: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- T Schultz: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- R G Tait: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- S Bauck: Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA
|
12
|
Hou L, Liang W, Xu G, Huang B, Zhang X, Hu CY, Wang C. Accuracy of genomic prediction using mixed low-density marker panels. ANIMAL PRODUCTION SCIENCE 2020. [DOI: 10.1071/an18503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
Abstract
A low-density single-nucleotide polymorphism (LD-SNP) panel is one effective way to reduce the cost of genomic selection in animal breeding. The present study proposes a new type of LD-SNP panel called the mixed low-density (MLD) panel, which simultaneously considers SNPs with a substantial effect, estimated by Bayes method B (BayesB) across many traits, and an evenly spaced distribution. Simulated and real data were used to compare the imputation accuracy and genomic-selection accuracy of the two types of LD-SNP panels. The genotype imputation results for simulated data showed that the number of quantitative trait loci (QTL) had a limited influence on imputation accuracy, and only for MLD panels; the evenly spaced (ELD) panel was not affected by the number of QTL. For real data, ELD performed slightly better than MLD when the panel contained 500 or 1000 SNPs. However, this advantage vanished quickly as the density increased. The genomic selection results for simulated data using BayesB showed that MLD performed much better than ELD when the number of QTL was 100. For real data, MLD also outperformed ELD for growth and carcass traits when using BayesB. In conclusion, the MLD strategy is superior to ELD in genomic selection under most situations.
|
13
|
Takeda M, Uemoto Y, Inoue K, Ogino A, Nozaki T, Kurogi K, Yasumori T, Satoh M. Genome-wide association study and genomic evaluation of feed efficiency traits in Japanese Black cattle using single-step genomic best linear unbiased prediction method. Anim Sci J 2019; 91:e13316. [PMID: 31769129 DOI: 10.1111/asj.13316] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/30/2019] [Revised: 09/30/2019] [Accepted: 10/23/2019] [Indexed: 01/18/2023]
Abstract
The objectives of this study were to better understand the genetic architecture and the possibility of genomic evaluation for feed efficiency traits by (i) performing genome-wide association studies (GWAS), and (ii) assessing the accuracy of genomic evaluation for feed efficiency traits, using single-step genomic best linear unbiased prediction (ssGBLUP)-based methods. The analyses were performed for residual feed intake (RFI), residual body weight gain (RG), and residual intake and body weight gain (RIG) during three different fattening periods. The phenotypes from 4,578 Japanese Black steers, which were progeny of 362 progeny-tested bulls, and the genotypes from the bulls were used in this study. The results of GWAS showed that a total of 16, 8, and 12 gene ontology terms were related to RFI, RG, and RIG, respectively, and the candidate genes identified in RFI and RG were involved in olfactory transduction and the phosphatidylinositol signaling system, respectively. The realized reliabilities of genomic estimated breeding values were low to moderate in the feed efficiency traits. In conclusion, the ssGBLUP-based method can help in understanding some biological functions related to feed efficiency traits, even with a small genotyped population; however, an alternative strategy will be needed to enhance the reliability of genomic evaluation.
Affiliation(s)
- Masayuki Takeda: National Livestock Breeding Center, Fukushima, Japan; Graduate School of Agricultural Science, Tohoku University, Miyagi, Japan
- Yoshinobu Uemoto: Graduate School of Agricultural Science, Tohoku University, Miyagi, Japan
- Keiichi Inoue: National Livestock Breeding Center, Fukushima, Japan
- Atushi Ogino: Maebashi Institute of Animal Science, Livestock Improvement Association of Japan, Inc, Gunma, Japan
- Takayoshi Nozaki: Cattle Breeding Department, Livestock Improvement Association of Japan, Inc, Tokyo, Japan
- Kazuhito Kurogi: Maebashi Institute of Animal Science, Livestock Improvement Association of Japan, Inc, Gunma, Japan
- Takanori Yasumori: Cattle Breeding Department, Livestock Improvement Association of Japan, Inc, Tokyo, Japan
- Masahiro Satoh: Graduate School of Agricultural Science, Tohoku University, Miyagi, Japan
|
14
|
Wongpom B, Koonawootrittriron S, Elzo MA, Suwanasopee T, Jattawa D. Accuracy of genomic-polygenic estimated breeding value for milk yield and fat yield in the Thai multibreed dairy population with five single nucleotide polymorphism sets. ASIAN-AUSTRALASIAN JOURNAL OF ANIMAL SCIENCES 2019; 32:1340-1348. [PMID: 31010996 PMCID: PMC6722314 DOI: 10.5713/ajas.18.0816] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/28/2018] [Accepted: 02/12/2019] [Indexed: 12/18/2022]
Abstract
Objective: The objectives were to compare variance components, genetic parameters, prediction accuracies, and genomic-polygenic EBV rankings for milk yield (MY) and fat yield (FY) in the Thai multibreed dairy population computed using five SNP sets from the GeneSeek GGP80K chip. Methods: The dataset contained monthly MY and FY of 8,361 first-lactation cows from 810 farms. Variance components, genetic parameters, and EBV for five SNP sets from the GeneSeek GGP80K chip were obtained using a 2-trait single-step average-information REML procedure. The SNP sets were the complete SNP set (all available SNP; SNP100), top 75% set (SNP75), top 50% set (SNP50), top 25% set (SNP25) and top 5% set (SNP5). The 2-trait models included herd-year-season, heterozygosity and age at first calving as fixed effects, and animal additive genetic and residual as random effects. Results: The estimates of additive genetic variances for MY and FY from SNP subsets were mostly higher than those of the complete set. The SNP25 MY and FY heritability estimates (0.276 and 0.183) were higher than those from SNP75 (0.265 and 0.168), SNP50 (0.275 and 0.179), SNP5 (0.231 and 0.169) and SNP100 (0.251 and 0.159). The SNP25 EBV accuracies for MY and FY (39.76% and 33.82%) were higher than for SNP75 (35.01% and 32.60%), SNP50 (39.64% and 33.38%), SNP5 (38.61% and 29.70%) and SNP100 (34.43% and 31.61%). All rank correlations between SNP100 and SNP subsets were above 0.98 for both traits, except for SNP100 and SNP5 (0.93 for MY; 0.92 for FY). Conclusion: The high SNP25 estimates of genetic variances, heritabilities, EBV accuracies, and rank correlations between SNP100 and SNP25 for MY and FY indicated that genotyping animals with a dedicated SNP25 chip would be a suitable alternative for keeping genotyping costs low while speeding up genetic progress for MY and FY in the Thai dairy population.
Affiliation(s)
- Bodin Wongpom: Department of Animal Science, Kasetsart University, Bangkok 10900, Thailand
- Mauricio A Elzo: Department of Animal Sciences, University of Florida, Gainesville, FL 32611-0910, USA
- Danai Jattawa: Department of Animal Science, Kasetsart University, Bangkok 10900, Thailand
|
15
|
Budhlakoti N, Mishra DC, Rai A, Lal SB, Chaturvedi KK, Kumar RR. A Comparative Study of Single-Trait and Multi-Trait Genomic Selection. J Comput Biol 2019; 26:1100-1112. [PMID: 30994361 DOI: 10.1089/cmb.2019.0032] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/27/2023] Open
Abstract
In recent years of animal and plant breeding research, genomic selection (GS) has become a method of choice for selecting appropriate breeding candidates because it contributes significantly to enhancing genetic gain. Various studies related to GS have been carried out in the recent past, mostly confined to a single trait. However, GS methods based on a single trait have not performed very well in cases such as pleiotropy, missing data, and low heritability of the trait under study. Gradually, some studies explored GS methods based on multiple traits with a view to overcoming these problems of single-trait GS (STGS). Currently, multi-trait-based GS methods are gaining importance because they exploit the correlation structure among the responses. In this study, we have compared various methods related to STGS, such as stepwise regression, ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian, best linear unbiased prediction, and support vector machine, and multi-trait-based GS methods, such as multivariate regression with covariance estimation, conditional Gaussian graphical models, mixed model, and LASSO. In almost all cases, multi-trait-based methods are found to be more accurate. Based on the results of this study, it may be concluded that multi-trait-based methods have great potential to increase genetic gain as they utilize the correlation among the response variables as extra information, which helps to estimate breeding values more precisely. This study is a comprehensive review of GS methods, from single trait to multiple traits, and of comparisons between these two classes.
Affiliation(s)
- Neeraj Budhlakoti: ICAR-Indian Agricultural Statistics Research Institute, Pusa, New Delhi, India
- Anil Rai: ICAR-Indian Agricultural Statistics Research Institute, Pusa, New Delhi, India
- S B Lal: ICAR-Indian Agricultural Statistics Research Institute, Pusa, New Delhi, India
- Rajeev Ranjan Kumar: ICAR-Indian Agricultural Statistics Research Institute, Pusa, New Delhi, India
|
16
|
Accurate prediction of maize grain yield using its contributing genes for gene-based breeding. Genomics 2019; 112:225-236. [PMID: 30826444 DOI: 10.1016/j.ygeno.2019.02.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/11/2018] [Revised: 01/20/2019] [Accepted: 02/01/2019] [Indexed: 01/12/2023]
Abstract
Accurately predicting the phenotypes of complex traits is crucial to enhanced breeding in plants and livestock, and to enhanced medicine in humans. Here we report the first study accurately predicting complex traits using their contributing genes, especially their number of favorable alleles (NFAs), genotypes and transcript expressions, with the grain yield of maize, Zea mays L. When the NFAs or genotypes of only 27 SNP/InDel-containing grain yield genes were used, a prediction accuracy of r = 0.52 or 0.49 was obtained. When the expressions of grain yield gene transcripts were used, a plateaued prediction accuracy of r = 0.84 was achieved. When the phenotypes predicted with two or three of the genic datasets were used for progeny selection, the selected lines were completely consistent with those selected by phenotypic selection. Therefore, the genes controlling complex traits enable accurate prediction of their phenotypes and are thus desirable for gene-based breeding in crop plants.
|
17
|
Rezende FM, Nani JP, Peñagaricano F. Genomic prediction of bull fertility in US Jersey dairy cattle. J Dairy Sci 2019; 102:3230-3240. [PMID: 30712930 DOI: 10.3168/jds.2018-15810] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/08/2018] [Accepted: 11/29/2018] [Indexed: 01/02/2023]
Abstract
Service sire has a major effect on reproductive success in dairy cattle. Recent studies have reported accurate predictions for Holstein bull fertility using genomic data. The objective of this study was to assess the feasibility of genomic prediction of sire conception rate (SCR) in US Jersey cattle using alternative predictive models. The data set consisted of 1.5k Jersey bulls with SCR records and 95k SNP covering the entire genome. The analyses included the use of linear and Gaussian kernel-based models fitting either all the SNP or subsets of markers with presumed functional roles, such as SNP significantly associated with SCR or SNP located within or close to annotated genes. Model predictive ability was evaluated using 5-fold cross-validation with 10 replicates. The entire SNP set exhibited predictive correlations around 0.30. Interestingly, either SNP marginally associated with SCR or genic SNP achieved higher predictive abilities than their counterparts using random sets of SNP. Among alternative SNP subsets, Gaussian kernel models fitting significant SNP achieved the best performance, with increases in predictive correlation of up to 7% compared with the standard whole-genome approach. Notably, the use of a multi-breed reference population including the entire US Holstein SCR data set (11.5k bulls) allowed us to achieve predictive correlations up to 0.315, gaining 8% in accuracy compared with the standard model fitting a pure Jersey reference set. Overall, our findings indicate that genomic prediction of Jersey bull fertility is feasible. The use of Gaussian kernels fitting markers with relevant roles and the inclusion of Holstein records in the training set seem to be promising alternatives to the standard whole-genome approach. These results have the potential to help the dairy industry improve US Jersey sire fertility through accurate genome-guided decisions.
Affiliation(s)
- Fernanda M Rezende: Department of Animal Sciences, University of Florida, Gainesville 32611; Faculdade de Medicina Veterinária, Universidade Federal de Uberlândia, Uberlândia MG 38410-337, Brazil
- Juan Pablo Nani: Department of Animal Sciences, University of Florida, Gainesville 32611; Estación Experimental Agropecuaria Rafaela, Instituto Nacional de Tecnología Agropecuaria, Rafaela SF 22-2300, Argentina
- Francisco Peñagaricano: Department of Animal Sciences, University of Florida, Gainesville 32611; University of Florida Genetics Institute, University of Florida, Gainesville 32610
|
18
|
Sun S, Miao Z, Ratcliffe B, Campbell P, Pasch B, El-Kassaby YA, Balasundaram B, Chen C. SNP variable selection by generalized graph domination. PLoS One 2019; 14:e0203242. [PMID: 30677030 PMCID: PMC6345469 DOI: 10.1371/journal.pone.0203242] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/15/2018] [Accepted: 01/08/2019] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND High-throughput sequencing technology has revolutionized both medical and biological research by generating exceedingly large numbers of genetic variants. The resulting datasets share a number of common characteristics that might lead to poor generalization capacity. Concerns include noise accumulated due to the large number of predictors, sparse information regarding the p≫n problem, and overfitting and model mis-identification resulting from spurious collinearity. Additionally, complex correlation patterns are present among variables. As a consequence, reliable variable selection techniques play a pivotal role in predictive analysis, generalization capability, and robustness in clustering, as well as interpretability of the derived models. METHODS AND FINDINGS K-dominating set, a parameterized graph-theoretic generalization model, was used to model SNP (single nucleotide polymorphism) data as a similarity network and to search for representative SNP variables. In particular, each SNP was represented as a vertex in the graph; (dis)similarity measures such as correlation coefficients or pairwise linkage disequilibrium were estimated to describe the relationship between each pair of SNPs; and a pair of vertices are adjacent, i.e. joined by an edge, if the pairwise similarity measure exceeds a user-specified threshold. A minimum k-dominating set in the SNP graph was then defined as the smallest subset such that every SNP that is excluded from the subset has at least k neighbors in the selected ones. The strength of k-dominating set selection in identifying independent variables, and in culling representative variables that are highly correlated with others, was demonstrated by a simulated dataset. The advantages of k-dominating set variable selection were also illustrated in two applications: pedigree reconstruction using SNP profiles of 1,372 Douglas-fir trees, and species delineation for 226 grasshopper mouse samples.
C++ source code that implements SNP-SELECT and uses the Gurobi optimization solver for the k-dominating set variable selection is available (https://github.com/transgenomicsosu/SNP-SELECT).
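The graph construction and the k-domination condition described above can be illustrated with a short sketch. Note the substitution: SNP-SELECT solves for a minimum k-dominating set with the Gurobi solver, whereas the code below uses a simple greedy heuristic that returns a valid but not necessarily minimum set. Function names, the r² threshold, and the toy genotypes are assumptions made for the example.

```python
import numpy as np

def snp_graph(genotypes, threshold=0.5):
    """Adjacency matrix: two SNPs are joined if their squared correlation
    exceeds the threshold. Assumes no monomorphic (constant) SNP columns."""
    r = np.corrcoef(genotypes, rowvar=False)
    adj = (r ** 2) > threshold
    np.fill_diagonal(adj, False)
    return adj

def greedy_k_dominating_set(adj, k=1):
    """Greedy heuristic (not the exact optimisation used by SNP-SELECT):
    repeatedly select the vertex that helps the most still-unsatisfied vertices."""
    n = adj.shape[0]
    selected = np.zeros(n, dtype=bool)
    need = np.full(n, k)                 # selected neighbours still required
    while True:
        unsatisfied = (~selected) & (need > 0)
        if not unsatisfied.any():
            break
        # Selecting v helps each unsatisfied neighbour, and v itself if unsatisfied.
        gain = adj[:, unsatisfied].sum(axis=1) + unsatisfied
        gain[selected] = -1
        v = int(np.argmax(gain))
        selected[v] = True
        need[adj[v]] -= 1
    return np.flatnonzero(selected)

# Hypothetical toy data: three independent SNPs, each duplicated (perfect LD pairs).
rng = np.random.default_rng(0)
base = rng.binomial(2, 0.5, size=(100, 3)).astype(float)
genotypes = np.repeat(base, 2, axis=1)

adj = snp_graph(genotypes, threshold=0.5)
chosen = greedy_k_dominating_set(adj, k=1)
print(chosen)
```

In this toy case the heuristic keeps one representative from each duplicated pair, which is the kind of redundancy culling the abstract describes.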
Affiliation(s)
- Shuzhen Sun: Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, United States of America; Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, B.C. Canada
- Zhuqi Miao: Center for Health Systems Innovation, Oklahoma State University, Stillwater, United States of America
- Blaise Ratcliffe: Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, B.C. Canada
- Polly Campbell: Department of Integrative Biology, Oklahoma State University, Stillwater, United States of America; Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, Riverside, United States of America
- Bret Pasch: Department of Biological Sciences, Northern Arizona University, Flagstaff, United States of America
- Yousry A. El-Kassaby: Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, B.C. Canada
- Balabhaskar Balasundaram: School of Industrial Engineering and Management, Oklahoma State University, Stillwater, United States of America
- Charles Chen (corresponding author): Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, United States of America
|
19
|
Bresolin T, Rosa GJDM, Valente BD, Espigolan R, Gordo DGM, Braz CU, Fernandes Júnior GA, Magalhães AFB, Garcia DA, Frezarim GB, Leão GFC, Carvalheiro R, Baldi F, Nunes de Oliveira H, Galvão de Albuquerque L. Effect of quality control, density and allele frequency of markers on the accuracy of genomic prediction for complex traits in Nellore cattle. ANIMAL PRODUCTION SCIENCE 2019. [DOI: 10.1071/an16821] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
This study was designed to test the impact of quality control, density and allele frequency of single nucleotide polymorphism (SNP) markers on the accuracy of genomic predictions, using three traits with different heritabilities and two methods of prediction in a Nellore cattle population genotyped with the Illumina Bovine HD Assay. A total of 1756, 3150 and 3119 records of age at first calving (AFC), weaning weight (WW) and yearling weight (YW), respectively, were used. Three scenarios with different exclusion thresholds for minor allele frequency (MAF), deviation from Hardy–Weinberg equilibrium (HWE) and correlation between SNP pairs (r²) were constructed for all traits: (1) high rigor (S1): call rate <0.98, MAF <0.05, HWE with P <10⁻⁵, and r² >0.999; (2) moderate rigor (S2): call rate <0.85 and MAF <0.01; (3) low rigor (S3): only non-autosomal SNP and those mapped on the same position were excluded. Additionally, to assess the prediction accuracy at different marker densities, six panels (10K, 50K, 100K, 300K, 500K and 700K) were customised using the high-density genotyping assay as reference. Finally, from the markers available in the high-density genotyping assay, six groups (G) with different MAF bins were defined to estimate the accuracy of genomic prediction. The range of MAF bins was approximately equal for the traits studied: G1 (0.000–0.009), G2 (0.010–0.064), G3 (0.065–0.174), G4 (0.175–0.325), G5 (0.326–0.500) and G6 (0.000–0.500). The Genomic Best Linear Unbiased Predictor and BayesCπ methods were used to estimate the SNP marker effects. Five-fold cross-validation was used to measure the accuracy of genomic prediction for all scenarios. There were no effects of genotype quality control criteria on the accuracies of genomic predictions. For all traits, the higher density panels did not provide greater prediction accuracies than the low density one (10K panel). The groups of SNP with low MAF (MAF ≤0.007 for AFC, MAF ≤0.009 for WW and MAF ≤0.008 for YW) provided lower prediction accuracies than the groups with higher allele frequencies.
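The call-rate and MAF exclusion thresholds and the MAF bins quoted in this abstract can be sketched as below. This is a partial illustration only: it applies the high-rigor (S1) call-rate and MAF cut-offs and the G1 to G5 bin edges as stated, and omits the HWE and pairwise r² filters. The function names and the toy genotype matrix are assumptions made for the example.

```python
import numpy as np

def call_rate_and_maf(geno):
    """geno: individuals x SNPs, coded 0/1/2 with np.nan for missing calls.
    Assumes every SNP has at least one called genotype."""
    called = ~np.isnan(geno)
    call_rate = called.mean(axis=0)
    alt_freq = np.nansum(geno, axis=0) / (2 * called.sum(axis=0))
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return call_rate, maf

def high_rigor_mask(geno, min_call_rate=0.98, min_maf=0.05):
    """True for SNPs kept under the S1 call-rate and MAF thresholds
    (the HWE and r^2 filters of S1 are not implemented in this sketch)."""
    call_rate, maf = call_rate_and_maf(geno)
    return (call_rate >= min_call_rate) & (maf >= min_maf)

def maf_group(maf):
    """Map MAF values to groups 1..5, i.e. G1 (0.000-0.009) to G5 (0.326-0.500)."""
    edges = np.array([0.010, 0.065, 0.175, 0.326])
    return np.digitize(maf, edges) + 1

# Hypothetical toy matrix: 4 animals x 3 SNPs, one missing call in the last SNP.
geno = np.array([[0, 1, 2],
                 [0, 1, np.nan],
                 [1, 1, 2],
                 [0, 2, 2]], dtype=float)

call_rate, maf = call_rate_and_maf(geno)
keep = high_rigor_mask(geno)
groups = maf_group(np.array([0.005, 0.03, 0.1, 0.2, 0.4]))
print(call_rate, maf, keep, groups)
```

The third toy SNP is dropped on both criteria: its call rate is 0.75 and it is fixed for one allele among the called genotypes.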
|
20
|
Genomic Prediction of Growth and Stem Quality Traits in Eucalyptus globulus Labill. at Its Southernmost Distribution Limit in Chile. FORESTS 2018. [DOI: 10.3390/f9120779] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
The present study was undertaken to examine the ability of different genomic selection (GS) models to predict growth traits (diameter at breast height, tree height and wood volume), stem straightness and branching quality of Eucalyptus globulus Labill. trees using a genome-wide Single Nucleotide Polymorphism (SNP) chip (60 K), in one of the southernmost progeny trials of the species, close to its southern distribution limit in Chile. The GS methods examined were Ridge Regression-BLUP (RRBLUP), Bayes-A, Bayes-B, Bayesian least absolute shrinkage and selection operator (BLASSO), principal component regression (PCR), supervised PCR and a variant of the RRBLUP method that involves the previous selection of predictor variables (RRBLUP-B). RRBLUP-B and supervised PCR models presented the greatest predictive ability (PA), followed by the PCR method, for most of the traits studied. The highest PA was obtained for the branching quality (~0.7). For the growth traits, the maximum values of PA varied from 0.43 to 0.54, while for stem straightness, the maximum value of PA reached 0.62 (supervised PCR). The study population presented a more extended linkage disequilibrium (LD) than other populations of E. globulus previously studied. The genome-wide LD decayed rapidly within 0.76 Mbp (threshold value of r² = 0.1). The average LD on all chromosomes was r² = 0.09. In addition, 0.15% of all pairs of linked SNPs were in complete LD (r² = 1), and 3% had an r² value >0.5. Genomic prediction based on dimensionality reduction and variable selection may be a promising method, considering the early growth of the trees and the low-to-moderate values of heritability found in the traits evaluated. These findings provide new understanding of how to develop novel breeding strategies for tree improvement of E. globulus at its southernmost range limit in Chile, which could represent new opportunities for forest planting that can benefit the local economy.
|
21
|
Juliana P, Singh RP, Poland J, Mondal S, Crossa J, Montesinos-López OA, Dreisigacker S, Pérez-Rodríguez P, Huerta-Espino J, Crespo-Herrera L, Govindan V. Prospects and Challenges of Applied Genomic Selection-A New Paradigm in Breeding for Grain Yield in Bread Wheat. THE PLANT GENOME 2018; 11:10.3835/plantgenome2018.03.0017. [PMID: 30512048 PMCID: PMC7822054 DOI: 10.3835/plantgenome2018.03.0017] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 05/17/2023]
Abstract
Genomic selection (GS) has been promising for increasing genetic gains in several species. Therefore, we evaluated the potential integration of GS for grain yield (GY) in bread wheat (Triticum aestivum L.) in CIMMYT's elite yield trial nurseries. We observed that the genomic prediction accuracies within nurseries (0.44 and 0.35) were substantially higher than across-nursery accuracies (0.15 and 0.05) for GY evaluated in the bed and flat planting systems, respectively. The accuracies from using only a subset of 251 genotyping-by-sequencing markers were comparable to the accuracies using all 2038 markers. We also used the item-based collaborative filtering approach for incorporating other related traits in predicting GY and observed that it outperformed genomic predictions across nurseries, but was less predictive when trait correlations with GY were low. Furthermore, we compared GS and phenotypic selections (PS) and observed that at a selection intensity of 0.5, GS could select a maximum of 70.9 and 61.5% of the top lines and discard 71.5 and 60.5% of the poor lines selected or discarded by PS within and across nurseries, respectively. Comparisons of GS and pedigree-based predictions revealed that the advantage of GS over the pedigree was moderate in populations without full-sibs. However, GS was less advantageous for within-family selections in elite families with few full-sibs and minimal Mendelian sampling variance. Overall, our results demonstrate the importance of applying GS for GY at the appropriate stage of the breeding cycle, and we speculate that gains can be maximized if it is implemented in early-generation within-family selections.
Affiliation(s)
- Philomin Juliana (corresponding author): CIMMYT, Apdo, Postal 6-641, 06600 Mexico, D.F., Mexico
- Ravi P. Singh (corresponding author): CIMMYT, Apdo, Postal 6-641, 06600 Mexico, D.F., Mexico
- Jesse Poland: Wheat Genetics Resource Center, Dep. of Plant Pathology, Kansas State Univ., Manhattan, KS 66506; J. Poland, Dep. of Agronomy, Kansas State Univ., Manhattan, KS 66506
- José Crossa: CIMMYT, Apdo, Postal 6-641, 06600 Mexico, D.F., Mexico
- Julio Huerta-Espino: Campo experimental Valle de México Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias, 56230, Chapingo, Edo. de México, México
- Velu Govindan: CIMMYT, Apdo, Postal 6-641, 06600 Mexico, D.F., Mexico
|
22
|
Hosseini-Vardanjani SM, Shariati MM, Moradi Shahrebabak H, Tahmoorespur M. Incorporating Prior Knowledge of Principal Components in Genomic Prediction. Front Genet 2018; 9:289. [PMID: 30116258 PMCID: PMC6082966 DOI: 10.3389/fgene.2018.00289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/15/2017] [Accepted: 07/11/2018] [Indexed: 12/05/2022] Open
Abstract
Genomic prediction using a large number of markers is challenging, due to the curse of dimensionality as well as multicollinearity arising from linkage disequilibrium between markers. Several methods have been proposed to solve these problems, such as Principal Component Analysis (PCA), which is commonly used to reduce the dimension of predictor variables by generating orthogonal variables. Usually, the knowledge from PCA is incorporated in genomic prediction by assuming equal variance for the PCs or a variance proportional to the eigenvalues, both of which treat variances as fixed. Here, three prior distributions including normal, scaled-t and double exponential were assumed for PC effects in a Bayesian framework with a subset of PCs. These developed PCR models (dPCRm) were compared to routine genomic prediction models (RGPM), i.e., ridge and Bayesian ridge regression, BayesA, BayesB, and PC regression with a subset of PCs but PC variances predefined as proportional to the eigenvalues (PCR-Eigen). The performance of methods was compared by simulating a single trait with heritability of 0.25 on a genome consisting of 3,000 SNPs on three chromosomes and QTL numbers of 15, 60, and 105. After 500 generations of random mating as the historical population, a population was isolated and mated for another 15 generations. Generations 8 and 9 of the recent population were used as the reference population and the next six generations as validation populations. The accuracy and bias of predictions were evaluated within the reference population and each of the validation populations. The accuracies of dPCRm were similar to RGPM (0.536 to 0.664 vs. 0.542 to 0.671), and higher than the accuracies of PCR-Eigen (0.504 to 0.641) within the reference population over different QTL numbers. Accuracies in validation populations declined from 0.633 to 0.310, 0.639 to 0.313, and 0.617 to 0.298 using dPCRm, RGPM and PCR-Eigen, respectively. Prediction biases of dPCRm and RGPM were similar and always much less than biases of PCR-Eigen. In conclusion, assuming PC variances as random variables via prior specification yielded higher accuracy than PCR-Eigen and the same accuracy as RGPM, while fewer predictors were used.
Affiliation(s)
- Mohammad M. Shariati
  - Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran
  - *Correspondence: Mohammad M. Shariati
- Hossein Moradi Shahrebabak
  - Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Tehran, Iran
|
23
|
Abstract
Genomic selection (GS) has become a tool for selecting candidates in plant and animal breeding programs. In the case of quantitative traits, it is common to assume that the distribution of the response variable can be approximated by a normal distribution. However, it is known that the selection process leads to skewed distributions. There is vast statistical literature on skewed distributions, but the skew normal distribution is of particular interest in this research. This distribution includes a third parameter that drives the skewness, so that it generalizes the normal distribution. We propose an extension of the Bayesian whole-genome regression to skew normal distribution data in the context of GS applications, where usually the number of predictors vastly exceeds the sample size. However, it can also be applied when the number of predictors is smaller than the sample size. We used a stochastic representation of a skew normal random variable, which allows the implementation of standard Markov Chain Monte Carlo (MCMC) techniques to efficiently fit the proposed model. The predictive ability and goodness of fit of the proposed model were evaluated using simulated and real data, and the results were compared to those obtained by the Bayesian Ridge Regression model. Results indicate that the proposed model has a better fit and is as good as the conventional Bayesian Ridge Regression model for prediction, based on the DIC criterion and cross-validation, respectively. A computing program coded in the R statistical package and C programming language to fit the proposed model is available as supplementary material.
|
24
|
Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins. Genetica 2017; 146:137-149. [PMID: 29243001 DOI: 10.1007/s10709-017-0004-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/18/2017] [Accepted: 12/08/2017] [Indexed: 10/18/2022]
Abstract
SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.
|
25
|
Abdollahi-Arpanahi R, Morota G, Peñagaricano F. Predicting bull fertility using genomic data and biological information. J Dairy Sci 2017; 100:9656-9666. [PMID: 28987577 DOI: 10.3168/jds.2017-13288] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 06/04/2017] [Accepted: 09/13/2017] [Indexed: 01/04/2023]
Abstract
The genomic prediction of unobserved genetic values or future phenotypes for complex traits has revolutionized agriculture and human medicine. Fertility traits are undoubtedly complex traits of great economic importance to the dairy industry. Although genomic prediction for improved cow fertility has received much attention, bull fertility largely has been ignored. The first aim of this study was to investigate the feasibility of genomic prediction of sire conception rate (SCR) in US Holstein dairy cattle. Standard genomic prediction often ignores any available information about functional features of the genome, although it is believed that such information can yield more accurate and more persistent predictions. Hence, the second objective was to incorporate prior biological information into predictive models and evaluate their performance. The analyses included the use of kernel-based models fitting either all single nucleotide polymorphisms (SNP; 55K) or only markers with presumed functional roles, such as SNP linked to Gene Ontology or Medical Subject Heading terms related to male fertility, or SNP significantly associated with SCR. Both single- and multikernel models were evaluated using linear and Gaussian kernels. Predictive ability was evaluated in 5-fold cross-validation. The entire set of SNP exhibited predictive correlations around 0.35. Neither Gene Ontology nor Medical Subject Heading gene sets achieved predictive abilities higher than their counterparts using random sets of SNP. Notably, kernel models fitting significant SNP achieved the best performance with increases in accuracy up to 5% compared with the standard whole-genome approach. Models fitting Gaussian kernels outperformed their counterparts fitting linear kernels irrespective of the set of SNP. Overall, our findings suggest that genomic prediction of bull fertility is feasible in dairy cattle. 
This provides potential for accurate genome-guided decisions, such as early culling of bull calves with low SCR predictions. In addition, exploiting nonlinear effects through the use of Gaussian kernels together with the incorporation of relevant markers seems to be a promising alternative to the standard approach. The inclusion of gene set results into prediction models deserves further research.
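The difference between the linear and Gaussian kernels compared in this abstract can be illustrated with a minimal Python sketch. The function names, the per-marker scaling, and the `bandwidth` parameter are assumptions made for illustration; the abstract does not state how the kernels were scaled or tuned.

```python
import math


def linear_kernel(x, z):
    """Linear kernel: average cross-product of two genotype vectors.

    Dividing by the number of markers is an illustrative scaling choice.
    """
    return sum(a * b for a, b in zip(x, z)) / len(x)


def gaussian_kernel(x, z, bandwidth):
    """Gaussian kernel: exp(-bandwidth * d^2), where d^2 is the mean
    squared genotype difference. `bandwidth` is a hypothetical tuning
    parameter controlling how fast similarity decays with distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z)) / len(x)
    return math.exp(-bandwidth * d2)


def kernel_matrix(genotypes, kernel):
    """Pairwise kernel matrix for a list of genotype vectors (0/1/2 codes)."""
    return [[kernel(gi, gj) for gj in genotypes] for gi in genotypes]


# Toy genotypes for three animals at three SNPs (made-up values).
toy_genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
K_linear = kernel_matrix(toy_genotypes, linear_kernel)
K_gauss = kernel_matrix(toy_genotypes, lambda x, z: gaussian_kernel(x, z, 0.5))
```

The linear kernel depends only on cross-products of allele counts, whereas the Gaussian kernel is a nonlinear function of genotype distance, which is how it can capture effects beyond additive ones.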
Affiliation(s)
- Rostam Abdollahi-Arpanahi
  - Department of Animal Sciences, University of Florida, Gainesville 32611; Department of Animal and Poultry Science, University of Tehran, Pakdasht, Iran 3391653755
- Gota Morota
  - Department of Animal Science, University of Nebraska, Lincoln 68583
- Francisco Peñagaricano
  - Department of Animal Sciences, University of Florida, Gainesville 32611; University of Florida Genetics Institute, Gainesville 32611.
|
26
|
Magnabosco CU, Lopes FB, Fragoso RC, Eifert EC, Valente BD, Rosa GJM, Sainz RD. Accuracy of genomic breeding values for meat tenderness in Polled Nellore cattle. J Anim Sci 2017; 94:2752-60. [PMID: 27482662 DOI: 10.2527/jas.2016-0279] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/31/2022] Open
Abstract
Zebu (Bos indicus) cattle, mostly of the Nellore breed, comprise more than 80% of the beef cattle in Brazil, given their tolerance of the tropical climate and high resistance to ectoparasites. Despite their advantages for production in tropical environments, zebu cattle tend to produce tougher meat than Bos taurus breeds. Traditional genetic selection to improve meat tenderness is constrained by the difficulty and cost of phenotypic evaluation for meat quality. Therefore, genomic selection may be the best strategy to improve meat quality traits. This study was performed to compare the accuracies of different Bayesian regression models in predicting molecular breeding values for meat tenderness in Polled Nellore cattle. The data set was composed of Warner-Bratzler shear force (WBSF) of longissimus muscle from 205, 141, and 81 animals slaughtered in 2005, 2010, and 2012, respectively, which were selected and mated so as to create extreme segregation for WBSF. The animals were genotyped with either the Illumina BovineHD (HD; 777,000 from 90 samples) chip or the GeneSeek Genomic Profiler (GGP Indicus HD; 77,000 from 337 samples). The quality controls of SNP were Hardy-Weinberg proportion P-value ≥ 0.1%, minor allele frequency > 1%, and call rate > 90%. The FImpute program was used for imputation from the GGP Indicus HD chip to the HD chip. The effect of each SNP was estimated using ridge regression, least absolute shrinkage and selection operator (LASSO), Bayes A, Bayes B, and Bayes Cπ methods. Different numbers of SNP were used, with 1, 2, 3, 4, 5, 7, 10, 20, 40, 60, 80, or 100% of the markers preselected based on their significance test (P-value from genomewide association studies [GWAS]) or randomly sampled. The prediction accuracy was assessed by the correlation between genomic breeding value and the observed WBSF phenotype, using a leave-one-out cross-validation methodology. 
The prediction accuracies using all markers were very similar for all models, ranging from 0.22 (Bayes Cπ) to 0.25 (Bayes B). When preselecting SNP based on GWAS results, the highest correlation (0.27) between WBSF and the genomic breeding value was achieved using the Bayesian LASSO model with 15,030 (3%) markers. Although this study used relatively few animals, the design of the segregating population ensured wide genetic variability for meat tenderness, which was important to achieve acceptable accuracy of genomic prediction. Although all models showed similar levels of prediction accuracy, some small advantages were observed with the Bayes B approach when higher numbers of markers were preselected based on their P-values resulting from a GWAS analysis.
|
27
|
Moghaddar N, Swan AA, van der Werf JHJ. Genomic prediction from observed and imputed high-density ovine genotypes. Genet Sel Evol 2017; 49:40. [PMID: 28427324 PMCID: PMC5399335 DOI: 10.1186/s12711-017-0315-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/26/2016] [Accepted: 04/04/2017] [Indexed: 01/10/2023] Open
Abstract
Background: Genomic prediction using high-density (HD) marker genotypes is expected to lead to higher prediction accuracy, particularly for more heterogeneous multi-breed and crossbred populations such as those in sheep and beef cattle, because HD genotypes provide stronger linkage disequilibrium between single nucleotide polymorphisms and quantitative trait loci controlling a trait. The objective of this study was to evaluate a possible improvement in genomic prediction accuracy of production traits in Australian sheep breeds based on HD genotypes (600k, both observed and imputed) compared to prediction based on 50k marker genotypes. In particular, we compared improvement in prediction accuracy of animals that are more distantly related to the reference population and across sheep breeds. Methods: Genomic best linear unbiased prediction (GBLUP) and a Bayesian approach (BayesR) were used as prediction methods using whole or subsets of a large multi-breed/crossbred sheep reference set. Empirical prediction accuracy was evaluated for purebred Merino, Border Leicester, Poll Dorset and White Suffolk sire breeds according to the Pearson correlation coefficient between genomic estimated breeding values and breeding values estimated based on a progeny test in a separate dataset. Results: Results showed a small absolute improvement (0.0 to 8.0% and on average 2.2% across all traits) in prediction accuracy of purebred animals from HD genotypes when prediction was based on the whole dataset. Greater improvement in prediction accuracy (1.0 to 12.0% and on average 5.2%) was observed for animals that had low genetic relatedness to the reference set, while it ranged from 0.0 to 5.0% for across-breed prediction. On average, no significant advantage was observed with BayesR compared to GBLUP.
Affiliation(s)
- Nasir Moghaddar
  - Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
  - School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- Andrew A Swan
  - Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
  - Animal Genetics and Breeding Unit (AGBU), University of New England, Armidale, NSW, 2351, Australia
- Julius H J van der Werf
  - Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
  - School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
|
28
|
Ogawa S, Matsuda H, Taniguchi Y, Watanabe T, Kitamura Y, Tabuchi I, Sugimoto Y, Iwaisaki H. Genomic prediction for carcass traits in Japanese Black cattle using single nucleotide polymorphism markers of different densities. ANIMAL PRODUCTION SCIENCE 2017. [DOI: 10.1071/an15696] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/16/2022]
Abstract
Genomic prediction (GP) of breeding values using single nucleotide polymorphism (SNP) markers can be conducted even when pedigree information is unavailable, providing phenotypes are known and marker data are provided. While use of high-density SNP markers is desirable for accurate GP, lower-density SNPs can perform well in some situations. In the present study, GP was performed for carcass weight and marbling score in Japanese Black cattle using SNP markers of varying densities. The 1791 fattened steers with phenotypic data and 189 having predicted breeding values provided by the official genetic evaluation using pedigree data were treated as the training and validation populations respectively. Genotype data on 565837 autosomal SNPs were available and SNPs were selected to provide different equally spaced SNP subsets of lower densities. Genomic estimated breeding values (GEBVs) were obtained using genomic best linear unbiased prediction incorporating one of two types of genomic relationship matrices (G matrices). The GP accuracy assessed as the correlation between the GEBVs and the corrected records divided by the square root of estimated heritability was around 0.85 for carcass weight and 0.60 for marbling score when using 565837 SNPs. The type of G matrix used gave no substantial difference in the results at a given SNP density for traits examined. Around 80% of the GP accuracy was retained when the SNP density was decreased to 1/1000 of that of all available SNPs. These results indicate that even when a SNP panel of a lower density is used, GP may be beneficial to the pre-selection for the carcass traits in Japanese Black young breeding animals.
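The accuracy measure described in this abstract, the correlation between GEBVs and corrected records divided by the square root of the estimated heritability, can be written as a short Python sketch. The function names and the toy numbers are assumptions for illustration only and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gp_accuracy(gebv, corrected_records, heritability):
    """Accuracy = r(GEBV, corrected record) / sqrt(h^2).

    Dividing by sqrt(h^2) rescales the correlation with a noisy phenotype
    toward the correlation with the underlying breeding value.
    """
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return pearson(gebv, corrected_records) / math.sqrt(heritability)


# Made-up values for five validation animals and an assumed h^2 of 0.5.
toy_gebv = [0.2, -0.1, 0.4, 0.0, -0.3]
toy_records = [1.1, -0.4, 0.9, 0.3, -0.8]
toy_accuracy = gp_accuracy(toy_gebv, toy_records, 0.5)
```

Because the denominator is an estimate, values slightly above 1 can occur in small samples; they reflect sampling error rather than a real accuracy above 1.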
|
29
|
Genetic Marker Discovery in Complex Traits: A Field Example on Fat Content and Composition in Pigs. Int J Mol Sci 2016; 17:ijms17122100. [PMID: 27983643 PMCID: PMC5187900 DOI: 10.3390/ijms17122100] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/03/2016] [Revised: 12/06/2016] [Accepted: 12/07/2016] [Indexed: 12/11/2022] Open
Abstract
Among the large number of attributes that define pork quality, fat content and composition have attracted the attention of breeders in recent years due to their interaction with human health and the technological and sensorial properties of meat. In livestock species, fat accumulates in different depots following a temporal pattern that is also recognized in humans. Intramuscular fat deposition rate and fatty acid composition change with life. Despite indications that it might be possible to select for intramuscular fat without affecting other fat depots, to date only one depot-specific genetic marker (PCK1 c.2456C>A) has been reported. In contrast, identification of polymorphisms related to fat composition has been more successful. For instance, our group has described a variant in the stearoyl-coA desaturase (SCD) gene that improves the desaturation index of fat without affecting overall fatness or growth. Identification of mutations in candidate genes can be a tedious and costly process. Genome-wide association studies can help in narrowing down the number of candidate genes by highlighting those which contribute most to the genetic variation of the trait. Results from our group and others indicate that fat content and composition are highly polygenic and that very few genes explain more than 5% of the variance of the trait. Moreover, as the complexity of the genome emerges, the role of non-coding genes and regulatory elements cannot be disregarded. Prediction of breeding values from genomic data is discussed in comparison with conventional best linear predictors of breeding values. An example based on real data is given, and the implications in phenotype prediction are discussed in detail. The benefits and limitations of using large SNP sets versus a few very informative markers as predictors of genetic merit of breeding candidates are evaluated using field data as an example.
|
30
|
Optimizing Training Population Data and Validation of Genomic Selection for Economic Traits in Soft Winter Wheat. G3-GENES GENOMES GENETICS 2016; 6:2919-28. [PMID: 27440921 PMCID: PMC5015948 DOI: 10.1534/g3.116.032532] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 11/24/2022]
Abstract
Genomic selection (GS) is a breeding tool that estimates breeding values (GEBVs) of individuals based solely on marker data by using a model built using phenotypic and marker data from a training population (TP). The effectiveness of GS increases as the correlation of GEBVs and phenotypes (accuracy) increases. Using phenotypic and genotypic data from a TP of 470 soft winter wheat lines, we assessed the accuracy of GS for grain yield, Fusarium Head Blight (FHB) resistance, softness equivalence (SE), and flour yield (FY). Four TP data sampling schemes were tested: (1) use all TP data, (2) use subsets of TP lines with low genotype-by-environment interaction, (3) use subsets of markers significantly associated with quantitative trait loci (QTL), and (4) a combination of 2 and 3. We also correlated the phenotypes of relatives of the TP to their GEBVs calculated from TP data. The GS accuracy within the TP using all TP data ranged from 0.35 (FHB) to 0.62 (FY). On average, the accuracy of GS from using subsets of data increased by 54% relative to using all TP data. Using subsets of markers selected for significant association with the target trait had the greatest impact on GS accuracy. Between-environment prediction accuracy was also increased by using data subsets. The accuracy of GS when predicting the phenotypes of TP relatives ranged from 0.00 to 0.85. These results suggest that GS could be useful for these traits and GS accuracy can be greatly improved by using subsets of TP data.
|
31
|
Wu XL, Xu J, Feng G, Wiggans GR, Taylor JF, He J, Qian C, Qiu J, Simpson B, Walker J, Bauck S. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications. PLoS One 2016; 11:e0161719. [PMID: 27583971 PMCID: PMC5008792 DOI: 10.1371/journal.pone.0161719] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 04/11/2016] [Accepted: 08/10/2016] [Indexed: 11/19/2022] Open
Abstract
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. 
The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNP selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal.
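To make the idea of locus-averaged Shannon entropy (LASE) concrete, the following Python sketch computes the entropy of each biallelic locus from its allele frequency and averages it over the selected SNPs. This is an assumption-laden illustration: the abstract does not give the exact formula, and the paper's adjustment for uniformity of the SNP distribution and its haplotype-based variant (HASE) are not reproduced here. The function names are hypothetical.

```python
import math


def locus_entropy(p):
    """Shannon entropy, in bits, of a biallelic locus with allele frequency p.

    Entropy is 0 for a monomorphic locus and peaks at 1 bit when p = 0.5.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def locus_averaged_entropy(allele_freqs):
    """Average per-locus entropy over a candidate set of SNPs."""
    if not allele_freqs:
        raise ValueError("at least one SNP is required")
    return sum(locus_entropy(p) for p in allele_freqs) / len(allele_freqs)


# Made-up allele frequencies for two candidate panels of four SNPs each.
panel_common = [0.5, 0.4, 0.45, 0.5]
panel_rare = [0.05, 0.02, 0.1, 0.01]
score_common = locus_averaged_entropy(panel_common)
score_rare = locus_averaged_entropy(panel_rare)
```

Under this simplified definition, a panel of SNPs with intermediate allele frequencies scores higher than a panel of rare variants, which matches the intuition that more informative SNPs support more accurate imputation.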
Affiliation(s)
- Xiao-Lin Wu
  - Bioinformatics and Biostatistics, GeneSeek (a Neogen Company), Lincoln, Nebraska, United States of America
- Jiaqi Xu
  - Bioinformatics and Biostatistics, GeneSeek (a Neogen Company), Lincoln, Nebraska, United States of America
  - Department of Statistics, University of Nebraska, Lincoln, Nebraska, United States of America
- Guofei Feng
  - Bioinformatics and Biostatistics, GeneSeek (a Neogen Company), Lincoln, Nebraska, United States of America
  - Department of Statistics, University of Nebraska, Lincoln, Nebraska, United States of America
- George R. Wiggans
  - Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, United States of America
- Jeremy F. Taylor
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
- Jun He
  - College of Animal Sciences and Technology, Hunan Agricultural University, Changsha, China
- Changsong Qian
  - Marketing and Business Development, Neogen Bio-Scientific Technology (Shanghai) Company Ltd., Shanghai, China
- Jiansheng Qiu
  - Bioinformatics and Biostatistics, GeneSeek (a Neogen Company), Lincoln, Nebraska, United States of America
- Barry Simpson
  - Bioinformatics and Biostatistics, GeneSeek (a Neogen Company), Lincoln, Nebraska, United States of America
- Jeremy Walker
  - Bioinformatics and Biostatistics, GeneSeek (a Neogen Company), Lincoln, Nebraska, United States of America
- Stewart Bauck
  - Bioinformatics and Biostatistics, GeneSeek (a Neogen Company), Lincoln, Nebraska, United States of America
|
32
|
Heidaritabar M, Wolc A, Arango J, Zeng J, Settar P, Fulton J, O'Sullivan N, Bastiaansen J, Fernando R, Garrick D, Dekkers J. Impact of fitting dominance and additive effects on accuracy of genomic prediction of breeding values in layers. J Anim Breed Genet 2016; 133:334-46. [DOI: 10.1111/jbg.12225] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 11/18/2015] [Accepted: 05/14/2016] [Indexed: 02/06/2023]
Affiliation(s)
- M. Heidaritabar
  - Department of Animal Science, Iowa State University, Ames, IA, USA
  - Animal Breeding and Genomics Center, Wageningen University, Wageningen, the Netherlands
- A. Wolc
  - Department of Animal Science, Iowa State University, Ames, IA, USA
  - Hy‐Line International, Dallas Center, IA, USA
- J. Arango
  - Hy‐Line International, Dallas Center, IA, USA
- J. Zeng
  - Department of Animal Science, Iowa State University, Ames, IA, USA
- P. Settar
  - Hy‐Line International, Dallas Center, IA, USA
- J.W.M. Bastiaansen
  - Animal Breeding and Genomics Center, Wageningen University, Wageningen, the Netherlands
- R.L. Fernando
  - Department of Animal Science, Iowa State University, Ames, IA, USA
- D.J. Garrick
  - Department of Animal Science, Iowa State University, Ames, IA, USA
- J.C.M. Dekkers
  - Department of Animal Science, Iowa State University, Ames, IA, USA
|
33
|
Heidaritabar M, Calus MPL, Megens HJ, Vereijken A, Groenen MAM, Bastiaansen JWM. Accuracy of genomic prediction using imputed whole-genome sequence data in white layers. J Anim Breed Genet 2016; 133:167-79. [PMID: 26776363 DOI: 10.1111/jbg.12199] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 07/16/2015] [Accepted: 11/26/2015] [Indexed: 01/17/2023]
Abstract
There is an increasing interest in using whole-genome sequence data in genomic selection breeding programmes. Prediction of breeding values is expected to be more accurate when whole-genome sequence is used, because the causal mutations are assumed to be in the data. We performed genomic prediction for the number of eggs in white layers using imputed whole-genome resequence data including ~4.6 million SNPs. The prediction accuracies based on sequence data were compared with the accuracies from the 60 K SNP panel. Predictions were based on genomic best linear unbiased prediction (GBLUP) as well as a Bayesian variable selection model (BayesC). Moreover, the prediction accuracy from using different types of variants (synonymous, non-synonymous and non-coding SNPs) was evaluated. Genomic prediction using the 60 K SNP panel resulted in a prediction accuracy of 0.74 when GBLUP was applied. With sequence data, there was a small increase (~1%) in prediction accuracy over the 60 K genotypes. With both 60 K SNP panel and sequence data, GBLUP slightly outperformed BayesC in predicting the breeding values. Selection of SNPs more likely to affect the phenotype (i.e. non-synonymous SNPs) did not improve the accuracy of genomic prediction. The fact that sequence data were based on imputation from a small number of sequenced animals may have limited the potential to improve the prediction accuracy. A small reference population (n = 1004) and possible exclusion of many causal SNPs during quality control can be other possible reasons for limited benefit of sequence data. We expect, however, that the limited improvement is because the 60 K SNP panel was already sufficiently dense to accurately determine the relationships between animals in our data.
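GBLUP relies on a genomic relationship matrix built from marker genotypes, which is why marker density matters only insofar as it changes the estimated relationships between animals. The Python sketch below builds such a matrix using VanRaden's first method, a commonly used construction; whether the study used this exact form is an assumption, as the abstract does not say. Function names and toy genotypes are hypothetical.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele at each SNP (genotypes coded 0/1/2)."""
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(row[j] for row in genotypes) / (2 * n_animals)
        for j in range(n_snps)
    ]


def genomic_relationship_matrix(genotypes):
    """G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z = genotypes - 2p.

    Allele frequencies are estimated from the genotyped animals themselves,
    which is a simplifying assumption of this sketch.
    """
    p = allele_frequencies(genotypes)
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    # Center each genotype by twice the allele frequency of its SNP.
    z = [[g - 2.0 * pj for g, pj in zip(row, p)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zj)) / scale for zj in z]
        for zi in z
    ]


# Toy genotypes for three animals at three SNPs (made-up values).
toy_genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
G = genomic_relationship_matrix(toy_genotypes)
```

If adding markers leaves the entries of this matrix nearly unchanged, predictions from GBLUP will change little as well, consistent with the authors' explanation for the small gain from sequence data.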
Affiliation(s)
- M Heidaritabar
  - Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
- M P L Calus
  - Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, the Netherlands
- H-J Megens
  - Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
- A Vereijken
  - Hendrix Genetics Research, Technology and Services B.V., Boxmeer, the Netherlands
- M A M Groenen
  - Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
- J W M Bastiaansen
  - Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
|
34
|
Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection. Animal 2016; 10:1077-85. [PMID: 27076192 DOI: 10.1017/s1751731115002906] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022] Open
Abstract
Availability of high-density single nucleotide polymorphism (SNP) genotyping platforms provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic breeding values (GEBVs) can be estimated, which are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density and low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially by their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing population is small. 
As the generational interval increases, the imputation accuracies decay, although not at an alarming rate. In the absence of updating of the reference population, accuracy of GEBVs decays substantially in one or two generations at the rate of 20% to 25% per generation. When the reference population is updated by 1% or 5% every generation, the decay in accuracy was 8% to 11% after seven generations using true and imputed genotypes. These results indicate that imputed genotypes provide a viable alternative, even after several generations, as long as the reference and training populations are appropriately updated to reflect the genetic change in the population.
|
35
|
Moghaddar N, Gore KP, Daetwyler HD, Hayes BJ, van der Werf JHJ. Accuracy of genotype imputation based on random and selected reference sets in purebred and crossbred sheep populations and its effect on accuracy of genomic prediction. Genet Sel Evol 2015; 47:97. [PMID: 26694131 PMCID: PMC4688977 DOI: 10.1186/s12711-015-0175-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 01/28/2015] [Accepted: 11/30/2015] [Indexed: 02/02/2023]
Abstract
Background The objectives of this study were to investigate the accuracy of genotype imputation from low (12k) to medium (50k Illumina-Ovine) SNP (single nucleotide polymorphism) densities in purebred and crossbred Merino sheep based on a random or selected reference set and to evaluate the impact of using imputed genotypes on accuracy of genomic prediction. Methods Imputation validation sets were composed of random purebred or crossbred Merinos, while imputation reference sets were of variable sizes and included random purebred or crossbred Merinos or a group of animals that were selected based on high genetic relatedness to animals in the validation set. The Beagle software program was used for imputation and accuracy of imputation was assessed based on the Pearson correlation coefficient between observed and imputed genotypes. Genomic evaluation was performed based on genomic best linear unbiased prediction and its accuracy was evaluated as the Pearson correlation coefficient between genomic estimated breeding values using either observed (12k/50k) or imputed genotypes with varying levels of imputation accuracy and accurate estimated breeding values based on progeny-tests. Results Imputation accuracy increased as the size of the reference set increased. However, accuracy was higher for purebred Merinos that were imputed from other purebred Merinos (on average 0.90 to 0.95 based on 1000 to 3000 animals) than from crossbred Merinos (0.78 to 0.87 based on 1000 to 3000 animals) or from non-Merino purebreds (on average 0.50). The imputation accuracy for crossbred Merinos based on 1000 to 3000 other crossbred Merino ranged from 0.86 to 0.88. Considerably higher imputation accuracy was observed when a selected reference set with a high genetic relationship to target animals was used vs. a random reference set of the same size (0.96 vs. 0.88, respectively). 
Accuracy of genomic prediction based on 50k genotypes imputed with high accuracy (0.88 to 0.99) decreased only slightly (0.0 to 0.67 % across traits) compared to using observed 50k genotypes. Accuracy of genomic prediction based on observed 12k genotypes was higher than accuracy based on lowly accurate (0.62 to 0.86) imputed 50k genotypes.
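The accuracy measure described in this abstract, the Pearson correlation between observed and imputed genotypes, can be illustrated with a minimal sketch. It assumes genotypes are coded as allele dosages (0, 1 or 2), which the abstract does not state; the function name `imputation_accuracy` and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt

def imputation_accuracy(observed, imputed):
    """Pearson correlation between observed and imputed genotype dosages.

    Assumes 0/1/2 dosage coding (an assumption, not stated in the abstract).
    """
    if len(observed) != len(imputed) or len(observed) < 2:
        raise ValueError("need two equal-length vectors with at least 2 genotypes")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        raise ValueError("correlation is undefined for a monomorphic vector")
    return cov / sqrt(var_o * var_i)

# Toy data: two of eight genotypes are imputed incorrectly.
observed = [0, 1, 2, 1, 0, 2, 1, 1]
imputed = [0, 1, 2, 1, 1, 2, 1, 0]
print(imputation_accuracy(observed, imputed))  # prints 0.75
```

In practice the correlation could be computed per animal or per SNP and then averaged; which of these the study used is not stated in the abstract.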
Affiliation(s)
- Nasir Moghaddar: Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia.
- Klint P Gore: Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; Animal Genetics & Breeding Unit (AGBU), University of New England, Armidale, NSW, 2351, Australia.
- Hans D Daetwyler: Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; Biosciences Research Division, Department of Economic Development, Jobs, Transport and Resources, Bundoora, VIC, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia.
- Ben J Hayes: Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; Biosciences Research Division, Department of Economic Development, Jobs, Transport and Resources, Bundoora, VIC, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia.
- Julius H J van der Werf: Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia.
|
36
|
Ogawa S, Matsuda H, Taniguchi Y, Watanabe T, Sugimoto Y, Iwaisaki H. Estimation of variance and genomic prediction using genotypes imputed from low-density marker subsets for carcass traits in Japanese black cattle. Anim Sci J 2015; 87:1106-13. [DOI: 10.1111/asj.12570] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/29/2015] [Revised: 09/15/2015] [Accepted: 10/07/2015] [Indexed: 12/31/2022]
Affiliation(s)
- Yukio Taniguchi: Graduate School of Agriculture, Kyoto University, Kyoto, Japan
- Yoshikazu Sugimoto: Shirakawa Institute of Animal Genetics, Japan Livestock Technology Association, Nishigo, Fukushima, Japan
|
37
|
The Causal Meaning of Genomic Predictors and How It Affects Construction and Comparison of Genome-Enabled Selection Models. Genetics 2015; 200:483-94. [PMID: 25908318 DOI: 10.1534/genetics.114.169490] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/12/2015] [Accepted: 04/19/2015] [Indexed: 02/05/2023]
Abstract
The term "effect" in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability.
|
38
|
Felipe VPS, Okut H, Gianola D, Silva MA, Rosa GJM. Effect of genotype imputation on genome-enabled prediction of complex traits: an empirical study with mice data. BMC Genet 2014; 15:149. [PMID: 25544265 PMCID: PMC4333171 DOI: 10.1186/s12863-014-0149-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/04/2014] [Accepted: 12/10/2014] [Indexed: 02/01/2023]
Abstract
Background Genotype imputation is an important tool for whole-genome prediction as it allows cost reduction of individual genotyping. However, benefits of genotype imputation have been evaluated mostly for linear additive genetic models. In this study we investigated the impact of employing imputed genotypes when using more elaborate models of phenotype prediction. Our hypothesis was that such models would be able to track genetic signals using the observed genotypes only, with no additional information to be gained from imputed genotypes. Results For the present study, an outbred mouse population containing 1,904 individuals and genotypes for 1,809 pre-selected markers was used. The effect of imputation was evaluated for a linear model (the Bayesian LASSO - BL) and for semi- and non-parametric models (Reproducing Kernel Hilbert spaces regressions – RKHS, and Bayesian Regularized Artificial Neural Networks – BRANN, respectively). The RKHS method had the best predictive accuracy. Genotype imputation had a similar impact on the effectiveness of BL and RKHS. BRANN predictions were, apparently, more sensitive to imputation errors. In scenarios where the masking rates were 75% and 50%, the genotype imputation was not beneficial. However, genotype imputation incorporated information about important markers and improved predictive ability, especially for body mass index (BMI), when genotype information was sparse (90% masking), and for body weight (BW) when the reference sample for imputation was weakly related to the target population. Conclusions In conclusion, genotype imputation is not always helpful for phenotype prediction, and so it should be considered on a case-by-case basis. In summary, factors that can affect the usefulness of genotype imputation for prediction of yet-to-be observed traits are: the imputation accuracy itself, the structure of the population, the genetic architecture of the target trait and also the model used for phenotype prediction.
Affiliation(s)
- Vivian P S Felipe: Department of Animal Sciences, University of Wisconsin, Madison, 53706, USA.
- Hayrettin Okut: Department of Animal Sciences, Biometry and Genetics Branch, University of Yuzuncu Yil, Van, 65080, Turkey.
- Daniel Gianola: Department of Animal Sciences, University of Wisconsin, Madison, 53706, USA.
- Martinho A Silva: Department of Animal Sciences, Federal University of Jequitinhonha and Mucuri Valleys, Minas Gerais, Brazil.
- Guilherme J M Rosa: Department of Animal Sciences, University of Wisconsin, Madison, 53706, USA.
|
39
|
Beaulieu J, Doerksen T, Clément S, MacKay J, Bousquet J. Accuracy of genomic selection models in a large population of open-pollinated families in white spruce. Heredity (Edinb) 2014; 113:343-52. [PMID: 24781808 DOI: 10.1038/hdy.2014.36] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 09/12/2013] [Revised: 03/16/2014] [Accepted: 03/21/2014] [Indexed: 12/16/2022]
Abstract
Genomic selection (GS) is of interest in breeding because of its potential for predicting the genetic value of individuals and increasing genetic gains per unit of time. To date, very few studies have reported empirical results of GS potential in the context of large population sizes and long breeding cycles such as for boreal trees. In this study, we assessed the effectiveness of marker-aided selection in an undomesticated white spruce (Picea glauca (Moench) Voss) population of large effective size using a GS approach. A discovery population of 1694 trees representative of 214 open-pollinated families from 43 natural populations was phenotyped for 12 wood and growth traits and genotyped for 6385 single-nucleotide polymorphisms (SNPs) mined in 2660 gene sequences. GS models were built to predict estimated breeding values using all the available SNPs or SNP subsets of the largest absolute effects, and they were validated using various cross-validation schemes. The accuracy of genomic estimated breeding values (GEBVs) varied from 0.327 to 0.435 when the training and the validation data sets shared half-sibs, which was on average 90% of the accuracies achieved through traditionally estimated breeding values. The trend was also the same for validation across sites. As expected, the accuracy of GEBVs obtained after cross-validation with individuals of unknown relatedness was lower, at about half of the accuracy achieved when half-sibs were present. We showed that with the marker densities used in the current study, predictions with low to moderate accuracy could be obtained within a large undomesticated population of related individuals, potentially resulting in larger gains per unit of time with GS than with the traditional approach.
Affiliation(s)
- J Beaulieu: [1] Natural Resources Canada, Canadian Wood Fibre Centre, Québec, Québec, Canada; [2] Canada Research Chair in Forest and Environmental Genomics and Institute for Systems and Integrative Biology, Université Laval, Québec, Québec, Canada
- T Doerksen: [1] Natural Resources Canada, Canadian Wood Fibre Centre, Québec, Québec, Canada; [2] Canada Research Chair in Forest and Environmental Genomics and Institute for Systems and Integrative Biology, Université Laval, Québec, Québec, Canada
- S Clément: Natural Resources Canada, Canadian Wood Fibre Centre, Québec, Québec, Canada
- J MacKay: Canada Research Chair in Forest and Environmental Genomics and Institute for Systems and Integrative Biology, Université Laval, Québec, Québec, Canada
- J Bousquet: Canada Research Chair in Forest and Environmental Genomics and Institute for Systems and Integrative Biology, Université Laval, Québec, Québec, Canada
|
40
|
Guo G, Zhao F, Wang Y, Zhang Y, Du L, Su G. Comparison of single-trait and multiple-trait genomic prediction models. BMC Genet 2014; 15:30. [PMID: 24593261 PMCID: PMC3975852 DOI: 10.1186/1471-2156-15-30] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 11/19/2013] [Accepted: 02/26/2014] [Indexed: 11/30/2022]
Abstract
Background In this study, a single-trait genomic model (STGM) is compared with a multiple-trait genomic model (MTGM) for genomic prediction using conventional estimated breeding values (EBVs) calculated using conventional single-trait and multiple-trait linear mixed models as the response variables. Three scenarios with and without missing data were simulated: no missing data, 90% missing data in a trait with high heritability, and 90% missing data in a trait with low heritability. The simulated genome had a length of 500 cM with 5000 equally spaced single nucleotide polymorphism markers and 300 randomly distributed quantitative trait loci (QTL). The true breeding values of each trait were determined using 200 of the QTLs, and the remaining 100 QTLs were assumed to affect both the high (trait I with heritability of 0.3) and the low (trait II with heritability of 0.05) heritability traits. The genetic correlation between traits I and II was 0.5, and the residual correlation was zero. Results The results showed that when there were no missing records, MTGM and STGM gave the same reliability for the genomic predictions for trait I while, for trait II, MTGM performed better than STGM. When there were missing records for one of the two traits, MTGM performed much better than STGM. In general, the difference in reliability of genomic EBVs predicted using the EBV response variables estimated from either the multiple-trait or single-trait models was relatively small for the trait without missing data. However, for the trait with missing data, the EBV response variable obtained from the multiple-trait model gave a more reliable genomic prediction than the EBV response variable from the single-trait model. Conclusions These results indicate that MTGM performed better than STGM for the trait with low heritability and for the trait with a limited number of records. Even when the EBV response variable was obtained using the multiple-trait model, the genomic prediction using MTGM was more reliable than the prediction using the STGM.
Affiliation(s)
- Lixin Du: National Center for Molecular Genetics and Breeding of Animal, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
|
42
|
Ogawa S, Matsuda H, Taniguchi Y, Watanabe T, Nishimura S, Sugimoto Y, Iwaisaki H. Effects of single nucleotide polymorphism marker density on degree of genetic variance explained and genomic evaluation for carcass traits in Japanese Black beef cattle. BMC Genet 2014; 15:15. [PMID: 24491120 PMCID: PMC3913948 DOI: 10.1186/1471-2156-15-15] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 06/29/2013] [Accepted: 01/31/2014] [Indexed: 01/25/2023]
Abstract
Background Japanese Black cattle are a beef breed whose meat is well known to excel in meat quality, especially in marbling, and whose effective population size is relatively low in Japan. Unlike dairy cattle, the accuracy of genomic evaluation (GE) for carcass traits in beef cattle, including this breed, has been poorly studied. For carcass weight and marbling score in the breed, as well as the extent of whole genome linkage disequilibrium (LD), the effects of equally-spaced single nucleotide polymorphisms (SNPs) density on genomic relationship matrix (G matrix), genetic variance explained and GE were investigated using the genotype data of about 40,000 SNPs and two statistical models. Results Using all pairs of two adjacent SNPs in the whole SNP set, the means of LD (r2) at ranges 0–0.1, 0.1–0.2, 0.2–0.5 and 0.5–1 Mb were 0.22, 0.13, 0.10 and 0.08, respectively, and 25.7, 13.9, 10.4 and 6.4% of the r2 values exceeded 0.3, respectively. While about 90% of the genetic variance for carcass weight estimated using all available SNPs was explained using 4,000–6,000 SNPs, the corresponding percentage for marbling score was consistently lower. With the conventional linear model incorporating the G matrix, correlation between the genomic estimated breeding values (GEBVs) obtained using 4,000 SNPs and all available SNPs was 0.99 for carcass weight and 0.98 for marbling score, with an underestimation of the former GEBVs, especially for marbling score. Conclusions The Japanese Black is likely to be in a breed group with a relatively high extent of whole genome LD. The results indicated that the degree of marbling is controlled by only QTLs with relatively small effects, compared with carcass weight, and that using at least 4,000 equally-spaced SNPs, there is a possibility of ranking animals genetically for these carcass traits in this breed.
Affiliation(s)
- Shinichiro Ogawa: Graduate School of Agriculture, Kyoto University, Sakyo-ku, Kyoto 606-8502, Japan.
|
43
|
Abdollahi-Arpanahi R, Nejati-Javaremi A, Pakdel A, Moradi-Shahrbabak M, Morota G, Valente BD, Kranis A, Rosa GJM, Gianola D. Effect of allele frequencies, effect sizes and number of markers on prediction of quantitative traits in chickens. J Anim Breed Genet 2014; 131:123-33. [PMID: 24397350 DOI: 10.1111/jbg.12075] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/27/2013] [Accepted: 11/29/2013] [Indexed: 01/09/2023]
Abstract
The objective was to assess goodness of fit and predictive ability of subsets of single nucleotide polymorphism (SNP) markers constructed based on minor allele frequency (MAF), effect sizes and varying marker density. Target traits were body weight (BW), ultrasound measurement of breast muscle (BM) and hen house egg production (HHP) in broiler chickens. We used a 600 K Affymetrix platform with 1352 birds genotyped. The prediction method was genomic best linear unbiased prediction (GBLUP) with 354 564 single nucleotide polymorphisms (SNPs) used to derive a genomic relationship matrix (G). Predictive ability was assessed as the correlation between predicted genomic values and corrected phenotypes from a threefold cross-validation. Predictive ability was 0.27 ± 0.002 for BW, 0.33 ± 0.001 for BM and 0.20 ± 0.002 for HHP. For the three traits studied, predictive ability decreased when SNPs with a higher MAF were used to construct G. Selecting the 20% of SNPs with the largest absolute effect sizes gave a predictive ability equal to that from fitting all markers together. When density of markers increased from 5 K to 20 K, predictive ability improved slightly. These results provide evidence that designing a low-density chip using low-frequency markers with large effect sizes may be useful for commercial usage.
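The abstract refers to a genomic relationship matrix (G) derived from SNP genotypes without saying how it was built. As a hedged illustration only, the sketch below uses one widely used construction (VanRaden's first method), G = ZZ' / (2 Σ p(1 − p)), where Z holds 0/1/2 dosages centred by twice the allele frequency. The choice of this formula, the function name `genomic_relationship_matrix` and the toy genotypes are assumptions, not details from the study.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = ZZ' / (2 * sum(p * (1 - p))) from 0/1/2 dosages.

    genotypes: list of rows, one row of SNP dosages per animal.
    Allele frequencies p are estimated from the same sample (an assumption).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    # Centre each dosage by twice the allele frequency.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]

# Toy example: 3 animals, 3 SNPs.
toy = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
G = genomic_relationship_matrix(toy)
for row in G:
    print([round(v, 3) for v in row])
```

Restricting the columns of `genotypes` to a subset of SNPs (for example by MAF or effect size) before calling the function would mimic, in spirit, the marker subsets compared in the abstract.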
Affiliation(s)
- R Abdollahi-Arpanahi: Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
|
44
|
Predictive ability of selected subsets of single nucleotide polymorphisms (SNPs) in a moderately sized dairy cattle population. Animal 2014; 8:208-16. [DOI: 10.1017/s1751731113002188] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
|
45
|
Berry DP, McClure MC, Mullen MP. Within- and across-breed imputation of high-density genotypes in dairy and beef cattle from medium- and low-density genotypes. J Anim Breed Genet 2013; 131:165-72. [PMID: 24906026 DOI: 10.1111/jbg.12067] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 05/14/2013] [Accepted: 11/05/2013] [Indexed: 11/28/2022]
Abstract
The objective of this study was to evaluate, using three different genotype density panels, the accuracy of imputation from lower- to higher-density genotypes in dairy and beef cattle. High-density genotypes consisting of 777,962 single-nucleotide polymorphisms (SNP) were available on 3122 animals, comprising 269, 196, 710, 234, 719, 730 and 264 Angus, Belgian Blue, Charolais, Hereford, Holstein-Friesian, Limousin and Simmental bulls, respectively. Three different genotype densities were generated: low density (LD; 6501 autosomal SNPs), medium density (50K; 47,770 autosomal SNPs) and high density (HD; 735,151 autosomal SNPs). Imputation from lower- to higher-density genotype platforms was undertaken within and across breeds exploiting population-wide linkage disequilibrium. The mean allele concordance rate per breed from LD to HD when undertaken using a single breed or multiple breed reference population varied from 0.956 to 0.974 and from 0.947 to 0.967, respectively. The mean allele concordance rate per breed from 50K to HD when undertaken using a single breed or multiple breed reference population varied from 0.987 to 0.994 and from 0.987 to 0.993, respectively. The accuracy of imputation was generally greater when the reference population comprised solely the breed to be imputed than when the reference population comprised multiple breeds, although the impact was less when imputing from 50K to HD than when imputing from LD.
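The abstract reports allele concordance rates but does not define them. A minimal sketch of one common definition follows, assuming 0/1/2 dosage coding in which each genotype contributes two alleles and a one-step dosage error counts as one mismatched allele. The function name `allele_concordance` and the example vectors are hypothetical.

```python
def allele_concordance(observed, imputed):
    """Fraction of alleles imputed correctly, for 0/1/2 dosage genotypes.

    Each genotype carries two alleles; |observed - imputed| alleles are wrong.
    This definition is an assumption, not taken from the abstract.
    """
    if len(observed) != len(imputed) or not observed:
        raise ValueError("need two non-empty, equal-length genotype vectors")
    matched = sum(2 - abs(o - i) for o, i in zip(observed, imputed))
    return matched / (2 * len(observed))

# One heterozygote is imputed where a homozygote was observed: 7 of 8 alleles match.
print(allele_concordance([0, 1, 2, 2], [0, 1, 1, 2]))  # prints 0.875
```

Allele concordance is more forgiving than genotype concordance, because a genotype that is wrong by one allele still earns half credit.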
Affiliation(s)
- D P Berry: Animal & Grassland Research and Innovation Centre, Cork, Ireland
|
46
|
Yao C, Spurlock D, Armentano L, Page C, VandeHaar M, Bickhart D, Weigel K. Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle. J Dairy Sci 2013; 96:6716-29. [DOI: 10.3168/jds.2012-6237] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 10/05/2012] [Accepted: 06/20/2013] [Indexed: 01/23/2023]
|
47
|
Melzer N, Wittenburg D, Repsilber D. Integrating milk metabolite profile information for the prediction of traditional milk traits based on SNP information for Holstein cows. PLoS One 2013; 8:e70256. [PMID: 23990900 PMCID: PMC3749218 DOI: 10.1371/journal.pone.0070256] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 12/03/2012] [Accepted: 06/18/2013] [Indexed: 12/18/2022]
Abstract
In this study the benefit of metabolome level analysis for the prediction of genetic value of three traditional milk traits was investigated. Our proposed approach consists of three steps: First, milk metabolite profiles are used to predict three traditional milk traits of 1,305 Holstein cows. Two regression methods, both enabling variable selection, are applied to identify important milk metabolites in this step. Second, the prediction of these important milk metabolites from single nucleotide polymorphisms (SNPs) enables the detection of SNPs with significant genetic effects. Finally, these SNPs are used to predict milk traits. The observed precision of predicted genetic values was compared to the results observed for the classical genotype-phenotype prediction using all SNPs or a reduced SNP subset (reduced classical approach). To enable a comparison between SNP subsets, a special invariable evaluation design was implemented. SNPs close to or within known quantitative trait loci (QTL) were determined. This enabled us to determine if detected important SNP subsets were enriched in these regions. The results show that our approach can lead to genetic value prediction, but requires less than 1% of the total amount of (40,317) SNPs. Significantly more important SNPs in known QTL regions were detected using our approach compared to the reduced classical approach. In conclusion, our approach allows a deeper insight into the associations between the different levels of the genotype-phenotype map (genotype-metabolome, metabolome-phenotype, genotype-phenotype).
Affiliation(s)
- Nina Melzer: Institute for Genetics and Biometry, Leibniz Institute for Farm Animal Biology, Dummerstorf, Mecklenburg-Western Pomerania, Germany
- Dörte Wittenburg: Institute for Genetics and Biometry, Leibniz Institute for Farm Animal Biology, Dummerstorf, Mecklenburg-Western Pomerania, Germany
- Dirk Repsilber: Institute for Genetics and Biometry, Leibniz Institute for Farm Animal Biology, Dummerstorf, Mecklenburg-Western Pomerania, Germany
|
48
|
Boligon AA, Long N, Albuquerque LG, Weigel KA, Gianola D, Rosa GJM. Comparison of selective genotyping strategies for prediction of breeding values in a population undergoing selection. J Anim Sci 2013; 90:4716-22. [PMID: 23372045 DOI: 10.2527/jas.2012-4857] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022]
Abstract
Genomewide marker information can improve the reliability of breeding value predictions for young selection candidates in genomic selection. However, the cost of genotyping limits its use to elite animals, and how such selective genotyping affects predictive ability of genomic selection models is an open question. We performed a simulation study to evaluate the quality of breeding value predictions for selection candidates based on different selective genotyping strategies in a population undergoing selection. The genome consisted of 10 chromosomes of 100 cM each. After 5,000 generations of random mating with a population size of 100 (50 males and 50 females), generation G(0) (reference population) was produced via a full factorial mating between the 50 males and 50 females from generation 5,000. Different levels of selection intensities (animals with the largest yield deviation value) in G(0) or random sampling (no selection) were used to produce offspring of G(0) generation (G(1)). Five genotyping strategies were used to choose 500 animals in G(0) to be genotyped: 1) Random: randomly selected animals, 2) Top: animals with largest yield deviation values, 3) Bottom: animals with lowest yield deviation values, 4) Extreme: animals with the 250 largest and the 250 lowest yield deviation values, and 5) Less Related: less genetically related animals. The number of individuals in G(0) and G(1) was fixed at 2,500 each, and different levels of heritability were considered (0.10, 0.25, and 0.50). Additionally, all 5 selective genotyping strategies (Random, Top, Bottom, Extreme, and Less Related) were applied to an indicator trait in generation G(0), and the results were evaluated for the target trait in generation G(1), with the genetic correlation between the 2 traits set to 0.50. The 5 genotyping strategies applied to individuals in G(0) (reference population) were compared in terms of their ability to predict the genetic values of the animals in G(1) (selection candidates). Lower correlations between genomic-based estimates of breeding values (GEBV) and true breeding values (TBV) were obtained when using the Bottom strategy. For Random, Extreme, and Less Related strategies, the correlation between GEBV and TBV became slightly larger as selection intensity decreased and was largest when no selection occurred. These 3 strategies were better than the Top approach. In addition, the Extreme, Random, and Less Related strategies had smaller predictive mean squared errors (PMSE) followed by the Top and Bottom methods. Overall, the Extreme genotyping strategy led to the best predictive ability of breeding values, indicating that animals with extreme yield deviation values in a reference population are the most informative when training genomic selection models.
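Four of the five selective genotyping strategies named in this abstract (Random, Top, Bottom and Extreme) depend only on the ranking of yield deviations and can be sketched briefly. The Less Related strategy is left out because it needs relationship information that the abstract does not describe. The function name `select_for_genotyping`, the `seed` parameter and the toy yield deviations are hypothetical; this is an illustration, not the authors' simulation code.

```python
import random

def select_for_genotyping(yield_deviations, n_select, strategy, seed=0):
    """Return indices of animals chosen for genotyping under one strategy."""
    n = len(yield_deviations)
    if not 1 <= n_select <= n:
        raise ValueError("n_select must be between 1 and the number of animals")
    # Animal indices ordered from lowest to highest yield deviation.
    order = sorted(range(n), key=lambda i: yield_deviations[i])
    if strategy == "random":
        return random.Random(seed).sample(range(n), n_select)
    if strategy == "top":
        return order[n - n_select:]
    if strategy == "bottom":
        return order[:n_select]
    if strategy == "extreme":
        # Half from the low tail and the rest from the high tail.
        n_low = n_select // 2
        n_high = n_select - n_low
        return order[:n_low] + order[n - n_high:]
    raise ValueError(f"unknown strategy: {strategy}")

yd = [0.5, -1.2, 3.0, 0.1, -0.4, 2.2]
for name in ("random", "top", "bottom", "extreme"):
    print(name, select_for_genotyping(yd, 4, name))
```

With 2,500 animals and `n_select=500`, the `"extreme"` branch would pick the 250 lowest and 250 highest yield deviations, matching the split described in the abstract.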
Affiliation(s)
- A A Boligon
- Department of Animal Sciences, São Paulo State University, Jaboticabal, SP 14884-000, Brazil.
|
49
|
Abstract
Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term "Bayesian alphabet" denotes a growing number of letters of the alphabet used to denote various Bayesian linear regressions that differ in the priors adopted, while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data, where the number of unknown parameters (p) typically exceeds sample size (n). Members of the alphabet aim to confront this overparameterization in various manners, but it is shown here that the prior is always influential unless n ≫ p. This happens because the parameters are not identified by the likelihood, so Bayesian learning is imperfect. Since inferences are not devoid of the influence of the prior, claims about genetic architecture from these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters ("tuning knobs") are assessed via a properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
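The identification problem described above can be illustrated numerically. The sketch below is not taken from the paper: it uses simulated standard-normal "markers" with arbitrary sizes (n = 5, p = 20) and shows that two different vectors of marker effects give exactly the same fitted values when p > n. The data therefore cannot distinguish between them, and any preference for one over the other has to come from the prior.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 20                      # fewer records than marker effects (n < p)
X = rng.standard_normal((n, p))   # simulated marker covariates (illustrative)
y = rng.standard_normal(n)        # simulated phenotypes (illustrative)

# Minimum-norm solution that reproduces y exactly.
b0 = np.linalg.pinv(X) @ y

# The last right singular vector lies in the null space of X (rank(X) = n < p),
# so adding any multiple of it changes the effects but not the fitted values.
_, _, vt = np.linalg.svd(X, full_matrices=True)
null_dir = vt[-1]
b1 = b0 + 3.0 * null_dir

fit0 = X @ b0
fit1 = X @ b1
print(np.allclose(fit0, fit1))    # same fit to the data
print(np.allclose(b0, b1))        # different marker effects
```

Because both effect vectors yield the same likelihood, predictions can still be reasonable while statements about individual marker effects remain sensitive to the prior.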
|
50
|
Enlarging a training set for genomic selection by imputation of un-genotyped animals in populations of varying genetic architecture. Genet Sel Evol 2013; 45:12. [PMID: 23621897 PMCID: PMC3652763 DOI: 10.1186/1297-9686-45-12] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 09/07/2012] [Accepted: 03/24/2013] [Indexed: 02/02/2023] Open
Abstract
Background: The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped.
Methods: Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets.
Results: Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams.
Conclusions: Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.
|