Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: van den Berg I, Boichard D, Lund MS. Sequence variants selected from a multi-breed GWAS can improve the reliability of genomic predictions in dairy cattle. Genet Sel Evol 2016;48:83. [PMID: 27809758 PMCID: PMC5095991 DOI: 10.1186/s12711-016-0259-0] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 12/29/2015] [Accepted: 10/19/2016] [Indexed: 01/01/2023]

Cited by Other Article(s)

van den Berg I, Chamberlain AJ, MacLeod IM, Nguyen TV, Goddard ME, Xiang R, Mason B, Meier S, Phyn CVC, Burke CR, Pryce JE. Using expression data to fine map QTL associated with fertility in dairy cattle. Genet Sel Evol 2024;56:42. [PMID: 38844868 PMCID: PMC11154999 DOI: 10.1186/s12711-024-00912-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 05/13/2024] [Indexed: 06/09/2024]

Abstract

BACKGROUND

Female fertility is an important trait in dairy cattle. Identifying putative causal variants associated with fertility may help to improve the accuracy of genomic prediction of fertility. Combining expression data (eQTL) of genes, exons, gene splicing and allele-specific expression is a promising approach to fine-map QTL and get closer to the causal mutations. Another approach is to identify genomic differences between cows selected for high and low fertility; a selection experiment in New Zealand has created exactly this resource. Our objective was to combine multiple types of expression data, fertility traits and allele frequencies in high- (POS) and low-fertility (NEG) cows with a genome-wide association study (GWAS) on calving interval in Australian cows to fine-map QTL associated with fertility in both the Australian and New Zealand dairy cattle populations.

RESULTS

Variants that were significantly associated with calving interval (CI) were strongly enriched for variants associated with gene, exon, gene splicing and allele-specific expression, indicating that there is substantial overlap between QTL associated with CI and eQTL. We identified 671 genes with significant differential expression between POS and NEG cows, with the largest fold change detected for the CCDC196 gene on chromosome 10. Our results provide numerous candidate genes associated with female fertility in dairy cattle, including GYS2 and TIGAR on chromosome 5 and SYT3 and HSD17B14 on chromosome 18. Multiple QTL regions were located in regions with large numbers of copy number variants (CNV). To identify the causal mutations for these variants, long read sequencing may be useful.

CONCLUSIONS

Variants that were significantly associated with CI were highly enriched for eQTL. We detected 671 genes that were differentially expressed between POS and NEG cows. Several QTL detected for CI overlapped with eQTL, providing candidate genes for fertility in dairy cattle.

Id-Lahoucine S, Cánovas A, Legarra A, Casellas J. Transmission ratio distortion regions in the context of genomic evaluation and their effects on reproductive traits in cattle. J Dairy Sci 2023;106:7786-7798. [PMID: 37210358 DOI: 10.3168/jds.2022-23062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 04/19/2023] [Indexed: 05/22/2023]

Abstract

Transmission ratio distortion (TRD), which is a deviation from Mendelian expectations, has been associated with basic mechanisms of life such as sperm and ova fertility and viability at developmental stages of the reproductive cycle. In this study, different models including TRD regions were tested for different reproductive traits [days from first service to conception (FSTC), number of services, first service nonreturn rate (NRR), and stillbirth (SB)]. Thus, in addition to a basic model with systematic and random effects, including genetic effects modeled through a genomic relationship matrix, we developed 2 additional models: one including a second genomic relationship matrix based on TRD regions, and one including TRD regions as a random effect assuming heterogeneous variances. The analyses were performed with 10,623 cows and 1,520 bulls genotyped for 47,910 SNPs, 590 TRD regions, and a number of records ranging from 9,587 (FSTC) to 19,667 (SB). The results of this study showed the ability of TRD regions to capture some additional genetic variance for some traits; however, this did not translate into higher accuracy for genomic prediction. This could be explained by the nature of TRD itself, which may arise in different stages of the reproductive cycle. Nevertheless, important effects of TRD regions were found on SB (31 regions) and NRR (18 regions) when comparing at-risk versus control matings, especially for regions with an allelic TRD pattern. Particularly for NRR, the probability of observing a nonpregnant cow increased by up to 27% for specific TRD regions, and the probability of observing stillbirth increased by up to 254%. These results support the relevance of several TRD regions for some reproductive traits, especially those with allelic patterns, which have not received as much attention as recessive TRD patterns.

Calderón-Chagoya R, Vega-Murillo VE, García-Ruiz A, Ríos-Utrera Á, Martínez-Velázquez G, Montaño-Bermúdez M. Discovering Genomic Regions Associated with Reproductive Traits and Frame Score in Mexican Simmental and Simbrah Cattle Using Individual SNP and Haplotype Markers. Genes (Basel) 2023;14:2004. [PMID: 38002947 PMCID: PMC10671695 DOI: 10.3390/genes14112004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/11/2023] [Accepted: 10/20/2023] [Indexed: 11/26/2023]

Valente BD, de los Campos G, Grueneberg A, Chen CY, Ros-Freixedes R, Herring WO. Using residual regressions to quantify and map signal leakage in genomic prediction. Genet Sel Evol 2023;55:57. [PMID: 37550618 PMCID: PMC10405418 DOI: 10.1186/s12711-023-00830-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Accepted: 07/12/2023] [Indexed: 08/09/2023]

Abstract

BACKGROUND

Most genomic prediction applications in animal breeding use genotypes with tens of thousands of single nucleotide polymorphisms (SNPs). However, modern sequencing technologies and imputation algorithms can generate ultra-high-density genotypes (including millions of SNPs) at an affordable cost. Empirical studies have not produced clear evidence that using ultra-high-density genotypes can significantly improve prediction accuracy. However, (whole-genome) prediction accuracy is not very informative about the ability of a model to capture the genetic signals from specific genomic regions. To address this problem, we propose a simple methodology that detects chromosome regions for which a specific model (e.g., single-step genomic best linear unbiased prediction (ssGBLUP)) may fail to fully capture the genetic signal present in such segments, a phenomenon that we refer to as signal leakage. We propose to detect regions with evidence of signal leakage by testing the association of residuals from a pedigree or a genomic model with SNP genotypes. We discuss how this approach can be used to map regions with signals that are poorly captured by a model and to identify strategies to fix those problems (e.g., using a different prior or increasing marker density). Finally, we explored the proposed approach to scan for signal leakage of different models (pedigree-based, ssGBLUP, and various Bayesian models) applied to growth-related phenotypes (average daily gain and backfat thickness) in pigs.

RESULTS

We report widespread evidence of signal leakage for pedigree-based models. Including a percentage of animals with SNP data in ssGBLUP reduced the extent of signal leakage. However, local peaks of missed signals remained in some regions, even when all animals were genotyped. Using variable selection priors solves leakage points that are caused by excessive shrinkage of marker effects. Nevertheless, these models still miss signals in some regions due to low linkage disequilibrium between the SNPs on the array used and causal variants. Thus, we discuss how such problems could be addressed by adding sequence SNPs from those regions to the prediction model.

CONCLUSIONS

Residual single-marker regression analysis is a simple approach that can be used to detect regional genomic signals that are poorly captured by a model and to indicate ways to fix such problems.
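The residual single-marker regression described in this abstract can be illustrated with a minimal sketch. It assumes the residuals from a fitted pedigree or genomic model are already available as a vector and that genotypes are coded 0/1/2; the function name `residual_marker_scan` and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def residual_marker_scan(residuals, genotypes):
    """Regress model residuals on each SNP separately (with intercept).

    residuals: length-n vector of residuals from an already fitted model.
    genotypes: n x m matrix of allele counts (0/1/2), one column per SNP.
    Returns per-SNP slopes and t-statistics (NaN for monomorphic SNPs).
    """
    e = np.asarray(residuals, dtype=float)
    X = np.asarray(genotypes, dtype=float)
    n = e.shape[0]
    ec = e - e.mean()               # centering absorbs the intercept
    Xc = X - X.mean(axis=0)
    sxx = (Xc ** 2).sum(axis=0)
    slopes = np.full(X.shape[1], np.nan)
    t_stats = np.full(X.shape[1], np.nan)
    ok = sxx > 0                    # skip SNPs with no variation
    slopes[ok] = Xc[:, ok].T @ ec / sxx[ok]
    rss = ((ec[:, None] - Xc[:, ok] * slopes[ok]) ** 2).sum(axis=0)
    se = np.sqrt(rss / (n - 2) / sxx[ok])
    t_stats[ok] = slopes[ok] / se
    return slopes, t_stats

# Toy data: SNP 7 carries signal that the (hypothetical) model missed.
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(500, 20))
resid = 0.5 * geno[:, 7] + rng.normal(size=500)
slopes, t_stats = residual_marker_scan(resid, geno)
top_snp = int(np.argmax(np.abs(t_stats)))
```

A large absolute t-statistic for a SNP suggests that the fitted model left signal uncaptured near that marker; the paper's actual testing procedure and significance thresholds may differ from this sketch.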

Jang S, Ros-Freixedes R, Hickey JM, Chen CY, Herring WO, Holl J, Misztal I, Lourenco D. Multi-line ssGBLUP evaluation using preselected markers from whole-genome sequence data in pigs. Front Genet 2023;14:1163626. [PMID: 37252662 PMCID: PMC10213539 DOI: 10.3389/fgene.2023.1163626] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/11/2023] [Accepted: 05/03/2023] [Indexed: 05/31/2023]

Abstract

Genomic evaluations in pigs could benefit from using multi-line data along with whole-genome sequencing (WGS) if the data are large enough to represent the variability across populations. The objective of this study was to investigate strategies to combine large-scale data from different terminal pig lines in a multi-line genomic evaluation (MLE) through single-step GBLUP (ssGBLUP) models while including variants preselected from WGS data. We investigated single-line and multi-line evaluations for five traits recorded in three terminal lines. The number of sequenced animals in each line ranged from 731 to 1,865, with 60k to 104k imputed to WGS. Unknown parent groups (UPG) and metafounders (MF) were explored to account for genetic differences among the lines and improve the compatibility between pedigree and genomic relationships in the MLE. Sequence variants were preselected based on multi-line genome-wide association studies (GWAS) or linkage disequilibrium (LD) pruning. These preselected variant sets were used for ssGBLUP predictions without and with weights from BayesR, and the performances were compared to that of a commercial porcine single-nucleotide polymorphism (SNP) chip. Using UPG and MF in MLE showed small to no gain in prediction accuracy (up to 0.02), depending on the lines and traits, compared to the single-line genomic evaluation (SLE). Likewise, adding selected variants from the GWAS to the commercial SNP chip resulted in a maximum increase of 0.02 in the prediction accuracy, only for average daily feed intake in the most numerous lines. In addition, no benefits were observed when using preselected sequence variants in multi-line genomic predictions. Weights from BayesR did not help improve the performance of ssGBLUP. This study revealed limited benefits of using preselected whole-genome sequence variants for multi-line genomic predictions, even when tens of thousands of animals had imputed sequence data. Correctly accounting for line differences with UPG or MF in MLE is essential to obtain predictions similar to SLE; however, the only observed benefit of an MLE is to have comparable predictions across lines. Further investigation into the amount of data and novel methods to preselect whole-genome causative variants in combined populations would be of significant interest.

Jones HE, Wilson PB. Progress and opportunities through use of genomics in animal production. Trends Genet 2022;38:1228-1252. [PMID: 35945076 DOI: 10.1016/j.tig.2022.06.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2022] [Revised: 06/08/2022] [Accepted: 06/17/2022] [Indexed: 01/24/2023]

Ribeiro G, Baldi F, Cesar ASM, Alexandre PA, Peripolli E, Ferraz JBS, Fukumasu H. Detection of potential functional variants based on systems-biology: the case of feed efficiency in beef cattle. BMC Genomics 2022;23:774. [PMID: 36434498 PMCID: PMC9700932 DOI: 10.1186/s12864-022-08958-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Accepted: 10/20/2022] [Indexed: 11/26/2022]

Abstract

BACKGROUND

Potential functional variants (PFVs) can be defined as genetic variants responsible for a given phenotype. Ultimately, these are the best DNA markers for animal breeding and selection, especially for polygenic and complex phenotypes. Herein, we described the identification of PFVs for complex phenotypes (in this case, Feed Efficiency in beef cattle) using a systems-biology driven approach based on RNA-seq data from physiologically relevant organs.

RESULTS

The systems-biology approach, coupled with deep molecular phenotyping by RNA-seq of liver, muscle, hypothalamus, pituitary, and adrenal glands of animals with high and low feed efficiency (FE) measured by residual feed intake (RFI), identified 2,000,936 unique variants. Among them, 9986 variants were significantly associated with FE and only 78 had a high impact on protein expression and were considered as PFVs. A set of 169 significant unique variants were expressed in all five organs; however, only 27 variants had a moderate impact and none of them had a high impact on protein expression. These results provide evidence of tissue-specific effects of high-impact PFVs. The PFVs were enriched (FDR < 0.05) for processing and presentation of MHC Class I and II mediated antigens, which are an important part of the adaptive immune response. The experimental validation of these PFVs was demonstrated by the increased prediction accuracy for RFI using the weighted G matrix (ssGBLUP+wG; Acc = 0.10 and b = 0.48) obtained in the ssGWAS in comparison to the unweighted G matrix (ssGBLUP; Acc = 0.29 and b = 1.10).

CONCLUSION

Here we identified PFVs for FE in beef cattle using a strategy based on systems-biology and deep molecular phenotyping. This approach has great potential to be used in genetic prediction programs, especially for polygenic phenotypes.

Ros-Freixedes R, Johnsson M, Whalen A, Chen CY, Valente BD, Herring WO, Gorjanc G, Hickey JM. Genomic prediction with whole-genome sequence data in intensely selected pig lines. Genet Sel Evol 2022;54:65. [PMID: 36153511 PMCID: PMC9509613 DOI: 10.1186/s12711-022-00756-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/28/2022] [Accepted: 09/05/2022] [Indexed: 12/03/2022]

Abstract

Background

Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage.

Methods

We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests.

Results

The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected.

Conclusions

Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12711-022-00756-0.

Bolormaa S, MacLeod IM, Khansefid M, Marett LC, Wales WJ, Miglior F, Baes CF, Schenkel FS, Connor EE, Manzanilla-Pech CIV, Stothard P, Herman E, Nieuwhof GJ, Goddard ME, Pryce JE. Sharing of either phenotypes or genetic variants can increase the accuracy of genomic prediction of feed efficiency. Genet Sel Evol 2022;54:60. [PMID: 36068488 PMCID: PMC9450441 DOI: 10.1186/s12711-022-00749-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 08/17/2022] [Indexed: 11/16/2022]

Abstract

BACKGROUND

Sharing individual phenotype and genotype data between countries is complex and fraught with potential errors, while sharing summary statistics of genome-wide association studies (GWAS) is relatively straightforward, and thus would be especially useful for traits that are expensive or difficult-to-measure, such as feed efficiency. Here we examined: (1) the sharing of individual cow data from international partners; and (2) the use of sequence variants selected from GWAS of international cow data to evaluate the accuracy of genomic estimated breeding values (GEBV) for residual feed intake (RFI) in Australian cows.

RESULTS

GEBV for RFI were estimated using genomic best linear unbiased prediction (GBLUP) with 50k or high-density single nucleotide polymorphisms (SNPs), from a training population of 3797 individuals in univariate to trivariate analyses where the three traits were RFI phenotypes calculated using 584 Australian lactating cows (AUSc), 824 growing heifers (AUSh), and 2526 international lactating cows (OVE). Accuracies of GEBV in AUSc were evaluated by either cohort-by-birth-year or fourfold random cross-validations. GEBV of AUSc were also predicted using only the AUS training population with a weighted genomic relationship matrix constructed with SNPs from the 50k array and sequence variants selected from a meta-GWAS that included only international datasets. The genomic heritabilities estimated using the AUSc, OVE and AUSh datasets were moderate, ranging from 0.20 to 0.36. The genetic correlations (rg) of traits between heifers and cows ranged from 0.30 to 0.95 but were associated with large standard errors. The mean accuracies of GEBV in Australian cows were up to 0.32 and almost doubled when either overseas cows, or both overseas cows and AUS heifers were included in the training population. They also increased when selected sequence variants were combined with 50k SNPs, but with a smaller relative increase.

CONCLUSIONS

The accuracy of RFI GEBV increased when international data were used or when selected sequence variants were combined with 50k SNP array data. This suggests that if direct sharing of data is not feasible, a meta-analysis of summary GWAS statistics could provide selected SNPs for custom panels to use in genomic selection programs. However, since this finding is based on a small cross-validation study, confirmation through a larger study is recommended.

Affiliation(s)

Sunduimijid Bolormaa: Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
Iona M. MacLeod: Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
Majid Khansefid: Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
Leah C. Marett: Agriculture Victoria Research, Ellinbank Centre, Ellinbank, Gippsland, VIC 3821 Australia; School of Agriculture and Food, University of Melbourne, Parkville, VIC 3010 Australia
William J. Wales: Agriculture Victoria Research, Ellinbank Centre, Ellinbank, Gippsland, VIC 3821 Australia; School of Agriculture and Food, University of Melbourne, Parkville, VIC 3010 Australia
Filippo Miglior: LACTANET, Sainte-Anne-de-Bellevue, QC H9X 3R4 Canada; CGIL, University of Guelph, Guelph, ON N1G 2W1 Canada
Christine F. Baes: CGIL, University of Guelph, Guelph, ON N1G 2W1 Canada; Institute of Genetics, Vetsuisse Faculty, University of Bern, 3002 Bern, Switzerland
Flavio S. Schenkel: CGIL, University of Guelph, Guelph, ON N1G 2W1 Canada
Erin E. Connor: Animal Genomics and Improvement Laboratory, USDA, Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, MD 20705 USA; Department of Animal and Food Sciences, University of Delaware, Newark, DE 19716 USA
Coralia I. V. Manzanilla-Pech: Center for Quantitative Genetics and Genomics, Aarhus University, Blichers Alle 20, 8830 Tjele, Denmark
Paul Stothard: Faculty of Agricultural, Life & Environmental Sciences, University of Alberta, Edmonton, AB T6G 2R3 Canada
Emily Herman: Faculty of Agricultural, Life & Environmental Sciences, University of Alberta, Edmonton, AB T6G 2R3 Canada
Gert J. Nieuwhof: Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia; DataGene Ltd, Agribio, Bundoora, VIC 3083 Australia
Michael E. Goddard: Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia; School of Veterinary and Agricultural Sciences, University of Melbourne, Parkville, VIC 3052 Australia
Jennie E. Pryce: Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia

Knutsen TM, Olsen HG, Ketto IA, Sundsaasen KK, Kohler A, Tafintseva V, Svendsen M, Kent MP, Lien S. Genetic variants associated with two major bovine milk fatty acids offer opportunities to breed for altered milk fat composition. Genet Sel Evol 2022;54:35. [PMID: 35619070 PMCID: PMC9137198 DOI: 10.1186/s12711-022-00731-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2020] [Accepted: 05/13/2022] [Indexed: 11/30/2022]

Abstract

Background

Although bovine milk is regarded as healthy and nutritious, its high content of saturated fatty acids (FA) may be harmful to cardiovascular health. Palmitic acid (C16:0) is the predominant saturated FA in milk with adverse health effects that could be countered by substituting it with higher levels of unsaturated FA, such as oleic acid (C18:1cis-9). In this work, we performed genome-wide association analyses for milk fatty acids predicted from FTIR spectroscopy data using 1811 Norwegian Red cattle genotyped and imputed to a high-density 777k single nucleotide polymorphism (SNP)-array. In a follow-up analysis, we used imputed whole-genome sequence data to detect genetic variants that are involved in FTIR-predicted levels of C16:0 and C18:1cis-9 and explore the transcript profile and protein level of candidate genes.

Results

Genome-wide significant associations were detected for C16:0 on Bos taurus (BTA) autosomes 11, 16 and 27, and for C18:1cis-9 on BTA5, 13 and 19. Closer examination of a significant locus on BTA11 identified the PAEP gene, which encodes the milk protein β-lactoglobulin, as a particularly attractive positional candidate gene. At this locus, we discovered a tightly linked cluster of genetic variants in coding and regulatory sequences that have opposing effects on the levels of C16:0 and C18:1cis-9. The favourable haplotype, linked to reduced levels of C16:0 and increased levels of C18:1cis-9, was also associated with a marked reduction in PAEP expression and β-lactoglobulin protein levels. β-lactoglobulin is the most abundant whey protein in milk and lower levels are associated with important dairy production parameters such as improved cheese yield.

Conclusions

The genetic variants detected in this study may be used in breeding to produce milk with an improved FA health-profile and enhanced cheese-making properties.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12711-022-00731-9.

van den Berg I, Ho PN, Nguyen TV, Haile-Mariam M, Luke TDW, Pryce JE. Using mid-infrared spectroscopy to increase GWAS power to detect QTL associated with blood urea nitrogen. Genet Sel Evol 2022;54:27. [PMID: 35436852 PMCID: PMC9014603 DOI: 10.1186/s12711-022-00719-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 04/05/2022] [Indexed: 11/20/2022]

van den Berg I, Ho PN, Nguyen TV, Haile-Mariam M, MacLeod IM, Beatson PR, O'Connor E, Pryce JE. GWAS and genomic prediction of milk urea nitrogen in Australian and New Zealand dairy cattle. Genet Sel Evol 2022;54:15. [PMID: 35183113 PMCID: PMC8858489 DOI: 10.1186/s12711-022-00707-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/30/2021] [Accepted: 01/31/2022] [Indexed: 11/24/2022]

Abstract

Background

Urinary nitrogen leakage is an environmental concern in dairy cattle. Selection for reduced urinary nitrogen leakage may be done using indicator traits such as milk urea nitrogen (MUN). The result of a previous study indicated that the genetic correlation between MUN in Australia (AUS) and MUN in New Zealand (NZL) was only low to moderate (between 0.14 and 0.58). In this context, an alternative is to select sequence variants based on genome-wide association studies (GWAS) with a view to improve genomic prediction accuracies. A GWAS can also be used to detect quantitative trait loci (QTL) associated with MUN. Therefore, our objectives were to perform within-country GWAS and a meta-GWAS for MUN using records from up to 33,873 dairy cows and imputed whole-genome sequence data, to compare QTL detected in the GWAS for MUN in AUS and NZL, and to use sequence variants selected from the meta-GWAS to improve the prediction accuracy for MUN based on a joint AUS-NZL reference set.

Results

Using the meta-GWAS, we detected 14 QTL for MUN, located on chromosomes 1, 6, 11, 14, 19, 22, 26 and the X chromosome. The three most significant QTL encompassed the casein genes on chromosome 6, PAEP on chromosome 11 and DGAT1 on chromosome 14. We selected 50,000 sequence variants that had the same direction of effect for MUN in AUS and MUN in NZL and that were most significant in the meta-analysis for the GWAS. The selected sequence variants yielded a genetic correlation between MUN in AUS and MUN in NZL of 0.95 and substantially increased prediction accuracy in both countries.
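The variant selection rule in this paragraph (keep variants whose effects have the same sign in both countries, then take the most significant in the meta-analysis) can be sketched as follows. The record layout, the field names `beta_aus`, `beta_nzl` and `p_meta`, and the toy values are assumptions made for illustration; they are not taken from the paper.

```python
def select_variants(variants, n_select):
    """Pick the n_select most significant variants with concordant effects.

    variants: iterable of dicts with hypothetical keys
      'id'       - variant identifier
      'beta_aus' - estimated effect in the AUS GWAS
      'beta_nzl' - estimated effect in the NZL GWAS
      'p_meta'   - p-value from the meta-GWAS
    """
    # Keep variants whose effects point in the same direction in both GWAS.
    concordant = [v for v in variants if v["beta_aus"] * v["beta_nzl"] > 0]
    # Rank by meta-analysis significance (smallest p-value first).
    concordant.sort(key=lambda v: v["p_meta"])
    return [v["id"] for v in concordant[:n_select]]

toy = [
    {"id": "snp1", "beta_aus": 0.2, "beta_nzl": 0.1, "p_meta": 1e-8},
    {"id": "snp2", "beta_aus": 0.3, "beta_nzl": -0.2, "p_meta": 1e-12},
    {"id": "snp3", "beta_aus": -0.1, "beta_nzl": -0.4, "p_meta": 1e-10},
    {"id": "snp4", "beta_aus": 0.05, "beta_nzl": 0.02, "p_meta": 0.3},
]
selected = select_variants(toy, n_select=2)
```

In the study the same idea was applied to select 50,000 sequence variants; here `snp2` is dropped despite its small p-value because its effects have opposite signs in the two data sets.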

Conclusions

Our results demonstrate how the sharing of data between two countries can increase the power of a GWAS and increase the accuracy of genomic prediction using a multi-country reference population and sequence variants selected based on a meta-GWAS.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12711-022-00707-9.

Guillenea A, Su G, Lund MS, Karaman E. Genomic prediction in Nordic Red dairy cattle considering breed origin of alleles. J Dairy Sci 2022;105:2426-2438. [PMID: 35033341 DOI: 10.3168/jds.2021-21173] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/18/2021] [Accepted: 11/23/2021] [Indexed: 01/02/2023]

Abstract

This study investigated the reliability of genomic prediction (GP) using the breed origin of alleles (BOA) approach in the Nordic Red (RDC) population, which has an admixed population structure. The RDC population consists of animals with varying degrees of genetic material from the Danish Red (RDM), Swedish Red (SRB), Finnish Ayrshire (FAY), and Holstein (HOL) because bulls have been used across the breeds. The BOA approach was tested using 39,550 RDC animals in the reference population and 11,786 in the validation population. Deregressed proofs (DRP) of milk, fat and protein were used as the response variable for GP. Direct genomic breeding values (DGV) for animals in the validation population were calculated with (BOA model) or without (joint model) considering breed origin of alleles. The joint model assumed homogeneous marker effects and a single set of marker effects was estimated, whereas the BOA model assumed heterogeneous marker effects, and different sets of marker effects were estimated across the breeds. For the BOA approach, we tested scenarios assuming both correlated (BOA_cor) and uncorrelated (BOA_uncor) marker effects between the breeds. Additionally, we investigated GP using a standard Illumina 50K chip and including SNP selected from imputed whole-genome sequencing (50K+WGS). We also studied the effect of estimating (co)variances for genome regions of different sizes to exploit the information of the genome regions contributing to the (co)variance between the breeds. Region sizes were set as 1 SNP, a group of 30 or 100 adjacent SNP, or the whole genome. Reliability of DGV was measured as squared correlations between DGV and DRP divided by the reliability of DRP. Across the 3 traits, in general, RS30 and RS100 SNP yielded the highest reliabilities. Including WGS SNP improved reliabilities in almost all scenarios (0.297 on average for 50K and 0.307 on average for 50K+WGS). The BOA_uncor (0.233 on average) was inferior to the joint model (0.339 on average), but the reliabilities obtained using BOA_cor (0.334 on average) in most cases were not significantly different from those obtained using the joint model. The results indicate that both including additional whole-genome sequencing SNP and dividing the genome into fixed regions improve GP in the RDC. The BOA models have the potential to increase the reliability of GP, but the benefit is limited in populations with a high exchange of genetic material for a long time, as is the case for RDC.
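The reliability measure stated in this abstract (squared correlation between DGV and DRP, divided by the reliability of DRP) can be written as a short sketch. It assumes the DRP reliability is supplied as a single average value for the validation animals, which the abstract does not specify; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def dgv_reliability(dgv, drp, drp_reliability):
    """Squared Pearson correlation of DGV with DRP, scaled by DRP reliability.

    drp_reliability is assumed here to be one average value in (0, 1].
    """
    r = np.corrcoef(dgv, drp)[0, 1]
    return r ** 2 / drp_reliability

# Toy values: the correlation is 0.8, so 0.8**2 / 0.8 = 0.8.
example = dgv_reliability([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0], 0.8)
```

Dividing by the DRP reliability corrects for the fact that DRP are themselves noisy measures of true breeding values, so the raw squared correlation would understate how well DGV predict them.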

Mollandin F, Rau A, Croiseau P. An evaluation of the predictive performance and mapping power of the BayesR model for genomic prediction. G3 GENES|GENOMES|GENETICS 2021;11:6317672. [PMID: 34849780 PMCID: PMC8527474 DOI: 10.1093/g3journal/jkab225] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/02/2021] [Accepted: 06/27/2021] [Indexed: 12/02/2022]

Ling AS, Hay EH, Aggrey SE, Rekaya R. Dissection of the impact of prioritized QTL-linked and -unlinked SNP markers on the accuracy of genomic selection. BMC Genom Data 2021;22:26. [PMID: 34380418 PMCID: PMC8356450 DOI: 10.1186/s12863-021-00979-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 07/18/2021] [Indexed: 12/01/2022] Open

Abstract

Background

Use of genomic information has resulted in an undeniable improvement in prediction accuracies and an increase in genetic gain in animal and plant genetic selection programs in spite of oversimplified assumptions about the true biological processes. Even for complex traits, a large portion of markers do not segregate with or effectively track genomic regions contributing to trait variation; yet it is not clear how genomic prediction accuracies are impacted by such potentially nonrelevant markers. In this study, a simulation was carried out to evaluate genomic predictions in the presence of markers unlinked with trait-relevant QTL. Further, we compared the ability of the population statistic F_ST and absolute estimated marker effect as preselection statistics to discriminate between linked and unlinked markers and the corresponding impact on accuracy.

Results

We found that the accuracy of genomic predictions decreased as the proportion of unlinked markers used to calculate the genomic relationships increased. Using all, only linked, and only unlinked marker sets yielded prediction accuracies of 0.62, 0.89, and 0.22, respectively. Furthermore, it was found that prediction accuracies are severely impacted by unlinked markers with large spurious associations. F_ST-preselected marker sets of 10 k and larger yielded accuracies 8.97 to 17.91% higher than those achieved using preselection by absolute estimated marker effects, despite selecting 5.1 to 37.7% more unlinked markers and explaining 2.4 to 5.0% less of the genetic variance. This was attributed to false positives selected by absolute estimated marker effects having a larger spurious association with the trait of interest and more negative impact on predictions. The Pearson correlation between F_ST scores and absolute estimated marker effects was 0.77 and 0.27 among only linked and only unlinked markers, respectively. The sensitivity of F_ST scores to detect truly linked markers is comparable to absolute estimated marker effects but the consistency between the two statistics regarding false positives is weak.

Conclusion

Identification and exclusion of markers that have little to no relevance to the trait of interest may significantly increase genomic prediction accuracies. The population statistic F_ST presents an efficient and effective tool for preselection of trait-relevant markers.
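
As a rough illustration of F_ST-based preselection, the sketch below scores each marker and keeps the highest-scoring ones. The abstract does not say how the subpopulations were defined or which F_ST estimator was used, so everything here is an assumption: two equally weighted groups, the textbook heterozygosity-based form F_ST = (H_T - H_S) / H_T, and hypothetical function names and allele frequencies.

```python
def fst_two_groups(p1, p2):
    """Per-marker F_ST from allele frequencies in two equally weighted groups.

    Uses F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity
    at the pooled allele frequency and H_S is the mean within-group value.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # Marker is fixed for the same allele in both groups: no differentiation.
        return 0.0
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def preselect_top_k(scores, k):
    """Return the indices of the k markers with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical allele frequencies for five markers in two groups.
freq_group_a = [0.10, 0.50, 0.80, 0.30, 0.95]
freq_group_b = [0.12, 0.90, 0.20, 0.30, 0.90]
scores = [fst_two_groups(a, b) for a, b in zip(freq_group_a, freq_group_b)]
print(preselect_top_k(scores, k=2))
```

Only the markers returned by `preselect_top_k` would then be used to build the genomic relationships; the accuracy comparison reported in the abstract depends on the simulation design and is not reproduced by this sketch.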

Gebreyesus G, Lund MS, Sahana G, Su G. Reliabilities of Genomic Prediction for Young Stock Survival Traits Using 54K SNP Chip Augmented With Additional Single-Nucleotide Polymorphisms Selected From Imputed Whole-Genome Sequencing Data. Front Genet 2021;12:667300. [PMID: 34349779 PMCID: PMC8326759 DOI: 10.3389/fgene.2021.667300] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2021] [Accepted: 06/23/2021] [Indexed: 11/16/2022] Open

Abstract

This study investigated effects of integrating single-nucleotide polymorphisms (SNPs) selected based on previous genome-wide association studies (GWASs), from imputed whole-genome sequencing (WGS) data, in the conventional 54K chip on genomic prediction reliability of young stock survival (YSS) traits in dairy cattle. The WGS SNPs included two groups of SNP sets that were selected based on GWAS in the Danish Holstein for YSS index (YSS_SNPs, n = 98) and SNPs chosen as peaks of quantitative trait loci for the traits of Nordic total merit index in Denmark–Finland–Sweden dairy cattle populations (DFS_SNPs, n = 1,541). Additionally, the study also investigated the possibility of improving genomic prediction reliability for survival traits by modeling the SNPs within recessive lethal haplotypes (LET_SNP, n = 130) detected from the 54K chip in the Nordic Holstein. De-regressed proofs (DRPs) were obtained from 6,558 Danish Holstein bulls genotyped with either 54K chip or customized LD chip that includes SNPs in the standard LD chip and some of the selected WGS SNPs. The chip data were subsequently imputed to 54K SNP together with the selected WGS SNPs. Genomic best linear unbiased prediction (GBLUP) models were implemented to predict breeding values through either pooling the 54K and selected WGS SNPs together as one genetic component (a one-component model) or considering 54K SNPs and selected WGS SNPs as two separate genetic components (a two-component model). Across all the traits, inclusion of each of the selected WGS SNP sets led to negligible improvements in prediction accuracies (0.17 percentage points on average) compared to prediction using only 54K. Similarly, marginal improvement in prediction reliability was obtained when all the selected WGS SNPs were included (0.22 percentage points). 
No further improvement in prediction reliability was observed when considering random regression on genotype code of recessive lethal alleles in the model including both groups of the WGS SNPs. Additionally, there was no difference in prediction reliability from integrating the selected WGS SNP sets through the two-component model compared to the one-component GBLUP.

Impact of Marker Pruning Strategies Based on Different Measurements of Marker Distance on Genomic Prediction in Dairy Cattle. Animals (Basel) 2021;11:ani11071992. [PMID: 34359120 PMCID: PMC8300388 DOI: 10.3390/ani11071992] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2021] [Revised: 06/27/2021] [Accepted: 06/28/2021] [Indexed: 11/16/2022] Open

Abstract

Simple Summary

The usefulness of genomic prediction (GP) has been widely proven by breeding analyses in livestock, plants and aquatic populations. It is well known that ‘marker density’ is a critical factor that affects the accuracy of GP; however, how to properly measure ‘marker density’ in GP is yet to be determined. With population-level whole-genome sequence data or high-density single nucleotide polymorphism (SNP) data available, it seems this question can be answered more convincingly. In this study, we investigated and discussed the impact of four ‘marker density’ measures that reflect genetic or physical distances between SNPs on the accuracy of GP in a German Holstein dairy cattle population. Our results showed that the degree of variation of physical distance between adjacent SNPs had significant effects on the accuracy of GP, while the genetic distance between SNPs had no relationship with the accuracy of GP. Therefore, for studies based on high-density SNP data, the default strategy of pruning SNPs based on genetic distance is detrimental to heritability estimation and genomic prediction. The results extend the community's knowledge of ‘marker density’ and provide useful suggestions for the application of and research on genomic prediction.

Abstract

With the availability of high-density single-nucleotide polymorphism (SNP) data and the development of genotype imputation methods, high-density panel-based genomic prediction (GP) has become possible in livestock breeding. It is generally considered that the genomic estimated breeding value (GEBV) accuracy increases with the marker density, although studies have shown that the GEBV accuracy does not increase, or even decreases, when high-density panels are used. Therefore, in addition to the SNP number, other measurements of ‘marker density’ seem to have impacts on the GEBV accuracy, and exploring the relationship between the GEBV accuracy and the measurements of ‘marker density’ based on high-density SNP or whole-genome sequence data is important for the field of GP. In this study, we constructed different SNP panels with certain SNP numbers (e.g., 1 k) by using the physical distance (PhyD), genetic distance (GenD) and random distance (RanD) between SNPs, respectively, based on the high-density SNP data of a German Holstein dairy cattle population. Thus, there are three different panels at each SNP number level. These panels were used to construct GP models to predict fat percentage, milk yield and somatic cell score. Meanwhile, the mean ($\bar{d}$) and variance ($\sigma_d^2$) of the physical distance between SNPs and the mean ($\overline{r^2}$) and variance ($\sigma_{r^2}^2$) of the genetic distance between SNPs in each panel were used as marker density-related measurements and their influence on the GEBV accuracy was investigated. At the same SNP number level, the $\bar{d}$ of all panels is basically the same, but the $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ are different. Therefore, we only investigated the effects of $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ on the GEBV accuracy. The results showed that at a certain SNP number level, the GEBV accuracy was negatively correlated with $\sigma_d^2$, but not with $\overline{r^2}$ and $\sigma_{r^2}^2$. Compared with GenD and RanD, the $\sigma_d^2$ of panels constructed by PhyD is smaller.
The low and moderate-density panels (< 50 k) constructed by RanD or GenD have large $\sigma_d^2$, which is not conducive to genomic prediction. The GEBV accuracy of the low and moderate-density panels constructed by PhyD is 3.8–34.8% higher than that of the low and moderate-density panels constructed by RanD and GenD. Panels with 20–30 k SNPs constructed by PhyD can achieve the same or slightly higher GEBV accuracy than that of high-density SNP panels for all three traits. In summary, the smaller the variation in physical distance between adjacent SNPs, the higher the GEBV accuracy. The low and moderate-density panels constructed by physical distance are beneficial to genomic prediction, while pruning high-density SNP data based on genetic distance is detrimental to genomic prediction. The results provide suggestions for the development of SNP panels and the research of genomic prediction based on whole-genome sequence data.
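
The two physical-distance measures used in this abstract, the mean and the variance of the distance between adjacent SNPs in a panel, can be computed as in the sketch below. It is a hedged illustration only: the function name and positions are hypothetical, all SNPs are assumed to lie on a single chromosome, and the population variance is used because the abstract does not state which variance estimator was applied.

```python
from statistics import mean, pvariance


def adjacent_distance_stats(positions):
    """Mean and variance of the physical distance between adjacent SNPs.

    positions: base-pair coordinates of the SNPs in one panel, assumed to be
    on a single chromosome (an assumption made for this sketch).
    """
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return mean(gaps), pvariance(gaps)


# Two hypothetical four-SNP panels spanning the same 30 bp interval.
evenly_spaced = [0, 10, 20, 30]
clustered = [0, 1, 2, 30]

print(adjacent_distance_stats(evenly_spaced))
print(adjacent_distance_stats(clustered))
```

Both toy panels have the same mean gap, but the clustered panel has a much larger variance. This mirrors the situation described in the abstract, where panels with the same SNP number share nearly the same mean distance and differ mainly in the variance of that distance.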

Karaman E, Su G, Croue I, Lund MS. Genomic prediction using a reference population of multiple pure breeds and admixed individuals. Genet Sel Evol 2021;53:46. [PMID: 34058971 PMCID: PMC8168010 DOI: 10.1186/s12711-021-00637-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2020] [Accepted: 05/11/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

In dairy cattle populations in which crossbreeding has been used, animals show some level of diversity in their origins. In rotational crossbreeding, for instance, crossbred dams are mated with purebred sires from different pure breeds, and the genetic composition of crossbred animals is an admixture of the breeds included in the rotation. How to use the data of such individuals in genomic evaluations is still an open question. In this study, we aimed at providing methodologies for the use of data from crossbred individuals with an admixed genetic background together with data from multiple pure breeds, for the purpose of genomic evaluations for both purebred and crossbred animals. A three-breed rotational crossbreeding system was mimicked using simulations based on animals genotyped with the 50 K single nucleotide polymorphism (SNP) chip.

RESULTS

For purebred populations, within-breed genomic predictions generally led to higher accuracies than those from multi-breed predictions using combined data of pure breeds. Adding admixed population's (MIX) data to the combined pure breed data considering MIX as a different breed led to higher accuracies. When prediction models were able to account for breed origin of alleles, accuracies were generally higher than those from combining all available data, depending on the correlation of quantitative trait loci (QTL) effects between the breeds. Accuracies varied when using SNP effects from any of the pure breeds to predict the breeding values of MIX. Using those breed-specific SNP effects that were estimated separately in each pure breed, while accounting for breed origin of alleles for the selection candidates of MIX, generally improved the accuracies. Models that are able to accommodate MIX data with the breed origin of alleles approach generally led to higher accuracies than models without breed origin of alleles, depending on the correlation of QTL effects between the breeds.

CONCLUSIONS

Combining all available data, pure breeds' and admixed population's data, in a multi-breed reference population is beneficial for the estimation of breeding values for pure breeds with a small reference population. For MIX, such an approach can lead to higher accuracies than considering breed origin of alleles for the selection candidates, and using breed-specific SNP effects estimated separately in each pure breed. Including MIX data in the reference population of multiple breeds by considering the breed origin of alleles, accuracies can be further improved. Our findings are relevant for breeding programs in which crossbreeding is systematically applied, and also for populations that involve different subpopulations and between which exchange of genetic material is routine practice.

van den Berg I, Ho PN, Haile-Mariam M, Beatson PR, O'Connor E, Pryce JE. Genetic parameters of blood urea nitrogen and milk urea nitrogen concentration in dairy cattle managed in pasture-based production systems of New Zealand and Australia. ANIMAL PRODUCTION SCIENCE 2021. [DOI: 10.1071/an21049] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Abstract

Context

Urinary nitrogen excretion by grazing cattle causes environmental pollution. Selecting for cows with a lower concentration of urinary nitrogen excretion may reduce the environmental impact. While urinary nitrogen excretion is difficult to measure, blood urea nitrogen (BUN), mid-infrared spectroscopy (MIR)-predicted BUN (MBUN), which is predicted from MIR spectra measured on milk samples, and milk urea nitrogen (MUN) are potential indicator traits. Australia and New Zealand have increasing datasets of cows with urea records, with 18 120 and 15 754 cows with urea records in Australia and New Zealand respectively. A collaboration between Australia and New Zealand could further increase the size of the dataset by sharing data.

Aims

Our aims were to estimate genetic parameters for urea traits within country, and genetic correlations between countries to gauge the benefit of having a joint reference population for genomic prediction of an indicator trait that is potentially suitable for selection to reduce urinary nitrogen excretion for both countries.

Methods

Genetic parameters were estimated within country (Australia and New Zealand) in Holstein, Jersey and a multibreed population, for BUN, MBUN and MUN in Australia and MUN in New Zealand, using high-density genotypes. Genetic correlations were also estimated between the urea traits recorded in Australia and MUN in New Zealand. Analyses used the first record available for each cow or within days-in-milk (DIM) intervals.

Key results

Heritabilities ranged from 0.08 to 0.32 for the various urea traits. Higher heritabilities were obtained for Jersey than for Holstein, and for the New Zealand cows than for the Australian cows. While urea traits were highly correlated within Australia (0.71–0.94), genetic correlations between Australia and New Zealand were small to moderate (0.08–0.58).

Conclusions

Our results showed that the heritability for urea traits differs among trait, breed, and country. While urea traits are highly correlated within country, genetic correlations between urea traits in Australia and MUN in New Zealand were only low to moderate.

Implications

Further study is required to identify the underlying causes of the difference in heritabilities observed, to compare the accuracies of different reference populations, and to estimate genetic correlations between urea traits and other traits such as fertility and feed intake. Larger datasets may help estimate genetic correlations more accurately between countries.

van den Berg I, Xiang R, Jenko J, Pausch H, Boussaha M, Schrooten C, Tribout T, Gjuvsland AB, Boichard D, Nordbø Ø, Sanchez MP, Goddard ME. Meta-analysis for milk fat and protein percentage using imputed sequence variant genotypes in 94,321 cattle from eight cattle breeds. Genet Sel Evol 2020;52:37. [PMID: 32635893 PMCID: PMC7339598 DOI: 10.1186/s12711-020-00556-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2019] [Accepted: 06/26/2020] [Indexed: 12/14/2022] Open

Abstract

Background

Sequence-based genome-wide association studies (GWAS) provide high statistical power to identify candidate causal mutations when a large number of individuals with both sequence variant genotypes and phenotypes is available. A meta-analysis combines summary statistics from multiple GWAS and increases the power to detect trait-associated variants without requiring access to data at the individual level of the GWAS mapping cohorts. Because linkage disequilibrium between adjacent markers is conserved only over short distances across breeds, a multi-breed meta-analysis can improve mapping precision.

Results

To maximise the power to identify quantitative trait loci (QTL), we combined the results of nine within-population GWAS that used imputed sequence variant genotypes of 94,321 cattle from eight breeds, to perform a large-scale meta-analysis for fat and protein percentage in cattle. The meta-analysis detected (p ≤ 10⁻⁸) 138 QTL for fat percentage and 176 QTL for protein percentage. This was more than the number of QTL detected in all within-population GWAS together (124 QTL for fat percentage and 104 QTL for protein percentage). Among all the lead variants, 100 QTL for fat percentage and 114 QTL for protein percentage had the same direction of effect in all within-population GWAS. This indicates either persistence of the linkage phase between the causal variant and the lead variant across breeds or that some of the lead variants might indeed be causal or tightly linked with causal variants. The percentage of intergenic variants was substantially lower for significant variants than for non-significant variants, and significant variants had mostly moderate to high minor allele frequencies. Significant variants were also clustered in genes that are known to be relevant for fat and protein percentages in milk.

Conclusions

Our study identified a large number of QTL associated with fat and protein percentage in dairy cattle. We demonstrated that large-scale multi-breed meta-analysis reveals more QTL at the nucleotide resolution than within-population GWAS. Significant variants were more often located in genic regions than non-significant variants and a large part of them was located in potentially regulatory regions.

van den Berg I, MacLeod I, Reich C, Breen E, Pryce J. Optimizing genomic prediction for Australian Red dairy cattle. J Dairy Sci 2020;103:6276-6298. [DOI: 10.3168/jds.2019-17914] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Accepted: 02/13/2020] [Indexed: 12/18/2022]

Konstantinov KV, Goddard ME. Application of multivariate single-step SNP best linear unbiased predictor model and revised SNP list for genomic evaluation of dairy cattle in Australia. J Dairy Sci 2020;103:8305-8316. [PMID: 32622609 DOI: 10.3168/jds.2020-18242] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Accepted: 04/21/2020] [Indexed: 11/19/2022]

Abstract

The objectives of this study were (1) to evaluate the computational feasibility of the multitrait test-day single-step SNP-BLUP (ssSNP-BLUP) model using phenotypic records of genotyped and nongenotyped animals, and (2) to compare accuracies (coefficient of determination; R²) and bias of genomic estimated breeding values (GEBV) and de-regressed proofs as response variables in 3 Australian dairy cattle breeds (i.e., Holstein, Jersey, and Red breeds). Additive genomic random regression coefficients for milk, fat, protein yield and somatic cell score were predicted in the first, second, and third lactation. The predicted coefficients were used to derive 305-d GEBV and were compared with the traditional parent averages obtained from a BLUP model without genomic information. Cow fertility traits were evaluated from the 5-trait repeatability model (i.e., calving interval, days from calving to first service, pregnancy diagnosis, first service nonreturn rate, and lactation length). The de-regressed proofs were only for calving interval. Our results showed that ssSNP-BLUP using the multitrait test-day model increased reliability and reduced bias of breeding values of young animals when compared with parent average from traditional BLUP in Australian Holstein, Jersey, and Red breeds. The use of a custom selection of approximately 46,000 SNP (custom XT SNP list) increased the reliability of GEBV compared with the results obtained using the commercial Illumina 50K chip (Illumina, San Diego, CA). The use of the second preconditioner substantially improved the convergence rate of the preconditioned conjugate gradient method, but further work is needed to improve the efficiency of the computation of the Kronecker matrix product by vector. Application of ssSNP-BLUP to multitrait random regression models is computationally feasible.

Raymond B, Wientjes YCJ, Bouwman AC, Schrooten C, Veerkamp RF. A deterministic equation to predict the accuracy of multi-population genomic prediction with multiple genomic relationship matrices. Genet Sel Evol 2020;52:21. [PMID: 32345213 PMCID: PMC7189707 DOI: 10.1186/s12711-020-00540-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2019] [Accepted: 04/14/2020] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

A multi-population genomic prediction (GP) model in which important pre-selected single nucleotide polymorphisms (SNPs) are differentially weighted (MPMG) has been shown to result in better prediction accuracy than a multi-population, single genomic relationship matrix ([Formula: see text]) GP model (MPSG) in which all SNPs are weighted equally. Our objective was to underpin theoretically the advantages and limits of the MPMG model over the MPSG model, by deriving and validating a deterministic prediction equation for its accuracy.

METHODS

Using selection index theory, we derived an equation to predict the accuracy of estimated total genomic values of selection candidates from population [Formula: see text] ([Formula: see text]), when individuals from two populations, [Formula: see text] and [Formula: see text], are combined in the training population and two [Formula: see text], made respectively from pre-selected and remaining SNPs, are fitted simultaneously in MPMG. We used simulations to validate the prediction equation in scenarios that differed in the level of genetic correlation between populations, heritability, and proportion of genetic variance explained by the pre-selected SNPs. Empirical accuracy of the MPMG model in each scenario was calculated and compared to the predicted accuracy from the equation.

RESULTS

In general, the derived prediction equation resulted in accurate predictions of [Formula: see text] for the scenarios evaluated. Using the prediction equation, we showed that an important advantage of the MPMG model over the MPSG model is its ability to benefit from the small number of independent chromosome segments ([Formula: see text]) due to the pre-selected SNPs, both within and across populations, whereas for the MPSG model, there is only a single value for [Formula: see text], calculated based on all SNPs, which is very large. However, this advantage is dependent on the pre-selected SNPs that explain some proportion of the total genetic variance for the trait.

CONCLUSIONS

We developed an equation that gives insight into why, and under which conditions the MPMG outperforms the MPSG model for GP. The equation can be used as a deterministic tool to assess the potential benefit of combining information from different populations, e.g., different breeds or lines for GP in livestock or plants, or different groups of people based on their ethnic background for prediction of disease risk scores.

Genomic Analysis Using Bayesian Methods under Different Genotyping Platforms in Korean Duroc Pigs. Animals (Basel) 2020;10:ani10050752. [PMID: 32344859 PMCID: PMC7277155 DOI: 10.3390/ani10050752] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2020] [Revised: 04/16/2020] [Accepted: 04/22/2020] [Indexed: 12/03/2022] Open

Abstract

Simple Summary

This study investigated the informative regions and the efficiency of genomic predictions for backfat thickness, days to 90 kg body weight, loin muscle area, and lean percentage in Korean Duroc pigs. Several regions of the genome were identified and a significant marker was found near the MC4R gene for growth and production-related traits. No differences in genomic accuracy were identified between the Bayesian approaches in these four growth and production-related traits. The genomic accuracy is improved by using deregressed estimated breeding values including parental information as a response variable in Korean Duroc pigs.

Abstract

Genomic evaluation has been widely applied to several species using commercial single nucleotide polymorphism (SNP) genotyping platforms. This study investigated the informative genomic regions and the efficiency of genomic prediction by using two Bayesian approaches (BayesB and BayesC) under two moderate-density SNP genotyping panels in Korean Duroc pigs. A total of 1026 individuals with growth and production records were genotyped using two medium-density SNP genotyping platforms: Illumina60K and GeneSeek80K. These platforms consisted of 61,565 and 68,528 SNP markers, respectively. The deregressed estimated breeding values (DEBVs) derived from estimated breeding values (EBVs) and their reliabilities were taken as response variables. Two Bayesian approaches were implemented to perform the genome-wide association study (GWAS) and genomic prediction. Multiple significant regions for days to 90 kg (DAYS), loin muscle area (LMA), and lean percent (PCL) were detected. The most significant SNP marker, located near the MC4R gene, was detected using GeneSeek80K. Accuracy of genomic predictions was higher using the GeneSeek80K SNP panel for DAYS (Δ2%) and LMA (Δ2–3%) with two response variables, with no difference in accuracy between the Bayesian approaches in four growth and production-related traits. Of the two DEBVs compared, the one including parental information gave the best genomic prediction accuracy as a response variable, regardless of the genotyping platform and the Bayesian method, in Korean Duroc pig breeding.

VanRaden PM. Symposium review: How to implement genomic selection. J Dairy Sci 2020;103:5291-5301. [PMID: 32331884 DOI: 10.3168/jds.2019-17684] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2019] [Accepted: 01/03/2020] [Indexed: 12/16/2022]

Abstract

Genomic selection was adopted very quickly in the 10 yr after first implementation, and breeders continue to find new uses for genomic testing. Breeding values with higher reliability earlier in life are estimated by combining DNA genotypes for many thousands of loci using existing identification, pedigree, and phenotype databases for millions of animals. Quality control for both new and previous data is greatly improved by comparing genomic and pedigree relationships to correct parent-progeny conflicts and discover many additional ancestors. Many quantitative trait loci and gene tests have been added to previous assays that used only evenly spaced, highly polymorphic markers. Imputation now combines genotypes from many assays of differing marker densities. Prediction models have gradually advanced from normal or Bayesian distributions within trait and breed to single-step, multitrait, or other more complex models, such as multibreed models that may be needed for crossbred prediction. Genomic selection was initially applied to males to predict progeny performance but is now widely applied to females or even embryos to predict their own later performance. The initial focus on additive merit has expanded to include mating programs, genomic inbreeding, and recessive alleles. Many producers now use DNA testing to decide which heifers should be inseminated with elite dairy, beef, or sex-sorted semen, which should be embryo donors or recipients, or which should be sold or kept for breeding. Because some of these decisions are expensive to delay, predictions are now provided weekly instead of every few months. Predictions from international genomic databases are often more accurate and cost-effective than those from within-country databases that were previously designed for progeny testing unless local breeds, conditions, or traits differ greatly from the larger database. 
Selection indexes include many new traits, often with lower heritability or requiring large initial investments to obtain phenotypes, which provide further incentive to cooperate internationally. The genomic prediction methods developed for dairy cattle are now applied widely to many animal, human, and plant populations and could be applied to many more.

Haile-Mariam M, MacLeod IM, Bolormaa S, Schrooten C, O'Connor E, de Jong G, Daetwyler HD, Pryce JE. Value of sharing cow reference population between countries on reliability of genomic prediction for milk yield traits. J Dairy Sci 2019;103:1711-1728. [PMID: 31864746 DOI: 10.3168/jds.2019-17170] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2019] [Accepted: 10/24/2019] [Indexed: 01/08/2023]

Abstract

Increasing the reliability of genomic prediction (GP) of economic traits in the pasture-based dairy production systems of New Zealand (NZ) and Australia (AU) is important to both countries. This study assessed if sharing cow phenotype and genotype data of NZ and AU improves the reliability of GP for NZ bulls. Data from approximately 32,000 NZ genotyped cows and their contemporaries were included in the May 2018 routine genetic evaluation of the Australian Dairy cattle in an attempt to provide consistent phenotypes for both countries. After the genetic evaluation, deregressed proofs of cows were calculated for milk yield traits. The April 2018 multiple across-country evaluation of Interbull was also used to calculate deregressed proofs for bulls on the NZ scale. Approximately 1,178 Jersey (Jer) and 6,422 Holstein (Hol) bulls had genotype and phenotype data. In addition to NZ cows, phenotype data of close to 60,000 genotyped Australian (AU) cows from the same genetic evaluation run as NZ cows were used. All AU and NZ females were genotyped using low-density SNP chips (<10K SNP) and were imputed first to 50K and then to ∼600K (referred to as high density; HD). We used up to 98,000 animals in the reference populations, both by expanding the NZ reference set (cow, bull, single breed to multi-breed set) and by adding AU cows. Reliabilities of GP were calculated for 508 Jer and 1,251 Hol bulls whose sires are not included in the reference set (RS) to ensure that real differences are not masked by close relationships. The GP was tested using 50K or high-density SNP chip using genomic BLUP in bivariate (considering country as a trait) or single trait models. The RS that gave the highest reliability for each breed were also tested using a hybrid GP method that combines expectation maximization with Bayes R. 
The addition of the AU cows to an NZ RS that included either NZ cows only, or cows and bulls, improved the reliability of GP for both NZ Hol and Jer validation bulls for all traits. Using single breed reference populations also increased reliability when NZ crossbred cows were added to reference populations that included only purebred NZ bulls and cows and AU cows. The full multi-breed RS (all NZ cows and bulls and AU cows) provided similar reliabilities in NZ Hol bulls, when compared with the single breed reference with crossbred NZ cows. For Jer validation bulls, the RS that included Jer cows and bulls and crossbred cows from NZ and Jer cows from AU was marginally better than the all-breed, all-country RS. In terms of reliability, the advantage of the HD SNP chip was small but captured more of the genomic variance than the 50K, particularly for Hol. The expectation maximization Bayes R GP method was slightly (up to 3 percentage points) better than genomic BLUP. We conclude that GP of milk production traits in NZ bulls improves by up to 7 percentage points in reliability by expanding the NZ reference population to include AU cows.

VanRaden PM, Tooker ME, Chud TCS, Norman HD, Megonigal JH, Haagen IW, Wiggans GR. Genomic predictions for crossbred dairy cattle. J Dairy Sci 2019;103:1620-1631. [PMID: 31837783 DOI: 10.3168/jds.2019-16634] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/15/2019] [Accepted: 10/14/2019] [Indexed: 01/14/2023]

Abstract

Genomic evaluations are useful for crossbred as well as purebred populations when selection is applied to commercial herds. Dairy farmers had already spent more than $1 million to genotype over 32,000 crossbred animals before US genomic evaluations became available for those animals. Thus, new tools were needed to provide accurate genomic predictions for crossbreds. Genotypes for crossbreds are imputed more accurately when the imputation reference population includes purebreds. Therefore, genotypes of 6,296 crossbred animals were imputed from lower-density chips by including either 3,119 ancestors or 834,367 genotyped animals in the reference population. Crossbreds in the imputation study included 733 Jersey × Holstein F₁ animals, 55 Brown Swiss × Holstein F₁ animals, 2,300 Holstein backcrosses, 2,026 Jersey backcrosses, 27 Brown Swiss backcrosses, and 502 other crossbreds of various breed combinations. Another 653 animals appeared to be purebreds that owners had miscoded as a different breed. Genomic breed composition was estimated from 60,671 markers using the known breed identities for purebred, progeny-tested Holstein, Jersey, Brown Swiss, Ayrshire, and Guernsey bulls as the 5 traits (breed fractions) to be predicted. Estimates of breed composition were adjusted so that no percentages were negative or exceeded 100%, and breed percentages summed to 100%. Another adjustment set percentages above 93.5% equal to 100%, and the resulting value was termed breed base representation (BBR). Larger percentages of missing alleles were imputed by using a crossbred reference population rather than only the closest purebred reference population. Crossbred predictions were averages of genomic predictions computed using marker effects for each pure breed, which were weighted by the animal's BBR. Marker and polygenic effects were estimated separately for each breed on the all-breed scale instead of within-breed scales. 
For crossbreds, genomic predictions weighted by BBR were more accurate than the average of parents' breeding values and slightly more accurate than predictions using only the predominant breed. For purebreds, single-trait predictions using only within-breed data were as accurate as multi-trait predictions with allele effects in different breeds treated as correlated effects. Crossbred genomic predicted transmitting abilities were implemented by the Council on Dairy Cattle Breeding in April 2019 and will aid producers in managing their breeding programs and selecting replacement heifers.
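
The breed base representation (BBR) adjustments and the BBR-weighted crossbred prediction described in this abstract can be illustrated with a minimal sketch. The function names, the order of the adjustments, and the choice to set the remaining breeds to 0 once one breed exceeds the 93.5% cutoff are assumptions made for illustration; the abstract does not specify them, and this is not the evaluation software used by the authors.

```python
def breed_base_representation(raw_pct, purebred_cutoff=93.5):
    """Turn raw breed-composition estimates (in percent) into BBR values.

    Hypothetical helper: the adjustment order and the handling of the
    non-dominant breeds are assumptions, not taken from the abstract.
    """
    # Adjustment 1: no percentage may be negative or exceed 100.
    clipped = [min(max(p, 0.0), 100.0) for p in raw_pct]
    total = sum(clipped)
    if total == 0:
        raise ValueError("all breed estimates are zero after clipping")
    # Adjustment 2: rescale so that the percentages sum to 100.
    scaled = [100.0 * p / total for p in clipped]
    # Adjustment 3: a breed above the cutoff is set to 100%
    # (assumed here: every other breed is then set to 0).
    top = max(range(len(scaled)), key=scaled.__getitem__)
    if scaled[top] > purebred_cutoff:
        return [100.0 if i == top else 0.0 for i in range(len(scaled))]
    return scaled


def crossbred_prediction(per_breed_predictions, bbr):
    """Average the per-breed genomic predictions, weighted by BBR (percent)."""
    return sum(pred * w / 100.0 for pred, w in zip(per_breed_predictions, bbr))
```

For example, an animal with BBR of 50% for each of two breeds and per-breed predictions of 10 and 20 would receive a crossbred prediction of 15 under this sketch.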

Genomic prediction based on selected variants from imputed whole-genome sequence data in Australian sheep populations. Genet Sel Evol 2019;51:72. [PMID: 31805849 PMCID: PMC6896509 DOI: 10.1186/s12711-019-0514-2] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 06/03/2019] [Accepted: 11/25/2019] [Indexed: 12/13/2022] Open

Abstract

Background

Whole-genome sequence (WGS) data could contain information on genetic variants at or in high linkage disequilibrium with causative mutations that underlie the genetic variation of polygenic traits. Thus far, genomic prediction accuracy has shown limited increase when using such information in dairy cattle studies, in which one or few breeds with limited diversity predominate. The objective of our study was to evaluate the accuracy of genomic prediction in a multi-breed Australian sheep population of relatively less related target individuals, when using information on imputed WGS genotypes.

Methods

Between 9626 and 26,657 animals with phenotypes were available for nine economically important sheep production traits and all had WGS imputed genotypes. About 30% of the data were used to discover predictive single nucleotide polymorphisms (SNPs) based on a genome-wide association study (GWAS) and the remaining data were used for training and validation of genomic prediction. Prediction accuracy using selected variants from imputed sequence data was compared to that using a standard array of 50k SNP genotypes, thereby comparing genomic best linear unbiased prediction (GBLUP) and Bayesian methods (BayesR/BayesRC). Accuracy of genomic prediction was evaluated in two independent populations that were each lowly related to the training set, one being purebred Merino and the other crossbred Border Leicester x Merino sheep.

Results

A substantial improvement in prediction accuracy was observed when selected sequence variants were fitted alongside 50k genotypes as a separate variance component in GBLUP (2GBLUP) or in Bayesian analysis as a separate category of SNPs (BayesRC). From an average accuracy of 0.27 in both validation sets for the 50k array, the average absolute increase in accuracy across traits with 2GBLUP was 0.083 and 0.073 for purebred and crossbred animals, respectively, whereas with BayesRC it was 0.102 and 0.087. The average gain in accuracy was smaller when selected sequence variants were treated in the same category as 50k SNPs. Very little improvement over 50k prediction was observed when using all WGS variants.

Conclusions

Accuracy of genomic prediction in diverse sheep populations increased substantially by using variants selected from whole-genome sequence data based on an independent multi-breed GWAS, when compared to genomic prediction using standard 50K genotypes.

Xiang R, van den Berg I, MacLeod IM, Hayes BJ, Prowse-Wilkins CP, Wang M, Bolormaa S, Liu Z, Rochfort SJ, Reich CM, Mason BA, Vander Jagt CJ, Daetwyler HD, Lund MS, Chamberlain AJ, Goddard ME. Quantifying the contribution of sequence variants with regulatory and evolutionary significance to 34 bovine complex traits. Proc Natl Acad Sci U S A 2019;116:19398-19408. [PMID: 31501319 PMCID: PMC6765237 DOI: 10.1073/pnas.1904159116] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Indexed: 12/11/2022] Open

Abstract

Many genome variants shaping mammalian phenotype are hypothesized to regulate gene transcription and/or to be under selection. However, most of the evidence to support this hypothesis comes from human studies. Systematic evidence for regulatory and evolutionary signals contributing to complex traits in a different mammalian model is needed. Sequence variants associated with gene expression (expression quantitative trait loci [eQTLs]) and concentration of metabolites (metabolic quantitative trait loci [mQTLs]) and under histone-modification marks in several tissues were discovered from multiomics data of over 400 cattle. Variants under selection and evolutionary constraint were identified using genome databases of multiple species. These analyses defined 30 sets of variants, and for each set, we estimated the genetic variance the set explained across 34 complex traits in 11,923 bulls and 32,347 cows with 17,669,372 imputed variants. The per-variant trait heritability of these sets across traits was highly consistent (r > 0.94) between bulls and cows. Based on the per-variant heritability, conserved sites across 100 vertebrate species and mQTLs ranked the highest, followed by eQTLs, young variants, those under histone-modification marks, and selection signatures. From these results, we defined a Functional-And-Evolutionary Trait Heritability (FAETH) score indicating the functionality and predicted heritability of each variant. In additional 7,551 cattle, the high FAETH-ranking variants had significantly increased genetic variances and genomic prediction accuracies in 3 production traits compared to the low FAETH-ranking variants. The FAETH framework combines the information of gene regulation, evolution, and trait heritability to rank variants, and the publicly available FAETH data provide a set of biological priors for cattle genomic selection worldwide.

Affiliation(s)

Ruidong Xiang: Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia; Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Irene van den Berg: Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia; Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Iona M MacLeod: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Benjamin J Hayes: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia; Centre for Animal Science, The University of Queensland, St. Lucia, QLD 4067, Australia
Claire P Prowse-Wilkins: Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia; Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Min Wang: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
Sunduimijid Bolormaa: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Zhiqian Liu: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Simone J Rochfort: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
Coralie M Reich: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Brett A Mason: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Christy J Vander Jagt: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Hans D Daetwyler: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
Mogens S Lund: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark
Amanda J Chamberlain: Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
Michael E Goddard: Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia; Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia

GWAS for Meat and Carcass Traits Using Imputed Sequence Level Genotypes in Pooled F2-Designs in Pigs. G3 (Bethesda) 2019;9:2823-2834. [PMID: 31296617 PMCID: PMC6723123 DOI: 10.1534/g3.119.400452] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 02/06/2023]

Improvement of genomic prediction by integrating additional single nucleotide polymorphisms selected from imputed whole genome sequencing data. Heredity (Edinb) 2019;124:37-49. [PMID: 31278370 PMCID: PMC6906477 DOI: 10.1038/s41437-019-0246-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/30/2019] [Revised: 05/11/2019] [Accepted: 06/17/2019] [Indexed: 11/10/2022] Open

Al Kalaldeh M, Gibson J, Duijvesteijn N, Daetwyler HD, MacLeod I, Moghaddar N, Lee SH, van der Werf JHJ. Using imputed whole-genome sequence data to improve the accuracy of genomic prediction for parasite resistance in Australian sheep. Genet Sel Evol 2019;51:32. [PMID: 31242855 PMCID: PMC6595562 DOI: 10.1186/s12711-019-0476-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/05/2018] [Accepted: 06/18/2019] [Indexed: 01/16/2023] Open

Abstract

Background

This study aimed at (1) comparing the accuracies of genomic prediction for parasite resistance in sheep based on whole-genome sequence (WGS) data to those based on 50k and high-density (HD) single nucleotide polymorphism (SNP) panels; (2) investigating whether the use of variants within quantitative trait loci (QTL) regions that were selected from regional heritability mapping (RHM) in an independent dataset improved the accuracy more than variants selected from genome-wide association studies (GWAS); and (3) comparing the prediction accuracies between variants selected from WGS data to variants selected from the HD SNP panel.

Results

The accuracy of genomic prediction improved marginally from 0.16 ± 0.02 and 0.18 ± 0.01 when using all the variants from 50k and HD genotypes, respectively, to 0.19 ± 0.01 when using all the variants from WGS data. Fitting a GRM from the selected variants alongside a GRM from the 50k SNP genotypes improved the prediction accuracy substantially compared to fitting the 50k SNP genotypes alone. The gain in prediction accuracy was slightly more pronounced when variants were selected from WGS data compared to when variants were selected from the HD panel. When sequence variants that passed the GWAS −log10(p-value) threshold of 3 across the entire genome were selected, the prediction accuracy improved by 5% (up to 0.21 ± 0.01), whereas when selection was limited to sequence variants that passed the same GWAS −log10(p-value) threshold of 3 in regions identified by RHM, the accuracy improved by 9% (up to 0.25 ± 0.01).

Conclusions

Our results show that through careful selection of sequence variants from the QTL regions, the accuracy of genomic prediction for parasite resistance in sheep can be improved. These findings have important implications for genomic prediction in sheep.
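
The variant selection rule described in the Results of this abstract (a GWAS −log10(p-value) threshold of 3, applied either genome-wide or only inside regions identified by RHM) could be sketched as follows. The record layout, the function name, and the inclusive comparison at the threshold are hypothetical choices for illustration, not details taken from the study.

```python
import math


def select_variants(gwas, threshold=3.0, regions=None):
    """Return IDs of variants whose -log10(p) reaches the threshold.

    gwas    : iterable of (variant_id, chrom, pos, p_value) tuples
              (hypothetical layout).
    regions : optional list of (chrom, start, end) intervals, e.g. QTL
              regions from regional heritability mapping; when given,
              only variants inside one of them are kept.
    """
    selected = []
    for variant_id, chrom, pos, p_value in gwas:
        # A p-value that underflowed to 0 is treated as maximally significant.
        score = math.inf if p_value == 0 else -math.log10(p_value)
        if score < threshold:
            continue
        if regions is not None and not any(
            chrom == c and start <= pos <= end for c, start, end in regions
        ):
            continue
        selected.append(variant_id)
    return selected
```

Calling the function without `regions` corresponds to the genome-wide selection, and passing a list of intervals corresponds to the selection restricted to RHM regions.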

Ma P, Lund MS, Aamand GP, Su G. Use of a Bayesian model including QTL markers increases prediction reliability when test animals are distant from the reference population. J Dairy Sci 2019;102:7237-7247. [PMID: 31155255 DOI: 10.3168/jds.2018-15815] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/09/2018] [Accepted: 03/31/2019] [Indexed: 01/23/2023]

Abstract

Relatedness between reference and test animals has an important effect on the reliability of genomic prediction for test animals. Because genomic prediction has been widely applied in practical cattle breeding and bulls have been selected according to genomic breeding value without progeny testing, the sires or grandsires of candidates might not have phenotypic information and might not be in the reference population when the candidates are selected. The objective of this study was to investigate the decreasing trend of the reliability of genomic prediction given distant reference populations, using genomic best linear unbiased prediction (GBLUP) and Bayesian variable selection models with or without including the quantitative trait locus (QTL) markers detected from sequencing data. The data used in this study consisted of 22,242 bulls genotyped using the 54K SNP array from EuroGenomics. Among them, 1,444 Danish bulls born from 2006 to 2010 were selected as test animals. Different reference populations with varying relationships to test animals were created according to pedigree-based relationships. The reference individuals having a relationship with one or more test animals higher than 0.4 (scenario ρ < 0.4), 0.2 (ρ < 0.2), or 0.1 (ρ < 0.1, where ρ = relationship coefficient) were removed from reference sets; these represented the distance between reference and test animals being 2 generations, 3 generations, and 4 generations, respectively. Imputed whole-genome sequencing data of bulls from Denmark were used to conduct a genome-wide association study (GWAS). A small number of significant variants (QTL markers) from the GWAS were added to the array data. To compare the effects of different models, the basic GBLUP model, a Bayesian variable selection model, a GBLUP model with 2 components of genetic effects, and a Bayesian model with pooled array data and QTL markers were used for estimating genomic estimated breeding values (GEBV) of test animals. 
The reliability of genomic prediction decreased when the test animals were more generations away from the reference population. The reliability of genomic prediction was 0.461 for 1 generation away and 0.396 for 3 generations away, with the same number of individuals in the reference set, using a GBLUP model with chip markers only. The results showed that using the Bayesian method and QTL markers improved the reliability of genomic prediction in all scenarios of relationship between test and reference animals, with gains ranging from 1.3% to 65.1% (4 generations away with only 841 individuals in the reference set). However, most gains were for predictions of milk yield and fat yield. There was little improvement for predictions of protein yield and mastitis, and no improvement for prediction of fertility, except for scenario ρ < 0.1, in which there was a large improvement for predictions of all traits. On the other hand, models including more than 10% polygenic effect decreased prediction reliability when the relationship between test and reference animals was distant.

Bolormaa S, Chamberlain AJ, Khansefid M, Stothard P, Swan AA, Mason B, Prowse-Wilkins CP, Duijvesteijn N, Moghaddar N, van der Werf JH, Daetwyler HD, MacLeod IM. Accuracy of imputation to whole-genome sequence in sheep. Genet Sel Evol 2019;51:1. [PMID: 30654735 PMCID: PMC6337865 DOI: 10.1186/s12711-018-0443-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 06/19/2018] [Accepted: 12/18/2018] [Indexed: 12/12/2022] Open

Abstract

Background

The use of whole-genome sequence (WGS) data for genomic prediction and association studies is highly desirable because the causal mutations should be present in the data. The sequencing of 935 sheep from a range of breeds provides the opportunity to impute sheep genotyped with single nucleotide polymorphism (SNP) arrays to WGS. This study evaluated the accuracy of imputation from SNP genotypes to WGS using this reference population of 935 sequenced sheep.

Results

The accuracy of imputation from the Ovine Infinium® HD BeadChip SNP (~500 k) to WGS was assessed for three target breeds: Merino, Poll Dorset and F1 Border Leicester × Merino. Imputation accuracy was highest for the Poll Dorset breed, although there were more Merino individuals in the sequenced reference population than Poll Dorset individuals. In addition, empirical imputation accuracies were higher (by up to 1.7%) when using larger multi-breed reference populations compared to using a smaller single-breed reference population. The mean accuracy of imputation across target breeds using the Minimac3 or the FImpute software was 0.94. The empirical imputation accuracy varied considerably across the genome; six chromosomes carried regions of one or more Mb with a mean imputation accuracy of < 0.7. Imputation accuracy in five variant annotation classes ranged from 0.87 (missense) up to 0.94 (intronic variants), where lower accuracy corresponded to higher proportions of rare alleles. The imputation quality statistic reported from Minimac3 (R²) had a clear positive relationship with the empirical imputation accuracy. Therefore, by first discarding imputed variants with an R² below 0.4, the mean empirical accuracy across target breeds increased to 0.97. Although accuracy of genomic prediction was less affected by filtering on R² in a multi-breed population of sheep with imputed WGS, the genomic heritability clearly tended to be lower when using variants with an R² ≤ 0.4.

Conclusions

The mean imputation accuracy was high for all target breeds and was increased by combining smaller breed sets into a multi-breed reference. We found that the Minimac3 software imputation quality statistic (R²) was a useful indicator of empirical imputation accuracy, enabling removal of very poorly imputed variants before downstream analyses.
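
The quality filter mentioned in this abstract (discarding imputed variants with a Minimac3 R² below 0.4 before downstream analyses) amounts to a simple threshold. The sketch below assumes the R² values have already been read into (variant ID, R²) pairs; the function name and data layout are hypothetical and it does not parse any Minimac3 output file.

```python
def filter_by_imputation_r2(variants, min_r2=0.4):
    """Keep variants whose imputation quality R² is at least min_r2.

    variants : iterable of (variant_id, r2) pairs (hypothetical layout).
    Variants with R² below the cutoff are discarded, as in the abstract.
    """
    return [variant_id for variant_id, r2 in variants if r2 >= min_r2]
```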

Electronic supplementary material

The online version of this article (10.1186/s12711-018-0443-5) contains supplementary material, which is available to authorized users.

Affiliation(s)

Sunduimijid Bolormaa: Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia; Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
Amanda J Chamberlain: Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
Majid Khansefid: Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia; Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
Paul Stothard: Faculty of Agricultural, Life and Environmental Sciences, University of Alberta, Edmonton, AB, T6G 2R3, Canada
Andrew A Swan: Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; Animal Genetics and Breeding Unit, University of New England, Armidale, NSW, 2351, Australia
Brett Mason: Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
Claire P Prowse-Wilkins: Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
Naomi Duijvesteijn: Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
Nasir Moghaddar: Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
Julius H van der Werf: Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
Hans D Daetwyler: Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia; Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3086, Australia
Iona M MacLeod: Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia; Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia

Zhang Q, Sahana G, Su G, Guldbrandtsen B, Lund MS, Calus MPL. Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle. Genet Sel Evol 2018;50:62. [PMID: 30458700 PMCID: PMC6247626 DOI: 10.1186/s12711-018-0432-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/12/2018] [Accepted: 11/14/2018] [Indexed: 11/05/2022] Open

Abstract

Background

Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants, on the reliability of genomic prediction for fertility, health and longevity in dairy cattle.

Results

All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by −2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for 10% additional genetic variance of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%.

Conclusions

Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV.

Electronic supplementary material

The online version of this article (10.1186/s12711-018-0432-8) contains supplementary material, which is available to authorized users.

Raymond B, Bouwman AC, Wientjes YCJ, Schrooten C, Houwing-Duistermaat J, Veerkamp RF. Genomic prediction for numerically small breeds, using models with pre-selected and differentially weighted markers. Genet Sel Evol 2018;50:49. [PMID: 30314431 PMCID: PMC6186145 DOI: 10.1186/s12711-018-0419-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 06/20/2018] [Accepted: 10/01/2018] [Indexed: 01/22/2023] Open

Abstract

BACKGROUND

Genomic prediction (GP) accuracy in numerically small breeds is limited by the small size of the reference population. Our objective was to test a multi-breed multiple genomic relationship matrices (GRM) GP model (MBMG) that weighs pre-selected markers separately, uses the remaining markers to explain the remaining genetic variance that can be explained by markers, and weighs information of breeds in the reference population by their genetic correlation with the validation breed.

METHODS

Genotype and phenotype data were used on 595 Jersey bulls from New Zealand and 5503 Holstein bulls from the Netherlands, all with deregressed proofs for stature. Different sets of markers were used, containing either pre-selected markers from a meta-genome-wide association analysis on stature, remaining markers or both. We implemented a multi-breed bivariate GREML model in which we fitted either a single multi-breed GRM (MBSG), or two distinct multi-breed GRM (MBMG), one made with pre-selected markers and the other with remaining markers. Accuracies of predicting stature for Jersey individuals using the multi-breed models (Holstein and Jersey combined reference population) were compared to those obtained using either the Jersey (within-breed) or Holstein (across-breed) reference population. All the models were subsequently fitted in the analysis of simulated phenotypes, with a simulated genetic correlation between breeds of 1, 0.5, and 0.25.

RESULTS

The MBMG model always gave better prediction accuracies for stature compared to MBSG, within-, and across-breed GP models. For example, with MBSG, accuracies obtained by fitting 48,912 unselected markers (0.43), 357 pre-selected markers (0.38) or a combination of both (0.43), were lower than accuracies obtained by fitting pre-selected and unselected markers in separate GRM in MBMG (0.49). This improvement was further confirmed by results from a simulation study, with MBMG performing on average 23% better than MBSG with all markers fitted.

CONCLUSIONS

With the MBMG model, it is possible to use information from numerically large breeds to improve prediction accuracy of numerically small breeds. The superiority of MBMG is mainly due to its ability to use information on pre-selected markers, explain the remaining genetic variance and weigh information from a different breed by the genetic correlation between breeds.

Calus MPL, Goddard ME, Wientjes YCJ, Bowman PJ, Hayes BJ. Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection. J Dairy Sci 2018;101:4279-4294. [PMID: 29550121 DOI: 10.3168/jds.2017-13366] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/20/2017] [Accepted: 01/04/2018] [Indexed: 11/19/2022]

Abstract

Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, are likely the result of the observed admixture between both breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.

Werner CR, Qian L, Voss-Fels KP, Abbadi A, Leckband G, Frisch M, Snowdon RJ. Genome-wide regression models considering general and specific combining ability predict hybrid performance in oilseed rape with similar accuracy regardless of trait architecture. Theor Appl Genet 2018;131:299-317. [PMID: 29080901 DOI: 10.1007/s00122-017-3002-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 04/26/2017] [Accepted: 10/09/2017] [Indexed: 05/02/2023]

Abstract

Genomic prediction using the Brassica 60 k genotyping array is efficient in oilseed rape hybrids. Prediction accuracy is more dependent on trait complexity than on the prediction model. In oilseed rape breeding programs, performance prediction of parental combinations is of fundamental importance. Due to the phenomenon of heterosis, per se performance is not a reliable indicator for F₁-hybrid performance, and selection of well-paired parents requires the testing of large quantities of hybrid combinations in extensive field trials. However, the number of potential hybrids, in general, dramatically exceeds breeding capacity and budget. Integration of genomic selection (GS) could substantially increase the number of potential combinations that can be evaluated. GS models can be used to predict the performance of untested individuals based only on their genotypic profiles, using marker effects previously predicted in a training population. This allows for a preselection of promising genotypes, enabling a more efficient allocation of resources. In this study, we evaluated the usefulness of the Illumina Brassica 60 k SNP array for genomic prediction and compared three alternative approaches based on a homoscedastic ridge regression BLUP and three Bayesian prediction models that considered general and specific combining ability (GCA and SCA, respectively). A total of 448 hybrids were produced in a commercial breeding program from unbalanced crosses between 220 paternal doubled haploid lines and five male-sterile testers. Predictive ability was evaluated for seven agronomic traits. We demonstrate that the Brassica 60 k genotyping array is an adequate and highly valuable platform to implement genomic prediction of hybrid performance in oilseed rape. Furthermore, we present first insights into the application of established statistical models for prediction of important agronomical traits with contrasting patterns of polygenic control.

Which Individuals To Choose To Update the Reference Population? Minimizing the Loss of Genetic Diversity in Animal Genomic Selection Programs. G3 (Bethesda) 2018;8:113-121. [PMID: 29133511 PMCID: PMC5765340 DOI: 10.1534/g3.117.1117] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/14/2023]

Pausch H, Emmerling R, Gredler-Grandl B, Fries R, Daetwyler HD, Goddard ME. Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution. BMC Genomics 2017;18:853. [PMID: 29121857 PMCID: PMC5680815 DOI: 10.1186/s12864-017-4263-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 06/02/2017] [Accepted: 11/02/2017] [Indexed: 11/25/2022]

Abstract

Background

Genotyping and whole-genome sequencing data have been generated for hundreds of thousands of cattle. International consortia used these data to compile imputation reference panels that facilitate the imputation of sequence variant genotypes for animals that have been genotyped using dense microarrays. Association studies with imputed sequence variant genotypes allow for the characterization of quantitative trait loci (QTL) at nucleotide resolution particularly when individuals from several breeds are included in the mapping populations.

Results

We imputed genotypes for 28 million sequence variants in 17,229 cattle of the Braunvieh, Fleckvieh and Holstein breeds in order to compile large mapping populations that provide high power to identify QTL for milk production traits. Association tests between imputed sequence variant genotypes and fat and protein percentages in milk uncovered between six and thirteen QTL (P < 1e-8) per breed. Eight of the detected QTL were significant in more than one breed. We combined the results across breeds using meta-analysis and identified a total of 25 QTL including six that were not significant in the within-breed association studies. Two missense mutations in the ABCG2 (p.Y581S, rs43702337, P = 4.3e-34) and GHR (p.F279Y, rs385640152, P = 1.6e-74) genes were the top variants at QTL on chromosomes 6 and 20. Another known causal missense mutation in the DGAT1 gene (p.A232K, rs109326954, P = 8.4e-1436) was the second top variant at a QTL on chromosome 14 but its allelic substitution effects were inconsistent across breeds. It turned out that the conflicting allelic substitution effects resulted from flaws in the imputed genotypes due to the use of a multi-breed reference population for genotype imputation.

Conclusions

Many QTL for milk production traits segregate across breeds and across-breed meta-analysis has greater power to detect such QTL than within-breed association testing. Association testing between imputed sequence variant genotypes and phenotypes of interest facilitates identifying causal mutations provided the accuracy of imputation is high. However, true causal mutations may remain undetected when the imputed sequence variant genotypes contain flaws. It is highly recommended to validate the effect of known causal variants in order to assess the ability to detect true causal mutations in association studies with imputed sequence variants.
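As a rough illustration of why pooling evidence across breeds can reveal QTL that no single breed detects, the sketch below combines per-breed test statistics with a sample-size-weighted z-score (Stouffer) meta-analysis. The abstract does not state which meta-analysis method was used, so this choice is an assumption, as are the function names and all numbers, which are hypothetical.

```python
import math


def stouffer_meta(z_scores, sample_sizes):
    """Combine per-breed z-scores, weighting each by sqrt(sample size)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided P value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical z-scores for one variant in three breeds (not from the paper).
z_breeds = [4.0, 4.5, 5.0]
n_breeds = [1000, 6000, 9000]  # hypothetical sample sizes

z_meta = stouffer_meta(z_breeds, n_breeds)
p_meta = two_sided_p(z_meta)

for z in z_breeds:
    print(f"within-breed z = {z:.1f}, P = {two_sided_p(z):.2e}")
print(f"meta-analysis z = {z_meta:.2f}, P = {p_meta:.2e}")
```

With these made-up inputs, each breed on its own falls short of the P < 1e-8 threshold, while the combined statistic passes it because the effect has the same sign in all three breeds.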

Evaluation of the potential use of a meta-population for genomic selection in autochthonous beef cattle populations. Animal 2017;12:1350-1357. [PMID: 29094666 DOI: 10.1017/s175173111700283x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]

van den Berg I, Bowman PJ, MacLeod IM, Hayes BJ, Wang T, Bolormaa S, Goddard ME. Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genet Sel Evol 2017;49:70. [PMID: 28934948 PMCID: PMC5609075 DOI: 10.1186/s12711-017-0347-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 04/04/2017] [Accepted: 09/13/2017] [Indexed: 11/26/2022]

Abstract

Background

The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows.

Results

With the simulated dataset, prediction accuracy for a breed that was not represented in the training population increased substantially with sequence data compared to HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. However, first dropping variants from each chromosome and then reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs.
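The two-stage strategy in these results (reduce each chromosome separately, then analyse the survivors jointly) can be sketched as a simple filtering step. The abstract does not give the retention criterion, so ranking by absolute estimated effect is an assumption here, and the function name, the `keep_fraction` parameter, and the toy effects are hypothetical.

```python
def retain_top_variants(effects_by_chrom, keep_fraction):
    """Keep the largest-effect variants within each chromosome.

    effects_by_chrom maps a chromosome label to a list of
    (variant_id, estimated_effect) pairs from a per-chromosome analysis.
    Returns (chromosome, variant_id) pairs to pass to a joint reanalysis.
    """
    retained = []
    for chrom, effects in effects_by_chrom.items():
        n_keep = max(1, round(len(effects) * keep_fraction))
        # Assumed criterion: rank by absolute effect size, largest first.
        ranked = sorted(effects, key=lambda pair: abs(pair[1]), reverse=True)
        retained.extend((chrom, variant_id) for variant_id, _ in ranked[:n_keep])
    return retained


# Hypothetical per-chromosome effect estimates (not from the paper).
toy_effects = {
    "1": [("v1", 0.01), ("v2", -0.5), ("v3", 0.2), ("v4", 0.0)],
    "2": [("v5", 0.3), ("v6", -0.02)],
}
retained = retain_top_variants(toy_effects, keep_fraction=0.5)
print(retained)
```

The retained variants from all chromosomes would then be fitted together in a single model, which is the step the results credit with recovering whole-sequence accuracy.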

Conclusions

We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.

Veerkamp RF, Bouwman AC, Schrooten C, Calus MPL. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle. Genet Sel Evol 2016;48:95. [PMID: 27905878 PMCID: PMC5134274 DOI: 10.1186/s12711-016-0274-1] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 07/14/2016] [Accepted: 11/24/2016] [Indexed: 11/10/2022]

Abstract

Background

Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data.

Methods

Phenotypes were available for 5503 Holstein–Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants.
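These methods rely on genomic relationship matrices built from subsets of variants. A minimal sketch of one common construction follows (VanRaden's first method, G = ZZ' / (2 Σ p(1 − p)), where Z holds allele counts centred by twice the allele frequency). The abstract does not say which formula was used, and GCTA's own GRM standardises each variant separately, so the formula, the function name, and the toy genotypes are all assumptions.

```python
def vanraden_grm(genotypes):
    """Build a genomic relationship matrix from 0/1/2 allele counts.

    genotypes: list of animals, each a list of allele counts per variant.
    Allele frequencies are estimated from the genotypes themselves.
    """
    n_animals = len(genotypes)
    n_variants = len(genotypes[0])
    freqs = [
        sum(row[j] for row in genotypes) / (2 * n_animals)
        for j in range(n_variants)
    ]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    # Centre each allele count by its expected value 2p.
    centred = [
        [row[j] - 2 * freqs[j] for j in range(n_variants)]
        for row in genotypes
    ]
    return [
        [
            sum(a * b for a, b in zip(centred[i], centred[k])) / scale
            for k in range(n_animals)
        ]
        for i in range(n_animals)
    ]


# Hypothetical genotypes: 4 animals by 3 variants (not from the paper).
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
grm = vanraden_grm(toy_genotypes)
for row in grm:
    print([round(value, 3) for value in row])
```

Building one such matrix from the GWAS-selected variants and another from the remaining variants gives the two GRM that the final step of the methods fits together.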

Results

The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased.

Conclusions

Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
