1. van den Berg I, Chamberlain AJ, MacLeod IM, Nguyen TV, Goddard ME, Xiang R, Mason B, Meier S, Phyn CVC, Burke CR, Pryce JE. Using expression data to fine map QTL associated with fertility in dairy cattle. Genet Sel Evol 2024; 56:42. [PMID: 38844868 PMCID: PMC11154999 DOI: 10.1186/s12711-024-00912-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 05/13/2024] [Indexed: 06/09/2024] Open
Abstract
BACKGROUND Female fertility is an important trait in dairy cattle. Identifying putative causal variants associated with fertility may help to improve the accuracy of genomic prediction of fertility. Combining expression data (eQTL) of genes, exons, gene splicing and allele specific expression is a promising approach to fine map QTL to get closer to the causal mutations. Another approach is to identify genomic differences between cows selected for high and low fertility and a selection experiment in New Zealand has created exactly this resource. Our objective was to combine multiple types of expression data, fertility traits and allele frequency in high- (POS) and low-fertility (NEG) cows with a genome-wide association study (GWAS) on calving interval in Australian cows to fine-map QTL associated with fertility in both Australia and New Zealand dairy cattle populations. RESULTS Variants that were significantly associated with calving interval (CI) were strongly enriched for variants associated with gene, exon, gene splicing and allele-specific expression, indicating that there is substantial overlap between QTL associated with CI and eQTL. We identified 671 genes with significant differential expression between POS and NEG cows, with the largest fold change detected for the CCDC196 gene on chromosome 10. Our results provide numerous candidate genes associated with female fertility in dairy cattle, including GYS2 and TIGAR on chromosome 5 and SYT3 and HSD17B14 on chromosome 18. Multiple QTL regions were located in regions with large numbers of copy number variants (CNV). To identify the causal mutations for these variants, long read sequencing may be useful. CONCLUSIONS Variants that were significantly associated with CI were highly enriched for eQTL. We detected 671 genes that were differentially expressed between POS and NEG cows. Several QTL detected for CI overlapped with eQTL, providing candidate genes for fertility in dairy cattle.
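The enrichment statement in this abstract can be made concrete with a small calculation. The sketch below is only an illustration: the function names and all counts are hypothetical, and the hypergeometric test is one common way to assess overlap between two variant sets, not necessarily the test used in the study.

```python
from math import comb


def fold_enrichment(n_total, n_annotated, n_significant, n_overlap):
    """Ratio of the eQTL share among significant variants to the eQTL share overall."""
    background = n_annotated / n_total
    observed = n_overlap / n_significant
    return observed / background


def hypergeom_upper_tail(n_total, n_annotated, n_significant, n_overlap):
    """P(overlap >= n_overlap) when n_significant variants are drawn at random."""
    upper = min(n_annotated, n_significant)
    numerator = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / comb(n_total, n_significant)


# Hypothetical counts: 1000 variants tested, 100 are eQTL,
# 50 are significant for the trait, and 20 of those 50 are eQTL.
fold = fold_enrichment(1000, 100, 50, 20)
p_value = hypergeom_upper_tail(1000, 100, 50, 20)
```

With these made-up counts, 40% of the significant variants are eQTL against a background of 10%, so the fold enrichment is 4.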
Affiliation(s)
- Irene van den Berg
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
- Amanda J Chamberlain
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
- Tuan V Nguyen
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
- Mike E Goddard
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
- Ruidong Xiang
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
- Brett Mason
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
- Jennie E Pryce
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
2. Id-Lahoucine S, Cánovas A, Legarra A, Casellas J. Transmission ratio distortion regions in the context of genomic evaluation and their effects on reproductive traits in cattle. J Dairy Sci 2023; 106:7786-7798. [PMID: 37210358 DOI: 10.3168/jds.2022-23062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 04/19/2023] [Indexed: 05/22/2023]
Abstract
Transmission ratio distortion (TRD), which is a deviation from Mendelian expectations, has been associated with basic mechanisms of life such as sperm and ova fertility and viability at developmental stages of the reproductive cycle. In this study, different models including TRD regions were tested for different reproductive traits [days from first service to conception (FSTC), number of services, first service nonreturn rate (NRR), and stillbirth (SB)]. Thus, in addition to a basic model with systematic and random effects, including genetic effects modeled through a genomic relationship matrix, we developed 2 additional models: one including a second genomic relationship matrix based on TRD regions, and one including TRD regions as a random effect assuming heterogeneous variances. The analyses were performed with 10,623 cows and 1,520 bulls genotyped for 47,910 SNPs, 590 TRD regions, and several records ranging from 9,587 (FSTC) to 19,667 (SB). The results of this study showed the ability of TRD regions to capture some additional genetic variance for some traits; however, this did not translate into higher accuracy for genomic prediction. This could be explained by the nature of TRD itself, which may arise in different stages of the reproductive cycle. Nevertheless, important effects of TRD regions were found on SB (31 regions) and NRR (18 regions) when comparing at-risk versus control matings, especially for regions with an allelic TRD pattern. In particular, for NRR, the probability of observing a nonpregnant cow increased by up to 27% for specific TRD regions, and the probability of observing stillbirth increased by up to 254%. These results support the relevance of several TRD regions for some reproductive traits, especially those with allelic patterns, which have not received as much attention as recessive TRD patterns.
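As a minimal illustration of what "a deviation from Mendelian expectations" means, the sketch below tests whether heterozygous parents transmit their two alleles to offspring at the expected 1:1 ratio. The function name and the counts are hypothetical, and this simple chi-square test is an assumption for illustration, not the TRD model used in the study.

```python
import math


def trd_chi_square(n_allele_a, n_allele_b):
    """Chi-square test (1 df) of 1:1 allele transmission from heterozygous parents."""
    total = n_allele_a + n_allele_b
    if total == 0:
        raise ValueError("no informative offspring")
    expected = total / 2.0
    chi2 = ((n_allele_a - expected) ** 2 + (n_allele_b - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 200 offspring of heterozygous sires,
# 130 inherited allele A and 70 inherited allele B.
chi2, p_value = trd_chi_square(130, 70)
```

A small p-value here only flags a departure from 1:1 transmission; it says nothing about whether the pattern is allelic or recessive.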
Affiliation(s)
- S Id-Lahoucine
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph N1G 2W1, ON, Canada
- A Cánovas
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph N1G 2W1, ON, Canada
- A Legarra
  - INRAE, UR631 SAGA, BP 52627, 32326 Castanet-Tolosan, France
- J Casellas
  - Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, Bellaterra 08193, Barcelona, Spain
3. Calderón-Chagoya R, Vega-Murillo VE, García-Ruiz A, Ríos-Utrera Á, Martínez-Velázquez G, Montaño-Bermúdez M. Discovering Genomic Regions Associated with Reproductive Traits and Frame Score in Mexican Simmental and Simbrah Cattle Using Individual SNP and Haplotype Markers. Genes (Basel) 2023; 14:2004. [PMID: 38002947 PMCID: PMC10671695 DOI: 10.3390/genes14112004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/11/2023] [Accepted: 10/20/2023] [Indexed: 11/26/2023] Open
Abstract
Reproductive efficiency stands as a critical determinant of profitability within beef production systems. The incorporation of molecular markers can expedite advancements in reproductive performance. While the use of SNPs in association analysis is prevalent, approaches centered on haplotypes can offer a more comprehensive insight. The study used registered Simmental and Simbrah cattle genotyped with the GGP Bovine 150 k panel. Phenotypes included scrotal circumference (SC), heifer fertility (HF), stayability (STAY), and frame score (FS). After quality control, 105,129 autosomal SNPs from 967 animals were used. Haplotype blocks were defined based on linkage disequilibrium. Comparison between haplotypes and SNPs for reproductive traits and FS was conducted using Bayesian and frequentist models. 23, 13, 7, and 2 SNPs exhibited associations with FS, SC, HF, and STAY, respectively. In addition, seven, eight, seven, and one haplotypes displayed associations with FS, SC, HF, and STAY, respectively. Within these delineated genomic segments, potential candidate genes were associated.
Affiliation(s)
- René Calderón-Chagoya
  - Faculty of Veterinary Medicine and Zootechnics, National Autonomous University of Mexico, Ciudad de México 04510, Mexico
  - National Center for Disciplinary Research in Physiology and Animal Improvement, National Institute for Forestry, Agricultural and Livestock Research, Querétaro 76280, Mexico
- Vicente Eliezer Vega-Murillo
  - Faculty of Veterinary Medicine and Zootechnics, Veracruzana University, Veracruz 91710, Mexico
- Adriana García-Ruiz
  - National Center for Disciplinary Research in Physiology and Animal Improvement, National Institute for Forestry, Agricultural and Livestock Research, Querétaro 76280, Mexico
- Ángel Ríos-Utrera
  - Faculty of Veterinary Medicine and Zootechnics, Veracruzana University, Veracruz 91710, Mexico
- Guillermo Martínez-Velázquez
  - Experimental Field Santiago Ixcuintla, National Institute for Forestry, Agricultural and Livestock Research, Nayarit 63570, Mexico
- Moisés Montaño-Bermúdez
  - National Center for Disciplinary Research in Physiology and Animal Improvement, National Institute for Forestry, Agricultural and Livestock Research, Querétaro 76280, Mexico
4. Valente BD, de los Campos G, Grueneberg A, Chen CY, Ros-Freixedes R, Herring WO. Using residual regressions to quantify and map signal leakage in genomic prediction. Genet Sel Evol 2023; 55:57. [PMID: 37550618 PMCID: PMC10405418 DOI: 10.1186/s12711-023-00830-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Accepted: 07/12/2023] [Indexed: 08/09/2023] Open
Abstract
BACKGROUND Most genomic prediction applications in animal breeding use genotypes with tens of thousands of single nucleotide polymorphisms (SNPs). However, modern sequencing technologies and imputation algorithms can generate ultra-high-density genotypes (including millions of SNPs) at an affordable cost. Empirical studies have not produced clear evidence that using ultra-high-density genotypes can significantly improve prediction accuracy. However, (whole-genome) prediction accuracy is not very informative about the ability of a model to capture the genetic signals from specific genomic regions. To address this problem, we propose a simple methodology that detects chromosome regions for which a specific model (e.g., single-step genomic best linear unbiased prediction (ssGBLUP)) may fail to fully capture the genetic signal present in such segments, a phenomenon that we refer to as signal leakage. We propose to detect regions with evidence of signal leakage by testing the association of residuals from a pedigree or a genomic model with SNP genotypes. We discuss how this approach can be used to map regions with signals that are poorly captured by a model and to identify strategies to fix those problems (e.g., using a different prior or increasing marker density). Finally, we explored the proposed approach to scan for signal leakage of different models (pedigree-based, ssGBLUP, and various Bayesian models) applied to growth-related phenotypes (average daily gain and backfat thickness) in pigs. RESULTS We report widespread evidence of signal leakage for pedigree-based models. Including a percentage of animals with SNP data in ssGBLUP reduced the extent of signal leakage. However, local peaks of missed signals remained in some regions, even when all animals were genotyped. Using variable selection priors solves leakage points that are caused by excessive shrinkage of marker effects. 
Nevertheless, these models still miss signals in some regions due to low linkage disequilibrium between the SNPs on the array used and causal variants. Thus, we discuss how such problems could be addressed by adding sequence SNPs from those regions to the prediction model. CONCLUSIONS Residual single-marker regression analysis is a simple approach that can be used to detect regional genomic signals that are poorly captured by a model and to indicate ways to fix such problems.
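The idea of testing residuals against SNP genotypes can be sketched as a per-SNP simple linear regression. The code below is a minimal illustration under stated assumptions: the function name, the 0/1/2 genotype coding, and the toy data are hypothetical, and the ordinary least-squares t statistic shown here is not claimed to be the exact test used by the authors.

```python
import math


def residual_marker_scan(residuals, genotypes):
    """Regress model residuals on each SNP (coded 0/1/2).

    residuals: one residual per animal, taken from an already fitted model.
    genotypes: list of SNPs, each a list of genotype codes, one per animal.
    Returns a list of (slope, t_statistic) tuples, one per SNP.
    """
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three animals")
    mean_r = sum(residuals) / n
    results = []
    for snp in genotypes:
        mean_g = sum(snp) / n
        sxx = sum((g - mean_g) ** 2 for g in snp)
        if sxx == 0:
            # Monomorphic SNP: no information about association.
            results.append((0.0, 0.0))
            continue
        sxy = sum((g - mean_g) * (r - mean_r) for g, r in zip(snp, residuals))
        slope = sxy / sxx
        intercept = mean_r - slope * mean_g
        sse = sum((r - intercept - slope * g) ** 2 for g, r in zip(snp, residuals))
        se = math.sqrt(sse / (n - 2) / sxx)
        if se > 0:
            t_stat = slope / se
        else:
            # Perfect fit: infinite statistic unless the slope itself is zero.
            t_stat = math.copysign(math.inf, slope) if slope != 0 else 0.0
        results.append((slope, t_stat))
    return results


# Toy data: six animals, residuals from a hypothetical fitted model.
toy_residuals = [-1.0, -0.8, 0.1, -0.1, 0.9, 1.1]
toy_genotypes = [
    [0, 0, 1, 1, 2, 2],  # tracks the residuals: suggests leaked signal
    [2, 0, 1, 1, 0, 2],  # unrelated to the residuals
]
scan = residual_marker_scan(toy_residuals, toy_genotypes)
```

A large absolute t statistic at a SNP suggests that the fitted model left signal from that region in its residuals.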
Affiliation(s)
- Gustavo de los Campos
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI USA
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI USA
- Alexander Grueneberg
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI USA
- Ching-Yi Chen
  - The Pig Improvement Company, Genus Plc, Hendersonville, TN USA
- Roger Ros-Freixedes
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
  - Departament de Ciència Animal, Universitat de Lleida-Agrotecnio-CERCA Center, Lleida, Spain
5. Jang S, Ros-Freixedes R, Hickey JM, Chen CY, Herring WO, Holl J, Misztal I, Lourenco D. Multi-line ssGBLUP evaluation using preselected markers from whole-genome sequence data in pigs. Front Genet 2023; 14:1163626. [PMID: 37252662 PMCID: PMC10213539 DOI: 10.3389/fgene.2023.1163626] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/11/2023] [Accepted: 05/03/2023] [Indexed: 05/31/2023] Open
Abstract
Genomic evaluations in pigs could benefit from using multi-line data along with whole-genome sequencing (WGS) if the data are large enough to represent the variability across populations. The objective of this study was to investigate strategies to combine large-scale data from different terminal pig lines in a multi-line genomic evaluation (MLE) through single-step GBLUP (ssGBLUP) models while including variants preselected from whole-genome sequence (WGS) data. We investigated single-line and multi-line evaluations for five traits recorded in three terminal lines. The number of sequenced animals in each line ranged from 731 to 1,865, with 60k to 104k imputed to WGS. Unknown parent groups (UPG) and metafounders (MF) were explored to account for genetic differences among the lines and improve the compatibility between pedigree and genomic relationships in the MLE. Sequence variants were preselected based on multi-line genome-wide association studies (GWAS) or linkage disequilibrium (LD) pruning. These preselected variant sets were used for ssGBLUP predictions without and with weights from BayesR, and the performances were compared to that of a commercial porcine single-nucleotide polymorphisms (SNP) chip. Using UPG and MF in MLE showed small to no gain in prediction accuracy (up to 0.02), depending on the lines and traits, compared to the single-line genomic evaluation (SLE). Likewise, adding selected variants from the GWAS to the commercial SNP chip resulted in a maximum increase of 0.02 in the prediction accuracy, only for average daily feed intake in the most numerous lines. In addition, no benefits were observed when using preselected sequence variants in multi-line genomic predictions. Weights from BayesR did not help improve the performance of ssGBLUP. This study revealed limited benefits of using preselected whole-genome sequence variants for multi-line genomic predictions, even when tens of thousands of animals had imputed sequence data. 
Correctly accounting for line differences with UPG or MF in MLE is essential to obtain predictions similar to SLE; however, the only observed benefit of an MLE is to have comparable predictions across lines. Further investigation into the amount of data and novel methods to preselect whole-genome causative variants in combined populations would be of significant interest.
Affiliation(s)
- Sungbong Jang
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
- Roger Ros-Freixedes
  - Departament de Ciència Animal, Universitat de Lleida-Agrotecnio-CERCA Center, Lleida, Spain
- John M Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, Scotland, United Kingdom
- Ching-Yi Chen
  - The Pig Improvement Company, Genus plc, Hendersonville, TN, United States
- William O Herring
  - The Pig Improvement Company, Genus plc, Hendersonville, TN, United States
- Justin Holl
  - The Pig Improvement Company, Genus plc, Hendersonville, TN, United States
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
6. Jones HE, Wilson PB. Progress and opportunities through use of genomics in animal production. Trends Genet 2022; 38:1228-1252. [PMID: 35945076 DOI: 10.1016/j.tig.2022.06.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2022] [Revised: 06/08/2022] [Accepted: 06/17/2022] [Indexed: 01/24/2023]
Abstract
The rearing of farmed animals is a vital component of global food production systems, but its impact on the environment, human health, animal welfare, and biodiversity is being increasingly challenged. Developments in genetic and genomic technologies have had a key role in improving the productivity of farmed animals for decades. Advances in genome sequencing, annotation, and editing offer a means not only to continue that trend, but also, when combined with advanced data collection, analytics, cloud computing, appropriate infrastructure, and regulation, to take precision livestock farming (PLF) and conservation to an advanced level. Such an approach could generate substantial additional benefits in terms of reducing use of resources, health treatments, and environmental impact, while also improving animal health and welfare.
Affiliation(s)
- Huw E Jones
  - UK Genetics for Livestock and Equines (UKGLE) Committee, Department for Environment, Food and Rural Affairs, Nobel House, 17 Smith Square, London, SW1P 3JR, UK
  - Nottingham Trent University, Brackenhurst Campus, Brackenhurst Lane, Southwell, NG25 0QF, UK
- Philippe B Wilson
  - UK Genetics for Livestock and Equines (UKGLE) Committee, Department for Environment, Food and Rural Affairs, Nobel House, 17 Smith Square, London, SW1P 3JR, UK
  - Nottingham Trent University, Brackenhurst Campus, Brackenhurst Lane, Southwell, NG25 0QF, UK
7. Ribeiro G, Baldi F, Cesar ASM, Alexandre PA, Peripolli E, Ferraz JBS, Fukumasu H. Detection of potential functional variants based on systems-biology: the case of feed efficiency in beef cattle. BMC Genomics 2022; 23:774. [PMID: 36434498 PMCID: PMC9700932 DOI: 10.1186/s12864-022-08958-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Accepted: 10/20/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Potential functional variants (PFVs) can be defined as genetic variants responsible for a given phenotype. Ultimately, these are the best DNA markers for animal breeding and selection, especially for polygenic and complex phenotypes. Herein, we describe the identification of PFVs for complex phenotypes (in this case, feed efficiency in beef cattle) using a systems-biology-driven approach based on RNA-seq data from physiologically relevant organs. RESULTS Systems biology coupled with deep molecular phenotyping by RNA-seq of liver, muscle, hypothalamus, pituitary, and adrenal glands of animals with high and low feed efficiency (FE) measured by residual feed intake (RFI) identified 2,000,936 unique variants. Among them, 9986 variants were significantly associated with FE, and only 78 had a high impact on protein expression and were considered as PFVs. A set of 169 significant unique variants was expressed in all five organs; however, only 27 of these variants had a moderate impact and none of them had a high impact on protein expression. These results provide evidence of tissue-specific effects of high-impact PFVs. The PFVs were enriched (FDR < 0.05) for processing and presentation of MHC Class I and II mediated antigens, which are an important part of the adaptive immune response. The experimental validation of these PFVs was demonstrated by the increased prediction accuracy for RFI using the weighted G matrix (ssGBLUP+wG; Acc = 0.10 and b = 0.48) obtained in the ssGWAS in comparison to the unweighted G matrix (ssGBLUP; Acc = 0.29 and b = 1.10). CONCLUSION Here we identified PFVs for FE in beef cattle using a strategy based on systems biology and deep molecular phenotyping. This approach has great potential to be used in genetic prediction programs, especially for polygenic phenotypes.
Affiliation(s)
- Gabriela Ribeiro
  - Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of Sao Paulo, Pirassununga, Sao Paulo, 13635-900 Brazil
- Fernando Baldi
  - Department of Animal Science, São Paulo State University (UNESP), Jaboticabal, São Paulo, Brazil
- Aline S. M. Cesar
  - Escola Superior de Agricultura “Luiz de Queiroz”, University of Sao Paulo, Piracicaba, São Paulo, Brazil
- Pâmela A. Alexandre
  - Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of Sao Paulo, Pirassununga, Sao Paulo, 13635-900 Brazil
  - CSIRO Agriculture & Food, 306 Carmody Rd., St. Lucia, Brisbane, QLD 4067 Australia
- Elisa Peripolli
  - Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of Sao Paulo, Pirassununga, Sao Paulo, 13635-900 Brazil
  - Department of Animal Science, São Paulo State University (UNESP), Jaboticabal, São Paulo, Brazil
- José B. S. Ferraz
  - Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of Sao Paulo, Pirassununga, Sao Paulo, 13635-900 Brazil
- Heidge Fukumasu
  - Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of Sao Paulo, Pirassununga, Sao Paulo, 13635-900 Brazil
8. Ros-Freixedes R, Johnsson M, Whalen A, Chen CY, Valente BD, Herring WO, Gorjanc G, Hickey JM. Genomic prediction with whole-genome sequence data in intensely selected pig lines. Genet Sel Evol 2022; 54:65. [PMID: 36153511 PMCID: PMC9509613 DOI: 10.1186/s12711-022-00756-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/28/2022] [Accepted: 09/05/2022] [Indexed: 12/03/2022]
Abstract
Background Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage. Methods We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests. Results The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected. Conclusions Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. 
Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis.
9. Bolormaa S, MacLeod IM, Khansefid M, Marett LC, Wales WJ, Miglior F, Baes CF, Schenkel FS, Connor EE, Manzanilla-Pech CIV, Stothard P, Herman E, Nieuwhof GJ, Goddard ME, Pryce JE. Sharing of either phenotypes or genetic variants can increase the accuracy of genomic prediction of feed efficiency. Genet Sel Evol 2022; 54:60. [PMID: 36068488 PMCID: PMC9450441 DOI: 10.1186/s12711-022-00749-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 08/17/2022] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Sharing individual phenotype and genotype data between countries is complex and fraught with potential errors, while sharing summary statistics of genome-wide association studies (GWAS) is relatively straightforward, and thus would be especially useful for traits that are expensive or difficult-to-measure, such as feed efficiency. Here we examined: (1) the sharing of individual cow data from international partners; and (2) the use of sequence variants selected from GWAS of international cow data to evaluate the accuracy of genomic estimated breeding values (GEBV) for residual feed intake (RFI) in Australian cows. RESULTS GEBV for RFI were estimated using genomic best linear unbiased prediction (GBLUP) with 50k or high-density single nucleotide polymorphisms (SNPs), from a training population of 3797 individuals in univariate to trivariate analyses where the three traits were RFI phenotypes calculated using 584 Australian lactating cows (AUSc), 824 growing heifers (AUSh), and 2526 international lactating cows (OVE). Accuracies of GEBV in AUSc were evaluated by either cohort-by-birth-year or fourfold random cross-validations. GEBV of AUSc were also predicted using only the AUS training population with a weighted genomic relationship matrix constructed with SNPs from the 50k array and sequence variants selected from a meta-GWAS that included only international datasets. The genomic heritabilities estimated using the AUSc, OVE and AUSh datasets were moderate, ranging from 0.20 to 0.36. The genetic correlations (rg) of traits between heifers and cows ranged from 0.30 to 0.95 but were associated with large standard errors. The mean accuracies of GEBV in Australian cows were up to 0.32 and almost doubled when either overseas cows, or both overseas cows and AUS heifers were included in the training population. They also increased when selected sequence variants were combined with 50k SNPs, but with a smaller relative increase. 
CONCLUSIONS The accuracy of RFI GEBV increased when international data were used or when selected sequence variants were combined with 50k SNP array data. This suggests that if direct sharing of data is not feasible, a meta-analysis of summary GWAS statistics could provide selected SNPs for custom panels to use in genomic selection programs. However, since this finding is based on a small cross-validation study, confirmation through a larger study is recommended.
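To show what a weighted genomic relationship matrix looks like in practice, the sketch below builds one from 0/1/2 genotypes and per-SNP weights. This is an assumption-laden illustration: the function name, the toy genotypes, the weights, and the chosen normalisation (dividing by the weighted sum of 2p(1-p)) are hypothetical and represent only one of several conventions; the abstract does not state which form the authors used.

```python
def weighted_grm(genotypes, weights):
    """Weighted genomic relationship matrix G = Z D Z' / sum_j(2 p_j (1 - p_j) d_j).

    genotypes: list of animals, each a list of SNP codes (0/1/2).
    weights: one nonnegative weight d_j per SNP (the diagonal of D).
    Z holds genotypes centred by twice the observed allele frequency.
    """
    n = len(genotypes)
    m = len(weights)
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    scale = sum(2.0 * p * (1.0 - p) * d for p, d in zip(freqs, weights))
    if scale == 0:
        raise ValueError("all weighted SNPs are monomorphic or have zero weight")
    return [
        [sum(z[a][j] * weights[j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


# Toy example: four animals, three SNPs; the third SNP stands in for a
# selected sequence variant and gets a larger (hypothetical) weight.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
g_unweighted = weighted_grm(toy_genotypes, [1.0, 1.0, 1.0])
g_weighted = weighted_grm(toy_genotypes, [1.0, 1.0, 5.0])
```

With this normalisation, multiplying all weights by the same constant leaves the matrix unchanged, so only the relative weights matter.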
Affiliation(s)
- Iona M. MacLeod
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
- Majid Khansefid
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
- Leah C. Marett
  - Agriculture Victoria Research, Ellinbank Centre, Ellinbank, Gippsland, VIC 3821 Australia
  - School of Agriculture and Food, University of Melbourne, Parkville, VIC 3010 Australia
- William J. Wales
  - Agriculture Victoria Research, Ellinbank Centre, Ellinbank, Gippsland, VIC 3821 Australia
  - School of Agriculture and Food, University of Melbourne, Parkville, VIC 3010 Australia
- Filippo Miglior
  - LACTANET, Sainte-Anne-de-Bellevue, QC H9X 3R4 Canada
  - CGIL, University of Guelph, Guelph, ON N1G 2W1 Canada
- Christine F. Baes
  - CGIL, University of Guelph, Guelph, ON N1G 2W1 Canada
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, 3002 Bern, Switzerland
- Erin E. Connor
  - Animal Genomics and Improvement Laboratory, USDA, Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, MD 20705 USA
  - Department of Animal and Food Sciences, University of Delaware, Newark, DE 19716 USA
- Paul Stothard
  - Faculty of Agricultural, Life & Environmental Sciences, University of Alberta, Edmonton, AB T6G 2R3 Canada
- Emily Herman
  - Faculty of Agricultural, Life & Environmental Sciences, University of Alberta, Edmonton, AB T6G 2R3 Canada
- Gert J. Nieuwhof
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
  - DataGene Ltd, Agribio, Bundoora, VIC 3083 Australia
- Michael E. Goddard
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
  - School of Veterinary and Agricultural Sciences, University of Melbourne, Parkville, VIC 3052 Australia
- Jennie E. Pryce
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia
10. Knutsen TM, Olsen HG, Ketto IA, Sundsaasen KK, Kohler A, Tafintseva V, Svendsen M, Kent MP, Lien S. Genetic variants associated with two major bovine milk fatty acids offer opportunities to breed for altered milk fat composition. Genet Sel Evol 2022; 54:35. [PMID: 35619070 PMCID: PMC9137198 DOI: 10.1186/s12711-022-00731-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2020] [Accepted: 05/13/2022] [Indexed: 11/30/2022] Open
Abstract
Background Although bovine milk is regarded as healthy and nutritious, its high content of saturated fatty acids (FA) may be harmful to cardiovascular health. Palmitic acid (C16:0) is the predominant saturated FA in milk with adverse health effects that could be countered by substituting it with higher levels of unsaturated FA, such as oleic acid (C18:1cis-9). In this work, we performed genome-wide association analyses for milk fatty acids predicted from FTIR spectroscopy data using 1811 Norwegian Red cattle genotyped and imputed to a high-density 777k single nucleotide polymorphism (SNP)-array. In a follow-up analysis, we used imputed whole-genome sequence data to detect genetic variants that are involved in FTIR-predicted levels of C16:0 and C18:1cis-9 and explore the transcript profile and protein level of candidate genes. Results Genome-wise significant associations were detected for C16:0 on Bos taurus (BTA) autosomes 11, 16 and 27, and for C18:1cis-9 on BTA5, 13 and 19. Closer examination of a significant locus on BTA11 identified the PAEP gene, which encodes the milk protein β-lactoglobulin, as a particularly attractive positional candidate gene. At this locus, we discovered a tightly linked cluster of genetic variants in coding and regulatory sequences that have opposing effects on the levels of C16:0 and C18:1cis-9. The favourable haplotype, linked to reduced levels of C16:0 and increased levels of C18:1cis-9 was also associated with a marked reduction in PAEP expression and β-lactoglobulin protein levels. β-lactoglobulin is the most abundant whey protein in milk and lower levels are associated with important dairy production parameters such as improved cheese yield. Conclusions The genetic variants detected in this study may be used in breeding to produce milk with an improved FA health-profile and enhanced cheese-making properties. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00731-9.
Affiliation(s)
- Hanne Gro Olsen
  - Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Isaya Appelesy Ketto
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Kristil Kindem Sundsaasen
  - Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Achim Kohler
  - Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway
- Valeria Tafintseva
  - Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway
- Matthew Peter Kent
  - Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Sigbjørn Lien
  - Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
11
van den Berg I, Ho PN, Nguyen TV, Haile-Mariam M, Luke TDW, Pryce JE. Using mid-infrared spectroscopy to increase GWAS power to detect QTL associated with blood urea nitrogen. Genet Sel Evol 2022; 54:27. [PMID: 35436852 PMCID: PMC9014603 DOI: 10.1186/s12711-022-00719-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 04/05/2022] [Indexed: 11/20/2022] Open
Abstract
Blood urea nitrogen (BUN) is an indicator trait for urinary nitrogen excretion. Measuring BUN level requires a blood sample, which limits the number of records that can be obtained. Alternatively, BUN can be predicted using mid-infrared (MIR) spectroscopy of a milk sample and thus records become available on many more cows through routine milk recording processes. The genetic correlation between MIR predicted BUN (MBUN) and BUN is 0.90. Hence, genetically, BUN and MBUN can be considered as the same trait. The objective of our study was to perform genome-wide association studies (GWAS) for BUN and MBUN, compare these two GWAS and detect quantitative trait loci (QTL) for both traits, and compare the detected QTL with previously reported QTL for milk urea nitrogen (MUN). The dataset used for our analyses included 2098 and 18,120 phenotypes for BUN and MBUN, respectively, and imputed whole-genome sequence data. The GWAS for MBUN was carried out using either the full dataset, the 2098 cows with records for BUN, or 2000 randomly selected cows, so that the dataset size is comparable to that for BUN. The GWAS results for BUN and MBUN were very different, in spite of the strong genetic correlation between the two traits. We detected 12 QTL for MBUN, on bovine chromosomes 2, 3, 9, 11, 12, 14 and X, and one QTL for BUN on chromosome 13. The QTL detected on chromosomes 11, 14 and X overlapped with QTL detected for MUN. The GWAS results were highly sensitive to the subset of records used. Hence, caution is warranted when interpreting GWAS based on small datasets, such as for BUN. MBUN may provide an attractive alternative to perform a more powerful GWAS to detect QTL for BUN.
12
van den Berg I, Ho PN, Nguyen TV, Haile-Mariam M, MacLeod IM, Beatson PR, O'Connor E, Pryce JE. GWAS and genomic prediction of milk urea nitrogen in Australian and New Zealand dairy cattle. Genet Sel Evol 2022; 54:15. [PMID: 35183113 PMCID: PMC8858489 DOI: 10.1186/s12711-022-00707-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/30/2021] [Accepted: 01/31/2022] [Indexed: 11/24/2022] Open
Abstract
Background Urinary nitrogen leakage is an environmental concern in dairy cattle. Selection for reduced urinary nitrogen leakage may be done using indicator traits such as milk urea nitrogen (MUN). The result of a previous study indicated that the genetic correlation between MUN in Australia (AUS) and MUN in New Zealand (NZL) was only low to moderate (between 0.14 and 0.58). In this context, an alternative is to select sequence variants based on genome-wide association studies (GWAS) with a view to improve genomic prediction accuracies. A GWAS can also be used to detect quantitative trait loci (QTL) associated with MUN. Therefore, our objectives were to perform within-country GWAS and a meta-GWAS for MUN using records from up to 33,873 dairy cows and imputed whole-genome sequence data, to compare QTL detected in the GWAS for MUN in AUS and NZL, and to use sequence variants selected from the meta-GWAS to improve the prediction accuracy for MUN based on a joint AUS-NZL reference set. Results Using the meta-GWAS, we detected 14 QTL for MUN, located on chromosomes 1, 6, 11, 14, 19, 22, 26 and the X chromosome. The three most significant QTL encompassed the casein genes on chromosome 6, PAEP on chromosome 11 and DGAT1 on chromosome 14. We selected 50,000 sequence variants that had the same direction of effect for MUN in AUS and MUN in NZL and that were most significant in the meta-analysis for the GWAS. The selected sequence variants yielded a genetic correlation between MUN in AUS and MUN in NZL of 0.95 and substantially increased prediction accuracy in both countries. Conclusions Our results demonstrate how the sharing of data between two countries can increase the power of a GWAS and increase the accuracy of genomic prediction using a multi-country reference population and sequence variants selected based on a meta-GWAS. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00707-9.
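The variant-selection step described in this abstract (keep variants with the same direction of effect in both countries, then take the most significant ones from the meta-analysis) can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the `Variant` record, its field names and the toy values are hypothetical, and dropping variants with a zero effect is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    beta_aus: float  # estimated effect in the AUS GWAS (hypothetical field)
    beta_nzl: float  # estimated effect in the NZL GWAS (hypothetical field)
    p_meta: float    # p-value from the meta-GWAS (hypothetical field)


def select_variants(variants, n_keep):
    """Keep sign-concordant variants, then the n_keep smallest meta p-values."""
    # A positive product means both effects share the same non-zero sign;
    # variants with a zero effect in either country are dropped (assumption).
    concordant = [v for v in variants if v.beta_aus * v.beta_nzl > 0]
    concordant.sort(key=lambda v: v.p_meta)
    return concordant[:n_keep]


toy = [
    Variant("snp_a", 0.20, 0.15, 1e-12),
    Variant("snp_b", 0.30, -0.10, 1e-20),  # opposite directions, excluded
    Variant("snp_c", -0.05, -0.08, 1e-6),
    Variant("snp_d", 0.01, 0.02, 1e-3),
]
selected = [v.name for v in select_variants(toy, n_keep=2)]
print(selected)
```

In the study the cut-off was 50,000 variants; `n_keep=2` is used here only to keep the toy example readable.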
Affiliation(s)
- Irene van den Berg
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia.
- Phuong N Ho
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
- Tuan V Nguyen
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
- Mekonnen Haile-Mariam
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
- Iona M MacLeod
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
- Jennie E Pryce
  - Centre for AgriBioscience, Agriculture Victoria, 5 Ring Road, Bundoora, AgriBioVIC, 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
13
Guillenea A, Su G, Lund MS, Karaman E. Genomic prediction in Nordic Red dairy cattle considering breed origin of alleles. J Dairy Sci 2022; 105:2426-2438. [PMID: 35033341 DOI: 10.3168/jds.2021-21173] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/18/2021] [Accepted: 11/23/2021] [Indexed: 01/02/2023]
Abstract
This study investigated the reliability of genomic prediction (GP) using breed origin of alleles (BOA) approach in the Nordic Red (RDC) population, which has an admixed population structure. The RDC population consists of animals with varying degrees of genetic materials from the Danish Red (RDM), Swedish Red (SRB), Finnish Ayrshire (FAY), and Holstein (HOL) because bulls have been used across the breeds. The BOA approach was tested using 39,550 RDC animals in the reference population and 11,786 in the validation population. Deregressed proofs (DRP) of milk, fat and protein were used as response variable for GP. Direct genomic breeding values (DGV) for animals in the validation population were calculated with (BOA model) or without (joint model) considering breed origin of alleles. The joint model assumed homogeneous marker effects and a single set of marker effects were estimated, whereas BOA model assumed heterogeneous marker effects, and different sets of marker effects were estimated across the breeds. For the BOA approach, we tested scenarios assuming both correlated (BOA_cor) and uncorrelated (BOA_uncor) marker effects between the breeds. Additionally, we investigated GP using a standard Illumina 50K chip and including SNP selected from imputed whole-genome sequencing (50K+WGS). We also studied the effect of estimating (co)variances for genome regions of different sizes to exploit the information of the genome regions contributing to the (co)variance between the breeds. Region sizes were set as 1 SNP, a group of 30 or 100 adjacent SNP, or the whole genome. Reliability of DGV was measured as squared correlations between DGV and DRP divided by the reliability of DRP. Across the 3 traits, in general, RS30 and RS100 SNP yielded the highest reliabilities. Including WGS SNP improved reliabilities in almost all scenarios (0.297 on average for 50K and 0.307 on average for 50K+WGS). 
The BOA_uncor (0.233 on average) was inferior to the joint model (0.339 on average), but the reliabilities obtained using BOA_cor (0.334 on average) in most cases were not significantly different from those obtained using the joint model. The results indicate that both including additional whole-genome sequencing SNP and dividing the genome into fixed regions improve GP in the RDC. The BOA models have the potential to increase the reliability of GP, but the benefit is limited in populations with a high exchange of genetic material for a long time, as is the case for RDC.
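The reliability measure quoted in this abstract (squared correlation between DGV and DRP, divided by the reliability of DRP) can be written out as a small sketch. The function names and toy numbers are hypothetical, and passing the DRP reliability as a single value (for example an average over the validation animals) is an assumption; the abstract does not say how it was summarised.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dgv_reliability(dgv, drp, drp_reliability):
    """Squared correlation between DGV and DRP, scaled by the DRP reliability."""
    r = pearson(dgv, drp)
    return r * r / drp_reliability


# Toy validation set (made-up values, for illustration only)
dgv = [0.1, 0.4, 0.35, 0.8, 0.6]
drp = [0.0, 0.5, 0.2, 0.9, 0.7]
rel = dgv_reliability(dgv, drp, drp_reliability=0.9)
print(round(rel, 3))
```

Dividing by the DRP reliability corrects for the fact that DRP are themselves noisy measures of the true breeding value, so the raw squared correlation understates how well DGV track it.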
Affiliation(s)
- Ana Guillenea
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark.
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Mogens Sand Lund
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Emre Karaman
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
14
Mollandin F, Rau A, Croiseau P. An evaluation of the predictive performance and mapping power of the BayesR model for genomic prediction. G3 GENES|GENOMES|GENETICS 2021; 11:6317672. [PMID: 34849780 PMCID: PMC8527474 DOI: 10.1093/g3journal/jkab225] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/02/2021] [Accepted: 06/27/2021] [Indexed: 12/02/2022]
Abstract
Technological advances and decreasing costs have led to the rise of increasingly dense genotyping data, making feasible the identification of potential causal markers. Custom genotyping chips, which combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially yield improved accuracy and interpretability in genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect size classes. BayesR has been shown to yield accurate predictions and promise for quantitative trait loci (QTL) mapping in real data applications, but an extensive benchmarking in simulated data is currently lacking. Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures and phenotype heritabilities, and we evaluated the impact of excluding or including causal markers among the genotypes. We define several statistical criteria for QTL mapping, including several based on sliding windows to account for linkage disequilibrium (LD). We compare and contrast these statistics and their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance for BayesR in moderately to highly heritable traits, particularly for 50k custom data. In cases of low heritability or weak LD with the causal marker in 50k genotypes, QTL mapping is a challenge, regardless of the criterion used. BayesR is a promising approach to simultaneously obtain accurate predictions and interpretable classifications of SNPs into effect size classes. We illustrated the performance of BayesR in a variety of simulation scenarios, and compared the advantages and limitations of each.
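To make the "four effect size classes" concrete, the sketch below draws marker effects from a four-component mixture of zero-mean normal distributions. The variance multipliers (0, 10^-4, 10^-3, 10^-2 times the genetic variance) are the values commonly quoted for BayesR and are an assumption here, since the abstract does not state them; the function name and mixing proportions are hypothetical. This only samples from the prior and does not implement the BayesR sampler itself.

```python
import random

# Variance multipliers commonly quoted for the four BayesR classes
# (assumption; not stated in the abstract).
CLASS_SCALES = (0.0, 1e-4, 1e-3, 1e-2)


def sample_bayesr_effects(n_markers, mix_props, sigma2_g, rng):
    """Draw marker effects and class labels from a four-class mixture prior."""
    effects, classes = [], []
    for _ in range(n_markers):
        # Pick an effect-size class according to the mixing proportions
        k = rng.choices(range(4), weights=mix_props)[0]
        var = CLASS_SCALES[k] * sigma2_g
        # Class 0 has zero variance, so its markers have exactly zero effect
        effects.append(rng.gauss(0.0, var ** 0.5) if var > 0 else 0.0)
        classes.append(k)
    return effects, classes


rng = random.Random(1)
effects, classes = sample_bayesr_effects(
    n_markers=1000, mix_props=(0.95, 0.03, 0.015, 0.005), sigma2_g=1.0, rng=rng
)
print(sum(1 for e in effects if e != 0.0), "markers with a non-zero effect")
```

In a fitted model, the posterior probability that a marker falls in a non-zero class is one natural basis for the QTL mapping criteria the abstract mentions.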
Affiliation(s)
- Fanny Mollandin
  - INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas 78350, France
- Andrea Rau
  - INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas 78350, France
  - BioEcoAgro Joint Research Unit, INRAE, Université de Liège, Université de Lille, Université de Picardie Jules Verne, Peronne 80203, France
- Pascal Croiseau
  - INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas 78350, France
15
Ling AS, Hay EH, Aggrey SE, Rekaya R. Dissection of the impact of prioritized QTL-linked and -unlinked SNP markers on the accuracy of genomic selection. BMC Genom Data 2021; 22:26. [PMID: 34380418 PMCID: PMC8356450 DOI: 10.1186/s12863-021-00979-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/18/2020] [Accepted: 07/18/2021] [Indexed: 12/01/2022] Open
Abstract
Background Use of genomic information has resulted in an undeniable improvement in prediction accuracies and an increase in genetic gain in animal and plant genetic selection programs in spite of oversimplified assumptions about the true biological processes. Even for complex traits, a large portion of markers do not segregate with or effectively track genomic regions contributing to trait variation; yet it is not clear how genomic prediction accuracies are impacted by such potentially nonrelevant markers. In this study, a simulation was carried out to evaluate genomic predictions in the presence of markers unlinked with trait-relevant QTL. Further, we compared the ability of the population statistic FST and absolute estimated marker effect as preselection statistics to discriminate between linked and unlinked markers and the corresponding impact on accuracy. Results We found that the accuracy of genomic predictions decreased as the proportion of unlinked markers used to calculate the genomic relationships increased. Using all, only linked, and only unlinked marker sets yielded prediction accuracies of 0.62, 0.89, and 0.22, respectively. Furthermore, it was found that prediction accuracies are severely impacted by unlinked markers with large spurious associations. FST-preselected marker sets of 10 k and larger yielded accuracies 8.97 to 17.91% higher than those achieved using preselection by absolute estimated marker effects, despite selecting 5.1 to 37.7% more unlinked markers and explaining 2.4 to 5.0% less of the genetic variance. This was attributed to false positives selected by absolute estimated marker effects having a larger spurious association with the trait of interest and more negative impact on predictions. The Pearson correlation between FST scores and absolute estimated marker effects was 0.77 and 0.27 among only linked and only unlinked markers, respectively. 
The sensitivity of FST scores to detect truly linked markers is comparable to absolute estimated marker effects but the consistency between the two statistics regarding false positives is weak. Conclusion Identification and exclusion of markers that have little to no relevance to the trait of interest may significantly increase genomic prediction accuracies. The population statistic FST presents an efficient and effective tool for preselection of trait-relevant markers.
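As a rough illustration of FST-based preselection, the sketch below scores each marker with a simple Wright-style FST between two subpopulations and keeps the top-ranked markers. The estimator, the equal weighting of the two subpopulations, the function names and the toy allele frequencies are all assumptions; the abstract does not specify which FST estimator was used or how the subpopulations were defined.

```python
def fst_two_pops(p1, p2):
    """Wright-style FST from allele frequencies in two equally weighted groups."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # marker is monomorphic overall, so it carries no signal
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def preselect_by_fst(freqs, k):
    """Return the k marker names with the largest FST scores."""
    ranked = sorted(freqs, key=lambda m: fst_two_pops(*freqs[m]), reverse=True)
    return ranked[:k]


# Toy allele frequencies per marker in two subpopulations (made-up values)
freqs = {
    "m1": (0.50, 0.50),  # no differentiation
    "m2": (0.20, 0.80),  # strong differentiation
    "m3": (0.40, 0.60),  # mild differentiation
}
print(preselect_by_fst(freqs, k=2))
```

Markers whose allele frequencies differ most between the groups receive the highest scores, which is the property the preselection relies on.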
Affiliation(s)
- Ashley S Ling
  - Department of Animal and Dairy Science, The University of Georgia, 30602, Athens, GA, USA.
- El Hamidi Hay
  - USDA Agricultural Research Service, Fort Keogh Livestock and Range Research Laboratory, Miles City, MT, 59301, USA
- Samuel E Aggrey
  - Department of Poultry Science, The University of Georgia, 30602, Athens, GA, USA
  - Institute of Bioinformatics, The University of Georgia, 30602, Athens, GA, USA
- Romdhane Rekaya
  - Department of Animal and Dairy Science, The University of Georgia, 30602, Athens, GA, USA
  - Institute of Bioinformatics, The University of Georgia, 30602, Athens, GA, USA
  - Department of Statistics, The University of Georgia, 30602, Athens, GA, USA
16
Gebreyesus G, Lund MS, Sahana G, Su G. Reliabilities of Genomic Prediction for Young Stock Survival Traits Using 54K SNP Chip Augmented With Additional Single-Nucleotide Polymorphisms Selected From Imputed Whole-Genome Sequencing Data. Front Genet 2021; 12:667300. [PMID: 34349779 PMCID: PMC8326759 DOI: 10.3389/fgene.2021.667300] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/12/2021] [Accepted: 06/23/2021] [Indexed: 11/16/2022] Open
Abstract
This study investigated effects of integrating single-nucleotide polymorphisms (SNPs) selected based on previous genome-wide association studies (GWASs), from imputed whole-genome sequencing (WGS) data, in the conventional 54K chip on genomic prediction reliability of young stock survival (YSS) traits in dairy cattle. The WGS SNPs included two groups of SNP sets that were selected based on GWAS in the Danish Holstein for YSS index (YSS_SNPs, n = 98) and SNPs chosen as peaks of quantitative trait loci for the traits of Nordic total merit index in Denmark–Finland–Sweden dairy cattle populations (DFS_SNPs, n = 1,541). Additionally, the study also investigated the possibility of improving genomic prediction reliability for survival traits by modeling the SNPs within recessive lethal haplotypes (LET_SNP, n = 130) detected from the 54K chip in the Nordic Holstein. De-regressed proofs (DRPs) were obtained from 6,558 Danish Holstein bulls genotyped with either 54K chip or customized LD chip that includes SNPs in the standard LD chip and some of the selected WGS SNPs. The chip data were subsequently imputed to 54K SNP together with the selected WGS SNPs. Genomic best linear unbiased prediction (GBLUP) models were implemented to predict breeding values through either pooling the 54K and selected WGS SNPs together as one genetic component (a one-component model) or considering 54K SNPs and selected WGS SNPs as two separate genetic components (a two-component model). Across all the traits, inclusion of each of the selected WGS SNP sets led to negligible improvements in prediction accuracies (0.17 percentage points on average) compared to prediction using only 54K. Similarly, marginal improvement in prediction reliability was obtained when all the selected WGS SNPs were included (0.22 percentage points). 
No further improvement in prediction reliability was observed when considering random regression on genotype code of recessive lethal alleles in the model including both groups of the WGS SNPs. Additionally, there was no difference in prediction reliability from integrating the selected WGS SNP sets through the two-component model compared to the one-component GBLUP.
Affiliation(s)
- Grum Gebreyesus
  - Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Mogens Sandø Lund
  - Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Goutam Sahana
  - Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
17
Impact of Marker Pruning Strategies Based on Different Measurements of Marker Distance on Genomic Prediction in Dairy Cattle. Animals (Basel) 2021; 11:ani11071992. [PMID: 34359120 PMCID: PMC8300388 DOI: 10.3390/ani11071992] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/06/2021] [Revised: 06/27/2021] [Accepted: 06/28/2021] [Indexed: 11/16/2022] Open
Abstract
Simple Summary The usefulness of genomic prediction (GP) has been widely proven by breeding analyses in livestock, plants and aquatic populations. It is well known that ‘marker density’ is a critical factor that affects the accuracy of GP; however, how to properly measure ‘marker density’ in GP is yet to be determined. With population-level whole-genome sequence data or high-density single nucleotide polymorphism (SNP) data available, this question can be answered more convincingly. In this study, we investigated and discussed the impact of four ‘marker density’ measures that reflect genetic or physical distances between SNPs on the accuracy of GP in a German Holstein dairy cattle population. Our results showed that the degree of variation of physical distance between adjacent SNPs had significant effects on the accuracy of GP, while the genetic distance between SNPs had no relationship with the accuracy of GP. Therefore, for studies based on high-density SNP data, the default strategy of pruning SNPs based on genetic distance is detrimental to heritability estimation and genomic prediction. The results extend the community's knowledge of ‘marker density’ and provide useful suggestions for the application of and research on genomic prediction. Abstract With the availability of high-density single-nucleotide polymorphism (SNP) data and the development of genotype imputation methods, high-density panel-based genomic prediction (GP) has become possible in livestock breeding. It is generally considered that the genomic estimated breeding value (GEBV) accuracy increases with the marker density, while studies have shown that the GEBV accuracy does not increase, or even decreases, when high-density panels are used. 
Therefore, in addition to the SNP number, other measurements of ‘marker density’ seem to have impacts on the GEBV accuracy, and exploring the relationship between the GEBV accuracy and the measurements of ‘marker density’ based on high-density SNP or whole-genome sequence data is important for the field of GP. In this study, we constructed different SNP panels with certain SNP numbers (e.g., 1 k) by using the physical distance (PhyD), genetic distance (GenD) and random distance (RanD) between SNPs, respectively, based on the high-density SNP data of a German Holstein dairy cattle population. Therefore, there are three different panels at a certain SNP number level. These panels were used to construct GP models to predict fat percentage, milk yield and somatic cell score. Meanwhile, the mean ($\bar{d}$) and variance ($\sigma_d^2$) of the physical distance between SNPs and the mean ($\overline{r^2}$) and variance ($\sigma_{r^2}^2$) of the genetic distance between SNPs in each panel were used as marker density-related measurements and their influence on the GEBV accuracy was investigated. At the same SNP number level, the $\bar{d}$ of all panels is basically the same, but the $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ are different. Therefore, we only investigated the effects of $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ on the GEBV accuracy. The results showed that at a certain SNP number level, the GEBV accuracy was negatively correlated with $\sigma_d^2$, but not with $\overline{r^2}$ and $\sigma_{r^2}^2$. Compared with GenD and RanD, the $\sigma_d^2$ of panels constructed by PhyD is smaller. The low and moderate-density panels (< 50 k) constructed by RanD or GenD have large $\sigma_d^2$, which is not conducive to genomic prediction. The GEBV accuracy of the low and moderate-density panels constructed by PhyD is 3.8–34.8% higher than that of the low and moderate-density panels constructed by RanD and GenD. Panels with 20–30 k SNPs constructed by PhyD can achieve the same or slightly higher GEBV accuracy than that of high-density SNP panels for all three traits. 
In summary, the smaller the variation degree of physical distance between adjacent SNPs, the higher the GEBV accuracy. The low and moderate-density panels constructed by physical distance are beneficial to genomic prediction, while pruning high-density SNP data based on genetic distance is detrimental to genomic prediction. The results provide suggestions for the development of SNP panels and the research of genomic prediction based on whole-genome sequence data.
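The two physical-distance measures used in this abstract, the mean and the variance of the distance between adjacent SNPs in a panel, can be computed as in the sketch below. It is a minimal illustration for a single chromosome; the function name and toy positions are hypothetical, and using the population variance (rather than the sample variance) is an assumption.

```python
from statistics import mean, pvariance


def spacing_stats(positions):
    """Mean and population variance of gaps between adjacent SNP positions."""
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return mean(gaps), pvariance(gaps)


# Two toy 4-SNP panels spanning the same 30-unit segment
even_panel = [0, 10, 20, 30]    # evenly spaced, as a physical-distance panel would aim for
clumped_panel = [0, 1, 2, 30]   # same span, but SNPs clustered at one end

print(spacing_stats(even_panel))
print(spacing_stats(clumped_panel))
```

Both panels have the same mean gap, so only the variance separates them, which mirrors the abstract's observation that the mean distance is essentially fixed at a given SNP number while the variance differs between panel construction strategies.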
18
Karaman E, Su G, Croue I, Lund MS. Genomic prediction using a reference population of multiple pure breeds and admixed individuals. Genet Sel Evol 2021; 53:46. [PMID: 34058971 PMCID: PMC8168010 DOI: 10.1186/s12711-021-00637-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/09/2020] [Accepted: 05/11/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In dairy cattle populations in which crossbreeding has been used, animals show some level of diversity in their origins. In rotational crossbreeding, for instance, crossbred dams are mated with purebred sires from different pure breeds, and the genetic composition of crossbred animals is an admixture of the breeds included in the rotation. How to use the data of such individuals in genomic evaluations is still an open question. In this study, we aimed at providing methodologies for the use of data from crossbred individuals with an admixed genetic background together with data from multiple pure breeds, for the purpose of genomic evaluations for both purebred and crossbred animals. A three-breed rotational crossbreeding system was mimicked using simulations based on animals genotyped with the 50 K single nucleotide polymorphism (SNP) chip. RESULTS For purebred populations, within-breed genomic predictions generally led to higher accuracies than those from multi-breed predictions using combined data of pure breeds. Adding admixed population's (MIX) data to the combined pure breed data considering MIX as a different breed led to higher accuracies. When prediction models were able to account for breed origin of alleles, accuracies were generally higher than those from combining all available data, depending on the correlation of quantitative trait loci (QTL) effects between the breeds. Accuracies varied when using SNP effects from any of the pure breeds to predict the breeding values of MIX. Using those breed-specific SNP effects that were estimated separately in each pure breed, while accounting for breed origin of alleles for the selection candidates of MIX, generally improved the accuracies. Models that are able to accommodate MIX data with the breed origin of alleles approach generally led to higher accuracies than models without breed origin of alleles, depending on the correlation of QTL effects between the breeds. 
CONCLUSIONS Combining all available data, pure breeds' and admixed population's data, in a multi-breed reference population is beneficial for the estimation of breeding values for pure breeds with a small reference population. For MIX, such an approach can lead to higher accuracies than considering breed origin of alleles for the selection candidates, and using breed-specific SNP effects estimated separately in each pure breed. Including MIX data in the reference population of multiple breeds by considering the breed origin of alleles, accuracies can be further improved. Our findings are relevant for breeding programs in which crossbreeding is systematically applied, and also for populations that involve different subpopulations and between which exchange of genetic material is routine practice.
Affiliation(s)
- Emre Karaman
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark.
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Mogens S Lund
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
19
van den Berg I, Ho PN, Haile-Mariam M, Beatson PR, O'Connor E, Pryce JE. Genetic parameters of blood urea nitrogen and milk urea nitrogen concentration in dairy cattle managed in pasture-based production systems of New Zealand and Australia. ANIMAL PRODUCTION SCIENCE 2021. [DOI: 10.1071/an21049] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Abstract
Context
Urinary nitrogen excretion by grazing cattle causes environmental pollution. Selecting for cows with a lower concentration of urinary nitrogen excretion may reduce the environmental impact. While urinary nitrogen excretion is difficult to measure, blood urea nitrogen (BUN), mid-infrared spectroscopy (MIR)-predicted BUN (MBUN), which is predicted from MIR spectra measured on milk samples, and milk urea nitrogen (MUN) are potential indicator traits. Australia and New Zealand have increasing datasets of cows with urea records, with 18 120 and 15 754 cows with urea records in Australia and New Zealand respectively. A collaboration between Australia and New Zealand could further increase the size of the dataset by sharing data.
Aims
Our aims were to estimate genetic parameters for urea traits within country, and genetic correlations between countries to gauge the benefit of having a joint reference population for genomic prediction of an indicator trait that is potentially suitable for selection to reduce urinary nitrogen excretion for both countries.
Methods
Genetic parameters were estimated within country (Australia and New Zealand) in Holstein, Jersey and a multibreed population, for BUN, MBUN and MUN in Australia and MUN in New Zealand, using high-density genotypes. Genetic correlations were also estimated between the urea traits recorded in Australia and MUN in New Zealand. Analyses used the first record available for each cow or within days-in-milk (DIM) intervals.
Key results
Heritabilities ranged from 0.08 to 0.32 for the various urea traits. Higher heritabilities were obtained for Jersey than for Holstein, and for the New Zealand cows than for the Australian cows. While urea traits were highly correlated within Australia (0.71–0.94), genetic correlations between Australia and New Zealand were small to moderate (0.08–0.58).
Conclusions
Our results showed that the heritability for urea traits differs among trait, breed, and country. While urea traits are highly correlated within country, genetic correlations between urea traits in Australia and MUN in New Zealand were only low to moderate.
Implications
Further study is required to identify the underlying causes of the difference in heritabilities observed, to compare the accuracies of different reference populations, and to estimate genetic correlations between urea traits and other traits such as fertility and feed intake. Larger datasets may help estimate genetic correlations more accurately between countries.
|
20
|
van den Berg I, Xiang R, Jenko J, Pausch H, Boussaha M, Schrooten C, Tribout T, Gjuvsland AB, Boichard D, Nordbø Ø, Sanchez MP, Goddard ME. Meta-analysis for milk fat and protein percentage using imputed sequence variant genotypes in 94,321 cattle from eight cattle breeds. Genet Sel Evol 2020; 52:37. [PMID: 32635893 PMCID: PMC7339598 DOI: 10.1186/s12711-020-00556-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/01/2019] [Accepted: 06/26/2020] [Indexed: 12/14/2022] Open
Abstract
Background Sequence-based genome-wide association studies (GWAS) provide high statistical power to identify candidate causal mutations when a large number of individuals with both sequence variant genotypes and phenotypes is available. A meta-analysis combines summary statistics from multiple GWAS and increases the power to detect trait-associated variants without requiring access to data at the individual level of the GWAS mapping cohorts. Because linkage disequilibrium between adjacent markers is conserved only over short distances across breeds, a multi-breed meta-analysis can improve mapping precision. Results To maximise the power to identify quantitative trait loci (QTL), we combined the results of nine within-population GWAS that used imputed sequence variant genotypes of 94,321 cattle from eight breeds, to perform a large-scale meta-analysis for fat and protein percentage in cattle. The meta-analysis detected (p ≤ 10⁻⁸) 138 QTL for fat percentage and 176 QTL for protein percentage. This was more than the number of QTL detected in all within-population GWAS together (124 QTL for fat percentage and 104 QTL for protein percentage). Among all the lead variants, 100 QTL for fat percentage and 114 QTL for protein percentage had the same direction of effect in all within-population GWAS. This indicates either persistence of the linkage phase between the causal variant and the lead variant across breeds or that some of the lead variants might indeed be causal or tightly linked with causal variants. The percentage of intergenic variants was substantially lower for significant variants than for non-significant variants, and significant variants had mostly moderate to high minor allele frequencies. Significant variants were also clustered in genes that are known to be relevant for fat and protein percentages in milk. Conclusions Our study identified a large number of QTL associated with fat and protein percentage in dairy cattle.
We demonstrated that large-scale multi-breed meta-analysis reveals more QTL at the nucleotide resolution than within-population GWAS. Significant variants were more often located in genic regions than non-significant variants and a large part of them was located in potentially regulatory regions.
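As a rough illustration of how summary statistics from several within-population GWAS can be combined for a single variant, the sketch below uses a standard inverse-variance weighted fixed-effect meta-analysis. The abstract does not state which meta-analysis method the authors used, so this is an assumed, generic technique rather than their procedure; the function names and the example effect sizes are hypothetical.

```python
import math


def inverse_variance_meta(effects, standard_errors):
    """Fixed-effect meta-analysis of one variant across several GWAS.

    effects: per-study allele substitution effects (same allele coding).
    standard_errors: per-study standard errors of those effects.
    Returns (pooled effect, pooled SE, z-score, two-sided p-value).
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value


def same_direction(effects):
    """True if every study estimates an effect with the same sign."""
    return all(b > 0 for b in effects) or all(b < 0 for b in effects)


# Hypothetical effects of one variant in three populations.
effects = [0.021, 0.018, 0.025]
standard_errors = [0.004, 0.006, 0.005]
beta, se, z, p_value = inverse_variance_meta(effects, standard_errors)
print(f"pooled effect={beta:.4f}, SE={se:.4f}, z={z:.2f}, p={p_value:.2e}")
print("consistent direction:", same_direction(effects))
```

The `same_direction` helper mirrors the consistency check mentioned in the abstract, where lead variants were counted if their effect had the same sign in all within-population GWAS.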
Affiliation(s)
- Irene van den Berg
  - Agriculture Victoria Research, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia
- Ruidong Xiang
  - Agriculture Victoria Research, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia; Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
- Janez Jenko
  - GENO SA, Storhamargata 44, 2317, Hamar, Norway
- Mekki Boussaha
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Thierry Tribout
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Didier Boichard
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Marie-Pierre Sanchez
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Mike E Goddard
  - Agriculture Victoria Research, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia; Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
|
21
|
van den Berg I, MacLeod I, Reich C, Breen E, Pryce J. Optimizing genomic prediction for Australian Red dairy cattle. J Dairy Sci 2020; 103:6276-6298. [DOI: 10.3168/jds.2019-17914] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/15/2019] [Accepted: 02/13/2020] [Indexed: 12/18/2022]
|
22
|
Konstantinov KV, Goddard ME. Application of multivariate single-step SNP best linear unbiased predictor model and revised SNP list for genomic evaluation of dairy cattle in Australia. J Dairy Sci 2020; 103:8305-8316. [PMID: 32622609 DOI: 10.3168/jds.2020-18242] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/21/2020] [Accepted: 04/21/2020] [Indexed: 11/19/2022]
Abstract
The objectives of this study were (1) to evaluate the computational feasibility of the multitrait test-day single-step SNP-BLUP (ssSNP-BLUP) model using phenotypic records of genotyped and nongenotyped animals, and (2) to compare accuracies (coefficient of determination; R²) and bias of genomic estimated breeding values (GEBV) and de-regressed proofs as response variables in 3 Australian dairy cattle breeds (i.e., Holstein, Jersey, and Red breeds). Additive genomic random regression coefficients for milk, fat, protein yield and somatic cell score were predicted in the first, second, and third lactation. The predicted coefficients were used to derive 305-d GEBV and were compared with the traditional parent averages obtained from a BLUP model without genomic information. Cow fertility traits were evaluated from the 5-trait repeatability model (i.e., calving interval, days from calving to first service, pregnancy diagnosis, first service nonreturn rate, and lactation length). The de-regressed proofs were only for calving interval. Our results showed that ssSNP-BLUP using multitrait test-day model increased reliability and reduced bias of breeding values of young animals when compared with parent average from traditional BLUP in Australian Holstein, Jersey, and Red breeds. The use of a custom selection of approximately 46,000 SNP (custom XT SNP list) increased the reliability of GEBV compared with the results obtained using the commercial Illumina 50K chip (Illumina, San Diego, CA). The use of the second preconditioner substantially improved the convergence rate of the preconditioned conjugate gradient method, but further work is needed to improve the efficiency of the computation of the Kronecker matrix product by vector. Application of ssSNP-BLUP to multitrait random regression models is computationally feasible.
Affiliation(s)
- K V Konstantinov
  - DataGene Limited, Agriculture Victoria, AgriBio Centre for AgriBusiness, 5 Ring Rd., Bundoora, Victoria 3083, Australia
- M E Goddard
  - Melbourne School of Land and Environment, University of Melbourne, Parkville, Victoria 3010, Australia
|
23
|
Raymond B, Wientjes YCJ, Bouwman AC, Schrooten C, Veerkamp RF. A deterministic equation to predict the accuracy of multi-population genomic prediction with multiple genomic relationship matrices. Genet Sel Evol 2020; 52:21. [PMID: 32345213 PMCID: PMC7189707 DOI: 10.1186/s12711-020-00540-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/11/2019] [Accepted: 04/14/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A multi-population genomic prediction (GP) model in which important pre-selected single nucleotide polymorphisms (SNPs) are differentially weighted (MPMG) has been shown to result in better prediction accuracy than a multi-population, single genomic relationship matrix ([Formula: see text]) GP model (MPSG) in which all SNPs are weighted equally. Our objective was to underpin theoretically the advantages and limits of the MPMG model over the MPSG model, by deriving and validating a deterministic prediction equation for its accuracy. METHODS Using selection index theory, we derived an equation to predict the accuracy of estimated total genomic values of selection candidates from population [Formula: see text] ([Formula: see text]), when individuals from two populations, [Formula: see text] and [Formula: see text], are combined in the training population and two [Formula: see text], made respectively from pre-selected and remaining SNPs, are fitted simultaneously in MPMG. We used simulations to validate the prediction equation in scenarios that differed in the level of genetic correlation between populations, heritability, and proportion of genetic variance explained by the pre-selected SNPs. Empirical accuracy of the MPMG model in each scenario was calculated and compared to the predicted accuracy from the equation. RESULTS In general, the derived prediction equation resulted in accurate predictions of [Formula: see text] for the scenarios evaluated. Using the prediction equation, we showed that an important advantage of the MPMG model over the MPSG model is its ability to benefit from the small number of independent chromosome segments ([Formula: see text]) due to the pre-selected SNPs, both within and across populations, whereas for the MPSG model, there is only a single value for [Formula: see text], calculated based on all SNPs, which is very large. 
However, this advantage is dependent on the pre-selected SNPs that explain some proportion of the total genetic variance for the trait. CONCLUSIONS We developed an equation that gives insight into why, and under which conditions the MPMG outperforms the MPSG model for GP. The equation can be used as a deterministic tool to assess the potential benefit of combining information from different populations, e.g., different breeds or lines for GP in livestock or plants, or different groups of people based on their ethnic background for prediction of disease risk scores.
Affiliation(s)
- Biaty Raymond
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands; Biometris, Wageningen University and Research, 6700AA, Wageningen, The Netherlands
- Yvonne C J Wientjes
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Aniek C Bouwman
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Roel F Veerkamp
  - Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
|
24
|
Genomic Analysis Using Bayesian Methods under Different Genotyping Platforms in Korean Duroc Pigs. Animals (Basel) 2020; 10:ani10050752. [PMID: 32344859 PMCID: PMC7277155 DOI: 10.3390/ani10050752] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/17/2020] [Revised: 04/16/2020] [Accepted: 04/22/2020] [Indexed: 12/03/2022] Open
Abstract
Simple Summary This study investigated the informative regions and the efficiency of genomic predictions for backfat thickness, days to 90 kg body weight, loin muscle area, and lean percentage in Korean Duroc pigs. Several regions of the genome were identified, and a significant marker was found near the MC4R gene for growth and production-related traits. No differences in genomic accuracy were identified between the Bayesian approaches for these four growth and production-related traits. Genomic accuracy improved when deregressed estimated breeding values including parental information were used as the response variable in Korean Duroc pigs. Abstract Genomic evaluation has been widely applied to several species using commercial single nucleotide polymorphism (SNP) genotyping platforms. This study investigated the informative genomic regions and the efficiency of genomic prediction by using two Bayesian approaches (BayesB and BayesC) under two moderate-density SNP genotyping panels in Korean Duroc pigs. A total of 1026 individuals with growth and production records were genotyped using two medium-density SNP genotyping platforms: Illumina60K and GeneSeek80K. These platforms consisted of 61,565 and 68,528 SNP markers, respectively. The deregressed estimated breeding values (DEBVs) derived from estimated breeding values (EBVs) and their reliabilities were taken as response variables. Two Bayesian approaches were implemented to perform the genome-wide association study (GWAS) and genomic prediction. Multiple significant regions for days to 90 kg (DAYS), loin muscle area (LMA), and lean percent (PCL) were detected. The most significant SNP marker, located near the MC4R gene, was detected using GeneSeek80K. Accuracy of genomic predictions was higher using the GeneSeek80K SNP panel for DAYS (Δ2%) and LMA (Δ2–3%) with two response variables, with no gains in accuracy from either Bayesian approach in the four growth and production-related traits.
Of the two DEBV definitions, DEBVs that include parental information gave the best genomic prediction accuracy as a response variable in Korean Duroc pig breeding, regardless of the genotyping platform and the Bayesian method.
|
25
|
VanRaden PM. Symposium review: How to implement genomic selection. J Dairy Sci 2020; 103:5291-5301. [PMID: 32331884 DOI: 10.3168/jds.2019-17684] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 09/30/2019] [Accepted: 01/03/2020] [Indexed: 12/16/2022]
Abstract
Genomic selection was adopted very quickly in the 10 yr after first implementation, and breeders continue to find new uses for genomic testing. Breeding values with higher reliability earlier in life are estimated by combining DNA genotypes for many thousands of loci using existing identification, pedigree, and phenotype databases for millions of animals. Quality control for both new and previous data is greatly improved by comparing genomic and pedigree relationships to correct parent-progeny conflicts and discover many additional ancestors. Many quantitative trait loci and gene tests have been added to previous assays that used only evenly spaced, highly polymorphic markers. Imputation now combines genotypes from many assays of differing marker densities. Prediction models have gradually advanced from normal or Bayesian distributions within trait and breed to single-step, multitrait, or other more complex models, such as multibreed models that may be needed for crossbred prediction. Genomic selection was initially applied to males to predict progeny performance but is now widely applied to females or even embryos to predict their own later performance. The initial focus on additive merit has expanded to include mating programs, genomic inbreeding, and recessive alleles. Many producers now use DNA testing to decide which heifers should be inseminated with elite dairy, beef, or sex-sorted semen, which should be embryo donors or recipients, or which should be sold or kept for breeding. Because some of these decisions are expensive to delay, predictions are now provided weekly instead of every few months. Predictions from international genomic databases are often more accurate and cost-effective than those from within-country databases that were previously designed for progeny testing unless local breeds, conditions, or traits differ greatly from the larger database. 
Selection indexes include many new traits, often with lower heritability or requiring large initial investments to obtain phenotypes, which provide further incentive to cooperate internationally. The genomic prediction methods developed for dairy cattle are now applied widely to many animal, human, and plant populations and could be applied to many more.
Affiliation(s)
- P M VanRaden
  - Animal Genomics and Improvement Laboratory, USDA, Agricultural Research Service, Beltsville, MD 20705-2350
|
26
|
Haile-Mariam M, MacLeod IM, Bolormaa S, Schrooten C, O'Connor E, de Jong G, Daetwyler HD, Pryce JE. Value of sharing cow reference population between countries on reliability of genomic prediction for milk yield traits. J Dairy Sci 2019; 103:1711-1728. [PMID: 31864746 DOI: 10.3168/jds.2019-17170] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/26/2019] [Accepted: 10/24/2019] [Indexed: 01/08/2023]
Abstract
Increasing the reliability of genomic prediction (GP) of economic traits in the pasture-based dairy production systems of New Zealand (NZ) and Australia (AU) is important to both countries. This study assessed if sharing cow phenotype and genotype data of NZ and AU improves the reliability of GP for NZ bulls. Data from approximately 32,000 NZ genotyped cows and their contemporaries were included in the May 2018 routine genetic evaluation of the Australian Dairy cattle in an attempt to provide consistent phenotypes for both countries. After the genetic evaluation, deregressed proofs of cows were calculated for milk yield traits. The April 2018 multiple across-country evaluation of Interbull was also used to calculate deregressed proofs for bulls on the NZ scale. Approximately 1,178 Jersey (Jer) and 6,422 Holstein (Hol) bulls had genotype and phenotype data. In addition to NZ cows, phenotype data of close to 60,000 genotyped Australian (AU) cows from the same genetic evaluation run as NZ cows were used. All AU and NZ females were genotyped using low-density SNP chips (<10K SNP) and were imputed first to 50K and then to ∼600K (referred to as high density; HD). We used up to 98,000 animals in the reference populations, both by expanding the NZ reference set (cow, bull, single breed to multi-breed set) and by adding AU cows. Reliabilities of GP were calculated for 508 Jer and 1,251 Hol bulls whose sires are not included in the reference set (RS) to ensure that real differences are not masked by close relationships. The GP was tested using 50K or high-density SNP chip using genomic BLUP in bivariate (considering country as a trait) or single trait models. The RS that gave the highest reliability for each breed were also tested using a hybrid GP method that combines expectation maximization with Bayes R. 
The addition of the AU cows to an NZ RS that included either NZ cows only, or cows and bulls, improved the reliability of GP for both NZ Hol and Jer validation bulls for all traits. Using single breed reference populations also increased reliability when NZ crossbred cows were added to reference populations that included only purebred NZ bulls and cows and AU cows. The full multi-breed RS (all NZ cows and bulls and AU cows) provided similar reliabilities in NZ Hol bulls, when compared with the single breed reference with crossbred NZ cows. For Jer validation bulls, the RS that included Jer cows and bulls and crossbred cows from NZ and Jer cows from AU was marginally better than the all-breed, all-country RS. In terms of reliability, the advantage of the HD SNP chip was small but captured more of the genomic variance than the 50K, particularly for Hol. The expectation maximization Bayes R GP method was slightly (up to 3 percentage points) better than genomic BLUP. We conclude that GP of milk production traits in NZ bulls improves by up to 7 percentage points in reliability by expanding the NZ reference population to include AU cows.
Affiliation(s)
- M Haile-Mariam
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
- I M MacLeod
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
- S Bolormaa
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia
- G de Jong
  - CRV, 6800 AL Arnhem, the Netherlands
- H D Daetwyler
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- J E Pryce
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
|
27
|
VanRaden PM, Tooker ME, Chud TCS, Norman HD, Megonigal JH, Haagen IW, Wiggans GR. Genomic predictions for crossbred dairy cattle. J Dairy Sci 2019; 103:1620-1631. [PMID: 31837783 DOI: 10.3168/jds.2019-16634] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/15/2019] [Accepted: 10/14/2019] [Indexed: 01/14/2023]
Abstract
Genomic evaluations are useful for crossbred as well as purebred populations when selection is applied to commercial herds. Dairy farmers had already spent more than $1 million to genotype over 32,000 crossbred animals before US genomic evaluations became available for those animals. Thus, new tools were needed to provide accurate genomic predictions for crossbreds. Genotypes for crossbreds are imputed more accurately when the imputation reference population includes purebreds. Therefore, genotypes of 6,296 crossbred animals were imputed from lower-density chips by including either 3,119 ancestors or 834,367 genotyped animals in the reference population. Crossbreds in the imputation study included 733 Jersey × Holstein F1 animals, 55 Brown Swiss × Holstein F1 animals, 2,300 Holstein backcrosses, 2,026 Jersey backcrosses, 27 Brown Swiss backcrosses, and 502 other crossbreds of various breed combinations. Another 653 animals appeared to be purebreds that owners had miscoded as a different breed. Genomic breed composition was estimated from 60,671 markers using the known breed identities for purebred, progeny-tested Holstein, Jersey, Brown Swiss, Ayrshire, and Guernsey bulls as the 5 traits (breed fractions) to be predicted. Estimates of breed composition were adjusted so that no percentages were negative or exceeded 100%, and breed percentages summed to 100%. Another adjustment set percentages above 93.5% equal to 100%, and the resulting value was termed breed base representation (BBR). Larger percentages of missing alleles were imputed by using a crossbred reference population rather than only the closest purebred reference population. Crossbred predictions were averages of genomic predictions computed using marker effects for each pure breed, which were weighted by the animal's BBR. Marker and polygenic effects were estimated separately for each breed on the all-breed scale instead of within-breed scales. 
For crossbreds, genomic predictions weighted by BBR were more accurate than the average of parents' breeding values and slightly more accurate than predictions using only the predominant breed. For purebreds, single-trait predictions using only within-breed data were as accurate as multi-trait predictions with allele effects in different breeds treated as correlated effects. Crossbred genomic predicted transmitting abilities were implemented by the Council on Dairy Cattle Breeding in April 2019 and will aid producers in managing their breeding programs and selecting replacement heifers.
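The breed base representation (BBR) adjustment and the BBR-weighted crossbred prediction described in this abstract can be sketched roughly as follows. This is a minimal illustration of the steps as the abstract states them, not the Council on Dairy Cattle Breeding implementation. The order of the adjustments, the zeroing of the other breeds when one breed passes the 93.5% cutoff, the breed codes, and all numbers are assumptions made for the example.

```python
def breed_base_representation(raw_percentages, purebred_cutoff=93.5):
    """Adjust raw breed-composition estimates into BBR percentages.

    raw_percentages: dict mapping breed code -> estimated percentage,
    which may be negative or exceed 100 before adjustment.
    """
    # Step 1: no percentage may be negative or exceed 100.
    clipped = {b: min(max(v, 0.0), 100.0) for b, v in raw_percentages.items()}
    total = sum(clipped.values())
    if total == 0.0:
        raise ValueError("all breed percentages are zero after clipping")
    # Step 2: rescale so that the percentages sum to 100.
    scaled = {b: 100.0 * v / total for b, v in clipped.items()}
    # Step 3: a percentage above the cutoff is set to 100.
    # Assumption: the remaining breeds are then set to 0 so the sum stays 100.
    top_breed = max(scaled, key=scaled.get)
    if scaled[top_breed] > purebred_cutoff:
        return {b: (100.0 if b == top_breed else 0.0) for b in scaled}
    return scaled


def crossbred_prediction(bbr, breed_predictions):
    """Average the per-breed genomic predictions, weighted by BBR."""
    return sum(bbr[b] / 100.0 * breed_predictions[b] for b in bbr)


# Hypothetical animal: roughly half Holstein (HO), half Jersey (JE).
raw = {"HO": 55.0, "JE": 48.0, "BS": -3.0}
bbr = breed_base_representation(raw)
# Hypothetical predictions from each breed's marker effects (all-breed scale).
predictions = {"HO": 100.0, "JE": 200.0, "BS": 50.0}
print(bbr)
print(crossbred_prediction(bbr, predictions))
```

Weighting by BBR means an animal that passes the cutoff is effectively evaluated with a single breed's marker effects, while an F1 animal receives close to an equal blend of two breeds' predictions.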
Affiliation(s)
- P M VanRaden
  - USDA, Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705-2350
- M E Tooker
  - USDA, Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705-2350
- T C S Chud
  - Departamento de Ciências Exatas, Universidade Estadual Paulista (Unesp), Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, São Paulo CEP 14884-900, Brazil
- H D Norman
  - Council on Dairy Cattle Breeding, Bowie, MD 20716
- I W Haagen
  - Council on Dairy Cattle Breeding, Bowie, MD 20716
- G R Wiggans
  - Council on Dairy Cattle Breeding, Bowie, MD 20716
|
28
|
Genomic prediction based on selected variants from imputed whole-genome sequence data in Australian sheep populations. Genet Sel Evol 2019; 51:72. [PMID: 31805849 PMCID: PMC6896509 DOI: 10.1186/s12711-019-0514-2] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 06/03/2019] [Accepted: 11/25/2019] [Indexed: 12/13/2022] Open
Abstract
Background Whole-genome sequence (WGS) data could contain information on genetic variants at or in high linkage disequilibrium with causative mutations that underlie the genetic variation of polygenic traits. Thus far, genomic prediction accuracy has shown limited increase when using such information in dairy cattle studies, in which one or few breeds with limited diversity predominate. The objective of our study was to evaluate the accuracy of genomic prediction in a multi-breed Australian sheep population of relatively less related target individuals, when using information on imputed WGS genotypes. Methods Between 9626 and 26,657 animals with phenotypes were available for nine economically important sheep production traits and all had WGS imputed genotypes. About 30% of the data were used to discover predictive single nucleotide polymorphisms (SNPs) based on a genome-wide association study (GWAS) and the remaining data were used for training and validation of genomic prediction. Prediction accuracy using selected variants from imputed sequence data was compared to that using a standard array of 50k SNP genotypes, thereby comparing genomic best linear unbiased prediction (GBLUP) and Bayesian methods (BayesR/BayesRC). Accuracy of genomic prediction was evaluated in two independent populations that were each lowly related to the training set, one being purebred Merino and the other crossbred Border Leicester x Merino sheep. Results A substantial improvement in prediction accuracy was observed when selected sequence variants were fitted alongside 50k genotypes as a separate variance component in GBLUP (2GBLUP) or in Bayesian analysis as a separate category of SNPs (BayesRC). From an average accuracy of 0.27 in both validation sets for the 50k array, the average absolute increase in accuracy across traits with 2GBLUP was 0.083 and 0.073 for purebred and crossbred animals, respectively, whereas with BayesRC it was 0.102 and 0.087.
The average gain in accuracy was smaller when selected sequence variants were treated in the same category as 50k SNPs. Very little improvement over 50k prediction was observed when using all WGS variants. Conclusions Accuracy of genomic prediction in diverse sheep populations increased substantially by using variants selected from whole-genome sequence data based on an independent multi-breed GWAS, when compared to genomic prediction using standard 50K genotypes.
|
29
|
Xiang R, van den Berg I, MacLeod IM, Hayes BJ, Prowse-Wilkins CP, Wang M, Bolormaa S, Liu Z, Rochfort SJ, Reich CM, Mason BA, Vander Jagt CJ, Daetwyler HD, Lund MS, Chamberlain AJ, Goddard ME. Quantifying the contribution of sequence variants with regulatory and evolutionary significance to 34 bovine complex traits. Proc Natl Acad Sci U S A 2019; 116:19398-19408. [PMID: 31501319 PMCID: PMC6765237 DOI: 10.1073/pnas.1904159116] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Indexed: 12/11/2022] Open
Abstract
Many genome variants shaping mammalian phenotype are hypothesized to regulate gene transcription and/or to be under selection. However, most of the evidence to support this hypothesis comes from human studies. Systematic evidence for regulatory and evolutionary signals contributing to complex traits in a different mammalian model is needed. Sequence variants associated with gene expression (expression quantitative trait loci [eQTLs]) and concentration of metabolites (metabolic quantitative trait loci [mQTLs]) and under histone-modification marks in several tissues were discovered from multiomics data of over 400 cattle. Variants under selection and evolutionary constraint were identified using genome databases of multiple species. These analyses defined 30 sets of variants, and for each set, we estimated the genetic variance the set explained across 34 complex traits in 11,923 bulls and 32,347 cows with 17,669,372 imputed variants. The per-variant trait heritability of these sets across traits was highly consistent (r > 0.94) between bulls and cows. Based on the per-variant heritability, conserved sites across 100 vertebrate species and mQTLs ranked the highest, followed by eQTLs, young variants, those under histone-modification marks, and selection signatures. From these results, we defined a Functional-And-Evolutionary Trait Heritability (FAETH) score indicating the functionality and predicted heritability of each variant. In additional 7,551 cattle, the high FAETH-ranking variants had significantly increased genetic variances and genomic prediction accuracies in 3 production traits compared to the low FAETH-ranking variants. The FAETH framework combines the information of gene regulation, evolution, and trait heritability to rank variants, and the publicly available FAETH data provide a set of biological priors for cattle genomic selection worldwide.
Collapse
Affiliation(s)
- Ruidong Xiang
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia;
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Irene van den Berg
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Iona M MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Benjamin J Hayes
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
- Centre for Animal Science, The University of Queensland, St. Lucia, QLD 4067, Australia
| | - Claire P Prowse-Wilkins
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Min Wang
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
| | - Sunduimijid Bolormaa
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Zhiqian Liu
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Simone J Rochfort
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
| | - Coralie M Reich
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Brett A Mason
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Christy J Vander Jagt
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Hans D Daetwyler
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
| | - Mogens S Lund
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark
| | - Amanda J Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Michael E Goddard
- Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| |
|
30
|
GWAS for Meat and Carcass Traits Using Imputed Sequence Level Genotypes in Pooled F2-Designs in Pigs. G3-GENES GENOMES GENETICS 2019; 9:2823-2834. [PMID: 31296617 PMCID: PMC6723123 DOI: 10.1534/g3.119.400452] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 02/06/2023]
Abstract
In order to gain insight into the genetic architecture of economically important traits in pigs and to derive suitable genetic markers to improve these traits in breeding programs, many studies have been conducted to map quantitative trait loci. Shortcomings of these studies were low mapping resolution, large confidence intervals for quantitative trait loci-positions and large linkage disequilibrium blocks. Here, we overcome these shortcomings by pooling four large F2 designs to produce smaller linkage disequilibrium blocks and by resequencing the founder generation at high coverage and the F1 generation at low coverage for subsequent imputation of the F2 generation to whole genome sequencing marker density. This led to the discovery of more than 32 million variants, 8 million of which have not been previously reported. The pooling of the four F2 designs enabled us to perform a joint genome-wide association study, which led to the identification of numerous significantly associated variant clusters on chromosomes 1, 2, 4, 7, 17 and 18 for the growth and carcass traits average daily gain, back fat thickness, meat fat ratio, and carcass length. We not only confirmed previously reported quantitative trait loci but also discovered new ones. As a result, several new candidate genes are discussed, among them BMP2 (bone morphogenetic protein 2), which we recently discovered in a related study. Variant effect prediction revealed that 15 high impact variants for the traits back fat thickness, meat fat ratio and carcass length were among the statistically significantly associated variants.
|
31
|
Improvement of genomic prediction by integrating additional single nucleotide polymorphisms selected from imputed whole genome sequencing data. Heredity (Edinb) 2019; 124:37-49. [PMID: 31278370 PMCID: PMC6906477 DOI: 10.1038/s41437-019-0246-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/30/2019] [Revised: 05/11/2019] [Accepted: 06/17/2019] [Indexed: 11/10/2022] Open
Abstract
The availability of whole genome sequencing (WGS) data enables the discovery of causative single nucleotide polymorphisms (SNPs) or SNPs in high linkage disequilibrium with causative SNPs. This study investigated effects of integrating SNPs selected from imputed WGS data into the data of 54K chip on genomic prediction in Danish Jersey. The WGS SNPs, mainly including peaks of quantitative trait loci, structure variants, regulatory regions of genes, and SNPs within genes with strong effects predicted with variant effect predictor, were selected in previous analyses for dairy breeds in Denmark–Finland–Sweden (DFS) and France (FRA). Animals genotyped with 54K chip, standard LD chip, and customized LD chip which covered selected WGS SNPs and SNPs in the standard LD chip, were imputed to 54K together with DFS and FRA SNPs. Genomic best linear unbiased prediction (GBLUP) and Bayesian four-distribution mixture models considering 54K and selected WGS SNPs as one (a one-component model) or two separate genetic components (a two-component model) were used to predict breeding values. For milk production traits and mastitis, both DFS (0.025) and FRA (0.029) sets of additional WGS SNPs improved reliabilities, and inclusions of all selected WGS SNPs generally achieved highest improvements of reliabilities (0.034). A Bayesian four-distribution model yielded higher reliabilities than a GBLUP model for milk and protein, but extra gains in reliabilities from using selected WGS SNPs were smaller for a Bayesian four-distribution model than a GBLUP model. Generally, no significant difference was observed between one-component and two-component models, except for using GBLUP models for milk.
|
32
|
Al Kalaldeh M, Gibson J, Duijvesteijn N, Daetwyler HD, MacLeod I, Moghaddar N, Lee SH, van der Werf JHJ. Using imputed whole-genome sequence data to improve the accuracy of genomic prediction for parasite resistance in Australian sheep. Genet Sel Evol 2019; 51:32. [PMID: 31242855 PMCID: PMC6595562 DOI: 10.1186/s12711-019-0476-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/05/2018] [Accepted: 06/18/2019] [Indexed: 01/16/2023] Open
Abstract
Background This study aimed at (1) comparing the accuracies of genomic prediction for parasite resistance in sheep based on whole-genome sequence (WGS) data to those based on 50k and high-density (HD) single nucleotide polymorphism (SNP) panels; (2) investigating whether the use of variants within quantitative trait loci (QTL) regions that were selected from regional heritability mapping (RHM) in an independent dataset improved the accuracy more than variants selected from genome-wide association studies (GWAS); and (3) comparing the prediction accuracies between variants selected from WGS data to variants selected from the HD SNP panel. Results The accuracy of genomic prediction improved marginally from 0.16 ± 0.02 and 0.18 ± 0.01 when using all the variants from 50k and HD genotypes, respectively, to 0.19 ± 0.01 when using all the variants from WGS data. Fitting a GRM from the selected variants alongside a GRM from the 50k SNP genotypes improved the prediction accuracy substantially compared to fitting the 50k SNP genotypes alone. The gain in prediction accuracy was slightly more pronounced when variants were selected from WGS data compared to when variants were selected from the HD panel. When sequence variants that passed the GWAS $-\log_{10}(p\,\text{value})$ threshold of 3 across the entire genome were selected, the prediction accuracy improved by 5% (up to 0.21 ± 0.01), whereas when selection was limited to sequence variants that passed the same GWAS $-\log_{10}(p\,\text{value})$ threshold of 3 in regions identified by RHM, the accuracy improved by 9% (up to 0.25 ± 0.01). Conclusions Our results show that through careful selection of sequence variants from the QTL regions, the accuracy of genomic prediction for parasite resistance in sheep can be improved. These findings have important implications for genomic prediction in sheep.
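The variant-selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `select_variants`, the dictionary layout of the GWAS results, and the choice to treat "passed the threshold" as greater than or equal to 3 are all assumptions made for illustration.

```python
import math


def select_variants(gwas, threshold=3.0, regions=None):
    """Return sorted IDs of variants whose -log10(p) reaches the threshold.

    gwas:    dict mapping variant ID -> (chromosome, position, p-value).
             This layout is a hypothetical convenience, not a standard format.
    regions: optional list of (chromosome, start, end) tuples, e.g. regions
             flagged by regional heritability mapping; when given, only
             variants falling inside one of them are kept.
    """
    selected = []
    for variant_id, (chrom, pos, p_value) in gwas.items():
        if not 0.0 < p_value <= 1.0:
            raise ValueError(f"invalid p-value for {variant_id}: {p_value}")
        # Assumption: "passing" means -log10(p) >= threshold.
        if -math.log10(p_value) < threshold:
            continue
        if regions is not None and not any(
            chrom == r_chrom and start <= pos <= end
            for r_chrom, start, end in regions
        ):
            continue
        selected.append(variant_id)
    return sorted(selected)


# Toy GWAS results: two variants pass genome-wide, one lies in the region.
toy_gwas = {
    "v1": ("1", 100, 1e-5),
    "v2": ("1", 5000, 1e-2),
    "v3": ("2", 300, 1e-6),
}
genome_wide = select_variants(toy_gwas)
within_region = select_variants(toy_gwas, regions=[("1", 0, 1000)])
```

The selected variants would then be used to build a second GRM fitted alongside the 50k GRM, which is outside the scope of this sketch.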
Affiliation(s)
- Mohammad Al Kalaldeh
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia. .,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia.
| | - John Gibson
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - Naomi Duijvesteijn
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - Hans D Daetwyler
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia.,School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
| | - Iona MacLeod
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia
| | - Nasir Moghaddar
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - Sang Hong Lee
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, 5000, Australia
| | - Julius H J van der Werf
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| |
|
33
|
Ma P, Lund MS, Aamand GP, Su G. Use of a Bayesian model including QTL markers increases prediction reliability when test animals are distant from the reference population. J Dairy Sci 2019; 102:7237-7247. [PMID: 31155255 DOI: 10.3168/jds.2018-15815] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/09/2018] [Accepted: 03/31/2019] [Indexed: 01/23/2023]
Abstract
Relatedness between reference and test animals has an important effect on the reliability of genomic prediction for test animals. Because genomic prediction has been widely applied in practical cattle breeding and bulls have been selected according to genomic breeding value without progeny testing, the sires or grandsires of candidates might not have phenotypic information and might not be in the reference population when the candidates are selected. The objective of this study was to investigate the decreasing trend of the reliability of genomic prediction given distant reference populations, using genomic best linear unbiased prediction (GBLUP) and Bayesian variable selection models with or without including the quantitative trait locus (QTL) markers detected from sequencing data. The data used in this study consisted of 22,242 bulls genotyped using the 54K SNP array from EuroGenomics. Among them, 1,444 Danish bulls born from 2006 to 2010 were selected as test animals. Different reference populations with varying relationships to test animals were created according to pedigree-based relationships. The reference individuals having a relationship with one or more test animals higher than 0.4 (scenario ρ < 0.4), 0.2 (ρ < 0.2), or 0.1 (ρ < 0.1, where ρ = relationship coefficient) were removed from reference sets; these represented the distance between reference and test animals being 2 generations, 3 generations, and 4 generations, respectively. Imputed whole-genome sequencing data of bulls from Denmark were used to conduct a genome-wide association study (GWAS). A small number of significant variants (QTL markers) from the GWAS were added to the array data. To compare the effects of different models, the basic GBLUP model, a Bayesian selection variable model, a GBLUP model with 2 components of genetic effects, and a Bayesian model with pooled array data and QTL markers were used for estimating genomic estimated breeding values (GEBV) of test animals. 
The reliability of genomic prediction decreased when the test animals were more generations away from the reference population. The reliability of genomic prediction was 0.461 for 1 generation away and 0.396 for 3 generations away, with the same number of individuals in the reference set, using a GBLUP model with chip markers only. The results showed that using the Bayesian method and QTL markers improved the reliability of genomic prediction in all scenarios of relationship between test and reference animals, by between 1.3% and 65.1% (the latter for 4 generations away with only 841 individuals in the reference set). However, most gains were for predictions of milk yield and fat yield. There was little improvement for predictions of protein yield and mastitis, and no improvement for prediction of fertility, except for scenario ρ < 0.1, in which there was a large improvement for predictions of all traits. On the other hand, models including more than 10% polygenic effect decreased prediction reliability when the relationship between test and reference animals was distant.
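The construction of the distant reference sets (scenarios ρ < 0.4, ρ < 0.2 and ρ < 0.1) can be sketched as follows. This is a hedged illustration only: `prune_reference` and the dictionary of pairwise pedigree relationships are hypothetical names, and pairs missing from the dictionary are assumed to be unrelated.

```python
def prune_reference(relationships, reference_ids, test_ids, max_rel):
    """Keep reference animals whose closest test relative is at most max_rel.

    relationships: dict mapping (reference_id, test_id) -> pedigree
                   relationship coefficient; missing pairs are treated
                   as unrelated (an assumption of this sketch).
    """
    kept = []
    for ref in reference_ids:
        closest = max(
            (relationships.get((ref, test), 0.0) for test in test_ids),
            default=0.0,
        )
        # An animal is removed if its relationship with one or more test
        # animals is higher than the scenario threshold.
        if closest <= max_rel:
            kept.append(ref)
    return kept


toy_rel = {
    ("r1", "t1"): 0.5,   # e.g. a sire of a test animal
    ("r2", "t1"): 0.25,  # e.g. a grandsire
    ("r2", "t2"): 0.1,
    ("r3", "t1"): 0.05,
}
reference = ["r1", "r2", "r3"]
tests = ["t1", "t2"]
scenario_04 = prune_reference(toy_rel, reference, tests, 0.4)
scenario_02 = prune_reference(toy_rel, reference, tests, 0.2)
```

Tightening the threshold shrinks the reference set, which is why the abstract also reports comparisons made with the same number of reference individuals.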
Affiliation(s)
- Peipei Ma
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, P.R. China; Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Aarhus, Denmark
| | - Mogens S Lund
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Aarhus, Denmark
| | - Gert P Aamand
- NAV Nordic Cattle Genetic Evaluation, DK-8200, Aarhus, Denmark
| | - Guosheng Su
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830, Aarhus, Denmark.
| |
|
34
|
Bolormaa S, Chamberlain AJ, Khansefid M, Stothard P, Swan AA, Mason B, Prowse-Wilkins CP, Duijvesteijn N, Moghaddar N, van der Werf JH, Daetwyler HD, MacLeod IM. Accuracy of imputation to whole-genome sequence in sheep. Genet Sel Evol 2019; 51:1. [PMID: 30654735 PMCID: PMC6337865 DOI: 10.1186/s12711-018-0443-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 06/19/2018] [Accepted: 12/18/2018] [Indexed: 12/12/2022] Open
Abstract
Background The use of whole-genome sequence (WGS) data for genomic prediction and association studies is highly desirable because the causal mutations should be present in the data. The sequencing of 935 sheep from a range of breeds provides the opportunity to impute sheep genotyped with single nucleotide polymorphism (SNP) arrays to WGS. This study evaluated the accuracy of imputation from SNP genotypes to WGS using this reference population of 935 sequenced sheep. Results The accuracy of imputation from the Ovine Infinium® HD BeadChip SNP (~ 500 k) to WGS was assessed for three target breeds: Merino, Poll Dorset and F1 Border Leicester × Merino. Imputation accuracy was highest for the Poll Dorset breed, although there were more Merino individuals in the sequenced reference population than Poll Dorset individuals. In addition, empirical imputation accuracies were higher (by up to 1.7%) when using larger multi-breed reference populations compared to using a smaller single-breed reference population. The mean accuracy of imputation across target breeds using the Minimac3 or the FImpute software was 0.94. The empirical imputation accuracy varied considerably across the genome; six chromosomes carried regions of one or more Mb with a mean imputation accuracy of < 0.7. Imputation accuracy in five variant annotation classes ranged from 0.87 (missense) up to 0.94 (intronic variants), where lower accuracy corresponded to higher proportions of rare alleles. The imputation quality statistic reported from Minimac3 (R2) had a clear positive relationship with the empirical imputation accuracy. Therefore, by first discarding imputed variants with an R2 below 0.4, the mean empirical accuracy across target breeds increased to 0.97. Although accuracy of genomic prediction was less affected by filtering on R2 in a multi-breed population of sheep with imputed WGS, the genomic heritability clearly tended to be lower when using variants with an R2 ≤ 0.4. 
Conclusions The mean imputation accuracy was high for all target breeds and was increased by combining smaller breed sets into a multi-breed reference. We found that the Minimac3 software imputation quality statistic (R2) was a useful indicator of empirical imputation accuracy, enabling removal of very poorly imputed variants before downstream analyses. Electronic supplementary material The online version of this article (10.1186/s12711-018-0443-5) contains supplementary material, which is available to authorized users.
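The filtering step reported above (discarding imputed variants with a Minimac3 R2 below 0.4 before computing mean empirical accuracy) can be sketched in a few lines of Python. The function name and the list-of-pairs input are assumptions for illustration; real Minimac3 output would first have to be parsed into this shape, and the toy numbers below are invented.

```python
def mean_accuracy_after_filter(variants, min_r2=0.4):
    """Mean empirical imputation accuracy of variants with R2 >= min_r2.

    variants: iterable of (r2, empirical_accuracy) pairs, one per variant.
    """
    kept = [accuracy for r2, accuracy in variants if r2 >= min_r2]
    if not kept:
        raise ValueError("no variants pass the R2 filter")
    return sum(kept) / len(kept)


# Invented toy values: the poorly imputed variant drags the mean down.
toy_variants = [(0.9, 0.98), (0.3, 0.50), (0.5, 0.90)]
unfiltered = mean_accuracy_after_filter(toy_variants, min_r2=0.0)
filtered = mean_accuracy_after_filter(toy_variants)
```

In this toy case the filtered mean is higher than the unfiltered one, mirroring the direction of the effect described in the abstract rather than its magnitude.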
Affiliation(s)
- Sunduimijid Bolormaa
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia. .,Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.
| | - Amanda J Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
| | - Majid Khansefid
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia.,Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
| | - Paul Stothard
- Faculty of Agricultural, Life and Environmental Sciences, University of Alberta, Edmonton, AB, T6G 2R3, Canada
| | - Andrew A Swan
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,Animal Genetics and Breeding Unit, University of New England, Armidale, NSW, 2351, Australia
| | - Brett Mason
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
| | - Claire P Prowse-Wilkins
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia
| | - Naomi Duijvesteijn
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - Nasir Moghaddar
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - Julius H van der Werf
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - Hans D Daetwyler
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia.,Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3086, Australia
| | - Iona M MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia.,Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia
| |
|
35
|
Zhang Q, Sahana G, Su G, Guldbrandtsen B, Lund MS, Calus MPL. Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle. Genet Sel Evol 2018; 50:62. [PMID: 30458700 PMCID: PMC6247626 DOI: 10.1186/s12711-018-0432-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/12/2018] [Accepted: 11/14/2018] [Indexed: 11/05/2022] Open
Abstract
Background Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants, on the reliability of genomic prediction for fertility, health and longevity in dairy cattle. Results All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by − 2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for 10% additional genetic variance of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%. Conclusions Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. 
Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV. Electronic supplementary material The online version of this article (10.1186/s12711-018-0432-8) contains supplementary material, which is available to authorized users.
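The extraction of RLFV by minor allele frequency described above can be sketched as follows. This is a simplified illustration, not the study's code: it assumes hard-called genotypes coded as 0/1/2 allele counts, uses hypothetical function names, and additionally excludes monomorphic variants, which the abstract does not mention.

```python
def minor_allele_frequency(genotypes):
    """MAF from per-animal allele counts coded 0, 1 or 2."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1.0 - freq)


def rare_variants(genotype_table, max_maf=0.05):
    """IDs of segregating variants with a MAF lower than max_maf.

    genotype_table: dict mapping variant ID -> list of allele counts.
    Monomorphic variants (MAF == 0) are dropped; that is an assumption
    of this sketch, not a rule stated in the abstract.
    """
    return sorted(
        variant_id
        for variant_id, genotypes in genotype_table.items()
        if 0.0 < minor_allele_frequency(genotypes) < max_maf
    )


toy_table = {
    "rare": [1] + [0] * 19,         # 1 copy among 40 alleles, MAF 0.025
    "common": [1] * 10 + [0] * 10,  # MAF 0.25
    "fixed": [0] * 20,              # monomorphic
}
rlfv = rare_variants(toy_table)
```

Restricting the result to variants located within genes, as the study did, would require an annotation lookup that is not shown here.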
Affiliation(s)
- Qianqian Zhang
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark. .,Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands. .,Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
| | - Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Guosheng Su
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Bernt Guldbrandtsen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Mogens Sandø Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Mario P L Calus
- Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands
| |
|
36
|
Raymond B, Bouwman AC, Wientjes YCJ, Schrooten C, Houwing-Duistermaat J, Veerkamp RF. Genomic prediction for numerically small breeds, using models with pre-selected and differentially weighted markers. Genet Sel Evol 2018; 50:49. [PMID: 30314431 PMCID: PMC6186145 DOI: 10.1186/s12711-018-0419-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 06/20/2018] [Accepted: 10/01/2018] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Genomic prediction (GP) accuracy in numerically small breeds is limited by the small size of the reference population. Our objective was to test a multi-breed multiple genomic relationship matrices (GRM) GP model (MBMG) that weighs pre-selected markers separately, uses the remaining markers to explain the remaining genetic variance that can be explained by markers, and weighs information of breeds in the reference population by their genetic correlation with the validation breed. METHODS Genotype and phenotype data were used on 595 Jersey bulls from New Zealand and 5503 Holstein bulls from the Netherlands, all with deregressed proofs for stature. Different sets of markers were used, containing either pre-selected markers from a meta-genome-wide association analysis on stature, remaining markers or both. We implemented a multi-breed bivariate GREML model in which we fitted either a single multi-breed GRM (MBSG), or two distinct multi-breed GRM (MBMG), one made with pre-selected markers and the other with remaining markers. Accuracies of predicting stature for Jersey individuals using the multi-breed models (Holstein and Jersey combined reference population) were compared to those obtained using either the Jersey (within-breed) or Holstein (across-breed) reference population. All the models were subsequently fitted in the analysis of simulated phenotypes, with a simulated genetic correlation between breeds of 1, 0.5, and 0.25. RESULTS The MBMG model always gave better prediction accuracies for stature compared to MBSG, within-, and across-breed GP models. For example, with MBSG, accuracies obtained by fitting 48,912 unselected markers (0.43), 357 pre-selected markers (0.38) or a combination of both (0.43), were lower than accuracies obtained by fitting pre-selected and unselected markers in separate GRM in MBMG (0.49). 
This improvement was further confirmed by results from a simulation study, with MBMG performing on average 23% better than MBSG with all markers fitted. CONCLUSIONS With the MBMG model, it is possible to use information from numerically large breeds to improve prediction accuracy of numerically small breeds. The superiority of MBMG is mainly due to its ability to use information on pre-selected markers, explain the remaining genetic variance and weigh information from a different breed by the genetic correlation between breeds.
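The idea of fitting pre-selected and remaining markers in two separate GRM can be illustrated with the sketch below. It only builds the two matrices and does not fit the bivariate GREML model. The abstract does not say how the GRM were computed, so the scaling used here (centering by twice the allele frequency and dividing by 2Σp(1−p), in the style of VanRaden's first method, with frequencies taken from the pooled animals) is an assumption, as are the function names.

```python
def grm(genotypes):
    """Genomic relationship matrix from allele counts (0/1/2).

    genotypes: list of per-animal lists, one entry per marker.
    Assumed scaling: G = ZZ' / (2 * sum(p * (1 - p))), with Z centered
    by twice the observed allele frequency.
    """
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    freqs = [
        sum(animal[j] for animal in genotypes) / (2 * n_animals)
        for j in range(n_markers)
    ]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    centered = [
        [animal[j] - 2 * freqs[j] for j in range(n_markers)]
        for animal in genotypes
    ]
    return [
        [sum(x * y for x, y in zip(row_i, row_k)) / scale for row_k in centered]
        for row_i in centered
    ]


def split_grms(genotypes, preselected):
    """Build one GRM from pre-selected marker columns, one from the rest."""
    chosen = sorted(set(preselected))
    rest = [j for j in range(len(genotypes[0])) if j not in set(chosen)]

    def subset(columns):
        return [[animal[j] for j in columns] for animal in genotypes]

    return grm(subset(chosen)), grm(subset(rest))


# Three animals, three markers; marker 0 plays the pre-selected role.
toy_genotypes = [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
g_selected, g_remaining = split_grms(toy_genotypes, preselected=[0])
```

Keeping the two matrices separate lets a variance-component model assign a different variance to the pre-selected markers, which is the property the abstract credits for the MBMG advantage.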
Affiliation(s)
- Biaty Raymond
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Biometris, Wageningen University and Research, 6700 AA Wageningen, The Netherlands
| | - Aniek C. Bouwman
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
| | - Yvonne C. J. Wientjes
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
| | | | - Jeanine Houwing-Duistermaat
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, 2333 ZC Leiden, The Netherlands
- School of Mathematics, Faculty of Mathematics and Physical Sciences, University of Leeds, Leeds, LS2 9JT UK
| | - Roel F. Veerkamp
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
| |
|
37
|
Calus MPL, Goddard ME, Wientjes YCJ, Bowman PJ, Hayes BJ. Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection. J Dairy Sci 2018; 101:4279-4294. [PMID: 29550121 DOI: 10.3168/jds.2017-13366] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/20/2017] [Accepted: 01/04/2018] [Indexed: 11/19/2022]
Abstract
Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. 
The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, are likely the result of the observed admixture between both breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.
Affiliation(s)
- M P L Calus
- Wageningen University & Research, Animal Breeding and Genomics, PO Box 338, 6700 AH Wageningen, the Netherlands.
| | - M E Goddard
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Research, Department of Economic Development, Jobs, Transport and Resources, Melbourne, Victoria 3083, Australia
| | - Y C J Wientjes
- Wageningen University & Research, Animal Breeding and Genomics, PO Box 338, 6700 AH Wageningen, the Netherlands
| | - P J Bowman
- Agriculture Research, Department of Economic Development, Jobs, Transport and Resources, Melbourne, Victoria 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia
| | - B J Hayes
- School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, St. Lucia, Queensland 4072, Australia
| |
Collapse
|
38
|
Werner CR, Qian L, Voss-Fels KP, Abbadi A, Leckband G, Frisch M, Snowdon RJ. Genome-wide regression models considering general and specific combining ability predict hybrid performance in oilseed rape with similar accuracy regardless of trait architecture. Theor Appl Genet 2018; 131:299-317. [PMID: 29080901 DOI: 10.1007/s00122-017-3002-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 04/26/2017] [Accepted: 10/09/2017] [Indexed: 05/02/2023]
Abstract
Genomic prediction using the Brassica 60 k genotyping array is efficient in oilseed rape hybrids. Prediction accuracy is more dependent on trait complexity than on the prediction model. In oilseed rape breeding programs, performance prediction of parental combinations is of fundamental importance. Due to the phenomenon of heterosis, per se performance is not a reliable indicator for F1-hybrid performance, and selection of well-paired parents requires the testing of large quantities of hybrid combinations in extensive field trials. However, the number of potential hybrids, in general, dramatically exceeds breeding capacity and budget. Integration of genomic selection (GS) could substantially increase the number of potential combinations that can be evaluated. GS models can be used to predict the performance of untested individuals based only on their genotypic profiles, using marker effects previously predicted in a training population. This allows for a preselection of promising genotypes, enabling a more efficient allocation of resources. In this study, we evaluated the usefulness of the Illumina Brassica 60 k SNP array for genomic prediction and compared three alternative approaches based on a homoscedastic ridge regression BLUP and three Bayesian prediction models that considered general and specific combining ability (GCA and SCA, respectively). A total of 448 hybrids were produced in a commercial breeding program from unbalanced crosses between 220 paternal doubled haploid lines and five male-sterile testers. Predictive ability was evaluated for seven agronomic traits. We demonstrate that the Brassica 60 k genotyping array is an adequate and highly valuable platform to implement genomic prediction of hybrid performance in oilseed rape. Furthermore, we present first insights into the application of established statistical models for prediction of important agronomical traits with contrasting patterns of polygenic control.
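As a rough illustration of the homoscedastic ridge regression idea mentioned in this abstract, the sketch below estimates marker effects with one common shrinkage parameter and then predicts untested genotypes from their marker profiles alone. It is a minimal sketch under stated assumptions, not the authors' model: the function names, the coding of the marker matrix, and the shrinkage parameter `lam` are all hypothetical, and the GCA/SCA partitioning used in the study is not reproduced.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_marker_effects(z: Matrix, y: List[float], lam: float) -> List[float]:
    """Marker effects b solving (Z'Z + lam * I) b = Z'y.

    z   : individuals x markers matrix (hypothetical coding, e.g. -1/0/1)
    y   : phenotypes, assumed already corrected for the mean
    lam : one shrinkage parameter shared by all markers (homoscedastic)
    """
    n_markers = len(z[0])
    ztz = [[sum(row[i] * row[j] for row in z) + (lam if i == j else 0.0)
            for j in range(n_markers)] for i in range(n_markers)]
    zty = [sum(row[i] * yi for row, yi in zip(z, y)) for i in range(n_markers)]
    return solve_linear(ztz, zty)


def predict(z_new: Matrix, effects: List[float]) -> List[float]:
    """Predict performance of untested individuals from marker profiles."""
    return [sum(g * b for g, b in zip(row, effects)) for row in z_new]
```

With `lam = 0.0` and a full-rank marker matrix this reduces to ordinary least squares; increasing `lam` shrinks all marker effects toward zero by the same amount, which is the homoscedastic assumption.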
Affiliation(s)
- Christian R Werner
- Department of Plant Breeding, Justus Liebig University, 35392, Giessen, Germany
- Lunwen Qian
- Department of Plant Breeding, Justus Liebig University, 35392, Giessen, Germany
- Collaborative Innovation Center of Grain and Oil Crops in South China, Hunan Agricultural University, Changsha, 410128, China
- Kai P Voss-Fels
- Department of Plant Breeding, Justus Liebig University, 35392, Giessen, Germany
- Amine Abbadi
- NPZ Innovation GmbH, Hohenlieth, 24363, Holtsee, Germany
- Matthias Frisch
- Institute of Agronomy and Plant Breeding II, Justus Liebig University, 35392, Giessen, Germany
- Rod J Snowdon
- Department of Plant Breeding, Justus Liebig University, 35392, Giessen, Germany
|
39
|
Which Individuals To Choose To Update the Reference Population? Minimizing the Loss of Genetic Diversity in Animal Genomic Selection Programs. G3: Genes, Genomes, Genetics 2018; 8:113-121. [PMID: 29133511 PMCID: PMC5765340 DOI: 10.1534/g3.117.1117] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/14/2023]
Abstract
Genomic selection (GS) is commonly used in livestock and increasingly in plant breeding. Relying on phenotypes and genotypes of a reference population, GS allows performance prediction for young individuals that have only genotypes. This is expected to achieve rapid, high genetic gain, but with a potential loss of genetic diversity. Existing methods to conserve genetic diversity depend mostly on the choice of the breeding individuals. In this study, we propose a modification of the reference population composition to mitigate diversity loss. Since the high cost of phenotyping is the limiting factor for GS, our findings are of major economic interest. This study aims to answer the following questions: how would decisions on the reference population affect the breeding population, and how best to select individuals to update the reference population so as to balance maximizing genetic gain against minimizing loss of genetic diversity? We investigated three updating strategies for the reference population: random, truncation, and optimal contribution (OC). OC maximizes genetic merit for a fixed loss of genetic diversity. A French Montbéliarde dairy cattle population with 50K SNP chip genotypes and simulations over 10 generations were used to compare these strategies, with milk production as the trait of interest. Candidates were selected to update the reference population. Prediction bias and both genetic merit and diversity were measured. Changes in the reference population composition slightly affected the breeding population. The OC strategy appeared to be an acceptable compromise to maintain both genetic gain and diversity in the reference and the breeding populations.
|
40
|
Pausch H, Emmerling R, Gredler-Grandl B, Fries R, Daetwyler HD, Goddard ME. Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution. BMC Genomics 2017; 18:853. [PMID: 29121857 PMCID: PMC5680815 DOI: 10.1186/s12864-017-4263-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 06/02/2017] [Accepted: 11/02/2017] [Indexed: 11/25/2022] Open
Abstract
Background Genotyping and whole-genome sequencing data have been generated for hundreds of thousands of cattle. International consortia used these data to compile imputation reference panels that facilitate the imputation of sequence variant genotypes for animals that have been genotyped using dense microarrays. Association studies with imputed sequence variant genotypes allow for the characterization of quantitative trait loci (QTL) at nucleotide resolution particularly when individuals from several breeds are included in the mapping populations. Results We imputed genotypes for 28 million sequence variants in 17,229 cattle of the Braunvieh, Fleckvieh and Holstein breeds in order to compile large mapping populations that provide high power to identify QTL for milk production traits. Association tests between imputed sequence variant genotypes and fat and protein percentages in milk uncovered between six and thirteen QTL (P < 1e-8) per breed. Eight of the detected QTL were significant in more than one breed. We combined the results across breeds using meta-analysis and identified a total of 25 QTL including six that were not significant in the within-breed association studies. Two missense mutations in the ABCG2 (p.Y581S, rs43702337, P = 4.3e-34) and GHR (p.F279Y, rs385640152, P = 1.6e-74) genes were the top variants at QTL on chromosomes 6 and 20. Another known causal missense mutation in the DGAT1 gene (p.A232K, rs109326954, P = 8.4e-1436) was the second top variant at a QTL on chromosome 14 but its allelic substitution effects were inconsistent across breeds. It turned out that the conflicting allelic substitution effects resulted from flaws in the imputed genotypes due to the use of a multi-breed reference population for genotype imputation. Conclusions Many QTL for milk production traits segregate across breeds and across-breed meta-analysis has greater power to detect such QTL than within-breed association testing. 
Association testing between imputed sequence variant genotypes and phenotypes of interest facilitates identifying causal mutations provided the accuracy of imputation is high. However, true causal mutations may remain undetected when the imputed sequence variant genotypes contain flaws. It is highly recommended to validate the effect of known causal variants in order to assess the ability to detect true causal mutations in association studies with imputed sequence variants.
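The abstract does not state which meta-analysis scheme was used to combine the within-breed results. As one common possibility, the sketch below combines per-breed z-scores for a single variant using sample-size weights (a Stouffer-type statistic). The function names and inputs are hypothetical, and the z-scores are assumed to be signed with respect to the same allele in every breed.

```python
import math
from typing import Sequence


def weighted_z_meta(z_scores: Sequence[float], sample_sizes: Sequence[int]) -> float:
    """Combine per-breed z-scores with weights proportional to sqrt(n).

    Assumes each z-score is signed for the same reference allele, so that
    effects in opposite directions cancel rather than reinforce each other.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z: float) -> float:
    """Two-sided P value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Under this scheme, two equally sized breeds with z-scores of 2 and -2 combine to a z-score of 0. This shows how inconsistent allelic substitution effects across breeds, such as those the abstract reports for the DGAT1 variant, can weaken a combined signal.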
Affiliation(s)
- Hubert Pausch
- Animal Genomics, Institute of Agricultural Sciences, ETH Zurich, 8092, Zurich, Switzerland; Agriculture Research Division, Agriculture Victoria, Department of Economic Development, Jobs, Transport and Resources, AgriBio, VIC, 3083, Australia
- Reiner Emmerling
- Institute of Animal Breeding, Bavarian State Research Center for Agriculture, 85586, Grub, Germany
- Ruedi Fries
- Animal Breeding, Technische Universitaet Muenchen, 85354, Freising, Germany
- Hans D Daetwyler
- Agriculture Research Division, Agriculture Victoria, Department of Economic Development, Jobs, Transport and Resources, AgriBio, VIC, 3083, Australia; School of Applied Systems Biology, LaTrobe University, Bundoora, VIC, 3083, Australia
- Michael E Goddard
- Agriculture Research Division, Agriculture Victoria, Department of Economic Development, Jobs, Transport and Resources, AgriBio, VIC, 3083, Australia; Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, VIC, 3010, Australia
|
41
|
Evaluation of the potential use of a meta-population for genomic selection in autochthonous beef cattle populations. Animal 2017; 12:1350-1357. [PMID: 29094666 DOI: 10.1017/s175173111700283x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022] Open
Abstract
This study investigated the potential application of genomic selection under a multi-breed scheme in the Spanish autochthonous beef cattle populations using a simulation study that replicates the structure of linkage disequilibrium obtained from a sample of 25 triplets of sire/dam/offspring per population genotyped with the BovineHD BeadChip. Purebred and combined reference sets were used for the genomic evaluation, and several scenarios with different genetic architectures of the trait were investigated. The single-breed evaluations yielded the highest within-breed accuracies. Across-breed accuracies were low but positive on average, confirming the genetic connectedness between the populations. When the same genotyping effort was split across several populations, accuracies were lower than with single-breed evaluation, but the combined reference sets showed a small advantage over small purebred reference sets in the accuracies of subsequent generations. In addition, the genetic architecture of the trait had no relevant effect on accuracy, with the exception of rare variants, which yielded slightly lower accuracies and a greater loss of predictive ability over the generations.
|
43
|
van den Berg I, Bowman PJ, MacLeod IM, Hayes BJ, Wang T, Bolormaa S, Goddard ME. Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genet Sel Evol 2017; 49:70. [PMID: 28934948 PMCID: PMC5609075 DOI: 10.1186/s12711-017-0347-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 04/04/2017] [Accepted: 09/13/2017] [Indexed: 11/26/2022] Open
Abstract
Background The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows. Results With the simulated dataset, we found that prediction accuracy was highly increased for a breed that was not represented in the training population for sequence data compared to HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. First, dropping variants from each chromosome and reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs. 
Conclusions We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.
Affiliation(s)
- Irene van den Berg
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia
- Phil J Bowman
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Ben J Hayes
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, St Lucia, QLD, Australia
- Tingting Wang
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Sunduimijid Bolormaa
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Mike E Goddard
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
|
44
|
Veerkamp RF, Bouwman AC, Schrooten C, Calus MPL. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle. Genet Sel Evol 2016; 48:95. [PMID: 27905878 PMCID: PMC5134274 DOI: 10.1186/s12711-016-0274-1] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 07/14/2016] [Accepted: 11/24/2016] [Indexed: 11/10/2022] Open
Abstract
Background Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Methods Phenotypes were available for 5503 Holstein–Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. Results The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. 
Conclusions Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
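To make the notion of a genomic relationship matrix (GRM) built from a chosen subset of variants more concrete, the sketch below computes a GRM from allele-count genotypes using a VanRaden-style formula. This is an assumption made for illustration: the abstract does not give the formula, and the GCTA program used in the study scales variants differently. The function name and the 0/1/2 genotype coding are likewise hypothetical.

```python
from typing import List


def genomic_relationship_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """VanRaden-style GRM: G = Z Z' / (2 * sum_j p_j * (1 - p_j)).

    genotypes : animals x variants matrix of allele counts coded 0, 1 or 2.
    Allele frequencies are estimated from the supplied animals; at least one
    variant is assumed to be segregating, otherwise the scaling term is zero.
    """
    n_animals = len(genotypes)
    n_variants = len(genotypes[0])
    # Frequency of the counted allele at each variant.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals)
             for j in range(n_variants)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    # Centre each genotype by twice the allele frequency.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(n_variants)]
                for row in genotypes]
    return [[sum(a * b for a, b in zip(ri, rk)) / scale for rk in centered]
            for ri in centered]
```

Passing only the columns of preselected variants gives a GRM for that subset, and passing the remaining columns gives a second GRM, which mirrors the idea of fitting two GRM together described in the abstract.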
Affiliation(s)
- Roel F Veerkamp
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands; Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, P.O. Box 5003, 1432, Ås, Norway
- Aniek C Bouwman
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Mario P L Calus
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
|