1
Oget-Ebrad C, Heumez E, Duchalais L, Goudemand-Dugué E, Oury FX, Elsen JM, Bouchet S. Validation of cross-progeny variance genomic prediction using simulations and experimental data in winter elite bread wheat. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:226. [PMID: 39292265 PMCID: PMC11410863 DOI: 10.1007/s00122-024-04718-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 08/16/2024] [Indexed: 09/19/2024]
Abstract
KEY MESSAGE From simulations and experimental data, the quality of cross-progeny variance genomic predictions may be high, but it depends on trait architecture and requires a sufficient number of progenies. Genomic predictions are used to select genitors and crosses in plant breeding. The usefulness criterion (UC) is a cross-selection criterion that requires the estimation of parental mean (PM) and progeny standard deviation (SD). This study evaluates the parameters that affect the predictive ability of UC and its two components using simulations. Predictive ability increased with heritability and progeny size and decreased with QTL number, most notably for SD. Comparing scenarios where marker effects were known or estimated using prediction models, SD was strongly impacted by the quality of marker effect estimates. We proposed a new algebraic formula for SD estimation that takes into account the uncertainty of the estimation of marker effects. It improved predictions when the number of QTL was greater than 300, especially when heritability was low. We also compared estimated and observed UC using experimental data for heading date, plant height, grain protein content and yield. PM and UC estimates were significantly correlated for all traits (PM: 0.38, 0.63, 0.51 and 0.91; UC: 0.45, 0.52, 0.54 and 0.74; for yield, grain protein content, plant height and heading date, respectively), while SD was correlated only for heading date and plant height (0.64 and 0.49, respectively). According to simulations, SD estimations in the field would require large progenies. This pioneering study experimentally validates genomic prediction of UC, but the predictive ability depends on trait architecture and the precision of marker effect estimates. We advise breeders to adjust progeny size to realize the SD potential of a cross.
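As an illustration of how PM and SD combine into UC, the sketch below uses the form in which the usefulness criterion is commonly written, UC = PM + i × h × SD, where i is the selection intensity and h the selection accuracy. The function names, the 5% selected fraction, the default accuracy of 1 and the two example crosses are assumptions made for illustration; they are not taken from the study.

```python
from statistics import NormalDist


def selection_intensity(selected_fraction: float) -> float:
    """Mean of the selected top fraction of a standard normal distribution."""
    std_normal = NormalDist()
    # Truncation point above which the selected fraction lies
    threshold = std_normal.inv_cdf(1.0 - selected_fraction)
    return std_normal.pdf(threshold) / selected_fraction


def usefulness_criterion(parent_mean: float, progeny_sd: float,
                         selected_fraction: float = 0.05,
                         accuracy: float = 1.0) -> float:
    """UC = PM + i * h * SD (accuracy h = 1 is an illustrative assumption)."""
    i = selection_intensity(selected_fraction)
    return parent_mean + i * accuracy * progeny_sd


# Two hypothetical crosses: A has the higher parental mean, B the larger progeny SD
cross_a = usefulness_criterion(parent_mean=100.0, progeny_sd=2.0)
cross_b = usefulness_criterion(parent_mean=99.0, progeny_sd=3.0)
print(f"UC A = {cross_a:.2f}, UC B = {cross_b:.2f}")
```

With these made-up values, cross B ranks above cross A despite its lower parental mean, which is why the quality of the SD estimate matters for cross selection.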
Affiliation(s)
- Claire Oget-Ebrad
  - UMR1095, GDEC, INRAE-Université Clermont-Auvergne, Clermont-Ferrand, France
- Emmanuel Heumez
  - INRAE-UE Lille, 2 Chaussée Brunehaut, Estrées Mons, BP50136, 80203, Peronne Cedex, France
- Laure Duchalais
  - Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
- Jean-Michel Elsen
  - UMR1388, GenPhySE, INRAE-Université de Toulouse, Castanet-Tolosan, France
- Sophie Bouchet
  - UMR1095, GDEC, INRAE-Université Clermont-Auvergne, Clermont-Ferrand, France
2
Ghavi Hossein-Zadeh N. An overview of recent technological developments in bovine genomics. Vet Anim Sci 2024; 25:100382. [PMID: 39166173 PMCID: PMC11334705 DOI: 10.1016/j.vas.2024.100382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2024] Open
Abstract
Cattle are regarded as highly valuable animals because of their milk, beef, dung, fur, and ability to draft. The scientific community has tried a number of strategies to improve the genetic makeup of bovine germplasm. To ensure higher returns for the dairy and beef industries, researchers face their greatest challenge in improving commercially important traits. One of the biggest developments in the last few decades in the creation of instruments for cattle genetic improvement is the discovery of the genome. Breeding livestock is being revolutionized by genomic selection made possible by the availability of medium- and high-density single nucleotide polymorphism (SNP) arrays coupled with sophisticated statistical techniques. It is becoming easier to access high-dimensional genomic data in cattle. Continuously declining genotyping costs and an increase in services that use genomic data to increase return on investment have both made a significant contribution to this. The field of genomics has come a long way thanks to groundbreaking discoveries such as radiation-hybrid mapping, in situ hybridization, synteny analysis, somatic cell genetics, cytogenetic maps, molecular markers, association studies for quantitative trait loci, high-throughput SNP genotyping, whole-genome shotgun sequencing to whole-genome mapping, and genome editing. These advancements have had a significant positive impact on the field of cattle genomics. This manuscript aimed to review recent advances in genomic technologies for cattle breeding and future prospects in this field.
Affiliation(s)
- Navid Ghavi Hossein-Zadeh
  - Department of Animal Science, Faculty of Agricultural Sciences, University of Guilan, Rasht, 41635-1314, Iran
3
Pocrnic I, Lourenco D, Misztal I. Single nucleotide polymorphism profile for quantitative trait nucleotide in populations with small effective size and its impact on mapping and genomic predictions. Genetics 2024; 227:iyae103. [PMID: 38913695 PMCID: PMC11304960 DOI: 10.1093/genetics/iyae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 06/07/2024] [Accepted: 06/16/2024] [Indexed: 06/26/2024] Open
Abstract
Increasing SNP density by incorporating sequence information only marginally increases prediction accuracies of breeding values in livestock. To find out why, we used statistical models and simulations to investigate the shape of distribution of estimated SNP effects (a profile) around quantitative trait nucleotides (QTNs) in populations with a small effective population size (Ne). A QTN profile created by averaging SNP effects around each QTN was similar to the shape of expected pairwise linkage disequilibrium (PLD) based on Ne and genetic distance between SNP, with a distinct peak for the QTN. Populations with smaller Ne showed lower but wider QTN profiles. However, adding more genotyped individuals with phenotypes dragged the profile closer to the QTN. The QTN profile was higher and narrower for populations with larger compared to smaller Ne. Assuming the PLD curve for the QTN profile, 80% of the additive genetic variance explained by each QTN was contained in ± 1/Ne Morgan interval around the QTN, corresponding to 2 Mb in cattle and 5 Mb in pigs and chickens. With such large intervals, identifying QTN is difficult even if all of them are in the data and the assumed genetic architecture is simplistic. Additional complexity in QTN detection arises from confounding of QTN profiles with signals due to relationships, overlapping profiles with closely spaced QTN, and spurious signals. However, small Ne allows for accurate predictions with large data even without QTN identification because QTNs are accounted for by QTN profiles if SNP density is sufficient to saturate the segments.
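The interval sizes quoted in the abstract can be reproduced with a short calculation. The sketch below converts a ± 1/Ne Morgan window into megabases. The recombination rate of 1 cM per Mb and the Ne values of 100 (cattle) and 40 (pigs and chickens) are assumptions chosen for illustration because they reproduce the quoted 2 Mb and 5 Mb; the abstract itself does not state them.

```python
def qtn_interval_mb(effective_size: float, cm_per_mb: float = 1.0) -> float:
    """Total width in Mb of a +/- 1/Ne Morgan window around a QTN.

    cm_per_mb is an assumed genome-wide average recombination rate.
    """
    # Half-width is 1/Ne Morgan = 100/Ne cM, so the full window is 200/Ne cM
    full_width_cm = 200.0 / effective_size
    return full_width_cm / cm_per_mb


# Assumed effective population sizes (illustrative, not from the abstract)
assumed_ne = {"cattle": 100, "pigs and chickens": 40}
for species, ne in assumed_ne.items():
    print(f"{species}: Ne = {ne}, interval = {qtn_interval_mb(ne):.1f} Mb")
```

Because the window scales as 1/Ne, halving Ne doubles the segment that carries most of a QTN's signal.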
Affiliation(s)
- Ivan Pocrnic
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
4
Wu Y, Zheng Z, Thibaut L, Goddard ME, Wray NR, Visscher PM, Zeng J. Genome-wide fine-mapping improves identification of causal variants. RESEARCH SQUARE 2024:rs.3.rs-4759390. [PMID: 39149449 PMCID: PMC11326397 DOI: 10.21203/rs.3.rs-4759390/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Fine-mapping refines genotype-phenotype association signals to identify causal variants underlying complex traits. However, current methods typically focus on individual genomic segments without considering the global genetic architecture. Here, we demonstrate the advantages of performing genome-wide fine-mapping (GWFM) and develop methods to facilitate GWFM. In simulations and real data analyses, GWFM outperforms current methods in error control, mapping power and precision, replication rate, and trans-ancestry phenotype prediction. For 48 well-powered traits in the UK Biobank, we identify causal variants that collectively explain 17% of the SNP-based heritability, and predict that fine-mapping 50% of that would require 2 million samples on average. We pinpoint a known causal variant, as proof-of-principle, at FTO for body mass index, unveil a hidden secondary variant with evolutionary conservation, and identify new missense causal variants for schizophrenia and Crohn's disease. Overall, we analyse 600 complex traits with 13 million SNPs, highlighting the efficacy of GWFM with functional annotations.
Affiliation(s)
- Yang Wu
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, China
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Zhili Zheng
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
  - Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Michael E. Goddard
  - Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, Victoria, Australia
  - Biosciences Research Division, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Victoria, Australia
- Naomi R. Wray
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
  - Department of Psychiatry, University of Oxford, Oxford, UK
- Peter M. Visscher
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jian Zeng
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
5
Villar-Hernández BDJ, Pérez-Rodríguez P, Vitale P, Gerard G, Montesinos-Lopez OA, Saint Pierre C, Crossa J, Dreisigacker S. Optimizing Genomic Parental Selection for Categorical and Continuous-Categorical Multi-Trait Mixtures. Genes (Basel) 2024; 15:995. [PMID: 39202356 PMCID: PMC11353433 DOI: 10.3390/genes15080995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Revised: 07/20/2024] [Accepted: 07/25/2024] [Indexed: 09/03/2024] Open
Abstract
This study presents a novel approach for the optimization of genomic parental selection in breeding programs involving categorical and continuous-categorical multi-trait mixtures (CMs and CCMMs). Utilizing the Bayesian decision theory (BDT) and latent trait models within a multivariate normal distribution framework, we address the complexities of selecting new parental lines across ordinal and continuous traits for breeding. Our methodology enhances precision and flexibility in genetic selection, validated through extensive simulations. This unified approach presents significant potential for the advancement of genetic improvements in diverse breeding contexts, underscoring the importance of integrating both categorical and continuous traits in genomic selection frameworks.
Affiliation(s)
- Bartolo de Jesús Villar-Hernández
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco CP 52640, Estado de México, Mexico
- Paolo Vitale
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco CP 52640, Estado de México, Mexico
- Guillermo Gerard
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco CP 52640, Estado de México, Mexico
- Carolina Saint Pierre
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco CP 52640, Estado de México, Mexico
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco CP 52640, Estado de México, Mexico
  - Colegio de Postgraduados, Montecillos CP 56230, Estado de México, Mexico
  - Louisiana State University, Baton Rouge, LA 70803, USA
  - Distinguished Scientist Fellowship Program and Department of Statistics and Operations Research, King Saud University, Riyadh 11459, Saudi Arabia
- Susanne Dreisigacker
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco CP 52640, Estado de México, Mexico
6
Yuan C, Gualdrón Duarte JL, Takeda H, Georges M, Druet T. Evaluation of heritability partitioning approaches in livestock populations. BMC Genomics 2024; 25:690. [PMID: 39003468 PMCID: PMC11246585 DOI: 10.1186/s12864-024-10600-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 07/08/2024] [Indexed: 07/15/2024] Open
Abstract
BACKGROUND Heritability partitioning approaches estimate the contribution of different functional classes, such as coding or regulatory variants, to the genetic variance. This information allows a better understanding of the genetic architecture of complex traits, including complex diseases, but can also help improve the accuracy of genomic selection in livestock species. However, methods have mainly been tested on human genomic data, whereas livestock populations have specific characteristics, such as high levels of relatedness, small effective population size or long-range levels of linkage disequilibrium. RESULTS Here, we used data from 14,762 cows, imputed at the whole-genome sequence level for 11,537,240 variants, to simulate traits in a typical livestock population and evaluate the accuracy of two state-of-the-art heritability partitioning methods, GREML and a Bayesian mixture model. In simulations where a single functional class had increased contribution to heritability, we observed that the estimators were unbiased but had low precision. When causal variants were enriched in variants with low (< 0.05) or high (> 0.20) minor allele frequency or low (below 1st quartile) or high (above 3rd quartile) linkage disequilibrium scores, it was necessary to partition the genetic variance into multiple classes defined on the basis of allele frequencies or LD scores to obtain unbiased results. When multiple functional classes had variable contributions to heritability, estimators showed higher levels of variation and confounding between certain categories was observed. In addition, estimators from small categories were particularly imprecise. However, the estimates and their ranking were still informative about the contribution of the classes. We also demonstrated that using methods that estimate the contribution of a single category at a time, a commonly used approach, results in an overestimation. 
Finally, we applied the methods to phenotypes for muscular development and height and estimated that, on average, variants in open chromatin regions had a higher contribution to the genetic variance (> 45%), while variants in coding regions had the strongest individual effects (> 25-fold enrichment on average). Conversely, variants in intergenic or intronic regions showed lower levels of enrichment (0.2 and 0.6-fold on average, respectively). CONCLUSIONS Heritability partitioning approaches should be used cautiously in livestock populations, in particular for small categories. Two-component approaches that fit only one functional category at a time lead to biased estimators and should not be used.
Affiliation(s)
- Can Yuan
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de L'Hôpital, 1, 4000, Liège, Belgium
- Haruko Takeda
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de L'Hôpital, 1, 4000, Liège, Belgium
- Michel Georges
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de L'Hôpital, 1, 4000, Liège, Belgium
- Tom Druet
  - Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de L'Hôpital, 1, 4000, Liège, Belgium
7
Pedrosa VB, Chen SY, Gloria LS, Doucette JS, Boerman JP, Rosa GJM, Brito LF. Machine learning methods for genomic prediction of cow behavioral traits measured by automatic milking systems in North American Holstein cattle. J Dairy Sci 2024; 107:4758-4771. [PMID: 38395400 DOI: 10.3168/jds.2023-24082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 01/18/2024] [Indexed: 02/25/2024]
Abstract
Identifying genome-enabled methods that provide more accurate genomic prediction is crucial when evaluating complex traits such as dairy cow behavior. In this study, we aimed to compare the predictive performance of traditional genomic prediction methods and deep learning algorithms for genomic prediction of milking refusals (MREF) and milking failures (MFAIL) in North American Holstein cows measured by automatic milking systems (milking robots). A total of 1,993,509 daily records from 4,511 genotyped Holstein cows were collected by 36 milking robot stations. After quality control, 57,600 SNPs were available for the analyses. Four genomic prediction methods were considered: Bayesian least absolute shrinkage and selection operator (LASSO), multiple layer perceptron (MLP), convolutional neural network (CNN), and GBLUP. We implemented the first 3 methods using the Keras and TensorFlow libraries in Python (v.3.9) but the GBLUP method was implemented using the BLUPF90+ family programs. The accuracy of genomic prediction (mean square error) for MREF and MFAIL was 0.34 (0.08) and 0.27 (0.08) based on LASSO, 0.36 (0.09) and 0.32 (0.09) for MLP, 0.37 (0.08) and 0.30 (0.09) for CNN, and 0.35 (0.09) and 0.31 (0.09) based on GBLUP, respectively. Additionally, we observed a lower reranking of top selected individuals based on the MLP versus CNN methods compared with the other approaches for both MREF and MFAIL. Although the deep learning methods showed slightly higher accuracies than GBLUP, the results may not be sufficient to justify their use over traditional methods due to their higher computational demand and the difficulty of performing genomic prediction for nongenotyped individuals using deep learning procedures. Overall, this study provides insights into the potential feasibility of using deep learning methods to enhance genomic prediction accuracy for behavioral traits in livestock.
Further research is needed to determine their practical applicability to large dairy cattle breeding programs.
Affiliation(s)
- Victor B Pedrosa
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
- Shi-Yi Chen
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907; Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Leonardo S Gloria
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
- Jarrod S Doucette
  - Agriculture Information Technology (AgIT), Purdue University, West Lafayette, IN 47907
- Guilherme J M Rosa
  - Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI, 53706
- Luiz F Brito
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
8
Jighly A. Boosting genome-wide association power and genomic prediction accuracy for date palm fruit traits with advanced statistics. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2024; 344:112110. [PMID: 38704095 DOI: 10.1016/j.plantsci.2024.112110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 03/05/2024] [Accepted: 04/30/2024] [Indexed: 05/06/2024]
Abstract
The date palm is economically vital in the Middle East and North Africa, providing essential fibres, vitamins, and carbohydrates. Understanding the genetic architecture of its traits remains complex due to the tree's perennial nature and long generation times. This study aims to address these complexities by employing advanced genome-wide association (GWAS) and genomic prediction models using previously published data involving fruit acid content, sugar content, dimension, and colour traits. The multivariate GWAS model identified seven QTL, including five novel associations, that shed light on the genetic control of these traits. Furthermore, the research evaluates different genomic prediction models that considered genotype by environment and genotype by trait interactions. While colour traits demonstrate strong predictive power, other traits display moderate accuracies across different models and scenarios aligned with the expectations when using small reference populations. When designing the cross-validation to predict new individuals, the accuracy of the best multi-trait model was significantly higher than all single-trait models for dimension traits, but not for the remaining traits, which showed similar performances. However, the cross-validation strategy that masked random phenotypic records (i.e., mimicking the unbalanced phenotypic records) showed significantly higher accuracy for all traits except acid contents. The findings underscore the importance of understanding genetic architecture for informed breeding strategies. The research emphasises the need for larger population sizes and multivariate models to enhance gene tagging power and predictive accuracy to advance date palm breeding programs. These findings support more targeted breeding in date palm, improving productivity and resilience to various environments.
9
Joukhadar R, Li Y, Thistlethwaite R, Forrest KL, Tibbits JF, Trethowan R, Hayden MJ. Optimising desired gain indices to maximise selection response. FRONTIERS IN PLANT SCIENCE 2024; 15:1337388. [PMID: 38978519 PMCID: PMC11228337 DOI: 10.3389/fpls.2024.1337388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 05/23/2024] [Indexed: 07/10/2024]
Abstract
Introduction In plant breeding, we often aim to improve multiple traits at once. However, without knowing the economic value of each trait, it is hard to decide which traits to focus on. This is where "desired gain selection indices" come in handy, which can yield optimal gains in each trait based on the breeder's prioritisation of desired improvements when economic weights are not available. However, they lack the ability to maximise the selection response and determine the correlation between the index and net genetic merit. Methods Here, we report the development of an iterative desired gain selection index method that optimises the sampling of the desired gain values to achieve a targeted or a user-specified selection response for multiple traits. This targeted selection response can be constrained or unconstrained for either a subset or all the studied traits. Results We tested the method using genomic estimated breeding values (GEBVs) for seven traits in a bread wheat (Triticum aestivum) reference breeding population comprising 3,331 lines and achieved prediction accuracies ranging between 0.29 and 0.47 across the seven traits. The indices were validated using 3,005 double haploid lines that were derived from crosses between parents selected from the reference population. We tested three user-specified response scenarios: a constrained equal weight (INDEX1), a constrained yield dominant weight (INDEX2), and an unconstrained weight (INDEX3). Our method achieved an equivalent response to the user-specified selection response when constraining a set of traits, and this response was much better than the response of the traditional desired gain selection indices method without iteration. Interestingly, when using unconstrained weight, our iterative method maximised the selection response and shifted the average GEBVs of the selection candidates towards the desired direction. 
Discussion Our results show that the method is an optimal choice not only when economic weights are unavailable, but also when constraining the selection response is an unfavourable option.
Affiliation(s)
- Reem Joukhadar
  - Agriculture Victoria, Centre for AgriBioscience, AgriBio, Bundoora, VIC, Australia
- Yongjun Li
  - Agriculture Victoria, Centre for AgriBioscience, AgriBio, Bundoora, VIC, Australia
- Rebecca Thistlethwaite
  - School of Life and Environmental Sciences, Plant Breeding Institute, Sydney Institute of Agriculture, The University of Sydney, Narrabri, NSW, Australia
- Kerrie L. Forrest
  - Agriculture Victoria, Centre for AgriBioscience, AgriBio, Bundoora, VIC, Australia
- Josquin F. Tibbits
  - Agriculture Victoria, Centre for AgriBioscience, AgriBio, Bundoora, VIC, Australia
- Richard Trethowan
  - School of Life and Environmental Sciences, Plant Breeding Institute, Sydney Institute of Agriculture, The University of Sydney, Narrabri, NSW, Australia
  - School of Life and Environmental Sciences, Plant Breeding Institute, Sydney Institute of Agriculture, The University of Sydney, Cobbitty, NSW, Australia
- Matthew J. Hayden
  - Agriculture Victoria, Centre for AgriBioscience, AgriBio, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
10
Zhao H, Khansefid M, Lin Z, Hayden MJ. Genetic Gain and Inbreeding in Different Simulated Genomic Selection Schemes for Grain Yield and Oil Content in Safflower. PLANTS (BASEL, SWITZERLAND) 2024; 13:1577. [PMID: 38891385 PMCID: PMC11174797 DOI: 10.3390/plants13111577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 05/27/2024] [Accepted: 05/29/2024] [Indexed: 06/21/2024]
Abstract
Safflower (Carthamus tinctorius L.) is a multipurpose minor crop consumed by developed and developing nations around the world with limited research funding and genetic resources. Genomic selection (GS) is an effective modern breeding tool that can help to fast-track the genetic diversity preserved in genebank collections to facilitate rapid and efficient germplasm improvement and variety development. In the present study, we simulated four GS strategies to compare genetic gains and inbreeding during breeding cycles in a safflower recurrent selection breeding program targeting grain yield (GY) and seed oil content (OL). We observed positive genetic gains over cycles in all four GS strategies, where the first cycle delivered the largest genetic gain. Single-trait GS strategies had the greatest gain for the target trait but had very limited genetic improvement for the other trait. Simultaneous selection for GY and OL via indices indicated higher gains for both traits than crossing between the two single-trait independent culling strategies. The multi-trait GS strategy with mating relationship control (GS_GY + OL + Rel) resulted in a lower inbreeding coefficient but a similar gain compared to that of the GS_GY + OL (without inbreeding control) strategy after a few cycles. Our findings lay the foundation for future safflower GS breeding.
Affiliation(s)
- Huanhuan Zhao
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
- Majid Khansefid
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
- Zibei Lin
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
- Matthew J. Hayden
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
11
Hunde D, Tadesse Y, Tadesse M, Abegaz S, Getachew T. Community-based breeding programs can realize sustainable genetic gain and economic benefits in tropical dairy cattle systems. Front Genet 2024; 15:1106709. [PMID: 38818034 PMCID: PMC11137272 DOI: 10.3389/fgene.2024.1106709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 04/11/2024] [Indexed: 06/01/2024] Open
Abstract
Implementing an appropriate breeding program is crucial to control fluctuation in performance, enhance adaptation, and further improve the crossbred population of dairy cattle. Five alternative breeding programs (BPs) were modeled considering available breeding units in the study area, the existing crossbreeding practices, and the future prospects of dairy research and development in Ethiopia. The study targeted 143,576 crossbred cows of 54,822 smallholder households in the Arsi, West Shewa, and North Shewa zones of the Oromia Region, as well as the North Shewa zone of the Amhara Region. The alternative BPs include conventional on-station progeny testing (SPT), conventional on-farm progeny testing (FPT), conventional on-station and on-farm progeny testing (SFPT), genomic selection (GS), and genomic progeny testing (GPT). Input parameters for modeling the BPs were taken from the analysis of long-term data obtained from the Holetta Agricultural Research Center and a survey conducted in the study area. ZPLAN+ software was used to predict estimates of genetic gain (GG) and discounted profit for goal traits. The predicted genetic gains (GGs) for milk yield (MY) per year were 34.52 kg, 49.63 kg, 29.35 kg, 76.16 kg, and 77.51 kg for SPT, FPT, SFPT, GS, and GPT, respectively. The GGs of the other goal traits range from 0.69 to 1.19 days per year for age at first calving, from 1.20 to 2.35 days per year for calving interval, and from 0.06 to 0.12 days per year for herd life. Compared to conventional BPs, genomic systems (GPT and GS) enhanced the GG of MY by 53%-164%, reduced generation interval by up to 21%, and improved the accuracy of test bull selection from 0.33 to 0.43. The discounted profit of the BPs varied from 249.58 Ethiopian Birr (ETB, 1 USD = 39.55696 ETB) per year in SPT to 689.79 ETB per year in GS. Genomic selection outperforms SPT, SFPT, and FPT by 266%, 227%, and 138% of discounted profit, respectively.
Community-based crossbreeding accompanied by GS and gradual support with progeny testing (GPT) is recommended as the main way forward to attain better genetic progress in dairy farms in Ethiopia and similar scenarios in other tropical countries.
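The "53%-164%" range quoted for milk yield can be reproduced from the per-program gains listed in the abstract. The short sketch below is only a worked check of that arithmetic; the helper name `relative_increase` is illustrative and is not taken from the paper or from ZPLAN+.

```python
# Predicted genetic gain for milk yield (kg per year), as quoted in the abstract.
gains = {"SPT": 34.52, "FPT": 49.63, "SFPT": 29.35, "GS": 76.16, "GPT": 77.51}
conventional = ("SPT", "FPT", "SFPT")
genomic = ("GS", "GPT")


def relative_increase(new, old):
    """Percentage by which `new` exceeds `old` (illustrative helper)."""
    return 100.0 * (new - old) / old


# Every genomic program compared against every conventional program.
increases = {
    (g, c): relative_increase(gains[g], gains[c])
    for g in genomic
    for c in conventional
}

smallest = min(increases, key=increases.get)
largest = max(increases, key=increases.get)
print(smallest, round(increases[smallest], 1))
print(largest, round(increases[largest], 1))
```

Under this reading, the lower bound comes from GS versus on-farm progeny testing and the upper bound from GPT versus the combined on-station and on-farm scheme.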
Affiliation(s)
- Direba Hunde
  - Ethiopian Institute of Agricultural Research, Holetta Center, Holetta, Ethiopia
  - Department of Animal Science, Haramaya University, Harar, Ethiopia
- Yosef Tadesse
  - Department of Animal Science, Haramaya University, Harar, Ethiopia
- Million Tadesse
  - Ethiopian Institute of Agricultural Research, Holetta Center, Holetta, Ethiopia
- Tesfaye Getachew
  - International Center for Agricultural Research in the Dry Areas, Addis Ababa, Ethiopia
|
12
|
Ajasa AA, Boison SA, Gjøen HM, Lillehammer M. Accuracy of genomic prediction using multiple Atlantic salmon populations. Genet Sel Evol 2024; 56:38. [PMID: 38750427 PMCID: PMC11094890 DOI: 10.1186/s12711-024-00907-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 05/06/2024] [Indexed: 05/19/2024] Open
Abstract
BACKGROUND The accuracy of genomic prediction is partly determined by the size of the reference population. In Atlantic salmon breeding programs, four parallel populations often exist, thus offering the opportunity to increase the size of the reference set by combining these populations. By allowing a reduction in the number of records per population, multi-population prediction can potentially reduce cost and welfare issues related to the recording of traits, particularly for diseases. In this study, we evaluated the accuracy of multi- and across-population prediction of breeding values for resistance to amoebic gill disease (AGD) using all single nucleotide polymorphisms (SNPs) on a 55K chip or a selected subset of SNPs based on the signs of allele substitution effect estimates across populations, using both linear and nonlinear genomic prediction (GP) models in Atlantic salmon populations. In addition, we investigated genetic distance, genetic correlation estimated based on genomic relationships, and persistency of linkage disequilibrium (LD) phase across these populations. RESULTS The genetic distance between populations ranged from 0.03 to 0.07, while the genetic correlation ranged from 0.19 to 0.99. Nonetheless, compared to within-population prediction, there was limited or no impact of combining populations for multi-population prediction across the various models used or when using the selected subset of SNPs. The estimates of across-population prediction accuracy were low and to some extent proportional to the genetic correlation estimates. The persistency of LD phase between adjacent markers across populations using all SNP data ranged from 0.51 to 0.65, indicating that LD is poorly conserved across the studied populations. CONCLUSIONS Our results show that a high genetic correlation and a high genetic relationship between populations do not guarantee a higher prediction accuracy from multi-population genomic prediction in Atlantic salmon.
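Persistency of LD phase is commonly computed as the correlation, across marker pairs, of the signed LD statistic r measured in two populations. The sketch below illustrates that calculation for adjacent markers on toy data; the simulator, the sample sizes and all function names are assumptions made for illustration and do not come from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacent_r(genotypes):
    """Signed LD (r) between each pair of adjacent markers.

    genotypes: one list of 0/1/2 allele counts per individual.
    """
    columns = list(zip(*genotypes))
    return [pearson(columns[j], columns[j + 1]) for j in range(len(columns) - 1)]


def ld_phase_persistency(pop_a, pop_b):
    """Correlation of signed r values for the same marker pairs in two populations."""
    return pearson(adjacent_r(pop_a), adjacent_r(pop_b))


def simulate_population(n_individuals, phase_pattern, rng, copy_prob=0.8):
    """Toy simulator (hypothetical): adjacent alleles are coupled (+1) or repelled (-1)."""
    def haplotype():
        alleles = [rng.randint(0, 1)]
        for sign in phase_pattern:
            if rng.random() < copy_prob:
                previous = alleles[-1]
                alleles.append(previous if sign > 0 else 1 - previous)
            else:
                alleles.append(rng.randint(0, 1))
        return alleles

    return [
        [a + b for a, b in zip(haplotype(), haplotype())]
        for _ in range(n_individuals)
    ]


rng = random.Random(42)
pattern = [1 if j % 2 == 0 else -1 for j in range(29)]  # LD phase shared by both populations
pop_a = simulate_population(200, pattern, rng)
pop_b = simulate_population(200, pattern, rng)
persistency = ld_phase_persistency(pop_a, pop_b)
print(round(persistency, 2))
```

Because both toy populations share the same phase pattern, the value is close to one here; values of 0.51 to 0.65, as reported in the abstract, would correspond to much weaker agreement in the sign and size of r between populations.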
Affiliation(s)
- Afees A Ajasa
  - Nofima (Norwegian Institute of Food, Fisheries and Aquaculture Research), PO Box 210, 1431, Ås, Norway
  - Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1430, Ås, Norway
- Hans M Gjøen
  - Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1430, Ås, Norway
- Marie Lillehammer
  - Nofima (Norwegian Institute of Food, Fisheries and Aquaculture Research), PO Box 210, 1431, Ås, Norway
|
13
|
Chen C, Bhuiyan SA, Ross E, Powell O, Dinglasan E, Wei X, Atkin F, Deomano E, Hayes B. Genomic prediction for sugarcane diseases including hybrid Bayesian-machine learning approaches. FRONTIERS IN PLANT SCIENCE 2024; 15:1398903. [PMID: 38751840 PMCID: PMC11095127 DOI: 10.3389/fpls.2024.1398903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 04/15/2024] [Indexed: 05/18/2024]
Abstract
Sugarcane smut and Pachymetra root rot are two serious diseases of sugarcane, with susceptible infected crops losing over 30% of yield. A heritable component to both diseases has been demonstrated, suggesting selection could improve disease resistance. Genomic selection could accelerate gains even further, enabling early selection of resistant seedlings for breeding and clonal propagation. In this study we evaluated four types of algorithms for genomic predictions of clonal performance for disease resistance. These algorithms were: genomic best linear unbiased prediction (GBLUP), including extensions to model dominance and epistasis, Bayesian methods including BayesC and BayesR, machine learning methods including random forest, multilayer perceptron (MLP), modified convolutional neural network (CNN) and attention networks designed to capture epistasis across the genome-wide markers. Simple hybrid methods, which first used BayesR/GWAS to identify a subset of 1000 markers with moderate to large marginal additive effects and then used attention networks to derive predictions from these effects and their interactions, were also developed and evaluated. The hypothesis for this approach was that using a subset of markers more likely to have an effect would enable better estimation of interaction effects than when there were an extremely large number of possible interactions, especially with our limited data set size. To evaluate the methods, we applied both random five-fold cross-validation and a structured PCA-based cross-validation that separated 4702 sugarcane clones (that had disease phenotypes and were genotyped for 26k genome-wide SNP markers) by genomic relationship. The Bayesian methods (BayesR and BayesC) gave the highest accuracy of prediction, followed closely by hybrid methods with attention networks. The hybrid methods with attention networks gave the lowest variation in accuracy of prediction across validation folds (and lowest MSE), which may be a criterion worth considering in practical breeding programs. This suggests that hybrid methods incorporating the attention mechanism could be useful for genomic prediction of clonal performance, particularly where non-additive effects may be important.
Affiliation(s)
- Chensong Chen
  - Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Shamsul A. Bhuiyan
  - Sugar Research Australia, Woodford, QLD, Australia
  - Queensland Micro- and Nanotechnology Centre, Griffith University, Nathan, QLD, Australia
- Elizabeth Ross
  - Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Owen Powell
  - Center for Crop Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Eric Dinglasan
  - Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Xianming Wei
  - Sugar Research Australia, Indooroopilly, QLD, Australia
- Emily Deomano
  - Sugar Research Australia, Indooroopilly, QLD, Australia
- Ben Hayes
  - Center for Animal Science, The Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
|
14
|
Blake JM, Thompson J, HogenEsch H, Ekenstedt KJ. Heritability and genome-wide association study of vaccine-induced immune response in Beagles: A pilot study. Vaccine 2024; 42:3099-3106. [PMID: 38604911 PMCID: PMC11144447 DOI: 10.1016/j.vaccine.2024.03.076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/22/2024] [Accepted: 03/29/2024] [Indexed: 04/13/2024]
Abstract
Both genetic and non-genetic factors contribute to individual variation in the immune response to vaccination. Understanding how genetic background influences variation in both magnitude and persistence of vaccine-induced immunity is vital for improving vaccine development and identifying possible causes of vaccine failure. Dogs provide a relevant biomedical model for investigating mammalian vaccine genetics; canine breed structure and long linkage disequilibrium simplify genetic studies in this species compared to humans. The objective of this study was to estimate the heritability of the antibody response to vaccination against viral and bacterial pathogens, and to identify genes driving variation of the immune response to vaccination in Beagles. Sixty puppies were immunized following a standard vaccination schedule with an attenuated combination vaccine containing antigens for canine adenovirus type 2, canine distemper virus, canine parainfluenza virus, canine parvovirus, and four strains of Leptospira bacteria. Serum antibody measurements for each viral and bacterial component were measured at multiple time points. Heritability estimations and GWAS were conducted using SNP genotypes at 279,902 markers together with serum antibody titer phenotypes. The heritability estimates were: (1) to Leptospira antigens, ranging from 0.178 to 0.628; and (2) to viral antigens, ranging from 0.199 to 0.588. There was not a significant difference between overall heritability of vaccine-induced immune response to Leptospira antigens compared to viral antigens. Genetic architecture indicates that SNPs of low to high effect contribute to immune response to vaccination. GWAS identified two genetic markers associated with vaccine-induced immune response phenotypes. Collectively, these findings indicate that genetic regulation of the immune response to vaccination is antigen-specific and influenced by multiple genes of small effect.
Affiliation(s)
- Jeanna M Blake
  - Department of Basic Medical Sciences, College of Veterinary Medicine, Purdue University, West Lafayette, IN, USA
- James Thompson
  - Zoetis, Veterinary Medicine Research and Development, Kalamazoo, MI, USA
- Harm HogenEsch
  - Department of Comparative Pathobiology, College of Veterinary Medicine, Purdue University, West Lafayette, IN, USA; Purdue Institute of Inflammation, Immunology and Infectious Diseases, West Lafayette, IN, USA
- Kari J Ekenstedt
  - Department of Basic Medical Sciences, College of Veterinary Medicine, Purdue University, West Lafayette, IN, USA
|
15
|
Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-Gónzalez J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. MOLECULAR PLANT 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area, as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
Affiliation(s)
- Admas Alemu
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Johanna Åstrand
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-Gónzalez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
  - International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Aakash Chawade
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
|
16
|
Zhao T, Wang F, Mott R, Dekkers J, Cheng H. Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality. Genetics 2024; 226:iyad210. [PMID: 38085098 PMCID: PMC11090459 DOI: 10.1093/genetics/iyad210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 11/13/2023] [Indexed: 03/08/2024] Open
Abstract
To adhere to and capitalize on the benefits of the FAIR (findable, accessible, interoperable, and reusable) principles in agricultural genome-to-phenome studies, it is crucial to address privacy and intellectual property issues that prevent sharing and reuse of data in research and industry. Direct sharing of genotype and phenotype data is often prohibited due to intellectual property and privacy concerns. Thus, there is a pressing need for encryption methods that obscure confidential aspects of the data, without affecting the outcomes of certain statistical analyses. A homomorphic encryption method for genotypes and phenotypes (HEGP) has been proposed for single-marker regression in genome-wide association studies (GWAS) using linear mixed models with Gaussian errors. This methodology permits frequentist likelihood-based parameter estimation and inference. In this paper, we extend HEGP to broader applications in genome-to-phenome analyses. We show that HEGP is suited to commonly used linear mixed models for genetic analyses of quantitative traits including genomic best linear unbiased prediction (GBLUP) and ridge-regression best linear unbiased prediction (RR-BLUP), as well as Bayesian variable selection methods (e.g. those in Bayesian Alphabet), for genetic parameter estimation, genomic prediction, and GWAS. By advancing the capabilities of HEGP, we offer researchers and industry professionals a secure and efficient approach for collaborative genomic analyses while preserving data confidentiality.
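The abstract does not spell out how the encryption works. As a hedged illustration of why some encryptions leave linear-model analyses unchanged, the sketch below assumes the key is a secret random orthogonal matrix P applied to both the phenotype vector and the genotype matrix. Under that assumption, X'X and X'y are unchanged, so any estimator that depends on the data only through those products (least squares, or ridge regression with independent Gaussian errors) returns the same solution on encrypted data. The Householder construction and every name in the code are illustrative choices, not the authors' implementation.

```python
import random


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def householder(v):
    """Orthogonal reflection matrix I - 2 v v' / (v' v)."""
    norm_sq = sum(x * x for x in v)
    n = len(v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq for j in range(n)]
        for i in range(n)
    ]


def max_abs_diff(a, b):
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


rng = random.Random(1)
n_individuals, n_markers = 6, 3
# Toy genotype matrix (0/1/2 allele counts) and phenotype column vector.
X = [[float(rng.randint(0, 2)) for _ in range(n_markers)] for _ in range(n_individuals)]
y = [[rng.gauss(0.0, 1.0)] for _ in range(n_individuals)]

# Assumed secret key: a random orthogonal matrix (here a single Householder reflection).
P = householder([rng.gauss(0.0, 1.0) for _ in range(n_individuals)])
X_enc, y_enc = matmul(P, X), matmul(P, y)

# The cross-products that drive least-squares and ridge solutions are preserved...
xtx_gap = max_abs_diff(matmul(transpose(X), X), matmul(transpose(X_enc), X_enc))
xty_gap = max_abs_diff(matmul(transpose(X), y), matmul(transpose(X_enc), y_enc))
# ...while the encrypted genotypes no longer look like allele counts.
data_gap = max_abs_diff(X, X_enc)
print(xtx_gap < 1e-9, xty_gap < 1e-9, data_gap > 1e-6)
```

A single reflection is used only to keep the example short; a practical key would need to be far less structured.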
Affiliation(s)
- Tianjing Zhao
  - Department of Animal Science, University of California, Davis, CA 95616, USA
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Fangyi Wang
  - Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Richard Mott
  - Genetics Institute, University College London, London, WC1E 6BT, UK
- Jack Dekkers
  - Department of Animal Science, Iowa State University, Ames, IA 50011, USA
- Hao Cheng
  - Department of Animal Science, University of California, Davis, CA 95616, USA
|
17
|
Meuwissen T, Eikje LS, Gjuvsland AB. GWABLUP: genome-wide association assisted best linear unbiased prediction of genetic values. Genet Sel Evol 2024; 56:17. [PMID: 38429665 PMCID: PMC11234632 DOI: 10.1186/s12711-024-00881-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 01/31/2024] [Indexed: 03/03/2024] Open
Abstract
BACKGROUND Since the very beginning of genomic selection, researchers investigated methods that improved upon SNP-BLUP (single nucleotide polymorphism best linear unbiased prediction). SNP-BLUP gives equal weight to all SNPs, whereas it is expected that many SNPs are not near causal variants and thus do not have substantial effects. A recent approach to remedy this is to use genome-wide association study (GWAS) findings and increase the weights of GWAS-top-SNPs in genomic predictions. Here, we employ a genome-wide approach to integrate GWAS results into genomic prediction, called GWABLUP. RESULTS GWABLUP consists of the following steps: (1) performing a GWAS in the training data which results in likelihood ratios; (2) smoothing the likelihood ratios over the SNPs; (3) combining the smoothed likelihood ratio with the prior probability of SNPs having non-zero effects, which yields the posterior probability of the SNPs; (4) calculating a weighted genomic relationship matrix using the posterior probabilities as weights; and (5) performing genomic prediction using the weighted genomic relationship matrix. Using high-density genotypes and milk, fat, protein and somatic cell count phenotypes on dairy cows, GWABLUP was compared to GBLUP, GBLUP (topSNPs) with extra weights for GWAS top-SNPs, and BayesGC, i.e. a Bayesian variable selection model. The GWAS resulted in six, five, four, and three genome-wide significant peaks for milk, fat and protein yield and somatic cell count, respectively. GWABLUP genomic predictions were 10, 6, 7 and 1% more reliable than those of GBLUP for milk, fat and protein yield and somatic cell count, respectively. It was also more reliable than GBLUP (topSNPs) for all four traits, and more reliable than BayesGC for three of the traits. Although GWABLUP showed a tendency towards inflation bias for three of the traits, this was not statistically significant. In a multitrait analysis, GWABLUP yielded the highest accuracy for two of the traits. 
However, for SCC, which was relatively unrelated to the yield traits, including yield trait GWAS-results reduced the reliability compared to a single trait analysis. CONCLUSIONS GWABLUP uses GWAS results to differentially weigh all the SNPs in a weighted GBLUP genomic prediction analysis. GWABLUP yielded up to 10% and 13% more reliable genomic predictions than GBLUP for single and multitrait analyses, respectively. Extension of GWABLUP to single-step analyses is straightforward.
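Steps (2) to (4) of the procedure listed in the abstract can be sketched on toy data as follows. The sketch makes several assumptions that the abstract does not confirm: smoothing is a plain moving average, the likelihood ratio and prior are combined by Bayes' rule, and the weighted genomic relationship matrix is scaled in the style of a standard allele-frequency-centred GRM. The GWAS itself (step 1) and the final BLUP (step 5) are omitted, and all function names and numbers are illustrative.

```python
def smooth(values, half_window=1):
    """Moving average over neighbouring SNPs (assumed smoothing scheme)."""
    out = []
    for j in range(len(values)):
        lo, hi = max(0, j - half_window), min(len(values), j + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def posterior_probability(likelihood_ratio, prior):
    """Bayes' rule: prior odds times likelihood ratio, converted back to a probability."""
    return prior * likelihood_ratio / (prior * likelihood_ratio + (1.0 - prior))


def weighted_grm(genotypes, weights):
    """Weighted GRM: Z W Z' / sum_j w_j * 2 p_j (1 - p_j), with Z the centred genotypes."""
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    z = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]
    scale = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, freqs))
    return [
        [sum(w * a * b for w, a, b in zip(weights, z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Toy data: 4 individuals x 5 SNPs, with made-up likelihood ratios from a GWAS.
genotypes = [
    [0, 1, 2, 1, 0],
    [1, 1, 0, 2, 1],
    [2, 0, 1, 1, 2],
    [1, 2, 1, 0, 1],
]
likelihood_ratios = [0.5, 1.0, 40.0, 8.0, 0.8]
prior = 0.05  # assumed prior probability that a SNP has a non-zero effect

smoothed = smooth(likelihood_ratios)                                 # step 2
weights = [posterior_probability(lr, prior) for lr in smoothed]      # step 3
G = weighted_grm(genotypes, weights)                                 # step 4
print([round(w, 3) for w in weights])
```

With equal weights this reduces to an unweighted GRM, which matches the abstract's description of SNP-BLUP as giving equal weight to all SNPs.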
Affiliation(s)
- Theo Meuwissen
  - Faculty of Life Sciences, Norwegian University of Life Sciences, 1432, Ås, Norway
|
18
|
Ma H, Li H, Ge F, Zhao H, Zhu B, Zhang L, Gao H, Xu L, Li J, Wang Z. Improving Genomic Predictions in Multi-Breed Cattle Populations: A Comparative Analysis of BayesR and GBLUP Models. Genes (Basel) 2024; 15:253. [PMID: 38397242 PMCID: PMC10887749 DOI: 10.3390/genes15020253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 02/09/2024] [Accepted: 02/16/2024] [Indexed: 02/25/2024] Open
Abstract
Numerous studies have shown that combining populations from similar or closely related genetic breeds improves the accuracy of genomic predictions (GP). Diverse Bayesian and genomic best linear unbiased prediction (GBLUP) models have been developed and extensively tested to explore multi-breed genomic selection (GS) in livestock, ultimately establishing them as successful approaches for predicting genomic estimated breeding value (GEBV). This study aimed to assess the effectiveness of using BayesR and GBLUP models with linkage disequilibrium (LD)-weighted genomic relationship matrices (GRMs) for genomic prediction in three different beef cattle breeds to identify the best approach for enhancing the accuracy of multi-breed genomic selection in beef cattle. Additionally, a comparison was conducted to evaluate the predictive precision of different marker densities and genetic correlations among the three breeds of beef cattle. The GRM between Yunling cattle (YL) and other breeds demonstrated modest affinity and highlighted a notable genetic concordance of 0.87 between Chinese Wagyu (WG) and Huaxi (HX) cattle. In the within-breed GS, BayesR demonstrated an advantage over GBLUP. The prediction accuracies for HX cattle using the BayesR model were 0.52 with BovineHD BeadChip data (HD) and 0.46 with whole-genome sequencing data (WGS). In comparison to the GBLUP model, the accuracy increased by 26.8% for HD data and 9.5% for WGS data. For WG and YL, BayesR doubled the within-breed prediction accuracy to 14.3% from 7.1%, outperforming GBLUP across both HD and WGS datasets. Moreover, analyzing multiple breeds using genomic selection showed that BayesR consistently outperformed GBLUP in terms of predictive accuracy, especially when using WGS. For instance, in a mixed reference population of HX and WG, BayesR achieved a significant accuracy of 0.53 using WGS for HX, which was a substantial enhancement over the accuracies obtained with GBLUP models.
The research further highlights the benefit of including various breeds in the reference group, leading to enhanced accuracy in predictions and emphasizing the importance of comprehensive genomic selection methods. Our research findings indicate that BayesR exhibits superior performance compared to GBLUP in multi-breed genomic prediction accuracy, achieving a maximum improvement of 33.3%, especially in genetically diverse breeds. The improvement can be attributed to the effective utilization of higher single nucleotide polymorphism (SNP) marker density by BayesR, resulting in enhanced prediction accuracy. This evidence conclusively demonstrates the significant impact of BayesR on enhancing genomic predictions in diverse cattle populations, underscoring the crucial role of genetic relatedness in selection methodologies. In parallel, subsequent studies should focus on refining GRM and exploring alternative models for GP.
Affiliation(s)
- Haoran Ma
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Hongwei Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB 510632, Canada
- Fei Ge
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Huqiong Zhao
  - College of Animal Science, Shanxi Agricultural University, Jinzhong 030801, China
- Bo Zhu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lupei Zhang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Huijiang Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lingyang Xu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Junya Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Zezhao Wang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
|
19
|
Hayes BJ, Duff CJ, Hine BC, Mahony TJ. Genomic estimated breeding values for bovine respiratory disease resistance in Angus feedlot cattle. J Anim Sci 2024; 102:skae113. [PMID: 38659364 PMCID: PMC11107116 DOI: 10.1093/jas/skae113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/23/2024] [Indexed: 04/26/2024] Open
Abstract
Bovine respiratory disease (BRD) causes major losses in feedlot cattle worldwide. A genetic component for BRD resistance in feedlot cattle and calves has been reported in a number of studies, with heritabilities ranging from 0.04 to 0.2. These results suggest selection could be used to reduce the incidence of BRD. Genomic selection could be an attractive approach for breeding for BRD resistance, given the phenotype is not likely to be recorded on breeding animals. In this study, we derived GEBVs for BRD resistance and assessed their accuracy in a reasonably large data set recorded for feedlot treatment of BRD (1213 Angus steers, in two feedlots). In fivefold cross validation, genomic predictions were moderately accurate (0.23 ± 0.01) when a BayesR approach was used. Expansion of this approach to include more animals and a diversity of breeds is recommended to successfully develop a GEBV for BRD resistance in feedlots for the beef industry.
Affiliation(s)
- Ben J Hayes
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD 4072, Australia
- Bradley C Hine
  - CSIRO, F.D. McMaster Laboratory, Armidale, NSW 2350, Australia
- Timothy J Mahony
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD 4072, Australia
|
20
|
Cui R, Elzur RA, Kanai M, Ulirsch JC, Weissbrod O, Daly MJ, Neale BM, Fan Z, Finucane HK. Improving fine-mapping by modeling infinitesimal effects. Nat Genet 2024; 56:162-169. [PMID: 38036779 PMCID: PMC11056999 DOI: 10.1038/s41588-023-01597-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/24/2022] [Accepted: 10/26/2023] [Indexed: 12/02/2023]
Abstract
Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.
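The abstract describes the replication failure rate only as a downsampling-based consistency metric. One plausible reading, sketched below, is the fraction of variants that look confidently causal in a downsampled analysis but receive a low posterior inclusion probability (PIP) when the full sample is analysed. The thresholds of 0.9 and 0.1, the treatment of missing variants, the variant identifiers and the PIP values are all assumptions made for illustration.

```python
def replication_failure_rate(pip_downsampled, pip_full, high=0.9, low=0.1):
    """Share of confident downsampled calls that fail to replicate in the full sample.

    pip_downsampled, pip_full: dicts mapping variant ID to posterior inclusion probability.
    high, low: illustrative thresholds (assumed, not taken from the abstract).
    """
    confident = [v for v, pip in pip_downsampled.items() if pip >= high]
    if not confident:
        return float("nan")
    # Assumption: a variant absent from the full-sample results counts as PIP 0.
    failures = [v for v in confident if pip_full.get(v, 0.0) <= low]
    return len(failures) / len(confident)


# Hypothetical PIPs for five variants.
pip_downsampled = {"var_a": 0.95, "var_b": 0.92, "var_c": 0.40, "var_d": 0.99, "var_e": 0.91}
pip_full = {"var_a": 0.97, "var_b": 0.05, "var_c": 0.80, "var_d": 0.50, "var_e": 0.02}

rfr = replication_failure_rate(pip_downsampled, pip_full)
print(rfr)
```

If the posterior probabilities were well calibrated, few confident calls should collapse when more data are added, so a high value under this definition would point to overconfidence.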
Collapse
Affiliation(s)
- Ran Cui
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Roy A Elzur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Jacob C Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
| | - Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Zhou Fan
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA.
| | - Hilary K Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
21
|
Haque MA, Lee YM, Ha JJ, Jin S, Park B, Kim NY, Won JI, Kim JJ. Genomic Predictions in Korean Hanwoo Cows: A Comparative Analysis of Genomic BLUP and Bayesian Methods for Reproductive Traits. Animals (Basel) 2023; 14:27. [PMID: 38200758 PMCID: PMC10778388 DOI: 10.3390/ani14010027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 12/07/2023] [Accepted: 12/18/2023] [Indexed: 01/12/2024]
Abstract
This study evaluated the accuracy of genomic estimated breeding values (GEBVs) for four reproductive traits in Hanwoo cows, age at first calving (AFC), calving interval (CI), gestation length (GL), and number of artificial inseminations per conception (NAIPC), using the GBLUP, BayesB, BayesLASSO, and BayesR methods. Accuracy estimates were derived through fivefold cross-validation on a dataset of 11,348 animals genotyped with an Illumina Bovine 50K SNP chip. GBLUP showed an accuracy of 0.26 for AFC, while BayesB, BayesLASSO, and BayesR demonstrated values of 0.28, 0.29, and 0.29, respectively. For CI, GBLUP attained an accuracy of 0.19, whereas BayesB, BayesLASSO, and BayesR scored 0.21, 0.24, and 0.25, respectively. The accuracy for GL was uniform across GBLUP, BayesB, and BayesR at 0.31, whereas BayesLASSO showed a slightly higher accuracy of 0.33. For NAIPC, GBLUP showed an accuracy of 0.24, while BayesB, BayesLASSO, and BayesR recorded 0.22, 0.27, and 0.30, respectively. The variation in genomic prediction accuracy among methods indicated that Bayesian approaches slightly outperformed GBLUP. The findings suggest that Bayesian methods, notably BayesLASSO and BayesR, offer improved predictive capabilities for reproductive traits. Future research may explore more advanced genomic approaches to enhance predictive accuracy and genetic gains in Hanwoo cattle breeding programs.
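The fivefold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the single-feature least-squares predictor, and the toy records are all assumptions introduced here, and the per-fold accuracy is taken simply as the correlation between predicted and observed values.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kfold_accuracy(records, fit_predict, k=5, seed=0):
    """Mean predicted-vs-observed correlation over k folds.

    records: list of (feature, phenotype) pairs (hypothetical layout).
    fit_predict: hypothetical callable (train_records, test_features) -> predictions.
    """
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [records[i] for i in idx if i not in held_out]
        test = [records[i] for i in fold]
        preds = fit_predict(train, [f for f, _ in test])
        accuracies.append(pearson(preds, [y for _, y in test]))
    return sum(accuracies) / k

def simple_ls(train, test_features):
    """Stand-in predictor: one-feature least squares (not GBLUP or BayesR)."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    return [my + slope * (x - mx) for x in test_features]

# Toy, noise-free records so the stand-in predictor is exact.
records = [(float(i), 2.0 * i + 1.0) for i in range(20)]
mean_accuracy = kfold_accuracy(records, simple_ls)
```

In practice `fit_predict` would wrap a genomic prediction model, and the reported accuracy would depend on how phenotypes are adjusted before correlation.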
Affiliation(s)
- Md Azizul Haque
- Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Republic of Korea; (M.A.H.); (Y.-M.L.)
| | - Yun-Mi Lee
- Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Republic of Korea; (M.A.H.); (Y.-M.L.)
| | - Jae-Jung Ha
- Gyeongbuk Livestock Research Institute, Yeongju 36052, Republic of Korea;
| | - Shil Jin
- Hanwoo Research Institute, National Institute of Animal Science, Pyeongchang 25340, Republic of Korea; (S.J.); (B.P.); (N.-Y.K.)
| | - Byoungho Park
- Hanwoo Research Institute, National Institute of Animal Science, Pyeongchang 25340, Republic of Korea; (S.J.); (B.P.); (N.-Y.K.)
| | - Nam-Young Kim
- Hanwoo Research Institute, National Institute of Animal Science, Pyeongchang 25340, Republic of Korea; (S.J.); (B.P.); (N.-Y.K.)
| | - Jeong-Il Won
- Hanwoo Research Institute, National Institute of Animal Science, Pyeongchang 25340, Republic of Korea; (S.J.); (B.P.); (N.-Y.K.)
| | - Jong-Joo Kim
- Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Republic of Korea; (M.A.H.); (Y.-M.L.)
| |
|
22
|
Costilla R, Zeng J, Al Kalaldeh M, Swaminathan M, Gibson JP, Ducrocq V, Hayes BJ. Developing flexible models for genetic evaluations in smallholder crossbred dairy farms. J Dairy Sci 2023; 106:9125-9135. [PMID: 37678792 PMCID: PMC10772325 DOI: 10.3168/jds.2022-23135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 07/07/2023] [Indexed: 09/09/2023]
Abstract
The productivity of smallholder dairy farms is very low in developing countries. Important genetic gains could be realized using genomic selection, but genetic evaluations need to be tailored for lack of pedigree information and very small farm sizes. To accommodate this situation, we propose a flexible Bayesian model for the genetic evaluation of milk yield, which allows us to simultaneously account for nongenetic random effects for farms and varying SNP variance (BayesR model). First, we used simulations based on real genotype data from Indian crossbred dairy cattle to demonstrate that the proposed model can separate the true genetic and nongenetic parameters even for small farm sizes (2 cows on average) although with high standard errors in scenarios with low heritability. The accuracy of genomic genetic evaluation increased until farm size was approximately 5. We then applied the model to real data from 4,655 crossbred cows with 106,109 monthly test day milk records and 689,750 autosomal SNPs. We estimated a heritability of 0.16 (0.04) for milk yield and using cross-validation, a genomic estimated breeding value (GEBV) accuracy of 0.45 and bias (regression of phenotype on GEBV) of 1.04 (0.26). Estimated genetic parameters were very similar using BayesR, BayesC, and genomic BLUP approaches. Candidate genes near the top variants, IMMP2L and ARHGEF2, have been previously associated with milk protein composition, mastitis resistance, and milk cholesterol content. The estimated heritability and GEBV accuracy for milk yield are much lower than those from intensive or pasture-based systems in many countries. Further increases in the number of phenotyped and genotyped animals in farms with at least 2 cows (preferably 3-5, to allow for dropout of cows) are needed to improve the estimation of genetic effects in these smallholder dairy farms.
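The abstract reports bias as the regression of phenotype on GEBV. A minimal sketch of that quantity follows, assuming an ordinary least-squares slope; the function name and the five data points are hypothetical and are not taken from the study.

```python
def regression_slope(gebvs, phenotypes):
    """Least-squares slope of phenotype regressed on GEBV."""
    n = len(gebvs)
    mean_g = sum(gebvs) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(gebvs, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in gebvs)
    return sxy / sxx

# Hypothetical validation animals (illustrative numbers only).
gebv = [1.0, 2.0, 3.0, 4.0, 5.0]
pheno = [1.2, 1.9, 3.2, 3.8, 5.4]
bias = regression_slope(gebv, pheno)  # about 1.03 for these numbers
```

A slope near 1 suggests the GEBVs are neither inflated nor deflated relative to the phenotypes; values well below 1 indicate over-dispersed predictions.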
Affiliation(s)
- R Costilla
- AgResearch Limited, Ruakura Research Centre, Hamilton 3214, New Zealand; Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD 4067, Australia.
| | - J Zeng
- Institute for Molecular Biosciences, University of Queensland, St. Lucia, QLD 4067, Australia
| | - M Al Kalaldeh
- Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW 2350, Australia
| | - M Swaminathan
- BAIF Development Research Foundation, Pune 412 202, Maharashtra, India
| | - J P Gibson
- Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW 2350, Australia
| | - V Ducrocq
- Université Paris-Saclay, INRAE, AgroParisTech, UMR GABI, 78350 Jouy-en-Josas, France
| | - B J Hayes
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD 4067, Australia
| |
|
23
|
Warburton CL, Costilla R, Engle BN, Moore SS, Corbet NJ, Fordyce G, McGowan MR, Burns BM, Hayes BJ. Concurrently mapping quantitative trait loci associations from multiple subspecies within hybrid populations. Heredity (Edinb) 2023; 131:350-360. [PMID: 37798326 PMCID: PMC10673866 DOI: 10.1038/s41437-023-00651-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 09/19/2023] [Accepted: 09/21/2023] [Indexed: 10/07/2023]
Abstract
Many of the world's agriculturally important plant and animal populations consist of hybrids of subspecies. Cattle in tropical and sub-tropical regions, for example, originate from two subspecies, Bos taurus indicus (Bos indicus) and Bos taurus taurus (Bos taurus). Methods to derive the underlying genetic architecture for these two subspecies are essential to develop accurate genomic predictions in these hybrid populations. We propose a novel method to achieve this. First, we use haplotypes to assign SNP alleles to ancestral subspecies of origin in a multi-breed and multi-subspecies population. Then we use a BayesR framework that allows SNP alleles originating from the different subspecies to have differing effects. Applying this method in a composite population of B. indicus and B. taurus hybrids, our results show that there are underlying genomic differences between the two subspecies, and these effects are not identified in multi-breed genomic evaluations that do not account for subspecies of origin effects. The method slightly improved the accuracy of genomic prediction. More significantly, by allocating SNP alleles to ancestral subspecies of origin, we were able to identify four SNPs with high posterior probabilities of inclusion that have not been previously associated with cattle fertility and were close to genes associated with fertility in other species. These results show that haplotypes can be used to trace subspecies of origin through the genome of this hybrid population and, in conjunction with our novel Bayesian analysis, subspecies SNP allele allocation can be used to increase the accuracy of QTL association mapping in genetically diverse populations.
Affiliation(s)
- Christie L Warburton
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia.
| | - Roy Costilla
- AgResearch Limited, Ruakura Research Centre, Hamilton, 3214, New Zealand
| | - Bailey N Engle
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
| | - Stephen S Moore
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
| | - Nicholas J Corbet
- Formerly Central Queensland University, School of Health, Medical and Applied Sciences, Rockhampton, QLD, Australia
| | - Geoffry Fordyce
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
| | - Michael R McGowan
- The University of Queensland, School of Veterinary Science, St Lucia, QLD, Australia
| | - Brian M Burns
- Formerly Department of Agriculture and Fisheries, Rockhampton, QLD, Australia
| | - Ben J Hayes
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
| |
|
24
|
Berry DP, Spangler ML. Animal board invited review: Practical applications of genomic information in livestock. Animal 2023; 17:100996. [PMID: 37820404 DOI: 10.1016/j.animal.2023.100996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 09/08/2023] [Accepted: 09/11/2023] [Indexed: 10/13/2023]
Abstract
Access to high-dimensional genomic information in many livestock species is accelerating. This has been greatly aided not only by continual reductions in genotyping costs but also an expansion in the services available that leverage genomic information to create a greater return-on-investment. Genomic information on individual animals has many uses including (1) parentage verification and discovery, (2) traceability, (3) karyotyping, (4) sex determination, (5) reporting and monitoring of mutations conferring major effects or congenital defects, (6) better estimating inbreeding of individuals and coancestry among individuals, (7) mating advice, (8) determining breed composition, (9) enabling precision management, and (10) genomic evaluations; genomic evaluations exploit genome-wide genotype information to improve the accuracy of predicting an animal's (and by extension its progeny's) genetic merit. Genomic data also provide a huge resource for research, albeit the outcome from this research, if successful, should eventually be realised through one of the ten applications already mentioned. The process for generating a genotype all the way from sample procurement to identifying erroneous genotypes is described, as are the steps that should be considered when developing a bespoke genotyping panel for practical application.
Affiliation(s)
- D P Berry
- Animal & Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Cork, Ireland.
| | - M L Spangler
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE, United States
| |
|
25
|
Zhu D, Zhao Y, Zhang R, Wu H, Cai G, Wu Z, Wang Y, Hu X. Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population. Genet Sel Evol 2023; 55:72. [PMID: 37853325 PMCID: PMC10583454 DOI: 10.1186/s12711-023-00843-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Accepted: 09/14/2023] [Indexed: 10/20/2023]
Abstract
BACKGROUND Although the accumulation of whole-genome sequencing (WGS) data has accelerated the identification of mutations underlying complex traits, its impact on the accuracy of genomic predictions is limited. Reliable genotyping data and pre-selected beneficial loci can be used to improve prediction accuracy. Previously, we reported a low-coverage sequencing genotyping method that yielded 11.3 million highly accurate single-nucleotide polymorphisms (SNPs) in pigs. Here, we introduce a method termed selective linkage disequilibrium pruning (SLDP), which refines the set of SNPs that show a large gain during prediction of complex traits using whole-genome SNP data. RESULTS We used the SLDP method to identify and select markers among millions of SNPs based on genome-wide association study (GWAS) prior information. We evaluated the performance of SLDP with respect to three real traits and six simulated traits with varying genetic architectures using two representative models (genomic best linear unbiased prediction and BayesR) on samples from 3579 Duroc boars. SLDP was determined by testing 180 combinations of two core parameters (GWAS P-value thresholds and linkage disequilibrium r²). The parameters for each trait were optimized in the training population by fivefold cross-validation and then tested in the validation population. Similar to previous GWAS prior-based methods, the performance of SLDP was mainly affected by the genetic architecture of the traits analyzed. Specifically, SLDP performed better for traits controlled by major quantitative trait loci (QTL) or a small number of quantitative trait nucleotides (QTN). Compared with two commercial SNP chips, genotyping-by-sequencing data, and an unselected whole-genome SNP panel, the SLDP strategy led to significant improvements in prediction accuracy, which ranged from 0.84 to 3.22% for real traits controlled by major or moderate QTL and from 1.23 to 11.47% for simulated traits controlled by a small number of QTN. CONCLUSIONS The SLDP marker selection method can be incorporated into mainstream prediction models to yield accuracy improvements for traits with a relatively simple genetic architecture; however, it has no significant advantage for traits not controlled by major QTL. The main factors that affect its performance are the genetic architecture of traits and the reliability of GWAS prior information. Our findings can facilitate the application of WGS-based genomic selection.
Affiliation(s)
- Di Zhu
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
| | - Yiqiang Zhao
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
| | - Ran Zhang
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
| | - Hanyu Wu
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing, China
| | - Gengyuan Cai
- National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
| | - Zhenfang Wu
- National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China.
| | - Yuzhe Wang
- National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing, China.
| | - Xiaoxiang Hu
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China.
| |
|
26
|
Hayes BJ, Copley J, Dodd E, Ross EM, Speight S, Fordyce G. Multi-breed genomic evaluation for tropical beef cattle when no pedigree information is available. Genet Sel Evol 2023; 55:71. [PMID: 37845626 PMCID: PMC10578004 DOI: 10.1186/s12711-023-00847-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/22/2023] [Accepted: 10/04/2023] [Indexed: 10/18/2023]
Abstract
BACKGROUND It has been challenging to implement genomic selection in multi-breed tropical beef cattle populations. If commercial (often crossbred) animals could be used in the reference population for these genomic evaluations, this could allow for very large reference populations. In tropical beef systems, such animals often have no pedigree information. Here we investigate potential models for such data, using marker heterozygosity (to model heterosis) and breed composition derived from genetic markers, as covariates in the model. Models treated breed effects as either fixed or random, and included genomic best linear unbiased prediction (GBLUP) and BayesR. A tropically-adapted beef cattle dataset of 29,391 purebred, crossbred and composite commercial animals was used to evaluate the models. RESULTS Treating breed effects as random, in an approach analogous to genetic groups, allowed partitioning of the genetic variance into within-breed and across-breed components (even with a large number of breeds), and estimation of within-breed and across-breed genomic estimated breeding values (GEBV). We demonstrate that moderately-accurate (0.30-0.43) GEBV can be calculated using these models. Treating breed effects as random gave more accurate GEBV than treating breed as fixed. A simple GBLUP model where no breed effects were fitted gave the same accuracy (and correlations of GEBV very close to 1) as a model where GEBV for within-breed and the GEBV for (random) across-breed effects were included. When GEBV were predicted for herds with no data in the reference population, BayesR resulted in the highest accuracy, with 3% accuracy improvement averaged across traits, especially when the validation population was less related to the reference population. Estimates of heterosis from our models were in line with previous estimates from beef cattle. A method for estimating the number of effective breed comparisons for each breed combination accumulated across contemporary groups is presented. CONCLUSIONS When no pedigree is available, breed composition and heterosis for inclusion in multi-breed genomic evaluation can be estimated from genotypes. When GEBV were predicted for herds with no data in the reference population, BayesR resulted in the highest accuracy.
Affiliation(s)
- Ben J Hayes
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, 4067, Australia.
| | - James Copley
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, 4067, Australia
| | - Elsie Dodd
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, 4067, Australia
| | - Elizabeth M Ross
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, 4067, Australia
| | - Shannon Speight
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, 4067, Australia
- BlackBox Co, Mareeba, QLD, 4880, Australia
| | - Geoffry Fordyce
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, 4067, Australia
| |
|
27
|
Xiang R, Fang L, Liu S, Macleod IM, Liu Z, Breen EJ, Gao Y, Liu GE, Tenesa A, Mason BA, Chamberlain AJ, Wray NR, Goddard ME. Gene expression and RNA splicing explain large proportions of the heritability for complex traits in cattle. Cell Genomics 2023; 3:100385. [PMID: 37868035 PMCID: PMC10589627 DOI: 10.1016/j.xgen.2023.100385] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 06/08/2022] [Revised: 08/10/2022] [Accepted: 07/26/2023] [Indexed: 10/24/2023]
Abstract
Many quantitative trait loci (QTLs) are in non-coding regions. Therefore, QTLs are assumed to affect gene regulation. Gene expression and RNA splicing are primary steps of transcription, so DNA variants changing gene expression (eVariants) or RNA splicing (sVariants) are expected to significantly affect phenotypes. We quantify the contribution of eVariants and sVariants detected from 16 tissues (n = 4,725) to 37 traits of ∼120,000 cattle (average magnitude of genetic correlation between traits = 0.13). Analyzed in Bayesian mixture models, averaged across 37 traits, cis and trans eVariants and sVariants detected from 16 tissues jointly explain 69.2% (SE = 0.5%) of heritability, 44% more than expected from the same number of random variants. This 69.2% includes an average of 24% from trans e-/sVariants (14% more than expected). Averaged across 56 lipidomic traits, multi-tissue cis and trans e-/sVariants also explain 71.5% (SE = 0.3%) of heritability, demonstrating the essential role of proximal and distal regulatory variants in shaping mammalian phenotypes.
Affiliation(s)
- Ruidong Xiang
- Faculty of Veterinary & Agricultural Science, the University of Melbourne, Parkville, VIC 3052, Australia
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
- Cambridge-Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
| | - Lingzhao Fang
- MRC Human Genetics Unit at the Institute of Genetics and Cancer, the University of Edinburgh, Edinburgh, UK
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
| | - Shuli Liu
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
| | - Iona M. Macleod
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Zhiqian Liu
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Edmond J. Breen
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Yahui Gao
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD 20705, USA
| | - George E. Liu
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD 20705, USA
| | - Albert Tenesa
- MRC Human Genetics Unit at the Institute of Genetics and Cancer, the University of Edinburgh, Edinburgh, UK
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, the University of Edinburgh, Midlothian EH25 9RG, UK
| | - CattleGTEx Consortium
- Faculty of Veterinary & Agricultural Science, the University of Melbourne, Parkville, VIC 3052, Australia
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
- Cambridge-Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- MRC Human Genetics Unit at the Institute of Genetics and Cancer, the University of Edinburgh, Edinburgh, UK
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD 20705, USA
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, the University of Edinburgh, Midlothian EH25 9RG, UK
- Institute for Molecular Bioscience, the University of Queensland, Brisbane, QLD 4072, Australia
- Queensland Brain Institute, the University of Queensland, Brisbane, QLD 4072, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
| | - Brett A. Mason
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| | - Amanda J. Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
| | - Naomi R. Wray
- Institute for Molecular Bioscience, the University of Queensland, Brisbane, QLD 4072, Australia
- Queensland Brain Institute, the University of Queensland, Brisbane, QLD 4072, Australia
| | - Michael E. Goddard
- Faculty of Veterinary & Agricultural Science, the University of Melbourne, Parkville, VIC 3052, Australia
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
| |
|
28
|
Zhao T, Cheng H. Interpreting single-step genomic evaluation as a neural network of three layers: pedigree, genotypes, and phenotypes. Genet Sel Evol 2023; 55:68. [PMID: 37789273 PMCID: PMC10546757 DOI: 10.1186/s12711-023-00838-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 09/08/2023] [Indexed: 10/05/2023]
Abstract
The single-step approach has become the most widely-used methodology for genomic evaluations when only a subset of phenotyped individuals in the pedigree are genotyped, where the genotypes for non-genotyped individuals are imputed based on gene contents (i.e., genotypes) of genotyped individuals through their pedigree relationships. We proposed a new method named single-step neural network with mixed models (NNMM) to represent single-step genomic evaluations as a neural network of three sequential layers: pedigree, genotypes, and phenotypes. These three sequential layers of information create a unified network instead of two separate steps, allowing the unobserved gene contents of non-genotyped individuals to be sampled based on pedigree, observed genotypes of genotyped individuals, and phenotypes. In addition to imputation of genotypes using all three sources of information, including phenotypes, genotypes, and pedigree, single-step NNMM provides a more flexible framework to allow nonlinear relationships between genotypes and phenotypes, and for individuals to be genotyped with different single-nucleotide polymorphism (SNP) panels. The single-step NNMM has been implemented in the software package "JWAS".
Affiliation(s)
- Tianjing Zhao
- Department of Animal Science, University of California Davis, Davis, CA, 95616, USA
- Integrative Genetics and Genomics Graduate Group, University of California Davis, Davis, CA, 95616, USA
| | - Hao Cheng
- Department of Animal Science, University of California Davis, Davis, CA, 95616, USA.
| |
|
29
|
Xie L, Qin J, Rao L, Cui D, Tang X, Chen L, Xiao S, Zhang Z, Huang L. Genetic dissection and genomic prediction for pork cuts and carcass morphology traits in pig. J Anim Sci Biotechnol 2023; 14:116. [PMID: 37660101 PMCID: PMC10475202 DOI: 10.1186/s40104-023-00914-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/11/2023] [Accepted: 07/02/2023] [Indexed: 09/04/2023]
Abstract
BACKGROUND As pre-cut and pre-packaged chilled meat becomes increasingly popular, integrating the carcass-cutting process into the pig industry chain has become a trend. Identifying quantitative trait loci (QTLs) of pork cuts would facilitate the selection of pigs with a higher overall value. However, previous studies solely focused on evaluating the phenotypic and genetic parameters of pork cuts, neglecting the investigation of QTLs influencing these traits. This study involved 17 pork cuts and 12 morphology traits from 2,012 pigs across four populations genotyped using CC1 PorcineSNP50 BeadChips. Our aim was to identify QTLs and evaluate the accuracy of genomic estimated breeding values (GEBVs) for pork cuts. RESULTS We identified 14 QTLs and 112 QTLs for 17 pork cuts by GWAS using haplotype and imputation genotypes, respectively. Specifically, we found that HMGA1, VRTN and BMP2 were associated with body length and weight. Subsequent analysis revealed that HMGA1 primarily affects the size of fore leg bones, VRTN primarily affects the number of vertebrae, and BMP2 primarily affects the length of vertebrae and the size of hind leg bones. The prediction accuracy was defined as the correlation between the adjusted phenotype and GEBVs in the validation population, divided by the square root of the trait's heritability. The prediction accuracy of GEBVs for pork cuts varied from 0.342 to 0.693. Notably, ribs, boneless picnic shoulder, tenderloin, hind leg bones, and scapula bones exhibited prediction accuracies exceeding 0.600. Employing better models, increasing marker density through genotype imputation, and pre-selecting markers significantly improved the prediction accuracy of GEBVs. CONCLUSIONS We performed the first study to dissect the genetic mechanism of pork cuts and identified a large number of significant QTLs and potential candidate genes. These findings carry significant implications for the breeding of pork cuts through marker-assisted and genomic selection. Additionally, we have constructed the first reference populations for genomic selection of pork cuts in pigs.
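The accuracy definition stated in this abstract (correlation between adjusted phenotype and GEBV in the validation population, divided by the square root of heritability) can be written out as a short sketch. The function names, the heritability value, and the five validation animals below are hypothetical illustrations, not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prediction_accuracy(adjusted_phenotypes, gebvs, heritability):
    """cor(adjusted phenotype, GEBV) / sqrt(h2), following the abstract's definition."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return pearson(adjusted_phenotypes, gebvs) / math.sqrt(heritability)

# Hypothetical validation animals (illustrative numbers only).
y_adj = [1.0, 2.0, 3.0, 4.0, 5.0]
gebv = [4.0, 1.0, 2.0, 3.0, 5.0]
accuracy = prediction_accuracy(y_adj, gebv, heritability=0.64)  # 0.4 / 0.8
```

Dividing by the square root of heritability rescales the phenotype-based correlation toward a correlation with the true breeding value, so the result is sensitive to how well heritability itself is estimated.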
Affiliation(s)
- Lei Xie
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
| | - Jiangtao Qin
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
| | - Lin Rao
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
| | - Dengshuai Cui
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
| | - Xi Tang
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
| | - Liqing Chen
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
| | - Shijun Xiao
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
| | - Zhiyan Zhang
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
| | - Lusheng Huang
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, 330045 China
| |
|
30
|
Shannon ML, Muhammad A, James NT, Williams ML, Breeyear J, Edwards T, Mosley JD, Choi L, Kannankeril P, Van Driest S. Variant-based heritability assessment of dexmedetomidine and fentanyl clearance in pediatric patients. Clin Transl Sci 2023; 16:1628-1638. [PMID: 37353859 PMCID: PMC10499425 DOI: 10.1111/cts.13574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 04/12/2023] [Accepted: 06/01/2023] [Indexed: 06/25/2023] Open
Abstract
Despite complex pathways of drug disposition, clinical pharmacogenetic predictors currently rely on only a few high effect variants. Quantification of the polygenic contribution to variability in drug disposition is necessary to prioritize target drugs for pharmacogenomic approaches and guide analytic methods. Dexmedetomidine and fentanyl, often used in postoperative care of pediatric patients, have high rates of inter-individual variability in dosing requirements. Analyzing previously generated population pharmacokinetic parameters, we used Bayesian hierarchical mixed modeling to measure narrow-sense (additive) heritability ($h^2_{\mathrm{SNP}}$) of dexmedetomidine and fentanyl clearance in children and identify relative contributions of small, moderate, and large effect-size variants to $h^2_{\mathrm{SNP}}$. We used genome-wide association studies (GWAS) to identify variants contributing to variation in dexmedetomidine and fentanyl clearance, followed by functional analyses to identify associated pathways. For dexmedetomidine, median clearance was 33.0 L/h (interquartile range [IQR] 23.8-47.9 L/h) and $h^2_{\mathrm{SNP}}$ was estimated to be 0.35 (90% credible interval 0.00-0.90), with 45% of $h^2_{\mathrm{SNP}}$ attributed to large-, 32% to moderate-, and 23% to small-effect variants. The fentanyl cohort had median clearance of 8.2 L/h (IQR 4.7-16.7 L/h), with estimated $h^2_{\mathrm{SNP}}$ of 0.30 (90% credible interval 0.00-0.84). Large-effect variants accounted for 30% of $h^2_{\mathrm{SNP}}$, whereas moderate- and small-effect variants accounted for 37% and 33%, respectively. As expected, given small sample sizes, no individual variants or pathways were significantly associated with dexmedetomidine or fentanyl clearance by GWAS. We conclude that clearance of both drugs is highly polygenic, motivating the future use of polygenic risk scores to guide appropriate dosing of dexmedetomidine and fentanyl.
Affiliation(s)
| | - Ayesha Muhammad
- School of Medicine, Vanderbilt University, Nashville, Tennessee, USA
| | - Nathan T. James
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Present address: Berry Consultants, LLC, Austin, Texas, USA
| | - Michael L. Williams
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Present address: Department of Clinical Pharmacology and Quantitative Pharmacology, AstraZeneca, Gothenburg, Sweden
| | - Joseph Breeyear
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Todd Edwards
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Jonathan D. Mosley
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Leena Choi
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Prince Kannankeril
- Center for Pediatric Precision Medicine, Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Sara Van Driest
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Center for Pediatric Precision Medicine, Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Present address: All of Us Research Program, National Institutes of Health, Washington, DC, USA
| |
|
31
|
Lee HJ, Lee JH, Gondro C, Koh YJ, Lee SH. deepGBLUP: joint deep learning networks and GBLUP framework for accurate genomic prediction of complex traits in Korean native cattle. Genet Sel Evol 2023; 55:56. [PMID: 37525091 PMCID: PMC10392020 DOI: 10.1186/s12711-023-00825-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 07/07/2023] [Indexed: 08/02/2023] Open
Abstract
BACKGROUND Genomic prediction has become widespread as a valuable tool to estimate genetic merit in animal and plant breeding. Here we develop a novel genomic prediction algorithm, called deepGBLUP, which integrates deep learning networks and a genomic best linear unbiased prediction (GBLUP) framework. The deep learning networks assign marker effects using locally-connected layers and subsequently use them to estimate an initial genomic value through fully-connected layers. The GBLUP framework estimates three genomic values (additive, dominance, and epistasis) by leveraging respective genetic relationship matrices. Finally, deepGBLUP predicts a final genomic value by summing all the estimated genomic values. RESULTS We compared the proposed deepGBLUP with the conventional GBLUP and Bayesian methods. Extensive experiments demonstrate that the proposed deepGBLUP yields state-of-the-art performance on Korean native cattle data across diverse traits, marker densities, and training sizes. In addition, they show that the proposed deepGBLUP can outperform the previous methods on simulated data across various heritabilities and quantitative trait loci (QTL) effects. CONCLUSIONS We introduced a novel genomic prediction algorithm, deepGBLUP, which successfully integrates deep learning networks and GBLUP framework. Through comprehensive evaluations on the Korean native cattle data and simulated data, deepGBLUP consistently achieved superior performance across various traits, marker densities, training sizes, heritabilities, and QTL effects. Therefore, deepGBLUP is an efficient method to estimate an accurate genomic value. The source code and manual for deepGBLUP are available at https://github.com/gywns6287/deepGBLUP .
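The GBLUP component described above relies on genetic relationship matrices. The abstract does not say how they were constructed, so the sketch below assumes the commonly used first method of VanRaden (2008) for the additive matrix, G = ZZ' / (2 Σ p(1 - p)), applied to invented genotypes coded 0/1/2. It is not taken from the deepGBLUP source code, and the function name and data are hypothetical.

```python
def additive_grm(genotypes):
    """Additive genomic relationship matrix (VanRaden method 1, assumed here).

    genotypes: list of rows, one per individual, each marker coded as
    0, 1 or 2 copies of the counted allele.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each marker, estimated from the sample itself.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centre every genotype by twice the allele frequency of its marker.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]


# Three invented individuals genotyped at four markers.
toy_genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
G = additive_grm(toy_genotypes)
```

Because the allele frequencies are estimated from the same sample, every row of G sums to zero here; with frequencies taken from a base population that would no longer hold.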
Affiliation(s)
- Hyo-Jun Lee
- Department of Bio-AI Convergence, Chungnam National University, 305-764, Daejeon, Korea
| | - Jun Heon Lee
- Division of Animal and Dairy Science, Chungnam National University, 305-764, Daejeon, Korea
| | - Cedric Gondro
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
| | - Yeong Jun Koh
- Department of Computer Science and Engineering, Chungnam National University, 305-764, Daejeon, Korea.
| | - Seung Hwan Lee
- Division of Animal and Dairy Science, Chungnam National University, 305-764, Daejeon, Korea.
| |
|
32
|
Jang S, Ros-Freixedes R, Hickey JM, Chen CY, Holl J, Herring WO, Misztal I, Lourenco D. Using pre-selected variants from large-scale whole-genome sequence data for single-step genomic predictions in pigs. Genet Sel Evol 2023; 55:55. [PMID: 37495982 PMCID: PMC10373252 DOI: 10.1186/s12711-023-00831-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/19/2022] [Accepted: 07/18/2023] [Indexed: 07/28/2023] Open
Abstract
BACKGROUND Whole-genome sequence (WGS) data harbor causative variants that may not be present in standard single nucleotide polymorphism (SNP) chip data. The objective of this study was to investigate the impact of using preselected variants from WGS for single-step genomic predictions in maternal and terminal pig lines with up to 1.8k sequenced and 104k sequence imputed animals per line. METHODS Two maternal and four terminal lines were investigated for eight and seven traits, respectively. The number of sequenced animals ranged from 1365 to 1491 for the maternal lines and 381 to 1865 for the terminal lines. Imputation to sequence occurred within each line for 66k to 76k animals for the maternal lines and 29k to 104k animals for the terminal lines. Two preselected SNP sets were generated based on a genome-wide association study (GWAS). Top40k included the SNPs with the lowest p-value in each of the 40k genomic windows, and ChipPlusSign included significant variants integrated into the porcine SNP chip used for routine genotyping. We compared the performance of single-step genomic predictions between using preselected SNP sets assuming equal or different variances and the standard porcine SNP chip. RESULTS In the maternal lines, ChipPlusSign and Top40k showed an average increase in accuracy of 0.6 and 4.9%, respectively, compared to the regular porcine SNP chip. The greatest increase was obtained with Top40k, particularly for fertility traits, for which the initial accuracy based on the standard SNP chip was low. However, in the terminal lines, Top40k resulted in an average loss of accuracy of 1%. ChipPlusSign provided a positive, although small, gain in accuracy (0.9%). Assigning different variances for the SNPs slightly improved accuracies when using variances obtained from BayesR. However, increases were inconsistent across the lines and traits. 
CONCLUSIONS The benefit of using sequence data depends on the line, the size of the genotyped population, and how the WGS variants are preselected. When WGS data are available on hundreds of thousands of animals, using sequence data presents an advantage but this remains limited in pigs.
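The Top40k rule described in the methods (keep the SNP with the lowest p-value in each genomic window) can be sketched as below. The abstract does not state how windows were defined, so this sketch assumes non-overlapping, fixed-width windows per chromosome; the function name, SNP identifiers, and p-values are invented for illustration.

```python
def top_snp_per_window(gwas_results, window_size):
    """Return the ID of the lowest-p-value SNP in each genomic window.

    gwas_results: iterable of (snp_id, chromosome, position_bp, p_value).
    Windows are fixed-width, non-overlapping bins per chromosome
    (an assumption; the abstract does not define them).
    """
    best = {}
    for snp_id, chrom, pos, p_value in gwas_results:
        key = (chrom, pos // window_size)
        # Keep this SNP if the window is new or its p-value is smaller.
        if key not in best or p_value < best[key][1]:
            best[key] = (snp_id, p_value)
    return sorted(snp_id for snp_id, _ in best.values())


# Invented GWAS output: two windows on chromosome 1, one on chromosome 2.
toy_gwas = [
    ("snp_a", "1", 1_000, 0.04),
    ("snp_b", "1", 40_000, 0.001),
    ("snp_c", "1", 120_000, 0.2),
    ("snp_d", "2", 5_000, 0.03),
    ("snp_e", "2", 9_000, 0.5),
]
selected = top_snp_per_window(toy_gwas, window_size=100_000)
```

A selection of this kind keeps one variant per window whether or not it is significant, which contrasts with ChipPlusSign, where only significant variants are added to the existing chip.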
Affiliation(s)
- Sungbong Jang
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA.
| | - Roger Ros-Freixedes
- Departament de Ciència Animal, Universitat de Lleida-Agrotecnio-CERCA Center, Lleida, Spain
| | - John M Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
| | - Ching-Yi Chen
- The Pig Improvement Company, Genus Plc, Hendersonville, TN, USA
| | - Justin Holl
- The Pig Improvement Company, Genus Plc, Hendersonville, TN, USA
| | | | - Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
| | - Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
| |
|
33
|
Jørgensen D, Ropstad EO, Meuwissen T, Lingaas F. Genomic analysis and prediction of genomic values for distichiasis in Staffordshire bull terriers. Canine Med Genet 2023; 10:9. [PMID: 37488637 PMCID: PMC10367371 DOI: 10.1186/s40575-023-00132-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 07/16/2023] [Indexed: 07/26/2023] Open
Abstract
BACKGROUND Distichiasis is a condition characterized by aberrant hairs along the eyelid margins. The symptoms are usually mild but can lead to ulcerations and lesions of the cornea in severe cases. It is the most frequently noted ocular disorder in Norwegian Staffordshire bull terriers (SBT), with a prevalence above 18% in the adult population. A complex inheritance is assumed, but there is sparse knowledge about the genetic background of distichiasis in dogs. We have performed a genome-wide association study of distichiasis in SBT and used genomic data in an attempt to predict genomic values for the disorder. RESULTS We identified four genetic regions on CFA1, CFA18, CFA32 and CFA34 using a mixed linear model association analysis and a Bayesian mixed model analysis. Genomic values were predicted using GBLUP and a Bayesian approach, BayesR. The genomic prediction showed that the quarter (1/4) of dogs with predicted values most likely to acquire distichiasis had a 3.9-4.0 times higher risk of developing distichiasis compared to the quarter (1/4) of dogs least likely to acquire the disease. There was no significant difference between the two methods used. CONCLUSION Four genomic regions associated with distichiasis were discovered in the association analysis, suggesting that distichiasis in SBT is a complex trait involving numerous loci. The four associated regions need to be confirmed in an independent sample. We also used all 95 K SNPs for genomic prediction and showed that genomic prediction can be a helpful tool in selective breeding schemes at breed level aiming at reducing the prevalence of distichiasis in SBTs in the future, even if the predictive value of single dogs may be low.
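The quartile comparison reported in the results can be illustrated with a short sketch: rank dogs by predicted genomic value, then divide the observed disease frequency in the top quarter by that in the bottom quarter. This assumes that a higher predicted value means a higher predicted liability; the function name and the eight toy dogs are invented, and the authors' exact procedure may differ.

```python
def quartile_risk_ratio(predicted_values, affected):
    """Observed risk in the top predicted quarter divided by the bottom quarter.

    predicted_values: predicted genomic values (higher = more at risk, assumed).
    affected: 1 if the dog has the disorder, 0 otherwise.
    """
    ranked = sorted(zip(predicted_values, affected), key=lambda pair: pair[0])
    quarter = len(ranked) // 4
    if quarter == 0:
        raise ValueError("need at least four animals")
    risk_low = sum(status for _, status in ranked[:quarter]) / quarter
    risk_high = sum(status for _, status in ranked[-quarter:]) / quarter
    if risk_low == 0:
        raise ValueError("no cases in the bottom quarter; ratio undefined")
    return risk_high / risk_low


# Eight invented dogs: predicted value and observed status.
toy_predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_affected = [0, 1, 0, 0, 1, 0, 1, 1]
ratio = quartile_risk_ratio(toy_predicted, toy_affected)
```

In practice the predicted values would come from a validation scheme in which each dog's own phenotype is masked, otherwise the ratio would be inflated.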
Affiliation(s)
- Dina Jørgensen
- Medical Genetics Unit, Faculty of Veterinary Medicine, Norwegian University of Life Sciences, P.O. box 5003, 1432, Ås, Norway.
| | | | - Theodorus Meuwissen
- Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, P.O. box 5003, 1432, Ås, Norway
| | - Frode Lingaas
- Medical Genetics Unit, Faculty of Veterinary Medicine, Norwegian University of Life Sciences, P.O. box 5003, 1432, Ås, Norway
| |
|
34
|
Sahana G, Cai Z, Sanchez MP, Bouwman AC, Boichard D. Invited review: Good practices in genome-wide association studies to identify candidate sequence variants in dairy cattle. J Dairy Sci 2023:S0022-0302(23)00357-0. [PMID: 37349208 DOI: 10.3168/jds.2022-22694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 02/01/2023] [Indexed: 06/24/2023]
Abstract
Genotype data from dairy cattle selection programs have greatly facilitated GWAS to identify variants related to economic traits. Results can enhance the accuracy of genomic prediction, analyze more complex models that go beyond additive effects, elucidate the genetic architecture of a trait, and finally, decipher the underlying biology of traits. The entire process, comprising data generation, quality control, statistical analyses, interpretation of association results, and linking results to biology should be designed and executed to minimize the generation of false-positive and false-negative associations and misleading links to biological processes. This review aims to provide general guidelines for data analysis that address data quality control, association tests, adjustment for population stratification, and significance evaluation to improve the reliability of conclusions. We also provide guidance on post-GWAS strategy and the interpretation of results. These guidelines are tailored to dairy cattle, which are characterized by long-range linkage disequilibrium, large half-sib families, and routinely collected phenotypes, requiring different approaches than those applied in human GWAS. We discuss common limitations and challenges that have been overlooked in the analysis and interpretation of GWAS to identify candidate sequence variants in dairy cattle.
Affiliation(s)
- G Sahana
- Aarhus University, Center for Quantitative Genetics and Genomics, 8830 Tjele, Denmark.
| | - Z Cai
- Aarhus University, Center for Quantitative Genetics and Genomics, 8830 Tjele, Denmark
| | - M P Sanchez
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
| | - A C Bouwman
- Wageningen University & Research, Animal Breeding and Genomics, 6700 AH Wageningen, the Netherlands
| | - D Boichard
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
| |
|
35
|
Vahedi SM, Salek Ardetani S, Brito LF, Karimi K, Pahlavan Afshari K, Banabazi MH. Expanding the application of haplotype-based genomic predictions to the wild: A case of antibody response against Teladorsagia circumcincta in Soay sheep. BMC Genomics 2023; 24:335. [PMID: 37330501 PMCID: PMC10276919 DOI: 10.1186/s12864-023-09407-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 05/24/2023] [Indexed: 06/19/2023] Open
Abstract
BACKGROUND Genomic prediction of breeding values (GP) has been adopted in evolutionary genomic studies to uncover microevolutionary processes of wild populations or improve captive breeding strategies. While recent evolutionary studies applied GP with individual single nucleotide polymorphism (SNP), haplotype-based GP could outperform individual SNP predictions through better capturing the linkage disequilibrium (LD) between the SNP and quantitative trait loci (QTL). This study aimed to evaluate the accuracy and bias of haplotype-based GP of immunoglobulin (Ig) A (IgA), IgE, and IgG against Teladorsagia circumcincta in lambs of an unmanaged sheep population (Soay breed) based on Genomic Best Linear Unbiased Prediction (GBLUP) and five Bayesian [BayesA, BayesB, BayesCπ, Bayesian Lasso (BayesL), and BayesR] methods. RESULTS The accuracy and bias of GPs using SNP, haplotypic pseudo-SNP from blocks with different LD thresholds (0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.00), or the combinations of pseudo-SNPs and non-LD clustered SNPs were obtained. Across methods and marker sets, higher ranges of genomic estimated breeding values (GEBV) accuracies were observed for IgA (0.20 to 0.49), followed by IgE (0.08 to 0.20) and IgG (0.05 to 0.14). Considering the methods evaluated, up to 8% gains in GP accuracy of IgG were achieved using pseudo-SNPs compared to SNPs. Up to 3% gain in GP accuracy for IgA was also obtained using the combinations of the pseudo-SNPs with non-clustered SNPs in comparison to fitting individual SNP. No improvement in GP accuracy of IgE was observed using haplotypic pseudo-SNPs or their combination with non-clustered SNPs compared to individual SNP. Bayesian methods outperformed GBLUP for all traits. Most scenarios yielded lower accuracies for all traits with an increased LD threshold. GP models using haplotypic pseudo-SNPs predicted less-biased GEBVs mainly for IgG. 
For this trait, lower bias was observed with higher LD thresholds, whereas no distinct trend was observed for other traits with changes in LD. CONCLUSIONS Haplotype information improves GP performance of anti-helminthic antibody traits of IgA and IgG compared to fitting individual SNP. The observed gains in the predictive performances indicate that haplotype-based methods could benefit GP of some traits in wild animal populations.
Affiliation(s)
- Seyed Milad Vahedi
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, B2N5E3, Canada
| | | | - Luiz F Brito
- Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA
| | - Karim Karimi
- Molecular Diagnostics Program, Verspeeten Clinical Genome Centre, London Health Sciences Centre, London, ON, N6A 5W9, Canada
| | - Kian Pahlavan Afshari
- Department of Animal Sciences, Islamic Azad University, Varamin, Varamin-Pishva Branch, 3381774895, Iran
| | - Mohammad Hossein Banabazi
- Department of Animal Breeding and Genetics (HGEN), Centre for Veterinary Medicine and Animal Science (VHC), Swedish University of Agricultural Sciences (SLU), 75007, Uppsala, Sweden.
- Department of Biotechnology, Animal Science Research Institute of IRAN (ASRI), Agricultural Research, Education & Extension Organization (AREEO), Karaj, 3146618361, Iran.
| |
|
36
|
Clasen JB, Fikse WF, Su G, Karaman E. Multibreed genomic prediction using summary statistics and a breed-origin-of-alleles approach. Heredity (Edinb) 2023:10.1038/s41437-023-00619-4. [PMID: 37231157 DOI: 10.1038/s41437-023-00619-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2021] [Revised: 04/11/2023] [Accepted: 04/26/2023] [Indexed: 05/27/2023] Open
Abstract
Because of an increasing interest in crossbreeding between dairy breeds in dairy cattle herds, farmers are requesting breeding values for crossbred animals. However, genomically enhanced breeding values are difficult to predict in crossbred populations because the genetic make-up of crossbred individuals is unlikely to follow the same pattern as for purebreds. Furthermore, sharing genotype and phenotype information between breed populations is not always possible, which means that genetic merit (GM) for crossbred animals may be predicted without the information needed from some pure breeds, resulting in low prediction accuracy. This simulation study investigated the consequences of using summary statistics from single-breed genomic predictions for some or all pure breeds in two- and three-breed rotational crosses, rather than their raw data. A genomic prediction model taking into account the breed-origin of alleles (BOA) was considered. Because of a high genomic correlation between the breeds simulated (0.62-0.87), the prediction accuracies using the BOA approach were similar to a joint model, assuming homogeneous SNP effects for these breeds. Having a reference population with summary statistics available from all pure breeds and full phenotype and genotype information from crossbreds yielded almost as high prediction accuracies (0.720-0.768) as having a reference population with full information from all pure breeds and crossbreds (0.753-0.789). Lacking information from the pure breeds yielded much lower prediction accuracies (0.590-0.676). Furthermore, including crossbred animals in a combined reference population also benefitted prediction accuracies in the purebred animals, especially for the smallest breed population.
Affiliation(s)
- J B Clasen
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Box 7023, 75007, Uppsala, Sweden.
- Center for Quantitative Genetics and Genomics, Aarhus University, C. F. Møllers Allé 8, DK-8000, Aarhus, Denmark.
| | - W F Fikse
- Växa Sverige, Swedish University of Agricultural Sciences, Ulls väg 26, 756 51, Uppsala, Sweden
| | - G Su
- Center for Quantitative Genetics and Genomics, Aarhus University, C. F. Møllers Allé 8, DK-8000, Aarhus, Denmark
| | - E Karaman
- Center for Quantitative Genetics and Genomics, Aarhus University, C. F. Møllers Allé 8, DK-8000, Aarhus, Denmark
| |
|
37
|
Jang S, Ros-Freixedes R, Hickey JM, Chen CY, Herring WO, Holl J, Misztal I, Lourenco D. Multi-line ssGBLUP evaluation using preselected markers from whole-genome sequence data in pigs. Front Genet 2023; 14:1163626. [PMID: 37252662 PMCID: PMC10213539 DOI: 10.3389/fgene.2023.1163626] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/11/2023] [Accepted: 05/03/2023] [Indexed: 05/31/2023] Open
Abstract
Genomic evaluations in pigs could benefit from using multi-line data along with whole-genome sequencing (WGS) if the data are large enough to represent the variability across populations. The objective of this study was to investigate strategies to combine large-scale data from different terminal pig lines in a multi-line genomic evaluation (MLE) through single-step GBLUP (ssGBLUP) models while including variants preselected from whole-genome sequence (WGS) data. We investigated single-line and multi-line evaluations for five traits recorded in three terminal lines. The number of sequenced animals in each line ranged from 731 to 1,865, with 60k to 104k imputed to WGS. Unknown parent groups (UPG) and metafounders (MF) were explored to account for genetic differences among the lines and improve the compatibility between pedigree and genomic relationships in the MLE. Sequence variants were preselected based on multi-line genome-wide association studies (GWAS) or linkage disequilibrium (LD) pruning. These preselected variant sets were used for ssGBLUP predictions without and with weights from BayesR, and the performances were compared to that of a commercial porcine single-nucleotide polymorphisms (SNP) chip. Using UPG and MF in MLE showed small to no gain in prediction accuracy (up to 0.02), depending on the lines and traits, compared to the single-line genomic evaluation (SLE). Likewise, adding selected variants from the GWAS to the commercial SNP chip resulted in a maximum increase of 0.02 in the prediction accuracy, only for average daily feed intake in the most numerous lines. In addition, no benefits were observed when using preselected sequence variants in multi-line genomic predictions. Weights from BayesR did not help improve the performance of ssGBLUP. This study revealed limited benefits of using preselected whole-genome sequence variants for multi-line genomic predictions, even when tens of thousands of animals had imputed sequence data. 
Correctly accounting for line differences with UPG or MF in MLE is essential to obtain predictions similar to SLE; however, the only observed benefit of an MLE is to have comparable predictions across lines. Further investigation into the amount of data and novel methods to preselect whole-genome causative variants in combined populations would be of significant interest.
Affiliation(s)
- Sungbong Jang
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
| | - Roger Ros-Freixedes
- Departament de Ciència Animal, Universitat de Lleida-Agrotecnio-CERCA Center, Lleida, Spain
| | - John M Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, Scotland, United Kingdom
| | - Ching-Yi Chen
- The Pig Improvement Company, Genus plc, Hendersonville, TN, United States
| | - William O Herring
- The Pig Improvement Company, Genus plc, Hendersonville, TN, United States
| | - Justin Holl
- The Pig Improvement Company, Genus plc, Hendersonville, TN, United States
| | - Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
| | - Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
| |
|
38
|
Wolf MJ, Neumann GB, Kokuć P, Yin T, Brockmann GA, König S, May K. Genetic evaluations for endangered dual-purpose German Black Pied cattle using 50K SNPs, a breed-specific 200K chip, and whole-genome sequencing. J Dairy Sci 2023; 106:3345-3358. [PMID: 37028956 DOI: 10.3168/jds.2022-22665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 12/16/2022] [Indexed: 04/09/2023]
Abstract
Genetic evaluations of local cattle breeds are hampered due to small reference groups or biased due to the utilization of SNP effects estimated in other large populations. Against this background, there is a lack of studies addressing the possible advantage of whole-genome sequences (WGS) or consideration of specific variants from WGS data in genomic predictions for local breeds with small population size. Consequently, the aim of this study was to compare genetic parameters and accuracies of genomic estimated breeding values (GEBV) for 305-d production traits, fat-to-protein ratio (FPR), and somatic cell score (SCS) at the first test date after calving and conformation traits of the endangered German Black Pied cattle (DSN) breed using 4 different marker panels: (1) the commercial 50K Illumina BovineSNP50 BeadChip, (2) a customized 200K chip designed for DSN (DSN200K) which considers the most important variants for DSN from WGS, (3) randomly generated 200K chips based on WGS data, and (4) a WGS panel. The same number of animals was considered for all marker panel analyses (i.e., 1,811 genotyped or sequenced cows for conformation traits, 2,383 cows for lactation production traits, and 2,420 cows for FPR and SCS). Mixed models for the estimation of genetic parameters directly included the respective genomic relationship matrix from the different marker panels plus the trait-specific fixed effects. For the calculation of GEBV accuracies, we applied repeated random subsampling validation. In the process of separate cross-validations per trait, we created a validation set including 20% of cows with masked phenotypes, and a training set comprising 80% of the cows. The cows were selected randomly in a procedure with 10 replicates considering replacements in the different scenarios. The accuracy was defined as the correlation between the direct GEBV and the phenotypes with subtracted corresponding fixed effects for the cows in the validation set.
For FPR and SCS, as well as for lactation production traits, heritabilities were largest based on WGS data, but the increase compared with the 50K or DSN200K applications was quite small in the range from 0.01 to 0.03. Also, for most of the conformation traits, heritabilities were largest based on WGS and DSN200K data, but the increase was in the range of the corresponding standard error. Accordingly, GEBV accuracies for most of the studied traits were highest based on WGS data or when utilizing the DSN200K chip, but the accuracy differences across the marker panels were quite small and nonsignificant. In conclusion, WGS data and the DSN200K chip only contributed to minor improvements in genomic predictions, still justifying the use of the commercial 50K chip. Nevertheless, WGS and the DSN200K chip harbor breed-specific variants, which are valuable for studying causal genetic mechanisms in the endangered DSN population.
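The validation scheme in this abstract (10 replicates, each masking a random 20% of cows and training on the remaining 80%, with accuracy taken as the correlation between predictions and fixed-effect-adjusted phenotypes) can be sketched as follows. This is a simplified illustration: a one-variable regression on an invented per-cow score stands in for the genomic mixed model, validation sets are drawn independently in each replicate (one reading of "considering replacements"), and all names and data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares intercept and slope of y on x (stand-in predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def subsampling_accuracy(scores, adjusted, n_replicates=10,
                         validation_fraction=0.2, seed=1):
    """Mean validation correlation over repeated random 80/20 splits."""
    rng = random.Random(seed)
    ids = sorted(scores)
    n_val = round(validation_fraction * len(ids))
    accuracies = []
    for _ in range(n_replicates):
        validation = rng.sample(ids, n_val)  # phenotypes of these cows are masked
        held_out = set(validation)
        training = [i for i in ids if i not in held_out]
        intercept, slope = fit_line([scores[i] for i in training],
                                    [adjusted[i] for i in training])
        predicted = [intercept + slope * scores[i] for i in validation]
        observed = [adjusted[i] for i in validation]
        accuracies.append(pearson(predicted, observed))
    return sum(accuracies) / len(accuracies)


# Twenty invented cows with a score and a noisy adjusted phenotype.
cows = range(20)
scores = {i: float(i) for i in cows}
adjusted = {i: 2.0 * i + 1.0 + 0.1 * ((7 * i) % 5 - 2) for i in cows}
mean_accuracy = subsampling_accuracy(scores, adjusted)
```

Averaging over replicates reduces the dependence of the reported accuracy on any single random split, which matters when the reference population is as small as it is for a local breed.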
Affiliation(s)
- Manuel J Wolf
- Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390 Gießen, Germany
| | - Guilherme B Neumann
- Animal Breeding Biology and Molecular Genetics, Albrecht Daniel Thaer-Institute for Agricultural and Horticultural Sciences, Humboldt Universität zu Berlin, 10115 Berlin, Germany
| | - Paula Kokuć
- Animal Breeding Biology and Molecular Genetics, Albrecht Daniel Thaer-Institute for Agricultural and Horticultural Sciences, Humboldt Universität zu Berlin, 10115 Berlin, Germany
| | - Tong Yin
- Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390 Gießen, Germany
| | - Gudrun A Brockmann
- Animal Breeding Biology and Molecular Genetics, Albrecht Daniel Thaer-Institute for Agricultural and Horticultural Sciences, Humboldt Universität zu Berlin, 10115 Berlin, Germany
| | - Sven König
- Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390 Gießen, Germany.
| | - Katharina May
- Institute of Animal Breeding and Genetics, Justus-Liebig-University Gießen, 35390 Gießen, Germany
| |
|
39
|
Qu J, Runcie D, Cheng H. Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits. Genetics 2023; 223:6931802. [PMID: 36529897 PMCID: PMC9991502 DOI: 10.1093/genetics/iyac183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 05/06/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022] Open
Abstract
Large-scale phenotype data are expected to increase the accuracy of genome-wide prediction and the power of genome-wide association analyses. However, genomic analyses of high-dimensional, highly correlated traits are challenging. We developed a method for implementing high-dimensional Bayesian multivariate regression to simultaneously analyze genetic variants underlying thousands of traits. As a demonstration, we implemented the BayesC prior in the R package MegaLMM. Applied to Genomic Prediction, MegaBayesC effectively integrated hyperspectral reflectance data from 620 hyperspectral wavelengths to improve the accuracy of genetic value prediction on grain yield in a wheat dataset. Applied to Genome-Wide Association Studies, we used simulations to show that MegaBayesC can accurately estimate the effect sizes of QTL across a range of genetic architectures and causes of correlations among traits. To apply MegaBayesC to a realistic scenario involving whole-genome marker data, we developed a 2-stage procedure involving a preliminary step of candidate marker selection prior to multivariate regression. We then used MegaBayesC to identify genetic associations with flowering time in Arabidopsis thaliana, leveraging expression data from 20,843 genes. MegaBayesC selected 15 single nucleotide polymorphisms as important for flowering time, with 13 located within 100 kb of known flowering-time related genes, a higher validation rate than achieved by a single-stage analysis using only the flowering time data itself. These results demonstrate that MegaBayesC can efficiently and effectively leverage high-dimensional phenotypes in genetic analyses.
Affiliation(s)
- Jiayi Qu
- Department of Animal Science, University of California Davis, Davis, CA 95616, USA
| | - Daniel Runcie
- Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
| | - Hao Cheng
- Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
|
40
|
Forutan M, Lynn A, Aliloo H, Clark SA, McGilchrist P, Polkinghorne R, Hayes BJ. Predicting phenotypes of beef eating quality traits. Front Genet 2023; 14:1089490. [PMID: 36816029 PMCID: PMC9936823 DOI: 10.3389/fgene.2023.1089490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 01/19/2023] [Indexed: 02/04/2023] Open
Abstract
Introduction: Phenotype predictions of beef eating quality for individual animals could be used to allocate animals to longer and more expensive feeding regimes as they enter the feedlot if they are predicted to have higher eating quality, and to sort carcasses into consumer or market value categories. Phenotype predictions can include genetic effects (breed effects, heterosis and breeding value), predicted from genetic markers, as well as fixed effects such as days aged, carcass weight, hump height, ossification, and hormone growth promotant (HGP) status. Methods: Here we assessed the accuracy of phenotype predictions for five eating quality traits (tenderness, juiciness, flavour, overall liking and MQ4) in striploins from 1701 animals from a wide variety of backgrounds, including Bos indicus and Bos taurus breeds, using genotypes and simple fixed effects including days aged and carcass weight. The genetic components were predicted from 709k single nucleotide polymorphisms (SNPs) using the BayesR model, which assumes some markers may have a moderate to large effect. Fixed effects in the prediction included principal components of the genomic relationship matrix, to account for breed effects, heterosis, days aged and carcass weight. Results and Discussion: A model which allowed breed effects to be captured in the SNP effects (i.e., not explicitly fitting these effects) tended to have slightly higher accuracies (0.43-0.50) than a model in which these effects were explicitly fitted as fixed effects (0.42-0.49), perhaps because breed effects, when explicitly fitted, were estimated with more error than when incorporated into the (random) SNP effects. Adding estimates of the effects of days aged and carcass weight did not increase the accuracy of phenotype predictions in this particular analysis. The accuracy of phenotype prediction for beef eating quality traits was sufficiently high that such predictions could be useful in predicting eating quality from DNA samples taken from an animal/carcass as it enters the processing plant, to enable optimal supply chain value extraction by sorting product into markets with different quality. The BayesR predictions identified several novel genes potentially associated with beef eating quality.
Affiliation(s)
- Mehrnush Forutan
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia. Correspondence: Mehrnush Forutan
| | - Andrew Lynn
- School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
| | - Hassan Aliloo
- School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
| | - Samuel A. Clark
- School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
| | - Peter McGilchrist
- School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
| | | | - Ben J. Hayes
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
|
41
|
Xia X, Zhang Y, Wei Y, Wang MH. Statistical Methods for Disease Risk Prediction with Genotype Data. Methods Mol Biol 2023; 2629:331-347. [PMID: 36929084 DOI: 10.1007/978-1-0716-2986-4_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
The single-nucleotide polymorphism (SNP) is the basic unit for understanding the heritability of complex traits. One attractive application of susceptibility SNPs is to construct prediction models for assessing disease risk. Here, we introduce prediction methods for human traits using SNP data, including the polygenic risk score (PRS), linear mixed models (LMMs), penalized regressions, and methods for controlling population stratification.
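As a minimal sketch of the simplest method named in this abstract, a polygenic risk score can be computed as a weighted sum of risk-allele counts, with one weight per SNP. The function name, the genotype coding (0, 1 or 2 copies of the risk allele) and the effect sizes below are illustrative assumptions, not values or code from the chapter.

```python
import numpy as np

def polygenic_risk_score(genotypes, effect_sizes):
    """Weighted sum of allele counts: one score per individual.

    genotypes    : (n_individuals, n_snps) array of 0/1/2 risk-allele counts
    effect_sizes : (n_snps,) array of per-SNP weights (e.g. GWAS log odds ratios)
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if genotypes.shape[1] != effect_sizes.shape[0]:
        raise ValueError("one effect size is required per SNP column")
    return genotypes @ effect_sizes

# Hypothetical toy data: 2 individuals, 3 SNPs.
G = np.array([[0, 1, 2],
              [2, 0, 1]])
beta = np.array([0.1, -0.2, 0.3])
scores = polygenic_risk_score(G, beta)
print(scores)
```

In practice the weights would come from an independent association study, and the score would usually be standardized and adjusted for population stratification before use.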
Affiliation(s)
- Xiaoxuan Xia
- JC School of Public Health and Primary Care, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
- Department of Statistics, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
| | | | - Yingying Wei
- Department of Statistics, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
| | - Maggie Haitian Wang
- JC School of Public Health and Primary Care, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong.
- CUHK Shenzhen Institute, Shenzhen, China.
|
42
|
Tengvall K, Sundström E, Wang C, Bergvall K, Wallerman O, Pederson E, Karlsson Å, Harvey ND, Blott SC, Olby N, Olivry T, Brander G, Meadows JRS, Roosje P, Leeb T, Hedhammar Å, Andersson G, Lindblad-Toh K. Bayesian model and selection signature analyses reveal risk factors for canine atopic dermatitis. Commun Biol 2022; 5:1348. [PMID: 36482174 PMCID: PMC9731970 DOI: 10.1038/s42003-022-04279-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/13/2022] [Accepted: 11/18/2022] [Indexed: 12/13/2022] Open
Abstract
Canine atopic dermatitis is an inflammatory skin disease with clinical similarities to human atopic dermatitis. Several dog breeds are at increased risk for developing this disease but previous genetic associations are poorly defined. To identify additional genetic risk factors for canine atopic dermatitis, we here apply a Bayesian mixture model adapted for mapping complex traits and a cross-population extended haplotype test to search for disease-associated loci and selective sweeps in four dog breeds at risk for atopic dermatitis. We define 15 associated loci and eight candidate regions under selection by comparing cases with controls. One associated locus is syntenic to the major genetic risk locus (Filaggrin locus) in human atopic dermatitis. One selection signal in common type Labrador retriever cases positions across the TBC1D1 gene (body weight) and one signal of selection in working type German shepherd controls overlaps the LRP1B gene (brain), near the KYNU gene (psoriasis). In conclusion, we identify candidate genes, including genes belonging to the same biological pathways across multiple loci, with potential relevance to the pathogenesis of canine atopic dermatitis. The results show genetic similarities between dog and human atopic dermatitis, and future across-species genetic comparisons are hereby further motivated.
Affiliation(s)
- Katarina Tengvall
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
| | - Elisabeth Sundström
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Chao Wang
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Kerstin Bergvall
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Ola Wallerman
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Eric Pederson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Åsa Karlsson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Naomi D Harvey
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, UK
| | - Sarah C Blott
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, UK
| | - Natasha Olby
- Department of Clinical Sciences, North Carolina State University, Raleigh, NC, USA
| | - Thierry Olivry
- Department of Clinical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC, USA
| | - Gustaf Brander
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jennifer R S Meadows
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Petra Roosje
- Division of Clinical Dermatology, Department of Clinical Veterinary Medicine, Vetsuisse Faculty, University of Bern, Bern, Switzerland
| | - Tosso Leeb
- Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
| | - Åke Hedhammar
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Göran Andersson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
43
|
Ashraf B, Hunter DC, Bérénos C, Ellis PA, Johnston SE, Pilkington JG, Pemberton JM, Slate J. Genomic prediction in the wild: A case study in Soay sheep. Mol Ecol 2022; 31:6541-6555. [PMID: 34719074 DOI: 10.1111/mec.16262] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/29/2021] [Revised: 10/13/2021] [Accepted: 10/25/2021] [Indexed: 01/13/2023]
Abstract
Genomic prediction, the technique whereby an individual's genetic component of their phenotype is estimated from its genome, has revolutionised animal and plant breeding and medical genetics. However, despite being first introduced nearly two decades ago, it has hardly been adopted by the evolutionary genetics community studying wild organisms. Here, genomic prediction is performed on eight traits in a wild population of Soay sheep. The population has been the focus of a >30 year evolutionary ecology study and there is already considerable understanding of the genetic architecture of the focal Mendelian and quantitative traits. We show that the accuracy of genomic prediction is high for all traits, but especially those with loci of large effect segregating. Five different methods are compared, and the two methods that can accommodate zero-effect and large-effect loci in the same model tend to perform best. If the accuracy of genomic prediction is similar in other wild populations, then there is a real opportunity for pedigree-free molecular quantitative genetics research to be enabled in many more wild populations; currently the literature is dominated by studies that have required decades of field data collection to generate sufficiently deep pedigrees. Finally, some of the potential applications of genomic prediction in wild populations are discussed.
Affiliation(s)
- Bilal Ashraf
- School of Biosciences, University of Sheffield, Sheffield, UK; Department of Anthropology, Durham University, Durham, UK
| | - Darren C Hunter
- School of Biosciences, University of Sheffield, Sheffield, UK; School of Biology, University of St Andrews, St Andrews, UK
| | - Camillo Bérénos
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
| | - Philip A Ellis
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
| | - Susan E Johnston
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
| | - Jill G Pilkington
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
| | | | - Jon Slate
- School of Biosciences, University of Sheffield, Sheffield, UK
|
44
|
Jones HE, Wilson PB. Progress and opportunities through use of genomics in animal production. Trends Genet 2022; 38:1228-1252. [PMID: 35945076 DOI: 10.1016/j.tig.2022.06.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2022] [Revised: 06/08/2022] [Accepted: 06/17/2022] [Indexed: 01/24/2023]
Abstract
The rearing of farmed animals is a vital component of global food production systems, but its impact on the environment, human health, animal welfare, and biodiversity is being increasingly challenged. Developments in genetic and genomic technologies have had a key role in improving the productivity of farmed animals for decades. Advances in genome sequencing, annotation, and editing offer a means not only to continue that trend, but also, when combined with advanced data collection, analytics, cloud computing, appropriate infrastructure, and regulation, to take precision livestock farming (PLF) and conservation to an advanced level. Such an approach could generate substantial additional benefits in terms of reducing use of resources, health treatments, and environmental impact, while also improving animal health and welfare.
Affiliation(s)
- Huw E Jones
- UK Genetics for Livestock and Equines (UKGLE) Committee, Department for Environment, Food and Rural Affairs, Nobel House, 17 Smith Square, London, SW1P 3JR, UK; Nottingham Trent University, Brackenhurst Campus, Brackenhurst Lane, Southwell, NG25 0QF, UK.
| | - Philippe B Wilson
- UK Genetics for Livestock and Equines (UKGLE) Committee, Department for Environment, Food and Rural Affairs, Nobel House, 17 Smith Square, London, SW1P 3JR, UK; Nottingham Trent University, Brackenhurst Campus, Brackenhurst Lane, Southwell, NG25 0QF, UK
|
45
|
Tahir MS, Porto-Neto LR, Reverter-Gomez T, Olasege BS, Sajid MR, Wockner KB, Tan AWL, Fortes MRS. Utility of multi-omics data to inform genomic prediction of heifer fertility traits. J Anim Sci 2022; 100:skac340. [PMID: 36239447 PMCID: PMC9733504 DOI: 10.1093/jas/skac340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2022] [Accepted: 10/12/2022] [Indexed: 12/15/2022] Open
Abstract
Biologically informed single nucleotide polymorphisms (SNPs) impact genomic prediction accuracy of the target traits. Our previous genomics, proteomics, and transcriptomics work identified candidate genes related to puberty and fertility in Brahman heifers. We aimed to test this biological information for capturing heritability and predicting heifer fertility traits in another breed, i.e., Tropical Composite. The SNPs from the identified genes, including a 10-kilobase (kb) region on either side, were selected as the biologically informed SNP set. The SNPs from the rest of the Bos taurus genes, including a 10-kb region on either side, were selected as the biologically uninformed SNP set. The bovine high-density (HD) complete SNP set (628,323 SNPs) was used as a control. Two populations, Tropical Composites (N = 1331) and Brahman (N = 2310), had records for three traits: pregnancy after first mating season (PREG1, binary), first conception score (FCS, score 1 to 3), and rebreeding score (REB, score 1 to 3.5). Using the best linear unbiased prediction method, the effectiveness of each SNP set in predicting the traits was tested in two scenarios: a 5-fold cross-validation within Tropical Composites using biological information from Brahman studies, and application of prediction equations from one breed to the other. The accuracy of prediction was calculated as the correlation between genomic estimated breeding values and adjusted phenotypes. Results show that the biologically informed SNP set did not estimate heritabilities significantly better than the control HD complete SNP set in Tropical Composites; however, it captured all the observed genetic variance in PREG1 and FCS when modeled together with the biologically uninformed SNP set. In 5-fold cross-validation within Tropical Composites, the biologically informed SNP set performed marginally better (not statistically significant) in terms of prediction accuracies (PREG1: 0.20, FCS: 0.13, and REB: 0.12) than the HD complete SNP set (PREG1: 0.17, FCS: 0.10, and REB: 0.11) and the biologically uninformed SNP set (PREG1: 0.16, FCS: 0.10, and REB: 0.11). Across-breed use of prediction equations remained a challenge: accuracies for all SNP sets dropped to around zero for all traits. The performance of the biologically informed SNP set was not significantly better than that of the other sets in Tropical Composites. However, results indicate that biological information obtained from Brahman was successful in predicting the fertility traits in the Tropical Composite population.
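The accuracy measure described in this abstract (correlation between predicted genetic values and phenotypes in held-out folds) can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, ridge regression on marker genotypes stands in for the best linear unbiased prediction model, and the shrinkage parameter `lam`, the function names and the simulation settings are all hypothetical rather than taken from the study.

```python
import numpy as np

def ridge_snp_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y for marker effects b (ridge form)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_accuracy(X, y, k=5, lam=100.0, seed=0):
    """Mean Pearson correlation between predictions and phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Centre genotypes and phenotypes using the training fold only.
        mu = y[train].mean()
        col_means = X[train].mean(axis=0)
        b = ridge_snp_effects(X[train] - col_means, y[train] - mu, lam)
        pred = mu + (X[test_idx] - col_means) @ b
        accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accs))

# Simulated toy data (assumption): 300 animals, 200 SNPs, 20 causal SNPs.
rng = np.random.default_rng(1)
n, p = 300, 200
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
true_b = np.zeros(p)
true_b[:20] = rng.normal(0.0, 0.5, 20)
g = X @ true_b
y = g + rng.normal(0.0, g.std(), n)  # noise scaled so heritability is about 0.5
acc = cv_accuracy(X, y)
print(round(acc, 2))
```

Centring within each training fold keeps information from the validation animals out of the fitted model, which is the point of reporting cross-validated rather than in-sample accuracy.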
Affiliation(s)
- Muhammad S Tahir
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
| | - Laercio R Porto-Neto
- Commonwealth Scientific and Industrial Research Organization, St. Lucia, Brisbane 4072, QLD, Australia
| | - Toni Reverter-Gomez
- Commonwealth Scientific and Industrial Research Organization, St. Lucia, Brisbane 4072, QLD, Australia
| | - Babatunde S Olasege
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
| | - Mirza R Sajid
- Department of Statistics, University of Gujrat, 50700 Punjab, Pakistan
| | - Kimberley B Wockner
- Queensland Department of Agriculture and Fisheries, Brisbane 4072, QLD, Australia
| | - Andre W L Tan
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
| | - Marina R S Fortes
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
|
46
|
An Improved Bayesian Shrinkage Regression Algorithm for Genomic Selection. Genes (Basel) 2022; 13:genes13122193. [PMID: 36553460 PMCID: PMC9778053 DOI: 10.3390/genes13122193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 11/14/2022] [Accepted: 11/18/2022] [Indexed: 11/25/2022] Open
Abstract
Currently a hot topic, genomic selection (GS) has consistently provided powerful support for breeding studies and achieved more comprehensive and reliable selection in animal and plant breeding. GS estimates the effects of all single nucleotide polymorphisms (SNPs) and thereby predicts the genomic estimation of breeding value (GEBV), accelerating breeding progress and overcoming the limitations of conventional breeding. The successful application of GS primarily depends on the accuracy of the GEBV. Adopting appropriate advanced algorithms to improve the accuracy of the GEBV is time-saving and efficient for breeders, and the available algorithms can be further improved in the big data era. In this study, we develop a new algorithm under the Bayesian Shrinkage Regression (BSR, which is called BayesA) framework, an improved expectation-maximization algorithm for BayesA (emBAI). The emBAI algorithm first corrects the polygenic and environmental noise and then calculates the GEBV by emBayesA. We conduct two simulation experiments and a real dataset analysis for flowering time-related Arabidopsis phenotypes to validate the new algorithm. Compared to established methods, emBAI is more powerful in terms of prediction accuracy, mean square error (MSE), mean absolute error (MAE), the area under the receiver operating characteristic curve (AUC) and correlation of prediction in simulation studies. In addition, emBAI performs well under the increasing genetic background. The analysis of the Arabidopsis real dataset further illustrates the benefits of emBAI for genomic prediction according to prediction accuracy, MSE, MAE and correlation of prediction. Furthermore, the new method shows the advantages of significant loci detection and effect coefficient estimation, which are confirmed by The Arabidopsis Information Resource (TAIR) gene bank. In conclusion, the emBAI algorithm provides powerful support for GS in high-dimensional genomic datasets.
|
47
|
Hardner CM, Fikere M, Gasic K, da Silva Linge C, Worthington M, Byrne D, Rawandoozi Z, Peace C. Multi-environment genomic prediction for soluble solids content in peach (Prunus persica). Front Plant Sci 2022; 13:960449. [PMID: 36275520 PMCID: PMC9583944 DOI: 10.3389/fpls.2022.960449] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/03/2022] [Accepted: 08/01/2022] [Indexed: 06/16/2023]
Abstract
Genotype-by-environment interaction (G × E) is a common phenomenon influencing genetic improvement in plants, and a good understanding of this phenomenon is important for breeding and cultivar deployment strategies. However, there is little information on G × E in horticultural tree crops, mostly due to evaluation costs, leading to a focus on the development and deployment of locally adapted germplasm. Using sweetness (measured as soluble solids content, SSC) in peach/nectarine assessed at four trials from three US peach-breeding programs as a case study, we evaluated the hypotheses that (i) complex data from multiple breeding programs can be connected using GBLUP models to improve the knowledge of G × E for breeding and deployment and (ii) accounting for a known large-effect quantitative trait locus (QTL) improves the prediction accuracy. Following a structured strategy using univariate and multivariate models containing additive and dominance genomic effects on SSC, a model that included a previously detected QTL and background genomic effects was a significantly better fit than a genome-wide model with completely anonymous markers. Estimates of an individual's narrow-sense and broad-sense heritability for SSC were high (0.57-0.73 and 0.66-0.80, respectively), with 19-32% of total genomic variance explained by the QTL. Genome-wide dominance effects and QTL effects were stable across environments. Significant G × E was detected for background genome effects, mostly due to the low correlation of these effects across seasons within a particular trial. The expected prediction accuracy, estimated from the linear model, was higher than the realised prediction accuracy estimated by cross-validation, suggesting that these two parameters measure different qualities of the prediction models. 
While prediction accuracy was improved in some cases by combining data across trials, particularly when phenotypic data for untested individuals were available from other trials, this improvement was not consistent. This study confirms that complex data can be combined into a single analysis using GBLUP methods to improve understanding of G × E and also incorporate known QTL effects. In addition, the study generated baseline information to account for population structure in genomic prediction models in horticultural crop improvement.
Affiliation(s)
- Craig M. Hardner
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
| | - Mulusew Fikere
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
| | - Ksenija Gasic
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
| | - Cassia da Silva Linge
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
| | - Margaret Worthington
- Faculty Horticulture, University of Arkansas System Division of Agriculture, Fayetteville, AR, United States
| | - David Byrne
- College of Agriculture and Life Sciences, Texas A&M University, College Station, TX, United States
| | - Zena Rawandoozi
- College of Agriculture and Life Sciences, Texas A&M University, College Station, TX, United States
| | - Cameron Peace
- Department of Horticulture, Washington State University, Pullman, WA, United States
|
48
|
Ros-Freixedes R, Johnsson M, Whalen A, Chen CY, Valente BD, Herring WO, Gorjanc G, Hickey JM. Genomic prediction with whole-genome sequence data in intensely selected pig lines. Genet Sel Evol 2022; 54:65. [PMID: 36153511 PMCID: PMC9509613 DOI: 10.1186/s12711-022-00756-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/28/2022] [Accepted: 09/05/2022] [Indexed: 12/03/2022]
Abstract
Background Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage. Methods We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests. Results The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected. Conclusions Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. 
Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00756-0.
|
49
|
Mollandin F, Gilbert H, Croiseau P, Rau A. Accounting for overlapping annotations in genomic prediction models of complex traits. BMC Bioinformatics 2022; 23:365. [PMID: 36068513 PMCID: PMC9446854 DOI: 10.1186/s12859-022-04914-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Accepted: 08/25/2022] [Indexed: 11/10/2022] Open
Abstract
Background It is now widespread in livestock and plant breeding to use genotyping data to predict phenotypes with genomic prediction models. In parallel, genomic annotations related to a variety of traits are increasing in number and granularity, providing valuable insight into potentially important positions in the genome. The BayesRC model integrates this prior biological information by factorizing the genome according to disjoint annotation categories, in some cases enabling improved prediction of heritable traits. However, BayesRC is not adapted to cases where markers may have multiple annotations. Results We propose two novel Bayesian approaches to account for multi-annotated markers through a cumulative (BayesRC+) or preferential (BayesRCπ) model of the contribution of multiple annotation categories. We illustrate their performance on simulated data with various genetic architectures and types of annotations. We also explore their use on data from a backcross population of growing pigs in conjunction with annotations constructed using the PigQTLdb. In both simulated and real data, we observed a modest improvement in prediction quality with our models when used with informative annotations. In addition, our results show that BayesRC+ successfully prioritizes multi-annotated markers according to their posterior variance, while BayesRCπ provides a useful interpretation of informative annotations for multi-annotated markers. Finally, we explore several strategies for constructing annotations from a public database, highlighting the importance of careful consideration of this step. Conclusion When used with annotations that are relevant to the trait under study, BayesRCπ and BayesRC+ allow for improved prediction and prioritization of multi-annotated markers, and can provide useful biological insight into the genetic architecture of traits. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04914-5.
Affiliation(s)
- Fanny Mollandin
- INRAE, AgroParisTech, GABI, Université Paris-Saclay, Allée de Vilvert, 78350, Jouy-en-Josas, France.
| | - Hélène Gilbert
- GenPhySE, INRAE, ENVT, Université de Toulouse, 31320, Castanet Tolosan, France
| | - Pascal Croiseau
- INRAE, AgroParisTech, GABI, Université Paris-Saclay, Allée de Vilvert, 78350, Jouy-en-Josas, France
| | - Andrea Rau
- INRAE, AgroParisTech, GABI, Université Paris-Saclay, Allée de Vilvert, 78350, Jouy-en-Josas, France; BioEcoAgro Joint Research Unit, INRAE, Université de Liège, Université de Lille, Université de Picardie Jules Verne, 50136, Estrée-Mons, France
|
50
|
Bolormaa S, MacLeod IM, Khansefid M, Marett LC, Wales WJ, Miglior F, Baes CF, Schenkel FS, Connor EE, Manzanilla-Pech CIV, Stothard P, Herman E, Nieuwhof GJ, Goddard ME, Pryce JE. Sharing of either phenotypes or genetic variants can increase the accuracy of genomic prediction of feed efficiency. Genet Sel Evol 2022; 54:60. [PMID: 36068488 PMCID: PMC9450441 DOI: 10.1186/s12711-022-00749-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 08/17/2022] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Sharing individual phenotype and genotype data between countries is complex and fraught with potential errors, while sharing summary statistics of genome-wide association studies (GWAS) is relatively straightforward, and thus would be especially useful for traits that are expensive or difficult to measure, such as feed efficiency. Here we examined: (1) the sharing of individual cow data from international partners; and (2) the use of sequence variants selected from GWAS of international cow data to evaluate the accuracy of genomic estimated breeding values (GEBV) for residual feed intake (RFI) in Australian cows.
RESULTS GEBV for RFI were estimated using genomic best linear unbiased prediction (GBLUP) with 50k or high-density single nucleotide polymorphisms (SNPs), from a training population of 3797 individuals in univariate to trivariate analyses where the three traits were RFI phenotypes calculated using 584 Australian lactating cows (AUSc), 824 growing heifers (AUSh), and 2526 international lactating cows (OVE). Accuracies of GEBV in AUSc were evaluated by either cohort-by-birth-year or fourfold random cross-validations. GEBV of AUSc were also predicted using only the AUS training population with a weighted genomic relationship matrix constructed with SNPs from the 50k array and sequence variants selected from a meta-GWAS that included only international datasets. The genomic heritabilities estimated using the AUSc, OVE and AUSh datasets were moderate, ranging from 0.20 to 0.36. The genetic correlations (rg) of traits between heifers and cows ranged from 0.30 to 0.95 but were associated with large standard errors. The mean accuracies of GEBV in Australian cows were up to 0.32 and almost doubled when either overseas cows, or both overseas cows and AUS heifers were included in the training population. They also increased when selected sequence variants were combined with 50k SNPs, but with a smaller relative increase.
CONCLUSIONS The accuracy of RFI GEBV increased when international data were used or when selected sequence variants were combined with 50k SNP array data. This suggests that if direct sharing of data is not feasible, a meta-analysis of summary GWAS statistics could provide selected SNPs for custom panels to use in genomic selection programs. However, since this finding is based on a small cross-validation study, confirmation through a larger study is recommended.
Affiliation(s)
- Iona M. MacLeod
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
- Majid Khansefid
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
- Leah C. Marett
  - Agriculture Victoria Research, Ellinbank Centre, Ellinbank, Gippsland, VIC 3821 Australia
  - School of Agriculture and Food, University of Melbourne, Parkville, VIC 3010 Australia
- William J. Wales
  - Agriculture Victoria Research, Ellinbank Centre, Ellinbank, Gippsland, VIC 3821 Australia
  - School of Agriculture and Food, University of Melbourne, Parkville, VIC 3010 Australia
- Filippo Miglior
  - LACTANET, Sainte-Anne-de-Bellevue, QC H9X 3R4 Canada
  - CGIL, University of Guelph, Guelph, ON N1G 2W1 Canada
- Christine F. Baes
  - CGIL, University of Guelph, Guelph, ON N1G 2W1 Canada
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, 3002 Bern, Switzerland
- Erin E. Connor
  - Animal Genomics and Improvement Laboratory, USDA, Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, MD 20705 USA
  - Department of Animal and Food Sciences, University of Delaware, Newark, DE 19716 USA
- Paul Stothard
  - Faculty of Agricultural, Life & Environmental Sciences, University of Alberta, Edmonton, AB T6G 2R3 Canada
- Emily Herman
  - Faculty of Agricultural, Life & Environmental Sciences, University of Alberta, Edmonton, AB T6G 2R3 Canada
- Gert J. Nieuwhof
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
  - DataGene Ltd, Agribio, Bundoora, VIC 3083 Australia
- Michael E. Goddard
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
  - School of Veterinary and Agricultural Sciences, University of Melbourne, Parkville, VIC 3052 Australia
- Jennie E. Pryce
  - Agriculture Victoria Research, Agribio, Bundoora, VIC 3083 Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia
|