1
van den Berg I, Chamberlain AJ, MacLeod IM, Nguyen TV, Goddard ME, Xiang R, Mason B, Meier S, Phyn CVC, Burke CR, Pryce JE. Using expression data to fine map QTL associated with fertility in dairy cattle. Genet Sel Evol 2024; 56:42. [PMID: 38844868 PMCID: PMC11154999 DOI: 10.1186/s12711-024-00912-8] [Received: 10/16/2023] [Accepted: 05/13/2024] [Indexed: 06/09/2024]
Abstract
BACKGROUND Female fertility is an important trait in dairy cattle. Identifying putative causal variants associated with fertility may help to improve the accuracy of genomic prediction of fertility. Combining expression data (eQTL) of genes, exons, gene splicing and allele specific expression is a promising approach to fine map QTL to get closer to the causal mutations. Another approach is to identify genomic differences between cows selected for high and low fertility and a selection experiment in New Zealand has created exactly this resource. Our objective was to combine multiple types of expression data, fertility traits and allele frequency in high- (POS) and low-fertility (NEG) cows with a genome-wide association study (GWAS) on calving interval in Australian cows to fine-map QTL associated with fertility in both Australia and New Zealand dairy cattle populations. RESULTS Variants that were significantly associated with calving interval (CI) were strongly enriched for variants associated with gene, exon, gene splicing and allele-specific expression, indicating that there is substantial overlap between QTL associated with CI and eQTL. We identified 671 genes with significant differential expression between POS and NEG cows, with the largest fold change detected for the CCDC196 gene on chromosome 10. Our results provide numerous candidate genes associated with female fertility in dairy cattle, including GYS2 and TIGAR on chromosome 5 and SYT3 and HSD17B14 on chromosome 18. Multiple QTL regions were located in regions with large numbers of copy number variants (CNV). To identify the causal mutations for these variants, long read sequencing may be useful. CONCLUSIONS Variants that were significantly associated with CI were highly enriched for eQTL. We detected 671 genes that were differentially expressed between POS and NEG cows. Several QTL detected for CI overlapped with eQTL, providing candidate genes for fertility in dairy cattle.
Affiliation(s)
- Irene van den Berg
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
- Amanda J Chamberlain
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
- Tuan V Nguyen
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
- Mike E Goddard
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
- Ruidong Xiang
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
  - Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
- Brett Mason
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
- Jennie E Pryce
  - Agriculture Victoria, AgriBio, Centre of AgriBioscience, 5 Ring Road, Bundoora, VIC, 3082, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
2
Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-Gónzalez J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. Mol Plant 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area that is as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
Affiliation(s)
- Admas Alemu
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Johanna Åstrand
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-Gónzalez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
  - International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Aakash Chawade
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
3
Zhang Y, Zhuang Z, Liu Y, Huang J, Luan M, Zhao X, Dong L, Ye J, Yang M, Zheng E, Cai G, Wu Z, Yang J. Genomic prediction based on preselected single-nucleotide polymorphisms from genome-wide association study and imputed whole-genome sequence data annotation for growth traits in Duroc pigs. Evol Appl 2024; 17:e13651. [PMID: 38362509 PMCID: PMC10868536 DOI: 10.1111/eva.13651] [Received: 12/11/2022] [Revised: 10/31/2023] [Accepted: 01/13/2024] [Indexed: 02/17/2024]
Abstract
The use of whole-genome sequence (WGS) data is expected to improve genomic prediction (GP) power of complex traits because it may contain mutations that are in strong linkage disequilibrium with causal mutations. However, a few previous studies have shown no or small improvement in prediction accuracy using WGS data. Incorporating prior biological information into GP seems to be an attractive strategy that might improve prediction accuracy. In this study, a total of 6334 pigs were genotyped using 50K chips and subsequently imputed to the WGS level. This cohort includes two prior discovery populations that comprise 294 Landrace pigs and 186 Duroc pigs, as well as two validation populations that consist of 3770 American Duroc pigs and 2084 Canadian Duroc pigs. We then used annotation information and genome-wide association study (GWAS) results from the WGS data to perform GP for six growth traits in two Duroc pig populations. Based on variant annotation, we partitioned the imputed WGS data into different genomic classes, such as intron, intergenic, and untranslated regions. Based on GWAS results of WGS data, we obtained trait-associated single-nucleotide polymorphisms (SNPs). We then applied the genomic feature best linear unbiased prediction (GFBLUP) and genomic best linear unbiased prediction (GBLUP) models to estimate the genomic estimated breeding values for growth traits with these different variant panels, including six genomic classes and trait-associated SNPs. Compared with 50K chip data, GBLUP with imputed WGS data had no increase in prediction accuracy. Using only annotations resulted in no increase in prediction accuracy compared to GBLUP with 50K, but adding annotation information into the GFBLUP model with imputed WGS data could improve the prediction accuracy with increases of 0.00%-2.82%. In conclusion, a GFBLUP model that incorporated prior biological information might increase the advantage of using imputed WGS data for GP.
Affiliation(s)
- Yuling Zhang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Zhanwei Zhuang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Yiyi Liu
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Jinyan Huang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Menghao Luan
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Xiang Zhao
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Linsong Dong
  - Guangdong Zhongxin Breeding Technology Co., Ltd, Guangzhou, China
- Jian Ye
  - Guangdong Zhongxin Breeding Technology Co., Ltd, Guangzhou, China
- Ming Yang
  - College of Animal Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, China
- Enqin Zheng
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Gengyuan Cai
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Zhenfang Wu
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
  - Guangdong Zhongxin Breeding Technology Co., Ltd, Guangzhou, China
- Jie Yang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
4
Zhu D, Zhao Y, Zhang R, Wu H, Cai G, Wu Z, Wang Y, Hu X. Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population. Genet Sel Evol 2023; 55:72. [PMID: 37853325 PMCID: PMC10583454 DOI: 10.1186/s12711-023-00843-w] [Received: 09/23/2022] [Accepted: 09/14/2023] [Indexed: 10/20/2023]
Abstract
BACKGROUND Although the accumulation of whole-genome sequencing (WGS) data has accelerated the identification of mutations underlying complex traits, its impact on the accuracy of genomic predictions is limited. Reliable genotyping data and pre-selected beneficial loci can be used to improve prediction accuracy. Previously, we reported a low-coverage sequencing genotyping method that yielded 11.3 million highly accurate single-nucleotide polymorphisms (SNPs) in pigs. Here, we introduce a method termed selective linkage disequilibrium pruning (SLDP), which refines the set of SNPs that show a large gain during prediction of complex traits using whole-genome SNP data. RESULTS We used the SLDP method to identify and select markers among millions of SNPs based on genome-wide association study (GWAS) prior information. We evaluated the performance of SLDP with respect to three real traits and six simulated traits with varying genetic architectures using two representative models (genomic best linear unbiased prediction and BayesR) on samples from 3579 Duroc boars. SLDP was determined by testing 180 combinations of two core parameters (GWAS P-value thresholds and linkage disequilibrium r²). The parameters for each trait were optimized in the training population by five-fold cross-validation and then tested in the validation population. Similar to previous GWAS prior-based methods, the performance of SLDP was mainly affected by the genetic architecture of the traits analyzed. Specifically, SLDP performed better for traits controlled by major quantitative trait loci (QTL) or a small number of quantitative trait nucleotides (QTN).
Compared with two commercial SNP chips, genotyping-by-sequencing data, and an unselected whole-genome SNP panel, the SLDP strategy led to significant improvements in prediction accuracy, which ranged from 0.84 to 3.22% for real traits controlled by major or moderate QTL and from 1.23 to 11.47% for simulated traits controlled by a small number of QTN. CONCLUSIONS The SLDP marker selection method can be incorporated into mainstream prediction models to yield accuracy improvements for traits with a relatively simple genetic architecture; however, it has no significant advantage for traits not controlled by major QTL. The main factors that affect its performance are the genetic architecture of traits and the reliability of GWAS prior information. Our findings can facilitate the application of WGS-based genomic selection.
Affiliation(s)
- Di Zhu
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Yiqiang Zhao
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Ran Zhang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Hanyu Wu
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
  - National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing, China
- Gengyuan Cai
  - National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Zhenfang Wu
  - National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Yuzhe Wang
  - National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing, China
- Xiaoxiang Hu
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
5
Della Coletta R, Fernandes SB, Monnahan PJ, Mikel MA, Bohn MO, Lipka AE, Hirsch CN. Importance of genetic architecture in marker selection decisions for genomic prediction. Theor Appl Genet 2023; 136:220. [PMID: 37819415 DOI: 10.1007/s00122-023-04469-w] [Received: 02/28/2023] [Accepted: 09/25/2023] [Indexed: 10/13/2023]
Abstract
KEY MESSAGE We demonstrate potential for improved multi-environment genomic prediction accuracy using structural variant markers. However, the degree of observed improvement is highly dependent on the genetic architecture of the trait. Breeders commonly use genetic markers to predict the performance of untested individuals as a way to improve the efficiency of breeding programs. These genomic prediction models have almost exclusively used single nucleotide polymorphisms (SNPs) as their source of genetic information, even though other types of markers exist, such as structural variants (SVs). Given that SVs are associated with environmental adaptation and not all of them are in linkage disequilibrium to SNPs, SVs have the potential to bring additional information to multi-environment prediction models that are not captured by SNPs alone. Here, we evaluated different marker types (SNPs and/or SVs) on prediction accuracy across a range of genetic architectures for simulated traits across multiple environments. Our results show that SVs can improve prediction accuracy, but it is highly dependent on the genetic architecture of the trait and the relative gain in accuracy is minimal. When SVs are the only causative variant type, 70% of the time SV predictors outperform SNP predictors. However, the improvement in accuracy in these instances is only 1.5% on average. Further simulations with predictors in varying degrees of LD with causative variants of different types (e.g., SNPs, SVs, SNPs and SVs) showed that prediction accuracy increased as linkage disequilibrium between causative variants and predictors increased regardless of the marker type. This study demonstrates that knowing the genetic architecture of a trait in deciding what markers to use in large-scale genomic prediction modeling in a breeding program is more important than what types of markers to use.
Affiliation(s)
- Rafael Della Coletta
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Samuel B Fernandes
  - Department of Crop, Soil and Environmental Sciences at University of Arkansas, Fayetteville, AR, 72701, USA
- Patrick J Monnahan
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Mark A Mikel
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
  - Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Martin O Bohn
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Alexander E Lipka
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Candice N Hirsch
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
6
Jang S, Ros-Freixedes R, Hickey JM, Chen CY, Holl J, Herring WO, Misztal I, Lourenco D. Using pre-selected variants from large-scale whole-genome sequence data for single-step genomic predictions in pigs. Genet Sel Evol 2023; 55:55. [PMID: 37495982 PMCID: PMC10373252 DOI: 10.1186/s12711-023-00831-0] [Received: 10/19/2022] [Accepted: 07/18/2023] [Indexed: 07/28/2023]
Abstract
BACKGROUND Whole-genome sequence (WGS) data harbor causative variants that may not be present in standard single nucleotide polymorphism (SNP) chip data. The objective of this study was to investigate the impact of using preselected variants from WGS for single-step genomic predictions in maternal and terminal pig lines with up to 1.8k sequenced and 104k sequence imputed animals per line. METHODS Two maternal and four terminal lines were investigated for eight and seven traits, respectively. The number of sequenced animals ranged from 1365 to 1491 for the maternal lines and 381 to 1865 for the terminal lines. Imputation to sequence occurred within each line for 66k to 76k animals for the maternal lines and 29k to 104k animals for the terminal lines. Two preselected SNP sets were generated based on a genome-wide association study (GWAS). Top40k included the SNPs with the lowest p-value in each of the 40k genomic windows, and ChipPlusSign included significant variants integrated into the porcine SNP chip used for routine genotyping. We compared the performance of single-step genomic predictions between using preselected SNP sets assuming equal or different variances and the standard porcine SNP chip. RESULTS In the maternal lines, ChipPlusSign and Top40k showed an average increase in accuracy of 0.6 and 4.9%, respectively, compared to the regular porcine SNP chip. The greatest increase was obtained with Top40k, particularly for fertility traits, for which the initial accuracy based on the standard SNP chip was low. However, in the terminal lines, Top40k resulted in an average loss of accuracy of 1%. ChipPlusSign provided a positive, although small, gain in accuracy (0.9%). Assigning different variances for the SNPs slightly improved accuracies when using variances obtained from BayesR. However, increases were inconsistent across the lines and traits. 
CONCLUSIONS The benefit of using sequence data depends on the line, the size of the genotyped population, and how the WGS variants are preselected. When WGS data are available on hundreds of thousands of animals, using sequence data presents an advantage but this remains limited in pigs.
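The Top40k preselection described in this abstract (one SNP per genomic window, chosen by lowest GWAS p-value) can be illustrated with a minimal sketch. The abstract does not say how the 40k windows were defined, so fixed-width base-pair windows are an assumption here, and the names `Snp`, `top_snp_per_window`, and `window_bp` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, NamedTuple, Tuple


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int        # base-pair position on the chromosome
    p_value: float  # GWAS p-value for the trait of interest


def top_snp_per_window(snps: List[Snp], window_bp: int) -> List[Snp]:
    """Keep the SNP with the lowest p-value in each fixed-width window.

    Assumption: windows are non-overlapping and window_bp wide; the
    paper may have defined its windows differently.
    """
    best: Dict[Tuple[str, int], Snp] = {}
    for snp in snps:
        key = (snp.chrom, snp.pos // window_bp)
        current = best.get(key)
        if current is None or snp.p_value < current.p_value:
            best[key] = snp
    # Return the selected SNPs in genomic order.
    return sorted(best.values(), key=lambda s: (s.chrom, s.pos))


# Toy data, invented for illustration only.
snps = [
    Snp("rs1", "1", 100, 0.03),
    Snp("rs2", "1", 900, 1e-6),
    Snp("rs3", "1", 1500, 0.2),
    Snp("rs4", "2", 50, 0.04),
]
selected = top_snp_per_window(snps, window_bp=1000)
```

With the toy data, `rs1` and `rs2` share a window and only `rs2` is retained; every window contributes exactly one SNP regardless of significance, which is what distinguishes this kind of set from a purely threshold-based one such as ChipPlusSign.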
Affiliation(s)
- Sungbong Jang
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Roger Ros-Freixedes
  - Departament de Ciència Animal, Universitat de Lleida-Agrotecnio-CERCA Center, Lleida, Spain
- John M Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Ching-Yi Chen
  - The Pig Improvement Company, Genus Plc, Hendersonville, TN, USA
- Justin Holl
  - The Pig Improvement Company, Genus Plc, Hendersonville, TN, USA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
7
Zhang R, Zhang Y, Liu T, Jiang B, Li Z, Qu Y, Chen Y, Li Z. Utilizing Variants Identified with Multiple Genome-Wide Association Study Methods Optimizes Genomic Selection for Growth Traits in Pigs. Animals (Basel) 2023; 13:ani13040722. [PMID: 36830509 PMCID: PMC9952664 DOI: 10.3390/ani13040722] [Received: 12/25/2022] [Revised: 02/09/2023] [Accepted: 02/15/2023] [Indexed: 02/22/2023]
Abstract
Improving the prediction accuracies of economically important traits in genomic selection (GS) is a main objective for researchers and breeders in the livestock industry. This study aims at utilizing potentially functional SNPs and QTLs identified with various genome-wide association study (GWAS) models in GS of pig growth traits. We used three well-established GWAS methods, including the mixed linear model, Bayesian model and meta-analysis, as well as 60K SNP-chip and whole genome sequence (WGS) data from 1734 Yorkshire and 1123 Landrace pigs to detect SNPs related to four growth traits: average daily gain, backfat thickness, body weight and birth weight. A total of 1485 significant loci and 24 candidate genes which are involved in skeletal muscle development, fatty deposition, lipid metabolism and insulin resistance were identified. Compared with using all SNP-chip data, GS with the pre-selected functional SNPs in the standard genomic best linear unbiased prediction (GBLUP), and a two-kernel based GBLUP model yielded average gains in accuracy by 4 to 46% (from 0.19 ± 0.07 to 0.56 ± 0.07) and 5 to 27% (from 0.16 ± 0.06 to 0.57 ± 0.05) for the four traits, respectively, suggesting that the prioritization of preselected functional markers in GS models had the potential to improve prediction accuracies for certain traits in livestock breeding.
Affiliation(s)
- Ruifeng Zhang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Yi Zhang
  - Institute of Neuroscience, Panzhihua University, Panzhihua 617000, China
- Tongni Liu
  - Genetic Data Center, Faculty of Forestry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Bo Jiang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Zhenyang Li
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Youping Qu
  - Guangdong IPIG Technology Co., Ltd., Guangzhou 510006, China
- Yaosheng Chen
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Zhengcao Li
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
8
Jones HE, Wilson PB. Progress and opportunities through use of genomics in animal production. Trends Genet 2022; 38:1228-1252. [PMID: 35945076 DOI: 10.1016/j.tig.2022.06.014] [Received: 01/10/2022] [Revised: 06/08/2022] [Accepted: 06/17/2022] [Indexed: 01/24/2023]
Abstract
The rearing of farmed animals is a vital component of global food production systems, but its impact on the environment, human health, animal welfare, and biodiversity is being increasingly challenged. Developments in genetic and genomic technologies have had a key role in improving the productivity of farmed animals for decades. Advances in genome sequencing, annotation, and editing offer a means not only to continue that trend, but also, when combined with advanced data collection, analytics, cloud computing, appropriate infrastructure, and regulation, to take precision livestock farming (PLF) and conservation to an advanced level. Such an approach could generate substantial additional benefits in terms of reducing use of resources, health treatments, and environmental impact, while also improving animal health and welfare.
Affiliation(s)
- Huw E Jones
  - UK Genetics for Livestock and Equines (UKGLE) Committee, Department for Environment, Food and Rural Affairs, Nobel House, 17 Smith Square, London, SW1P 3JR, UK; Nottingham Trent University, Brackenhurst Campus, Brackenhurst Lane, Southwell, NG25 0QF, UK
- Philippe B Wilson
  - UK Genetics for Livestock and Equines (UKGLE) Committee, Department for Environment, Food and Rural Affairs, Nobel House, 17 Smith Square, London, SW1P 3JR, UK; Nottingham Trent University, Brackenhurst Campus, Brackenhurst Lane, Southwell, NG25 0QF, UK
9
Tahir MS, Porto-Neto LR, Reverter-Gomez T, Olasege BS, Sajid MR, Wockner KB, Tan AWL, Fortes MRS. Utility of multi-omics data to inform genomic prediction of heifer fertility traits. J Anim Sci 2022; 100:skac340. [PMID: 36239447 PMCID: PMC9733504 DOI: 10.1093/jas/skac340] [Received: 06/18/2022] [Accepted: 10/12/2022] [Indexed: 12/15/2022]
Abstract
Biologically informed single nucleotide polymorphisms (SNPs) impact genomic prediction accuracy of the target traits. Our previous genomics, proteomics, and transcriptomics work identified candidate genes related to puberty and fertility in Brahman heifers. We aimed to test this biological information for capturing heritability and predicting heifer fertility traits in another breed, i.e., Tropical Composite. The SNPs from the identified genes, including a 10-kilobase (kb) region on either side, were selected as the biologically informed SNP set. The SNPs from the rest of the Bos taurus genes, including a 10-kb region on either side, were selected as the biologically uninformed SNP set. The bovine high-density (HD) complete SNP set (628,323 SNPs) was used as a control. Two populations, Tropical Composites (N = 1331) and Brahman (N = 2310), had records for three traits: pregnancy after first mating season (PREG1, binary), first conception score (FCS, score 1 to 3), and rebreeding score (REB, score 1 to 3.5). Using the best linear unbiased prediction method, the effectiveness of each SNP set in predicting the traits was tested in two scenarios: a 5-fold cross-validation within Tropical Composites using biological information from Brahman studies, and application of prediction equations from one breed to the other. The accuracy of prediction was calculated as the correlation between genomic estimated breeding values and adjusted phenotypes. Results show that the heritabilities estimated with the biologically informed SNP set were not significantly better than those estimated with the control HD complete SNP set in Tropical Composites; however, the biologically informed set captured all the observed genetic variance in PREG1 and FCS when modeled together with the biologically uninformed SNP set. In 5-fold cross-validation within Tropical Composites, the biologically informed SNP set performed marginally better (not statistically significant) in terms of prediction accuracies (PREG1: 0.20, FCS: 0.13, and REB: 0.12) compared to the HD complete SNP set (PREG1: 0.17, FCS: 0.10, and REB: 0.11) and the biologically uninformed SNP set (PREG1: 0.16, FCS: 0.10, and REB: 0.11). Across-breed use of prediction equations remained a challenge: accuracies of all SNP sets dropped to around zero for all traits. The performance of the biologically informed SNP set was not significantly better than that of the other sets in Tropical Composites. However, results indicate that biological information obtained from Brahman was useful for predicting the fertility traits in the Tropical Composite population.
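The accuracy measure described in this abstract (the correlation between genomic estimated breeding values and adjusted phenotypes, evaluated within cross-validation folds) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the per-fold reporting, and the toy inputs are assumptions made here for clarity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def fold_accuracies(gebv, adjusted_phenotype, fold_of_animal):
    """Return {fold: accuracy}, where accuracy is the correlation between
    GEBV and adjusted phenotype among the animals validated in that fold.

    Reporting one value per fold is an assumption of this sketch; the
    abstract does not say how folds were combined.
    """
    accuracies = {}
    for fold in sorted(set(fold_of_animal)):
        idx = [i for i, f in enumerate(fold_of_animal) if f == fold]
        accuracies[fold] = pearson(
            [gebv[i] for i in idx],
            [adjusted_phenotype[i] for i in idx],
        )
    return accuracies
```

In practice the GEBV for each fold would come from a model trained on the remaining folds; that training step is outside the scope of this sketch.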
Affiliation(s)
- Muhammad S Tahir
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
- Laercio R Porto-Neto
- Commonwealth Scientific and Industrial Research Organization, St. Lucia, Brisbane 4072, QLD, Australia
- Toni Reverter-Gomez
- Commonwealth Scientific and Industrial Research Organization, St. Lucia, Brisbane 4072, QLD, Australia
- Babatunde S Olasege
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
- Mirza R Sajid
- Department of Statistics, University of Gujrat, 50700 Punjab, Pakistan
- Kimberley B Wockner
- Queensland Department of Agriculture and Fisheries, Brisbane 4072, QLD, Australia
- Andre W L Tan
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
- Marina R S Fortes
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
|
10
|
Ros-Freixedes R, Johnsson M, Whalen A, Chen CY, Valente BD, Herring WO, Gorjanc G, Hickey JM. Genomic prediction with whole-genome sequence data in intensely selected pig lines. Genet Sel Evol 2022; 54:65. [PMID: 36153511 PMCID: PMC9509613 DOI: 10.1186/s12711-022-00756-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/28/2022] [Accepted: 09/05/2022] [Indexed: 12/03/2022]
Abstract
Background Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage. Methods We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests. Results The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected. Conclusions Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. 
Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00756-0.
|
11
|
Ramstein GP, Buckler ES. Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize. Genome Biol 2022; 23:183. [PMID: 36050782 PMCID: PMC9438327 DOI: 10.1186/s13059-022-02747-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Accepted: 08/15/2022] [Indexed: 11/10/2022] Open
Abstract
Background Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations. Results Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants. Conclusions Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach—Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC)—could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (10.25739/hybz-2957). Supplementary Information The online version contains supplementary material available at 10.1186/s13059-022-02747-2.
Affiliation(s)
- Guillaume P Ramstein
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000, Aarhus, Denmark; Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA; USDA-ARS, Ithaca, NY, 14853, USA
|
12
|
Nawaz MY, Bernardes PA, Savegnago RP, Lim D, Lee SH, Gondro C. Evaluation of Whole-Genome Sequence Imputation Strategies in Korean Hanwoo Cattle. Animals (Basel) 2022; 12:ani12172265. [PMID: 36077985 PMCID: PMC9454883 DOI: 10.3390/ani12172265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 08/25/2022] [Accepted: 08/30/2022] [Indexed: 11/29/2022] Open
Abstract
Simple Summary In this study, we evaluated various imputation strategies for the Korean Hanwoo cattle. We observed that a large reference panel consisting of many cattle breeds did not improve the imputation accuracy when compared to a proportionally small purebred Hanwoo reference. This was because the multi-breed reference did not contain animals sufficiently related to the Hanwoo to improve the accuracies and, although not detrimental, in effect, only added to the computational burden of the imputation. Despite the large multi-breed reference, when the Hanwoo were removed from the reference, the imputation accuracies were low. These results suggest additional sequencing efforts are needed for underrepresented breeds, particularly those less genetically related to the main European breeds. Abstract This study evaluated the accuracy of sequence imputation in Hanwoo beef cattle using different reference panels: a large multi-breed reference with no Hanwoo (n = 6269), a much smaller Hanwoo purebred reference (n = 88), and both datasets combined (n = 6357). The target animals were 136 cattle both sequenced and genotyped with the Illumina BovineSNP50 v2 (50K). The average imputation accuracy measured by the Pearson correlation (R) was 0.695 with the multi-breed reference, 0.876 with the purebred Hanwoo, and 0.887 with the combined data; the average concordance rates (CR) were 88.16%, 94.49%, and 94.84%, respectively. The accuracy gains from adding a large multi-breed reference of 6269 samples to only 88 Hanwoo were marginal; however, the concordance rate for the heterozygotes decreased from 85% to 82%, and the concordance rate for fixed SNPs in Hanwoo also decreased from 99.98% to 98.73%. Although the multi-breed panel was large, it was not sufficiently representative of the breed for accurate imputation without the Hanwoo animals. Additionally, we evaluated the value of high-density 700K genotypes (n = 991) as an intermediary step in the imputation process. The imputation accuracy differences were negligible between a single-step imputation strategy from 50K directly to sequence and a two-step imputation approach (50K-700K-sequence). We also observed that imputed sequence data can be used as a reference panel for imputation (mean R = 0.9650, mean CR = 98.35%). Finally, we identified 31 poorly imputed genomic regions in the Hanwoo genome and demonstrated that imputation accuracies were particularly low at the chromosomal ends.
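The two imputation accuracy measures reported in this abstract, the Pearson correlation (R) and the concordance rate (CR), can be illustrated with a short sketch. It assumes genotypes are coded as 0, 1, or 2 copies of an allele and that imputed values are best-guess genotypes rather than dosages; the function names and the heterozygote-only variant are hypothetical and not taken from the paper.

```python
from math import sqrt


def pearson_r(true_gt, imputed_gt):
    """Pearson correlation between true and imputed genotype codes."""
    n = len(true_gt)
    if n != len(imputed_gt) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    sti = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    stt = sum((t - mean_t) ** 2 for t in true_gt)
    sii = sum((i - mean_i) ** 2 for i in imputed_gt)
    if stt == 0 or sii == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sti / sqrt(stt * sii)


def concordance_rate(true_gt, imputed_gt, only_genotype=None):
    """Percentage of genotype calls that match exactly.

    If only_genotype is given (for example 1 for heterozygotes), only
    positions whose true genotype equals that code are counted.
    """
    pairs = [
        (t, i)
        for t, i in zip(true_gt, imputed_gt)
        if only_genotype is None or t == only_genotype
    ]
    if not pairs:
        raise ValueError("no genotype calls to compare")
    matches = sum(1 for t, i in pairs if t == i)
    return 100.0 * matches / len(pairs)
```

Reporting the heterozygote concordance separately, as the abstract does, is useful because heterozygous calls are typically the hardest to impute and an overall CR can hide errors there.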
Affiliation(s)
- Muhammad Yasir Nawaz
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI 48824, USA
- Correspondence: (M.Y.N.); (C.G.)
- Priscila Arrigucci Bernardes
- Department of Animal Science and Rural Development, Federal University of Santa Catarina, Florianopolis 88034-000, SC, Brazil
- Dajeong Lim
- Animal Genome & Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Seung Hwan Lee
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 305764, Korea
- Cedric Gondro
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Correspondence: (M.Y.N.); (C.G.)
|
13
|
Yoshida GM, Yáñez JM. Increased accuracy of genomic predictions for growth under chronic thermal stress in rainbow trout by prioritizing variants from GWAS using imputed sequence data. Evol Appl 2022; 15:537-552. [PMID: 35505881 PMCID: PMC9046923 DOI: 10.1111/eva.13240] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/23/2020] [Revised: 04/01/2021] [Accepted: 04/03/2021] [Indexed: 02/07/2023] Open
Abstract
Through imputation of genotypes, genome-wide association study (GWAS) and genomic prediction (GP) using whole-genome sequencing (WGS) data are cost-efficient and feasible in aquaculture breeding schemes. The objective was to dissect the genetic architecture of growth traits under chronic heat stress in rainbow trout (Oncorhynchus mykiss) and to assess the accuracy of GP based on imputed WGS and different preselected single nucleotide polymorphism (SNP) arrays. A total of 192 and 764 fish subjected to a 62-day heat stress experiment were genotyped using customized 1 K and 26 K SNP panels, respectively, and then genotype imputation was performed from a low-density chip to WGS using 102 parents (36 males and 66 females) as the reference population. Imputed WGS data were used to perform GWAS and test GP accuracy under different preselected SNP scenarios. Heritability was estimated for body weight (BW), body length (BL) and average daily gain (ADG). Estimates using imputed WGS data ranged from 0.33 ± 0.05 to 0.55 ± 0.05 for growth traits under chronic heat stress. GWAS revealed that the top five SNPs cumulatively explained a maximum of 0.94%, 0.86% and 0.51% of genetic variance for BW, BL and ADG, respectively. Some important functional candidate genes associated with growth-related traits were found among the most important SNPs, including signal transducer and activator of transcription 5B and 3 (STAT5B and STAT3, respectively) and cytokine-inducible SH2-containing protein (CISH). WGS data resulted in a slight increase in prediction accuracy compared with the pedigree-based method, whereas preselected SNPs based on the top GWAS hits improved prediction accuracies, with values ranging from 1.2 to 13.3%. Our results support the evidence of the polygenic nature of growth traits when measured under heat stress. The accuracies of GP can be improved using preselected variants from GWAS, and the use of WGS marginally increases prediction accuracy.
Affiliation(s)
- José M. Yáñez
- Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
- Núcleo Milenio INVASAL, Concepción, Chile
|
14
|
van den Berg I, Ho PN, Nguyen TV, Haile-Mariam M, MacLeod IM, Beatson PR, O'Connor E, Pryce JE. GWAS and genomic prediction of milk urea nitrogen in Australian and New Zealand dairy cattle. Genet Sel Evol 2022; 54:15. [PMID: 35183113 PMCID: PMC8858489 DOI: 10.1186/s12711-022-00707-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/30/2021] [Accepted: 01/31/2022] [Indexed: 11/24/2022] Open
Abstract
Background Urinary nitrogen leakage is an environmental concern in dairy cattle. Selection for reduced urinary nitrogen leakage may be done using indicator traits such as milk urea nitrogen (MUN). The result of a previous study indicated that the genetic correlation between MUN in Australia (AUS) and MUN in New Zealand (NZL) was only low to moderate (between 0.14 and 0.58). In this context, an alternative is to select sequence variants based on genome-wide association studies (GWAS) with a view to improve genomic prediction accuracies. A GWAS can also be used to detect quantitative trait loci (QTL) associated with MUN. Therefore, our objectives were to perform within-country GWAS and a meta-GWAS for MUN using records from up to 33,873 dairy cows and imputed whole-genome sequence data, to compare QTL detected in the GWAS for MUN in AUS and NZL, and to use sequence variants selected from the meta-GWAS to improve the prediction accuracy for MUN based on a joint AUS-NZL reference set. Results Using the meta-GWAS, we detected 14 QTL for MUN, located on chromosomes 1, 6, 11, 14, 19, 22, 26 and the X chromosome. The three most significant QTL encompassed the casein genes on chromosome 6, PAEP on chromosome 11 and DGAT1 on chromosome 14. We selected 50,000 sequence variants that had the same direction of effect for MUN in AUS and MUN in NZL and that were most significant in the meta-analysis for the GWAS. The selected sequence variants yielded a genetic correlation between MUN in AUS and MUN in NZL of 0.95 and substantially increased prediction accuracy in both countries. Conclusions Our results demonstrate how the sharing of data between two countries can increase the power of a GWAS and increase the accuracy of genomic prediction using a multi-country reference population and sequence variants selected based on a meta-GWAS. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-022-00707-9.
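The variant selection rule described in this abstract (keep variants whose effect has the same direction in both countries, then take the most significant ones from the meta-analysis) can be sketched as below. The record layout, field names, and the `n_top` parameter are assumptions for illustration only; the study itself selected 50,000 variants from imputed sequence data.

```python
def select_concordant_variants(variants, n_top):
    """Return IDs of the n_top most significant sign-concordant variants.

    `variants` is an iterable of dicts with hypothetical keys:
      "id"       - variant identifier
      "beta_aus" - estimated effect in the Australian GWAS
      "beta_nzl" - estimated effect in the New Zealand GWAS
      "p_meta"   - p-value from the meta-analysis
    """
    # Same direction of effect means both estimates share a sign,
    # so their product is strictly positive.
    concordant = [v for v in variants if v["beta_aus"] * v["beta_nzl"] > 0]
    # Smaller meta-analysis p-value means more significant.
    concordant.sort(key=lambda v: v["p_meta"])
    return [v["id"] for v in concordant[:n_top]]


if __name__ == "__main__":
    toy = [
        {"id": "v1", "beta_aus": 0.2, "beta_nzl": 0.1, "p_meta": 1e-9},
        {"id": "v2", "beta_aus": 0.3, "beta_nzl": -0.1, "p_meta": 1e-12},
        {"id": "v3", "beta_aus": -0.1, "beta_nzl": -0.2, "p_meta": 1e-7},
    ]
    print(select_concordant_variants(toy, n_top=2))
```

Note that the most significant toy variant (`v2`) is dropped because its effects point in opposite directions in the two populations, which is the behaviour the abstract describes.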
Affiliation(s)
- Irene van den Berg
- Centre for AgriBioscience, Agriculture Victoria, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia
- Phuong N Ho
- Centre for AgriBioscience, Agriculture Victoria, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia
- Tuan V Nguyen
- Centre for AgriBioscience, Agriculture Victoria, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia
- Mekonnen Haile-Mariam
- Centre for AgriBioscience, Agriculture Victoria, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
- Centre for AgriBioscience, Agriculture Victoria, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia
- Jennie E Pryce
- Centre for AgriBioscience, Agriculture Victoria, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
|
15
|
Cheruiyot EK, Haile-Mariam M, Cocks BG, MacLeod IM, Mrode R, Pryce JE. Functionally prioritised whole-genome sequence variants improve the accuracy of genomic prediction for heat tolerance. Genet Sel Evol 2022; 54:17. [PMID: 35183109 PMCID: PMC8858496 DOI: 10.1186/s12711-022-00708-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/04/2021] [Accepted: 02/03/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Heat tolerance is a trait of economic importance in the context of warm climates and the effects of global warming on livestock production, reproduction, health, and well-being. This study investigated the improvement in prediction accuracy for heat tolerance when selected sets of sequence variants from a large genome-wide association study (GWAS) were combined with a standard 50k single nucleotide polymorphism (SNP) panel used by the dairy industry. METHODS Over 40,000 dairy cattle with genotype and phenotype data were analysed. The phenotypes used to measure an individual's heat tolerance were defined as the rate of decline in milk production traits with rising temperature and humidity. We used Holstein and Jersey cows to select sequence variants linked to heat tolerance. The prioritised sequence variants were the most significant SNPs passing a GWAS p-value threshold selected based on sliding 100-kb windows along each chromosome. We used a bull reference set to develop the genomic prediction equations, which were then validated in an independent set of Holstein, Jersey, and crossbred cows. Prediction analyses were performed using the BayesR, BayesRC, and GBLUP methods. RESULTS The accuracy of genomic prediction for heat tolerance improved by up to 0.07, 0.05, and 0.10 units in Holstein, Jersey, and crossbred cows, respectively, when sets of selected sequence markers from Holstein cows were added to the 50k SNP panel. However, in some scenarios, the prediction accuracy decreased unexpectedly, with the largest drop of -0.10 units for the heat tolerance fat yield trait observed in Jersey cows when 50k plus pre-selected SNPs from Holstein cows were used. Using pre-selected SNPs discovered on a combined set of Holstein and Jersey cows generally improved the accuracy, especially in the Jersey validation. In addition, combining Holstein and Jersey bulls in the reference set generally improved prediction accuracy in most scenarios compared to using only Holstein bulls as the reference set. CONCLUSIONS Informative sequence markers can be prioritised to improve the genomic prediction of heat tolerance in different breeds. In addition to providing biological insight, these variants could also have a direct application for developing customized SNP arrays or can be used via imputation in current industry SNP panels.
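The window-based prioritisation mentioned in the METHODS (the most significant SNP passing a GWAS p-value threshold within 100-kb windows along each chromosome) could look roughly like the sketch below. The abstract does not give the threshold or how far the window slides at each step, so `p_threshold` and `step_bp` are placeholders, and the default of non-overlapping windows is an assumption of this sketch rather than the authors' setting.

```python
def select_top_snp_per_window(snps, window_bp=100_000, step_bp=100_000,
                              p_threshold=1e-3):
    """Pick the most significant SNP in each window on each chromosome.

    `snps` is an iterable of (snp_id, chromosome, position_bp, p_value)
    tuples. Only SNPs with p_value < p_threshold are eligible. Windows
    start at position 0 and advance by step_bp (hypothetical default:
    non-overlapping windows). Returns SNP IDs ordered by chromosome and
    position, without duplicates.
    """
    by_chrom = {}
    for snp_id, chrom, pos, p_value in snps:
        if p_value < p_threshold:
            by_chrom.setdefault(chrom, []).append((pos, p_value, snp_id))

    selected = []
    for chrom in sorted(by_chrom):
        hits = sorted(by_chrom[chrom])
        last_pos = hits[-1][0]
        chosen = set()
        start = 0
        while start <= last_pos:
            in_window = [h for h in hits if start <= h[0] < start + window_bp]
            if in_window:
                # Lowest p-value wins within the window.
                chosen.add(min(in_window, key=lambda h: h[1])[2])
            start += step_bp
        selected.extend(snp_id for _, _, snp_id in hits if snp_id in chosen)
    return selected
```

Setting `step_bp` smaller than `window_bp` gives overlapping windows, in which case the same SNP can win several windows; the set-based bookkeeping keeps it from being reported twice.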
Affiliation(s)
- Evans K Cheruiyot
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia; Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Mekonnen Haile-Mariam
- Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Benjamin G Cocks
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia; Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
- Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Raphael Mrode
- International Livestock Research Institute, Nairobi, Kenya; Scotland's Rural College, Edinburgh, UK
- Jennie E Pryce
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia; Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
|
16
|
Richardson C, Amer P, Quinton C, Crowley J, Hely F, van den Berg I, Pryce J. Reducing greenhouse gas emissions through genetic selection in the Australian dairy industry. J Dairy Sci 2022; 105:4272-4288. [DOI: 10.3168/jds.2021-21277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/11/2021] [Accepted: 12/22/2021] [Indexed: 11/19/2022]
|
17
|
Guillenea A, Su G, Lund MS, Karaman E. Genomic prediction in Nordic Red dairy cattle considering breed origin of alleles. J Dairy Sci 2022; 105:2426-2438. [PMID: 35033341 DOI: 10.3168/jds.2021-21173] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/18/2021] [Accepted: 11/23/2021] [Indexed: 01/02/2023]
Abstract
This study investigated the reliability of genomic prediction (GP) using breed origin of alleles (BOA) approach in the Nordic Red (RDC) population, which has an admixed population structure. The RDC population consists of animals with varying degrees of genetic materials from the Danish Red (RDM), Swedish Red (SRB), Finnish Ayrshire (FAY), and Holstein (HOL) because bulls have been used across the breeds. The BOA approach was tested using 39,550 RDC animals in the reference population and 11,786 in the validation population. Deregressed proofs (DRP) of milk, fat and protein were used as response variable for GP. Direct genomic breeding values (DGV) for animals in the validation population were calculated with (BOA model) or without (joint model) considering breed origin of alleles. The joint model assumed homogeneous marker effects and a single set of marker effects were estimated, whereas BOA model assumed heterogeneous marker effects, and different sets of marker effects were estimated across the breeds. For the BOA approach, we tested scenarios assuming both correlated (BOA_cor) and uncorrelated (BOA_uncor) marker effects between the breeds. Additionally, we investigated GP using a standard Illumina 50K chip and including SNP selected from imputed whole-genome sequencing (50K+WGS). We also studied the effect of estimating (co)variances for genome regions of different sizes to exploit the information of the genome regions contributing to the (co)variance between the breeds. Region sizes were set as 1 SNP, a group of 30 or 100 adjacent SNP, or the whole genome. Reliability of DGV was measured as squared correlations between DGV and DRP divided by the reliability of DRP. Across the 3 traits, in general, RS30 and RS100 SNP yielded the highest reliabilities. Including WGS SNP improved reliabilities in almost all scenarios (0.297 on average for 50K and 0.307 on average for 50K+WGS). 
The BOA_uncor (0.233 on average) was inferior to the joint model (0.339 on average), but the reliabilities obtained using BOA_cor (0.334 on average) in most cases were not significantly different from those obtained using the joint model. The results indicate that both including additional whole-genome sequencing SNP and dividing the genome into fixed regions improve GP in the RDC. The BOA models have the potential to increase the reliability of GP, but the benefit is limited in populations with a high exchange of genetic material for a long time, as is the case for RDC.
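The reliability measure defined in this abstract (the squared correlation between DGV and DRP, divided by the reliability of DRP) can be written out as a short sketch. Using a single mean DRP reliability as the divisor is an assumption made here, since the abstract does not say how the DRP reliabilities were summarised; the function names are likewise illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def dgv_reliability(dgv, drp, mean_drp_reliability):
    """Reliability of DGV = cor(DGV, DRP)^2 / reliability of DRP.

    Dividing by the DRP reliability corrects for the fact that DRP are
    themselves noisy measurements of the true breeding values.
    """
    if not 0 < mean_drp_reliability <= 1:
        raise ValueError("DRP reliability must be in (0, 1]")
    return pearson(dgv, drp) ** 2 / mean_drp_reliability
```

For example, a DGV-DRP correlation of 0.8 with a mean DRP reliability of 0.8 gives a DGV reliability of 0.64 / 0.8 = 0.8.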
Affiliation(s)
- Ana Guillenea
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Guosheng Su
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Mogens Sand Lund
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Emre Karaman
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
|
18
|
Bedhane M, van der Werf J, de las Heras-Saldana S, Lim D, Park B, Na Park M, Seung Hee R, Clark S. The accuracy of genomic prediction for meat quality traits in Hanwoo cattle when using genotypes from different SNP densities and preselected variants from imputed whole genome sequence. Anim Prod Sci 2022. [DOI: 10.1071/an20659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Context
Genomic prediction is the use of genomic data in the estimation of genomic breeding values (GEBV) in animal breeding. In beef cattle breeding programs, genomic prediction increases the rates of genetic gain by increasing the accuracy of selection at earlier ages.
Aims
The objectives of the study were to examine the effect of single-nucleotide polymorphism (SNP) density and to evaluate the effect of using SNPs preselected from imputed whole-genome sequence for genomic prediction.
Methods
Genomic and phenotypic data from 2110 Hanwoo steers were used to predict GEBV for marbling score (MS), meat texture (MT), and meat colour (MC) traits. Three types of SNP densities including 50k, high-density (HD), and whole-genome sequence data and preselected SNPs from genome-wide association study (GWAS) were used for genomic prediction analyses. Two scenarios (independent and dependent discovery populations) were used to select top significant SNPs. The accuracy of GEBV was assessed using random cross-validation. Genomic best linear unbiased prediction (GBLUP) was used to predict the breeding values for each trait.
Key results
Our results showed very similar prediction accuracies across all SNP densities used in the study. The prediction accuracy among traits ranged from 0.29±0.05 for MC to 0.46±0.04 for MS. Depending on the trait, an improvement in prediction accuracy of up to 5% was obtained when the preselected SNPs from the GWAS analysis were included in the prediction analysis.
Conclusions
High SNP density such as HD and the whole-genome sequence data yielded a similar prediction accuracy in Hanwoo beef cattle. Therefore, the 50K SNP chip panel is sufficient to capture the relationships in a breed with a small effective population size such as the Hanwoo cattle population. Preselected variants improved prediction accuracy when they were included in the genomic prediction model.
Implications
The estimated genomic prediction accuracies are moderate in Hanwoo cattle, and searching for more predictive SNPs could increase the accuracy of estimated breeding values for the studied traits.
|
19
|
Mesbah-Uddin M, Guldbrandtsen B, Capitan A, Lund MS, Boichard D, Sahana G. Genome-wide association study with imputed whole-genome sequence variants including large deletions for female fertility in 3 Nordic dairy cattle breeds. J Dairy Sci 2021; 105:1298-1313. [PMID: 34955274 DOI: 10.3168/jds.2021-20655] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/24/2021] [Accepted: 09/22/2021] [Indexed: 11/19/2022]
Abstract
Fertility is an economically important trait in livestock. Poor fertility in dairy cattle can be due to loss-of-function variants affecting any essential gene that causes early embryonic mortality in homozygotes. To identify fertility-associated quantitative trait loci, we performed single-marker association analyses for 8 fertility traits in Holstein, Jersey, and Nordic Red Dairy cattle using imputed whole-genome sequence variants including SNPs, indels, and large deletions. We then performed stepwise selection of independent markers from GWAS loci using conditional and joint association analyses. From single-marker analyses for fertility traits, we reported genome-wide significant associations of 30,384 SNPs, 178 indels, and 3 deletions in Holstein; 23,481 SNPs, 189 indels, and 13 deletions in Nordic Red; and 17 SNPs in Jersey cattle. Conditional and joint association analyses identified 37 and 23 independent associations in Holstein and Nordic Red Dairy cattle, respectively. Fertility-associated GWAS loci were enriched for developmental and cellular processes (Gene Ontology enrichment, false discovery rate < 0.05). For these quantitative trait loci regions (top marker and 500 kb of surrounding regions), we proposed several candidate genes with functional annotations corresponding to embryonic lethality and various fertility-related phenotypes in mouse and cattle. The inclusion of these top markers in future releases of the custom SNP chip used for genomic evaluations will enable their validation in independent populations and improve the accuracy of genomic predictions.
Affiliation(s)
- Md Mesbah-Uddin
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark; Génétique Animale et Biologie Intégrative (GABI), Institut national de recherche pour l'agriculture, l'alimentation et l'environnement (INRAE), AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Bernt Guldbrandtsen
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Aurélien Capitan
- Génétique Animale et Biologie Intégrative (GABI), Institut national de recherche pour l'agriculture, l'alimentation et l'environnement (INRAE), AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France; Allice, 75595 Paris, France
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Didier Boichard
- Génétique Animale et Biologie Intégrative (GABI), Institut national de recherche pour l'agriculture, l'alimentation et l'environnement (INRAE), AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
|
20
|
Lopez BIM, An N, Srikanth K, Lee S, Oh JD, Shin DH, Park W, Chai HH, Park JE, Lim D. Genomic Prediction Based on SNP Functional Annotation Using Imputed Whole-Genome Sequence Data in Korean Hanwoo Cattle. Front Genet 2021; 11:603822. [PMID: 33552124 PMCID: PMC7859490 DOI: 10.3389/fgene.2020.603822] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/08/2020] [Accepted: 11/09/2020] [Indexed: 12/12/2022] Open
Abstract
Whole-genome sequence (WGS) data are increasingly being applied to genomic prediction, offering a higher predictive ability by including causal mutations or single-nucleotide polymorphisms (SNPs) putatively in strong linkage disequilibrium with causal mutations affecting the trait. This study aimed to improve the predictive performance of the customized Hanwoo 50 k SNP panel for four carcass traits in a commercial Hanwoo population by adding highly predictive variants from sequence data. A total of 16,892 Hanwoo cattle with phenotypes (i.e., backfat thickness, carcass weight, longissimus muscle area, and marbling score), 50 k genotypes, and WGS imputed genotypes were used. We partitioned imputed WGS data according to functional annotation [intergenic (IGR), intron (ITR), regulatory (REG), synonymous (SYN), and non-synonymous (NSY)] to characterize the genomic regions that will deliver higher predictive power for the traits investigated. Animals were assigned to two groups, the discovery set (7324 animals) used for predictive variant detection and the cross-validation set for genomic prediction. For the pre-selection of variants, genome-wide association studies were performed for each trait within every genomic region and on the entire WGS data. Sets of pre-selected SNPs of different densities (1000, 3000, 5000, or 10,000) were added to the 50 k genotypes separately, and the predictive performance of each set of genotypes was assessed using genomic best linear unbiased prediction (GBLUP). Results showed that the predictive performance of the customized Hanwoo 50 k SNP panel can be improved by the addition of pre-selected variants from the WGS data; adding 3000 variants per trait was sufficient to improve the prediction accuracy for all traits. When 12,000 pre-selected variants (3000 variants from each trait) were added to the 50 k genotypes, the prediction accuracies increased by 9.9, 9.2, 6.4, and 4.7% for backfat thickness, carcass weight, longissimus muscle area, and marbling score, respectively, compared to the regular 50 k SNP panel. In terms of prediction bias, regression coefficients for all sets of genotypes in all traits were close to 1, indicating an unbiased prediction. The strategy used to select variants based on functional annotation did not show a clear advantage compared to using the whole genome. Nonetheless, such pre-selected SNPs from the IGR region gave the highest improvement in prediction accuracy among genomic regions and the values were close to those obtained using the WGS data for all traits. We concluded that additional gain in prediction accuracy when using pre-selected variants appears to be trait-dependent, and using WGS data remained more accurate compared to using a specific genomic region.
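The pre-selection step described in this abstract can be illustrated with a minimal sketch. It assumes that GWAS p-values from the discovery set are already available for each trait; the function name `augment_panel`, the variant identifiers, and the toy p-values are hypothetical and are not taken from the study.

```python
def augment_panel(chip_snps, gwas_pvalues, n_per_trait):
    """Add the n_per_trait most significant variants of each trait to a chip panel.

    chip_snps    -- iterable of variant IDs already on the panel
    gwas_pvalues -- dict: trait -> dict of variant ID -> GWAS p-value
                    (assumed to come from the discovery set only)
    """
    panel = set(chip_snps)
    for pvalues in gwas_pvalues.values():
        # Rank variants from most to least significant for this trait.
        ranked = sorted(pvalues, key=pvalues.get)
        panel.update(ranked[:n_per_trait])
    return panel


# Toy data (hypothetical): two traits, four sequence variants, a three-SNP "chip".
chip = ["chip1", "chip2", "chip3"]
gwas = {
    "carcass_weight": {"v1": 1e-9, "v2": 0.20, "v3": 1e-4, "v4": 0.70, "chip1": 1e-6},
    "marbling_score": {"v1": 0.50, "v2": 1e-8, "v3": 0.03, "v4": 1e-5, "chip1": 0.90},
}
panel = augment_panel(chip, gwas, n_per_trait=2)
print(sorted(panel))
```

Because the panel is a set, a variant selected for several traits, or one already on the chip, is counted only once. Ranking on the discovery set and evaluating on a separate validation set, as the abstract describes, keeps variant selection and accuracy assessment on different animals.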
Affiliation(s)
- Bryan Irvine M Lopez
  - Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Narae An
  - Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Krishnamoorthy Srikanth
  - Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Seunghwan Lee
  - Department of Animal Science and Biotechnology, Chungnam National University, Daejeon, South Korea
- Jae-Don Oh
  - Department of Animal Biotechnology, Chonbuk National University, Jeonju, South Korea
- Dong-Hyun Shin
  - Department of Agricultural Convergence Technology, Chonbuk National University, Jeonju, South Korea
- Woncheoul Park
  - Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Han-Ha Chai
  - Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Jong-Eun Park
  - Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Dajeong Lim
  - Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
|
21
|
Jeong S, Kim JY, Kim N. GMStool: GWAS-based marker selection tool for genomic prediction from genomic data. Sci Rep 2020; 10:19653. [PMID: 33184432 PMCID: PMC7665227 DOI: 10.1038/s41598-020-76759-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/04/2020] [Accepted: 11/02/2020] [Indexed: 12/20/2022] Open
Abstract
The increased accessibility to genomic data in recent years has laid the foundation for studies to predict various phenotypes of organisms based on the genome. Genomic prediction collectively refers to these studies, and it estimates an individual's phenotypes mainly using single nucleotide polymorphism markers. Typically, the accuracy of these genomic prediction studies is highly dependent on the markers used; however, in practice, choosing optimal markers that give high accuracy for the phenotype of interest is a challenging task. Therefore, we present a new tool called GMStool for selecting optimal marker sets and predicting quantitative phenotypes. The GMStool is based on a genome-wide association study (GWAS) and heuristically searches for optimal markers using statistical and machine-learning methods. The GMStool performs the genomic prediction using statistical and machine/deep-learning models and presents the best prediction model with the optimal marker set. For the evaluation, the GMStool was tested on real datasets with four phenotypes. The prediction results showed higher performance than using all markers or the GWAS-top markers, which have been used frequently in prediction studies. Although the GMStool has several limitations, it is expected to contribute to various studies for predicting quantitative phenotypes. The GMStool, written in R, is available at www.github.com/JaeYoonKim72/GMStool.
Affiliation(s)
- Seongmun Jeong
  - Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jae-Yoon Kim
  - Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
  - Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon, 34141, Republic of Korea
- Namshin Kim
  - Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
  - Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon, 34141, Republic of Korea
|
22
|
Accuracy of genomic evaluation using imputed high-density genotypes for carcass traits in commercial Hanwoo population. Livest Sci 2020. [DOI: 10.1016/j.livsci.2020.104256] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
|
23
|
Teng J, Huang S, Chen Z, Gao N, Ye S, Diao S, Ding X, Yuan X, Zhang H, Li J, Zhang Z. Optimizing genomic prediction model given causal genes in a dairy cattle population. J Dairy Sci 2020; 103:10299-10310. [PMID: 32952023 DOI: 10.3168/jds.2020-18233] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/20/2020] [Accepted: 07/07/2020] [Indexed: 01/15/2023]
Abstract
As genotypic data are moving from SNP chip toward whole-genome sequence, the accuracy of genomic prediction (GP) exhibits a marginal gain, although all genetic variation, including causal genes, is contained in whole-genome sequence data. Meanwhile, genetic analyses on complex traits, such as genome-wide association studies, have identified an increasing number of genomic regions, including potential causal genes, which would be reliable prior knowledge for GP. Many studies have tried to improve the performance of GP by modifying the prediction model to incorporate prior knowledge. Although several plausible results have been obtained from model modification or strategy optimization, most of them were validated in a specific empirical population with a limited variety of genetic architecture for complex traits. An alternative approach is to use simulated genetic architecture with known causal genes (e.g., simulated causative SNP) to evaluate different GP models with given causal genes. Our objectives were to (1) evaluate the performance of GP under a variety of genetic architectures with a subset of known causal genes and (2) compare different GP models modified by highlighting causal genes and different strategies to weight causal genes. In this study, we simulated pseudo-phenotypes under a variety of genetic architectures based on the real genotypes and phenotypes of a dairy cattle population. Besides classical genomic best linear unbiased prediction, we evaluated 3 modified GP models that highlight causal genes as follows: (1) by treating them as fixed effects, (2) by treating them as a separate random component, and (3) by combining them into the genomic relationship matrix as random effects. Our results showed that highlighting the known causal genes, which explained a considerable proportion of genetic variance in the GP models, increased the predictive accuracy. Combining all given causal genes into the genomic relationship matrix was the optimal strategy under all the scenarios validated, and treating causal genes as a separate random component is also recommended when more than 20% of genetic variance is explained by known causal genes. Moreover, assigning differential weights to each causal gene further improved the predictive accuracy.
Affiliation(s)
- Jinyan Teng
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Shuwen Huang
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Zitao Chen
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Ning Gao
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou 510006, China
- Shaopan Ye
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Shuqi Diao
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Xiangdong Ding
  - National Engineering Laboratory for Animal Breeding, Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Xiaolong Yuan
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Hao Zhang
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Jiaqi Li
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Zhe Zhang
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
|
24
|
van den Berg I, Xiang R, Jenko J, Pausch H, Boussaha M, Schrooten C, Tribout T, Gjuvsland AB, Boichard D, Nordbø Ø, Sanchez MP, Goddard ME. Meta-analysis for milk fat and protein percentage using imputed sequence variant genotypes in 94,321 cattle from eight cattle breeds. Genet Sel Evol 2020; 52:37. [PMID: 32635893 PMCID: PMC7339598 DOI: 10.1186/s12711-020-00556-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/01/2019] [Accepted: 06/26/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Sequence-based genome-wide association studies (GWAS) provide high statistical power to identify candidate causal mutations when a large number of individuals with both sequence variant genotypes and phenotypes is available. A meta-analysis combines summary statistics from multiple GWAS and increases the power to detect trait-associated variants without requiring access to data at the individual level of the GWAS mapping cohorts. Because linkage disequilibrium between adjacent markers is conserved only over short distances across breeds, a multi-breed meta-analysis can improve mapping precision. RESULTS To maximise the power to identify quantitative trait loci (QTL), we combined the results of nine within-population GWAS that used imputed sequence variant genotypes of 94,321 cattle from eight breeds, to perform a large-scale meta-analysis for fat and protein percentage in cattle. The meta-analysis detected ($p \leq 10^{-8}$) 138 QTL for fat percentage and 176 QTL for protein percentage. This was more than the number of QTL detected in all within-population GWAS together (124 QTL for fat percentage and 104 QTL for protein percentage). Among all the lead variants, 100 QTL for fat percentage and 114 QTL for protein percentage had the same direction of effect in all within-population GWAS. This indicates either persistence of the linkage phase between the causal variant and the lead variant across breeds or that some of the lead variants might indeed be causal or tightly linked with causal variants. The percentage of intergenic variants was substantially lower for significant variants than for non-significant variants, and significant variants had mostly moderate to high minor allele frequencies. Significant variants were also clustered in genes that are known to be relevant for fat and protein percentages in milk. CONCLUSIONS Our study identified a large number of QTL associated with fat and protein percentage in dairy cattle. We demonstrated that large-scale multi-breed meta-analysis reveals more QTL at the nucleotide resolution than within-population GWAS. Significant variants were more often located in genic regions than non-significant variants and a large part of them was located in potentially regulatory regions.
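To illustrate how a meta-analysis can combine summary statistics without individual-level data, here is a minimal sketch of a sample-size-weighted Z-score combination. This is a generic approach chosen for illustration; the abstract does not state which meta-analysis method the authors used, and the function names and example numbers are hypothetical.

```python
from math import erfc, sqrt


def combine_z_scores(z_scores, sample_sizes):
    """Sample-size-weighted combination of per-study Z-scores for one variant."""
    weights = [sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return erfc(abs(z) / sqrt(2))


def same_direction(z_scores):
    """True if the estimated effect has the same sign in every study."""
    return all(z > 0 for z in z_scores) or all(z < 0 for z in z_scores)


# Hypothetical variant with moderate evidence in each of three populations.
z = [3.0, 2.5, 4.0]
n = [10000, 5000, 20000]
z_meta = combine_z_scores(z, n)
print(z_meta, two_sided_p(z_meta), same_direction(z))
```

In this toy case the combined statistic is larger than any single-study Z-score, which is the gain in power the abstract refers to. The `same_direction` helper mirrors the check on whether lead variants have a consistent direction of effect across the within-population GWAS.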
Affiliation(s)
- Irene van den Berg
  - Agriculture Victoria Research, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia
- Ruidong Xiang
  - Agriculture Victoria Research, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia; Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
- Janez Jenko
  - GENO SA, Storhamargata 44, 2317, Hamar, Norway
- Mekki Boussaha
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Thierry Tribout
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Didier Boichard
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Marie-Pierre Sanchez
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Mike E Goddard
  - Agriculture Victoria Research, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia; Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
|
25
|
van den Berg I, MacLeod I, Reich C, Breen E, Pryce J. Optimizing genomic prediction for Australian Red dairy cattle. J Dairy Sci 2020; 103:6276-6298. [DOI: 10.3168/jds.2019-17914] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/15/2019] [Accepted: 02/13/2020] [Indexed: 12/18/2022]
|
26
|
Liu A, Lund MS, Boichard D, Mao X, Karaman E, Fritz S, Aamand GP, Wang Y, Su G. Imputation for sequencing variants preselected to a customized low-density chip. Sci Rep 2020; 10:9524. [PMID: 32533087 PMCID: PMC7293337 DOI: 10.1038/s41598-020-66523-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/11/2019] [Accepted: 05/19/2020] [Indexed: 12/27/2022] Open
Abstract
Sequencing variants preselected from association analyses and bioinformatics analyses could improve genomic prediction. In this study, the imputation of sequencing SNPs preselected from major dairy breeds in Denmark-Finland-Sweden (DFS) and France (FRA) was investigated for both contemporary animals and old bulls in Danish Jersey. For contemporary animals, a two-step imputation which first imputed to 54 K and then to 54 K + DFS + FRA SNPs achieved the highest accuracy. Correlations between observed and imputed genotypes were 91.6% for DFS SNPs and 87.6% for FRA SNPs, while concordance rates were 96.6% for DFS SNPs and 93.5% for FRA SNPs. The SNPs with lower minor allele frequency (MAF) tended to have lower correlations but higher concordance rates. For old bulls, imputation for DFS and FRA SNPs was relatively accurate even for bulls without progeny (correlations higher than 97.2% and concordance rates higher than 98.4%). For contemporary animals, given limited imputation accuracy of preselected sequencing SNPs especially for SNPs with low MAF, it would be a good strategy to directly genotype preselected sequencing SNPs with a customized SNP chip. For old bulls, given high imputation accuracy for preselected sequencing SNPs with all MAF ranges, it would be unnecessary to re-genotype preselected sequencing SNPs.
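The two accuracy measures reported in this abstract can behave differently at low MAF, and a small worked example may make that concrete. The sketch below assumes genotypes coded as allele counts (0/1/2); the function names and toy genotype vectors are hypothetical and are not data from the study.

```python
from math import sqrt


def concordance_rate(observed, imputed):
    """Fraction of genotypes (coded 0/1/2) that were imputed exactly right."""
    matches = sum(1 for o, i in zip(observed, imputed) if o == i)
    return matches / len(observed)


def pearson_correlation(x, y):
    """Pearson correlation between observed and imputed allele counts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Low-MAF SNP: 20 animals, one imputation error on a rare heterozygote.
low_obs = [0] * 18 + [1, 1]
low_imp = [0] * 18 + [1, 0]

# Common SNP: 20 animals, two imputation errors.
common_obs = [0] * 5 + [1] * 10 + [2] * 5
common_imp = [0] * 4 + [1] * 12 + [2] * 4

for name, obs, imp in [("low MAF", low_obs, low_imp), ("common", common_obs, common_imp)]:
    print(name, concordance_rate(obs, imp), round(pearson_correlation(obs, imp), 3))
```

In this toy case the low-MAF SNP has the higher concordance rate, because most animals are trivially homozygous for the major allele, yet the lower correlation, because a single error on a rare genotype removes much of the little variance there is. This is the pattern the abstract reports for SNPs with lower MAF.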
Affiliation(s)
- Aoxing Liu
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark; Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA; National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193, Beijing, P.R. China
- Mogens Sandø Lund
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Didier Boichard
  - GABI, INRA, AgroParisTech, Université Paris Saclay, 78350, Jouy-en-Josas, France
- Xiaowei Mao
  - Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, 100044, Beijing, P.R. China; CAS Center for Excellence in Life and Paleoenvironment, 100044, Beijing, P.R. China
- Emre Karaman
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
- Sebastien Fritz
  - GABI, INRA, AgroParisTech, Université Paris Saclay, 78350, Jouy-en-Josas, France; ALLICE, 75012, Paris, France
- Yachun Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA; National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193, Beijing, P.R. China
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
|
27
|
Tiplady KM, Lopdell TJ, Littlejohn MD, Garrick DJ. The evolving role of Fourier-transform mid-infrared spectroscopy in genetic improvement of dairy cattle. J Anim Sci Biotechnol 2020; 11:39. [PMID: 32322393 PMCID: PMC7164258 DOI: 10.1186/s40104-020-00445-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/03/2019] [Accepted: 03/09/2020] [Indexed: 11/22/2022] Open
Abstract
Over the last 100 years, significant advances have been made in the characterisation of milk composition for dairy cattle improvement programs. Technological progress has enabled a shift from labour-intensive, on-farm collection and processing of samples that assess yield and fat levels in milk, to large-scale processing of samples through centralised laboratories, with the scope extended to include quantification of other traits. Fourier-transform mid-infrared (FT-MIR) spectroscopy has had a significant role in the transformation of milk composition phenotyping, with spectral-based predictions of major milk components already being widely used in milk payment and animal evaluation systems globally. Increasingly, there is interest in analysing the individual FT-MIR wavenumbers, and in utilising the FT-MIR data to predict other novel traits of importance to breeding programs. This includes traits related to the nutritional value of milk, the processability of milk into products such as cheese, and traits relevant to animal health and the environment. The ability to successfully incorporate these traits into breeding programs is dependent on the heritability of the FT-MIR predicted traits, and the genetic correlations between the FT-MIR predicted and actual trait values. Linking FT-MIR predicted traits to the underlying mutations responsible for their variation can be difficult because the phenotypic expression of these traits is a function of a diverse range of molecular and biological mechanisms that can obscure their genetic basis. The individual FT-MIR wavenumbers give insights into the chemical composition of milk and provide an additional layer of granularity that may assist with establishing causal links between the genome and observed phenotypes. Additionally, there are other molecular phenotypes such as those related to the metabolome, chromatin accessibility, and RNA editing that could improve our understanding of the underlying biological systems controlling traits of interest. Here we review topics of importance to phenotyping and genetic applications of FT-MIR spectra datasets, and discuss opportunities for consolidating FT-MIR datasets with other genomic and molecular data sources to improve future dairy cattle breeding programs.
Affiliation(s)
- K M Tiplady
  - Research and Development, Livestock Improvement Corporation, Private Bag 3016, Hamilton, 3240 New Zealand; School of Agriculture, Massey University, Ruakura, Hamilton, 3240 New Zealand
- T J Lopdell
  - Research and Development, Livestock Improvement Corporation, Private Bag 3016, Hamilton, 3240 New Zealand
- M D Littlejohn
  - Research and Development, Livestock Improvement Corporation, Private Bag 3016, Hamilton, 3240 New Zealand; School of Agriculture, Massey University, Ruakura, Hamilton, 3240 New Zealand
- D J Garrick
  - School of Agriculture, Massey University, Ruakura, Hamilton, 3240 New Zealand
|
28
|
Liu T, Luo C, Ma J, Wang Y, Shu D, Su G, Qu H. High-Throughput Sequencing With the Preselection of Markers Is a Good Alternative to SNP Chips for Genomic Prediction in Broilers. Front Genet 2020; 11:108. [PMID: 32174971 PMCID: PMC7056902 DOI: 10.3389/fgene.2020.00108] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/12/2019] [Accepted: 01/30/2020] [Indexed: 11/13/2022] Open
Abstract
The choice of a genetic marker genotyping platform is important for genomic prediction in livestock and poultry. High-throughput sequencing can produce more genetic markers, but the genotype quality is lower than that obtained with single nucleotide polymorphism (SNP) chips. The aim of this study was to compare the accuracy of genomic prediction between high-throughput sequencing and SNP chips in broilers. In this study, we developed a new SNP marker screening method, the pre-marker-selection (PMS) method, to determine whether an SNP marker can be used for genomic prediction. We also compared it with a method that preselects markers based on results from genome-wide association studies (GWAS). With the two methods, we analysed body weight at the 12th week (BW) and feed conversion ratio (FCR) in a local broiler population. A total of 395 birds were selected from the F2 generation of the population, and 10X specific-locus amplified fragment sequencing (SLAF-seq) and the Illumina Chicken 60K SNP Beadchip were used for genotyping. The genomic best linear unbiased prediction method (GBLUP) was used to predict the genomic breeding values. The accuracy of genomic prediction was validated by the leave-one-out cross-validation method. Without SNP marker screening, the accuracies of the genomic estimated breeding value (GEBV) of BW and FCR were 0.509 and 0.249, respectively, when using SLAF-seq, and the accuracies were 0.516 and 0.232, respectively, when using the SNP chip. With SNP marker screening by the PMS method, the accuracies of GEBV of the two traits were 0.671 and 0.499, respectively, when using SLAF-seq, and 0.605 and 0.422, respectively, when using the SNP chip. Our SNP marker screening method led to an increase of prediction accuracy by 0.089-0.250. With SNP marker screening by the GWAS method, the accuracies of genomic prediction for the two traits were also improved, but the gains in accuracy were smaller than the gains with the PMS method for all traits. The results from this study indicate that our PMS method can improve the accuracy of GEBV, and that more accurate genomic prediction can be obtained from an increased number of genomic markers when using high-throughput sequencing in local broiler populations. Due to its lower genotyping cost, high-throughput sequencing could be a good alternative to SNP chips for genomic prediction in breeding programmes of local broiler populations.
Affiliation(s)
- Tianfei Liu
  - State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Chenglong Luo
  - State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Jie Ma
  - Guangdong Provincial Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Yan Wang
  - State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Dingming Shu
  - State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
- Hao Qu
  - State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
|
29
|
Steyn Y, Lourenco DAL, Misztal I. Genomic predictions in purebreds with a multibreed genomic relationship matrix. J Anim Sci 2020; 97:4418-4427. [PMID: 31539424 DOI: 10.1093/jas/skz296] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/17/2019] [Accepted: 09/10/2019] [Indexed: 11/14/2022] Open
Abstract
Combining breeds in a multibreed evaluation can have a negative impact on prediction accuracy, especially if single nucleotide polymorphism (SNP) effects differ among breeds. The aim of this study was to evaluate the use of a multibreed genomic relationship matrix (G), where SNP effects are considered to be unique to each breed, that is, nonshared. This multibreed G was created by treating SNP of different breeds as if they were on nonoverlapping positions on the chromosome, although, in reality, they were not. This simple setup may avoid spurious identity-by-state (IBS) relationships between breeds and automatically considers breed-specific allele frequencies. This scenario was contrasted to a regular multibreed evaluation where all SNPs were shared, that is, at the same position, and to single-breed evaluations. Different SNP densities (9k and 45k) and different effective population sizes (Ne) were tested. Five breeds mimicking recent beef cattle populations that diverged from the same historical population were simulated using different selection criteria. It was assumed that quantitative trait locus (QTL) effects were the same over all breeds. For the recent population, generations 1-9 had approximately half of the animals genotyped, whereas all animals in generation 10 were genotyped. Generation 10 animals were set for validation; therefore, each breed had a validation group. Analyses were performed using single-step genomic best linear unbiased prediction. Prediction accuracy was calculated as the correlation between true breeding values (T) and genomic estimated breeding values (GEBV). Accuracies of GEBV were lower for the larger Ne and low SNP density. All three evaluation scenarios using 45k resulted in similar accuracies, suggesting that the marker density is high enough to account for relationships and linkage disequilibrium with QTL. A shared multibreed evaluation using 9k resulted in a decrease of accuracy of 0.08 for a smaller Ne and 0.12 for a larger Ne. This loss was mostly avoided when markers were treated as nonshared within the same G matrix. A G matrix with nonshared SNP enables multibreed evaluations without considerably changing accuracy, especially with limited information per breed.
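The idea of treating SNPs as nonshared between breeds can be sketched in a few lines. The example below assumes that all animals are genotyped, that genotypes are coded as allele counts (0/1/2), and that each breed's block is scaled by its own sum of 2p(1 - p), in the style of a commonly used genomic relationship matrix. That scaling and the function name `nonshared_g` are assumptions made for illustration, not details confirmed by the abstract.

```python
def nonshared_g(genotypes, breeds):
    """Genomic relationship matrix with breed-specific (nonshared) SNP columns.

    genotypes -- list of rows, one per animal, allele counts coded 0/1/2
    breeds    -- breed label for each animal
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    g = [[0.0] * n_animals for _ in range(n_animals)]
    for breed in set(breeds):
        members = [i for i, b in enumerate(breeds) if b == breed]
        # Allele frequencies are computed within the breed only.
        freqs = [sum(genotypes[i][j] for i in members) / (2 * len(members))
                 for j in range(n_snps)]
        scale = sum(2 * p * (1 - p) for p in freqs)  # assumed per-breed scaling
        if scale == 0:
            continue  # no segregating SNP in this breed
        # Centred genotypes sit in this breed's own block of columns, so
        # cross-products with animals of other breeds stay at zero.
        centred = {i: [genotypes[i][j] - 2 * freqs[j] for j in range(n_snps)]
                   for i in members}
        for i in members:
            for k in members:
                g[i][k] = sum(a * b for a, b in zip(centred[i], centred[k])) / scale
    return g


# Toy data (hypothetical): two animals in each of two breeds, three SNPs.
genotypes = [[0, 1, 2], [2, 1, 0], [1, 2, 2], [1, 0, 2]]
breeds = ["A", "A", "B", "B"]
G = nonshared_g(genotypes, breeds)
for row in G:
    print([round(v, 3) for v in row])
```

With every animal genotyped, the resulting matrix is block diagonal: genomic relationships between animals of different breeds are exactly zero, so identity-by-state sharing across breeds cannot create spurious relationships, and each block is centred with its own allele frequencies.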
Affiliation(s)
- Yvette Steyn
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA
|
30
|
Ye S, Gao N, Zheng R, Chen Z, Teng J, Yuan X, Zhang H, Chen Z, Zhang X, Li J, Zhang Z. Strategies for Obtaining and Pruning Imputed Whole-Genome Sequence Data for Genomic Prediction. Front Genet 2019; 10:673. [PMID: 31379929 PMCID: PMC6650575 DOI: 10.3389/fgene.2019.00673] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/10/2019] [Accepted: 06/27/2019] [Indexed: 11/13/2022] Open
Abstract
Genomic prediction with imputed whole-genome sequencing (WGS) data is an attractive approach to improve predictive ability with low cost. However, high accuracy has not been realized using this method in livestock. In this study, we imputed 435 individuals from 600K single nucleotide polymorphism (SNP) chip data to WGS data using different reference panels. We also investigated the prediction accuracy of genomic best linear unbiased prediction (GBLUP) using imputed WGS data from different reference panels, linkage disequilibrium (LD)-based marker pruning, and pre-selected variants based on Genome-wide association society (GWAS) results. Results showed that the imputation accuracies from 600K to WGS data were 0.873 ± 0.038, 0.906 ± 0.036, and 0.979 ± 0.010 for the internal, external, and combined reference panels, respectively. In most traits of chickens, the prediction accuracy of imputed WGS data obtained from the internal reference panel was greater than or equal to that of the combined reference panel; the external reference panel had the lowest prediction accuracy. Compared with 600K chip data, GBLUP with imputed WGS data had only a small increase (1-3%) in prediction accuracy. Using only variants selected from imputed WGS data based on GWAS results resulted in almost no increase for most traits and even increased the bias of the regression coefficient. The impact of the degree of LD of selected and remaining variants on prediction accuracy was different. For average daily gain (ADG), residual feed intake (RFI), intestine length (IL), and body weight in 91 days (BW91), the accuracy of GBLUP increased as the degree of LD of selected variants decreased, but the opposite relationship occurred for the remaining variants. But for breast muscle weight (BMW) and average daily feed intake (ADFI), the accuracy of GBLUP increased as the degree of LD of selected variants increased, and the degree of LD of remaining variants had a small effect on prediction accuracy. 
Overall, the optimal imputation strategy to obtain WGS data for genomic prediction should consider the relationship between selected individuals and target population individuals to avoid heterogeneity of imputation. LD-based marker pruning can be used to improve the accuracy of genomic prediction using imputed WGS data.
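As an illustration of the LD-based marker pruning mentioned in this abstract, the following is a minimal sketch and not the authors' pipeline. It assumes genotypes coded as 0/1/2 allele counts in a NumPy array (individuals in rows, markers in columns); the function name `ld_prune`, the r² threshold of 0.8, and the window of 50 kept markers are all hypothetical choices made for the example.

```python
import numpy as np

def ld_prune(genotypes, r2_threshold=0.8, window=50):
    """Greedy LD pruning (illustrative sketch).

    A marker is kept only if its squared correlation (r^2) with each of the
    last `window` already-kept markers is below `r2_threshold`.
    Monomorphic markers are dropped because r^2 is undefined for them.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(genotypes.shape[1]):
        col = genotypes[:, j]
        if col.std() == 0:
            continue  # monomorphic marker, carries no information
        redundant = False
        for k in kept[-window:]:
            r = np.corrcoef(col, genotypes[:, k])[0, 1]
            if r * r >= r2_threshold:
                redundant = True  # in high LD with a marker we already kept
                break
        if not redundant:
            kept.append(j)
    return kept
```

The returned column indices can then be used to subset the genotype matrix before building a relationship matrix. Dedicated tools are normally used for pruning at WGS scale; this loop is only meant to make the idea concrete.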
Affiliation(s)
- Shaopan Ye
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Ning Gao
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Rongrong Zheng
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Zitao Chen
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Jinyan Teng
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Xiaolong Yuan
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Hao Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Zanmou Chen
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Xiquan Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Jiaqi Li
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
|
31
|
Improvement of genomic prediction by integrating additional single nucleotide polymorphisms selected from imputed whole genome sequencing data. Heredity (Edinb) 2019; 124:37-49. [PMID: 31278370 PMCID: PMC6906477 DOI: 10.1038/s41437-019-0246-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/30/2019] [Revised: 05/11/2019] [Accepted: 06/17/2019] [Indexed: 11/10/2022]
Abstract
The availability of whole genome sequencing (WGS) data enables the discovery of causative single nucleotide polymorphisms (SNPs) or SNPs in high linkage disequilibrium with causative SNPs. This study investigated the effects on genomic prediction in Danish Jersey of integrating SNPs selected from imputed WGS data into 54K chip data. The WGS SNPs, mainly including peaks of quantitative trait loci, structural variants, regulatory regions of genes, and SNPs within genes with strong effects predicted with the Variant Effect Predictor, were selected in previous analyses for dairy breeds in Denmark–Finland–Sweden (DFS) and France (FRA). Animals genotyped with the 54K chip, the standard LD chip, or a customized LD chip that covered the selected WGS SNPs and the SNPs in the standard LD chip were imputed to 54K together with DFS and FRA SNPs. Genomic best linear unbiased prediction (GBLUP) and Bayesian four-distribution mixture models considering 54K and selected WGS SNPs as one (a one-component model) or two separate genetic components (a two-component model) were used to predict breeding values. For milk production traits and mastitis, both DFS (0.025) and FRA (0.029) sets of additional WGS SNPs improved reliabilities, and inclusion of all selected WGS SNPs generally achieved the highest improvement in reliability (0.034). A Bayesian four-distribution model yielded higher reliabilities than a GBLUP model for milk and protein, but the extra gains in reliability from using selected WGS SNPs were smaller for the Bayesian four-distribution model than for the GBLUP model. Generally, no significant difference was observed between one-component and two-component models, except when using GBLUP models for milk.
|
32
|
Al Kalaldeh M, Gibson J, Duijvesteijn N, Daetwyler HD, MacLeod I, Moghaddar N, Lee SH, van der Werf JHJ. Using imputed whole-genome sequence data to improve the accuracy of genomic prediction for parasite resistance in Australian sheep. Genet Sel Evol 2019; 51:32. [PMID: 31242855 PMCID: PMC6595562 DOI: 10.1186/s12711-019-0476-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/05/2018] [Accepted: 06/18/2019] [Indexed: 01/16/2023]
Abstract
Background This study aimed at (1) comparing the accuracies of genomic prediction for parasite resistance in sheep based on whole-genome sequence (WGS) data to those based on 50k and high-density (HD) single nucleotide polymorphism (SNP) panels; (2) investigating whether the use of variants within quantitative trait loci (QTL) regions that were selected from regional heritability mapping (RHM) in an independent dataset improved the accuracy more than variants selected from genome-wide association studies (GWAS); and (3) comparing the prediction accuracies between variants selected from WGS data to variants selected from the HD SNP panel. Results The accuracy of genomic prediction improved marginally from 0.16 ± 0.02 and 0.18 ± 0.01 when using all the variants from 50k and HD genotypes, respectively, to 0.19 ± 0.01 when using all the variants from WGS data. Fitting a GRM from the selected variants alongside a GRM from the 50k SNP genotypes improved the prediction accuracy substantially compared to fitting the 50k SNP genotypes alone. The gain in prediction accuracy was slightly more pronounced when variants were selected from WGS data compared to when variants were selected from the HD panel. When sequence variants that passed the GWAS −log10(p-value) threshold of 3 across the entire genome were selected, the prediction accuracy improved by 5% (up to 0.21 ± 0.01), whereas when selection was limited to sequence variants that passed the same GWAS −log10(p-value) threshold of 3 in regions identified by RHM, the accuracy improved by 9% (up to 0.25 ± 0.01). Conclusions Our results show that through careful selection of sequence variants from the QTL regions, the accuracy of genomic prediction for parasite resistance in sheep can be improved. These findings have important implications for genomic prediction in sheep.
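The two selection rules described in this abstract (a genome-wide −log10(p-value) threshold of 3, and the same threshold restricted to RHM regions) can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the function name `select_by_gwas`, the tuple layouts for positions and regions, and the choice to treat "passed" as strictly greater than the threshold are all assumptions.

```python
import math

def select_by_gwas(pvalues, positions, threshold=3.0, regions=None):
    """Return indices of variants whose -log10(p) exceeds `threshold`.

    positions: list of (chromosome, base_pair) tuples, one per variant.
    regions:   optional list of (chromosome, start, end) tuples; when given,
               only variants inside one of these regions are eligible
               (a stand-in for QTL regions found by a separate analysis).
    """
    selected = []
    for idx, (p, (chrom, pos)) in enumerate(zip(pvalues, positions)):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range at index {idx}: {p}")
        if -math.log10(p) <= threshold:
            continue  # does not pass the significance threshold
        if regions is not None and not any(
            chrom == r_chrom and start <= pos <= end
            for r_chrom, start, end in regions
        ):
            continue  # significant, but outside every candidate region
        selected.append(idx)
    return selected
```

Calling the function without `regions` mimics genome-wide selection; passing a list of regions mimics selection limited to previously identified QTL regions.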
Affiliation(s)
- Mohammad Al Kalaldeh
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - John Gibson
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - Naomi Duijvesteijn
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - Hans D Daetwyler
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia.,School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
| | - Iona MacLeod
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia
| | - Nasir Moghaddar
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
| | - Sang Hong Lee
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, 5000, Australia
| | - Julius H J van der Werf
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.,School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
|
33
|
Zhang Q, Sahana G, Su G, Guldbrandtsen B, Lund MS, Calus MPL. Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle. Genet Sel Evol 2018; 50:62. [PMID: 30458700 PMCID: PMC6247626 DOI: 10.1186/s12711-018-0432-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/12/2018] [Accepted: 11/14/2018] [Indexed: 11/05/2022]
Abstract
Background Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants, on the reliability of genomic prediction for fertility, health and longevity in dairy cattle. Results All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by −2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for 10% additional genetic variance of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%. Conclusions Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle.
Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV.
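The extraction of variants with a minor allele frequency below 0.05, as described in this abstract, can be illustrated with a short sketch. It assumes genotypes coded as 0/1/2 allele counts (imputed dosages would work the same way) with individuals in rows; the function name `rare_variant_indices` and the exclusion of monomorphic variants are assumptions made for the example, and restriction to genic variants is not shown.

```python
import numpy as np

def rare_variant_indices(genotypes, maf_max=0.05):
    """Return column indices of variants with 0 < MAF < maf_max.

    genotypes: array of allele counts (0, 1, 2) or dosages,
               shape (n_individuals, n_variants).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    freq = genotypes.mean(axis=0) / 2.0     # frequency of the counted allele
    maf = np.minimum(freq, 1.0 - freq)      # fold to the minor allele
    # Monomorphic variants (MAF == 0) are excluded in this sketch.
    return np.flatnonzero((maf > 0.0) & (maf < maf_max)).tolist()
```

The resulting indices define a separate marker set that could be fitted as its own component alongside a standard SNP panel.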
Affiliation(s)
- Qianqian Zhang
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark.,Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands.,Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Guosheng Su
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Bernt Guldbrandtsen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Mogens Sandø Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Mario P L Calus
- Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands
|
34
|
Raymond B, Bouwman AC, Wientjes YCJ, Schrooten C, Houwing-Duistermaat J, Veerkamp RF. Genomic prediction for numerically small breeds, using models with pre-selected and differentially weighted markers. Genet Sel Evol 2018; 50:49. [PMID: 30314431 PMCID: PMC6186145 DOI: 10.1186/s12711-018-0419-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 06/20/2018] [Accepted: 10/01/2018] [Indexed: 01/22/2023]
Abstract
BACKGROUND Genomic prediction (GP) accuracy in numerically small breeds is limited by the small size of the reference population. Our objective was to test a multi-breed multiple genomic relationship matrices (GRM) GP model (MBMG) that weighs pre-selected markers separately, uses the remaining markers to explain the remaining genetic variance that can be explained by markers, and weighs information of breeds in the reference population by their genetic correlation with the validation breed. METHODS Genotype and phenotype data were used on 595 Jersey bulls from New Zealand and 5503 Holstein bulls from the Netherlands, all with deregressed proofs for stature. Different sets of markers were used, containing either pre-selected markers from a meta-genome-wide association analysis on stature, remaining markers or both. We implemented a multi-breed bivariate GREML model in which we fitted either a single multi-breed GRM (MBSG), or two distinct multi-breed GRM (MBMG), one made with pre-selected markers and the other with remaining markers. Accuracies of predicting stature for Jersey individuals using the multi-breed models (Holstein and Jersey combined reference population) were compared to those obtained using either the Jersey (within-breed) or Holstein (across-breed) reference population. All the models were subsequently fitted in the analysis of simulated phenotypes, with a simulated genetic correlation between breeds of 1, 0.5, and 0.25. RESULTS The MBMG model always gave better prediction accuracies for stature compared to MBSG, within-, and across-breed GP models. For example, with MBSG, accuracies obtained by fitting 48,912 unselected markers (0.43), 357 pre-selected markers (0.38) or a combination of both (0.43), were lower than accuracies obtained by fitting pre-selected and unselected markers in separate GRM in MBMG (0.49).
This improvement was further confirmed by results from a simulation study, with MBMG performing on average 23% better than MBSG with all markers fitted. CONCLUSIONS With the MBMG model, it is possible to use information from numerically large breeds to improve prediction accuracy of numerically small breeds. The superiority of MBMG is mainly due to its ability to use information on pre-selected markers, explain the remaining genetic variance and weigh information from a different breed by the genetic correlation between breeds.
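The idea of fitting pre-selected and remaining markers in separate GRM can be made concrete with a small sketch. This is not the multi-breed GRM used in the study: it assumes the commonly used VanRaden-type construction G = ZZ'/(2Σp(1−p)) with allele frequencies estimated from the data, and the names `vanraden_grm` and `split_grms` are hypothetical.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Genomic relationship matrix from 0/1/2 genotypes (VanRaden-type).

    genotypes: array of shape (n_individuals, n_markers).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    p = genotypes.mean(axis=0) / 2.0          # allele frequency per marker
    Z = genotypes - 2.0 * p                   # centre each marker
    denom = 2.0 * np.sum(p * (1.0 - p))       # scales G towards A-like units
    return Z @ Z.T / denom

def split_grms(genotypes, selected_idx):
    """Build one GRM from pre-selected markers and one from the rest."""
    genotypes = np.asarray(genotypes, dtype=float)
    mask = np.zeros(genotypes.shape[1], dtype=bool)
    mask[list(selected_idx)] = True
    return vanraden_grm(genotypes[:, mask]), vanraden_grm(genotypes[:, ~mask])
```

The two matrices returned by `split_grms` would then enter a mixed model as two random genetic components, each with its own variance, which is what allows the pre-selected markers to be weighted differently from the rest.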
Affiliation(s)
- Biaty Raymond
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Biometris, Wageningen University and Research, 6700 AA Wageningen, The Netherlands
| | - Aniek C. Bouwman
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
| | - Yvonne C. J. Wientjes
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
| | | | - Jeanine Houwing-Duistermaat
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, 2333 ZC Leiden, The Netherlands
- School of Mathematics, Faculty of Mathematics and Physical Sciences, University of Leeds, Leeds, LS2 9JT UK
| | - Roel F. Veerkamp
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
|
35
|
Raymond B, Bouwman AC, Schrooten C, Houwing-Duistermaat J, Veerkamp RF. Utility of whole-genome sequence data for across-breed genomic prediction. Genet Sel Evol 2018; 50:27. [PMID: 29776327 PMCID: PMC5960108 DOI: 10.1186/s12711-018-0396-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 11/15/2017] [Accepted: 05/04/2018] [Indexed: 11/24/2022]
Abstract
Background Genomic prediction (GP) across breeds has so far resulted in low accuracies of the predicted genomic breeding values. Our objective was to evaluate whether using whole-genome sequence (WGS) instead of low-density markers can improve GP across breeds, especially when markers are pre-selected from a genome-wide association study (GWAS), and to test our hypothesis that many non-causal markers in WGS data have a diluting effect on accuracy of across-breed prediction. Methods Estimated breeding values for stature and bovine high-density (HD) genotypes were available for 595 Jersey bulls from New Zealand, 957 Holstein bulls from New Zealand and 5553 Holstein bulls from the Netherlands. BovineHD genotypes for all bulls were imputed to WGS using Beagle4 and Minimac2. Genomic prediction across the three populations was performed with ASReml4, with each population used as single reference and as single validation sets. In addition to the 50k, HD and WGS, markers that were significantly associated with stature in a large meta-GWAS analysis were selected and used for prediction, resulting in 10 prediction scenarios. Furthermore, we estimated the proportion of genetic variance captured by markers in each scenario. Results Across breeds, 50k, HD and WGS markers resulted in very low accuracies of prediction ranging from −0.04 to 0.13. Accuracies were higher in scenarios with pre-selected markers from a meta-GWAS. For example, using only the 133 most significant markers in 133 QTL regions from the meta-GWAS yielded accuracies ranging from 0.08 to 0.23, while 23,125 markers with a −log10(p) higher than 7 resulted in accuracies of up to 0.35. Using WGS data did not significantly improve the proportion of genetic variance captured across breeds compared to scenarios with few but pre-selected markers.
Conclusions Our results demonstrated that the accuracy of across-breed GP can be improved by using markers that are pre-selected from WGS based on their potential causal effect. We also showed that simply increasing the number of markers up to the WGS level does not increase the accuracy of across-breed prediction, even when markers that are expected to have a causal effect are included.
Affiliation(s)
- Biaty Raymond
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands.,Biometris, Wageningen University and Research, 6700 AA, Wageningen, The Netherlands
| | - Aniek C Bouwman
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
| | | | - Jeanine Houwing-Duistermaat
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, 2333 ZC, Leiden, The Netherlands.,School of Mathematics, University of Leeds, Leeds, LS2 9JT, UK
| | - Roel F Veerkamp
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
|
36
|
Calus MPL, Goddard ME, Wientjes YCJ, Bowman PJ, Hayes BJ. Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection. J Dairy Sci 2018; 101:4279-4294. [PMID: 29550121 DOI: 10.3168/jds.2017-13366] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/20/2017] [Accepted: 01/04/2018] [Indexed: 11/19/2022]
Abstract
Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. 
The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, is likely the result of the observed admixture between both breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.
Affiliation(s)
- M P L Calus
- Wageningen University & Research, Animal Breeding and Genomics, PO Box 338, 6700 AH Wageningen, the Netherlands.
| | - M E Goddard
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Research, Department of Economic Development, Jobs, Transport and Resources, Melbourne, Victoria 3083, Australia
| | - Y C J Wientjes
- Wageningen University & Research, Animal Breeding and Genomics, PO Box 338, 6700 AH Wageningen, the Netherlands
| | - P J Bowman
- Agriculture Research, Department of Economic Development, Jobs, Transport and Resources, Melbourne, Victoria 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia
| | - B J Hayes
- School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, St. Lucia, Queensland 4072, Australia
|
37
|
Jardim JG, Guldbrandtsen B, Lund MS, Sahana G. Association analysis for udder index and milking speed with imputed whole-genome sequence variants in Nordic Holstein cattle. J Dairy Sci 2017; 101:2199-2212. [PMID: 29274975 DOI: 10.3168/jds.2017-12982] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/04/2017] [Accepted: 10/30/2017] [Indexed: 12/26/2022]
Abstract
Genome-wide association testing facilitates the identification of genetic variants associated with complex traits. Mapping genes that promote genetic resistance to mastitis could reduce the cost of antibiotic use and enhance animal welfare and milk production by improving outcomes of breeding for udder health. Using imputed whole-genome sequence variants, we carried out association studies for 2 traits related to udder health, udder index, and milking speed in Nordic Holstein cattle. A total of 4,921 bulls genotyped with the BovineSNP50 BeadChip array were imputed to high-density genotypes (Illumina BovineHD BeadChip, Illumina, San Diego, CA) and, subsequently, to whole-genome sequence variants. An association analysis was carried out using a linear mixed model. Phenotypes used in the association analyses were deregressed breeding values. Multitrait meta-analysis was carried out for these 2 traits. We identified 10 and 8 chromosomes harboring markers that were significantly associated with udder index and milking speed, respectively. Strongest association signals were observed on chromosome 20 for udder index and chromosome 19 for milking speed. Multitrait meta-analysis identified 13 chromosomes harboring associated markers for the combination of udder index and milking speed. The associated region on chromosome 20 overlapped with earlier reported quantitative trait loci for similar traits in other cattle populations. Moreover, this region was located close to the FYB gene, which is involved in platelet activation and controls IL-2 expression; FYB is a strong candidate gene for udder health and worthy of further investigation.
Affiliation(s)
- Júlia Gazzoni Jardim
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark; Laboratory of Reproduction and Animal Breeding, State University of North Fluminense Darcy Ribeiro, Av. Alberto Lamego, 2000 Parque California, Campos dos Goytacazes, RJ, 28013-602, Brazil
| | - Bernt Guldbrandtsen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
| | - Mogens Sandø Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
| | - Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark.
|
38
|
van den Berg I, Bowman PJ, MacLeod IM, Hayes BJ, Wang T, Bolormaa S, Goddard ME. Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genet Sel Evol 2017; 49:70. [PMID: 28934948 PMCID: PMC5609075 DOI: 10.1186/s12711-017-0347-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 04/04/2017] [Accepted: 09/13/2017] [Indexed: 11/26/2022]
Abstract
Background The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows. Results With the simulated dataset, we found that prediction accuracy was highly increased for a breed that was not represented in the training population for sequence data compared to HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. First, dropping variants from each chromosome and reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs. 
Conclusions We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.
Affiliation(s)
- Irene van den Berg: Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia
- Phil J Bowman: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
- Iona M MacLeod: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Ben J Hayes: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, St Lucia, QLD, Australia
- Tingting Wang: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Sunduimijid Bolormaa: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Mike E Goddard: Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
|
39
|
Selecting sequence variants to improve genomic predictions for dairy cattle. Genet Sel Evol 2017; 49:32. [PMID: 28270096 PMCID: PMC5339980 DOI: 10.1186/s12711-017-0307-4] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 09/27/2016] [Accepted: 02/27/2017] [Indexed: 01/26/2023]
Abstract
Background Millions of genetic variants have been identified by population-scale sequencing projects, but subsets of these variants are needed for routine genomic predictions or genotyping arrays. Methods for selecting sequence variants were compared using simulated sequence genotypes and real July 2015 data from the 1000 Bull Genomes Project.
Methods Candidate sequence variants for 444 Holstein animals were combined with high-density (HD) imputed genotypes for 26,970 progeny-tested Holstein bulls. Test 1 included single nucleotide polymorphisms (SNPs) for 481,904 candidate sequence variants. Test 2 also included 249,966 insertions-deletions (InDels). After merging sequence variants with 312,614 HD SNPs and editing steps, Tests 1 and 2 included 762,588 and 1,003,453 variants, respectively. Imputation quality from findhap software was assessed with 404 of the sequenced animals in the reference population and 40 randomly chosen animals for validation. Their sequence genotypes were reduced to the subset of genotypes that were in common with HD genotypes and then imputed back to sequence. Predictions were tested for 33 traits using 2015 data of 3983 US validation bulls with daughters that were first phenotyped after August 2011.
Results The average percentage of correctly imputed variants across all chromosomes was 97.2 for Test 1 and 97.0 for Test 2. Total time required to prepare, edit, impute, and estimate the effects of sequence variants for 27,235 bulls was about 1 week using fewer than 33 threads. Many sequence variants had larger estimated effects than nearby HD SNPs, but prediction reliability improved only by 0.6 percentage points in Test 1 when sequence SNPs were added to HD SNPs and by 0.4 percentage points in Test 2 when sequence SNPs and InDels were included. However, selecting the 16,648 candidate SNPs with the largest estimated effects and adding them to the 60,671 SNPs used in routine evaluations improved reliabilities by 2.7 percentage points.
Conclusions Reliabilities for genomic predictions improved when selected sequence variants were added; gains were similar for simulated and real data for the same population, and larger than previous gains obtained by adding HD SNPs. With many genotyped animals, many data sources, and millions of variants, computing strategies must efficiently balance costs of imputation, selection, and prediction to obtain subsets of markers that provide the highest accuracy.
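The imputation check described in this abstract (mask sequence genotypes down to the HD subset, impute back, and count correctly imputed variants) can be illustrated with a minimal sketch. The function name, the 0/1/2 allele-count coding, and the toy data are assumptions made for illustration; this is not the findhap workflow or the authors' code.

```python
def percent_concordance(true_genotypes, imputed_genotypes):
    """Percentage of genotype calls where the imputed value equals the true one.

    Both arguments are lists of per-animal lists of allele counts (0, 1 or 2).
    A true genotype of None is treated as missing and skipped (an assumption).
    """
    matches = 0
    compared = 0
    for true_row, imputed_row in zip(true_genotypes, imputed_genotypes):
        for true_call, imputed_call in zip(true_row, imputed_row):
            if true_call is None:
                continue  # no truth available for this call
            compared += 1
            if true_call == imputed_call:
                matches += 1
    if compared == 0:
        raise ValueError("no genotypes to compare")
    return 100.0 * matches / compared


# Hypothetical validation animals: true sequence calls versus calls imputed
# back from the reduced (HD-only) genotypes.
true_calls = [[0, 1, 2, 1], [2, 2, None, 0]]
imputed_calls = [[0, 1, 1, 1], [2, 2, 0, 0]]
concordance = percent_concordance(true_calls, imputed_calls)
print(f"{concordance:.1f}% of compared genotypes imputed correctly")
```

A raw match rate like this is dominated by common variants, so a study may also report per-variant or correlation-based measures; which measure underlies the 97.2 and 97.0 figures is not stated in the abstract.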
|
40
|
Evaluating Sequence-Based Genomic Prediction with an Efficient New Simulator. Genetics 2016; 205:939-953. [PMID: 27913617 DOI: 10.1534/genetics.116.194878] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 08/17/2016] [Accepted: 11/23/2016] [Indexed: 01/28/2023]
Abstract
The vast amount of sequence data generated to analyze complex traits is posing new challenges in terms of the analysis and interpretation of the results. Although simulation is a fundamental tool to investigate the reliability of genomic analyses and to optimize experimental design, existing software cannot realistically simulate complete genomes. To remedy this, we have developed a new strategy (Sequence-Based Virtual Breeding, SBVB) that uses real sequence data and simulates new offspring genomes and phenotypes in a very efficient and flexible manner. Using this tool, we studied the efficiency of full sequence in genomic prediction compared to SNP arrays. We used real porcine sequences from three breeds as founder genomes of a 2500-animal pedigree and two genetic architectures: "neutral" and "selective." In the neutral architecture, frequencies and allele effects were sampled independently whereas, in the selective case, SNPs were sites putatively under selection after domestication and a negative correlation between effect and frequency was induced. We compared the effectiveness of different genotyping strategies for genomic selection, including the use of full sequence, commercial arrays, or randomly chosen SNP sets in both outbred and crossbred experimental designs. We found that accuracy increases using sequence instead of commercial chips but modestly, perhaps by ≤ 4%. This result was robust to extreme genetic architectures. We conclude that full sequence is unlikely to offset commercial arrays for predicting genetic value when the number of loci is relatively large and the prior given to each SNP is uniform. Using sequence to improve selection thus requires optimized prior information and, likely, increased population sizes. The code and manual for SBVB are available at https://github.com/mperezenciso/sbvb0.
|
41
|
Veerkamp RF, Bouwman AC, Schrooten C, Calus MPL. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle. Genet Sel Evol 2016; 48:95. [PMID: 27905878 PMCID: PMC5134274 DOI: 10.1186/s12711-016-0274-1] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 07/14/2016] [Accepted: 11/24/2016] [Indexed: 11/10/2022]
Abstract
Background Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data.
Methods Phenotypes were available for 5503 Holstein–Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 Bull Genomes Project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants.
Results The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased.
Conclusions Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
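For readers unfamiliar with the genomic relationship matrices (GRM) mentioned in this abstract, the sketch below builds one from allele counts using a common definition in which each variant is centred and scaled by its allele frequency. The function name, the toy genotypes, and the choice of this particular scaling are assumptions for illustration; the abstract does not state the exact GRM settings used in the study.

```python
def genomic_relationship_matrix(genotypes):
    """Build a GRM from allele counts (0, 1, 2), one row per animal.

    Entry (a, b) is the average over variants of
    (x_a - 2p)(x_b - 2p) / (2p(1 - p)), with p the allele frequency
    estimated from the same animals. Variants with p equal to 0 or 1
    carry no information and are skipped.
    """
    n_animals = len(genotypes)
    n_variants = len(genotypes[0])
    freqs = [
        sum(row[j] for row in genotypes) / (2.0 * n_animals)
        for j in range(n_variants)
    ]
    used = [j for j, p in enumerate(freqs) if 0.0 < p < 1.0]
    if not used:
        raise ValueError("no segregating variants")

    grm = [[0.0] * n_animals for _ in range(n_animals)]
    for a in range(n_animals):
        for b in range(a, n_animals):
            total = 0.0
            for j in used:
                p = freqs[j]
                total += (
                    (genotypes[a][j] - 2.0 * p)
                    * (genotypes[b][j] - 2.0 * p)
                    / (2.0 * p * (1.0 - p))
                )
            grm[a][b] = grm[b][a] = total / len(used)
    return grm


# Hypothetical genotypes for three animals at three variants.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
grm = genomic_relationship_matrix(toy_genotypes)
for row in grm:
    print([round(value, 3) for value in row])
```

Fitting two GRM together, as the abstract describes, would amount to building one matrix from the selected variants and another from the remaining variants and including both as random effects in a mixed model; that model-fitting step is outside this sketch.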
Affiliation(s)
- Roel F Veerkamp: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands; Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, P.O. Box 5003, 1432, Ås, Norway
- Aniek C Bouwman: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Mario P L Calus: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
|