1. Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-Gónzalez J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. Molecular Plant 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size, while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area, and accuracy is as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
Affiliation(s)
- Admas Alemu
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Johanna Åstrand
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-Gónzalez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
  - International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Aakash Chawade
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden

2. Salvian M, Moreira GCM, Silveira RMF, Reis ÂP, Dias D'auria B, Pilonetto F, Gervásio IC, Ledur MC, Coutinho LL, Spangler ML, Mourão GB. Estimation of breeding values using different densities of SNP to inform kinship in broiler chickens. Livest Sci 2022. [DOI: 10.1016/j.livsci.2022.105124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

3. Ros-Freixedes R, Johnsson M, Whalen A, Chen CY, Valente BD, Herring WO, Gorjanc G, Hickey JM. Genomic prediction with whole-genome sequence data in intensely selected pig lines. Genetics Selection Evolution 2022; 54:65. [PMID: 36153511 PMCID: PMC9509613 DOI: 10.1186/s12711-022-00756-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/28/2022] [Accepted: 09/05/2022] [Indexed: 12/03/2022]
Abstract
Background: Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage.
Methods: We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests.
Results: The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected.
Conclusions: Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12711-022-00756-0.

4. Marina H, Pelayo R, Gutiérrez-Gil B, Suárez-Vega A, Esteban-Blanco C, Reverter A, Arranz JJ. Low-density SNP panel for efficient imputation and genomic selection of milk production and technological traits in dairy sheep. J Dairy Sci 2022; 105:8199-8217. [PMID: 36028350 DOI: 10.3168/jds.2021-21601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 05/30/2022] [Indexed: 11/19/2022]
Abstract
The present study aimed to ascertain how different strategies for leveraging genomic information enhance the accuracy of estimated breeding values for milk and cheese-making traits and to evaluate the implementation of a low-density (LowD) SNP chip designed explicitly for that aim. Thus, milk samples from a total of 2,020 dairy ewes from 2 breeds (1,039 Spanish Assaf and 981 Churra) were collected and analyzed to determine 3 milk production and composition traits and 2 traits related to milk coagulation properties and cheese yield. The 2 studied populations were genotyped with a customized 50K Affymetrix SNP chip (Affymetrix Inc.) containing 55,627 SNP markers. The prediction accuracies were obtained using different multitrait methodologies, such as the BLUP model based on pedigree information, the genomic BLUP (GBLUP), and the BLUP at the SNP level (SNP-BLUP), which are based on genotypic data, and the single-step GBLUP (ssGBLUP), which combines both sources of information. All of these methods were analyzed by cross-validation, comparing predictions of the whole population with the test population sets. Additionally, we describe the design of a LowD SNP chip (3K) and its prediction accuracies through the different methods mentioned previously. Furthermore, the results obtained using the LowD SNP chip were compared with those based on the 50K SNP chip data sets. Finally, we conclude that implementing genomic selection through the ssGBLUP model in the current breeding programs would increase the accuracy of the estimated breeding values compared with the BLUP methodology in the Assaf (from 0.19 to 0.39) and Churra (from 0.27 to 0.44) dairy sheep populations. The LowD SNP chip is cost-effective and has proven to be an accurate tool for estimating genomic breeding values for milk and cheese-making traits, microsatellite imputation, and parentage verification. 
The results presented here suggest that the routine use of this LowD SNP chip could potentially increase the genetic gains of the breeding selection programs of the 2 Spanish dairy sheep breeds considered here.
Affiliation(s)
- H Marina
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- R Pelayo
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- B Gutiérrez-Gil
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- A Suárez-Vega
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- C Esteban-Blanco
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain
- A Reverter
  - CSIRO Agriculture & Food, 306 Carmody Rd., St. Lucia, Brisbane, QLD 4067, Australia
- J J Arranz
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León 24071, Spain

5. Wang Z, Zhang Z, Chen Z, Sun J, Cao C, Wu F, Xu Z, Zhao W, Sun H, Guo L, Zhang Z, Wang Q, Pan Y. PHARP: a pig haplotype reference panel for genotype imputation. Sci Rep 2022; 12:12645. [PMID: 35879321 PMCID: PMC9314402 DOI: 10.1038/s41598-022-15851-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Accepted: 06/30/2022] [Indexed: 11/18/2022]
Abstract
Pigs are not only a major meat source worldwide but are also commonly used as an animal model for studying human complex traits. Large haplotype reference panels have been used to facilitate efficient phasing and imputation of relatively sparse genome-wide microarray chips and low-coverage sequencing data. Using the imputed genotypes in downstream analyses, such as GWASs, TWASs, eQTL mapping and genomic selection (GS), is beneficial for obtaining novel findings. However, there is still a lack of publicly available, high-quality pig reference panels with large sample sizes and high diversity, which greatly limits the application of genotype imputation in pigs. In response, we built the pig Haplotype Reference Panel (PHARP) database. PHARP provides a reference panel of 2012 pig haplotypes at 34 million SNPs constructed using whole-genome sequence data from more than 49 studies of 71 pig breeds. It also provides Web-based analytical tools that allow researchers to carry out phasing and imputation consistently and efficiently. PHARP is freely accessible at http://alphaindex.zju.edu.cn/PHARP/index.php. We demonstrate its applicability for pig commercial 50K SNP arrays by accurately imputing 2.6 billion genotypes at a concordance rate of 0.971 in 81 Large White pigs (~17× sequencing coverage). We also applied our reference panel to impute low-density SNP chip data to high density for three GWASs and found novel significantly associated SNPs that might be causal variants.
Affiliation(s)
- Zhen Wang
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Zhenyang Zhang
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Zitao Chen
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Jiabao Sun
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Caiyun Cao
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Fen Wu
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Zhong Xu
  - Hubei Key Laboratory of Animal Embryo and Molecular Breeding, Institute of Animal Husbandry and Veterinary, Hubei Provincial Academy of Agricultural Sciences, Wuhan, 430064, China
- Wei Zhao
  - Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Hao Sun
  - Department of Animal Science, School of Animal Science, Jilin University, Changchun, 130062, China
- Longyu Guo
  - Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhe Zhang
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Qishan Wang
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yuchun Pan
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China

6. Marcos S, Parejo M, Estonba A, Alberdi A. Recovering High-Quality Host Genomes from Gut Metagenomic Data through Genotype Imputation. Advanced Genetics (Hoboken, N.J.) 2022; 3:2100065. [PMID: 36620197 PMCID: PMC9744478 DOI: 10.1002/ggn2.202100065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2021] [Revised: 03/05/2022] [Indexed: 01/11/2023]
Abstract
Metagenomic datasets of host-associated microbial communities often contain host DNA that is usually discarded because the amount of data is too low for accurate host genetic analyses. However, genotype imputation can be employed to reconstruct host genotypes if a reference panel is available. Here, a two-step strategy is tested for imputing genotypes, from four types of reference panels built using different strategies, into low-depth host genome data (≈2× coverage) recovered from intestinal samples of two chicken genetic lines. First, imputation accuracy is evaluated in 12 samples for which both low- and high-depth sequencing data are available, obtaining high imputation accuracies for all tested panels (>0.90). Second, the impact of reference panel choice on population genetics statistics is assessed in 100 chickens, with all four panels yielding comparable results. In light of these observations, the feasibility and application of the imputation strategy are discussed for different species with regard to the host DNA proportion, genomic diversity, and availability of a reference panel. This method makes it possible to use hitherto discarded host DNA to gain insights into the genetic structure of host populations and, in doing so, facilitates the implementation of hologenomic approaches that jointly analyze host and microbial genomic data.
Affiliation(s)
- Sofia Marcos
  - Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao 48940, Spain
- Melanie Parejo
  - Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao 48940, Spain
- Andone Estonba
  - Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao 48940, Spain
- Antton Alberdi
  - Center for Evolutionary Hologenomics, GLOBE Institute, University of Copenhagen, Copenhagen 1353, Denmark

7. Shen F, Bianco L, Wu B, Tian Z, Wang Y, Wu T, Xu X, Han Z, Velasco R, Fontana P, Zhang X. A bulked segregant analysis tool for out-crossing species (BSATOS) and QTL-based genomics-assisted prediction of complex traits in apple. J Adv Res 2022; 42:149-162. [PMID: 36513410 PMCID: PMC9788957 DOI: 10.1016/j.jare.2022.03.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/13/2022] [Revised: 03/06/2022] [Accepted: 03/22/2022] [Indexed: 12/27/2022]
Abstract
Introduction: Genomic heterozygosity, self-incompatibility, and abundant somatic mutations hinder the molecular breeding efficiency of outcrossing plants.
Objectives: We attempted to develop an efficient integrated strategy to identify quantitative trait loci (QTLs) and trait-associated genes, to develop gene markers, and to construct genomics-assisted prediction (GAP) models.
Methods: A novel protocol, bulked segregant analysis tool for out-crossing species (BSATOS), is presented here, which is characterized by taking full advantage of all segregation patterns (including AB × AB markers) and haplotype information. To verify the effectiveness of the protocol in dealing with the complex traits of outbreeding species, three apple cross populations with 9,654 individuals were adopted.
Results: By using BSATOS, 90, 60, and 77 significant QTLs were identified successfully and candidate genes were predicted for apple fruit weight (FW), fruit ripening date (FRD), and fruit soluble solid content (SSC), respectively. The gene-based markers were developed and genotyped for 1,396 individuals in a training population, including 145 Malus accessions and 1,251 F1 plants of the three full-sib families. GAP models were trained using marker genotype effect estimates of the training population. The prediction accuracy was 0.7658, 0.6455, and 0.3758 for FW, FRD, and SSC, respectively.
Conclusion: The BSATOS and GAP models provided a convenient and efficient methodology for candidate gene mining and molecular breeding in out-crossing plant species. The BSATOS pipeline can be freely downloaded from: https://github.com/maypoleflyn/BSATOS.
Affiliation(s)
- Fei Shen
  - College of Horticulture, China Agricultural University, Beijing 100193, China; Research and Innovation Center, Edmund Mach Foundation, 38010 S. Michele all’Adige, Italy; Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- Luca Bianco
  - Research and Innovation Center, Edmund Mach Foundation, 38010 S. Michele all’Adige, Italy
- Bei Wu
  - College of Horticulture, China Agricultural University, Beijing 100193, China
- Zhendong Tian
  - College of Horticulture, China Agricultural University, Beijing 100193, China
- Yi Wang
  - College of Horticulture, China Agricultural University, Beijing 100193, China
- Ting Wu
  - College of Horticulture, China Agricultural University, Beijing 100193, China
- Xuefeng Xu
  - College of Horticulture, China Agricultural University, Beijing 100193, China
- Zhenhai Han (corresponding author)
  - College of Horticulture, China Agricultural University, Beijing 100193, China
- Riccardo Velasco
  - Research Centre for Viticulture and Enology, CREA, Conegliano, Italy
- Paolo Fontana (corresponding author)
  - Research and Innovation Center, Edmund Mach Foundation, 38010 S. Michele all’Adige, Italy
- Xinzhong Zhang (corresponding author)
  - College of Horticulture, China Agricultural University, Beijing 100193, China

8. Cheruiyot EK, Haile-Mariam M, Cocks BG, MacLeod IM, Mrode R, Pryce JE. Functionally prioritised whole-genome sequence variants improve the accuracy of genomic prediction for heat tolerance. Genet Sel Evol 2022; 54:17. [PMID: 35183109 PMCID: PMC8858496 DOI: 10.1186/s12711-022-00708-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/04/2021] [Accepted: 02/03/2022] [Indexed: 11/10/2022]
Abstract
Background: Heat tolerance is a trait of economic importance in the context of warm climates and the effects of global warming on livestock production, reproduction, health, and well-being. This study investigated the improvement in prediction accuracy for heat tolerance when selected sets of sequence variants from a large genome-wide association study (GWAS) were combined with a standard 50k single nucleotide polymorphism (SNP) panel used by the dairy industry.
Methods: Over 40,000 dairy cattle with genotype and phenotype data were analysed. The phenotypes used to measure an individual's heat tolerance were defined as the rate of decline in milk production traits with rising temperature and humidity. We used Holstein and Jersey cows to select sequence variants linked to heat tolerance. The prioritised sequence variants were the most significant SNPs passing a GWAS p-value threshold selected based on sliding 100-kb windows along each chromosome. We used a bull reference set to develop the genomic prediction equations, which were then validated in an independent set of Holstein, Jersey, and crossbred cows. Prediction analyses were performed using the BayesR, BayesRC, and GBLUP methods.
Results: The accuracy of genomic prediction for heat tolerance improved by up to 0.07, 0.05, and 0.10 units in Holstein, Jersey, and crossbred cows, respectively, when sets of selected sequence markers from Holstein cows were added to the 50k SNP panel. However, in some scenarios, the prediction accuracy decreased unexpectedly, with the largest drop of -0.10 units for the heat tolerance fat yield trait observed in Jersey cows when 50k plus pre-selected SNPs from Holstein cows were used. Using pre-selected SNPs discovered on a combined set of Holstein and Jersey cows generally improved the accuracy, especially in the Jersey validation. In addition, combining Holstein and Jersey bulls in the reference set generally improved prediction accuracy in most scenarios compared to using only Holstein bulls as the reference set.
Conclusions: Informative sequence markers can be prioritised to improve the genomic prediction of heat tolerance in different breeds. In addition to providing biological insight, these variants could also have a direct application for developing customized SNP arrays or can be used via imputation in current industry SNP panels.
Affiliation(s)
- Evans K Cheruiyot
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia; Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Mekonnen Haile-Mariam
  - Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Benjamin G Cocks
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia; Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Iona M MacLeod
  - Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Raphael Mrode
  - International Livestock Research Institute, Nairobi, Kenya; Scotland's Rural College, Edinburgh, UK
- Jennie E Pryce
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia; Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia

9. Deng T, Zhang P, Garrick D, Gao H, Wang L, Zhao F. Comparison of Genotype Imputation for SNP Array and Low-Coverage Whole-Genome Sequencing Data. Front Genet 2022; 12:704118. [PMID: 35046990 PMCID: PMC8762119 DOI: 10.3389/fgene.2021.704118] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/01/2021] [Accepted: 12/03/2021] [Indexed: 11/17/2022]
Abstract
Genotype imputation is the process of inferring unobserved genotypes in a sample of individuals. It is a key step prior to a genome-wide association study (GWAS) or genomic prediction, and imputation accuracy directly influences the results of subsequent analyses. In this simulation-based study, we investigate the accuracy of genotype imputation in relation to several factors characterizing SNP chip or low-coverage whole-genome sequencing (LCWGS) data. The factors included the imputation reference population size, the proportion of target markers or SNP density, the genetic relationship (distance) between the target population and the reference population, and the imputation method. Simulations of genotypes were based on coalescence theory accounting for the demographic history of pigs. A population of simulated founders diverged to produce four separate but related populations of descendants. The genomic data of 20,000 individuals were simulated for a 10-Mb chromosome fragment. Our results showed that the proportion of target markers or SNP density was the most critical factor affecting imputation accuracy under all imputation situations. Compared with Minimac4, Beagle5.1 produced more accurate imputed data in most cases, most notably when imputing from the LCWGS data. Compared with SNP chip data, LCWGS provided more accurate genotype imputation. Our findings provide relatively comprehensive insight into the accuracy of genotype imputation in a realistic population of domestic animals.
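As an illustration of how imputation accuracy of the kind discussed in this abstract is commonly quantified, the following minimal sketch computes two widely used measures on genotypes coded 0/1/2: the concordance rate (share of exactly matching calls) and the Pearson correlation between true and imputed genotypes. The function names and the toy genotype vectors are hypothetical and are not taken from the study, which may define accuracy differently.

```python
from math import sqrt


def concordance_rate(true_genotypes, imputed_genotypes):
    """Fraction of genotype calls (coded 0/1/2) that match exactly."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a monomorphic marker
    return cov / sqrt(var_x * var_y)


# Toy example (made-up data): eight genotype calls at one marker.
true_calls = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_calls = [0, 1, 2, 2, 0, 2, 1, 0]

print("concordance:", concordance_rate(true_calls, imputed_calls))
print("correlation:", round(pearson_r(true_calls, imputed_calls), 4))
```

Concordance is easy to interpret but is inflated at markers with a rare allele, which is one reason the correlation is often reported alongside it.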
Affiliation(s)
- Tianyu Deng
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Pengfei Zhang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Dorian Garrick
  - A. L. Rae Centre of Genetics and Breeding, Massey University, Hamilton, New Zealand
- Huijiang Gao
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Lixian Wang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Fuping Zhao
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China

10. Bedhane M, van der Werf J, de las Heras-Saldana S, Lim D, Park B, Na Park M, Seung Hee R, Clark S. The accuracy of genomic prediction for meat quality traits in Hanwoo cattle when using genotypes from different SNP densities and preselected variants from imputed whole genome sequence. Animal Production Science 2022. [DOI: 10.1071/an20659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Context
Genomic prediction is the use of genomic data to estimate genomic breeding values (GEBV) in animal breeding. In beef cattle breeding programs, genomic prediction increases the rate of genetic gain by increasing the accuracy of selection at earlier ages.
Aims
The objectives of the study were to examine the effect of single-nucleotide polymorphism (SNP) density and to evaluate the effect of using SNPs preselected from imputed whole-genome sequence for genomic prediction.
Methods
Genomic and phenotypic data from 2110 Hanwoo steers were used to predict GEBV for marbling score (MS), meat texture (MT), and meat colour (MC) traits. Three types of SNP densities including 50k, high-density (HD), and whole-genome sequence data and preselected SNPs from genome-wide association study (GWAS) were used for genomic prediction analyses. Two scenarios (independent and dependent discovery populations) were used to select top significant SNPs. The accuracy of GEBV was assessed using random cross-validation. Genomic best linear unbiased prediction (GBLUP) was used to predict the breeding values for each trait.
Key results
Our results showed very similar prediction accuracies across all SNP densities used in the study. The prediction accuracy among traits ranged from 0.29±0.05 for MC to 0.46±0.04 for MS. Depending on the trait, prediction accuracy improved by up to 5% when the preselected SNPs from the GWAS were included in the prediction analysis.
Conclusions
Higher SNP densities (HD and whole-genome sequence data) yielded prediction accuracies similar to those of the 50K panel in Hanwoo beef cattle. Therefore, the 50K SNP chip panel is sufficient to capture the relationships in a breed with a small effective population size such as the Hanwoo cattle population. Preselected variants improved prediction accuracy when they were included in the genomic prediction model.
Implications
The estimated genomic prediction accuracies are moderate in Hanwoo cattle, and searching for SNPs that are more predictive could increase the accuracy of estimated breeding values for the studied traits.
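The Methods of this abstract combine GBLUP with random cross-validation. A minimal sketch of that general workflow is given below, under stated assumptions: it uses a VanRaden-style genomic relationship matrix, a fixed heritability of 0.5 supplied by hand rather than estimated, and small simulated data. None of the names, parameters, or numbers come from the study, and the reported quantity is the plain correlation between predicted and observed phenotypes rather than the authors' exact accuracy measure.

```python
import numpy as np


def vanraden_g(genotypes):
    """Genomic relationship matrix from an (individuals x SNPs) matrix coded 0/1/2."""
    allele_freq = genotypes.mean(axis=0) / 2.0
    centered = genotypes - 2.0 * allele_freq
    scale = 2.0 * np.sum(allele_freq * (1.0 - allele_freq))
    return centered @ centered.T / scale


def gblup_predict(G, y, train_idx, test_idx, h2):
    """Predict test phenotypes from training records; h2 is an assumed heritability."""
    shrinkage = (1.0 - h2) / h2  # ratio of residual to genetic variance
    mean_train = y[train_idx].mean()
    G_train = G[np.ix_(train_idx, train_idx)]
    G_test_train = G[np.ix_(test_idx, train_idx)]
    weights = np.linalg.solve(
        G_train + shrinkage * np.eye(len(train_idx)), y[train_idx] - mean_train
    )
    return mean_train + G_test_train @ weights


# Simulated toy data (assumption: stands in for real genotypes and phenotypes).
rng = np.random.default_rng(0)
n_animals, n_snps = 200, 500
freqs = rng.uniform(0.1, 0.9, size=n_snps)
genotypes = rng.binomial(2, freqs, size=(n_animals, n_snps)).astype(float)
true_effects = rng.normal(size=n_snps)
genetic_values = (genotypes - genotypes.mean(axis=0)) @ true_effects
phenotypes = genetic_values + rng.normal(scale=genetic_values.std(), size=n_animals)

G = vanraden_g(genotypes)

# Random 5-fold cross-validation: each fold is predicted from the other four.
folds = np.array_split(rng.permutation(n_animals), 5)
fold_accuracies = []
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    predicted = gblup_predict(G, phenotypes, train_idx, test_idx, h2=0.5)
    fold_accuracies.append(np.corrcoef(predicted, phenotypes[test_idx])[0, 1])

print("mean correlation across folds:", float(np.mean(fold_accuracies)))
```

In practice the variance components would be estimated from the data (for example by REML) rather than fixed, and comparing SNP densities would amount to rebuilding `G` from each marker set and repeating the same folds.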

11. Zhang Z, Ma P, Zhang Z, Wang Z, Wang Q, Pan Y. The construction of a haplotype reference panel using extremely low coverage whole genome sequences and its application in genome-wide association studies and genomic prediction in Duroc pigs. Genomics 2021; 114:340-350. [PMID: 34929285 DOI: 10.1016/j.ygeno.2021.12.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/22/2021] [Revised: 10/11/2021] [Accepted: 12/15/2021] [Indexed: 12/30/2022]
Abstract
Extremely low coverage whole genome sequencing (lcWGS) is an economical technique to obtain high-density single nucleotide polymorphisms (SNPs). Here, we explored the feasibility of constructing a haplotype reference panel (lcHRP) using lcWGS and evaluated the effects of lcHRP through a genome-wide association study (GWAS) and genomic prediction in pigs. A total of 297 and 974 Duroc pigs were genotyped using lcWGS and a 50K SNP array, respectively. We obtained 19,306,498 SNPs using lcWGS with an accuracy of 0.984. With the help of lcHRP, the accuracy of imputation from the SNP array to lcWGS was 0.922. Compared to the SNP array findings, the imputation-based GWAS identified more signals across four traits. With the integration of the top 1% imputation-based GWAS findings as genomic features, the accuracies of genomic prediction were improved by 6.0% to 13.2%. This study showed the great potential of lcWGS in molecular breeding of pigs.
Affiliation(s)
- Zhe Zhang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China
- Peipei Ma
  - Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Zhenyang Zhang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China
- Zhen Wang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China
- Qishan Wang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China
- Yuchun Pan
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, PR China; Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
12
Vu NT, Phuc TH, Oanh KTP, Sang NV, Trang TT, Nguyen NH. Accuracies of genomic predictions for disease resistance of striped catfish to Edwardsiella ictaluri using artificial intelligence algorithms. G3-GENES GENOMES GENETICS 2021; 12:6408442. [PMID: 34788431 PMCID: PMC8727988 DOI: 10.1093/g3journal/jkab361] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/05/2021] [Accepted: 10/10/2021] [Indexed: 02/04/2023]
Abstract
Assessments of genomic prediction accuracies using artificial intelligence (AI) algorithms (i.e., machine and deep learning methods) are currently not available or very limited in aquaculture species. The principal aim of this study was to examine the predictive performance of these new methods for disease resistance to Edwardsiella ictaluri in a population of striped catfish Pangasianodon hypophthalmus and to make comparisons with four common methods, i.e., pedigree-based best linear unbiased prediction (PBLUP), genomic-based best linear unbiased prediction (GBLUP), single-step GBLUP (ssGBLUP) and a nonlinear Bayesian approach (notably BayesR). Our analyses using machine learning (i.e., ML-KAML) and deep learning (i.e., DL-MLP and DL-CNN) together with the four common methods (PBLUP, GBLUP, ssGBLUP, and BayesR) were conducted for two main disease resistance traits (i.e., survival status coded as 0 and 1 and survival time, i.e., days that the animals were still alive after the challenge test) in a pedigree consisting of 560 individual animals (490 offspring and 70 parents) genotyped for 14,154 single nucleotide polymorphisms (SNPs). The results using 6,470 SNPs after quality control showed that machine learning methods outperformed PBLUP, GBLUP, and ssGBLUP, with increases in the prediction accuracies for both traits of 9.1–15.4%. However, the prediction accuracies obtained from machine learning methods were comparable to those estimated using BayesR. Imputation of missing genotypes using AlphaFamImpute increased the prediction accuracies by 5.3–19.2% in all the methods and data used. On the other hand, there were insignificant decreases (0.3–5.6%) in the prediction accuracies for both survival status and survival time when multivariate models were used in comparison to univariate analyses.
Interestingly, the genomic prediction accuracies based on only highly significant SNPs (P < 0.00001, 318–400 SNPs for survival status and 1,362–1,589 SNPs for survival time) were somewhat lower (0.3–15.6%) than those obtained from the whole set of 6,470 SNPs. In most of our analyses, the accuracies of genomic prediction were somewhat higher for survival time than survival status (0/1 data). It is concluded that although there are prospects for the application of genomic selection to increase disease resistance to E. ictaluri in striped catfish breeding programs, further evaluation of these methods should be made in independent families/populations when more data are accumulated in future generations to avoid possible biases in the genetic parameters estimates and prediction accuracies for the disease-resistant traits studied in this population of striped catfish P. hypophthalmus.
Affiliation(s)
- Nguyen Thanh Vu: School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
- Tran Huu Phuc: Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
- Kim Thi Phuong Oanh: Institute of Genome Research, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Nguyen Van Sang: Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
- Trinh Thi Trang: School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Vietnam National University of Agriculture, Gia Lam 131000, Vietnam
- Nguyen Hong Nguyen: School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia; Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia
13
Marina H, Chitneedi P, Pelayo R, Suárez-Vega A, Esteban-Blanco C, Gutiérrez-Gil B, Arranz JJ. Study on the concordance between different SNP-genotyping platforms in sheep. Anim Genet 2021; 52:868-880. [PMID: 34515357 DOI: 10.1111/age.13139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 08/28/2021] [Indexed: 12/12/2022]
Abstract
Different SNP genotyping technologies are commonly used in multiple studies to perform QTL detection, genotype imputation, and genomic predictions. Therefore, genotyping errors cannot be ignored, as they can reduce the accuracy of procedures applied in genomic selection, such as genotype imputation and genomic prediction, and can produce false-positive results in genome-wide association studies. Currently, whole-genome resequencing (WGR) also offers the potential for variant calling analysis and high-throughput genotyping. WGR might overshadow array-based genotyping technologies due to the larger amount and precision of the genomic information provided; however, its comparatively higher price per individual still limits its use in larger populations. Thus, the objective of this work was to evaluate the accuracy of the two most popular SNP-chip technologies, namely, Affymetrix and Illumina, for high-throughput genotyping in sheep considering high-coverage WGR datasets as references. Analyses were performed using two reference sheep genome assemblies, the popular Oar_v3.1 reference genome and the latest available version Oar_rambouillet_v1.0. Our results suggest that the genotypes from both platforms have high concordance rates with the genotypes determined from reference WGR datasets (96.59% and 99.51% for Affymetrix and Illumina technologies, respectively). The concordance results provided in the current study can pinpoint poorly reproducible markers across multiple platforms used for sheep genotyping data. Comparing results using two reference genome assemblies also informs how genome assembly quality can influence genotype concordance rates among different genotyping platforms. Moreover, we describe an efficient pipeline to test the reliability of markers included in sheep SNP-chip panels against WGR datasets available on public databases.
This pipeline may be helpful for discarding low-reliability markers before exploiting genomic information for gene mapping analyses or genomic prediction.
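As a rough illustration of the concordance rate reported in this abstract, the sketch below computes the share of markers whose SNP-chip call matches the WGR call. It is not the authors' pipeline. The function name `concordance_rate`, the 0/1/2 allele-count coding, and the use of `None` for a missing call are all assumptions made for the example.

```python
from typing import Optional, Sequence


def concordance_rate(
    chip: Sequence[Optional[int]],
    wgr: Sequence[Optional[int]],
) -> float:
    """Fraction of markers with identical calls on both platforms.

    Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2);
    markers with a missing call (None) on either platform are skipped.
    """
    if len(chip) != len(wgr):
        raise ValueError("both genotype vectors must cover the same markers")
    compared = 0
    matched = 0
    for chip_call, wgr_call in zip(chip, wgr):
        if chip_call is None or wgr_call is None:
            continue  # a missing call cannot be judged as concordant or not
        compared += 1
        if chip_call == wgr_call:
            matched += 1
    if compared == 0:
        raise ValueError("no marker has a call on both platforms")
    return matched / compared


# Hypothetical toy data: six markers, one missing chip call, one mismatch.
chip_calls = [0, 1, 2, 1, None, 0]
wgr_calls = [0, 1, 2, 2, 1, 0]
print(f"{concordance_rate(chip_calls, wgr_calls):.2%}")  # prints 80.00%
```

Applying the same calculation per marker across animals, rather than per animal across markers, would give the marker-level view needed to flag poorly reproducible markers.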
Affiliation(s)
- H Marina: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León, 24071, Spain
- P Chitneedi: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León, 24071, Spain
- R Pelayo: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León, 24071, Spain
- A Suárez-Vega: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León, 24071, Spain
- C Esteban-Blanco: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León, 24071, Spain
- B Gutiérrez-Gil: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León, 24071, Spain
- J J Arranz: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León, 24071, Spain
14
Lopez BIM, An N, Srikanth K, Lee S, Oh JD, Shin DH, Park W, Chai HH, Park JE, Lim D. Genomic Prediction Based on SNP Functional Annotation Using Imputed Whole-Genome Sequence Data in Korean Hanwoo Cattle. Front Genet 2021; 11:603822. [PMID: 33552124 PMCID: PMC7859490 DOI: 10.3389/fgene.2020.603822] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/08/2020] [Accepted: 11/09/2020] [Indexed: 12/12/2022]
Abstract
Whole-genome sequence (WGS) data are increasingly being applied in genomic predictions, offering a higher predictive ability by including causal mutations or single-nucleotide polymorphisms (SNPs) putatively in strong linkage disequilibrium with causal mutations affecting the trait. This study aimed to improve the predictive performance of the customized Hanwoo 50 k SNP panel for four carcass traits in a commercial Hanwoo population by adding highly predictive variants from sequence data. A total of 16,892 Hanwoo cattle with phenotypes (i.e., backfat thickness, carcass weight, longissimus muscle area, and marbling score), 50 k genotypes, and WGS imputed genotypes were used. We partitioned imputed WGS data according to functional annotation [intergenic (IGR), intron (ITR), regulatory (REG), synonymous (SYN), and non-synonymous (NSY)] to characterize the genomic regions that will deliver higher predictive power for the traits investigated. Animals were assigned into two groups, the discovery set (7324 animals) used for predictive variant detection and the cross-validation set for genomic prediction. Genome-wide association studies were performed by trait for every genomic region and for the entire WGS data for the pre-selection of variants. Sets of pre-selected SNPs of different densities (1000, 3000, 5000, or 10,000) were added to the 50 k genotypes separately, and the predictive performance of each set of genotypes was assessed using the genomic best linear unbiased prediction (GBLUP). Results showed that the predictive performance of the customized Hanwoo 50 k SNP panel can be improved by the addition of pre-selected variants from the WGS data, particularly 3000 variants from each trait, which was sufficient to improve the prediction accuracy for all traits.
When 12,000 pre-selected variants (3000 variants from each trait) were added to the 50 k genotypes, the prediction accuracies increased by 9.9, 9.2, 6.4, and 4.7% for backfat thickness, carcass weight, longissimus muscle area, and marbling score compared to the regular 50 k SNP panel, respectively. In terms of prediction bias, regression coefficients for all sets of genotypes in all traits were close to 1, indicating an unbiased prediction. The strategy used to select variants based on functional annotation did not show a clear advantage compared to using whole-genome. Nonetheless, such pre-selected SNPs from the IGR region gave the highest improvement in prediction accuracy among genomic regions and the values were close to those obtained using the WGS data for all traits. We concluded that additional gain in prediction accuracy when using pre-selected variants appears to be trait-dependent, and using WGS data remained more accurate compared to using a specific genomic region.
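The two validation metrics mentioned in this abstract, prediction accuracy and a regression coefficient near 1 as a sign of unbiased prediction, can be sketched as follows. This is a minimal illustration, not the authors' procedure. It assumes accuracy is taken as the Pearson correlation between observed and predicted values, and that the bias check is the slope from regressing observed on predicted values. The function name `accuracy_and_bias` and the toy numbers are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def accuracy_and_bias(
    observed: Sequence[float],
    predicted: Sequence[float],
) -> Tuple[float, float]:
    """Return (Pearson correlation, slope of observed regressed on predicted).

    A slope close to 1 is read here as predictions being neither inflated
    (slope < 1) nor deflated (slope > 1) relative to the observations.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 values")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    s_op = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    s_pp = sum((p - mean_pred) ** 2 for p in predicted)
    s_oo = sum((o - mean_obs) ** 2 for o in observed)
    if s_pp == 0 or s_oo == 0:
        raise ValueError("both vectors must vary")
    correlation = s_op / sqrt(s_pp * s_oo)
    slope = s_op / s_pp  # least-squares slope of observed on predicted
    return correlation, slope


# Hypothetical toy data: predictions rank animals perfectly but are deflated.
obs = [2.0, 4.0, 6.0, 8.0]
pred = [1.0, 2.0, 3.0, 4.0]
r, b = accuracy_and_bias(obs, pred)
print(f"accuracy r = {r:.2f}, regression slope b = {b:.2f}")
```

In the toy data the ranking is perfect (r = 1) while the slope is 2, which shows why the two metrics are reported separately: a high correlation does not by itself rule out a scale bias.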
Affiliation(s)
- Bryan Irvine M Lopez: Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Narae An: Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Krishnamoorthy Srikanth: Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Seunghwan Lee: Department of Animal Science and Biotechnology, Chungnam National University, Daejeon, South Korea
- Jae-Don Oh: Department of Animal Biotechnology, Chonbuk National University, Jeonju, South Korea
- Dong-Hyun Shin: Department of Agricultural Convergence Technology, Chonbuk National University, Jeonju, South Korea
- Woncheoul Park: Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Han-Ha Chai: Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Jong-Eun Park: Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
- Dajeong Lim: Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea
15
Yoshida GM, Yáñez JM. Multi-trait GWAS using imputed high-density genotypes from whole-genome sequencing identifies genes associated with body traits in Nile tilapia. BMC Genomics 2021; 22:57. [PMID: 33451291 PMCID: PMC7811220 DOI: 10.1186/s12864-020-07341-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 07/27/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022]
Abstract
Background: Body traits are generally controlled by several genes in vertebrates (i.e. polygenes), which in turn make them difficult to identify through association mapping. Increasing the power of association studies by combining approaches such as genotype imputation and multi-trait analysis improves the ability to detect quantitative trait loci associated with polygenic traits, such as body traits. Results: A multi-trait genome-wide association study (mtGWAS) was performed to identify quantitative trait loci (QTL) and genes associated with body traits in Nile tilapia (Oreochromis niloticus) using genotypes imputed to whole-genome sequences (WGS). To increase the statistical power of mtGWAS for the detection of genetic associations, summary statistics from single-trait genome-wide association studies (stGWAS) for eight different body traits recorded in 1309 animals were used. The mtGWAS increased the statistical power from the original sample size from 13 to 44%, depending on the trait analyzed. The better resolution of the WGS data, combined with the increased power of the mtGWAS approach, allowed the detection of significant markers which were not previously found in the stGWAS. Some of the lead single nucleotide polymorphisms (SNPs) were found within important functional candidate genes previously associated with growth-related traits in other terrestrial species. For instance, we identified SNPs within the α1,6-fucosyltransferase (FUT8), solute carrier family 4 member 2 (SLC4A2), A disintegrin and metalloproteinase with thrombospondin motifs 9 (ADAMTS9) and heart development protein with EGF like domains 1 (HEG1) genes, which have been associated with average daily gain in sheep, osteopetrosis in cattle, chest size in goats, and growth and meat quality in sheep, respectively.
Conclusions: The high-resolution mtGWAS presented here allowed the identification of significant SNPs, linked to strong functional candidate genes, associated with body traits in Nile tilapia. These results provide further insights about the genetic variants and genes underlying body trait variation in cichlid fish with high accuracy and strong statistical support.
Affiliation(s)
- Grazyella M Yoshida: Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
- José M Yáñez: Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile; Núcleo Milenio INVASAL, Concepción, Chile
16
Jeong S, Kim JY, Kim N. GMStool: GWAS-based marker selection tool for genomic prediction from genomic data. Sci Rep 2020; 10:19653. [PMID: 33184432 PMCID: PMC7665227 DOI: 10.1038/s41598-020-76759-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/04/2020] [Accepted: 11/02/2020] [Indexed: 12/20/2022]
Abstract
The increased accessibility to genomic data in recent years has laid the foundation for studies to predict various phenotypes of organisms based on the genome. Genomic prediction collectively refers to these studies, and it estimates an individual's phenotypes mainly using single nucleotide polymorphism markers. Typically, the accuracy of these genomic prediction studies is highly dependent on the markers used; however, in practice, choosing optimal markers with high accuracy for the phenotype to be used is a challenging task. Therefore, we present a new tool called GMStool for selecting optimal marker sets and predicting quantitative phenotypes. The GMStool is based on a genome-wide association study (GWAS) and heuristically searches for optimal markers using statistical and machine-learning methods. The GMStool performs the genomic prediction using statistical and machine/deep-learning models and presents the best prediction model with the optimal marker set. For the evaluation, the GMStool was tested on real datasets with four phenotypes. The prediction results showed higher performance than using all markers or the GWAS-top markers, which have been used frequently in prediction studies. Although the GMStool has several limitations, it is expected to contribute to various studies for predicting quantitative phenotypes. The GMStool written in R is available at www.github.com/JaeYoonKim72/GMStool.
Affiliation(s)
- Seongmun Jeong: Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jae-Yoon Kim: Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea; Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon, 34141, Republic of Korea
- Namshin Kim: Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea; Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon, 34141, Republic of Korea
17
Teng J, Huang S, Chen Z, Gao N, Ye S, Diao S, Ding X, Yuan X, Zhang H, Li J, Zhang Z. Optimizing genomic prediction model given causal genes in a dairy cattle population. J Dairy Sci 2020; 103:10299-10310. [PMID: 32952023 DOI: 10.3168/jds.2020-18233] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/20/2020] [Accepted: 07/07/2020] [Indexed: 01/15/2023]
Abstract
As genotypic data are moving from SNP chip toward whole-genome sequence, the accuracy of genomic prediction (GP) exhibits a marginal gain, although all genetic variation, including causal genes, is contained in whole-genome sequence data. Meanwhile, genetic analyses on complex traits, such as genome-wide association studies, have identified an increasing number of genomic regions, including potential causal genes, which would be reliable prior knowledge for GP. Many studies have tried to improve the performance of GP by modifying the prediction model to incorporate prior knowledge. Although several plausible results have been obtained from model modification or strategy optimization, most of them were validated in a specific empirical population with a limited variety of genetic architecture for complex traits. An alternative approach is to use simulated genetic architecture with known causal genes (e.g., simulated causative SNP) to evaluate different GP models with given causal genes. Our objectives were to (1) evaluate the performance of GP under a variety of genetic architectures with a subset of known causal genes and (2) compare different GP models modified by highlighting causal genes and different strategies to weight causal genes. In this study, we simulated pseudo-phenotypes under a variety of genetic architectures based on the real genotypes and phenotypes of a dairy cattle population. Besides classical genomic best linear unbiased prediction, we evaluated 3 modified GP models that highlight causal genes as follows: (1) by treating them as fixed effects, (2) by treating them as a separate random component, and (3) by combining them into the genomic relationship matrix as random effects. Our results showed that highlighting the known causal genes, which explained a considerable proportion of genetic variance in the GP models, increased the predictive accuracy.
Combining all given causal genes into the genomic relationship matrix was the optimal strategy under all the scenarios validated, and treating causal genes as a separate random component is also recommended, when more than 20% of genetic variance was explained by known causal genes. Moreover, assigning differential weights to each causal gene further improved the predictive accuracy.
Affiliation(s)
- Jinyan Teng: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Shuwen Huang: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Zitao Chen: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Ning Gao: State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou 510006, China
- Shaopan Ye: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Shuqi Diao: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Xiangdong Ding: National Engineering Laboratory for Animal Breeding, Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Xiaolong Yuan: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Hao Zhang: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Jiaqi Li: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Zhe Zhang: Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
18
Srikanth K, Lee SH, Chung KY, Park JE, Jang GW, Park MR, Kim NY, Kim TH, Chai HH, Park WC, Lim D. A Gene-Set Enrichment and Protein-Protein Interaction Network-Based GWAS with Regulatory SNPs Identifies Candidate Genes and Pathways Associated with Carcass Traits in Hanwoo Cattle. Genes (Basel) 2020; 11:E316. [PMID: 32188084 PMCID: PMC7140899 DOI: 10.3390/genes11030316] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 02/12/2020] [Revised: 03/06/2020] [Accepted: 03/12/2020] [Indexed: 02/06/2023]
Abstract
Non-synonymous SNPs and protein coding SNPs within the promoter region of genes (regulatory SNPs) might have a significant effect on carcass traits. Imputed sequence level data of 10,215 Hanwoo bulls, annotated and filtered to include only regulatory SNPs (450,062 SNPs), were used in a genome-wide association study (GWAS) to identify loci associated with backfat thickness (BFT), carcass weight (CWT), eye muscle area (EMA), and marbling score (MS). A total of 15, 176, and 1 SNPs were found to be significantly associated (p < 1.11 × 10⁻⁷) with BFT, CWT, and EMA, respectively. The significant loci were BTA4 (CWT), BTA6 (CWT), BTA14 (CWT and EMA), and BTA19 (BFT). BayesR estimated that 1.1%~1.9% of the SNPs contributed to more than 0.01% of the phenotypic variance. Therefore, the GWAS was complemented by a gene-set enrichment (GSEA) and protein-protein interaction network (PPIN) analysis in identifying the pathways affecting carcass traits. At p < 0.005 (~2,261 SNPs), 25 GO and 18 KEGG categories, including calcium signaling, cell proliferation, and folate biosynthesis, were found to be enriched through GSEA. The PPIN analysis showed enrichment for 81 candidate genes involved in various pathways, including the PI3K-AKT, calcium, and FoxO signaling pathways. Our findings provide insight into the effects of regulatory SNPs on carcass traits.
Affiliation(s)
- Krishnamoorthy Srikanth: Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Seung-Hwan Lee: Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Ki-Yong Chung: Department of Beef Science, Korea National College of Agriculture and Fisheries, Jeonju 54874, Korea
- Jong-Eun Park: Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Gul-Won Jang: Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Mi-Rim Park: Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Na Yeon Kim: Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Tae-Hun Kim: Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Han-Ha Chai: Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Won Cheoul Park: Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
- Dajeong Lim: Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
19
Chitneedi PK, Arranz JJ, Suárez-Vega A, Martínez-Valladares M, Gutiérrez-Gil B. Identification of potential functional variants underlying ovine resistance to gastrointestinal nematode infection by using RNA-Seq. Anim Genet 2020; 51:266-277. [PMID: 31900978 DOI: 10.1111/age.12894] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 11/28/2019] [Indexed: 02/06/2023]
Abstract
In dairy sheep flocks from Mediterranean countries, replacement and adult ewes are the animals most affected by gastrointestinal nematode (GIN) infections. In this study, we have exploited the information derived from an RNA-Seq experiment with the aim of identifying potential causal mutations related to GIN resistance in sheep. Considering the RNA-Seq samples from 12 ewes previously classified as six resistant and six susceptible animals to experimental infection by Teladorsagia circumcincta, we performed a variant calling analysis pipeline using two different types of software, gatk version 3.7 and Samtools version 1.4. The variants commonly identified by the two packages (high-quality variants) within two types of target regions - (i) QTL regions previously reported in sheep for parasite resistance based on SNP-chip or sequencing technology studies and (ii) functional candidate genes selected from gene expression studies related to GIN resistance in sheep - were further characterised to identify mutations with a potential functional impact. Among the genes harbouring these potential functional variants (930 and 553 respectively for the two types of regions), we identified 111 immune-related genes in the QTL regions and 132 immune-related genes from the initially selected candidate genes. For these immune-related genes harbouring potential functional variants, the enrichment analyses performed highlighted significant GO terms related to apoptosis, adhesion and inflammatory response, in relation to the QTL related variants, and significant disease-related terms such as inflammation, adhesion and necrosis, in relation to the initial candidate gene list. Overall, the study provides a valuable list of potential causal mutations that could be considered as candidate causal mutations in relation to GIN resistance in sheep. 
Future studies should assess the role of these suggested mutations with the aim of identifying genetic markers that could be directly implemented in sheep breeding programmes considering not only production traits, but also functional traits such as resistance to GIN infections.
Affiliation(s)
- P K Chitneedi: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24071, León, Spain
- J J Arranz: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24071, León, Spain
- A Suárez-Vega: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24071, León, Spain
- M Martínez-Valladares: Departamento de Sanidad Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24071, León, Spain; Instituto de Ganadería de Montaña, CSIC-ULE, 24346, Grulleros, León, Spain
- B Gutiérrez-Gil: Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24071, León, Spain
20
Genomic prediction based on selected variants from imputed whole-genome sequence data in Australian sheep populations. Genet Sel Evol 2019; 51:72. [PMID: 31805849 PMCID: PMC6896509 DOI: 10.1186/s12711-019-0514-2] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 06/03/2019] [Accepted: 11/25/2019] [Indexed: 12/13/2022]
Abstract
Background: Whole-genome sequence (WGS) data could contain information on genetic variants at or in high linkage disequilibrium with causative mutations that underlie the genetic variation of polygenic traits. Thus far, genomic prediction accuracy has shown limited increase when using such information in dairy cattle studies, in which one or a few breeds with limited diversity predominate. The objective of our study was to evaluate the accuracy of genomic prediction in a multi-breed Australian sheep population, with target individuals that were only distantly related to the training set, when using information on imputed WGS genotypes.
Methods: Between 9626 and 26,657 animals with phenotypes were available for nine economically important sheep production traits, and all had imputed WGS genotypes. About 30% of the data were used to discover predictive single nucleotide polymorphisms (SNPs) based on a genome-wide association study (GWAS), and the remaining data were used for training and validation of genomic prediction. Prediction accuracy using selected variants from imputed sequence data was compared to that using a standard array of 50k SNP genotypes, with both genomic best linear unbiased prediction (GBLUP) and Bayesian methods (BayesR/BayesRC). Accuracy of genomic prediction was evaluated in two independent populations that each had low relatedness to the training set, one being purebred Merino and the other crossbred Border Leicester x Merino sheep.
Results: A substantial improvement in prediction accuracy was observed when selected sequence variants were fitted alongside 50k genotypes as a separate variance component in GBLUP (2GBLUP) or in Bayesian analysis as a separate category of SNPs (BayesRC). From an average accuracy of 0.27 in both validation sets for the 50k array, the average absolute increase in accuracy across traits with 2GBLUP was 0.083 and 0.073 for purebred and crossbred animals, respectively, whereas with BayesRC it was 0.102 and 0.087. The average gain in accuracy was smaller when selected sequence variants were treated in the same category as 50k SNPs. Very little improvement over 50k prediction was observed when using all WGS variants.
Conclusions: Accuracy of genomic prediction in diverse sheep populations increased substantially by using variants selected from whole-genome sequence data based on an independent multi-breed GWAS, when compared to genomic prediction using standard 50k genotypes.
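To put the reported gains in context, the short sketch below converts the absolute increases into approximate resulting accuracies and relative gains over the 50k baseline. Only the baseline (0.27) and the four absolute gains come from the abstract; the derived figures are rough, because the abstract reports averages across traits rather than per-trait values, and the function and variable names are illustrative.

```python
# Values taken from the abstract: average 50k accuracy and average absolute gains.
BASELINE_50K = 0.27
ABSOLUTE_GAINS = {
    ("2GBLUP", "purebred"): 0.083,
    ("2GBLUP", "crossbred"): 0.073,
    ("BayesRC", "purebred"): 0.102,
    ("BayesRC", "crossbred"): 0.087,
}


def summarize(baseline, gain):
    """Return (approximate resulting accuracy, relative gain in percent)."""
    return round(baseline + gain, 3), round(100 * gain / baseline, 1)


summary = {key: summarize(BASELINE_50K, gain) for key, gain in ABSOLUTE_GAINS.items()}

for (method, population), (accuracy, rel_gain) in summary.items():
    print(f"{method:8s} {population:10s} accuracy ~{accuracy:.3f} (+{rel_gain:.1f}% relative)")
```

On these averages, the selected sequence variants correspond to a relative improvement of roughly 27% to 38% over the 50k array, with BayesRC ahead of 2GBLUP in both validation populations.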
|