1
Cai X, Zhang W, Gao N, Wei C, Wu X, Si J, Gao Y, Li J, Yin T, Zhang Z. Integrating large-scale meta-analysis of genome-wide association studies improve the genomic prediction accuracy for combined pig populations. J Anim Breed Genet 2024. [PMID: 39215551 DOI: 10.1111/jbg.12896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 07/18/2024] [Accepted: 08/18/2024] [Indexed: 09/04/2024]
Abstract
The strategy of combining reference populations has been widely recognized as an effective way to enhance the accuracy of genomic prediction (GP). This study investigated the efficiency of genomic prediction using prior information and combined reference population. In total, prior information considering trait-associated single nucleotide polymorphisms (SNPs) obtained from meta-analysis of genome-wide association studies (GWAS meta-analysis) was incorporated into three models to assess the performance of GP using combined reference populations. Two different Yorkshire populations with imputed whole genome sequence (WGS) data (9,741,620 SNPs), named as P1 (1259 individuals) and P2 (1018 individuals), were used to predict genomic estimated breeding values for three live carcass traits, including backfat thickness, loin muscle area, and loin muscle depth. A 10 × 5 fold cross-validation was used to evaluate the prediction accuracy of 203 randomly selected candidate pigs from the P2 population and the reference population consisted of the remaining pigs from P2 and the stepwise added pigs from P1. By integrating SNPs with different p-value thresholds from GWAS meta-analysis downloaded from PigGTEx Project, the prediction accuracy of GBLUP, genomic feature BLUP (GFBLUP) and GBLUP given genetic architecture (BLUP|GA) were compared. Moreover, we explored effects of reference population size and heritability enrichment of genomic features on the prediction accuracy improvement of GFBLUP and BLUP|GA relative to GBLUP. The prediction accuracy of GBLUP using all WGS markers showed average improvement of 4.380% using the P1 + P2 reference population compared with the P2 reference population. Using the combined reference population, GFBLUP and BLUP|GA yielded 6.179% and 5.525% higher accuracies than GBLUP using all SNPs based on the single reference population, respectively. 
Positive regression coefficients were estimated for the improvement in prediction accuracy (GFBLUP or BLUP|GA relative to GBLUP) on both the size of the reference population and the heritability enrichment of genomic features. Compared with the classic GBLUP model, GFBLUP and BLUP|GA models integrating GWAS meta-analysis information increase prediction accuracy, and using combined populations with an enlarged reference population further enhances the accuracy of the two approaches. The heritability enrichment of genomic features can be used as an indicator of whether prior information is accurately presented.
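The 10 × 5 fold cross-validation described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names `repeated_kfold` and `pearson` are hypothetical, accuracy is assumed to be a Pearson correlation between predicted and observed values, and the stepwise addition of P1 animals to the reference set is not modeled. Splitting the 1018 P2 animals into five folds gives validation sets of 203 or 204 animals, in line with the 203 candidates mentioned above.

```python
import random


def repeated_kfold(n, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [idx[f::k] for f in range(k)]
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train, test


def pearson(x, y):
    """Pearson correlation, assumed here as the accuracy measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# 1018 animals (the size of P2), 10 repeats of 5 folds = 50 splits.
splits = list(repeated_kfold(1018, k=5, repeats=10, seed=42))
```

In each split, a prediction model would be fitted on `train_idx` (plus any added animals from the second population) and `pearson` evaluated on `test_idx`; averaging over the 50 splits gives one accuracy per model.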
Affiliation(s)
- Xiaodian Cai
- National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Wenjing Zhang
- National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Ning Gao
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Chen Wei
- National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Xibo Wu
- Guangxi State Farms Yongxin Animal Husbandry Group Co., Ltd, Nanning, China
- Jinglei Si
- Guangxi State Farms Yongxin Animal Husbandry Group Co., Ltd, Nanning, China
- Yahui Gao
- National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Jiaqi Li
- National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Tong Yin
- Institute of Animal Breeding and Genetics, Justus Liebig University, Giessen, Germany
- Zhe Zhang
- National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China

2
Wang H, Li C, Li J, Zhang R, An X, Yuan C, Guo T, Yue Y. Genomic Selection for Weaning Weight in Alpine Merino Sheep Based on GWAS Prior Marker Information. Animals (Basel) 2024; 14:1904. [PMID: 38998016 PMCID: PMC11240623 DOI: 10.3390/ani14131904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 06/19/2024] [Accepted: 06/24/2024] [Indexed: 07/14/2024] Open
Abstract
This study aims to compare the accuracy of genomic estimated breeding values (GEBV) estimated using a genomic best linear unbiased prediction (GBLUP) method and GEBV estimates incorporating prior marker information from a genome-wide association study (GWAS) for the weaning weight trait in highland Merino sheep. The objective is to provide theoretical and technical support for improving the accuracy of genomic selection. The study used a population of 1007 highland Merino ewes, with the weaning weight at 3 months as the target trait. The population was randomly divided into two groups. The first group was used for GWAS analysis to identify significant markers, and the top 5%, top 10%, top 15%, and top 20% markers were selected as prior marker information. The second group was used to estimate genetic parameters and compare the accuracy of GEBV predictions using different prior marker information. The accuracy was obtained using a five-fold cross-validation. Finally, both groups were subjected to cross-validation. The study's findings revealed that the heritability of the weaning weight trait, as calculated using the GBLUP model, ranged from 0.122 to 0.394, with corresponding prediction accuracies falling between 0.075 and 0.228. By incorporating prior marker information from GWAS, the heritability was enhanced to a range of 0.125 to 0.407. The inclusion of the top 5% to top 20% significant SNPs from GWAS results as prior information into GS showed potential for improving the accuracy of predicting genomic breeding value.
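The marker pre-selection step described in this abstract (taking the top 5% to 20% of GWAS markers as prior information) might look like the sketch below. The helper name `top_fraction` and the toy p-values are hypothetical, and ranking by ascending p-value is an assumption; the abstract does not say how the selected markers then enter the prediction model.

```python
def top_fraction(p_values, fraction):
    """Return sorted indices of the most significant SNPs (smallest p-values).

    `fraction` is the share of markers to keep, e.g. 0.05 for the top 5%.
    At least one marker is always returned.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(fraction * len(p_values)))
    ranked = sorted(range(len(p_values)), key=lambda i: p_values[i])
    return sorted(ranked[:n_keep])


# Toy GWAS p-values for ten markers (made-up numbers for illustration only).
p_values = [0.5, 0.001, 0.2, 0.04, 0.9, 0.3, 0.0005, 0.7, 0.6, 0.8]
top20 = top_fraction(p_values, 0.20)
top5 = top_fraction(p_values, 0.05)
```

Because the abstract runs the GWAS in one half of the population and the prediction in the other, the p-values passed to such a function would come only from the discovery group, which avoids selecting markers on the animals used for validation.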
Affiliation(s)
- Haifeng Wang
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Chenglan Li
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Jianye Li
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Rui Zhang
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Xuejiao An
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Chao Yuan
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Tingting Guo
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Yaojing Yue
- Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China

3
Du A, Guo Z, Chen A, Xu L, Sun D, Han B. PC Gene Affects Milk Production Traits in Dairy Cattle. Genes (Basel) 2024; 15:708. [PMID: 38927644 PMCID: PMC11202589 DOI: 10.3390/genes15060708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/20/2024] [Accepted: 05/27/2024] [Indexed: 06/28/2024] Open
Abstract
In previous work, we found that PC was differentially expressed in cows at different lactation stages. Thus, we deemed that PC may be a candidate gene affecting milk production traits in dairy cattle. In this study, we found the polymorphisms of PC by resequencing and verified their genetic associations with milk production traits by using an animal model in a cattle population. In total, we detected six single-nucleotide polymorphisms (SNPs) in PC. The single marker association analysis showed that all SNPs were significantly associated with the five milk production traits (p < 0.05). Additionally, we predicted that allele G of 29:g.44965658 in the 5' regulatory region created binding sites for TF GATA1 and verified that this allele inhibited the transcriptional activity of PC by the dual-luciferase reporter assay. In conclusion, we proved that PC had a prominent genetic effect on milk production traits, and six SNPs with prominent genetic effects could be used as markers for genomic selection (GS) in dairy cattle, which is beneficial for accelerating the improvement in milk yield and quality in Chinese Holstein cows.
Affiliation(s)
- Bo Han
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Beijing 100193, China; (A.D.); (Z.G.); (A.C.); (L.X.); (D.S.)

4
Jiang J. MPH: fast REML for large-scale genome partitioning of quantitative genetic variation. Bioinformatics 2024; 40:btae298. [PMID: 38688661 PMCID: PMC11093526 DOI: 10.1093/bioinformatics/btae298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 04/24/2024] [Accepted: 04/29/2024] [Indexed: 05/02/2024] Open
Abstract
MOTIVATION Genome partitioning of quantitative genetic variation is useful for dissecting the genetic architecture of complex traits. However, existing methods, such as Haseman-Elston regression and linkage disequilibrium score regression, often face limitations when handling extensive farm animal datasets, as demonstrated in this study. RESULTS To overcome this challenge, we present MPH, a novel software tool designed for efficient genome partitioning analyses using restricted maximum likelihood. The computational efficiency of MPH primarily stems from two key factors: the utilization of stochastic trace estimators and the comprehensive implementation of parallel computation. Evaluations with simulated and real datasets demonstrate that MPH achieves comparable accuracy and significantly enhances convergence, speed, and memory efficiency compared to widely used tools like GCTA and LDAK. These advancements facilitate large-scale, comprehensive analyses of complex genetic architectures in farm animals. AVAILABILITY AND IMPLEMENTATION The MPH software is available at https://jiang18.github.io/mph/.
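The abstract credits part of MPH's speed to stochastic trace estimators. REML computations involve traces of large matrix products, and a generic way to approximate a trace without forming the matrix is Hutchinson's estimator, sketched below. This is an illustration of the general technique only; it is not taken from the MPH source, and the names `hutchinson_trace` and `matvec` are hypothetical.

```python
import random


def matvec(A, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def hutchinson_trace(matvec_fn, n, n_probes=100, seed=0):
    """Estimate tr(A) as the average of z'Az over random +/-1 probe vectors z.

    Only matrix-vector products are needed, so A never has to be stored
    explicitly; E[z'Az] = tr(A) when the entries of z are independent
    with mean 0 and variance 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Az = matvec_fn(z)
        total += sum(zi * ai for zi, ai in zip(z, Az))
    return total / n_probes


# Toy symmetric matrix with true trace 2 + 3 + 4 + 5 = 14.
A = [[2.0, 0.5, 0.5, 0.5],
     [0.5, 3.0, 0.5, 0.5],
     [0.5, 0.5, 4.0, 0.5],
     [0.5, 0.5, 0.5, 5.0]]
estimate = hutchinson_trace(lambda v: matvec(A, v), n=4, n_probes=2000, seed=1)
```

The estimate is noisy, with an error that shrinks roughly as one over the square root of the number of probes, which is the usual trade-off between accuracy and the number of matrix-vector products.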
Affiliation(s)
- Jicai Jiang
- Department of Animal Science, North Carolina State University, Raleigh, NC 27695, United States

5
Gu LL, Yang RQ, Wang ZY, Jiang D, Fang M. Ensemble learning for integrative prediction of genetic values with genomic variants. BMC Bioinformatics 2024; 25:120. [PMID: 38515026 PMCID: PMC10956256 DOI: 10.1186/s12859-024-05720-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 02/26/2024] [Indexed: 03/23/2024] Open
Abstract
BACKGROUND Whole genome variants offer sufficient information for genetic prediction of human disease risk, and prediction of animal and plant breeding values. Many sophisticated statistical methods have been developed for enhancing the predictive ability. However, each method has its own advantages and disadvantages, and so far no single method consistently outperforms the others. RESULTS We herein propose an Ensemble Learning method for Prediction of Genetic Values (ELPGV), which assembles predictions from several basic methods such as GBLUP, BayesA, BayesB and BayesCπ, to produce more accurate predictions. We validated ELPGV with a variety of well-known datasets and a series of simulated datasets. All revealed that ELPGV significantly enhanced predictive ability relative to every basic method; for instance, the p-values for the comparison of ELPGV with the basic methods ranged from 4.853E-118 to 9.640E-20 for the WTCCC dataset. CONCLUSIONS ELPGV integrates the merits of each method to produce significantly higher predictive ability than any basic method, and it is simple to implement and fast to run, without using genotype data. It is promising for wide application in genetic predictions.
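The idea of combining predictions from several base methods without touching the genotypes can be illustrated with a simple weighted blend. The abstract does not describe how ELPGV chooses its weights, so the grid search below is a stand-in assumption, and `best_blend`, `pearson` and the toy numbers are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_blend(pred_a, pred_b, observed, steps=100):
    """Grid-search the weight w in [0, 1] for the blend w*a + (1 - w)*b.

    Returns the weight that maximizes the correlation with the observed
    values, together with that correlation.
    """
    best_w, best_r = 0.0, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        blend = [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]
        r = pearson(blend, observed)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r


# Made-up predictions from two base methods for six individuals.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pred_a = [1.5, 1.0, 3.5, 3.0, 5.5, 5.0]
pred_b = [0.5, 2.5, 2.0, 4.5, 4.0, 6.5]
weight, r_blend = best_blend(pred_a, pred_b, observed)
```

Weights tuned and evaluated on the same individuals give an optimistic correlation; in practice the weight would be chosen on a training or validation subset and then applied to held-out individuals.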
Affiliation(s)
- Lin-Lin Gu
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs and Fisheries College, Jimei University, Xiamen, People's Republic of China
- Run-Qing Yang
- Research Center for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing, People's Republic of China
- Zhi-Yong Wang
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs and Fisheries College, Jimei University, Xiamen, People's Republic of China.
- Dan Jiang
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs and Fisheries College, Jimei University, Xiamen, People's Republic of China.
- Ming Fang
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs and Fisheries College, Jimei University, Xiamen, People's Republic of China.
- Life Science College, Heilongjiang Bayi Agricultural University, Daqing, People's Republic of China.

6
Cheng J, Maltecca C, VanRaden PM, O'Connell JR, Ma L, Jiang J. SLEMM: million-scale genomic predictions with window-based SNP weighting. Bioinformatics (Oxford, England) 2023; 39:7075542. [PMID: 36897019 PMCID: PMC10039786 DOI: 10.1093/bioinformatics/btad127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 02/27/2023] [Accepted: 03/07/2023] [Indexed: 03/11/2023]
Abstract
MOTIVATION The amount of genomic data is increasing exponentially. Using many genotyped and phenotyped individuals for genomic prediction is appealing yet challenging. RESULTS We present SLEMM (short for Stochastic-Lanczos-Expedited Mixed Models), a new software tool, to address the computational challenge. SLEMM builds on an efficient implementation of the stochastic Lanczos algorithm for REML in a framework of mixed models. We further implement SNP weighting in SLEMM to improve its predictions. Extensive analyses on seven public datasets, covering 19 polygenic traits in three plant and three livestock species, showed that SLEMM with SNP weighting had overall the best predictive ability among a variety of genomic prediction methods including GCTA's empirical BLUP, BayesR, KAML, and LDAK's BOLT and BayesR models. We also compared the methods using nine dairy traits of ∼300k genotyped cows. All had overall similar prediction accuracies, except that KAML failed to process the data. Additional simulation analyses on up to 3 million individuals and 1 million SNPs showed that SLEMM was advantageous over counterparts as for computational performance. Overall, SLEMM can do million-scale genomic predictions with an accuracy comparable to BayesR. AVAILABILITY AND IMPLEMENTATION The software is available at https://github.com/jiang18/slemm.
Affiliation(s)
- Jian Cheng
- Department of Animal Science, North Carolina State University, Raleigh, NC 27695, United States
- Christian Maltecca
- Department of Animal Science, North Carolina State University, Raleigh, NC 27695, United States
- Paul M VanRaden
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD 20705, United States
- Jeffrey R O'Connell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, United States
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, United States
- Jicai Jiang
- Department of Animal Science, North Carolina State University, Raleigh, NC 27695, United States

7
Jia R, Xu L, Sun D, Han B. Genetic marker identification of SEC13 gene for milk production traits in Chinese holstein. Front Genet 2023; 13:1065096. [PMID: 36685890 PMCID: PMC9846039 DOI: 10.3389/fgene.2022.1065096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Accepted: 12/15/2022] [Indexed: 01/05/2023] Open
Abstract
SEC13 homolog, nuclear pore and COPII coat complex component (SEC13) is the core component of the cytoplasmic COPII complex, which mediates material transport from the endoplasmic reticulum to the Golgi complex. Our preliminary work found that the SEC13 gene was differentially expressed in dairy cows during different stages of lactation and was involved in metabolic pathways of milk synthesis such as the citric acid cycle and fatty acid, starch and sucrose metabolism, so we considered that SEC13 might be a candidate gene affecting milk production traits. In this study, we detected the polymorphisms of the SEC13 gene and verified their genetic effects on milk yield and composition traits in a Chinese Holstein cow population. By sequencing the whole coding and partial flanking regions of SEC13, we found four single nucleotide polymorphisms (SNPs). Subsequent association analysis showed that these four SNPs were significantly associated with milk yield, fat yield, protein yield or protein percentage in the first and second lactations (p ≤ .0351). We also found that two SNPs in SEC13 formed one haplotype block by Haploview4.2, and the block was significantly associated with milk yield, fat yield, fat percentage, protein yield or protein percentage (p ≤ .0373). In addition, we predicted the effect of the SNP in the 5' region on transcription factor binding sites (TFBSs), and found that the allele A of 22:g.54362761A>G could bind transcription factors (TFs) GATA5, GATA3, HOXD9, HOXA10, CDX1 and Hoxd13; a further dual-luciferase reporter assay verified that the allele A of this SNP inhibited the fluorescence activity. We speculate that the A allele of 22:g.54362761A>G might inhibit the transcriptional activity of the SEC13 gene by binding the TFs, and may be a causal mutation affecting the formation of milk production traits in dairy cows.
In summary, we proved that SEC13 has a significant genetic effect on milk production traits and the identified significant SNPs could be used as candidate genetic markers for GS SNP chips development; on the other hand, we verified the transcriptional regulation of 22:g.54362761A>G on SEC13 gene, providing research direction for further function validation tests.
Affiliation(s)
- Ruike Jia
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China
- Lingna Xu
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China
- Dongxiao Sun
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China
- National Dairy Innovation Center, Hohhot, China
- Bo Han
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China

8
de Andrade LRB, Sousa MBE, Wolfe M, Jannink JL, de Resende MDV, Azevedo CF, de Oliveira EJ. Increasing cassava root yield: Additive-dominant genetic models for selection of parents and clones. Frontiers in Plant Science 2022; 13:1071156. [PMID: 36589120 PMCID: PMC9800927 DOI: 10.3389/fpls.2022.1071156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Accepted: 12/02/2022] [Indexed: 06/17/2023]
Abstract
Genomic selection has been promising in situations where phenotypic assessments are expensive, laborious, and/or inefficient. This work evaluated the efficiency of genomic prediction methods combined with genetic models in clone and parent selection with the goal of increasing fresh root yield, dry root yield, as well as dry matter content in cassava roots. The bias and predictive ability of the combinations of prediction methods Genomic Best Linear Unbiased Prediction (G-BLUP), Bayes B, Bayes Cπ, and Reproducing Kernel Hilbert Spaces with additive and additive-dominant genetic models were estimated. Fresh and dry root yield exhibited predominantly dominant heritability, while dry matter content exhibited predominantly additive heritability. The combination of prediction methods and genetic models did not show significant differences in the predictive ability for dry matter content. On the other hand, the prediction methods with additive-dominant genetic models had significantly higher predictive ability than the additive genetic models for fresh and dry root yield, allowing higher genetic gains in clone selection. However, higher predictive ability for genotypic values did not result in differences in breeding value predictions between additive and additive-dominant genetic models. G-BLUP with the classical additive-dominant genetic model had the best predictive ability and bias estimates for fresh and dry root yield. For dry matter content, the highest predictive ability was obtained by G-BLUP with the additive genetic model. Dry matter content exhibited the highest heritability, predictive ability, and bias estimates compared with other traits. The prediction methods showed similar selection gains with approximately 67% of the phenotypic selection gain. 
By shortening the breeding cycle time by 40%, genomic selection may overcome phenotypic selection by 10%, 13%, and 18% for fresh root yield, dry root yield, and dry matter content, respectively, with a selection proportion of 15%. The most suitable genetic model for each trait allows for genomic selection optimization in cassava with high selection gains, thereby accelerating the release of new varieties.
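The closing comparison in this abstract can be checked with back-of-the-envelope arithmetic. The sketch below assumes that gain per unit time equals gain per cycle divided by cycle length, which is a standard simplification and is not stated in the abstract; the function name `gain_ratio_per_time` is hypothetical.

```python
def gain_ratio_per_time(gain_ratio_per_cycle, cycle_reduction):
    """Genomic vs. phenotypic selection gain per unit time.

    gain_ratio_per_cycle: genomic gain per cycle relative to phenotypic
        selection (about 0.67 in the abstract).
    cycle_reduction: fraction by which the breeding cycle is shortened
        (0.40 in the abstract).
    """
    if not 0 <= cycle_reduction < 1:
        raise ValueError("cycle_reduction must be in [0, 1)")
    return gain_ratio_per_cycle / (1.0 - cycle_reduction)


ratio = gain_ratio_per_time(0.67, 0.40)
```

Under this assumption the ratio is about 1.12, that is, roughly a 12% advantage per unit time, which falls within the 10% to 18% range reported for the three traits. The trait-specific values differ because 67% is only an approximate figure across traits.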
Affiliation(s)
- Marnin Wolfe
- Department of Crop, Soil and Environment Sciences, Auburn University, Auburn, AL, United States
- Jean-Luc Jannink
- Section on Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States
- United States Department of Agriculture – Agriculture Research Service, Plant, Soil and Nutrition Research, Ithaca, NY, United States
- Marcos Deon Vilela de Resende
- Department of Forestry Engineering, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Embrapa Florestas, Colombo, Paraná, Brazil
- Department of Statistics, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil

9
DoVale JC, Carvalho HF, Sabadin F, Fritsche-Neto R. Genotyping marker density and prediction models effects in long-term breeding schemes of cross-pollinated crops. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2022; 135:4523-4539. [PMID: 36261658 DOI: 10.1007/s00122-022-04236-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2022] [Accepted: 10/09/2022] [Indexed: 06/16/2023]
Abstract
In genomic recurrent selection, the more markers, the better because they buffer the linkage disequilibrium losses caused by recombination over cycles, and consequently, provide higher responses to selection. Reductions of genotyping marker density have been extensively evaluated as potential strategies to reduce the genotyping costs of genomic selection (GS). Low-density marker panels are appealing in GS because they entail lower multicollinearity and computing time and allow more individuals to be genotyped for the same cost. However, statistical models used in GS are usually evaluated with empirical data, using "static" training sets and populations. This may be adequate for making predictions during a breeding program's initial cycles but not for the long-term. Moreover, studies that focus on long selective breeding cycles generally do not consider GS models with the effect of dominance, which is particularly important for breeding outcomes in cross-pollinated crops. Hence, dominance effects are important and unexplored in GS for long-term programs involving allogamous species. To address it, we employed two approaches: analysis of empirical maize datasets and simulations of long-term breeding applying phenotypic and genomic recurrent selection (intrapopulation and reciprocal schemes). In both schemes, we simulated twenty breeding cycles and assessed the effect of marker density reduction on the population mean, the best crosses, additive variance, selective accuracy, and response to selection with models [additive, additive-dominant, general (GCA), and this plus specific combining ability (GCA + SCA)]. Our results indicate that marker reduction based on linkage disequilibrium levels provides useful predictions only within a cycle, as accuracy significantly decreases over cycles. In the long-term, without training set updating, high-marker density provides the best responses to selection. 
The model to be used depends on the breeding scheme: additive for intrapopulation and additive-dominant or GCA + SCA for reciprocal.
Affiliation(s)
- Júlio César DoVale
- Department of Crop Science, Federal University of Ceará, Fortaleza, CE, Brazil.
- Felipe Sabadin
- Virginia Tech: Virginia Polytechnic Institute and State University, Blacksburg, VA, USA

10
Liang M, An B, Li K, Du L, Deng T, Cao S, Du Y, Xu L, Gao X, Zhang L, Li J, Gao H. Improving Genomic Prediction with Machine Learning Incorporating TPE for Hyperparameters Optimization. Biology 2022; 11:1647. [PMID: 36421361 PMCID: PMC9688023 DOI: 10.3390/biology11111647] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2022] [Revised: 10/31/2022] [Accepted: 11/07/2022] [Indexed: 08/08/2023]
Abstract
Owing to its excellent prediction ability, machine learning has been considered the most powerful tool for analyzing high-throughput sequencing genome data. However, the sophisticated process of tuning hyperparameters greatly impedes the wider application of machine learning in animal and plant breeding programs. Therefore, we integrated an automatic hyperparameter tuning algorithm, the tree-structured Parzen estimator (TPE), with machine learning to simplify its use for genomic prediction (GP). In this study, we applied TPE to optimize the hyperparameters of kernel ridge regression (KRR) and support vector regression (SVR). To evaluate the performance of TPE, we compared the prediction accuracy of KRR-TPE and SVR-TPE with that of genomic best linear unbiased prediction (GBLUP) and of KRR-RS, KRR-Grid, SVR-RS, and SVR-Grid, which tuned the hyperparameters of KRR and SVR by random search (RS) and grid search (Grid), in a simulation dataset and the real datasets. The results indicated that KRR-TPE achieved the most powerful prediction ability considering all populations and was the most convenient. In particular, for the Chinese Simmental beef cattle and Loblolly pine populations, the prediction accuracy of KRR-TPE showed average improvements of 8.73% and 6.08% over GBLUP, respectively. Our study will greatly promote the application of machine learning in GP and further accelerate breeding progress.
Affiliation(s)
- Huijiang Gao
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China

11
Sun H, Wei M, Xu Z, Bai C, Sun B. PC-DOT: Improving genomic prediction ability of principal component regression by DOT product. Anim Genet 2022; 53:888-891. [PMID: 36168679 DOI: 10.1111/age.13255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 08/11/2022] [Accepted: 08/12/2022] [Indexed: 11/29/2022]
Abstract
Principal component regression (PC regression) is a useful prediction method based on a dimension-reducing strategy. Generally, the principal components (PCs) are added to the regression model one by one in order of their eigenvalues (PC-Eigen). Because some PCs with large eigenvalues may be poorly associated with the response variable, PC-Eigen may not be the best framework. Researchers previously tried adding PCs to the model according to their contribution to the regression sum of squares (PC-SS) and found that the performance of PC-SS is generally lower than that of PC-Eigen. A standard approach for selecting the optimal set of PCs remains a challenge. Here, drawing on cosine similarity theory, we postulated that the PCs could be ranked by dot product, and that this framework (which we call PC-DOT) could preferentially extract PCs that are both highly correlated with the response variable and have a large eigenvalue. Using one simulated and three real genomic datasets (15 traits in total), we tested the prediction ability of the different frameworks. In general, the PC-DOT method performed better than both PC-Eigen and PC-SS. To facilitate the application of PC regression, we provide a set of R scripts for the different frameworks (https://github.com/SUNHAO-JLU/Genome_Prediction-PC_DOT). In addition, the HAT matrix was used to reduce the computational complexity in the reference data during the cross-validation process. Our work may help researchers to better understand and carry out the PC regression model.
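The difference between ranking PCs by eigenvalue and ranking them by dot product can be illustrated with a small sketch. The abstract does not give the exact PC-DOT formula, so this assumes the ranking score is the absolute dot product between each (unscaled) PC score column and the centered response; the function names and the toy data are hypothetical and are not taken from the authors' R scripts.

```python
def center(values):
    """Subtract the mean so that dot products reflect covariation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_eigenvalue(pc_scores):
    """PC-Eigen style order: largest score variance (eigenvalue) first."""
    def sum_of_squares(j):
        col = center(pc_scores[j])
        return dot(col, col)
    return sorted(range(len(pc_scores)), key=sum_of_squares, reverse=True)


def rank_by_dot(pc_scores, y):
    """Assumed PC-DOT style order: largest |PC . y| first.

    For unscaled scores, |PC . y| = |PC| * |y| * |cos(angle)|, so the score
    grows with both the eigenvalue and the correlation with y.
    """
    yc = center(y)
    return sorted(
        range(len(pc_scores)),
        key=lambda j: abs(dot(center(pc_scores[j]), yc)),
        reverse=True,
    )


# Hypothetical, mutually orthogonal, mean-zero score columns (one list per PC).
PCS = [
    [3, -3, 0, -3, 3],   # largest variance, moderate association with y
    [-2, -1, 0, 1, 2],   # smallest variance, strongest association with y
    [1, 1, -4, 1, 1],    # medium variance, weak association with y
]
Y = [2, 1, 3, 4, 6]      # hypothetical phenotypes

eigen_order = rank_by_eigenvalue(PCS)
dot_order = rank_by_dot(PCS, Y)
```

In this toy case the two criteria disagree: the eigenvalue order puts the high-variance column first, while the dot-product order promotes the column that tracks the response most closely.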
Affiliation(s)
- Hao Sun
- College of Animal Science, Jilin University, Changchun, China
- Meng Wei
- College of Animal Science, Jilin University, Changchun, China
- Zhong Xu
- Hubei Key Laboratory of Animal Embryo and Molecular Breeding, Institute of Animal Husbandry and Veterinary, Hubei Provincial Academy of Agricultural Sciences, Wuhan, China
- Chunyan Bai
- College of Animal Science, Jilin University, Changchun, China
- Boxing Sun
- College of Animal Science, Jilin University, Changchun, China
12
Du A, Zhao F, Liu Y, Xu L, Chen K, Sun D, Han B. Genetic polymorphisms of PKLR gene and their associations with milk production traits in Chinese Holstein cows. Front Genet 2022; 13:1002706. [PMID: 36118870 PMCID: PMC9479125 DOI: 10.3389/fgene.2022.1002706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 08/12/2022] [Indexed: 11/13/2022] Open
Abstract
Our previous work confirmed that the pyruvate kinase L/R (PKLR) gene was differentially expressed across lactation periods in dairy cattle and participated in lipid metabolism through the insulin, PI3K-Akt, MAPK, AMPK, mTOR, and PPAR signaling pathways, suggesting that PKLR is a candidate gene affecting milk production traits in dairy cattle. Here, we tested whether this gene has a significant genetic association with milk yield and composition traits in a Chinese Holstein cow population. In total, we identified 21 single nucleotide polymorphisms (SNPs) by resequencing the entire coding region and partial flanking region of the PKLR gene, of which two were located in the 5′ promoter region, two in the 5′ untranslated region (UTR), three in introns, five in exons, six in the 3′ UTR, and three in the 3′ flanking region. Single-marker association analysis showed that all SNPs were significantly associated with milk yield, fat yield, protein yield, or protein percentage (p ≤ 0.0497). The haplotype block containing all the SNPs, predicted by Haploview, was significantly associated with fat yield and protein percentage (p ≤ 0.0145). Further, four SNPs in the 5′ regulatory region and eight SNPs in the UTR and exon regions were predicted to change the transcription factor binding sites (TFBSs) and mRNA secondary structure, respectively, and may thereby affect the expression of PKLR and milk production phenotypes, suggesting that these SNPs might be potential functional mutations for milk production traits in dairy cattle. In conclusion, we demonstrated that PKLR had significant genetic effects on milk production traits, and the SNPs with significant genetic effects could be used as candidate genetic markers for genomic selection (GS) in dairy cattle.
Affiliation(s)
- Aixia Du
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Yanan Liu
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Lingna Xu
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Kewei Chen
- Yantai Institute, China Agricultural University, Yantai, China
- Dongxiao Sun
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Bo Han
- National Engineering Laboratory of Animal Breeding, Key Laboratory of Animal Genetics, Department of Animal Genetics and Breeding, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- *Correspondence: Bo Han
13
Ye W, Xu L, Li Y, Liu L, Ma Z, Sun D, Han B. Single Nucleotide Polymorphisms of ALDH18A1 and MAT2A Genes and Their Genetic Associations with Milk Production Traits of Chinese Holstein Cows. Genes (Basel) 2022; 13:genes13081437. [PMID: 36011348 PMCID: PMC9407996 DOI: 10.3390/genes13081437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Revised: 07/16/2022] [Accepted: 08/10/2022] [Indexed: 11/16/2022] Open
Abstract
Our preliminary work, which analyzed the liver transcriptome and proteome of dairy cows at different lactation stages, suggested two genes related to amino acid synthesis and metabolism, aldehyde dehydrogenase 18 family member A1 (ALDH18A1) and methionine adenosyltransferase 2A (MAT2A), as candidates affecting milk traits. In this study, the single nucleotide polymorphisms (SNPs) of the ALDH18A1 and MAT2A genes were identified, and their genetic effects on milk production traits in dairy cattle and the underlying causative mechanisms were analyzed, with the aim of providing effective genetic information for the molecular breeding of dairy cows. By resequencing the entire coding and partial flanking regions of ALDH18A1 and MAT2A, we found eight SNPs located in ALDH18A1 and two in MAT2A. Single-SNP association analysis showed that most of the 10 SNPs of these two genes were significantly associated with the milk yield traits, 305-day milk yield, fat yield, and protein yield in the first and second lactations (corrected p ≤ 0.0488). Using Haploview 4.2, we found that the seven SNPs of ALDH18A1 formed two haplotype blocks; subsequently, the haplotype-based association analysis showed that both haplotypes were significantly associated with 305-day milk yield, fat yield, and protein yield (corrected p ≤ 0.014). Furthermore, using Jaspar and Genomatix software, we found that 26:g.17130318C>A and 11:g.49472723G>C, in the 5′ flanking regions of ALDH18A1 and MAT2A, respectively, changed the transcription factor binding sites (TFBSs), which might regulate the expression of the corresponding genes and thereby affect milk production phenotypes. Therefore, these two SNPs were considered potential functional mutations, although they require further verification. In summary, ALDH18A1 and MAT2A were shown to likely have genetic effects on milk production traits, and their valuable SNPs might be used as candidate genetic markers for genomic selection (GS) in dairy cattle.
Affiliation(s)
- Wen Ye
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, National Engineering Laboratory for Animal Breeding, China Agricultural University, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Beijing 100193, China
- Lingna Xu
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, National Engineering Laboratory for Animal Breeding, China Agricultural University, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Beijing 100193, China
- Yanhua Li
- Beijing Dairy Cattle Center, Beijing 100192, China
- Lin Liu
- Beijing Dairy Cattle Center, Beijing 100192, China
- Zhu Ma
- Beijing Dairy Cattle Center, Beijing 100192, China
- Dongxiao Sun
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, National Engineering Laboratory for Animal Breeding, China Agricultural University, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Beijing 100193, China
- Bo Han
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, National Engineering Laboratory for Animal Breeding, China Agricultural University, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, Beijing 100193, China
14
Hao X, Liang A, Plastow G, Zhang C, Wang Z, Liu J, Salzano A, Gasparrini B, Campanile G, Zhang S, Yang L. An Integrative Genomic Prediction Approach for Predicting Buffalo Milk Traits by Incorporating Related Cattle QTLs. Genes (Basel) 2022; 13:genes13081430. [PMID: 36011341 PMCID: PMC9408041 DOI: 10.3390/genes13081430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/16/2022] Open
Abstract
Background: The 90K Axiom Buffalo SNP Array is expected to improve and speed up various genomic analyses for the buffalo (Bubalus bubalis). Genomic prediction is an effective approach in animal breeding to improve selection and reduce costs. As buffalo genome research is lagging behind that of the cow and production records are also limited, genomic prediction performance will be relatively poor. To improve the genomic prediction in buffalo, we introduced a new approach (pGBLUP) for genomic prediction of six buffalo milk traits by incorporating QTL information from the cattle milk traits in order to help improve the prediction performance for buffalo. Results: In simulations, the pGBLUP could outperform BayesR and the GBLUP if the prior biological information (i.e., the known causal loci) was appropriate; otherwise, it performed slightly worse than BayesR and equal to or better than the GBLUP. In real data, the heritability of the buffalo genomic region corresponding to the cattle milk trait QTLs was enriched (fold of enrichment > 1) in four buffalo milk traits (FY270, MY270, PY270, and PM) when the EBV was used as the response variable. The DEBV as the response variable yielded more reliable genomic predictions than the traditional EBV, as has been shown by previous research. The performance of the three approaches (GBLUP, BayesR, and pGBLUP) did not vary greatly in this study, probably due to the limited sample size, incomplete prior biological information, and less artificial selection in buffalo. Conclusions: To our knowledge, this study is the first to apply genomic prediction to buffalo by incorporating prior biological information. The genomic prediction of buffalo traits can be further improved with a larger sample size, higher-density SNP chips, and more precise prior biological information.
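The "fold of enrichment" criterion mentioned in the results can be made concrete with a short calculation. The abstract does not define the statistic, so this sketch assumes the commonly used form: the share of genomic heritability explained by a set of SNPs divided by the share of SNPs in that set. The function name and the numbers are hypothetical illustrations, not values from the study.

```python
def fold_enrichment(h2_feature, h2_total, n_snps_feature, n_snps_total):
    """Heritability enrichment of a genomic feature (assumed definition).

    A value above 1 means the feature's SNPs explain more heritability
    than expected from their share of all SNPs.
    """
    share_of_h2 = h2_feature / h2_total
    share_of_snps = n_snps_feature / n_snps_total
    return share_of_h2 / share_of_snps


# Hypothetical example: the QTL-derived region holds 12.5% of the SNPs
# but explains 25% of the genomic heritability.
example = fold_enrichment(h2_feature=0.125, h2_total=0.5,
                          n_snps_feature=1000, n_snps_total=8000)
```

Under this definition, a trait whose feature region explains heritability only in proportion to its SNP count would score exactly 1, which is why values above 1 are read as enrichment.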
Affiliation(s)
- Xingjie Hao
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Correspondence: (X.H.); (L.Y.)
- Aixin Liang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Graham Plastow
- Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
- Chunyan Zhang
- Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
- Zhiquan Wang
- Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
- Jiajia Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Angela Salzano
- Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
- Bianca Gasparrini
- Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
- Giuseppe Campanile
- Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
- Shujun Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Liguo Yang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Correspondence: (X.H.); (L.Y.)
15
Ren D, Cai X, Lin Q, Ye H, Teng J, Li J, Ding X, Zhang Z. Impact of linkage disequilibrium heterogeneity along the genome on genomic prediction and heritability estimation. Genet Sel Evol 2022; 54:47. [PMID: 35761182 PMCID: PMC9235212 DOI: 10.1186/s12711-022-00737-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2021] [Accepted: 06/15/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Compared to medium-density single nucleotide polymorphism (SNP) data, high-density SNP data contain abundant genetic variants and provide more information for the genetic evaluation of livestock, but it has been shown that they do not confer any advantage for genomic prediction and heritability estimation. One possible reason is the uneven distribution of the linkage disequilibrium (LD) along the genome, i.e., LD heterogeneity among regions. The aim of this study was to effectively use genome-wide SNP data for genomic prediction and heritability estimation by using models that control LD heterogeneity among regions. METHODS The LD-adjusted kinship (LDAK) and LD-stratified multicomponent (LDS) models were used to control LD heterogeneity among regions and were compared with the classical model that has no such control. Simulated and real traits of 2000 dairy cattle individuals with imputed high-density (770K) SNP data were used. Five types of phenotypes were simulated, which were controlled by very strongly, strongly, moderately, weakly and very weakly tagged causal variants, respectively. The performances of the models with high- and medium-density (50K) panels were compared to verify that the models that controlled LD heterogeneity among regions were more effective with high-density data. RESULTS Compared to the medium-density panel, the use of the high-density panel did not improve and even decreased prediction accuracies and heritability estimates from the classical model for both simulated and real traits. Compared to the classical model, LDS effectively improved the accuracy of genomic predictions and unbiasedness of heritability estimates, regardless of the genetic architecture of the trait. LDAK applies only to traits that are mainly controlled by weakly tagged causal variants, but is still less effective than LDS for this type of trait. 
Compared with the classical model, LDS improved prediction accuracy by about 13% for simulated phenotypes and by 0.3 to ~10.7% for real traits with the high-density panel, and by ~1% for simulated phenotypes and by -0.1 to ~6.9% for real traits with the medium-density panel. CONCLUSIONS Grouping SNPs based on regional LD to construct the LD-stratified multicomponent model can effectively eliminate the adverse effects of LD heterogeneity among regions, and greatly improve the efficiency of high-density SNP data for genomic prediction and heritability estimation.
Affiliation(s)
- Duanyang Ren
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Xiaodian Cai
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Qing Lin
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Haoqiang Ye
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Jinyan Teng
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Jiaqi Li
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China.
16
Fu Y, Jia R, Xu L, Su D, Li Y, Liu L, Ma Z, Sun D, Han B. Fatty acid desaturase 2 affects the milk-production traits in Chinese Holsteins. Anim Genet 2022; 53:422-426. [PMID: 35292995 DOI: 10.1111/age.13192] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/30/2021] [Revised: 12/30/2021] [Accepted: 03/02/2022] [Indexed: 01/14/2023]
Abstract
As a member of the fatty acid desaturase family, the fatty acid desaturase 2 (FADS2) gene encodes a rate-limiting enzyme in the synthesis of unsaturated fatty acids and lies within or near reported QTL regions for milk-production traits. We previously found that FADS2 is differentially expressed during different lactations of Chinese Holstein cows and participates in lipid metabolic processes by influencing the insulin, PI3K-Akt, MAPK, AMPK, mTOR and PPAR signaling pathways. Therefore, we considered this gene a candidate gene for milk-production traits. In this study, we identified 12 SNPs in FADS2 by re-sequencing, including two SNPs in the 5' flanking region, one in the seventh exon, five in introns, two in the 3' untranslated region and two in the 3' flanking region. The SNP 29:g.40378819C>T is a missense mutation that replaces alanine (GCG) with valine (GTG). Through single-marker association analysis, we found that all 12 SNPs were significantly associated with 305-day milk yield, fat yield, fat percentage, protein yield or protein percentage (p < 0.0493). The subsequent haplotype association analysis also confirmed the associations between the gene and milk-production traits. In summary, this study suggests that there is a significant genetic association between FADS2 and milk-production traits, and that the SNPs with significant genetic effects can provide important molecular information for the development of a genomic selection chip in dairy cattle.
Affiliation(s)
- Yihan Fu
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China
- Ruike Jia
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China
- Lingna Xu
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China
- Dingran Su
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China
- Yanhua Li
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China.,Beijing Dairy Cattle Center, Beijing, China
- Lin Liu
- Beijing Dairy Cattle Center, Beijing, China
- Zhu Ma
- Beijing Dairy Cattle Center, Beijing, China
- Dongxiao Sun
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China
- Bo Han
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China
17
He Z, Li S, Li W, Ding J, Zheng M, Li Q, Fahey AG, Wen J, Liu R, Zhao G. Comparison of genomic prediction methods for residual feed intake in broilers. Anim Genet 2022; 53:466-469. [PMID: 35292985 DOI: 10.1111/age.13186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/25/2022] [Revised: 02/24/2022] [Accepted: 02/28/2022] [Indexed: 11/30/2022]
Abstract
Residual feed intake (RFI) is a measure of the feed efficiency of animals. Previous studies have identified SNPs associated with RFI. The objective of this study was to compare the GBLUP model with the GA-BLUP model, which includes previously identified associated SNPs. Nine associated SNPs, obtained from a genome-wide association study on a discovery population, served as preselection information. The models were fitted in ASREML software with 5-fold cross-validation on a validation population. When the genetic architecture (GA) matrix, constructed from the nine RFI-associated SNPs, was used, the prediction accuracy for RFI improved compared with the original GBLUP model. The calculated optimal ω was 0.981 for RFI, which is in line with the optimal range of 0.9 to 1.0 found in the gradient test. With ω set to 0.981, prediction accuracy in the GA-BLUP model increased by 2% compared with the GBLUP model. In conclusion, GA-BLUP with the nine RFI-associated SNPs and an optimal ω can improve the prediction accuracy for a specific trait compared with GBLUP.
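The role of ω can be pictured as a weight that blends two relationship matrices into one. The abstract does not state the blending formula or which matrix ω multiplies, so the sketch below is only an assumed convex combination, with ω applied to the first argument; the function name, argument order, and the 2 × 2 matrices are hypothetical.

```python
def blend_matrices(first, second, omega):
    """Return omega * first + (1 - omega) * second, element by element.

    Assumption: a convex combination of two relationship matrices of equal
    shape. Which of the genomic and genetic-architecture matrices takes the
    weight omega is not given in the abstract.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return [
        [omega * a + (1.0 - omega) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


# Hypothetical relationship matrices for two animals.
matrix_a = [[1.0, 0.0], [0.0, 1.0]]
matrix_b = [[1.0, 0.5], [0.5, 1.0]]

# With omega close to 1, the blend stays close to the first matrix.
blended = blend_matrices(matrix_a, matrix_b, omega=0.981)
```

At ω = 1 the blend reduces to the first matrix alone and at ω = 0 to the second, so scanning ω over a grid (as in the gradient test mentioned above) moves smoothly between the two models.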
Affiliation(s)
- Zhengxiao He
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.,School of Agriculture and Food Science, University College Dublin, Dublin, Ireland
- Sen Li
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Wei Li
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Jiqiang Ding
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Maiqing Zheng
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Qinghe Li
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Alan G Fahey
- School of Agriculture and Food Science, University College Dublin, Dublin, Ireland
- Jie Wen
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Ranran Liu
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Guiping Zhao
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
18
Martins Oliveira IC, Bernardeli A, Soler Guilhen JH, Pastina MM. Genomic Prediction of Complex Traits in an Allogamous Annual Crop: The Case of Maize Single-Cross Hybrids. Methods Mol Biol 2022; 2467:543-567. [PMID: 35451790 DOI: 10.1007/978-1-0716-2205-6_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
For many plant and animal species, commercial products are hybrids between individuals from different genetic groups. For allogamous plant species such as maize, the breeding objective is to produce single-cross hybrid varieties from two inbred lines each selected in complementary groups. Efficient hybrid breeding requires methods that (1) quickly generate homozygous and homogeneous parental lines with high combining abilities, (2) efficiently choose among the large number of available parental lines the most promising ones, and (3) predict the performances of sets of non-phenotyped single-cross hybrids, or hybrids phenotyped in a limited number of environments, based on their relationship with another set of hybrids with known performances. The maize breeding community has been developing model-based prediction of hybrid performances well before the genomic era. This chapter (1) provides a reminder of the maize breeding scheme before the genomic era; (2) describes how genomic data were incorporated in the prediction models involved in different steps of genomic-based single-cross maize hybrid breeding; and (3) reviews factors affecting the accuracy of genomic prediction, approaches for optimizing GP-based single-cross maize hybrid breeding schemes, and ensuring the long-term sustainability of genomic selection.
Affiliation(s)
- Arthur Bernardeli
- Department of Agronomy, Universidade Federal de Viçosa, Viçosa-MG, Brazil
19
Zhang F, Zhu F, Yang FX, Hao JP, Hou ZC. Genomic selection for meat quality traits in Pekin duck. Anim Genet 2021; 53:94-100. [PMID: 34841553 DOI: 10.1111/age.13157] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 11/15/2021] [Indexed: 01/22/2023]
Abstract
Genomic selection uses genome-wide molecular marker data to predict an animal's genetic value in a breeding program. The objective of this study was to present heritability estimates and the accuracy of genomic prediction using different methods for meat quality traits in Pekin duck. The genomic selection training population contained two kinds of ducks: 639 fat-type ducks and 540 lean-type ducks. A single-trait animal model was used to estimate heritability and adjust the phenotype. The GBLUP and BayesR methods were used to estimate the SNP effects. The accuracy of genomic prediction was calculated using 5-fold cross-validation. The accuracy varied from 0.235 to 0.501, with the lowest accuracy estimated for traits associated with abdominal fat weight in the combined population and the highest accuracy observed for abdominal fat percentage traits in the lean-type duck population. Overall, BayesR achieved the highest prediction accuracy, while the combined-population strategy could increase prediction accuracy only when the two populations have the same breeding aim for a given trait.
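The 5-fold cross-validation step can be sketched as follows. The abstract does not say how accuracy was defined, so this assumes the common choice of the Pearson correlation between predicted and observed values in each held-out fold, averaged over folds. The one-covariate regression is only a stand-in predictor, not GBLUP or BayesR, and all names and data here are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(x, y, fit_predict, k=5, seed=0):
    """Mean per-fold correlation between predictions and observations."""
    accuracies = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        predictions = fit_predict(
            [x[i] for i in train], [y[i] for i in train], [x[i] for i in fold]
        )
        accuracies.append(pearson(predictions, [y[i] for i in fold]))
    return mean(accuracies)


def simple_regression(x_train, y_train, x_test):
    """Stand-in predictor: least-squares line on a single covariate."""
    mx, my = mean(x_train), mean(y_train)
    slope = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train)) / sum(
        (a - mx) ** 2 for a in x_train
    )
    return [my + slope * (v - mx) for v in x_test]


# Hypothetical data: 20 animals, one covariate, noise-free linear phenotype.
X = [float(i) for i in range(20)]
Y = [2.0 * v + 1.0 for v in X]
accuracy = cross_validated_accuracy(X, Y, simple_regression, k=5)
```

Any model can be plugged in through `fit_predict`, provided it takes training covariates, training phenotypes, and test covariates and returns one prediction per test animal.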
Affiliation(s)
- F Zhang
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA, Beijing, 100193, China.,College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- F Zhu
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA, Beijing, 100193, China.,College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- F-X Yang
- Beijing Golden Star Inc., Beijing, 100076, China
- J-P Hao
- Beijing Golden Star Inc., Beijing, 100076, China
- Z-C Hou
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA, Beijing, 100193, China.,College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
|
20
|
Jia R, Fu Y, Xu L, Li H, Li Y, Liu L, Ma Z, Sun D, Han B. Associations between polymorphisms of SLC22A7, NGFR, ARNTL and PPP2R2B genes and Milk production traits in Chinese Holstein. BMC Genom Data 2021; 22:47. [PMID: 34732138 PMCID: PMC8567656 DOI: 10.1186/s12863-021-01002-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/15/2021] [Accepted: 10/22/2021] [Indexed: 12/27/2022] Open
Abstract
Background: Our preliminary work confirmed that the SLC22A7 (solute carrier family 22 member 7), NGFR (nerve growth factor receptor), ARNTL (aryl hydrocarbon receptor nuclear translocator like) and PPP2R2B (protein phosphatase 2 regulatory subunit Bβ) genes were differentially expressed in dairy cows during different stages of lactation and were involved in lipid metabolism through the insulin, PI3K-Akt, MAPK, AMPK, mTOR, and PPAR signaling pathways, so we considered these four genes as candidates affecting milk production traits. In this study, we detected polymorphisms of the four genes and verified their genetic effects on milk yield and composition traits in a Chinese Holstein cow population.
Results: By resequencing the whole coding region and part of the flanking region of SLC22A7, NGFR, ARNTL and PPP2R2B, we found 20 SNPs in total, of which five were located in SLC22A7, eight in NGFR, three in ARNTL, and four in PPP2R2B. Using Haploview 4.2, we found three haplotype blocks, comprising five SNPs in SLC22A7, eight in NGFR and three in ARNTL. Single-SNP association analysis showed that 19 of the 20 SNPs were significantly associated with at least one of milk yield, fat yield, fat percentage, protein yield or protein percentage in the first and second lactations (P < 0.05). Haplotype-based association analysis showed that the three haplotype blocks were significantly associated with at least one of milk yield, fat yield, fat percentage, protein yield or protein percentage (P < 0.05). Further, SOPMA predicted that one SNP, 19:g.37095131C>T in NGFR, changed the structure of the NGFR protein. In addition, using JASPAR we found that four SNPs, 19:g.37113872C>G, 19:g.37113157C>T, and 19:g.37112276C>T in NGFR and 15:g.39320936A>G in ARNTL, could change transcription factor binding sites and might affect the expression of the corresponding genes. These five SNPs might be potential functional mutations for milk production traits in dairy cattle.
Conclusions: In summary, we demonstrated that SLC22A7, NGFR, ARNTL and PPP2R2B have significant genetic effects on milk production traits. The valuable SNPs can be used as candidate genetic markers for genomic selection of dairy cattle, and the effects of these SNPs on other traits need to be further verified.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12863-021-01002-0.
Affiliation(s)
- Ruike Jia
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China
- Yihan Fu
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China
- Lingna Xu
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China
- Houcheng Li
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China
- Yanhua Li
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China.,Beijing Dairy Cattle Center, Beijing, 100192, China
- Lin Liu
- Beijing Dairy Cattle Center, Beijing, 100192, China
- Zhu Ma
- Beijing Dairy Cattle Center, Beijing, 100192, China
- Dongxiao Sun
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China
- Bo Han
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, No. 2 Yuanmingyuan West Road, Haidian District, Beijing, 100193, China.
|
21
|
Montesinos-Lopez OA, Montesinos-Lopez JC, Salazar E, Barron JA, Montesinos-Lopez A, Buenrostro-Mariscal R, Crossa J. Application of a Poisson deep neural network model for the prediction of count data in genome-based prediction. The Plant Genome 2021; 14:e20118. [PMID: 34323393 DOI: 10.1002/tpg2.20118] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/19/2021] [Accepted: 05/15/2021] [Indexed: 06/13/2023]
Abstract
Genomic selection (GS) is revolutionizing conventional ways of developing new plants and animals. However, because it is a predictive methodology, GS depends strongly on statistical and machine learning models to perform these predictions. For continuous outcomes, more models are available for GS. Unfortunately, for count data outcomes, there are few efficient statistical machine learning models for large datasets or for datasets with fewer observations than independent variables. For this reason, in this paper, we applied the univariate version of the previously proposed Poisson deep neural network (PDNN) to genomic prediction of count data. The model was implemented with (a) the negative log-likelihood of the Poisson distribution as the loss function, (b) the rectified linear unit as the activation function in the hidden layers, and (c) the exponential activation function in the output layer. The advantage of the PDNN model is that it captures complex patterns in the data by applying many nonlinear transformations in the hidden layers. Moreover, since it was implemented with TensorFlow as the back-end and Keras as the front-end, the model can be applied to moderate and large datasets, which is a significant advantage over previous GS models for count data. The PDNN model was compared with deep learning models with continuous outcomes, conventional generalized Poisson regression models, and conventional Bayesian regression methods. We found that the PDNN model outperformed the Bayesian regression and generalized Poisson regression methods in terms of prediction accuracy, although it was not better than the conventional deep neural network with continuous outcomes.
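The three ingredients (a) to (c) listed in this abstract can be made concrete with a small dependency-free sketch. It is not the authors' TensorFlow/Keras implementation: the single hidden layer, the function names and the weight layout are assumptions chosen only to show how a ReLU hidden layer, an exponential output and the Poisson negative log-likelihood fit together.

```python
import math


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Affine layer: one row of `weights` and one entry of `bias` per output unit."""
    return [sum(w * a for w, a in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def pdnn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden ReLU layer followed by an exponential output unit.

    The exponential keeps the predicted Poisson mean strictly positive.
    """
    hidden = relu(dense(x, w_hidden, b_hidden))
    eta = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return math.exp(eta)


def poisson_nll(y_true, mu):
    """Mean Poisson negative log-likelihood of counts y_true given means mu.

    The lgamma term is log(y!); it does not depend on the network weights,
    so it can be dropped when the function is used only as a training loss.
    """
    total = sum(m - y * math.log(m) + math.lgamma(y + 1)
                for y, m in zip(y_true, mu))
    return total / len(y_true)
```

For a fixed observed count the loss is smallest when the predicted mean equals that count, which is the behavior that pushes the network output toward the observed counts during training.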
Affiliation(s)
- Jose C Montesinos-Lopez
- Dep. de Estadística, Centro de Investigación en Matemáticas, Guanajuato, Guanajuato, 36023, México
- Eduardo Salazar
- Facultad de Telemática, Univ. de Colima, Colima, Colima, 28040, México
- Jose Alberto Barron
- Dep. of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina, s/n La Molina 15024, Lima, Perú
- Abelardo Montesinos-Lopez
- Dep. de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías, Univ. de Guadalajara, Guadalajara, Jalisco, 44430, México
- Jose Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera km 45, Mexico-Veracruz, Texcoco, Edo. de México, CP 52640, México
- Colegio de Post-Graduados, CP 56230, Montecillos, Edo. de México, Texcoco, México
|
22
|
Mancisidor B, Cruz A, Gutiérrez G, Burgos A, Morón JA, Wurzinger M, Gutiérrez JP. ssGBLUP Method Improves the Accuracy of Breeding Value Prediction in Huacaya Alpaca. Animals (Basel) 2021; 11:ani11113052. [PMID: 34827784 PMCID: PMC8614529 DOI: 10.3390/ani11113052] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/09/2021] [Revised: 09/03/2021] [Accepted: 09/14/2021] [Indexed: 11/10/2022] Open
Abstract
Simple Summary: Alpaca breeding takes place in the most entrenched areas of the Andes, where conditions make genetic improvement programs very difficult to implement. Likewise, phenotypic recording is limited in its ability to predict genetic merit accurately. For this reason, genomic information is presented as an alternative that helps to predict the genetic values of fiber traits more precisely. This study showed that genomic information increased precision by 2.623% for the fiber diameter, 6.442% for the standard deviation of the fiber diameter, and 1.471% for the percentage of medullation compared to traditional methods for predicting genetic merit, suggesting that adding genomic data to prediction models could be beneficial for alpaca breeding programs in the future.
Abstract: Improving textile characteristics is the main objective of alpaca breeding. A recently developed SNP chip for alpacas could potentially be used to implement genomic selection and accelerate genetic progress. Therefore, this study aimed to quantify the increase in prediction accuracy for three important fiber traits in Huacaya alpacas: fiber diameter (FD), standard deviation of fiber diameter (SD), and percentage of medullation (PM). The data comprised a total pedigree of 12,431 animals, 24,169 records for FD and SD, 8386 records for PM, and 60,624 SNP markers for each of the 431 genotyped animals of the Pacomarca Genetic Center. Prediction accuracy of breeding values was compared between a classical BLUP and a single-step genomic BLUP (ssGBLUP). Deregressed phenotypes were predicted. The accuracies of the genetic and genomic values were calculated as the correlation between the predicted breeding values and the deregressed values of 100 animals randomly selected from the genotyped ones. Fifty replicates were carried out. Accuracies with ssGBLUP improved by 2.623%, 6.442%, and 1.471% on average for FD, SD, and PM, respectively, compared to the BLUP method. The increase in accuracy was relevant, suggesting that adding genomic data could benefit alpaca breeding programs.
Affiliation(s)
- Betsy Mancisidor
- Departamento de Producción Animal, Universidad Nacional Agraria La Molina, Lima 12056, Peru; (B.M.); (G.G.); (J.A.M.); (M.W.)
- Alan Cruz
- Departamento de Producción Animal, Universidad Nacional Agraria La Molina, Lima 12056, Peru; (B.M.); (G.G.); (J.A.M.); (M.W.)
- Correspondence: ; Tel.: +51-940-202-666
- Gustavo Gutiérrez
- Departamento de Producción Animal, Universidad Nacional Agraria La Molina, Lima 12056, Peru; (B.M.); (G.G.); (J.A.M.); (M.W.)
- Alonso Burgos
- Centro Genético de Pacomarca–Inca Tops S.A., Miguel Forga 348, Arequipa 04001, Peru;
- Jonathan Alejandro Morón
- Departamento de Producción Animal, Universidad Nacional Agraria La Molina, Lima 12056, Peru; (B.M.); (G.G.); (J.A.M.); (M.W.)
- Maria Wurzinger
- Departamento de Producción Animal, Universidad Nacional Agraria La Molina, Lima 12056, Peru; (B.M.); (G.G.); (J.A.M.); (M.W.)
- Juan Pablo Gutiérrez
- Departamento de Producción Animal, Universidad Complutense de Madrid, E-28040 Madrid, Spain;
|
23
|
Zhu S, Guo T, Yuan C, Liu J, Li J, Han M, Zhao H, Wu Y, Sun W, Wang X, Wang T, Liu J, Tiambo CK, Yue Y, Yang B. Evaluation of Bayesian alphabet and GBLUP based on different marker density for genomic prediction in Alpine Merino sheep. G3 (Bethesda, Md.) 2021; 11:6310012. [PMID: 34849779 PMCID: PMC8527494 DOI: 10.1093/g3journal/jkab206] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/28/2021] [Accepted: 06/01/2021] [Indexed: 01/20/2023]
Abstract
Marker density, trait heritability and the statistical model adopted are critical to the accuracy of genomic prediction (GP) or genomic selection (GS). To fully exploit the potential of GP in breeding and selection, these factors need to be examined not only in simulated data but also in real data, so that their impact on GP accuracy can be understood more clearly and intuitively. Herein, we studied GP for six wool traits of sheep using two classes of models: the Bayesian Alphabet (BayesA, BayesB, BayesCπ, and Bayesian LASSO) and genomic best linear unbiased prediction (GBLUP). We used fivefold cross-validation to evaluate accuracy based on genotyping data from Alpine Merino sheep (n = 821). The main aim was to study the influence and interaction of different models and marker densities on GP accuracy. Cross-validation showed that the GP accuracy of the six traits was between 0.28 and 0.60. We showed that the accuracy of GP could be improved by increasing the marker density, with the improvement closely related to the model adopted and the heritability of the trait. Moreover, at the two marker densities studied, GBLUP predicted traits with low heritability better, whereas the advantage of the Bayesian Alphabet became more evident as heritability increased; different GP models are therefore appropriate for different traits. These findings indicate the importance of applying appropriate models for GP, which would assist in further optimizing GP.
Affiliation(s)
- Shaohua Zhu
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.,Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Tingting Guo
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.,Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Chao Yuan
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.,Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Jianbin Liu
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.,Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Jianye Li
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.,Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Mei Han
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.,Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Hongchang Zhao
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.,Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Yi Wu
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.,Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Weibo Sun
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.,Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Xijun Wang
- Gansu Provincial Sheep Breeding Technology Extension Station, Sunan 734400, China
- Tianxiang Wang
- Gansu Provincial Sheep Breeding Technology Extension Station, Sunan 734400, China
- Jigang Liu
- Gansu Provincial Sheep Breeding Technology Extension Station, Sunan 734400, China
- Christian Keambou Tiambo
- Centre for Tropical Livestock Genetics and Health (CTLGH), International Livestock Research Institute, Nairobi 00100, Kenya
- Yaojing Yue
- Sheep Breeding Engineering Technology Center, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Bohui Yang
- Animal Science Department, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
|
24
|
Xiao J, Zhou Y, He S, Ren WL. An Efficient Score Test Integrated with Empirical Bayes for Genome-Wide Association Studies. Front Genet 2021; 12:742752. [PMID: 34659362 PMCID: PMC8517403 DOI: 10.3389/fgene.2021.742752] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/16/2021] [Accepted: 09/13/2021] [Indexed: 11/30/2022] Open
Abstract
Many methods for multi-locus genome-wide association studies (GWAS) have been developed to improve statistical power. However, most existing multi-locus methods are not faster than single-locus methods. To address this concern, we propose a fast score test integrated with Empirical Bayes (ScoreEB) for multi-locus GWAS. First, a score test is conducted for each single nucleotide polymorphism (SNP) under a linear mixed model (LMM) framework, taking into account genetic relatedness and population structure. Then, all potentially associated SNPs are selected with a less stringent criterion. Finally, Empirical Bayes in a multi-locus model is performed on all selected SNPs to identify the true quantitative trait nucleotides (QTNs). ScoreEB adopts a strategy similar to that of the multi-locus random-SNP-effect mixed linear model (mrMLM) and fast multi-locus random-SNP-effect EMMA (FASTmrEMMA); the only difference is that it uses the score test to select the potentially associated markers. Monte Carlo simulation studies demonstrate that ScoreEB significantly improved computational efficiency compared with the popular methods mrMLM, FASTmrEMMA, iterative modified-sure independence screening EM-Bayesian lasso (ISIS EM-BLASSO), hybrid of restricted and penalized maximum likelihood (HRePML) and genome-wide efficient mixed model association (GEMMA). In addition, ScoreEB remained accurate in QTN effect estimation and effectively controlled the false positive rate. Subsequently, ScoreEB was applied to re-analyze quantitative traits in plants and animals. The results show that ScoreEB can not only detect previously reported genes but also identify new ones.
Affiliation(s)
- Jing Xiao
- Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong, China
- Yang Zhou
- Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong, China
- Shu He
- Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong, China
- Wen-Long Ren
- Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong, China
|
25
|
Impact of Marker Pruning Strategies Based on Different Measurements of Marker Distance on Genomic Prediction in Dairy Cattle. Animals (Basel) 2021; 11:ani11071992. [PMID: 34359120 PMCID: PMC8300388 DOI: 10.3390/ani11071992] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/06/2021] [Revised: 06/27/2021] [Accepted: 06/28/2021] [Indexed: 11/16/2022] Open
Abstract
Simple Summary: The usefulness of genomic prediction (GP) has been widely demonstrated by breeding analyses in livestock, plants and aquatic populations. It is well known that 'marker density' is a critical factor affecting the accuracy of GP; however, how to properly measure 'marker density' in GP is yet to be determined. With population-level whole-genome sequence data or high-density single nucleotide polymorphism (SNP) data available, this question can be answered more convincingly. In this study, we investigated and discussed the impact of four 'marker density' measures that reflect genetic or physical distances between SNPs on the accuracy of GP in a German Holstein dairy cattle population. Our results showed that the degree of variation of the physical distance between adjacent SNPs had significant effects on the accuracy of GP, while the genetic distance between SNPs had no relationship with the accuracy of GP. Therefore, for studies based on high-density SNP data, the default strategy of pruning SNPs based on genetic distance is detrimental to heritability estimation and genomic prediction. The results extend the community's knowledge of 'marker density' and provide useful suggestions for the application of and research on genomic prediction.
Abstract: With the availability of high-density single-nucleotide polymorphism (SNP) data and the development of genotype imputation methods, high-density panel-based genomic prediction (GP) has become possible in livestock breeding. It is generally considered that the accuracy of the genomic estimated breeding value (GEBV) increases with marker density, yet studies have shown that the GEBV accuracy does not increase, or even decreases, when high-density panels are used. Therefore, in addition to the SNP number, other measurements of 'marker density' seem to affect the GEBV accuracy, and exploring the relationship between the GEBV accuracy and the measurements of 'marker density' based on high-density SNP or whole-genome sequence data is important for the field of GP. In this study, we constructed different SNP panels with given SNP numbers (e.g., 1 k) by using the physical distance (PhyD), genetic distance (GenD) and random distance (RanD) between SNPs, respectively, based on the high-density SNP data of a German Holstein dairy cattle population. There are therefore three different panels at each SNP number level. These panels were used to construct GP models to predict fat percentage, milk yield and somatic cell score. Meanwhile, the mean ($\bar{d}$) and variance ($\sigma_d^2$) of the physical distance between SNPs and the mean ($\overline{r^2}$) and variance ($\sigma_{r^2}^2$) of the genetic distance between SNPs in each panel were used as marker density-related measurements, and their influence on the GEBV accuracy was investigated. At the same SNP number level, the $\bar{d}$ of all panels is basically the same, but $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ differ. Therefore, we only investigated the effects of $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ on the GEBV accuracy. The results showed that at a given SNP number level, the GEBV accuracy was negatively correlated with $\sigma_d^2$, but not with $\overline{r^2}$ or $\sigma_{r^2}^2$. Compared with GenD and RanD, the $\sigma_d^2$ of panels constructed by PhyD is smaller. The low- and moderate-density panels (< 50 k) constructed by RanD or GenD have large $\sigma_d^2$, which is not conducive to genomic prediction. The GEBV accuracy of the low- and moderate-density panels constructed by PhyD is 3.8–34.8% higher than that of the low- and moderate-density panels constructed by RanD and GenD. Panels with 20–30 k SNPs constructed by PhyD can achieve the same or slightly higher GEBV accuracy than high-density SNP panels for all three traits. In summary, the smaller the variation in physical distance between adjacent SNPs, the higher the GEBV accuracy. The low- and moderate-density panels constructed by physical distance are beneficial to genomic prediction, while pruning high-density SNP data based on genetic distance is detrimental to genomic prediction. The results provide suggestions for the development of SNP panels and for research on genomic prediction based on whole-genome sequence data.
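The two physical-distance measures discussed in this abstract, the mean and the variance of the distance between adjacent SNPs, are easy to compute for any panel. The sketch below is an illustration under stated assumptions: SNP positions are base-pair coordinates on a single chromosome, and `prune_by_physical_distance` is a hypothetical reading of a physical-distance (PhyD) panel that keeps the SNP nearest to each of a set of evenly spaced target positions. The abstract does not describe the authors' actual selection procedure.

```python
from bisect import bisect_left
from statistics import mean, pvariance


def spacing_stats(positions):
    """Mean and population variance of the gaps between adjacent SNP positions."""
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return mean(gaps), pvariance(gaps)


def prune_by_physical_distance(positions, n_keep):
    """Keep up to n_keep SNPs lying closest to evenly spaced target positions.

    Fewer than n_keep SNPs are returned when two targets share a nearest SNP.
    """
    pos = sorted(positions)
    if n_keep < 2 or n_keep > len(pos):
        raise ValueError("n_keep must be between 2 and the number of SNPs")
    step = (pos[-1] - pos[0]) / (n_keep - 1)
    chosen, seen = [], set()
    for k in range(n_keep):
        target = pos[0] + k * step
        j = bisect_left(pos, target)
        # The nearest SNP is either just below or at/above the target.
        candidates = [c for c in (j - 1, j) if 0 <= c < len(pos)]
        best = min(candidates, key=lambda c: abs(pos[c] - target))
        if best not in seen:
            seen.add(best)
            chosen.append(pos[best])
    return chosen
```

Two panels with the same number of SNPs over the same span have the same mean gap, so only the variance separates an evenly spaced panel from a clustered one, which matches the abstract's observation that the mean distance is basically the same across panels at a given SNP number.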
|
26
|
An B, Liang M, Chang T, Duan X, Du L, Xu L, Zhang L, Gao X, Li J, Gao H. KCRR: a nonlinear machine learning with a modified genomic similarity matrix improved the genomic prediction efficiency. Brief Bioinform 2021; 22:6271997. [PMID: 33963831 DOI: 10.1093/bib/bbab132] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/21/2020] [Revised: 03/03/2021] [Indexed: 11/13/2022] Open
Abstract
Advances in high-throughput sequencing are driving the increasing application of genomic prediction (GP) in breeding programs. In this research, we designed a cosine kernel-based kernel ridge regression (KRR), named KCRR, to perform GP. This paper assessed the prediction accuracies for 12 traits with various heritabilities and genetic architectures from four populations using genomic best linear unbiased prediction (GBLUP), BayesB, support vector regression (SVR), and KCRR. On the whole, KCRR performed stably for all traits across multiple species, indicating that the hypothesis underlying KCRR has the potential to adapt to a wide range of genetic architectures. Moreover, we defined a modified genomic similarity matrix named the Cosine similarity matrix (CS matrix). The results indicated that the accuracies of GBLUP_kinship and GBLUP_CS were almost identical for all traits, while computing efficiency increased by an average of 20 times. Our research offers a promising strategy for future GP.
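A cosine similarity matrix between individuals can be sketched as follows. The sketch assumes each individual is a row of genotype codes (for example 0/1/2 allele counts) and applies the textbook cosine definition to the raw rows; whether the authors center or scale genotypes before computing their CS matrix is not stated in the abstract, and the function name is hypothetical.

```python
import math


def cosine_similarity_matrix(genotypes):
    """Pairwise cosine similarity between rows of a genotype matrix.

    `genotypes` is a list of equal-length rows, one row per individual.
    Returns a symmetric n x n list of lists.
    """
    norms = [math.sqrt(sum(g * g for g in row)) for row in genotypes]
    if any(norm == 0.0 for norm in norms):
        raise ValueError("cosine similarity is undefined for an all-zero row")
    n = len(genotypes)
    cs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(a * b for a, b in zip(genotypes[i], genotypes[j]))
            # Fill both triangles so the matrix is exactly symmetric.
            cs[i][j] = cs[j][i] = dot / (norms[i] * norms[j])
    return cs
```

A matrix of this kind could take the place of the kinship matrix in a GBLUP-style model or serve as the kernel in kernel ridge regression, which is how the abstract describes its two uses.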
Affiliation(s)
- Bingxing An
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
- Mang Liang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
- Tianpeng Chang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
- Xinghai Duan
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
- Lili Du
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
- Lingyang Xu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
- Lupei Zhang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
- Xue Gao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
- Junya Li
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
- Huijiang Gao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, P. R. China
|
27
|
Yang X, Sun J, Zhao G, Li W, Tan X, Zheng M, Feng F, Liu D, Wen J, Liu R. Identification of Major Loci and Candidate Genes for Meat Production-Related Traits in Broilers. Front Genet 2021; 12:645107. [PMID: 33859671 PMCID: PMC8042277 DOI: 10.3389/fgene.2021.645107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Accepted: 03/02/2021] [Indexed: 12/30/2022] Open
Abstract
Background: Carcass traits are crucial characteristics of broilers. However, the underlying genetic mechanisms are not well understood. In the current study, significant loci and major-effect candidate genes affecting nine carcass traits related to meat production were analyzed in 873 purebred broilers using an imputation-based genome-wide association study.
Results: The heritability estimates of nine carcass traits, including carcass weight, thigh muscle weight, and thigh muscle percentage, were moderate to high and ranged from 0.21 to 0.39. Twelve genome-wide significant SNPs and 118 suggestively significant SNPs of 546,656 autosomal variants were associated with carcass traits. All SNPs for six weight traits (body weight at 42 days of age, carcass weight, eviscerated weight, whole thigh weight, thigh weight, and thigh muscle weight) were clustered around the 24.08 Kb region (GGA24: 5.73–5.75 Mb) and contained only one candidate gene (DRD2). The most significant SNP, rs15226023, accounted for 4.85–7.71% of the estimated genetic variance of the six weight traits. The remaining SNPs for carcass composition traits (whole thigh percentage and thigh percentage) were clustered around the 42.52 Kb region (GGA3: 53.03–53.08 Mb) and contained only one candidate gene (ADGRG6). The most significant SNP in this region, rs13571431, accounted for 11.89–13.56% of the estimated genetic variance of two carcass composition traits. Some degree of genetic differentiation in ADGRG6 between large and small breeds was observed.
Conclusion: We identified one 24.08 Kb region for weight traits and one 42.52 Kb region for thigh-related carcass traits. DRD2 was the major-effect candidate gene for weight traits, and ADGRG6 was the major-effect candidate gene for carcass composition traits. Our results supply essential information for causative mutation identification of carcass traits in broilers.
Affiliation(s)
- Xinting Yang
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jiahong Sun
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Guiping Zhao
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Wei Li
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xiaodong Tan
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Maiqing Zheng
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Furong Feng
- Foshan Gaoming Xinguang Agricultural and Animal Industrials Corporation, Foshan, China
| | - Dawei Liu
- Foshan Gaoming Xinguang Agricultural and Animal Industrials Corporation, Foshan, China
| | - Jie Wen
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Ranran Liu
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| |
|
28
|
Liang M, Chang T, An B, Duan X, Du L, Wang X, Miao J, Xu L, Gao X, Zhang L, Li J, Gao H. A Stacking Ensemble Learning Framework for Genomic Prediction. Front Genet 2021; 12:600040. [PMID: 33747037 PMCID: PMC7969712 DOI: 10.3389/fgene.2021.600040] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/28/2020] [Accepted: 01/12/2021] [Indexed: 11/22/2022] Open
Abstract
Machine learning (ML) is perhaps the most useful tool for the interpretation of large genomic datasets. However, the performance of a single machine learning method in genomic selection (GS) is currently unsatisfactory. To improve genomic predictions, we constructed a stacking ensemble learning framework (SELF) that integrates three machine learning methods to predict genomic estimated breeding values (GEBVs). The present study evaluated the prediction ability of SELF by analyzing three real datasets with different genetic architectures and comparing the prediction accuracy of SELF, its base learners, genomic best linear unbiased prediction (GBLUP), and BayesB. For each trait, SELF performed better than the base learners, which included support vector regression (SVR), kernel ridge regression (KRR), and elastic net (ENET). The prediction accuracy of SELF was, on average, 7.70% higher than that of GBLUP across the three datasets. Except for the milk fat percentage (MFP) trait in the German Holstein dairy cattle dataset, SELF was more robust than BayesB for all remaining traits. Therefore, we believe that SELF has the potential to be used to estimate GEBVs in other animals and plants.
Affiliation(s)
- Mang Liang
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Tianpeng Chang
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Bingxing An
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Xinghai Duan
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Lili Du
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Xiaoqiao Wang
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Jian Miao
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Lingyang Xu
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Xue Gao
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Lupei Zhang
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Junya Li
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Huijiang Gao
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| |
|
29
|
Polymorphisms of AMY1A gene and their association with growth, carcass traits and feed intake efficiency in chickens. Genomics 2021; 113:583-594. [PMID: 33485951 DOI: 10.1016/j.ygeno.2020.10.041] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/16/2020] [Revised: 10/09/2020] [Accepted: 10/27/2020] [Indexed: 11/20/2022]
Abstract
Investigations on the association between chicken traits and genetic variations can provide basic information to improve production performance in chickens. In our previous work, we genotyped 450 male chickens with a 600 K SNP array [1] and found that several SNPs in the genomic regions of the amylase alpha 1A (AMY1A) gene were significantly associated with feed intake efficiency and carcass traits. Given the lower accuracy of the SNP array, we performed direct sequencing with male and female chickens to further test chicken AMY1A polymorphisms and investigate their association with 17 traits in chickens. The results showed that 7 SNPs in the 5' flanking region, exon, intron and 3' UTR (3' untranslated region) of AMY1A, were significantly associated with daily gain (DG), average daily feed intake (ADFI), leg muscle weight (LMW) and abdominal fat (AF) (p < 0.05). Additionally, the haplotypes based on three SNPs, rs15910189, rs314354067 and rs316026696, showed significant associations with DG (p < 0.01), ADFI and AF (p < 0.05). To better understand the transcriptional regulation of AMY1A, we cloned its 5' flanking region and found that the SNPs rs316436216 and rs314213090 which might change the transcriptional regulator binding sites, were in the suppressor and enhancer regions, respectively. In addition, luciferase assays revealed that the SNP rs314613110 in the 3' UTR influenced the binding of the miRNA gga-miR-1764-3p. To validate whether there is any copy number variation in AMY1A in our population, we performed a genome-wide assessment of CNVs through whole-genome resequencing data. However, no CNV was found in AMY1A in our population, which is different from the increased copy number of AMY1A found in humans who consume a high-starch diet. Therefore, the present study provides substantial evidence for the association of AMY1A polymorphisms with growth traits and feed intake efficiency, which might contribute to chicken breeding programs.
|
30
|
An Y, Chen L, Li YX, Li C, Shi Y, Zhang D, Li Y, Wang T. Genome-wide association studies and whole-genome prediction reveal the genetic architecture of KRN in maize. BMC PLANT BIOLOGY 2020; 20:490. [PMID: 33109077 PMCID: PMC7590725 DOI: 10.1186/s12870-020-02676-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/31/2020] [Accepted: 09/24/2020] [Indexed: 05/21/2023]
Abstract
BACKGROUND Kernel row number (KRN) is an important trait for the domestication and improvement of maize. Exploring the genetic basis of KRN has great research significance and can provide valuable information for molecular assisted selection. RESULTS In this study, one single-locus method (MLM) and six multilocus methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO) of genome-wide association studies (GWASs) were used to identify significant quantitative trait nucleotides (QTNs) for KRN in an association panel including 639 maize inbred lines that were genotyped by the MaizeSNP50 BeadChip. In three phenotyping environments and with best linear unbiased prediction (BLUP) values, the seven GWAS methods revealed different numbers of KRN-associated QTNs, ranging from 11 to 177. Based on these results, seven important regions for KRN located on chromosomes 1, 2, 3, 5, 9, and 10 were identified by at least three methods and in at least two environments. Moreover, 49 genes from the seven regions were expressed in different maize tissues. Among the 49 genes, ARF29 (Zm00001d026540, encoding auxin response factor 29) and CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) were significantly related to KRN, based on expression analysis and candidate gene association mapping. Whole-genome prediction (WGP) of KRN was also performed, and we found that the KRN-associated tagSNPs achieved a high prediction accuracy. The best strategy was to integrate all of the KRN-associated tagSNPs identified by all GWAS models. CONCLUSIONS These results aid in our understanding of the genetic architecture of KRN and provide useful information for genomic selection for KRN in maize breeding.
Affiliation(s)
- Yixin An
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
| | - Lin Chen
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
| | - Yong-Xiang Li
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
| | - Chunhui Li
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
| | - Yunsu Shi
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
| | - Dengfeng Zhang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
| | - Yu Li
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
| | - Tianyu Wang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081 China
| |
|
31
|
Ren D, An L, Li B, Qiao L, Liu W. Efficient weighting methods for genomic best linear-unbiased prediction (BLUP) adapted to the genetic architectures of quantitative traits. Heredity (Edinb) 2020; 126:320-334. [PMID: 32980863 DOI: 10.1038/s41437-020-00372-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/21/2020] [Revised: 09/12/2020] [Accepted: 09/13/2020] [Indexed: 11/09/2022] Open
Abstract
Genomic best linear-unbiased prediction (GBLUP) assumes equal variance for all marker effects, which is suitable for traits that conform to the infinitesimal model. For traits controlled by major genes, Bayesian methods with shrinkage priors or genome-wide association study (GWAS) methods can be used to identify causal variants effectively. The information from Bayesian/GWAS methods can be used to construct the weighted genomic relationship matrix (G). However, it remains unclear which methods perform best for traits varying in genetic architecture. Therefore, we developed several methods to optimize the performance of weighted GBLUP and compared them with other available methods using simulated and real data sets. First, two types of methods (marker effects with local shrinkage or normal prior) were used to obtain test statistics and estimates for each marker effect. Second, three weighted G matrices were constructed based on the marker information from the first step: (1) the genomic-feature-weighted G, (2) the estimated marker-variance-weighted G, and (3) the absolute value of the estimated marker-effect-weighted G. Following the above process, six different weighted GBLUP methods (local shrinkage/normal-prior GF/EV/AEWGBLUP) were proposed for genomic prediction. Analyses with both simulated and real data demonstrated that these options offer flexibility for optimizing the weighted GBLUP for traits with a broad spectrum of genetic architectures. The advantage of weighting methods over GBLUP in terms of accuracy was trait dependent, ranging from 14.8% to marginal for simulated traits and from 44% to marginal for real traits. Local-shrinkage prior EVWGBLUP is superior for traits mainly controlled by loci of large effect. Normal-prior AEWGBLUP performs well for traits mainly controlled by loci of moderate effect. For traits controlled by some loci with large effects (explaining 25-50% of the genetic variance) and a range of loci with small effects, GFWGBLUP has advantages. In conclusion, the optimal weighted GBLUP method for genomic selection should take both the genetic architecture and the number of QTLs of traits into consideration carefully.
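To make the idea of a weighted genomic relationship matrix concrete, the following is a minimal sketch and not the authors' implementation. It assumes a VanRaden-style construction, G_w = Z D Z' / Σ_j d_j 2 p_j (1 − p_j), where Z holds centred allele counts and D is a diagonal matrix of per-marker weights. The function name `weighted_grm` and the example weights are hypothetical; the abstract does not specify how the three weighting schemes are scaled.

```python
def weighted_grm(genotypes, weights):
    """Build a weighted genomic relationship matrix (illustrative sketch).

    genotypes: list of individuals, each a list of allele counts (0, 1 or 2).
    weights:   one non-negative weight per marker (hypothetical values,
               e.g. derived from estimated marker variances).
    Assumes a VanRaden-style scaling; this is an assumption, not the
    formula used in the cited paper.
    """
    n = len(genotypes)
    m = len(weights)
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    # Centre each genotype by twice the allele frequency.
    z = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]
    # Weighted sum of expected marker variances, used as the scaling factor.
    scale = sum(weights[j] * 2.0 * freqs[j] * (1.0 - freqs[j]) for j in range(m))
    if scale == 0.0:
        raise ValueError("all markers are monomorphic or have zero weight")
    return [
        [sum(z[a][j] * weights[j] * z[b][j] for j in range(m)) / scale
         for b in range(n)]
        for a in range(n)
    ]


# Four individuals, three markers; the second marker is up-weighted.
example_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
example_g = weighted_grm(example_genotypes, [1.0, 5.0, 1.0])
```

With equal weights on every marker this construction reduces to the ordinary unweighted G, which is the equal-variance assumption of GBLUP described in the abstract.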
Affiliation(s)
- Duanyang Ren
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
| | - Lixia An
- College of Information, Shanxi Agricultural University, Taigu, China
| | - Baojun Li
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
| | - Liying Qiao
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China
| | - Wenzhong Liu
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Taigu, China.
| |
|
32
|
Foroutaifar S. Accuracy and sensitivity of different Bayesian methods for genomic prediction using simulation and real data. Stat Appl Genet Mol Biol 2020; 19:/j/sagmb.ahead-of-print/sagmb-2019-0007/sagmb-2019-0007.xml. [PMID: 32776906 DOI: 10.1515/sagmb-2019-0007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2019] [Accepted: 07/24/2020] [Indexed: 11/15/2022]
Abstract
The main objectives of this study were to compare the prediction accuracy of different Bayesian methods for traits with a wide range of genetic architectures using simulated and real data, and to assess the sensitivity of these methods to violations of their assumptions. For the simulation study, different scenarios were implemented based on two traits with low or high heritability, different numbers of QTLs, and different distributions of QTL effects. For the real data analysis, a German Holstein dataset for milk fat percentage, milk yield, and somatic cell score was used. The simulation results showed that, with the exception of Bayes R, the methods were sensitive to changes in the number of QTLs and the distribution of QTL effects. Having a distribution of QTL effects similar to what the different Bayesian methods assume for estimating marker effects did not improve their prediction accuracy. The Bayes B method gave accuracy higher than or equal to that of the other methods. The real data analysis showed that, similar to the simulation scenarios with a large number of QTLs, there was no difference between the accuracies of the different methods for any of the traits.
Affiliation(s)
- Saheb Foroutaifar
- Department of Animal Science, College of Agriculture and Natural Resources, Razi University, Kermanshah, PO Box: 6715685418, Iran
| |
|
33
|
Yin L, Zhang H, Zhou X, Yuan X, Zhao S, Li X, Liu X. KAML: improving genomic prediction accuracy of complex traits using machine learning determined parameters. Genome Biol 2020; 21:146. [PMID: 32552725 PMCID: PMC7386246 DOI: 10.1186/s13059-020-02052-w] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 01/28/2020] [Accepted: 05/21/2020] [Indexed: 02/06/2023] Open
Abstract
Advances in high-throughput sequencing technologies have reduced the cost of genotyping dramatically and led to genomic prediction being widely used in animal and plant breeding, and increasingly in human genetics. Inspired by the efficient computing of linear mixed model and the accurate prediction of Bayesian methods, we propose a machine learning-based method incorporating cross-validation, multiple regression, grid search, and bisection algorithms named KAML that aims to combine the advantages of prediction accuracy with computing efficiency. KAML exhibits higher prediction accuracy than existing methods, and it is available at https://github.com/YinLiLin/KAML.
Affiliation(s)
- Lilin Yin
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China.,Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
| | - Haohao Zhang
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, 430070, China
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.,Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Xiaohui Yuan
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, 430070, China
| | - Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China.,Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
| | - Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China. .,Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China.
| | - Xiaolei Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China. .,Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China.
| |
|
34
|
Liu T, Luo C, Ma J, Wang Y, Shu D, Su G, Qu H. High-Throughput Sequencing With the Preselection of Markers Is a Good Alternative to SNP Chips for Genomic Prediction in Broilers. Front Genet 2020; 11:108. [PMID: 32174971 PMCID: PMC7056902 DOI: 10.3389/fgene.2020.00108] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/12/2019] [Accepted: 01/30/2020] [Indexed: 11/13/2022] Open
Abstract
The choice of a genetic marker genotyping platform is important for genomic prediction in livestock and poultry. High-throughput sequencing can produce more genetic markers, but the genotype quality is lower than that obtained with single nucleotide polymorphism (SNP) chips. The aim of this study was to compare the accuracy of genomic prediction between high-throughput sequencing and SNP chips in broilers. In this study, we developed a new SNP marker screening method, the pre-marker-selection (PMS) method, to determine whether an SNP marker can be used for genomic prediction. We also compared it with a method that preselects markers based on results from genome-wide association studies (GWAS). With the two methods, we analysed body weight at the 12th week (BW) and feed conversion ratio (FCR) in a local broiler population. A total of 395 birds were selected from the F2 generation of the population, and 10X specific-locus amplified fragment sequencing (SLAF-seq) and the Illumina Chicken 60K SNP Beadchip were used for genotyping. The genomic best linear unbiased prediction method (GBLUP) was used to predict the genomic breeding values. The accuracy of genomic prediction was validated by the leave-one-out cross-validation method. Without SNP marker screening, the accuracies of the genomic estimated breeding value (GEBV) of BW and FCR were 0.509 and 0.249, respectively, when using SLAF-seq, and the accuracies were 0.516 and 0.232, respectively, when using the SNP chip. With SNP marker screening by the PMS method, the accuracies of GEBV of the two traits were 0.671 and 0.499, respectively, when using SLAF-seq, and 0.605 and 0.422, respectively, when using the SNP chip. Our SNP marker screening method led to an increase of prediction accuracy by 0.089-0.250. With SNP marker screening by the GWAS method, the accuracies of genomic prediction for the two traits were also improved, but the gains in accuracy were smaller than the gains with the PMS method for all traits. The results from this study indicate that our PMS method can improve the accuracy of GEBV, and that more accurate genomic prediction can be obtained from an increased number of genomic markers when using high-throughput sequencing in local broiler populations. Due to its lower genotyping cost, high-throughput sequencing could be a good alternative to SNP chips for genomic prediction in breeding programmes of local broiler populations.
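The reported gain of 0.089-0.250 can be checked directly against the accuracies quoted in the abstract. The short sketch below does only that arithmetic; the dictionary layout and the function name `accuracy_gains` are illustrative choices, and the numbers are copied from the abstract rather than recomputed from data.

```python
# GEBV accuracies quoted in the abstract, as
# (without marker screening, with PMS screening).
reported = {
    ("SLAF-seq", "BW"): (0.509, 0.671),
    ("SLAF-seq", "FCR"): (0.249, 0.499),
    ("SNP chip", "BW"): (0.516, 0.605),
    ("SNP chip", "FCR"): (0.232, 0.422),
}


def accuracy_gains(table):
    """Return the absolute accuracy gain for each platform/trait pair."""
    return {
        key: round(after - before, 3)
        for key, (before, after) in table.items()
    }


gains = accuracy_gains(reported)
smallest_gain = min(gains.values())  # SNP chip, BW
largest_gain = max(gains.values())   # SLAF-seq, FCR
```

The smallest difference comes from BW on the SNP chip and the largest from FCR with SLAF-seq, which matches the range stated in the abstract.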
Affiliation(s)
- Tianfei Liu
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| | - Chenglong Luo
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| | - Jie Ma
- Guangdong Provincial Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| | - Yan Wang
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| | - Dingming Shu
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| | - Guosheng Su
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
| | - Hao Qu
- State Key Laboratory of Livestock and Poultry Breeding, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| |
|
35
|
Pyhäjärvi T, Kujala ST, Savolainen O. 275 years of forestry meets genomics in Pinus sylvestris. Evol Appl 2020; 13:11-30. [PMID: 31988655 PMCID: PMC6966708 DOI: 10.1111/eva.12809] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/16/2018] [Revised: 04/05/2019] [Accepted: 04/24/2019] [Indexed: 12/12/2022] Open
Abstract
Pinus sylvestris has a long history of basic and applied research that is relevant for both forestry and evolutionary studies. Its patterns of adaptive variation and role in forest economic and ecological systems have been studied extensively for nearly 275 years, its detailed demography for 100 years, and its mating system for more than 50 years. However, its reference genome sequence is not yet available and genomic studies have been lagging compared to, for example, Pinus taeda and Picea abies, two other economically important conifers. Despite the lack of a reference genome, many modern genomic methods are applicable for a more detailed look at its biological characteristics. For example, RNA-seq has revealed a complex transcriptional landscape, and targeted DNA sequencing displays an excess of rare variants and geographically homogeneously distributed molecular genetic diversity. Current DNA and RNA resources can be used as a reference for gene expression studies, SNP discovery, and further targeted sequencing. In the future, specific consequences of the large genome size, such as functional effects of regulatory open chromatin regions and transposable elements, should be investigated more carefully. For forest breeding and long-term management purposes, genomic data can help in assessing the genetic basis of inbreeding depression and in applying genomic tools for genomic prediction and relatedness estimates. Given the challenges of breeding (long generation time, no easy vegetative propagation) and the economic importance of the species, the application of genomic tools has the potential to have a considerable impact. Here, we explore how genomic characteristics of P. sylvestris, such as rare alleles and the low extent of linkage disequilibrium, impact the applicability and power of the tools.
Affiliation(s)
- Tanja Pyhäjärvi
- Department of Ecology and Genetics, University of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
| | | | - Outi Savolainen
- Department of Ecology and Genetics, University of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
| |
|
36
|
Gao N, Chen Y, Liu X, Zhao Y, Zhu L, Liu A, Jiang W, Peng X, Zhang C, Tang Z, Li X, Chen Y. Weighted single-step GWAS identified candidate genes associated with semen traits in a Duroc boar population. BMC Genomics 2019; 20:797. [PMID: 31666004 PMCID: PMC6822442 DOI: 10.1186/s12864-019-6164-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/01/2019] [Accepted: 10/09/2019] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND In the pig production industry, artificial insemination (AI) plays an important role in enlarging the beneficial impact of elite boars. Understanding the genetic architecture and detecting genetic markers associated with semen traits can help in improving genetic selection for such traits and accelerate genetic progress. In this study, we utilized a weighted single-step genome-wide association study (wssGWAS) procedure to detect genetic regions and further candidate genes associated with semen traits in a Duroc boar population. Overall, the full pedigree consists of 5284 pigs (12 generations), of which 2693 boars have semen data (143,113 ejaculations) and 1733 pigs were genotyped with a 50 K single nucleotide polymorphism (SNP) array. RESULTS Results show that the most significant genetic regions (0.4 Mb windows) explained approximately 2% to 6% of the total genetic variances for the studied traits. In total, the identified significant windows (windows explaining more than 1% of total genetic variances) explained 28.29, 35.31, 41.98, and 20.60% of genetic variances (not phenotypic variance) for number of sperm cells, sperm motility, sperm progressive motility, and total morphological abnormalities, respectively. Several genes that have been previously reported to be associated with mammal spermiogenesis, testes functioning, and male fertility were detected and treated as candidate genes for the traits of interest: number of sperm cells: TDRD5, QSOX1, BLK, TIMP3, THRA, CSF3, and ZPBP1; sperm motility: PPP2R2B, NEK2, NDRG, ADAM7, SKP2, and RNASET2; sperm progressive motility: SH2B1, BLK, LAMB1, VPS4A, SPAG9, LCN2, and DNM1; total morphological abnormalities: GHR, SELENOP, SLC16A5, SLC9A3R1, and DNAI2. CONCLUSIONS In conclusion, candidate genes associated with Duroc boars' semen traits, including the number of sperm cells, sperm motility, sperm progressive motility, and total morphological abnormalities, were identified using wssGWAS. KEGG and GO enrichment analyses indicate that the identified candidate genes were enriched in biological processes and functional terms that may be involved in spermiogenesis, testes functioning, and male fertility.
Affiliation(s)
- Ning Gao
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, China
| | - Yilong Chen
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xiaohong Liu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, China
| | - Yunxiang Zhao
- Guangxi Xiubo genetics technology Co., LTD, Guigang, 537100, China
| | - Lin Zhu
- Guangxi Xiubo genetics technology Co., LTD, Guigang, 537100, China
| | - Ali Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, 430070, China
| | - Wei Jiang
- Guangxi Xiubo genetics technology Co., LTD, Guigang, 537100, China
| | - Xing Peng
- Guangxi Xiubo genetics technology Co., LTD, Guigang, 537100, China
| | - Conglin Zhang
- Guangxi Yangxiang Agriculture and Husbandry Co., LTD, Guigang, 537100, China
| | - Zhenshuang Tang
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yaosheng Chen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, China.
| |
|
37
|
Abstract
The increasing amount of available biological information on the markers can be used to inform the models applied for genomic selection to improve predictions. The objective of this study was to propose a general model for genomic selection using a link function approach within the hierarchical generalized linear model framework (hglm) that can include external information on the markers. These models can be fitted using the well-established hglm package in R. We also present an R package (CodataGS) to fit these models, which is significantly faster than the hglm package. Simulated data were used to validate the proposed model. We tested categorical, continuous and combination models where the external information on the markers was related to 1) the location of the QTL on the genome with varying degrees of uncertainty, 2) the relationship of the markers with the QTL calculated as the LD between them, and 3) a combination of both. The proposed models showed accuracy improvements of 3.8% up to 23.2% compared to the SNP-BLUP method in a simulated population derived from a base population with 100 individuals. Moreover, the proposed categorical model was tested on a dairy cattle dataset for two traits (Milk Yield and Fat Percentage). These results also showed improved accuracy compared to SNP-BLUP, especially for the Fat% trait. The performance of the proposed models depended on the genetic architecture of the trait, as traits that deviate from the infinitesimal model benefited more from the external information. Also, the gain in accuracy depended on the degree of uncertainty of the external information provided to the model. The usefulness of these types of models is expected to increase with time as more accurate information on the markers becomes available.
|
38
|
Drag MH, Kogelman LJA, Maribo H, Meinert L, Thomsen PD, Kadarmideen HN. Characterization of eQTLs associated with androstenone by RNA sequencing in porcine testis. Physiol Genomics 2019; 51:488-499. [PMID: 31373884 DOI: 10.1152/physiolgenomics.00125.2018] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/22/2022] Open
Abstract
Characterization of genetic variants affecting genome-wide gene expression levels (expression quantitative trait loci or eQTLs) in pig testes may improve our understanding of the genetic architecture of boar taint (an animal welfare trait) and help in genome-assisted or genomic selection programs. The aims of this study were to identify eQTLs associated with androstenone, to find candidate eQTLs for low androstenone, and to validate the top eQTL by reverse transcriptase quantitative PCR (RT-qPCR). Gene expression profiles were obtained by RNA sequencing in testis from Danish cross-bred pigs and genotype data by an 80K single nucleotide polymorphism panel. A total of 262 eQTLs [false discovery rate (FDR) < 0.05] were identified by using two software packages: Matrix eQTL and Krux eQTL. Of these, 149 cis-acting eQTLs were significantly associated with androstenone concentrations and gene expression (FDR < 0.05). The eQTLs were associated with several genes of boar taint relevance including CYP1A2, CYB5D1, and SPHK2. One eQTL gene, AMPH, was differentially expressed (FDR < 0.05) and affected by chicory. Five candidate eQTLs associated with low androstenone concentrations were discovered, including the top eQTL associated with CYP1A2. RT-qPCR confirmed target gene expression to be significantly (P < 0.05) different based on eQTL genotypes. Furthermore, eQTLs were enriched as QTLs for 15 boar taint related traits from the PigQTLdb. This is the first study to report eQTLs in testes of commercial crossbred pigs used in pork production and to reveal the genetic architecture of boar taint. Potential applications include development of a DNA test and use in advanced genomic selection models for boar taint.
Affiliation(s)
- Markus H Drag
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
| | - Lisette J A Kogelman
- Department of Neurology, Danish Headache Center, Rigshospitalet Glostrup, Faculty of Health and Medical Sciences, University of Copenhagen, Glostrup, Denmark
| | - Hanne Maribo
- SEGES, Danish Pig Research Center, Copenhagen, Denmark
| | - Lene Meinert
- Danish Meat Research Institute (DMRI), Danish Technological Institute, Taastrup, Denmark
| | - Preben D Thomsen
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Haja N Kadarmideen
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
|
39
|
Teng J, Gao N, Zhang H, Li X, Li J, Zhang H, Zhang X, Zhang Z. Performance of whole genome prediction for growth traits in a crossbred chicken population. Poult Sci 2019; 98:1968-1975. [DOI: 10.3382/ps/pey604] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/24/2018] [Accepted: 12/21/2018] [Indexed: 11/20/2022]
|
40
|
Nani JP, Rezende FM, Peñagaricano F. Predicting male fertility in dairy cattle using markers with large effect and functional annotation data. BMC Genomics 2019; 20:258. [PMID: 30940077 PMCID: PMC6444482 DOI: 10.1186/s12864-019-5644-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/21/2018] [Accepted: 03/25/2019] [Indexed: 11/22/2022]
Abstract
Background: Fertility is among the most important economic traits in dairy cattle. Genomic prediction for cow fertility has received much attention in the last decade, while bull fertility has been largely overlooked. The goal of this study was to assess genomic prediction of dairy bull fertility using markers with large effect and functional annotation data. Sire conception rate (SCR) was used as a measure of service sire fertility. Dataset consisted of 11.5 k U.S. Holstein bulls with SCR records and about 300 k single nucleotide polymorphism (SNP) markers. The analyses included the use of both single-kernel and multi-kernel predictive models fitting either all SNPs, markers with large effect, or markers with presumed functional roles, such as non-synonymous, synonymous, or non-coding regulatory variants. Results: The entire set of SNPs yielded predictive correlations of 0.340. Five markers located on chromosomes BTA8, BTA9, BTA13, BTA17, and BTA27 showed marked dominance effects. Interestingly, the inclusion of these five major markers as fixed effects in the predictive models increased predictive correlations to 0.403, representing an increase in accuracy of about 19% compared with the standard model. Single-kernel models fitting functional SNP classes outperformed their counterparts using random sets of SNPs, suggesting that the predictive power of these functional variants is driven in part by their biological roles. Multi-kernel models fitting all the functional SNP classes together with the five major markers exhibited predictive correlations around 0.405. Conclusions: The inclusion of markers with large effect markedly improved the prediction of dairy sire fertility. Functional variants exhibited higher predictive ability than random variants, but did not outperform the standard whole-genome approach. This research is the foundation for the development of novel strategies that could help the dairy industry make accurate genome-guided selection decisions on service sire fertility.
Affiliation(s)
- Juan Pablo Nani
- Department of Animal Sciences, University of Florida, 2250 Shealy Drive, Gainesville, FL, 32611, USA; Estación Experimental Agropecuaria Rafaela, Instituto Nacional de Tecnología Agropecuaria, 22-2300, Rafaela, SF, Argentina
| | - Fernanda M Rezende
- Department of Animal Sciences, University of Florida, 2250 Shealy Drive, Gainesville, FL, 32611, USA; Faculdade de Medicina Veterinária, Universidade Federal de Uberlândia, Uberlândia, MG, 38410-337, Brazil
| | - Francisco Peñagaricano
- Department of Animal Sciences, University of Florida, 2250 Shealy Drive, Gainesville, FL, 32611, USA; University of Florida Genetics Institute, University of Florida, Gainesville, FL, 32610, USA
|
41
|
Zhang H, Yin L, Wang M, Yuan X, Liu X. Factors Affecting the Accuracy of Genomic Selection for Agricultural Economic Traits in Maize, Cattle, and Pig Populations. Front Genet 2019; 10:189. [PMID: 30923535 PMCID: PMC6426750 DOI: 10.3389/fgene.2019.00189] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 12/10/2018] [Accepted: 02/21/2019] [Indexed: 11/20/2022]
Abstract
Genomic Selection (GS) has proved to be a powerful tool for estimating genetic values in plant and livestock breeding. Newly developed sequencing technologies have dramatically reduced the cost of genotyping and significantly increased the scale of genotype data used for GS. Meanwhile, state-of-the-art statistical methods have been developed to make the best use of high marker density genotype data. In this study, 14 traits from four data sets of three species (maize, cattle, and pig) and five influential factors that affect the prediction accuracy were evaluated, including marker density (from 1 to ~600 k), statistical method (GBLUP-A, GBLUP-AD, and BayesR), minor allele frequency (MAF), heritability, and genetic architecture. Results indicate that with the GBLUP method, higher marker density leads to higher prediction accuracy. In contrast, the BayesR method needs more Monte Carlo Markov Chain (MCMC) iterations to reach convergence and give reliable prediction values. BayesR outperforms GBLUP in predicting high or medium heritability traits that are affected by one or several genes with large effects, while GBLUP performs similarly or slightly better than BayesR in predicting low heritability traits that are controlled by a large number of genes with minor effects. Prediction accuracy for traits with complex genetic architecture can be improved by increasing the marker density. Interestingly, for simple traits that are controlled by one or several genes with large effects, higher marker density can cause lower prediction accuracy if the QTN is included, but leads to higher prediction accuracy if the QTN is excluded. The quantity of genetic markers with low MAF does not significantly affect the prediction accuracy of GBLUP, but results in poor prediction accuracy for the BayesR method. Compared with GBLUP-A, GBLUP-AD did not show any advantage in capturing the non-additive variance for the traits with high heritability. The factors that affected prediction accuracy are discussed in this study and indicate that a combination of either the GBLUP or BayesR method with moderate marker density and favorable polymorphism single nucleotide polymorphisms (SNPs) (~25 k SNPs) would always produce a good and stable prediction accuracy with acceptable breeding and computational costs.
Affiliation(s)
- Haohao Zhang
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
| | - Lilin Yin
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
| | - Meiyue Wang
- Department of Botany and Plant Sciences, University of California, Riverside, Riverside, CA, United States
| | - Xiaohui Yuan
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
| | - Xiaolei Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
|
42
|
Boison SA, Gjerde B, Hillestad B, Makvandi-Nejad S, Moghadam HK. Genomic and Transcriptomic Analysis of Amoebic Gill Disease Resistance in Atlantic Salmon (Salmo salar L.). Front Genet 2019; 10:68. [PMID: 30873203 PMCID: PMC6400892 DOI: 10.3389/fgene.2019.00068] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 06/29/2018] [Accepted: 01/28/2019] [Indexed: 01/01/2023]
Abstract
Amoebic gill disease (AGD) is one of the most important parasitic diseases of farmed Atlantic salmon. It is a source of major economic loss to the industry and poses significant threats to animal welfare. Previous studies have shown that resistance against this disease has a moderate, heritable genetic component, although the genes and the genetic pathways that contribute to this process have yet to be elucidated. In this study, to identify the genetic mechanisms of AGD resistance, we first investigated the molecular signatures of AGD infection in Atlantic salmon through a challenge model, where we compared the transcriptome profiles of the naïve and infected animals. We then conducted a genome-wide association analysis with 1,333 challenge-tested fish to map the AGD resistance genomic regions, supported by the results from the transcriptomic data. Further, we investigated the potential of incorporating gene expression analysis results in genomic prediction to improve prediction accuracy. Our data suggest thousands of genes have modified their expression following infection, with a significant increase in the transcription of genes with functional properties in cell adhesion and a sharp decline in the abundance of various components of the immune system genes. From the genome-wide association analysis, QTL regions on chromosomes ssa04, ssa09, and ssa13 were detected to be linked with AGD resistance. In particular, we found that the QTL region on ssa04 harbors members of the cadherin gene family. These genes play a critical role in target recognition and cell adhesion. The QTL region on ssa09 is also associated with another member of the cadherin gene family, protocadherin Fat 4. The associated genetic markers on ssa13 span a large genomic region that includes interleukin-18-binding protein, a gene whose function is essential in inhibiting the proinflammatory effect of cytokine IL18. Incorporating gene expression information through a weighted genomic relationship matrix approach decreased genomic prediction accuracy and increased bias of prediction. Together, these findings help to improve our breeding programs and animal welfare against AGD and advance our knowledge of the genetic basis of host-pathogen interactions.
Affiliation(s)
| | - Bjarne Gjerde
- Department of Breeding and Genetics, Nofima, Ås, Norway
|
43
|
Souza LM, Francisco FR, Gonçalves PS, Scaloppi Junior EJ, Le Guen V, Fritsche-Neto R, Souza AP. Genomic Selection in Rubber Tree Breeding: A Comparison of Models and Methods for Managing G×E Interactions. Front Plant Sci 2019; 10:1353. [PMID: 31708955 PMCID: PMC6824234 DOI: 10.3389/fpls.2019.01353] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/09/2019] [Accepted: 10/01/2019] [Indexed: 05/18/2023]
Abstract
Several genomic prediction models combining genotype × environment (G×E) interactions have recently been developed and used for genomic selection (GS) in plant breeding programs. G×E interactions reduce selection accuracy and limit genetic gains in plant breeding. Two data sets were used to compare the prediction abilities of multienvironment G×E genomic models and two kernel methods. Specifically, a linear kernel, or GB (genomic best linear unbiased predictor [GBLUP]), and a nonlinear kernel, or Gaussian kernel (GK), were used to compare the prediction accuracies (PAs) of four genomic prediction models: 1) a single-environment, main genotypic effect model (SM); 2) a multienvironment, main genotypic effect model (MM); 3) a multienvironment, single-variance G×E deviation model (MDs); and 4) a multienvironment, environment-specific variance G×E deviation model (MDe). We evaluated the utility of genomic selection (GS) for 435 individual rubber trees at two sites and genotyped the individuals via genotyping-by-sequencing (GBS) of single-nucleotide polymorphisms (SNPs). Prediction models were used to estimate stem circumference (SC) during the first 4 years of tree development in conjunction with a broad-sense heritability (H²) of 0.60. Applying the model (SM, MM, MDs, and MDe) and kernel method (GB and GK) combinations to the rubber tree data revealed that the multienvironment models were superior to the single-environment genomic models, regardless of the kernel (GB or GK) used, suggesting that introducing interactions between markers and environmental conditions increases the proportion of variance explained by the model and, more importantly, the PA. Compared with the classic breeding method (CBM), methods in which GS is incorporated resulted in a 5-fold increase in response to selection for SC with multienvironment GS (MM, MDe, or MDs). 
Furthermore, GS resulted in a more balanced selection response for SC and contributed to a reduction in selection time when used in conjunction with traditional genetic breeding programs. Given the rapid advances in genotyping methods and their declining costs and given the overall costs of large-scale progeny testing and shortened breeding cycles, we expect GS to be implemented in rubber tree breeding programs.
Affiliation(s)
- Livia M. Souza
- Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
| | - Felipe R. Francisco
- Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
| | - Paulo S. Gonçalves
- Center of Rubber Tree and Agroforestry Systems, Agronomic Institute (IAC), Votuporanga, Brazil
| | | | - Vincent Le Guen
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR AGAP, Montpellier, France
| | - Roberto Fritsche-Neto
- Departamento de Genética, Escola Superior de Agricultura “Luiz de Queiroz” Universidade de São Paulo (ESALQ/USP), Piracicaba, Brazil
| | - Anete P. Souza
- Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, Brazil
- Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, Brazil
- *Correspondence: Anete P. Souza,
|
44
|
Liu X, Wang H, Hu X, Li K, Liu Z, Wu Y, Huang C. Improving Genomic Selection With Quantitative Trait Loci and Nonadditive Effects Revealed by Empirical Evidence in Maize. Front Plant Sci 2019; 10:1129. [PMID: 31620155 PMCID: PMC6759780 DOI: 10.3389/fpls.2019.01129] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/06/2019] [Accepted: 08/15/2019] [Indexed: 05/20/2023]
Abstract
Genomic selection (GS), a tool developed for molecular breeding, is used by plant breeders to improve breeding efficacy by shortening the breeding cycle and to facilitate the selection of candidate lines for creating hybrids without phenotyping in various environments. Association and linkage mapping have been widely used to explore and detect candidate genes in order to understand the genetic mechanisms of quantitative traits. In the current study, phenotypic and genotypic data from three experimental populations, including data on six agronomic traits (e.g., plant height, ear height, ear length, ear diameter, grain yield per plant, and hundred-kernel weight), were used to evaluate the effect of trait-relevant markers (TRMs) on prediction accuracy estimation. Integrating information from mapping into a statistical model can efficiently improve prediction performance compared with using stochastically selected markers to perform GS. The prediction accuracy can reach a plateau when a total of 500-1,000 TRMs are utilized in GS. The prediction accuracy can be significantly enhanced by including nonadditive effects and TRMs in the GS model when genotypic data with high proportions of heterozygous alleles and complex agronomic traits with a high proportion of nonadditive variance in phenotypic variance are used to perform GS. In addition, taking information on population structure into account can slightly improve prediction performance when the genetic relationship between the training and testing sets is influenced by population stratification due to different allele frequencies. In conclusion, GS is a useful approach for prescreening candidate lines, and the empirical evidence provided by the current study for TRMs and nonadditive effects can inform plant breeding and in turn contribute to the improvement of selection efficiency in practical GS-assisted breeding programs.
|
45
|
Optimising Genomic Selection in Wheat: Effect of Marker Density, Population Size and Population Structure on Prediction Accuracy. G3 (Bethesda) 2018; 8:2889-2899. [PMID: 29970398 PMCID: PMC6118301 DOI: 10.1534/g3.118.200311] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Indexed: 12/21/2022]
Abstract
Genomic selection applied to plant breeding enables earlier estimates of a line’s performance and significant reductions in generation interval. Several factors affecting prediction accuracy should be well understood if breeders are to harness genomic selection to its full potential. We used a panel of 10,375 bread wheat (Triticum aestivum) lines genotyped with 18,101 SNP markers to investigate the effect and interaction of training set size, population structure and marker density on genomic prediction accuracy. Through assessing the effect of training set size we showed the rate at which prediction accuracy increases is slower beyond approximately 2,000 lines. The structure of the panel was assessed via principal component analysis and K-means clustering, and its effect on prediction accuracy was examined through a novel cross-validation analysis according to the K-means clusters and breeding cohorts. Here we showed that accuracy can be improved by increasing the diversity within the training set, particularly when relatedness between training and validation sets is low. The breeding cohort analysis revealed that traits with higher selection pressure (lower allelic diversity) can be more accurately predicted by including several previous cohorts in the training set. The effect of marker density and its interaction with population structure was assessed for marker subsets containing between 100 and 17,181 markers. This analysis showed that response to increased marker density is largest when using a diverse training set to predict between poorly related material. These findings represent a significant resource for plant breeders and contribute to the collective knowledge on the optimal structure of calibration panels for genomic prediction.
|
46
|
Accuracy of Genomic Prediction for Foliar Terpene Traits in Eucalyptus polybractea. G3 (Bethesda) 2018; 8:2573-2583. [PMID: 29891736 PMCID: PMC6071609 DOI: 10.1534/g3.118.200443] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/22/2022]
Abstract
Unlike agricultural crops, most forest species have not had millennia of improvement through phenotypic selection, but can contribute energy and material resources and possibly help alleviate climate change. Yield gains similar to those achieved in agricultural crops over millennia could be made in forestry species with the use of genomic methods in a much shorter time frame. Here we compare various methods of genomic prediction for eight traits related to foliar terpene yield in Eucalyptus polybractea, a tree grown predominantly for the production of Eucalyptus oil. The genomic markers used in this study are derived from shallow whole genome sequencing of a population of 480 trees. We compare the traditional pedigree-based additive best linear unbiased predictors (ABLUP), genomic BLUP (GBLUP), BayesB genomic prediction model, and a form of GBLUP based on weighting markers according to their influence on traits (BLUP|GA). Predictive ability is assessed under varying marker densities of 10,000, 100,000 and 500,000 SNPs. Our results show that BayesB and BLUP|GA perform best across the eight traits. Predictive ability was higher for individual terpene traits, such as foliar α-pinene and 1,8-cineole concentration (0.59 and 0.73, respectively), than aggregate traits such as total foliar oil concentration (0.38). This is likely a function of the trait architecture and markers used. BLUP|GA was the best model for the two biomass related traits, height and 1 year change in height (0.25 and 0.19, respectively). Predictive ability increased with marker density for most traits, but with diminishing returns. The results of this study are a solid foundation for yield improvement of essential oil producing eucalypts. New markets such as biopolymers and terpene-derived biofuels could benefit from rapid yield increases in undomesticated oil-producing species.
|
47
|
Morgante F, Huang W, Maltecca C, Mackay TFC. Effect of genetic architecture on the prediction accuracy of quantitative traits in samples of unrelated individuals. Heredity (Edinb) 2018; 120:500-514. [PMID: 29426878 PMCID: PMC5943287 DOI: 10.1038/s41437-017-0043-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 08/02/2017] [Revised: 11/16/2017] [Accepted: 11/22/2017] [Indexed: 11/13/2022]
Abstract
Predicting complex phenotypes from genomic data is a fundamental aim of animal and plant breeding, where we wish to predict genetic merits of selection candidates; and of human genetics, where we wish to predict disease risk. While genomic prediction models work well with populations of related individuals and high linkage disequilibrium (LD) (e.g., livestock), comparable models perform poorly for populations of unrelated individuals and low LD (e.g., humans). We hypothesized that low prediction accuracies in the latter situation may occur when the genetic architecture of the trait departs from the infinitesimal and additive architecture assumed by most prediction models. We used simulated data for 10,000 lines based on sequence data from a population of unrelated, inbred Drosophila melanogaster lines to evaluate this hypothesis. We show that, even in very simplified scenarios meant as a stress test of the commonly used Genomic Best Linear Unbiased Predictor (G-BLUP) method, using all common variants yields low prediction accuracy regardless of the trait genetic architecture. However, prediction accuracy increases when predictions are informed by the genetic architecture inferred from mapping the top variants affecting main effects and interactions in the training data, provided there is sufficient power for mapping. When the true genetic architecture is largely or partially due to epistatic interactions, the additive model may not perform well, while models that account explicitly for interactions generally increase prediction accuracy. Our results indicate that accounting for genetic architecture can improve prediction accuracy for quantitative traits.
Affiliation(s)
- Fabio Morgante
- Program in Genetics, North Carolina State University, Raleigh, NC, 27695-7614, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, 27695-7614, USA
- W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC, 27695-7614, USA
| | - Wen Huang
- Program in Genetics, North Carolina State University, Raleigh, NC, 27695-7614, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, 27695-7614, USA
- W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC, 27695-7614, USA
- Initiative in Biological Complexity, North Carolina State University, Raleigh, NC, 27695-7614, USA
- Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA
| | - Christian Maltecca
- Program in Genetics, North Carolina State University, Raleigh, NC, 27695-7614, USA
- Department of Animal Science, North Carolina State University, Raleigh, NC, 27695-7621, USA
| | - Trudy F C Mackay
- Program in Genetics, North Carolina State University, Raleigh, NC, 27695-7614, USA.
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, 27695-7614, USA.
- W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC, 27695-7614, USA.
- Initiative in Biological Complexity, North Carolina State University, Raleigh, NC, 27695-7614, USA.
|
48
|
Ye S, Yuan X, Lin X, Gao N, Luo Y, Chen Z, Li J, Zhang X, Zhang Z. Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population. J Anim Sci Biotechnol 2018; 9:30. [PMID: 29581880 PMCID: PMC5861640 DOI: 10.1186/s40104-018-0241-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/06/2017] [Accepted: 01/26/2018] [Indexed: 11/24/2022]
Abstract
Background: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approach to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation. Results: We genotyped 450 chickens with a 600 K SNP array, and sequenced 24 key individuals by whole genome re-sequencing. Accuracy of imputation from putative 60 K and 600 K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. By increasing the sequencing cost from 24X to 144X, the imputation accuracy increased from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. With fixed sequence depth (12X), increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals resulted in a higher imputation accuracy compared with using randomly selected individuals as a reference population for re-sequencing. With fixed reference population size (24), imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as the sequencing depth increased from 1X to 12X. With a given total cost of genotyping, accuracy increased with the size of the reference population for FImpute, but the pattern was not valid for Beagle, which showed the highest accuracy at six fold coverage for the scenarios used in this study. Conclusions: In conclusion, we comprehensively investigated the impacts of several key factors on genotype imputation. Generally, increasing sequencing cost gave a higher imputation accuracy. But with a fixed sequencing cost, optimal imputation enhances the performance of WGP and GWAS. An optimal imputation strategy should take the size of the reference population, imputation algorithms, marker density, the population structure of the target population, and methods to select key individuals into consideration comprehensively. This work sheds additional light on how to design and execute genotype imputation for livestock populations. Electronic supplementary material: The online version of this article (10.1186/s40104-018-0241-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Shaopan Ye
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
| | - Xiaolong Yuan
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
| | - Xiran Lin
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
| | - Ning Gao
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
| | - Yuanyu Luo
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
| | - Zanmou Chen
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
| | - Jiaqi Li
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
| | - Xiquan Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
| | - Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong China
|
49
|
Abdollahi-Arpanahi R, Morota G, Peñagaricano F. Predicting bull fertility using genomic data and biological information. J Dairy Sci 2017; 100:9656-9666. [PMID: 28987577 DOI: 10.3168/jds.2017-13288] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 06/04/2017] [Accepted: 09/13/2017] [Indexed: 01/04/2023]
Abstract
The genomic prediction of unobserved genetic values or future phenotypes for complex traits has revolutionized agriculture and human medicine. Fertility traits are undoubtedly complex traits of great economic importance to the dairy industry. Although genomic prediction for improved cow fertility has received much attention, bull fertility largely has been ignored. The first aim of this study was to investigate the feasibility of genomic prediction of sire conception rate (SCR) in US Holstein dairy cattle. Standard genomic prediction often ignores any available information about functional features of the genome, although it is believed that such information can yield more accurate and more persistent predictions. Hence, the second objective was to incorporate prior biological information into predictive models and evaluate their performance. The analyses included the use of kernel-based models fitting either all single nucleotide polymorphisms (SNP; 55K) or only markers with presumed functional roles, such as SNP linked to Gene Ontology or Medical Subject Heading terms related to male fertility, or SNP significantly associated with SCR. Both single- and multikernel models were evaluated using linear and Gaussian kernels. Predictive ability was evaluated in 5-fold cross-validation. The entire set of SNP exhibited predictive correlations around 0.35. Neither Gene Ontology nor Medical Subject Heading gene sets achieved predictive abilities higher than their counterparts using random sets of SNP. Notably, kernel models fitting significant SNP achieved the best performance with increases in accuracy up to 5% compared with the standard whole-genome approach. Models fitting Gaussian kernels outperformed their counterparts fitting linear kernels irrespective of the set of SNP. Overall, our findings suggest that genomic prediction of bull fertility is feasible in dairy cattle. 
This provides potential for accurate genome-guided decisions, such as early culling of bull calves with low SCR predictions. In addition, exploiting nonlinear effects through the use of Gaussian kernels together with the incorporation of relevant markers seems to be a promising alternative to the standard approach. The inclusion of gene set results into prediction models deserves further research.
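The comparison described in this abstract (linear versus Gaussian kernels, scored by predictive correlation in 5-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: kernel ridge regression with a fixed penalty is swapped in for their kernel models, the genotypes and phenotypes are simulated, and every name and parameter here (`linear_kernel`, `gaussian_kernel`, `bandwidth`, `lam`, `simulate`) is a hypothetical choice made for illustration.

```python
import numpy as np

def linear_kernel(Z):
    """Linear kernel from standardized markers (genomic-relationship style)."""
    return Z @ Z.T / Z.shape[1]

def gaussian_kernel(Z, bandwidth=1.0):
    """Gaussian kernel on squared Euclidean distances scaled by their mean.
    `bandwidth` is an illustrative tuning parameter, not a published value."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T), 0.0)
    return np.exp(-bandwidth * d2 / d2.mean())

def kernel_ridge_predict(K, y, train, test, lam=1.0):
    """Fit kernel ridge regression on `train` rows and predict `test` rows."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

def cv_predictive_correlation(K, y, n_folds=5, lam=1.0, seed=0):
    """Mean correlation between predictions and held-out phenotypes."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = kernel_ridge_predict(K, y, train, test, lam)
        cors.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(cors))

def simulate(n=200, m=500, seed=1):
    """Simulated additive trait with heritability near 0.5 (assumption)."""
    rng = np.random.default_rng(seed)
    geno = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    Z = (geno - geno.mean(axis=0)) / (geno.std(axis=0) + 1e-12)
    g = Z @ rng.normal(0.0, 1.0, m)
    y = g + rng.normal(0.0, g.std(), n)
    return Z, y
```

Calling `cv_predictive_correlation(linear_kernel(Z), y)` and `cv_predictive_correlation(gaussian_kernel(Z), y)` on the same `Z, y = simulate()` gives the two predictive correlations to compare. On purely additive simulated data like this, the Gaussian kernel should not be expected to reproduce the advantage reported in the abstract.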
Affiliation(s)
- Rostam Abdollahi-Arpanahi
  - Department of Animal Sciences, University of Florida, Gainesville 32611; Department of Animal and Poultry Science, University of Tehran, Pakdasht, Iran 3391653755
- Gota Morota
  - Department of Animal Science, University of Nebraska, Lincoln 68583
- Francisco Peñagaricano
  - Department of Animal Sciences, University of Florida, Gainesville 32611; University of Florida Genetics Institute, Gainesville 32611
|
50
|
Zeng P, Zhou X. Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models. Nat Commun 2017; 8:456. [PMID: 28878256 PMCID: PMC5587666 DOI: 10.1038/s41467-017-00470-2] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 11/26/2016] [Accepted: 06/30/2017] [Indexed: 01/03/2023] Open
Abstract
Using genotype data to perform accurate genetic prediction of complex traits can facilitate genomic selection in animal and plant breeding programs, and can aid in the development of personalized medicine in humans. Because most complex traits have a polygenic architecture, accurate genetic prediction often requires modeling all genetic variants together via polygenic methods. Here, we develop such a polygenic method, which we refer to as the latent Dirichlet process regression model. Dirichlet process regression is non-parametric in nature, relies on the Dirichlet process to flexibly and adaptively model the effect size distribution, and thus enjoys robust prediction performance across a broad spectrum of genetic architectures. We compare Dirichlet process regression with several commonly used prediction methods with simulations. We further apply Dirichlet process regression to predict gene expressions, to conduct PrediXcan based gene set test, to perform genomic selection of four traits in two species, and to predict eight complex traits in a human cohort.
Genetic prediction of complex traits with polygenic architecture has wide application from animal breeding to disease prevention. Here, Zeng and Zhou develop a non-parametric genetic prediction method based on latent Dirichlet Process regression models.
|