1
Zhang Z, Zhao W, Wang Z, Pan Y, Wang Q, Zhang Z. Integration of ssGWAS and ROH analyses for uncovering genetic variants associated with reproduction traits in Large White pigs. Anim Genet 2024. [PMID: 39129705 DOI: 10.1111/age.13465] [Received: 05/26/2024] [Revised: 05/26/2024] [Accepted: 07/05/2024] [Indexed: 08/13/2024]
Abstract
The low heritability of reproduction traits such as total number born (TNB), number born alive (NBA) and adjusted litter weight until 21 days at weaning (ALW) poses a challenge for genetic improvement. In this study, we aimed to identify genetic variants that influence these traits and to evaluate the accuracy of genomic selection (GS) using these variants as genomic features. We performed single-step genome-wide association studies (ssGWAS) on 17 823 Large White (LW) pigs, of which 2770 were genotyped with 50K single nucleotide polymorphism (SNP) chips. Additionally, we analyzed runs of homozygosity (ROH) in the population and tested their effects on the traits. Genomic feature best linear unbiased prediction (GFBLUP) was then carried out in an independent population of 350 LW pigs using the identified trait-related SNP subsets as genomic features. We identified five, one and four SNP windows explaining more than 1% of the genetic variance for ALW, TNB and NBA, respectively, and discovered 358 hotspots and nine ROH islands. The ROH SSC1:21814570-27186456 and SSC11:7220366-14276394 were significantly associated with ALW and NBA, respectively. We assessed the accuracy of genomic estimated breeding values through 20 replicates of five-fold cross-validation. Our findings demonstrate that GFBLUP, incorporating SNPs located in effective ROH (p-value < 0.05) as genomic features, might enhance GS accuracy for ALW compared with GBLUP. Using SNPs that explain more than 0.1% of the genetic variance in ssGWAS for NBA as genomic features might also improve GS accuracy. However, incorporating inappropriate genomic features can significantly reduce GS accuracy. In conclusion, our findings provide valuable insights into the genetic mechanisms of reproductive traits in pigs and suggest that ssGWAS and ROH have the potential to enhance the accuracy of GS for reproductive traits in LW pigs.
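The evaluation scheme mentioned in this abstract (20 replicates of five-fold cross-validation) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `predict_fn` callback, the seed handling, and the choice of Pearson correlation between predictions and observations as the accuracy measure are assumptions made for the example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k near-equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_accuracy(y, predict_fn, k=5, replicates=20, seed=1):
    """Mean prediction/observation correlation over all folds and replicates.

    predict_fn(train_idx, test_idx) is a hypothetical callback that fits any
    model (for example GBLUP) on train_idx and returns predictions for test_idx.
    """
    rng = random.Random(seed)
    accs = []
    for _ in range(replicates):
        for test_idx in kfold_indices(len(y), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            preds = predict_fn(train_idx, test_idx)
            accs.append(pearson(preds, [y[i] for i in test_idx]))
    return sum(accs) / len(accs)
```

Reshuffling before every replicate gives each of the 20 replicates a different partition, so the averaged accuracy depends less on any single split.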
Affiliation(s)
- Zhenyang Zhang
  - Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China
- Wei Zhao
  - SciGene Biotechnology Co. Ltd, Hefei, China
- Zhen Wang
  - Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China
- Yuchun Pan
  - Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China
  - Hainan Institute, Zhejiang University, Sanya, China
- Qishan Wang
  - Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China
  - Hainan Institute, Zhejiang University, Sanya, China
- Zhe Zhang
  - Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China
2
Wu H, Gao B, Zhang R, Huang Z, Yin Z, Hu X, Yang CX, Du ZQ. Residual network improves the prediction accuracy of genomic selection. Anim Genet 2024; 55:599-611. [PMID: 38746973 DOI: 10.1111/age.13445] [Received: 10/21/2023] [Revised: 04/21/2024] [Accepted: 04/29/2024] [Indexed: 07/04/2024]
Abstract
Genetic improvement of complex traits in animal and plant breeding depends on the efficient and accurate estimation of breeding values. Deep learning methods have not proven superior to traditional genomic selection (GS) methods, partly because of the degradation problem (i.e. as model depth increases, the performance of the deeper model deteriorates). Since the deep learning method residual network (ResNet) is designed to solve gradient degradation, we examined its performance and the factors related to its prediction accuracy in GS. We compared the prediction accuracy of conventional genomic best linear unbiased prediction, Bayesian methods (BayesA, BayesB, BayesC, and Bayesian Lasso), and two deep learning methods, convolutional neural network and ResNet, on three datasets (wheat, simulated and real pig data). ResNet outperformed the other methods in both Pearson's correlation coefficient (PCC) and mean squared error (MSE) on the wheat and simulated data. For the pig backfat depth trait, ResNet still had the lowest MSE, whereas Bayesian Lasso had the highest PCC. We further clustered the pig data into four groups and, on one separated group, ResNet had the highest prediction accuracy (both PCC and MSE). Transfer learning was adopted and enhanced the performance of both the convolutional neural network and ResNet. Taken together, our findings indicate that ResNet could improve GS prediction accuracy, potentially affected by factors such as the genetic architecture of complex traits, data volume, and heterogeneity.
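The abstract reports that one method can have the lowest MSE while another has the highest PCC. A small sketch of why the two metrics can rank methods differently is given below; the toy vectors `shifted` and `noisy` are invented for illustration and do not come from the paper.

```python
from math import sqrt


def pcc(pred, obs):
    """Pearson's correlation coefficient between predictions and observations."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / sqrt(var_p * var_o)


def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


obs = [1.0, 2.0, 3.0, 4.0]
shifted = [o + 2.0 for o in obs]   # perfect ranking, constant bias
noisy = [1.2, 1.8, 3.3, 3.9]       # small errors, imperfect ranking signal

for name, pred in (("shifted", shifted), ("noisy", noisy)):
    print(name, round(pcc(pred, obs), 3), round(mse(pred, obs), 3))
```

PCC is invariant to shifting and rescaling of the predictions, whereas MSE penalises bias, so a predictor that ranks candidates well can still have a large squared error.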
Affiliation(s)
- Huaxuan Wu
  - College of Animal Science and Technology, Yangtze University, Jingzhou, Hubei, China
- Bingxi Gao
  - College of Animal Science and Technology, Yangtze University, Jingzhou, Hubei, China
- Rong Zhang
  - College of Animal Science and Technology, Yangtze University, Jingzhou, Hubei, China
- Zehang Huang
  - College of Animal Science and Technology, Yangtze University, Jingzhou, Hubei, China
- Zongjun Yin
  - College of Animal Science and Technology, Anhui Agricultural University, Hefei, Anhui, China
- Xiaoxiang Hu
  - State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
- Cai-Xia Yang
  - College of Animal Science and Technology, Yangtze University, Jingzhou, Hubei, China
- Zhi-Qiang Du
  - College of Animal Science and Technology, Yangtze University, Jingzhou, Hubei, China
3
Pedrosa VB, Chen SY, Gloria LS, Doucette JS, Boerman JP, Rosa GJM, Brito LF. Machine learning methods for genomic prediction of cow behavioral traits measured by automatic milking systems in North American Holstein cattle. J Dairy Sci 2024; 107:4758-4771. [PMID: 38395400 DOI: 10.3168/jds.2023-24082] [Received: 08/12/2023] [Accepted: 01/18/2024] [Indexed: 02/25/2024]
Abstract
Identifying genome-enabled methods that provide more accurate genomic prediction is crucial when evaluating complex traits such as dairy cow behavior. In this study, we aimed to compare the predictive performance of traditional genomic prediction methods and deep learning algorithms for genomic prediction of milking refusals (MREF) and milking failures (MFAIL) in North American Holstein cows measured by automatic milking systems (milking robots). A total of 1,993,509 daily records from 4,511 genotyped Holstein cows were collected by 36 milking robot stations. After quality control, 57,600 SNPs were available for the analyses. Four genomic prediction methods were considered: Bayesian least absolute shrinkage and selection operator (LASSO), multiple layer perceptron (MLP), convolutional neural network (CNN), and GBLUP. We implemented the first 3 methods using the Keras and TensorFlow libraries in Python (v.3.9), whereas GBLUP was implemented using the BLUPF90+ family of programs. The accuracy of genomic prediction (mean square error) for MREF and MFAIL was 0.34 (0.08) and 0.27 (0.08) based on LASSO, 0.36 (0.09) and 0.32 (0.09) for MLP, 0.37 (0.08) and 0.30 (0.09) for CNN, and 0.35 (0.09) and 0.31 (0.09) based on GBLUP, respectively. Additionally, we observed a lower reranking of top selected individuals based on the MLP versus CNN methods compared with the other approaches for both MREF and MFAIL. Although the deep learning methods showed slightly higher accuracies than GBLUP, the results may not be sufficient to justify their use over traditional methods because of their higher computational demand and the difficulty of performing genomic prediction for nongenotyped individuals using deep learning procedures. Overall, this study provides insights into the potential feasibility of using deep learning methods to enhance genomic prediction accuracy for behavioral traits in livestock. Further research is needed to determine their practical applicability to large dairy cattle breeding programs.
Affiliation(s)
- Victor B Pedrosa
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
- Shi-Yi Chen
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907; Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
- Leonardo S Gloria
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
- Jarrod S Doucette
  - Agriculture Information Technology (AgIT), Purdue University, West Lafayette, IN 47907
- Guilherme J M Rosa
  - Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI, 53706
- Luiz F Brito
  - Department of Animal Sciences, Purdue University, West Lafayette, IN 47907
4
Wang H, Li C, Li J, Zhang R, An X, Yuan C, Guo T, Yue Y. Genomic Selection for Weaning Weight in Alpine Merino Sheep Based on GWAS Prior Marker Information. Animals (Basel) 2024; 14:1904. [PMID: 38998016 PMCID: PMC11240623 DOI: 10.3390/ani14131904] [Received: 05/23/2024] [Revised: 06/19/2024] [Accepted: 06/24/2024] [Indexed: 07/14/2024]
Abstract
This study aims to compare the accuracy of genomic estimated breeding values (GEBV) estimated using a genomic best linear unbiased prediction (GBLUP) method and GEBV estimates incorporating prior marker information from a genome-wide association study (GWAS) for the weaning weight trait in highland Merino sheep. The objective is to provide theoretical and technical support for improving the accuracy of genomic selection. The study used a population of 1007 highland Merino ewes, with the weaning weight at 3 months as the target trait. The population was randomly divided into two groups. The first group was used for GWAS analysis to identify significant markers, and the top 5%, top 10%, top 15%, and top 20% markers were selected as prior marker information. The second group was used to estimate genetic parameters and compare the accuracy of GEBV predictions using different prior marker information. The accuracy was obtained using a five-fold cross-validation. Finally, both groups were subjected to cross-validation. The study's findings revealed that the heritability of the weaning weight trait, as calculated using the GBLUP model, ranged from 0.122 to 0.394, with corresponding prediction accuracies falling between 0.075 and 0.228. By incorporating prior marker information from GWAS, the heritability was enhanced to a range of 0.125 to 0.407. The inclusion of the top 5% to top 20% significant SNPs from GWAS results as prior information into GS showed potential for improving the accuracy of predicting genomic breeding value.
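The marker pre-selection step described here (taking the top 5%, 10%, 15% and 20% of markers from a GWAS) can be sketched as below. The input format, the function name `top_markers`, and the toy p-values are assumptions for illustration; the abstract does not say how ties or rounding were handled.

```python
from math import ceil


def top_markers(pvalues, fraction):
    """Return IDs of the most significant `fraction` of markers.

    pvalues: dict mapping marker ID -> GWAS p-value (hypothetical input).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = ceil(len(pvalues) * fraction)
    ranked = sorted(pvalues, key=pvalues.get)  # smallest p-value first
    return ranked[:n_keep]


# Toy p-values for ten hypothetical SNPs.
gwas = {f"snp{i}": p for i, p in enumerate(
    [0.5, 0.001, 0.2, 0.03, 0.9, 0.0004, 0.6, 0.08, 0.7, 0.04])}
subsets = {f: top_markers(gwas, f) for f in (0.05, 0.10, 0.15, 0.20)}
```

As in the abstract, the GWAS that produces the p-values should be run on animals other than those used to evaluate prediction accuracy; selecting markers on the validation animals themselves would be expected to inflate the apparent accuracy.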
Affiliation(s)
- Haifeng Wang
  - Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Chenglan Li
  - Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Jianye Li
  - Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Rui Zhang
  - Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Xuejiao An
  - Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Chao Yuan
  - Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Tingting Guo
  - Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
- Yaojing Yue
  - Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
  - Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou 730050, China
5
Li X, Chen X, Wang Q, Yang N, Sun C. Integrating Bioinformatics and Machine Learning for Genomic Prediction in Chickens. Genes (Basel) 2024; 15:690. [PMID: 38927626 PMCID: PMC11202573 DOI: 10.3390/genes15060690] [Received: 04/09/2024] [Revised: 05/12/2024] [Accepted: 05/23/2024] [Indexed: 06/28/2024]
Abstract
Genomic prediction plays an increasingly important role in modern animal breeding, with predictive accuracy being a crucial aspect. The classical linear mixed model is increasingly unable to accommodate the growing number of target traits and the increasingly intricate genetic regulatory patterns. Hence, novel approaches are necessary for future genomic prediction. In this study, we used an Illumina 50K SNP chip to genotype 4190 egg-type female Rhode Island Red chickens. Machine learning (ML) and classical bioinformatics methods were integrated to fit genotypes with 10 economic traits in chickens. We evaluated the effectiveness of ML methods using Pearson correlation coefficients and the RMSE between predicted and actual phenotypic values and compared them with rrBLUP and BayesA. Our results indicated that ML algorithms perform significantly better than rrBLUP and BayesA in predicting body weight and eggshell strength traits. Conversely, rrBLUP and BayesA demonstrated 2-58% higher predictive accuracy in predicting egg numbers. Additionally, incorporating suggestively significant SNPs obtained through GWAS into the ML models increased the predictive accuracy by 0.1-27% across nearly all traits. These findings suggest the potential of combining classical bioinformatics methods with ML techniques to improve genomic prediction in the future.
Affiliation(s)
- Congjiao Sun
  - State Key Laboratory of Animal Biotech Breeding and Frontiers Science Center for Molecular Design Breeding (MOE), China Agricultural University, Beijing 100193, China; (X.L.); (X.C.); (Q.W.); (N.Y.)
6
Gu LL, Yang RQ, Wang ZY, Jiang D, Fang M. Ensemble learning for integrative prediction of genetic values with genomic variants. BMC Bioinformatics 2024; 25:120. [PMID: 38515026 PMCID: PMC10956256 DOI: 10.1186/s12859-024-05720-x] [Received: 11/22/2022] [Accepted: 02/26/2024] [Indexed: 03/23/2024]
Abstract
BACKGROUND Whole genome variants offer sufficient information for genetic prediction of human disease risk and for prediction of animal and plant breeding values. Many sophisticated statistical methods have been developed to enhance predictive ability. However, each method has its own advantages and disadvantages, and so far no single method outperforms all others. RESULTS We herein propose an Ensemble Learning method for Prediction of Genetic Values (ELPGV), which assembles predictions from several basic methods such as GBLUP, BayesA, BayesB and BayesCπ to produce more accurate predictions. We validated ELPGV with a variety of well-known datasets and a series of simulated datasets. All showed that ELPGV significantly enhanced predictive ability compared with any of the basic methods; for instance, the comparison p-values of ELPGV over the basic methods ranged from 4.853E-118 to 9.640E-20 for the WTCCC dataset. CONCLUSIONS ELPGV is able to integrate the merits of each method to produce significantly higher predictive ability than any of the basic methods, and it is simple to implement and fast to run, without using genotype data. ELPGV is promising for wide application in genetic predictions.
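The general idea of combining the outputs of several base predictors, without touching genotype data, can be illustrated with a simple weighted average. The sketch below is not ELPGV: the abstract does not describe its weighting scheme, so a plain grid search over non-negative weights that sum to 1 is swapped in, and all function names are hypothetical.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def combine(preds, weights):
    """Weighted sum of base predictions; preds is a list of equal-length lists."""
    return [sum(w * p[i] for w, p in zip(weights, preds))
            for i in range(len(preds[0]))]


def fit_weights(preds, y, steps=10):
    """Grid-search weights (multiples of 1/steps, summing to 1) that maximise
    the correlation between the combined prediction and y."""
    best_w, best_r = None, float("-inf")
    for combo in product(range(steps + 1), repeat=len(preds)):
        if sum(combo) != steps:
            continue
        w = [c / steps for c in combo]
        r = pearson(combine(preds, w), y)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

In practice the weights would be fitted on training individuals and then applied to the base predictions of the validation individuals; fitting and evaluating on the same records would overstate the gain.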
Affiliation(s)
- Lin-Lin Gu
  - Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs and Fisheries College, Jimei University, Xiamen, People's Republic of China
- Run-Qing Yang
  - Research Center for Aquatic Biotechnology, Chinese Academy of Fishery Sciences, Beijing, People's Republic of China
- Zhi-Yong Wang
  - Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs and Fisheries College, Jimei University, Xiamen, People's Republic of China
- Dan Jiang
  - Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs and Fisheries College, Jimei University, Xiamen, People's Republic of China
- Ming Fang
  - Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs and Fisheries College, Jimei University, Xiamen, People's Republic of China
  - Life Science College, Heilongjiang Bayi Agricultural University, Daqing, People's Republic of China
7
Zhou C, Jiang W, Guo J, Zhu L, Liu L, Liu S, Chen R, Du B, Huang J. Genome-wide association study and genomic prediction for resistance to brown planthopper in rice. Front Plant Sci 2024; 15:1373081. [PMID: 38576786 PMCID: PMC10991774 DOI: 10.3389/fpls.2024.1373081] [Received: 01/19/2024] [Accepted: 03/08/2024] [Indexed: 04/06/2024]
Abstract
The brown planthopper (BPH) is the most destructive insect pest that threatens rice production globally. Developing rice varieties incorporating BPH-resistant genes has proven to be an effective control measure against BPH. In this study, we assessed the resistance of a core collection consisting of 502 rice germplasms by evaluating resistance scores, weight gain rates and honeydew excretions. A total of 117 rice varieties (23.31%) exhibited resistance to BPH. Genome-wide association studies (GWAS) were performed on both the entire panel of 502 rice varieties and its subspecies, and 6 loci were significantly associated with resistance scores (P value < 1.0e-8). Within these loci, we identified eight candidate genes encoding receptor-like protein kinase (RLK), nucleotide-binding and leucine-rich repeat (NB-LRR), or LRR proteins. Two loci had not been detected in previous studies and were entirely novel. Furthermore, we evaluated the predictive ability of genomic selection for resistance to BPH. The highest prediction accuracy for BPH resistance reached 0.633. As expected, the prediction accuracy increased progressively with an increasing number of SNPs, and a total of 6.7K SNPs displayed comparable accuracy to 268K SNPs. Among the statistical models tested, the random forest model exhibited superior predictive accuracy. Moreover, increasing the size of the training population improved prediction accuracy; however, there was no significant difference in prediction accuracy between training population sizes of 737 and 1179. Additionally, when the training and validation populations were closely related genetically, higher prediction accuracies were observed than when they were genetically distant. These findings provide valuable resistance candidate genes and germplasm resources and are crucial for the application of genomic selection for breeding durable BPH-resistant rice varieties.
Affiliation(s)
- Cong Zhou
  - Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences/The Key Laboratory of Biology and Genetic Improvement of Oil Crops, The Ministry of Agriculture and Rural Affairs, Wuhan, China
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
- Weihua Jiang
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
- Jianping Guo
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
- Lili Zhu
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
- Lijiang Liu
  - Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences/The Key Laboratory of Biology and Genetic Improvement of Oil Crops, The Ministry of Agriculture and Rural Affairs, Wuhan, China
- Shengyi Liu
  - Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences/The Key Laboratory of Biology and Genetic Improvement of Oil Crops, The Ministry of Agriculture and Rural Affairs, Wuhan, China
- Rongzhi Chen
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
- Bo Du
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
- Jin Huang
  - Cash Crops Research Institute, Hubei Academy of Agricultural Sciences, Wuhan, China
8
Zhu D, Zhao Y, Zhang R, Wu H, Cai G, Wu Z, Wang Y, Hu X. Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population. Genet Sel Evol 2023; 55:72. [PMID: 37853325 PMCID: PMC10583454 DOI: 10.1186/s12711-023-00843-w] [Received: 09/23/2022] [Accepted: 09/14/2023] [Indexed: 10/20/2023]
Abstract
BACKGROUND Although the accumulation of whole-genome sequencing (WGS) data has accelerated the identification of mutations underlying complex traits, its impact on the accuracy of genomic predictions is limited. Reliable genotyping data and pre-selected beneficial loci can be used to improve prediction accuracy. Previously, we reported a low-coverage sequencing genotyping method that yielded 11.3 million highly accurate single-nucleotide polymorphisms (SNPs) in pigs. Here, we introduce a method termed selective linkage disequilibrium pruning (SLDP), which refines the set of SNPs that show a large gain during prediction of complex traits using whole-genome SNP data. RESULTS We used the SLDP method to identify and select markers among millions of SNPs based on genome-wide association study (GWAS) prior information. We evaluated the performance of SLDP with respect to three real traits and six simulated traits with varying genetic architectures using two representative models (genomic best linear unbiased prediction and BayesR) on samples from 3579 Duroc boars. SLDP was determined by testing 180 combinations of two core parameters (GWAS P-value thresholds and linkage disequilibrium r²). The parameters for each trait were optimized in the training population by five-fold cross-validation and then tested in the validation population. Similar to previous GWAS prior-based methods, the performance of SLDP was mainly affected by the genetic architecture of the traits analyzed. Specifically, SLDP performed better for traits controlled by major quantitative trait loci (QTL) or a small number of quantitative trait nucleotides (QTN). Compared with two commercial SNP chips, genotyping-by-sequencing data, and an unselected whole-genome SNP panel, the SLDP strategy led to significant improvements in prediction accuracy, which ranged from 0.84 to 3.22% for real traits controlled by major or moderate QTL and from 1.23 to 11.47% for simulated traits controlled by a small number of QTN. CONCLUSIONS The SLDP marker selection method can be incorporated into mainstream prediction models to yield accuracy improvements for traits with a relatively simple genetic architecture; however, it has no significant advantage for traits not controlled by major QTL. The main factors that affect its performance are the genetic architecture of traits and the reliability of GWAS prior information. Our findings can facilitate the application of WGS-based genomic selection.
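The two core parameters named in this abstract (a GWAS P-value threshold and a linkage disequilibrium r² cutoff) suggest a pruning step that favours significant SNPs. The sketch below shows one plausible greedy form of such a step. It is an assumption-laden illustration, not the published SLDP algorithm: the function names are hypothetical, LD is measured as squared correlation of 0/1/2 dosages, and every kept SNP is compared with every candidate instead of only with SNPs in a nearby genomic window.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    s11 = sum((a - m1) ** 2 for a in g1)
    s22 = sum((b - m2) ** 2 for b in g2)
    if s11 == 0 or s22 == 0:  # a monomorphic marker carries no LD information
        return 0.0
    return s12 * s12 / (s11 * s22)


def prioritized_prune(genotypes, pvalues, p_threshold, r2_cutoff):
    """Visit SNPs from most to least significant; keep each SNP that passes the
    p-value threshold and is not in high LD with any SNP already kept."""
    kept = []
    for snp in sorted(pvalues, key=pvalues.get):
        if pvalues[snp] > p_threshold:
            break  # all remaining SNPs are less significant
        if all(r_squared(genotypes[snp], genotypes[k]) < r2_cutoff for k in kept):
            kept.append(snp)
    return kept
```

Following the abstract, the two parameters would be tuned per trait by cross-validation inside the training population only, then applied unchanged to the validation population.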
Affiliation(s)
- Di Zhu
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Yiqiang Zhao
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Ran Zhang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Hanyu Wu
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
  - National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing, China
- Gengyuan Cai
  - National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Zhenfang Wu
  - National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Yuzhe Wang
  - National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing, China
- Xiaoxiang Hu
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
9
Melchinger AE, Fernando R, Stricker C, Schön CC, Auinger HJ. Genomic prediction in hybrid breeding: I. Optimizing the training set design. Theor Appl Genet 2023; 136:176. [PMID: 37532821 PMCID: PMC10397156 DOI: 10.1007/s00122-023-04413-y] [Received: 02/08/2023] [Accepted: 06/23/2023] [Indexed: 08/04/2023]
Abstract
KEY MESSAGE Training sets produced by maximizing the number of parent lines, each involved in one cross, had the highest prediction accuracy for H0 hybrids, but lowest for H1 and H2 hybrids. Genomic prediction holds great promise for hybrid breeding but optimum composition of the training set (TS) as determined by the number of parents (nTS) and crosses per parent (c) has received little attention. Our objective was to examine prediction accuracy ([Formula: see text]) of GCA for lines used as parents of the TS (I1 lines) or not (I0 lines), and H0, H1 and H2 hybrids, comprising crosses of type I0 × I0, I1 × I0 and I1 × I1, respectively, as function of nTS and c. In the theory part, we developed estimates for [Formula: see text] of GBLUPs for hybrids: (i) [Formula: see text] based on the expected prediction accuracy, and (ii) [Formula: see text] based on [Formula: see text] of GBLUPs of GCA and SCA effects. In the simulation part, hybrid populations were generated using molecular data from two experimental maize data sets. Additive and dominance effects of QTL borrowed from literature were used to simulate six scenarios of traits differing in the proportion (τSCA = 1%, 6%, 22%) of SCA variance in σ_G² and heritability (h² = 0.4, 0.8). Values of [Formula: see text] and [Formula: see text] closely agreed with [Formula: see text] for hybrids. For given size NTS = nTS × c of TS, [Formula: see text] of H0 hybrids and GCA of I0 lines was highest for c = 1. Conversely, for GCA of I1 lines and H1 and H2 hybrids, c = 1 yielded lowest [Formula: see text] with concordant results across all scenarios for both data sets. In view of these opposite trends, the optimum choice of c for maximizing selection response across all types of hybrids depends on the size and resources of the breeding program.
Affiliation(s)
- Albrecht E Melchinger
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599, Stuttgart, Germany
- Rohan Fernando
  - Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Christian Stricker
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Chris-Carolin Schön
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Hans-Jürgen Auinger
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
10
Liang M, Cao S, Deng T, Du L, Li K, An B, Du Y, Xu L, Zhang L, Gao X, Li J, Guo P, Gao H. MAK: a machine learning framework improved genomic prediction via multi-target ensemble regressor chains and automatic selection of assistant traits. Brief Bioinform 2023; 24:7031157. [PMID: 36752363 DOI: 10.1093/bib/bbad043] [Received: 10/22/2022] [Revised: 01/13/2023] [Accepted: 01/20/2023] [Indexed: 02/09/2023]
Abstract
Incorporating the genotypic and phenotypic information of correlated traits into a multi-trait model can significantly improve the prediction accuracy of the target trait in animal and plant breeding, as well as in human genetics. However, in most cases, the phenotypes of both the correlated traits and the target trait are missing for the individual to be evaluated, particularly for newborns. Therefore, we propose a machine learning framework, MAK, that improves the prediction accuracy of the target trait by constructing multi-target ensemble regression chains and selecting the assistant traits automatically, and that predicts the genomic estimated breeding values of the target trait using genotypic information only. The prediction ability of MAK was significantly more robust than that of genomic best linear unbiased prediction, BayesB, BayesRR and the multi-trait Bayesian method in four real animal and plant datasets, and MAK was roughly 100 times faster than BayesB and BayesRR.
Collapse
Affiliation(s)
- Mang Liang
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Sheng Cao
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Tianyu Deng
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Lili Du
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Keanning Li
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Bingxing An
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Yueying Du
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Lingyang Xu
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Lupei Zhang
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Xue Gao
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Junya Li
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
- Huijiang Gao
  - Chinese Academy of Agricultural Sciences, Institute of Animal Science
11
Tang Z, Yin L, Yin D, Zhang H, Fu Y, Zhou G, Zhao Y, Wang Z, Liu X, Li X, Zhao S. Development and application of an efficient genomic mating method to maximize the production performances of three-way crossbred pigs. Brief Bioinform 2023; 24:6961793. [PMID: 36575830 DOI: 10.1093/bib/bbac587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Revised: 11/24/2022] [Accepted: 11/30/2022] [Indexed: 12/29/2022] Open
Abstract
Creating synthetic lines is the standard mating mode for commercial pig production. Traditionally, mating performance was evaluated through strictly designed cross-combination tests at the 'breed level' to maximize the benefits of production, and the Duroc-Landrace-Yorkshire (DLY) three-way crossbred production system became the most widely used breeding scheme for pigs. Here, we propose an 'individual level' genomic mating procedure that can be applied to commercial pig production, with efficient algorithms for estimating marker effects and for allocating appropriate boar-sow pairs. The procedure is freely available in our HIBLUP software at https://www.hiblup.com/tutorials#genomic-mating. A total of 875 Duroc boars, 350 Landrace-Yorkshire sows and 3573 DLY pigs were used to carry out genomic mating and assess the production benefits theoretically. The results showed that genomic mating significantly improved the performance of progeny across different traits compared with random mating: feed conversion rate, days from 30 to 120 kg and eye muscle area could be improved by -0.12, -4.64 d and 2.65 cm², respectively, which was consistent with the real experimental validations. Overall, our findings indicate that genomic mating is an effective strategy for improving the performance of progeny by maximizing their total genetic merit while considering both additive and dominance effects. In addition, a herd of boars from a richer genetic source will further increase the effectiveness of genomic mating.
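The pair-allocation idea in this abstract can be illustrated with a small sketch. It is not the HIBLUP algorithm: the function names, the 0/1/2 genotype coding, the marker effects and the exhaustive search are all assumptions made for illustration, and the search is only feasible for a handful of animals with equal numbers of sires and dams.

```python
from itertools import permutations

def expected_progeny_merit(sire, dam, add_eff, dom_eff):
    """Expected genotypic value of a progeny from one sire x dam pair.

    sire, dam: genotypes coded 0/1/2 (copies of the counted allele) per marker.
    add_eff, dom_eff: hypothetical additive and dominance marker effects.
    """
    total = 0.0
    for gs, gd, a, d in zip(sire, dam, add_eff, dom_eff):
        ps, pd = gs / 2.0, gd / 2.0            # chance each parent transmits the allele
        exp_dosage = ps + pd                    # expected allele count in the progeny
        p_het = ps * (1 - pd) + pd * (1 - ps)   # chance the progeny is heterozygous
        total += a * exp_dosage + d * p_het
    return total

def best_allocation(sires, dams, add_eff, dom_eff):
    """Exhaustively search one-to-one pairings (assumes len(sires) == len(dams))."""
    best_score, best_pairs = float("-inf"), None
    for perm in permutations(range(len(dams))):
        score = sum(expected_progeny_merit(sires[i], dams[j], add_eff, dom_eff)
                    for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return best_score, best_pairs

if __name__ == "__main__":
    sires = [[2, 0], [0, 2]]
    dams = [[0, 2], [2, 0]]
    # Made-up effects: the first marker carries a dominance effect.
    score, pairs = best_allocation(sires, dams, add_eff=[1.0, 0.5], dom_eff=[0.4, 0.0])
    print(pairs, round(score, 3))
```

In this toy case the additive effects alone give every pairing the same total, so the dominance term decides the allocation. A population of the size described in the abstract would need a proper assignment solver instead of enumeration.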
Affiliation(s)
- Zhenshuang Tang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
- Lilin Yin
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
  - Frontiers Science Center for Animal Breeding and Sustainable Production, Wuhan 430070, PR China
- Dong Yin
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
- Haohao Zhang
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, PR China
- Yuhua Fu
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
  - Frontiers Science Center for Animal Breeding and Sustainable Production, Wuhan 430070, PR China
- Guangliang Zhou
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
- Yunxiang Zhao
  - School of Life Sciences and Engineering, Foshan University, Foshan 528225, PR China
- Zhiquan Wang
  - Wuhan Yingzi Gene Technology Co. LTD, Wuhan 430070, PR China
- Xiaolei Liu
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
  - Frontiers Science Center for Animal Breeding and Sustainable Production, Wuhan 430070, PR China
  - Hubei Hongshan Laboratory, Wuhan 430070, PR China
- Xinyun Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
  - Frontiers Science Center for Animal Breeding and Sustainable Production, Wuhan 430070, PR China
- Shuhong Zhao
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
  - Frontiers Science Center for Animal Breeding and Sustainable Production, Wuhan 430070, PR China
  - Hubei Hongshan Laboratory, Wuhan 430070, PR China
12
Vu NT, Phuc TH, Nguyen NH, Van Sang N. Effects of common full-sib families on accuracy of genomic prediction for tagging weight in striped catfish Pangasianodon hypophthalmus. Front Genet 2023; 13:1081246. [PMID: 36685869 PMCID: PMC9845282 DOI: 10.3389/fgene.2022.1081246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 12/06/2022] [Indexed: 01/06/2023] Open
Abstract
Common full-sib family effects (c²) make up a substantial proportion of total phenotypic variation in traits of commercial importance in aquaculture species, and omitting or including c² may change genetic parameter estimates and re-rank estimated breeding values. However, the impacts of common full-sib families on the accuracy of genomic prediction for traits of economic importance are not well known in many species, including aquatic animals. This research explored the impacts of common full-sib families on the accuracy of genomic prediction for tagging weight in a population of striped catfish comprising 11,918 fish traced back to the base population (four generations), in which 560 individuals had genotype records for 14,154 SNPs. Our single-step genomic best linear unbiased prediction (ssGBLUP) showed that the accuracy of genomic prediction for tagging weight was reduced by 96.5%-130.3% when the common full-sib families were included in the statistical models. The reduction in prediction accuracy was smaller in multivariate analysis than in univariate models. Imputation of missing genotypes somewhat reduced the upward biases in the prediction accuracy for tagging weight. It is therefore suggested that genomic evaluation models for traits recorded during the early phase of growth should account for common full-sib families to minimise possible biases in the accuracy of genomic prediction and, hence, in selection response.
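For readers unfamiliar with the c² notation, the ratio is commonly defined from an animal model with an additional common full-sib family effect. The notation below is the standard textbook form and is an assumption here; the exact model fitted in the paper is not given in the abstract.

```latex
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{c} + \mathbf{e},
\qquad
c^2 = \frac{\sigma_c^2}{\sigma_a^2 + \sigma_c^2 + \sigma_e^2},
\qquad
h^2 = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_c^2 + \sigma_e^2}
```

Here \(\mathbf{b}\) holds the fixed effects, \(\mathbf{a}\) the additive genetic effects, \(\mathbf{c}\) the common full-sib family effects and \(\mathbf{e}\) the residuals, with variances \(\sigma_a^2\), \(\sigma_c^2\) and \(\sigma_e^2\). Dropping \(\mathbf{W}\mathbf{c}\) from the model lets family-level variation be absorbed by the other terms, which is one way the parameter estimates and rankings mentioned above can shift.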
Affiliation(s)
- Nguyen Thanh Vu
  - School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia
  - Center for Bio-Innovation, University of the Sunshine Coast, Maroochydore, QLD, Australia
  - Research Institute for Aquaculture No. 2, Ho Chi Minh City, Vietnam
- Tran Huu Phuc
  - Research Institute for Aquaculture No. 2, Ho Chi Minh City, Vietnam
- Nguyen Hong Nguyen
  - School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia
  - Center for Bio-Innovation, University of the Sunshine Coast, Maroochydore, QLD, Australia
  - Correspondence: Nguyen Hong Nguyen; Nguyen Van Sang
- Nguyen Van Sang
  - Research Institute for Aquaculture No. 2, Ho Chi Minh City, Vietnam
  - Correspondence: Nguyen Hong Nguyen; Nguyen Van Sang
13
Nishio M, Inoue K, Arakawa A, Ichinoseki K, Kobayashi E, Okamura T, Fukuzawa Y, Ogawa S, Taniguchi M, Oe M, Takeda M, Kamata T, Konno M, Takagi M, Sekiya M, Matsuzawa T, Inoue Y, Watanabe A, Kobayashi H, Shibata E, Ohtani A, Yazaki R, Nakashima R, Ishii K. Application of linear and machine learning models to genomic prediction of fatty acid composition in Japanese Black cattle. Anim Sci J 2023; 94:e13883. [PMID: 37909231 DOI: 10.1111/asj.13883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 08/29/2023] [Accepted: 09/15/2023] [Indexed: 11/02/2023]
Abstract
We collected 3180 records of oleic acid (C18:1) and monounsaturated fatty acid (MUFA) measured using gas chromatography (GC) and 6960 records of C18:1 and MUFA measured using near-infrared spectroscopy (NIRS) in intermuscular fat samples of Japanese Black cattle. We compared genomic prediction performance for four linear models (genomic best linear unbiased prediction [GBLUP], kinship-adjusted multiple loci [KAML], BayesC, and BayesLASSO) and five machine learning models (Gaussian kernel [GK], deep kernel [DK], random forest [RF], extreme gradient boost [XGB], and convolutional neural network [CNN]). For GC-based C18:1 and MUFA, KAML showed the highest accuracies, followed by BayesC, XGB, DK, GK, and BayesLASSO, with a gain in accuracy of more than 6% for KAML over GBLUP. Meanwhile, DK had the highest prediction accuracy for NIRS-based C18:1 and MUFA, but the difference in accuracies between DK and KAML was slight. For all traits, accuracies of RF and CNN were lower than those of GBLUP. KAML extends GBLUP by weighting marker effects and involves only additive genetic effects, whereas machine learning methods capture non-additive genetic effects. Thus, KAML is the most suitable method for breeding of fatty acid composition in Japanese Black cattle.
Affiliation(s)
- Motohide Nishio
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Keiichi Inoue
  - National Livestock Breeding Center, Fukushima, Japan
  - University of Miyazaki, Miyazaki, Japan
- Aisaku Arakawa
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Eiji Kobayashi
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Yo Fukuzawa
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Shinichiro Ogawa
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Mika Oe
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Takehiro Kamata
  - Aomori Prefectural Industrial Technology Research Center, Tsugaru, Japan
- Masaru Konno
  - Iwate Agricultural Research Center Animal Industry Research Institute, Takizawa, Japan
- Michihiro Takagi
  - Miyagi Prefecture Animal Industry Experiment Station, Osaki, Japan
- Mario Sekiya
  - Akita Prefectural Livestock Experiment Station, Daisen, Japan
- Tamotsu Matsuzawa
  - Livestock Research Centre, Fukushima Agricultural Technology Centre, Fukushima, Japan
- Yoshinobu Inoue
  - Tottori Prefectural Livestock Research Center, Tottori, Japan
- Hiroshi Kobayashi
  - Institute of Animal Production Okayama Prefectural Technology Center for Agriculture, Forestry and Fisheries, Misaki, Japan
- Eri Shibata
  - Hiroshima Prefectural Technology Research Institute, Livestock Technology Research Center, Shobara, Japan
- Akihumi Ohtani
  - Yamaguchi Prefectural Agriculture and Forestry General Technology Center, Mine, Japan
- Ryu Yazaki
  - Oita Prefectural Agriculture, Forestry, and Fisheries Research Center, Takeda, Japan
- Ryotaro Nakashima
  - Cattle Breeding Development Institute of Kagoshima Prefecture, Soo, Japan
- Kazuo Ishii
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
14
Li H, Wang Z, Xu L, Li Q, Gao H, Ma H, Cai W, Chen Y, Gao X, Zhang L, Gao H, Zhu B, Xu L, Li J. Genomic prediction of carcass traits using different haplotype block partitioning methods in beef cattle. Evol Appl 2022; 15:2028-2042. [PMID: 36540636 PMCID: PMC9753827 DOI: 10.1111/eva.13491] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/20/2022] [Accepted: 09/18/2022] [Indexed: 09/22/2023] Open
Abstract
Genomic prediction (GP) based on haplotype alleles can capture quantitative trait loci (QTL) effects and increase predictive ability because the haplotypes are expected to be in linkage disequilibrium (LD) with QTL. In this study, we constructed haploblocks in beef cattle genotyped with the Illumina BovineHD chip using LD-based and fixed number of single nucleotide polymorphisms (fixed-SNP) methods. To evaluate the performance of the different haplotype block partitioning methods, we constructed haploblocks based on LD thresholds (from r² > 0.2 to r² > 0.8) and on fixed numbers of SNPs (5, 10, 20). Predictive performance for three carcass traits, liveweight (LW), dressing percentage (DP) and longissimus dorsi muscle weight (LDMW), was evaluated using three approaches: GBLUP and BayesB models based on SNPs; GHBLUP and BayesBH models based on haploblocks; and GHBLUP+GBLUP and BayesBH+BayesB models based on the combination of haploblocks and the nonblocked SNPs located between blocks. We found that the LD-based and fixed-SNP haplotype Bayesian methods outperformed the Bayesian models (by up to 8.54 ± 7.44% and 5.74 ± 2.95%, respectively). GHBLUP showed a large improvement (up to 11.29 ± 9.87%) compared with GBLUP. The Bayesian models had higher accuracies than the BLUP models in most scenarios. The average computing time of the BayesBH+BayesB model could be reduced by 29.3% compared with the BayesB model. The LD-based haplotype method showed larger improvements in prediction accuracy than the fixed-SNP haplotype method. In addition, to avoid the influence of rare haplotypes generated during haplotype construction, we compared the performance of GP after filtering at four minor haplotype allele frequency (MHAF) thresholds (0.01, 0.025, 0.05, and 0.1) under different conditions (LD level set at r² > 0.3, and a fixed number of SNPs of 5). We found that the optimal MHAF threshold was 0.01 for LW and 0.025 for DP and LDMW.
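The fixed-SNP partitioning and the MHAF filter described above can be sketched as follows. This is an illustration only: the function names and the toy phased haplotypes are hypothetical, and the treatment of trailing SNPs as nonblocked is an assumption rather than the paper's stated rule.

```python
from collections import Counter

def fixed_snp_haplotype_alleles(haplotypes, block_size):
    """Split phased haplotypes into consecutive blocks of `block_size` SNPs.

    haplotypes: equal-length strings of '0'/'1', one per phased chromosome.
    Returns one list of haplotype-allele strings per block. Trailing SNPs that
    do not fill a whole block are left out (treated here as nonblocked SNPs).
    """
    n_snps = len(haplotypes[0])
    blocks = []
    for start in range(0, n_snps - block_size + 1, block_size):
        blocks.append([h[start:start + block_size] for h in haplotypes])
    return blocks

def filter_by_mhaf(block_alleles, mhaf):
    """Return the frequency of each haplotype allele whose frequency is >= mhaf."""
    counts = Counter(block_alleles)
    n = len(block_alleles)
    return {allele: c / n for allele, c in counts.items() if c / n >= mhaf}

if __name__ == "__main__":
    # Four made-up phased chromosomes over seven SNPs.
    haps = ["0101101", "0101001", "1101101", "0101101"]
    blocks = fixed_snp_haplotype_alleles(haps, block_size=3)
    for alleles in blocks:
        print(filter_by_mhaf(alleles, mhaf=0.3))
```

The retained haplotype alleles would then be recoded as covariates (for example, counts of each allele per individual) before fitting a haplotype-based model.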
Affiliation(s)
- Hongwei Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Zezhao Wang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lei Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Qian Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Han Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Haoran Ma
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Wentao Cai
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yan Chen
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Huijiang Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Bo Zhu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
15
Tahir MS, Porto-Neto LR, Reverter-Gomez T, Olasege BS, Sajid MR, Wockner KB, Tan AWL, Fortes MRS. Utility of multi-omics data to inform genomic prediction of heifer fertility traits. J Anim Sci 2022; 100:skac340. [PMID: 36239447 PMCID: PMC9733504 DOI: 10.1093/jas/skac340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2022] [Accepted: 10/12/2022] [Indexed: 12/15/2022] Open
Abstract
Biologically informed single nucleotide polymorphisms (SNPs) impact genomic prediction accuracy of the target traits. Our previous genomics, proteomics, and transcriptomics work identified candidate genes related to puberty and fertility in Brahman heifers. We aimed to test this biological information for capturing heritability and predicting heifer fertility traits in another breed, i.e., Tropical Composite. The SNPs from the identified genes, including a 10-kilobase (kb) region on either side, were selected as the biologically informed SNP set. The SNPs from the rest of the Bos taurus genes, including a 10-kb region on either side, were selected as the biologically uninformed SNP set. The bovine high-density (HD) complete SNP set (628,323 SNPs) was used as a control. Two populations, Tropical Composites (N = 1331) and Brahman (N = 2310), had records for three traits: pregnancy after first mating season (PREG1, binary), first conception score (FCS, score 1 to 3), and rebreeding score (REB, score 1 to 3.5). Using the best linear unbiased prediction method, the effectiveness of each SNP set in predicting the traits was tested in two scenarios: a 5-fold cross-validation within Tropical Composites using biological information from Brahman studies, and application of prediction equations from one breed to the other. The accuracy of prediction was calculated as the correlation between genomic estimated breeding values and adjusted phenotypes. Results show that the heritabilities estimated with the biologically informed SNP set were not significantly better than those from the control HD complete SNP set in Tropical Composites; however, it captured all the observed genetic variance in PREG1 and FCS when modeled together with the biologically uninformed SNP set. In 5-fold cross-validation within Tropical Composites, the biologically informed SNP set performed marginally better (not statistically significant) in terms of prediction accuracies (PREG1: 0.20, FCS: 0.13, and REB: 0.12) than the HD complete SNP set (PREG1: 0.17, FCS: 0.10, and REB: 0.11) and the biologically uninformed SNP set (PREG1: 0.16, FCS: 0.10, and REB: 0.11). Across-breed use of prediction equations remained a challenge: accuracies of all SNP sets dropped to around zero for all traits. The performance of the biologically informed SNP set was not significantly better than that of the other sets in Tropical Composites. However, the results indicate that biological information obtained from Brahman was useful for predicting fertility traits in the Tropical Composite population.
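The accuracy measure used above, the correlation between predicted values and adjusted phenotypes in held-out folds, can be sketched in a few lines. The fold assignment, the function names and the one-feature least-squares predictor are placeholders chosen for illustration; the study itself used a BLUP model, which is not reproduced here.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_accuracy(features, phenotypes, fit_predict, k=5, seed=0):
    """Mean over folds of corr(prediction, adjusted phenotype) in the held-out fold."""
    accuracies = []
    for test in k_fold_indices(len(phenotypes), k, seed):
        held_out = set(test)
        train = [i for i in range(len(phenotypes)) if i not in held_out]
        preds = fit_predict([features[i] for i in train],
                            [phenotypes[i] for i in train],
                            [features[i] for i in test])
        accuracies.append(pearson(preds, [phenotypes[i] for i in test]))
    return mean(accuracies)

def simple_regression(train_x, train_y, test_x):
    """Placeholder predictor: least-squares line on a single numeric feature."""
    mx, my = mean(train_x), mean(train_y)
    slope = (sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
             / sum((x - mx) ** 2 for x in train_x))
    return [my + slope * (x - mx) for x in test_x]
```

Any predictor with the same `fit_predict(train_x, train_y, test_x)` signature can be swapped in, which keeps the comparison between SNP sets limited to the features supplied.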
Affiliation(s)
- Muhammad S Tahir
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
- Laercio R Porto-Neto
  - Commonwealth Scientific and Industrial Research Organization, St. Lucia, Brisbane 4072, QLD, Australia
- Toni Reverter-Gomez
  - Commonwealth Scientific and Industrial Research Organization, St. Lucia, Brisbane 4072, QLD, Australia
- Babatunde S Olasege
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
- Mirza R Sajid
  - Department of Statistics, University of Gujrat, 50700 Punjab, Pakistan
- Kimberley B Wockner
  - Queensland Department of Agriculture and Fisheries, Brisbane 4072, QLD, Australia
- Andre W L Tan
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
- Marina R S Fortes
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia Campus, Brisbane 4072, QLD, Australia
16
A Method for Improving the Prediction of Outpatient Visits for Hospital Management: Bayesian Autoregressive Analysis. Computational and Mathematical Methods in Medicine 2022; 2022:4718157. [PMID: 36277006 PMCID: PMC9581652 DOI: 10.1155/2022/4718157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2021] [Revised: 07/03/2022] [Accepted: 08/31/2022] [Indexed: 11/17/2022]
Abstract
The number of outpatient visits is generally influenced by various factors that are difficult to quantify and obtain, resulting in irregular fluctuations. Traditional statistical methodology seldom considers these uncertainties. Accordingly, this paper presents a forecasting framework based on Bayesian autoregressive (AR) analysis to cope with the strict requirements. The AR model was used to identify the linear and autocorrelation relationships of the historical series, and Bayesian inference was used to correct and optimize the AR model parameters. The posterior distribution of the parameters was obtained stably and reliably by Gibbs sampling, conditional on convergence of the Markov chain. Meanwhile, the lag orders of the AR model were adjusted based on the series characteristics. To increase the variability and generality of the dataset, the Bayesian AR model was evaluated at seven hospitals in China. The results demonstrated that the Bayesian AR model reduced the mean absolute percentage error (MAPE) to varying degrees in all seven sets of experimental data. The reductions ranged from 0.1431% to 0.0342%, indicating effective optimization of the AR model parameters by Bayesian inference and reflecting the useful correction provided by the lag order adjustment strategy. The proposed Bayesian AR framework showed a high accuracy index and stable prediction accuracy, thereby outperforming the traditional AR model.
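A minimal sketch of Gibbs sampling for an AR(p) model, together with the MAPE metric, may help make the abstract concrete. The priors are assumptions chosen for simplicity (flat on the AR coefficients, Jeffreys on the error variance) and are not taken from the paper; the series is centred on its sample mean, stationarity is not enforced, and all function names are hypothetical.

```python
import random

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def gibbs_ar(series, p=1, n_iter=1500, burn_in=300, seed=0):
    """Posterior-mean AR(p) coefficients for the mean-centred series.

    Assumed priors: flat on each coefficient, p(sigma^2) proportional to 1/sigma^2.
    Each coefficient is drawn from its univariate normal conditional, then
    sigma^2 from its inverse-gamma conditional.
    """
    rng = random.Random(seed)
    mu = sum(series) / len(series)
    z = [y - mu for y in series]
    rows = [[z[t - j] for j in range(1, p + 1)] for t in range(p, len(z))]
    target = z[p:]
    n = len(target)
    sxx = [sum(r[j] ** 2 for r in rows) for j in range(p)]
    phi, sigma2, draws = [0.0] * p, 1.0, []
    for it in range(n_iter):
        for j in range(p):
            # Regress the residual that excludes lag j on lag j.
            num = sum(r[j] * (y - sum(phi[k] * r[k] for k in range(p) if k != j))
                      for r, y in zip(rows, target))
            phi[j] = rng.gauss(num / sxx[j], (sigma2 / sxx[j]) ** 0.5)
        sse = sum((y - sum(phi[k] * r[k] for k in range(p))) ** 2
                  for r, y in zip(rows, target))
        # gammavariate takes (shape, scale); the reciprocal is inverse-gamma.
        sigma2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / sse)
        if it >= burn_in:
            draws.append(list(phi))
    return mu, [sum(d[j] for d in draws) / len(draws) for j in range(p)]

def forecast_next(series, mu, phi):
    """One-step-ahead forecast; phi[0] multiplies the most recent observation."""
    return mu + sum(c * (series[-j] - mu) for j, c in enumerate(phi, start=1))
```

Comparing `mape` for forecasts from this sampler against an ordinary least-squares AR fit on held-out months would mirror the kind of comparison the abstract reports, although the paper's lag-order adjustment strategy is not reproduced here.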
17
Liu D, Xu Z, Zhao W, Wang S, Li T, Zhu K, Liu G, Zhao X, Wang Q, Pan Y, Ma P. Genetic parameters and genome-wide association for milk production traits and somatic cell score in different lactation stages of Shanghai Holstein population. Front Genet 2022; 13:940650. [PMID: 36134029 PMCID: PMC9483179 DOI: 10.3389/fgene.2022.940650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 08/04/2022] [Indexed: 11/13/2022] Open
Abstract
The aim of this study was to investigate the genetic parameters and genetic architectures of six milk production traits in the Shanghai Holstein population. The data used to estimate the genetic parameters consisted of 1,968,589 test-day records for 305,031 primiparous cows. Among the cows with phenotypes, 3,016 cows were genotyped with Illumina Bovine SNP50K BeadChip, GeneSeek Bovine 50K BeadChip, GeneSeek Bovine LD BeadChip v4, GeneSeek Bovine 150K BeadChip, or low-depth whole-genome sequencing. A genome-wide association study was performed to identify quantitative trait loci and genes associated with milk production traits in the Shanghai Holstein population using genotypes imputed to whole-genome sequence level, with both the fixed and random model circulating probability unification (FarmCPU) method and a mixed linear model in the rMVP software. Estimated heritabilities (h²) varied from 0.04 to 0.14 for somatic cell score (SCS), 0.07 to 0.22 for fat percentage (FP), 0.09 to 0.27 for milk yield (MY), 0.06 to 0.23 for fat yield (FY), 0.09 to 0.26 for protein yield (PY), and 0.07 to 0.35 for protein percentage (PP). Within lactation, genetic correlations between different stages of lactation estimated with the random regression model ranged from -0.02 to 0.99, 0.18 to 0.99, 0.04 to 0.99, 0.04 to 0.99, 0.01 to 0.99, and 0.33 to 0.99 for SCS, FP, MY, FY, PY, and PP, respectively. The genetic correlations were highest between adjacent days in milk (DIM) and decreased as DIM got further apart. Candidate genes included production-related (DGAT1, MGST1, PTK2, and SCRIB), disease-related (LY6K, COL22A1, TECPR2, and PLCB1), heat stress-related (ITGA9, NDST4, TECPR2, and HSF1), and reproduction-related (7SK and DOCK2) genes. This study has shown that there are differences in the genetic mechanisms of milk production traits at different stages of lactation. Therefore, milk production traits at different stages of lactation should be studied as different traits. Our results can also provide a theoretical basis for subsequent molecular breeding, especially for the novel genetic loci.
Affiliation(s)
- Dengying Liu
  - Shanghai Key Laboratory of Veterinary Biotechnology, Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Zhong Xu
  - Hubei Key Laboratory of Animal Embryo and Molecular Breeding, Institute of Animal Husbandry and Veterinary, Hubei Provincial Academy of Agricultural Sciences, Wuhan, China
- Wei Zhao
  - Shanghai Key Laboratory of Veterinary Biotechnology, Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Shiyi Wang
  - Shanghai Key Laboratory of Veterinary Biotechnology, Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Tuowu Li
  - Shanghai Key Laboratory of Veterinary Biotechnology, Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Kai Zhu
  - Shanghai Dairy Cattle Breeding Centre Co, Ltd, Shanghai, China
- Guanglei Liu
  - Shanghai Dairy Cattle Breeding Centre Co, Ltd, Shanghai, China
- Xiaoduo Zhao
  - Shanghai Dairy Cattle Breeding Centre Co, Ltd, Shanghai, China
- Qishan Wang
  - Department of Animal Breeding and Reproduction, College of Animal Science, Zhejiang University, Hangzhou, China
- Yuchun Pan
  - Department of Animal Breeding and Reproduction, College of Animal Science, Zhejiang University, Hangzhou, China
- Peipei Ma
  - Shanghai Key Laboratory of Veterinary Biotechnology, Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
18
Hao X, Liang A, Plastow G, Zhang C, Wang Z, Liu J, Salzano A, Gasparrini B, Campanile G, Zhang S, Yang L. An Integrative Genomic Prediction Approach for Predicting Buffalo Milk Traits by Incorporating Related Cattle QTLs. Genes (Basel) 2022; 13:genes13081430. [PMID: 36011341 PMCID: PMC9408041 DOI: 10.3390/genes13081430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/16/2022] Open
Abstract
Background: The 90K Axiom Buffalo SNP Array is expected to improve and speed up various genomic analyses for the buffalo (Bubalus bubalis). Genomic prediction is an effective approach in animal breeding to improve selection and reduce costs. As buffalo genome research is lagging behind that of the cow and production records are also limited, genomic prediction performance will be relatively poor. To improve the genomic prediction in buffalo, we introduced a new approach (pGBLUP) for genomic prediction of six buffalo milk traits by incorporating QTL information from the cattle milk traits in order to help improve the prediction performance for buffalo. Results: In simulations, the pGBLUP could outperform BayesR and the GBLUP if the prior biological information (i.e., the known causal loci) was appropriate; otherwise, it performed slightly worse than BayesR and equal to or better than the GBLUP. In real data, the heritability of the buffalo genomic region corresponding to the cattle milk trait QTLs was enriched (fold of enrichment > 1) in four buffalo milk traits (FY270, MY270, PY270, and PM) when the EBV was used as the response variable. The DEBV as the response variable yielded more reliable genomic predictions than the traditional EBV, as has been shown by previous research. The performance of the three approaches (GBLUP, BayesR, and pGBLUP) did not vary greatly in this study, probably due to the limited sample size, incomplete prior biological information, and less artificial selection in buffalo. Conclusions: To our knowledge, this study is the first to apply genomic prediction to buffalo by incorporating prior biological information. The genomic prediction of buffalo traits can be further improved with a larger sample size, higher-density SNP chips, and more precise prior biological information.
Affiliation(s)
- Xingjie Hao
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Correspondence: (X.H.); (L.Y.)
- Aixin Liang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Graham Plastow
  - Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
- Chunyan Zhang
  - Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
- Zhiquan Wang
  - Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
- Jiajia Liu
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Angela Salzano
  - Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
- Bianca Gasparrini
  - Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
- Giuseppe Campanile
  - Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
- Shujun Zhang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Liguo Yang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
  - Correspondence: (X.H.); (L.Y.)
|
19
|
Ren D, Cai X, Lin Q, Ye H, Teng J, Li J, Ding X, Zhang Z. Impact of linkage disequilibrium heterogeneity along the genome on genomic prediction and heritability estimation. Genet Sel Evol 2022; 54:47. [PMID: 35761182 PMCID: PMC9235212 DOI: 10.1186/s12711-022-00737-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Accepted: 06/15/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Compared to medium-density single nucleotide polymorphism (SNP) data, high-density SNP data contain abundant genetic variants and provide more information for the genetic evaluation of livestock, but it has been shown that they do not confer any advantage for genomic prediction and heritability estimation. One possible reason is the uneven distribution of the linkage disequilibrium (LD) along the genome, i.e., LD heterogeneity among regions. The aim of this study was to effectively use genome-wide SNP data for genomic prediction and heritability estimation by using models that control LD heterogeneity among regions. METHODS The LD-adjusted kinship (LDAK) and LD-stratified multicomponent (LDS) models were used to control LD heterogeneity among regions and were compared with the classical model that has no such control. Simulated and real traits of 2000 dairy cattle individuals with imputed high-density (770K) SNP data were used. Five types of phenotypes were simulated, which were controlled by very strongly, strongly, moderately, weakly and very weakly tagged causal variants, respectively. The performances of the models with high- and medium-density (50K) panels were compared to verify that the models that controlled LD heterogeneity among regions were more effective with high-density data. RESULTS Compared to the medium-density panel, the use of the high-density panel did not improve and even decreased prediction accuracies and heritability estimates from the classical model for both simulated and real traits. Compared to the classical model, LDS effectively improved the accuracy of genomic predictions and unbiasedness of heritability estimates, regardless of the genetic architecture of the trait. LDAK applies only to traits that are mainly controlled by weakly tagged causal variants, but is still less effective than LDS for this type of trait. 
Compared with the classical model, LDS improved prediction accuracy by about 13% for simulated phenotypes and by 0.3 to ~10.7% for real traits with the high-density panel, and by ~1% for simulated phenotypes and by -0.1 to ~6.9% for real traits with the medium-density panel. CONCLUSIONS Grouping SNPs based on regional LD to construct the LD-stratified multicomponent model can effectively eliminate the adverse effects of LD heterogeneity among regions, and greatly improve the efficiency of high-density SNP data for genomic prediction and heritability estimation.
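The idea of grouping SNPs by regional LD can be illustrated with a small sketch. It assumes LD is measured as the squared correlation (r2) between genotype dosages, that a SNP's regional LD is its mean r2 with nearby SNPs, and that strata are equal-sized groups ordered by that score. The window size, the number of strata, and all function names are hypothetical and are not the settings used in the study.

```python
def r2(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 dosages)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: LD is undefined, treated as 0 here
    return sxy * sxy / (sxx * syy)


def local_ld_scores(genotypes, window=2):
    """Mean r2 of each SNP with its neighbours within `window` positions."""
    m = len(genotypes)
    scores = []
    for j in range(m):
        neighbours = [k for k in range(max(0, j - window), min(m, j + window + 1)) if k != j]
        if not neighbours:
            scores.append(0.0)
            continue
        total = sum(r2(genotypes[j], genotypes[k]) for k in neighbours)
        scores.append(total / len(neighbours))
    return scores


def stratify(scores, n_strata=2):
    """Split SNP indices into roughly equal-sized strata ordered by local LD."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    strata = [[] for _ in range(n_strata)]
    for rank, j in enumerate(order):
        strata[rank * n_strata // len(order)].append(j)
    return strata


# Toy data: one list per SNP, one dosage per individual.
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 2, 0],
    [0, 2, 1, 2, 0, 1],
]
print(stratify(local_ld_scores(genotypes), n_strata=2))
```

In a multicomponent model each stratum would then contribute its own relationship matrix and variance component; that fitting step is outside the scope of this sketch.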
Affiliation(s)
- Duanyang Ren
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
| | - Xiaodian Cai
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
| | - Qing Lin
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
| | - Haoqiang Ye
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
| | - Jinyan Teng
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
| | - Jiaqi Li
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
| | - Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China.
|
20
|
Genome-Enabled Prediction Methods Based on Machine Learning. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:189-218. [PMID: 35451777 DOI: 10.1007/978-1-0716-2205-6_7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Growth of artificial intelligence and machine learning (ML) methodology has been explosive in recent years. In this class of procedures, computers acquire knowledge from sets of experiences and provide forecasts or classifications. In genome-wide based prediction (GWP), many ML studies have been carried out. This chapter describes the main semiparametric and nonparametric algorithms used in GWP in animals and plants. Thirty-four ML comparative studies conducted in the last decade were used to develop a meta-analysis through a Thurstonian model, to evaluate algorithms with the best predictive qualities. It was found that some kernel, Bayesian, and ensemble methods displayed greater robustness and predictive ability. However, the type of study and data distribution must be considered in order to choose the most appropriate model for a given problem.
|
21
|
Yang W, Guo T, Luo J, Zhang R, Zhao J, Warburton ML, Xiao Y, Yan J. Target-oriented prioritization: targeted selection strategy by integrating organismal and molecular traits through predictive analytics in breeding. Genome Biol 2022; 23:80. [PMID: 35292095 PMCID: PMC8922918 DOI: 10.1186/s13059-022-02650-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2021] [Accepted: 03/08/2022] [Indexed: 11/10/2022] Open
Abstract
Genomic prediction in crop breeding is hindered by modeling on limited phenotypic traits. We propose an integrative multi-trait breeding strategy based on a machine learning algorithm, target-oriented prioritization (TOP). Using a large hybrid maize population, we demonstrate that the accuracy of identifying a candidate that is phenotypically closest to an ideotype, or target variety, reaches up to 91%. The strength of TOP is enhanced when omics-level traits are included. We show that TOP enables selection of inbreds or hybrids that outperform existing commercial varieties. It improves multiple traits and accurately identifies improved candidates for new varieties, which will greatly influence breeding.
Affiliation(s)
- Wenyu Yang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- College of Science, Huazhong Agricultural University, Wuhan, 430070, China
| | | | - Jingyun Luo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Ruyang Zhang
- Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing Academy of Agricultural & Forestry Sciences, Beijing, 100097, China
| | - Jiuran Zhao
- Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing Academy of Agricultural & Forestry Sciences, Beijing, 100097, China
| | - Marilyn L Warburton
- United States Department of Agriculture-Agricultural Research Service, Corn Host Plant Resistance Research Unit, Box 9555, Mississippi State, MS, 39762, USA
| | - Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China.
- Hubei Hongshan Laboratory, Wuhan, 430070, China.
| | - Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China.
- Hubei Hongshan Laboratory, Wuhan, 430070, China.
|
22
|
He Z, Li S, Li W, Ding J, Zheng M, Li Q, Fahey AG, Wen J, Liu R, Zhao G. Comparison of genomic prediction methods for residual feed intake in broilers. Anim Genet 2022; 53:466-469. [PMID: 35292985 DOI: 10.1111/age.13186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Revised: 02/24/2022] [Accepted: 02/28/2022] [Indexed: 11/30/2022]
Abstract
Residual feed intake (RFI) is a measure of the feed efficiency of animals. Previous studies have identified SNPs associated with RFI. The objective of this study was to compare the GBLUP model with the GA-BLUP model, which includes previously identified associated SNPs. Nine associated SNPs were obtained from a genome-wide association study on a discovery population and used as preselection information. The models were fitted in ASREML software with 5-fold cross-validation on a validation population. Using the genetic architecture (GA) matrix, which was constructed with the nine RFI-associated SNPs, improved the prediction accuracy of RFI compared with the original GBLUP model. The calculated optimal ω was 0.981 for RFI, which is in line with the optimal range from 0.9 to 1.0 in the gradient test. The prediction accuracy increased by 2% in the GA-BLUP model with ω = 0.981 compared with the GBLUP model. In conclusion, GA-BLUP with the nine RFI-associated SNPs and an optimal ω can improve the prediction accuracy for a specific trait compared with GBLUP.
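The role of ω can be illustrated with a weighted blend of two relationship matrices. The abstract does not state which matrix ω multiplies; the sketch below assumes that ω weights the genome-wide matrix G and that 1 - ω weights a matrix S built from the associated SNPs. The function name and the toy matrices are hypothetical.

```python
def blend_relationship(G, S, omega):
    """Return omega * G + (1 - omega) * S for two square matrices of equal size."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    n = len(G)
    if len(S) != n or any(len(row) != n for row in G + S):
        raise ValueError("G and S must be square and of equal size")
    return [
        [omega * G[i][j] + (1.0 - omega) * S[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy 2 x 2 matrices (hypothetical values for two animals).
G = [[1.0, 0.2], [0.2, 1.0]]   # genome-wide relationships
S = [[1.0, 0.8], [0.8, 1.0]]   # relationships at the preselected SNPs
T = blend_relationship(G, S, omega=0.981)
print(T)
```

Under this assumption, ω = 1 reduces the blended matrix to G, which corresponds to plain GBLUP.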
Affiliation(s)
- Zhengxiao He
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.,School of Agriculture and Food Science, University College Dublin, Dublin, Ireland
| | - Sen Li
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Wei Li
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Jiqiang Ding
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Maiqing Zheng
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Qinghe Li
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Alan G Fahey
- School of Agriculture and Food Science, University College Dublin, Dublin, Ireland
| | - Jie Wen
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Ranran Liu
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Guiping Zhao
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
|
23
|
Shi S, Zhang Z, Li B, Zhang S, Fang L. Incorporation of Trait-Specific Genetic Information into Genomic Prediction Models. Methods Mol Biol 2022; 2467:329-340. [PMID: 35451781 DOI: 10.1007/978-1-0716-2205-6_11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Due to the rapid development of high-throughput sequencing technology, we can easily obtain not only the genetic variants at the whole-genome sequence level (e.g., from 1000 Genomes project and 1000 Bull Genomes project), but also a wide range of functional annotations (e.g., enhancers and promoters from ENCODE, FAANG, and FarmGTEx projects) across a wide range of tissues, cell types, developmental stages, and environmental conditions. This huge amount of information leads to a revolution in studying genetics and genomics of complex traits in humans, livestock, and plant species. In this chapter, we focused on and reviewed the genomic prediction methods that incorporate external biological information into genomic prediction, such as sequence ontology, linkage disequilibrium (LD) of SNPs, quantitative trait loci (QTL), and multi-layer omics data (e.g., transcriptome, epigenome, and microbiome).
Affiliation(s)
- Shaolei Shi
- College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Zhe Zhang
- Department of Animal Breeding and genetics, College of Animal Science, South China Agricultural University (SCAU), Guangzhou, China
| | - Bingjie Li
- The Roslin Institute Building, Scotland's Rural College, Edinburgh, UK
| | - Shengli Zhang
- College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Lingzhao Fang
- MRC Human Genetics Unit at the Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, UK.
|
24
|
Bayer PE, Petereit J, Danilevicz MF, Anderson R, Batley J, Edwards D. The application of pangenomics and machine learning in genomic selection in plants. THE PLANT GENOME 2021; 14:e20112. [PMID: 34288550 DOI: 10.1002/tpg2.20112] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/14/2021] [Accepted: 05/01/2021] [Indexed: 05/10/2023]
Abstract
Genomic selection approaches have increased the speed of plant breeding, leading to growing crop yields over the last decade. However, climate change is impacting current and future yields, resulting in the need to further accelerate breeding efforts to cope with these changing conditions. Here we present approaches to accelerate plant breeding and incorporate nonadditive effects in genomic selection by applying state-of-the-art machine learning approaches. These approaches are made more powerful by the inclusion of pangenomes, which represent the entire genome content of a species. Understanding the strengths and limitations of machine learning methods, compared with more traditional genomic selection efforts, is paramount to the successful application of these methods in crop breeding. We describe examples of genomic selection and pangenome-based approaches in crop breeding, discuss machine learning-specific challenges, and highlight the potential for the application of machine learning in genomic selection. We believe that careful implementation of machine learning approaches will support crop improvement to help counter the adverse outcomes of climate change on crop production.
Affiliation(s)
- Philipp E Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Jakob Petereit
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Monica Furaste Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Robyn Anderson
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
| | - David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
|
25
|
Vu NT, Phuc TH, Oanh KTP, Sang NV, Trang TT, Nguyen NH. Accuracies of genomic predictions for disease resistance of striped catfish to Edwardsiella ictaluri using artificial intelligence algorithms. G3-GENES GENOMES GENETICS 2021; 12:6408442. [PMID: 34788431 PMCID: PMC8727988 DOI: 10.1093/g3journal/jkab361] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/05/2021] [Accepted: 10/10/2021] [Indexed: 02/04/2023]
Abstract
Assessments of genomic prediction accuracies using artificial intelligence (AI) algorithms (i.e., machine and deep learning methods) are currently not available or very limited in aquaculture species. The principal aim of this study was to examine the predictive performance of these new methods for disease resistance to Edwardsiella ictaluri in a population of striped catfish Pangasianodon hypophthalmus and to make comparisons with four common methods, i.e., pedigree-based best linear unbiased prediction (PBLUP), genomic-based best linear unbiased prediction (GBLUP), single-step GBLUP (ssGBLUP) and a nonlinear Bayesian approach (notably BayesR). Our analyses using machine learning (i.e., ML-KAML) and deep learning (i.e., DL-MLP and DL-CNN) together with the four common methods (PBLUP, GBLUP, ssGBLUP, and BayesR) were conducted for two main disease resistance traits (i.e., survival status coded as 0 and 1 and survival time, i.e., days that the animals were still alive after the challenge test) in a pedigree consisting of 560 individual animals (490 offspring and 70 parents) genotyped for 14,154 single nucleotide polymorphisms (SNPs). The results using 6,470 SNPs after quality control showed that machine learning methods outperformed PBLUP, GBLUP, and ssGBLUP, increasing the prediction accuracies for both traits by 9.1–15.4%. However, the prediction accuracies obtained from machine learning methods were comparable to those estimated using BayesR. Imputation of missing genotypes using AlphaFamImpute increased the prediction accuracies by 5.3–19.2% in all the methods and data used. On the other hand, there were insignificant decreases (0.3–5.6%) in the prediction accuracies for both survival status and survival time when multivariate models were used in comparison to univariate analyses. 
Interestingly, the genomic prediction accuracies based on only highly significant SNPs (P < 0.00001, 318–400 SNPs for survival status and 1,362–1,589 SNPs for survival time) were somewhat lower (0.3–15.6%) than those obtained from the whole set of 6,470 SNPs. In most of our analyses, the accuracies of genomic prediction were somewhat higher for survival time than survival status (0/1 data). It is concluded that although there are prospects for the application of genomic selection to increase disease resistance to E. ictaluri in striped catfish breeding programs, further evaluation of these methods should be made in independent families/populations when more data are accumulated in future generations to avoid possible biases in the genetic parameters estimates and prediction accuracies for the disease-resistant traits studied in this population of striped catfish P. hypophthalmus.
Affiliation(s)
- Nguyen Thanh Vu
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia.,Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia.,Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
| | - Tran Huu Phuc
- Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
| | - Kim Thi Phuong Oanh
- Institute of Genome Research, Vietnam Academy of Science and Technology, Hanoi, Vietnam
| | - Nguyen Van Sang
- Research Institute for Aquaculture No.2, Ho Chi Minh 710000, Vietnam
| | - Trinh Thi Trang
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia.,Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia.,Vietnam National University of Agriculture, Gia Lam 131000, Vietnam
| | - Nguyen Hong Nguyen
- School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD, Australia.,Genecology Research Center, University of the Sunshine Coast, Sippy Downs, QLD, Australia
|
26
|
Jiang L, Li Z, Hayward JJ, Hayashi K, Krotscheck U, Todhunter RJ, Tang Y, Huang M. Genomic Prediction of Two Complex Orthopedic Traits Across Multiple Pure and Mixed Breed Dogs. Front Genet 2021; 12:666740. [PMID: 34630503 PMCID: PMC8492927 DOI: 10.3389/fgene.2021.666740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Accepted: 09/06/2021] [Indexed: 11/20/2022] Open
Abstract
Canine hip dysplasia (CHD) and rupture of the cranial cruciate ligament (RCCL) are two complex inherited orthopedic traits of dogs. These two traits may occur concurrently in the same dog. Genomic prediction of these two diseases would benefit veterinary medicine, the dog’s owner, and dog breeders because of their high prevalence, and because both traits result in painful debilitating osteoarthritis in affected joints. In this study, 842 unique dogs from 6 breeds with hip and stifle phenotypes were genotyped on a customized Illumina high density 183 k single nucleotide polymorphism (SNP) array and also analyzed using an imputed dataset of 20,487,155 SNPs. To implement genomic prediction, two different statistical methods were employed: Genomic Best Linear Unbiased Prediction (GBLUP) and a Bayesian method called BayesC. The cross-validation results showed that the two methods gave similar prediction accuracy (r = 0.3–0.4) for CHD (measured as Norberg angle) and RCCL in the multi-breed population. For CHD, the average correlation of the AUC was 0.71 (BayesC) and 0.70 (GBLUP), which is a medium level of prediction accuracy and consistent with Pearson correlation results. For RCCL, the correlation of the AUC was slightly higher. The prediction accuracy of GBLUP from the imputed genotype data was similar to the accuracy from DNA array data. We demonstrated that the genomic prediction of CHD and RCCL with DNA array genotype data is feasible in a multiple breed population if there is a genetic connection, such as breed, between the reference population and the validation population. Although these traits have a heritability of about one-third, higher accuracy is needed for implementation in a natural population, and predicting a complex phenotype will require a much larger number of dogs within a breed and across breeds. 
It is possible that with higher accuracy, genomic prediction of these orthopedic traits could be implemented in a clinical setting for early diagnosis and treatment, and the selection of dogs for breeding. These results need continuous improvement in model prediction through ongoing genotyping and data sharing. When genomic prediction indicates that a dog is susceptible to one of these orthopedic traits, it should be accompanied by clinical and radiographic screening at an acceptable age with appropriate follow-up.
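For readers unfamiliar with the AUC figures quoted above, the area under the ROC curve can be computed as the probability that a randomly chosen affected animal receives a higher predicted value than a randomly chosen unaffected one. The sketch below assumes a binary case/control label; how the study dichotomised its phenotypes is not stated in the abstract, and the scores shown are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random case outranks a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted values and case (1) / control (0) labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.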
Affiliation(s)
- Liping Jiang
- College of Mathematics, Jilin University, Changchun, China.,Electrical and Information Engineering College, Jilin Agricultural Science and Technology University, Jilin, China
| | - Zhuo Li
- Electrical and Information Engineering College, Jilin Agricultural Science and Technology University, Jilin, China
| | - Jessica J Hayward
- Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, United States
| | - Kei Hayashi
- Department of Clinical Sciences and Cornell Veterinary Biobank, College of Veterinary Medicine, Cornell University, Ithaca, NY, United States
| | - Ursula Krotscheck
- Department of Clinical Sciences and Cornell Veterinary Biobank, College of Veterinary Medicine, Cornell University, Ithaca, NY, United States
| | - Rory J Todhunter
- Department of Clinical Sciences and Cornell Veterinary Biobank, College of Veterinary Medicine, Cornell University, Ithaca, NY, United States
| | - You Tang
- Electrical and Information Engineering College, Jilin Agricultural Science and Technology University, Jilin, China
| | - Meng Huang
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, United States
|
27
|
O’Donnell TP, Sullivan TJ. Low-coverage whole-genome sequencing reveals molecular markers for spawning season and sex identification in Gulf of Maine Atlantic cod (Gadus morhua, Linnaeus 1758). Ecol Evol 2021; 11:10659-10671. [PMID: 34367604 PMCID: PMC8328444 DOI: 10.1002/ece3.7878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2021] [Revised: 06/17/2021] [Accepted: 06/18/2021] [Indexed: 11/28/2022] Open
Abstract
Atlantic cod (Gadus morhua, Linnaeus 1758) in the western Gulf of Maine are managed as a single stock despite several lines of evidence supporting two spawning groups (spring and winter) that overlap spatially, while exhibiting seasonal spawning isolation. Low-coverage whole-genome sequencing was used to evaluate the genomic population structure of Atlantic cod spawning groups in the western Gulf of Maine and Georges Bank using 222 individuals collected over multiple years. Results indicated low total genomic differentiation, while also showing strong differentiation between spring and winter-spawning groups at specific regions of the genome. Guided regularized random forest and ranked FST methods were used to select panels of single nucleotide polymorphisms (SNPs) that could reliably distinguish spring and winter-spawning Atlantic cod (88.5% assignment rate), as well as males and females (95.0% assignment rate) collected in the western Gulf of Maine. These SNP panels represent a valuable tool for fisheries research and management of Atlantic cod in the western Gulf of Maine that will aid investigations of stock production and support accuracy of future assessments.
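Selecting a SNP panel by ranked F_ST can be sketched as follows. The sketch assumes a Nei-style per-SNP estimate computed from plain allele frequencies in two equally weighted groups; the study worked from low-coverage sequencing data, so its actual estimator may differ. SNP names, frequencies, and the panel size are hypothetical.

```python
def fst(p1, p2):
    """Nei-style F_ST for one biallelic SNP from two group allele frequencies."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # SNP fixed for the same allele in both groups
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def top_snps(freqs, panel_size):
    """Rank SNPs by F_ST, highest first, and return the names of the top ones."""
    ranked = sorted(freqs, key=lambda name: fst(*freqs[name]), reverse=True)
    return ranked[:panel_size]


# Hypothetical allele frequencies in (spring, winter) spawners.
freqs = {
    "snp_a": (0.10, 0.90),
    "snp_b": (0.50, 0.55),
    "snp_c": (0.30, 0.70),
}
print(top_snps(freqs, 2))
```

The selected panel would then be evaluated by how well it assigns held-out individuals to their group, which is the assignment rate reported in the abstract.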
Affiliation(s)
| | - Timothy J. Sullivan
- Gloucester Marine Genomics Institute, Gloucester, MA, USA
- USDA – National Institute of Food and Agriculture, Kansas City, MO, USA
|
28
|
Zhang T, Jiang L, Ruan L, Qian Y, Liang S, Lin F, Lu H, Dai H, Zhao H. Heterotic quantitative trait loci analysis and genomic prediction of seedling biomass-related traits in maize triple testcross populations. PLANT METHODS 2021; 17:85. [PMID: 34330310 PMCID: PMC8325263 DOI: 10.1186/s13007-021-00785-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/23/2020] [Accepted: 07/23/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Heterosis has been widely used in maize breeding. However, little is known about heterotic quantitative trait loci and their roles in genomic prediction. In this study, we sought to identify heterotic quantitative trait loci for seedling biomass-related traits using a triple testcross design and to compare the prediction accuracies obtained by fitting molecular markers and heterotic quantitative trait loci. RESULTS A triple testcross population comprising 366 genotypes was constructed by crossing each of 122 intermated B73 × Mo17 genotypes with B73, Mo17, and B73 × Mo17. The mid-parent heterosis of the seedling biomass-related traits (leaf length, leaf width, leaf area, and seedling dry weight) displayed a large range, from less than 50% to ~150%. Relationships between heterosis of seedling biomass-related traits were congruent with those between trait performances. Based on a linkage map comprising 1631 markers, 14 augmented additive, two augmented dominance, and three dominance × additive epistatic quantitative trait loci for heterosis of seedling biomass-related traits were identified, each individually explaining 4.1-20.5% of the phenotypic variation. All modes of gene action, i.e., additive, partially dominant, dominant, and overdominant modes, were observed. In addition, ten additive × additive and six dominance × dominance epistatic interactions were identified. By implementing the general and special combining ability model, we found that prediction accuracy ranged from 0.29 for leaf length to 0.56 for leaf width. Analyses with different numbers of markers showed that ~800 markers were enough to come close to the highest prediction accuracies. When the heterotic quantitative trait loci were incorporated into the model, prediction accuracy did not change significantly; only leaf length showed a marginal improvement of 1.7%. 
CONCLUSIONS Our results demonstrated that the triple testcross design is suitable for detecting heterotic quantitative trait loci and evaluating the prediction accuracy. Seedling leaf width can be used as the representative trait for seedling prediction. The heterotic quantitative trait loci are not necessary for genomic prediction of seedling biomass-related traits.
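Mid-parent heterosis, as reported in the Results, is conventionally the hybrid's deviation from the mean of its two parents, expressed as a percentage of that mean. A minimal sketch of the calculation follows; the function name and the trait values are hypothetical.

```python
def mid_parent_heterosis(f1, parent1, parent2):
    """Percentage by which the hybrid value exceeds the mean of its two parents."""
    mid_parent = (parent1 + parent2) / 2.0
    if mid_parent == 0:
        raise ValueError("mid-parent value must be non-zero")
    return 100.0 * (f1 - mid_parent) / mid_parent


# Hypothetical seedling dry weights (g): hybrid 2.5, both parents 1.0.
print(mid_parent_heterosis(2.5, 1.0, 1.0))
```

A positive value indicates that the hybrid outperforms the parental mean; a negative value indicates the opposite.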
Affiliation(s)
- Tifu Zhang
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Lu Jiang
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Long Ruan
  - Institute of Tobacco, Anhui Academy of Agricultural Sciences, Hefei, 230001, China
- Yiliang Qian
  - Institute of Tobacco, Anhui Academy of Agricultural Sciences, Hefei, 230001, China
- Shuaiqiang Liang
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Feng Lin
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Haiyan Lu
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Huixue Dai
  - Nanjing Institute of Vegetable Sciences, Nanjing, 210042, China
- Han Zhao
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
|
29
|
Li H, Zhu B, Xu L, Wang Z, Xu L, Zhou P, Gao H, Guo P, Chen Y, Gao X, Zhang L, Gao H, Cai W, Xu L, Li J. Genomic Prediction Using LD-Based Haplotypes Inferred From High-Density Chip and Imputed Sequence Variants in Chinese Simmental Beef Cattle. Front Genet 2021; 12:665382. [PMID: 34394182 PMCID: PMC8358323 DOI: 10.3389/fgene.2021.665382] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/08/2021] [Accepted: 06/30/2021] [Indexed: 01/05/2023]
Abstract
A haplotype is defined as a combination of alleles at adjacent loci on the same chromosome that can be transmitted as a unit. In this study, we used both the Illumina BovineHD chip (HD chip) and imputed whole-genome sequence (WGS) data to explore haploblocks and assess haplotype effects, with haploblocks defined under different LD thresholds. The accuracies of genomic prediction (GP) for dressing percentage (DP), meat percentage (MP), and rib eye roll weight (RERW) based on haplotypes were investigated and compared for both data sets in Chinese Simmental beef cattle. The accuracies of GP using the entire imputed WGS data were lower than those using the HD chip data in all cases. For DP and MP, GP using haploblock approaches outperformed the individual single nucleotide polymorphism (SNP) approach (GBLUP_In_Block) at specific LD levels. Hotelling's test confirmed that GP using LD-based haplotypes from WGS data can significantly increase the accuracy of GP for RERW compared with the individual SNP approach (∼1.4% and 1.9% for GHBLUP and GHBLUP+GBLUP, respectively). We found that the accuracy of the haploblock approach varied with the LD threshold. A threshold of r² ≥ 0.5 was optimal for most scenarios. Our results suggest that the LD-based haploblock approach can improve the accuracy of genomic prediction for carcass traits using both HD chip and imputed WGS data under the optimal LD thresholds in Chinese Simmental beef cattle.
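The LD thresholds mentioned in this abstract refer to the pairwise r² statistic. The sketch below shows the textbook r² for two biallelic loci, assuming phased haplotypes coded 0/1; the function names are hypothetical, and the study's actual block-partitioning procedure is not described in the abstract and is not reproduced here.

```python
from typing import Sequence


def ld_r2(locus_a: Sequence[int], locus_b: Sequence[int]) -> float:
    """Pairwise LD r^2 between two biallelic loci.

    Each sequence holds one 0/1 allele per phased haplotype (an assumption
    made for this sketch). r^2 = D^2 / (pA(1-pA) pB(1-pB)), D = pAB - pA*pB.
    """
    if len(locus_a) != len(locus_b) or len(locus_a) == 0:
        raise ValueError("loci must be non-empty and of equal length")
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denom


def in_same_block(locus_a: Sequence[int], locus_b: Sequence[int],
                  threshold: float = 0.5) -> bool:
    """Hypothetical helper: do two loci meet the LD threshold (r^2 >= threshold)?"""
    return ld_r2(locus_a, locus_b) >= threshold
```

With the default threshold of 0.5, `in_same_block([0, 0, 1, 1], [0, 0, 1, 1])` is true (complete LD), while `in_same_block([0, 0, 1, 1], [0, 1, 0, 1])` is false (no LD).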
Affiliation(s)
- Hongwei Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Bo Zhu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
  - National Centre of Beef Cattle Genetic Evaluation, Beijing, China
- Ling Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Zezhao Wang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lei Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Peinuo Zhou
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Han Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Peng Guo
  - College of Computer and Information Engineering, Tianjin Agricultural University, Tianjin, China
- Yan Chen
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Huijiang Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
  - National Centre of Beef Cattle Genetic Evaluation, Beijing, China
- Wentao Cai
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
  - National Centre of Beef Cattle Genetic Evaluation, Beijing, China
|
30
|
Marsh JI, Hu H, Gill M, Batley J, Edwards D. Crop breeding for a changing climate: integrating phenomics and genomics with bioinformatics. Theor Appl Genet 2021; 134:1677-1690. [PMID: 33852055 DOI: 10.1007/s00122-021-03820-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/22/2020] [Accepted: 03/18/2021] [Indexed: 05/05/2023]
Abstract
Safeguarding crop yields in a changing climate requires bioinformatics advances in harnessing data from vast phenomics and genomics datasets to translate research findings into climate smart crops in the field. Climate change and an additional 3 billion mouths to feed by 2050 raise serious concerns over global food security. Crop breeding and land management strategies will need to evolve to maximize the utilization of finite resources in coming years. High-throughput phenotyping and genomics technologies are providing researchers with the information required to guide and inform the breeding of climate smart crops adapted to the environment. Bioinformatics has a fundamental role to play in integrating and exploiting this fast accumulating wealth of data, through association studies to detect genomic targets underlying key adaptive climate-resilient traits. These data provide tools for breeders to tailor crops to their environment and can be introduced using advanced selection or genome editing methods. To effectively translate research into the field, genomic and phenomic information will need to be integrated into comprehensive clade-specific databases and platforms alongside accessible tools that can be used by breeders to inform the selection of climate adaptive traits. Here we discuss the role of bioinformatics in extracting, analysing, integrating and managing genomic and phenomic data to improve climate resilience in crops, including current, emerging and potential approaches, applications and bottlenecks in the research and breeding pipeline.
Affiliation(s)
- Jacob I Marsh
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
- Haifei Hu
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
- Mitchell Gill
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
- Jacqueline Batley
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
|
31
|
Liang M, Chang T, An B, Duan X, Du L, Wang X, Miao J, Xu L, Gao X, Zhang L, Li J, Gao H. A Stacking Ensemble Learning Framework for Genomic Prediction. Front Genet 2021; 12:600040. [PMID: 33747037 PMCID: PMC7969712 DOI: 10.3389/fgene.2021.600040] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/28/2020] [Accepted: 01/12/2021] [Indexed: 11/22/2022]
Abstract
Machine learning (ML) is perhaps the most useful tool for the interpretation of large genomic datasets. However, the performance of a single machine learning method in genomic selection (GS) is currently unsatisfactory. To improve genomic predictions, we constructed a stacking ensemble learning framework (SELF), integrating three machine learning methods, to predict genomic estimated breeding values (GEBVs). The present study evaluated the prediction ability of SELF by analyzing three real datasets with different genetic architectures and comparing the prediction accuracy of SELF with that of the base learners, genomic best linear unbiased prediction (GBLUP) and BayesB. For each trait, SELF performed better than the base learners, which included support vector regression (SVR), kernel ridge regression (KRR) and elastic net (ENET). The prediction accuracy of SELF was, on average, 7.70% higher than that of GBLUP across the three datasets. Except for the milk fat percentage (MFP) trait of the German Holstein dairy cattle dataset, SELF was more robust than BayesB for all remaining traits. Therefore, we believe that SELF has the potential to be promoted for estimating GEBVs in other animals and plants.
Affiliation(s)
- Mang Liang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Tianpeng Chang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Bingxing An
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xinghai Duan
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lili Du
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xiaoqiao Wang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jian Miao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Huijiang Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
|
32
|
Lee HJ, Chung YJ, Jang S, Seo DW, Lee HK, Yoon D, Lim D, Lee SH. Genome-wide identification of major genes and genomic prediction using high-density and text-mined gene-based SNP panels in Hanwoo (Korean cattle). PLoS One 2020; 15:e0241848. [PMID: 33264312 PMCID: PMC7710051 DOI: 10.1371/journal.pone.0241848] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2020] [Accepted: 10/21/2020] [Indexed: 11/24/2022]
Abstract
It was hypothesized that single-nucleotide polymorphisms (SNPs) extracted from text-mined genes could be more tightly related to the causal variants for each trait and that differential weighting of this SNP panel in the GBLUP model could improve the performance of genomic prediction in cattle. Fitting two GRMs, one constructed from text-mined SNPs and one from the remaining SNPs of the 777k SNP set (exp_777K), as different random effects showed better accuracy than fitting one GRM (Im_777K) for six traits (e.g. backfat thickness: + 0.002, eye muscle area: + 0.014, Warner–Bratzler Shear Force of semimembranosus and longissimus dorsi: + 0.024 and + 0.068, intramuscular fat content of semimembranosus and longissimus dorsi: + 0.008 and + 0.018). These results suggest that attempts to incorporate text mining into genomic prediction are valuable, and further study using text mining can be expected to yield significant results.
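To illustrate what "fitting two GRMs" involves, the sketch below partitions a genotype matrix into a selected SNP subset and the remainder and builds one genomic relationship matrix from each. It assumes the widely used VanRaden form G = ZZ'/(2Σp(1-p)) with genotypes coded 0/1/2; the abstract does not say how the GRMs were built, so the formula, the function names and the toy data are all assumptions.

```python
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def vanraden_grm(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """Build a GRM from 0/1/2 genotypes (rows = individuals, columns = SNPs)."""
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency of the counted allele at each SNP
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = sum(2 * p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; GRM is undefined")
    # Center each genotype by twice the allele frequency
    z = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


def split_grms(genotypes: Sequence[Sequence[int]],
               selected: Set[int]) -> Tuple[Matrix, Matrix]:
    """Return (GRM from selected SNP columns, GRM from all remaining columns)."""
    m = len(genotypes[0])
    cols_in = sorted(selected)
    cols_out = [j for j in range(m) if j not in selected]
    subset = lambda cols: [[ind[j] for j in cols] for ind in genotypes]
    return vanraden_grm(subset(cols_in)), vanraden_grm(subset(cols_out))


# Toy data: 3 individuals x 4 SNPs; columns 0 and 3 play the "text-mined" panel
geno = [[0, 1, 2, 0], [1, 1, 0, 2], [2, 1, 1, 1]]
g_selected, g_rest = split_grms(geno, {0, 3})
```

In a mixed model the two matrices would then enter as covariance structures of two separate random effects, each with its own variance component, which is what allows the selected panel to be weighted differently from the rest.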
Affiliation(s)
- Hyo Jun Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon, Korea
- Yoon Ji Chung
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon, Korea
- Sungbong Jang
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States of America
- Dong Won Seo
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon, Korea
- Hak Kyo Lee
  - Department of Animal Biotechnology, Chonbuk National University, Jeonju, Korea
- Duhak Yoon
  - Department of Animal Science, Kyungpook National University, Sangju, Korea
- Dajeong Lim
  - Animal Genome & Bioinformatics, National Institute of Animal Science, Wanju, Korea
  - * E-mail: (DL); (SHL)
- Seung Hwan Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon, Korea
  - * E-mail: (DL); (SHL)
|
33
|
Jeong S, Kim JY, Kim N. GMStool: GWAS-based marker selection tool for genomic prediction from genomic data. Sci Rep 2020; 10:19653. [PMID: 33184432 PMCID: PMC7665227 DOI: 10.1038/s41598-020-76759-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/04/2020] [Accepted: 11/02/2020] [Indexed: 12/20/2022]
Abstract
The increased accessibility of genomic data in recent years has laid the foundation for studies that predict various phenotypes of organisms from the genome. Genomic prediction collectively refers to these studies, and it estimates an individual's phenotypes mainly using single nucleotide polymorphism markers. Typically, the accuracy of these genomic prediction studies is highly dependent on the markers used; however, in practice, choosing optimal markers that give high accuracy for the phenotype of interest is a challenging task. Therefore, we present a new tool called GMStool for selecting optimal marker sets and predicting quantitative phenotypes. GMStool is based on a genome-wide association study (GWAS) and heuristically searches for optimal markers using statistical and machine-learning methods. GMStool performs genomic prediction using statistical and machine/deep-learning models and presents the best prediction model with the optimal marker set. For the evaluation, GMStool was tested on real datasets with four phenotypes. The prediction results showed higher performance than using the entire marker set or the GWAS-top markers, which have been used frequently in prediction studies. Although GMStool has several limitations, it is expected to contribute to various studies for predicting quantitative phenotypes. GMStool, written in R, is available at www.github.com/JaeYoonKim72/GMStool.
Affiliation(s)
- Seongmun Jeong
  - Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jae-Yoon Kim
  - Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
  - Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon, 34141, Republic of Korea
- Namshin Kim
  - Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
  - Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology (UST), Daejeon, 34141, Republic of Korea
|
34
|
Yin L, Zhang H, Zhou X, Yuan X, Zhao S, Li X, Liu X. KAML: improving genomic prediction accuracy of complex traits using machine learning determined parameters. Genome Biol 2020; 21:146. [PMID: 32552725 PMCID: PMC7386246 DOI: 10.1186/s13059-020-02052-w] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 01/28/2020] [Accepted: 05/21/2020] [Indexed: 02/06/2023]
Abstract
Advances in high-throughput sequencing technologies have reduced the cost of genotyping dramatically and led to genomic prediction being widely used in animal and plant breeding, and increasingly in human genetics. Inspired by the computational efficiency of linear mixed models and the prediction accuracy of Bayesian methods, we propose KAML, a machine learning-based method that incorporates cross-validation, multiple regression, grid search, and bisection algorithms and aims to combine high prediction accuracy with computing efficiency. KAML exhibits higher prediction accuracy than existing methods, and it is available at https://github.com/YinLiLin/KAML.
Affiliation(s)
- Lilin Yin
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
  - Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
- Haohao Zhang
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, 430070, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiaohui Yuan
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, 430070, China
- Shuhong Zhao
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
  - Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
- Xinyun Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
  - Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
- Xiaolei Liu
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
  - Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, Huazhong Agricultural University, Wuhan, 430070, Hubei, People's Republic of China
|