1. Wang X, Zhang Z, Du H, Pfeiffer C, Mészáros G, Ding X. Predictive ability of multi-population genomic prediction methods of phenotypes for reproduction traits in Chinese and Austrian pigs. Genet Sel Evol 2024; 56:49. [PMID: 38926647 PMCID: PMC11201905 DOI: 10.1186/s12711-024-00915-5] [Received: 12/22/2022] [Accepted: 05/30/2024] [Indexed: 06/28/2024]
Abstract
BACKGROUND Multi-population genomic prediction can rapidly expand the size of the reference population and improve genomic prediction ability. Machine learning (ML) algorithms have shown advantages in single-population genomic prediction of phenotypes. However, few studies have explored the effectiveness of ML methods for multi-population genomic prediction.
RESULTS In this study, 3720 Yorkshire pigs from Austria and four breeding farms in China were used, and single-trait genomic best linear unbiased prediction (ST-GBLUP), multitrait GBLUP (MT-GBLUP), Bayesian Horseshoe (BayesHE), and three ML methods (support vector regression (SVR), kernel ridge regression (KRR) and AdaBoost.R2) were compared to explore the optimal method for joint genomic prediction of phenotypes of Chinese and Austrian pigs through 10 replicates of fivefold cross-validation. We tested the performance of the methods in two scenarios: (i) including only one Austrian population and one Chinese pig population that were genetically linked based on principal component analysis (PCA) (designated as the "two-population scenario") and (ii) adding reference populations that are unrelated based on PCA to the above two populations (designated as the "multi-population scenario"). Our results show that the use of MT-GBLUP in the two-population scenario improved predictive ability by 7.1% compared to ST-GBLUP, while the use of SVR and KRR yielded improvements in predictive ability of 4.5 and 5.3%, respectively, compared to MT-GBLUP. SVR and KRR also yielded lower mean square errors (MSE) in most population and trait combinations. In the multi-population scenario, improvements in predictive ability of 29.7, 24.4 and 11.1% were obtained compared to ST-GBLUP when using SVR, KRR, and AdaBoost.R2, respectively. However, compared to MT-GBLUP, the potential of ML methods to improve predictive ability was not demonstrated.
CONCLUSIONS Our study demonstrates that ML algorithms can achieve better prediction performance than multitrait GBLUP models in multi-population genomic prediction of phenotypes when the populations have similar genetic backgrounds; however, when reference populations that are unrelated based on PCA are added, the ML methods did not show a benefit. When the number of populations increased, only MT-GBLUP improved predictive ability in both validation populations, while the other methods showed improvement in only one population.
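The validation design named in the abstract, 10 replicates of fivefold cross-validation, can be sketched as below. This is only an illustrative sketch: the function name `repeated_kfold`, the seed, and the toy sample size of 20 are assumptions, and the study's actual partitioning (for example, any stratification by population) is not described in the abstract.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (replicate, fold, train_idx, valid_idx) for repeated k-fold CV.

    Hypothetical helper for illustration; not the authors' code.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # fresh random partition for every replicate
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for f, valid_idx in enumerate(folds):
            # Training set = every index that is not in the held-out fold.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield rep, f, sorted(train_idx), sorted(valid_idx)


# Toy usage: 20 animals, 10 replicates x 5 folds.
splits = list(repeated_kfold(n_samples=20))
print(len(splits))
```

In such a design, a method's predictive ability is typically computed on each held-out fold and then averaged over all folds and replicates.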
Affiliation(s)
- Xue Wang
- State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Zipeng Zhang
- State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Hehe Du
- State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Gábor Mészáros
- University of Natural Resources and Life Sciences, Vienna, Austria
- Xiangdong Ding
- State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China.
2. Li X, Chen X, Wang Q, Yang N, Sun C. Integrating Bioinformatics and Machine Learning for Genomic Prediction in Chickens. Genes (Basel) 2024; 15:690. [PMID: 38927626 PMCID: PMC11202573 DOI: 10.3390/genes15060690] [Received: 04/09/2024] [Revised: 05/12/2024] [Accepted: 05/23/2024] [Indexed: 06/28/2024]
Abstract
Genomic prediction plays an increasingly important role in modern animal breeding, with predictive accuracy being a crucial aspect. The classical linear mixed model is increasingly unable to accommodate the growing number of target traits and the increasingly intricate genetic regulatory patterns. Hence, novel approaches are necessary for future genomic prediction. In this study, we used an Illumina 50K SNP chip to genotype 4190 egg-type female Rhode Island Red chickens. Machine learning (ML) and classical bioinformatics methods were integrated to fit genotypes with 10 economic traits in chickens. We evaluated the effectiveness of ML methods using Pearson correlation coefficients and the RMSE between predicted and actual phenotypic values and compared them with rrBLUP and BayesA. Our results indicated that ML algorithms performed significantly better than rrBLUP and BayesA in predicting body weight and eggshell strength traits. Conversely, rrBLUP and BayesA showed 2-58% higher predictive accuracy for egg numbers. Additionally, incorporating suggestively significant SNPs obtained through GWAS into the ML models increased predictive accuracy by 0.1-27% across nearly all traits. These findings suggest the potential of combining classical bioinformatics methods with ML techniques to improve genomic prediction in the future.
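The two evaluation metrics named in the abstract, the Pearson correlation coefficient and the RMSE between predicted and actual phenotypic values, can be written out as follows. This is a minimal sketch using the standard textbook definitions; the function names and the toy numbers are illustrative assumptions and are not data from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values (made up): predictions are shifted by a constant 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
r_value = pearson_r(observed, predicted)
error = rmse(observed, predicted)
print(r_value, error)
```

The toy case shows why both metrics are usually reported together: a constant shift leaves the correlation at its maximum while the RMSE is clearly non-zero.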
Affiliation(s)
- Congjiao Sun
- State Key Laboratory of Animal Biotech Breeding and Frontiers Science Center for Molecular Design Breeding (MOE), China Agricultural University, Beijing 100193, China; (X.L.); (X.C.); (Q.W.); (N.Y.)
3. Li C, Yang Q, Liu B, Shi X, Liu Z, Yang C, Wang T, Xiao F, Zhang M, Shi A, Yan L. Ability of Genomic Prediction to Bi-Parent-Derived Breeding Population Using Public Data for Soybean Oil and Protein Content. Plants (Basel, Switzerland) 2024; 13:1260. [PMID: 38732474 PMCID: PMC11085238 DOI: 10.3390/plants13091260] [Received: 03/23/2024] [Revised: 04/21/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]
Abstract
Genomic selection (GS) is a marker-based selection method used to improve the genetic gain of quantitative traits in plant breeding. A large number of breeding datasets are available in the soybean database, and applying these public datasets in GS will improve breeding efficiency and reduce time and cost. However, the most important problem to be solved is how to improve the ability of across-population prediction. The objectives of this study were to perform genomic prediction (GP) and estimate the prediction ability (PA) for seed oil and protein contents in soybean, using available public datasets to predict breeding populations in current, ongoing breeding programs. Six public datasets of USDA GRIN soybean germplasm accessions, with phenotypic data on seed oil and protein contents from different experimental populations and genotypic data on single-nucleotide polymorphisms (SNPs), were used to perform GP and to predict a bi-parent-derived breeding population in our experiment. The average PA was 0.55 for seed oil and 0.50 for protein content within the bi-parent-derived population according to within-population prediction, and 0.45 for oil and 0.39 for protein content when the six USDA populations were combined and employed as training sets to predict the bi-parent-derived population. The results showed that four USDA-cultivated populations can be used as a training set, individually or combined, to predict oil and protein contents in GS when 800 or more USDA germplasm accessions are used as a training set. The smaller the genetic distance between training population and testing population, the higher the PA. The PA increased as the population size increased. In across-population prediction, no significant difference was observed in PA for oil and protein content among different models. The PA increased as the SNP number increased until the marker set reached 10,000 SNPs. This study provides reasonable suggestions and methods for breeders to utilize public datasets for GS. It will aid breeders in developing GS-assisted breeding strategies to develop elite soybean cultivars with high oil and protein contents.
Affiliation(s)
- Chenhui Li
- College of Life Sciences, Hebei Agricultural University, Baoding 071001, China;
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China; (Q.Y.); (B.L.); (X.S.); (Z.L.); (C.Y.)
- Qing Yang
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China; (Q.Y.); (B.L.); (X.S.); (Z.L.); (C.Y.)
- Bingqiang Liu
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China; (Q.Y.); (B.L.); (X.S.); (Z.L.); (C.Y.)
- Xiaolei Shi
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China; (Q.Y.); (B.L.); (X.S.); (Z.L.); (C.Y.)
- Zhi Liu
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China; (Q.Y.); (B.L.); (X.S.); (Z.L.); (C.Y.)
- Chunyan Yang
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China; (Q.Y.); (B.L.); (X.S.); (Z.L.); (C.Y.)
- Tao Wang
- Handan Academy of Agricultural Science, Handan 056001, China; (T.W.); (F.X.)
- Fuming Xiao
- Handan Academy of Agricultural Science, Handan 056001, China; (T.W.); (F.X.)
- Mengchen Zhang
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China; (Q.Y.); (B.L.); (X.S.); (Z.L.); (C.Y.)
- Ainong Shi
- Department of Horticulture, University of Arkansas, Fayetteville, AR 72701, USA
- Long Yan
- Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China; (Q.Y.); (B.L.); (X.S.); (Z.L.); (C.Y.)
4. Mota LFM, Giannuzzi D, Pegolo S, Sturaro E, Gianola D, Negrini R, Trevisi E, Ajmone Marsan P, Cecchinato A. Genomic prediction of blood biomarkers of metabolic disorders in Holstein cattle using parametric and nonparametric models. Genet Sel Evol 2024; 56:31. [PMID: 38684971 PMCID: PMC11057143 DOI: 10.1186/s12711-024-00903-9] [Received: 05/15/2023] [Accepted: 04/12/2024] [Indexed: 05/02/2024]
Abstract
BACKGROUND Metabolic disturbances adversely impact productive and reproductive performance of dairy cattle due to changes in endocrine status and immune function, which increase the risk of disease. This may occur in the post-partum phase, but also throughout lactation, with sub-clinical symptoms. Recently, increased attention has been directed towards improved health and resilience in dairy cattle, and genomic selection (GS) could be a helpful tool for selecting animals that are more resilient to metabolic disturbances throughout lactation. Hence, we evaluated the genomic prediction of serum biomarker levels for metabolic distress in 1353 Holsteins genotyped with the 100K single nucleotide polymorphism (SNP) chip assay. The GS was evaluated using the parametric models genomic best linear unbiased prediction (GBLUP), Bayesian B (BayesB), and elastic net (ENET), and the nonparametric models gradient boosting machine (GBM) and stacking ensemble (Stack), which combines the ENET and GBM approaches.
RESULTS The results show that the Stack approach outperformed other methods with a relative difference (RD), calculated as an increment in prediction accuracy, of approximately 18.0% compared to GBLUP, 12.6% compared to BayesB, 8.7% compared to ENET, and 4.4% compared to GBM. The highest RD in prediction accuracy between other models with respect to GBLUP was observed for haptoglobin (hapto) from 17.7% for BayesB to 41.2% for Stack; for Zn from 9.8% (BayesB) to 29.3% (Stack); for ceruloplasmin (CuCp) from 9.3% (BayesB) to 27.9% (Stack); for ferric reducing antioxidant power (FRAP) from 8.0% (BayesB) to 40.0% (Stack); and for total protein (PROTt) from 5.7% (BayesB) to 22.9% (Stack). Using a subset of top SNPs (1.5k) selected from the GBM approach improved the accuracy for GBLUP from 1.8 to 76.5%. However, for the other models reductions in prediction accuracy of 4.8% for ENET (average of 10 traits), 5.9% for GBM (average of 21 traits), and 6.6% for Stack (average of 16 traits) were observed.
CONCLUSIONS Our results indicate that the Stack approach was more accurate in predicting metabolic disturbances than GBLUP, BayesB, ENET, and GBM and seemed to be competitive for predicting complex phenotypes with various degrees of mode of inheritance, i.e. additive and non-additive effects. Selecting markers based on GBM improved accuracy of GBLUP.
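The abstract describes the relative difference (RD) only as "an increment in prediction accuracy". One common way to express such an increment is as a percentage change relative to a reference model. The sketch below assumes that definition; the formula, the function name, and the toy accuracies are assumptions and may not match the authors' exact calculation.

```python
def relative_difference(acc_model, acc_reference):
    """Percentage change in accuracy of a model relative to a reference model.

    Assumed definition: RD = 100 * (acc_model - acc_reference) / acc_reference.
    """
    return 100.0 * (acc_model - acc_reference) / acc_reference


# Toy accuracies (made up): a reference model at 0.40 and a competitor at 0.50.
rd = relative_difference(acc_model=0.50, acc_reference=0.40)
print(round(rd, 1))
```

Under this definition a positive RD means the model is more accurate than the reference, and a negative RD means it is less accurate.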
Affiliation(s)
- Lucio F M Mota
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy.
- Diana Giannuzzi
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Sara Pegolo
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy.
- Enrico Sturaro
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Daniel Gianola
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, 53706, USA
- Riccardo Negrini
- Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Erminio Trevisi
- Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Paolo Ajmone Marsan
- Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Alessio Cecchinato
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
5. Kjetså MV, Gjuvsland AB, Grindflek E, Meuwissen T. Effects of reference population size and structure on genomic prediction of maternal traits in two pig lines using whole-genome sequence-, high-density- and combined annotation-dependent depletion genotypes. J Anim Breed Genet 2024. [PMID: 38564181 DOI: 10.1111/jbg.12865] [Received: 05/11/2023] [Revised: 03/14/2024] [Accepted: 03/16/2024] [Indexed: 04/04/2024]
Abstract
The aim of this study was to investigate the reference population size required to obtain substantial prediction accuracy within and across lines and the effect of using a multi-line reference population for genomic predictions of maternal traits in pigs. The data consisted of two nucleus pig populations, one pure-bred Landrace (L) and one Synthetic (S) Yorkshire/Large White line. Up to 30 K animals were genotyped in each line, and all had records on maternal traits. Prediction accuracy was tested with three different marker data sets: high-density SNP (HD), whole genome sequence (WGS), and markers derived from WGS based on the pig combined annotation dependent depletion score (pCADD). Also, two different genomic prediction methods (GBLUP and BayesGC) were compared for four maternal traits: total number of piglets born (TNB), total number of stillborn piglets (STB), Shoulder Lesion Score and Body Condition Score. The main results from this study showed that a reference population of 3 K-6 K animals for within-line prediction generally was sufficient to achieve high prediction accuracy. However, when the number of animals in the reference population was increased to 30 K, the prediction accuracy significantly increased for the traits TNB and STB. For multi-line prediction, the accuracy was most dependent on the number of within-line animals in the reference data. The S-line provided a generally higher prediction accuracy compared to the L-line. Using pCADD scores to reduce the number of markers from WGS data in combination with the GBLUP method generally reduced prediction accuracies relative to GBLUP using HD genotypes. The BayesGC method benefited from a large reference population and was less dependent on the different genotype marker datasets to achieve a high prediction accuracy.
Affiliation(s)
- Maria V Kjetså
- Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Theo Meuwissen
- Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
6. Hong JK, Kim YM, Cho ES, Lee JB, Kim YS, Park HB. Application of deep learning with bivariate models for genomic prediction of sow lifetime productivity-related traits. Anim Biosci 2024; 37:622-630. [PMID: 38228129 PMCID: PMC10915216 DOI: 10.5713/ab.23.0264] [Received: 07/14/2023] [Revised: 08/31/2023] [Accepted: 11/03/2023] [Indexed: 01/18/2024]
Abstract
OBJECTIVE Pig breeders cannot obtain phenotypic information at the time of selection for sow lifetime productivity (SLP). They would benefit from obtaining genetic information of candidate sows. Genomic data interpreted using deep learning (DL) techniques could contribute to the genetic improvement of SLP to maximize farm profitability because DL models capture nonlinear genetic effects such as dominance and epistasis more efficiently than conventional genomic prediction methods based on linear models. This study aimed to investigate the usefulness of DL for the genomic prediction of two SLP-related traits: lifetime number of litters (LNL) and lifetime pig production (LPP).
METHODS Two bivariate DL models, convolutional neural network (CNN) and local convolutional neural network (LCNN), were compared with conventional bivariate linear models (i.e., genomic best linear unbiased prediction, Bayesian ridge regression, Bayes A, and Bayes B). Phenotype and pedigree data were collected from 40,011 sows that had husbandry records. Among these, 3,652 pigs were genotyped using the PorcineSNP60K BeadChip.
RESULTS The best predictive correlation for LNL was obtained with CNN (0.28), followed by LCNN (0.26) and conventional linear models (approximately 0.21). For LPP, the best predictive correlation was also obtained with CNN (0.29), followed by LCNN (0.27) and conventional linear models (approximately 0.25). A similar trend was observed with the mean squared error of prediction for the SLP traits.
CONCLUSION This study provides an example of a CNN that can outperform linear model-based genomic prediction approaches when nonlinear interaction components are important, because LNL and LPP exhibited strong epistatic interaction components. Additionally, our results suggest that applying bivariate DL models could also contribute to the prediction accuracy by utilizing the genetic correlation between LNL and LPP.
Affiliation(s)
- Joon-Ki Hong
- Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Yong-Min Kim
- Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Eun-Seok Cho
- Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Jae-Bong Lee
- Korea Zoonosis Research Institute, Jeonbuk National University, Iksan 54531, Korea
- Young-Sin Kim
- Swine Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Korea
- Hee-Bok Park
- Department of Animal Resources Science, Kongju National University, Yesan 32439, Korea
- Resource Science Research Institute, Kongju National University, Yesan 32439, Korea
7. Mota LFM, Arikawa LM, Santos SWB, Fernandes Júnior GA, Alves AAC, Rosa GJM, Mercadante MEZ, Cyrillo JNSG, Carvalheiro R, Albuquerque LG. Benchmarking machine learning and parametric methods for genomic prediction of feed efficiency-related traits in Nellore cattle. Sci Rep 2024; 14:6404. [PMID: 38493207 PMCID: PMC10944497 DOI: 10.1038/s41598-024-57234-4] [Received: 07/03/2023] [Accepted: 03/15/2024] [Indexed: 03/18/2024]
Abstract
Genomic selection (GS) offers a promising opportunity for selecting animals that use consumed energy more efficiently for maintenance and growth functions, impacting profitability and environmental sustainability. Here, we compared the prediction accuracy of multi-layer neural network (MLNN) and support vector regression (SVR) against single-trait (STGBLUP) and multi-trait (MTGBLUP) genomic best linear unbiased prediction and Bayesian regression (BayesA, BayesB, BayesC, BRR, and BLasso) for feed efficiency (FE) traits. FE-related traits were measured in 1156 Nellore cattle from an experimental breeding program genotyped for ~300 K markers after quality control. Prediction accuracy (Acc) was evaluated using a forward validation that split the dataset based on birth year, considering the phenotypes adjusted for the fixed effects and covariates as pseudo-phenotypes. The MLNN and SVR approaches were trained by randomly splitting the training population into five folds to select the best hyperparameters. The results show that the machine learning methods (MLNN and SVR) and MTGBLUP outperformed STGBLUP and the Bayesian regression approaches, increasing the Acc by approximately 8.9%, 14.6%, and 13.7% using MLNN, SVR, and MTGBLUP, respectively. Acc for SVR and MTGBLUP differed only slightly, ranging from 0.62 to 0.69 and 0.62 to 0.68, respectively, and both models were empirically unbiased (0.97 and 1.09). Our results indicated that the SVR and MTGBLUP approaches were more accurate in predicting FE-related traits than Bayesian regression and STGBLUP and seemed competitive for GS of complex phenotypes with various degrees of inheritance.
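The forward validation mentioned in the abstract, splitting the dataset by birth year so that older animals train the model and younger animals validate it, can be sketched as below. The record layout, the field name `birth_year`, the cutoff year, and the toy animals are all hypothetical; the abstract does not state which cutoff the authors used.

```python
def forward_split(records, cutoff_year):
    """Split records into training (born up to cutoff) and validation (born after).

    Hypothetical helper for illustration; not the authors' code.
    """
    train = [r for r in records if r["birth_year"] <= cutoff_year]
    valid = [r for r in records if r["birth_year"] > cutoff_year]
    return train, valid


# Toy records (made up): animal IDs with birth years.
records = [
    {"id": "A", "birth_year": 2014},
    {"id": "B", "birth_year": 2015},
    {"id": "C", "birth_year": 2016},
    {"id": "D", "birth_year": 2017},
]
train, valid = forward_split(records, cutoff_year=2015)
print([r["id"] for r in train], [r["id"] for r in valid])
```

Unlike random cross-validation, a split of this kind mimics the practical situation in which the phenotypes of the youngest selection candidates are not yet available.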
Affiliation(s)
- Lucio F M Mota
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil.
- Leonardo M Arikawa
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
- Samuel W B Santos
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
- Gerardo A Fernandes Júnior
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
- Anderson A C Alves
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
- Guilherme J M Rosa
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, 53706, USA
- Maria E Z Mercadante
- Institute of Animal Science, Beef Cattle Research Center, Sertãozinho, SP, 14174-000, Brazil
- National Council for Science and Technological Development, Brasilia, DF, 71605-001, Brazil
- Joslaine N S G Cyrillo
- Institute of Animal Science, Beef Cattle Research Center, Sertãozinho, SP, 14174-000, Brazil
- Roberto Carvalheiro
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
- National Council for Science and Technological Development, Brasilia, DF, 71605-001, Brazil
- Lucia G Albuquerque
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil.
- National Council for Science and Technological Development, Brasilia, DF, 71605-001, Brazil.
8. Aalborg T, Sverrisdóttir E, Kristensen HT, Nielsen KL. The effect of marker types and density on genomic prediction and GWAS of key performance traits in tetraploid potato. Frontiers in Plant Science 2024; 15:1340189. [PMID: 38525152 PMCID: PMC10957621 DOI: 10.3389/fpls.2024.1340189] [Received: 11/17/2023] [Accepted: 02/14/2024] [Indexed: 03/26/2024]
Abstract
Genomic prediction and genome-wide association studies are becoming widely employed to identify QTL for key performance traits in potato and to support potato breeding using genomic selection. Elite cultivars are tetraploid and highly heterozygous but also share many common ancestors and generation-spanning inbreeding events, resulting from the clonal propagation of potatoes through seed potatoes. Consequently, many SNP markers are not in a 1:1 relationship with a single allele variant but are shared over several alleles that might exert varying effects on a given trait. The impact of such redundant "diluted" predictors on the statistical models underpinning genome-wide association studies (GWAS) and genomic prediction has scarcely been evaluated despite the potential impact on model accuracy and performance. We evaluated the impact of marker location, marker type, and marker density on the genomic prediction and GWAS of five key performance traits in tetraploid potato (chipping quality, dry matter content, length/width ratio, senescence, and yield). A 762-offspring panel of a diallel cross of 18 elite cultivars was genotyped by sequencing, and markers were annotated according to a reference genome. Genomic prediction models (GBLUP) were trained on four marker subsets [non-synonymous (29,553 SNPs), synonymous (31,229), non-coding (32,388), and a combination], and robustness to marker reduction was investigated. Single-marker regression GWAS was performed for each trait and marker subset. The best cross-validated prediction correlation coefficients of 0.54, 0.75, 0.49, 0.35, and 0.28 were obtained for chipping quality, dry matter content, length/width ratio, senescence, and yield, respectively. The trait prediction abilities were similar across all marker types, with only non-synonymous variants improving yield predictive ability by 16%. Marker reduction response did not depend on marker type but rather on trait.
Traits with high predictive abilities, e.g., dry matter content, reached a plateau using fewer markers than traits with intermediate-low correlations, such as yield. The predictions were unbiased across all traits, marker types, and all marker densities >100 SNPs. Our results suggest that using non-synonymous variants does not enhance the performance of genomic prediction of most traits. The major known QTLs were identified by GWAS and were reproducible across exonic and whole-genome variant sets for dry matter content, length/width ratio, and senescence. In contrast, minor QTL detection was marker type dependent.
Affiliation(s)
- Trine Aalborg
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
9. van Eijck CWF, Sabroso-Lasa S, Strijk GJ, Mustafa DAM, Fellah A, Koerkamp BG, Malats N, van Eijck CHJ. A liquid biomarker signature of inflammatory proteins accurately predicts early pancreatic cancer progression during FOLFIRINOX chemotherapy. Neoplasia 2024; 49:100975. [PMID: 38335839 PMCID: PMC10873733 DOI: 10.1016/j.neo.2024.100975] [Received: 01/26/2024] [Accepted: 01/31/2024] [Indexed: 02/12/2024]
Abstract
BACKGROUND Pancreatic ductal adenocarcinoma (PDAC) is often treated with FOLFIRINOX, a chemotherapy associated with high toxicity rates and variable efficacy. Therefore, it is crucial to identify patients at risk of early progression during treatment. This study aims to explore the potential of a multi-omics biomarker for predicting early PDAC progression by employing an in-depth mathematical modeling approach. METHODS Blood samples were collected from 58 PDAC patients undergoing FOLFIRINOX before and after the first cycle. These samples underwent gene expression profiling (GEP) and inflammatory protein expression profiling (IPEP). We explored the predictive potential of IPEP alone through Stepwise (Backward) Multivariate Logistic Regression modeling. Additionally, we integrated GEP and IPEP using Bayesian Kernel Regression modeling, aiming to enhance predictive performance. Ultimately, the FOLFIRINOX IPEP (FFX-IPEP) signature was developed. RESULTS Our findings revealed that proteins exhibited higher predictive accuracy than genes. Consequently, the FFX-IPEP signature consisted of six proteins: AMN, BANK1, IL1RL2, ITGB6, MYO9B, and PRSS8. The signature effectively identified patients transitioning from disease control to progression early during FOLFIRINOX, achieving an AUC of 0.89 in an independent test set. Importantly, the FFX-IPEP signature outperformed the conventional CA19-9 tumor marker. CONCLUSIONS Our six-protein FFX-IPEP signature holds solid potential as a liquid biomarker for the early prediction of PDAC progression during toxic FOLFIRINOX chemotherapy. Further validation in an external cohort is crucial to confirm the utility of the FFX-IPEP signature. Future studies should expand to predict progression under different chemotherapies to enhance the guidance of personalized treatment selection in PDAC.
Affiliation(s)
- Casper W F van Eijck
  - Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, The Netherlands; Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Center, Madrid, Spain.
- Sergio Sabroso-Lasa
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Center, Madrid, Spain; Centro de Investigación Biomédica en Red-Cáncer (CIBERONC), Madrid, Spain
- Gaby J Strijk
  - Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, The Netherlands
- Dana A M Mustafa
  - Department of Clinical Bioinformatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Amine Fellah
  - Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, The Netherlands
- Bas Groot Koerkamp
  - Department of Surgery, Erasmus University Medical Center, Rotterdam, The Netherlands
- Núria Malats
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Center, Madrid, Spain; Centro de Investigación Biomédica en Red-Cáncer (CIBERONC), Madrid, Spain
- Casper H J van Eijck
  - Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, The Netherlands; Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Center, Madrid, Spain.
10
Chiaravallotti I, Lin J, Arief V, Jahufer Z, Osorno JM, McClean P, Jarquin D, Hoyos-Villegas V. Simulations of multiple breeding strategy scenarios in common bean for assessing genomic selection accuracy and model updating. The Plant Genome 2024; 17:e20388. [PMID: 38317595 DOI: 10.1002/tpg2.20388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 07/24/2023] [Accepted: 08/20/2023] [Indexed: 02/07/2024]
Abstract
The aim of this study was to evaluate the accuracy of the ridge regression best linear unbiased prediction model across different traits, parent population sizes, and breeding strategies when estimating breeding values in common bean (Phaseolus vulgaris). Genomic selection was implemented to make selections within a breeding cycle and compared across five different breeding strategies (single seed descent, mass selection, pedigree method, modified pedigree method, and bulk breeding) following 10 breeding cycles. The model was trained on a simulated population of recombinant inbreds genotyped for 1010 single nucleotide polymorphism markers including 38 known quantitative trait loci (QTL) identified in the literature. These QTL included 11 for seed yield, eight for white mold disease incidence, and 19 for days to flowering. Simulation results revealed that realized accuracies fluctuate depending on the factors investigated: trait genetic architecture, breeding strategy, and the number of initial parents used to begin the first breeding cycle. Trait architecture and breeding strategy appeared to have a larger impact on accuracy than the initial number of parents. Generally, maximum accuracies (in terms of the correlation between true and estimated breeding value) were achieved under a mass selection strategy, pedigree method, or single seed descent method, depending on the simulation parameters being tested. This study also investigated model updating, which involves retraining the prediction model with a new set of genotypes and phenotypes that have a closer relation to the population being tested. While it has been repeatedly shown that model updating generally improves prediction accuracy, it benefited some breeding strategies more than others. For low heritability traits (e.g., yield), conventional phenotype-based selection methods showed consistent rates of genetic gain, but genetic gain under genomic selection reached a plateau after fewer cycles. This plateauing is likely a consequence of faster fixation of alleles and a diminishing of genetic variance when selections are made based on estimated breeding value as opposed to phenotype.
Affiliation(s)
- Jennifer Lin
  - Department of Plant Science, McGill University, Montreal, Quebec, Canada
- Vivi Arief
  - School of Agriculture and Food Sustainability Faculty of Science, University of Queensland, Brisbane, Australia
- Zulfi Jahufer
  - School of Agriculture and Food Sustainability Faculty of Science, University of Queensland, Brisbane, Australia
- Juan M Osorno
  - Department of Plant Sciences, North Dakota State University, Fargo, North Dakota, USA
- Phil McClean
  - Department of Plant Sciences, North Dakota State University, Fargo, North Dakota, USA
- Diego Jarquin
  - Agronomy Department, University of Florida, Gainesville, Florida, USA
11
Dong L, Xie Y, Zhang Y, Wang R, Sun X. Genomic dissection of additive and non-additive genetic effects and genomic prediction in an open-pollinated family test of Japanese larch. BMC Genomics 2024; 25:11. [PMID: 38166605 PMCID: PMC10759612 DOI: 10.1186/s12864-023-09891-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 12/11/2023] [Indexed: 01/05/2024] Open
Abstract
Genomic dissection of genetic effects on desirable traits and the subsequent use of genomic selection hold great promise for accelerating the rate of genetic improvement of forest tree species. In this study, a total of 661 offspring trees from 66 open-pollinated families of Japanese larch (Larix kaempferi (Lam.) Carrière) were sampled at a test site. The contributions of additive and non-additive effects (dominance, imprinting and epistasis) were evaluated for nine valuable traits related to growth, wood physical and chemical properties, and competitive ability using three pedigree-based and four genomic best linear unbiased prediction (GBLUP) models, and the results were used to determine the genetic model. The predictive ability (PA) of two genomic prediction methods, GBLUP and Reproducing Kernel Hilbert Spaces (RKHS), was compared. The traits could be classified into two types based on different quantitative genetic architectures: for type I, including wood chemical properties and Pilodyn penetration, the additive effect was the main source of variation (38.20-67.46%); for type II, including growth, competitive ability and acoustic velocity, epistasis played a significant role (50.76-91.26%). Dominance and imprinting showed low to moderate contributions (< 36.26%). GBLUP was more suitable for traits of type I (PAs = 0.37-0.39 vs. 0.14-0.25), and RKHS was more suitable for traits of type II (PAs = 0.23-0.37 vs. 0.07-0.23). Non-additive effects made no meaningful contribution to the PA of the GBLUP method for any trait. These findings enhance our current understanding of the architecture of quantitative traits and lay the foundation for the development of genomic selection strategies in Japanese larch.
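The contrast between GBLUP and RKHS in the abstract above comes down to the similarity matrix placed between individuals: a linear, additive relationship matrix versus a nonlinear kernel. The following is a minimal sketch of that difference, not the authors' implementation. It assumes diploid 0/1/2 genotype coding, the VanRaden form of the genomic relationship matrix, and a Gaussian kernel whose bandwidth `h` and scaling by the mean squared distance are illustrative choices; the genotypes are simulated.

```python
import numpy as np

def vanraden_g(M):
    """Additive genomic relationship matrix from an (individuals x markers) 0/1/2 matrix."""
    p = M.mean(axis=0) / 2.0                 # allele frequency per marker
    Z = M - 2.0 * p                          # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gaussian_kernel(M, h=1.0):
    """Gaussian kernel on squared Euclidean marker distances (assumed RKHS kernel)."""
    sq = np.sum(M**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * M @ M.T, 0.0)
    return np.exp(-h * d2 / d2.mean())       # scale distances by their mean

# Simulated genotypes: 6 individuals, 40 markers (hypothetical data).
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.3, size=(6, 40)).astype(float)
G = vanraden_g(M)       # linear kernel, as used in GBLUP
K = gaussian_kernel(M)  # nonlinear kernel, as used in RKHS regression
```

Either matrix can then serve as the covariance structure of the genetic effect in a mixed model; the Gaussian kernel can pick up non-additive similarity that the linear matrix does not represent.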
Affiliation(s)
- Leiming Dong
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, 100091, China
  - Key Laboratory of National Forestry and Grassland Administration on Plant Ex situ Conservation, Beijing Floriculture Engineering Technology Research Centre, Beijing Botanical Garden, Beijing, 100093, China
- Yunhui Xie
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, 100091, China
- Yalin Zhang
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, 100091, China
- Ruizhen Wang
  - Key Laboratory of National Forestry and Grassland Administration on Plant Ex situ Conservation, Beijing Floriculture Engineering Technology Research Centre, Beijing Botanical Garden, Beijing, 100093, China
- Xiaomei Sun
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, 100091, China.
12
Alboali H, Moradi MH, Khaltabadi Farahani AH, Mohammadi H. Genome-wide association study for body weight and feed consumption traits in Japanese quail using Bayesian approaches. Poult Sci 2024; 103:103208. [PMID: 37980758 PMCID: PMC10663954 DOI: 10.1016/j.psj.2023.103208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 10/12/2023] [Accepted: 10/13/2023] [Indexed: 11/21/2023] Open
Abstract
The aim of this study was to perform a genome-wide association study (GWAS) based on the Bayes A and Bayes B statistical methods to identify genomic loci and candidate genes associated with body weight gain, feed intake, and feed conversion ratio in Japanese quail. For this purpose, genomic data obtained from the Illumina iSelect 4K quail SNP chip were utilized. After quality control, genotype data from a total of 875 birds for 2,015 SNP markers were used for subsequent analyses. The Bayesian analyses were performed using the hibayes package in R (version 4.3.1) with a Gibbs sampling algorithm. Bayes A accounted for 11.43, 11.65, and 11.39% of the phenotypic variance for body weight gain, feed intake, and feed conversion ratio, respectively, while the variance explained by Bayes B was 7.02, 8.61, and 6.48%, respectively. Therefore, in the current study, results obtained from Bayes A were used for further analyses. To perform the gene enrichment analysis and to identify the functional pathways and classes of genes that are over-represented among the genes associated with each trait, all markers that accounted for more than 0.1% of the phenotypic variance for each trait were used. This analysis revealed a total of 23, 38, and 14 SNP markers associated with body weight gain, feed intake, and feed conversion ratio in Japanese quail, respectively. The gene enrichment analysis led to the identification of biological pathways (and candidate genes) related to lipid phosphorylation (TTC7A gene) and cell junction (FGFR4 and FLRT2 genes) associated with body weight gain, the calcium signaling pathway (ADCY2 and CAMK1D genes) associated with feed intake, and glycerolipid metabolic process (LIPC gene), lipid metabolic process (ADGRF5 and ESR1 genes), and glutathione transferase (GSTK1 gene) associated with feed conversion ratio. Overall, the findings of this study can provide valuable insights into the genetic architecture of growth and feed consumption traits in Japanese quail.
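The marker selection step described above (keeping markers that account for more than 0.1% of the phenotypic variance) amounts to a simple threshold filter. A minimal sketch follows, assuming the per-marker shares of variance have already been estimated by the Bayesian model; the marker names and values are hypothetical and are not taken from the study.

```python
# Hypothetical per-marker shares of phenotypic variance (fractions, not percent).
variance_explained = {
    "snp_0001": 0.0004,
    "snp_0002": 0.0031,
    "snp_0003": 0.0010,  # exactly 0.1%: excluded, the rule is "more than"
    "snp_0004": 0.0127,
}

def select_markers(shares, threshold=0.001):
    """Return marker IDs whose share exceeds `threshold`, largest share first."""
    kept = [marker for marker, share in shares.items() if share > threshold]
    return sorted(kept, key=lambda marker: shares[marker], reverse=True)

selected = select_markers(variance_explained)
print(selected)
```

The selected marker IDs would then be mapped to nearby genes before enrichment analysis; that mapping depends on the genome annotation and is not sketched here.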
Affiliation(s)
- Hassan Alboali
  - Department of Animal Science, Faculty of Agriculture and Environment, Arak University, 38156-8-8349 Arak, Iran
- Mohammad Hossein Moradi
  - Department of Animal Science, Faculty of Agriculture and Environment, Arak University, 38156-8-8349 Arak, Iran.
- Hossein Mohammadi
  - Department of Animal Science, Faculty of Agriculture and Environment, Arak University, 38156-8-8349 Arak, Iran
13
Azevedo CF, Ferrão LFV, Benevenuto J, de Resende MDV, Nascimento M, Nascimento ACC, Munoz PR. Using visual scores for genomic prediction of complex traits in breeding programs. Theoretical and Applied Genetics 2023; 137:9. [PMID: 38102495 DOI: 10.1007/s00122-023-04512-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023]
Abstract
KEY MESSAGE An approach for handling visual scores with potential errors and subjectivity in scores was evaluated in simulated and blueberry recurrent selection breeding schemes to assist breeders in their decision-making. Most genomic prediction methods are based on assumptions of normality due to their simplicity and ease of implementation. However, in plant and animal breeding, continuous traits are often visually scored as categorical traits and analyzed as Gaussian variables, thus violating the normality assumption, which could affect the prediction of breeding values and the estimation of genetic parameters. In this study, we examined the main challenges of visual scores for genomic prediction and genetic parameter estimation using mixed models, Bayesian, and machine learning methods. We evaluated these approaches using simulated and real breeding data sets. Our contribution in this study is a five-fold demonstration: (i) collecting data using an intermediate number of categories (1-3 and 1-5) is the best strategy, even considering errors associated with visual scores; (ii) Linear Mixed Models and Bayesian Linear Regression are robust to the normality violation, but marginal gains can be achieved when using Bayesian Ordinal Regression Models (BORM) and Random Forest Classification; (iii) genetic parameters are better estimated using BORM; (iv) our conclusions using simulated data are also applicable to real data in autotetraploid blueberry; and (v) a comparison of continuous and categorical phenotypes found that investing in the evaluation of 600-1000 categorical data points with low error, when it is not feasible to collect continuous phenotypes, is a strategy for improving predictive abilities. Our findings suggest the best approaches for effectively using visually scored traits to explore genetic information in breeding programs and highlight the importance of investing in the training of evaluator teams and in high-quality phenotyping.
Affiliation(s)
- Camila Ferreira Azevedo
  - Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
  - Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
- Luis Felipe Ventorim Ferrão
  - Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
- Juliana Benevenuto
  - Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
- Marcos Deon Vilela de Resende
  - Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
  - Department of Forestry Engineering, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
  - Embrapa Café, Brasília, Distrito Federal, Brazil
- Moyses Nascimento
  - Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Patricio R Munoz
  - Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA.
14
Doran BA, Chen RY, Giba H, Behera V, Barat B, Sundararajan A, Lin H, Sidebottom A, Pamer EG, Raman AS. An evolution-based framework for describing human gut bacteria. bioRxiv 2023:2023.12.04.569969. [PMID: 38105970 PMCID: PMC10723311 DOI: 10.1101/2023.12.04.569969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The human gut microbiome contains many bacterial strains of the same species ('strain-level variants'). Describing strains in a biologically meaningful way rather than purely taxonomically is an important goal but challenging due to the genetic complexity of strain-level variation. Here, we measured patterns of co-evolution across >7,000 strains spanning the bacterial tree-of-life. Using these patterns as a prior for studying hundreds of gut commensal strains that we isolated, sequenced, and metabolically profiled revealed widespread structure beneath the phylogenetic level of species. Defining strains by their co-evolutionary signatures enabled predicting their metabolic phenotypes and engineering consortia from strain genome content alone. Our findings demonstrate a biologically relevant organization to strain-level variation and motivate a new schema for describing bacterial strains based on their evolutionary history.
Affiliation(s)
- Benjamin A. Doran
  - Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, 60637
- Robert Y. Chen
  - Department of Psychiatry, University of Washington, Seattle, WA, 98195
- Hannah Giba
  - Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
  - Department of Pathology, University of Chicago, Chicago, IL, 60637
- Vivek Behera
  - Department of Medicine, University of Chicago, Chicago, IL, 60637
- Bidisha Barat
  - Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Huaiying Lin
  - Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Ashley Sidebottom
  - Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Eric G. Pamer
  - Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
  - Department of Medicine, University of Chicago, Chicago, IL, 60637
- Arjun S. Raman
  - Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
  - Department of Pathology, University of Chicago, Chicago, IL, 60637
  - Center for the Physics of Evolving Systems, University of Chicago, Chicago, IL, 60637
15
Meher PK, Gupta A, Rustgi S, Mir RR, Kumar A, Kumar J, Balyan HS, Gupta PK. Evaluation of eight Bayesian genomic prediction models for three micronutrient traits in bread wheat (Triticum aestivum L.). The Plant Genome 2023; 16:e20332. [PMID: 37122189 DOI: 10.1002/tpg2.20332] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/21/2022] [Revised: 02/21/2023] [Accepted: 03/13/2023] [Indexed: 06/19/2023]
Abstract
In wheat, genomic prediction accuracy (GPA) was assessed for three micronutrient traits (grain iron, grain zinc, and β-carotenoid concentrations) using eight Bayesian regression models. For this purpose, data on 246 accessions, each genotyped with 17,937 DArT markers, were utilized. The phenotypic data on traits were available for 2013-2014 from Powerkheda (Madhya Pradesh) and for 2014-2015 from Meerut (Uttar Pradesh), India. The accuracy of the models was measured in terms of reliability, which was computed following a repeated cross-validation approach. The predictions were obtained independently for each of the two environments after adjusting for the local effects and across environments after adjusting for the environmental effects. The Bayes ridge regression (BayesRR) model outperformed the other seven models, whereas BayesLASSO (BayesL) was the least efficient. The GPA increased with an increase in the size of the training set as well as with an increase in marker density. The GPA values differed for the three traits and were higher for the best linear unbiased estimate (BLUE) (obtained after adjusting for the environmental effects) relative to those for the two environments. The GPA also remained unaffected after accounting for the population structure. The results of the present study suggest that only the best model should be used for the estimations of genomic estimated breeding values (GEBVs) before their use for genomic selection to improve the grain micronutrient contents.
Affiliation(s)
- Prabina Kumar Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Ajit Gupta
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Sachin Rustgi
  - Department of Plant and Environmental Sciences, Pee Dee Research and Education Centre, Clemson University, Florence, South Carolina, USA
- Reyazul Rouf Mir
  - Division of Genetics and Plant Breeding, SKUAST-Kashmir, Kashmir, India
- Anuj Kumar
  - Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Laboratory of Immunity, Shantou University Medical College, Shantou, People's Republic of China
- Jitendra Kumar
  - National Agri-Food Biotechnology Institute (NABI), Ajitgarh, India
- Harindra Singh Balyan
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, India
- Pushpendra Kumar Gupta
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, India
16
Tanaka R, Wu D, Li X, Tibbs-Cortes LE, Wood JC, Magallanes-Lundback M, Bornowski N, Hamilton JP, Vaillancourt B, Li X, Deason NT, Schoenbaum GR, Buell CR, DellaPenna D, Yu J, Gore MA. Leveraging prior biological knowledge improves prediction of tocochromanols in maize grain. The Plant Genome 2023; 16:e20276. [PMID: 36321716 DOI: 10.1002/tpg2.20276] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
With an essential role in human health, tocochromanols are mostly obtained by consuming seed oils; however, the vitamin E content of the most abundant tocochromanols in maize (Zea mays L.) grain is low. Several large-effect genes with cis-acting variants affecting messenger RNA (mRNA) expression are mostly responsible for tocochromanol variation in maize grain, with other relevant associated quantitative trait loci (QTL) yet to be fully resolved. Leveraging existing genomic and transcriptomic information for maize inbreds could improve prediction when selecting for higher vitamin E content. Here, we first evaluated a multikernel genomic best linear unbiased prediction (MK-GBLUP) approach for modeling known QTL in the prediction of nine tocochromanol grain phenotypes (12-21 QTL per trait) within and between two panels of 1,462 and 242 maize inbred lines. On average, MK-GBLUP models improved predictive abilities by 7.0-13.6% when compared with GBLUP. In a second approach with a subset of 545 lines from the larger panel, the highest average improvement in predictive ability relative to GBLUP was achieved with a multi-trait GBLUP model (15.4%) that had a tocochromanol phenotype and transcript abundances in developing grain for a few large-effect candidate causal genes (1-3 genes per trait) as multiple response variables. Taken together, our study illustrates the enhancement of prediction models when informed by existing biological knowledge pertaining to QTL and candidate causal genes.
Affiliation(s)
- Ryokei Tanaka
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Di Wu
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Xiaowei Li
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Joshua C Wood
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Nolan Bornowski
  - Dep. of Plant Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- John P Hamilton
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Brieanne Vaillancourt
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Xianran Li
  - USDA ARS, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA, 99164, USA
- Nicholas T Deason
  - Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- C Robin Buell
  - Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Dean DellaPenna
  - Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- Jianming Yu
  - Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
- Michael A Gore
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
17
Warburton CL, Costilla R, Engle BN, Moore SS, Corbet NJ, Fordyce G, McGowan MR, Burns BM, Hayes BJ. Concurrently mapping quantitative trait loci associations from multiple subspecies within hybrid populations. Heredity (Edinb) 2023; 131:350-360. [PMID: 37798326 PMCID: PMC10673866 DOI: 10.1038/s41437-023-00651-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 09/19/2023] [Accepted: 09/21/2023] [Indexed: 10/07/2023] Open
Abstract
Many of the world's agriculturally important plant and animal populations consist of hybrids of subspecies. Cattle in tropical and sub-tropical regions, for example, originate from two subspecies, Bos taurus indicus (Bos indicus) and Bos taurus taurus (Bos taurus). Methods to derive the underlying genetic architecture for these two subspecies are essential to develop accurate genomic predictions in these hybrid populations. We propose a novel method to achieve this. First, we use haplotypes to assign SNP alleles to ancestral subspecies of origin in a multi-breed and multi-subspecies population. Then we use a BayesR framework that allows SNP alleles originating from the different subspecies to have differing effects. Applying this method in a composite population of B. indicus and B. taurus hybrids, our results show that there are underlying genomic differences between the two subspecies, and these effects are not identified in multi-breed genomic evaluations that do not account for subspecies of origin effects. The method slightly improved the accuracy of genomic prediction. More significantly, by allocating SNP alleles to ancestral subspecies of origin, we were able to identify four SNPs with high posterior probabilities of inclusion that have not been previously associated with cattle fertility and were close to genes associated with fertility in other species. These results show that haplotypes can be used to trace subspecies of origin through the genome of this hybrid population and, in conjunction with our novel Bayesian analysis, subspecies SNP allele allocation can be used to increase the accuracy of QTL association mapping in genetically diverse populations.
Affiliation(s)
- Christie L Warburton
  - Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia.
- Roy Costilla
  - Agresearch Limited, Ruakura Research Centre, Hamilton, 3214, New Zealand
- Bailey N Engle
  - Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
- Stephen S Moore
  - Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
- Nicholas J Corbet
  - Formerly Central Queensland University, School of Health, Medical and Applied Sciences, Rockhampton, QLD, Australia
- Geoffry Fordyce
  - Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
- Michael R McGowan
  - The University of Queensland, School of Veterinary Science, St Lucia, QLD, Australia
- Brian M Burns
  - Formerly Department of Agriculture and Fisheries, Rockhampton, QLD, Australia
- Ben J Hayes
  - Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia
18
Singh V, Krause M, Sandhu D, Sekhon RS, Kaundal A. Salinity stress tolerance prediction for biomass-related traits in maize (Zea mays L.) using genome-wide markers. The Plant Genome 2023; 16:e20385. [PMID: 37667417 DOI: 10.1002/tpg2.20385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 07/18/2023] [Accepted: 08/14/2023] [Indexed: 09/06/2023]
Abstract
Maize (Zea mays L.) is the third most important cereal crop after rice (Oryza sativa) and wheat (Triticum aestivum). Salinity stress significantly affects vegetative biomass and grain yield and, therefore, reduces the food and silage productivity of maize. Selecting salt-tolerant genotypes is a cumbersome and time-consuming process that requires meticulous phenotyping. To predict salt tolerance in maize, we estimated breeding values for four biomass-related traits, including shoot length, shoot weight, root length, and root weight under salt-stressed and controlled conditions. A five-fold cross-validation method was used to select the best model among genomic best linear unbiased prediction (GBLUP), ridge-regression BLUP (rrBLUP), extended GBLUP, Bayesian Lasso, Bayesian ridge regression, BayesA, BayesB, and BayesC. Examination of the effect of different marker densities on prediction accuracy revealed that a set of low-density single nucleotide polymorphisms obtained through filtering based on a combination of analysis of variance and linkage disequilibrium provided the best prediction accuracy for all the traits. The average prediction accuracy in cross-validations ranged from 0.46 to 0.77 across the four derived traits. The GBLUP, rrBLUP, and all Bayesian models except BayesB demonstrated comparable levels of prediction accuracy that were superior to the other modeling approaches. These findings provide a roadmap for the deployment and optimization of genomic selection in breeding for salt tolerance in maize.
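The five-fold cross-validation used above to compare models can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: plain ridge regression with a fixed penalty `lam` stands in for rrBLUP (which would estimate the penalty from variance components), predictive ability is taken as the Pearson correlation between predicted and observed phenotypes in the held-out fold, and the genotypes and phenotypes are simulated.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centred data; returns (intercept, marker effects)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

def cv_predictive_ability(X, y, lam=1.0, k=5, seed=0):
    """Correlation between predicted and observed phenotypes in each of k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    abilities = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        intercept, beta = ridge_fit(X[train_idx], y[train_idx], lam)
        pred = intercept + X[test_idx] @ beta
        abilities.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return np.array(abilities)

# Simulated data: 200 lines, 50 markers coded 0/1/2, additive effects plus noise.
rng = np.random.default_rng(42)
n, p = 200, 50
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
g = X @ rng.normal(size=p)
y = g + rng.normal(scale=0.5 * g.std(), size=n)

abilities = cv_predictive_ability(X, y)
print(abilities.mean())
```

Running the same folds for each candidate model and comparing the mean correlations gives a like-for-like comparison; repeating the procedure with different seeds reduces dependence on a single partition.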
Affiliation(s)
- Vishal Singh
  - Plants, Soils, and Climate, College of Agricultural and Applied Sciences, Utah State University, Logan, Utah, USA
  - ICAR-Indian Institute of Maize Research, Ludhiana, Punjab, India
- Margaret Krause
  - Plants, Soils, and Climate, College of Agricultural and Applied Sciences, Utah State University, Logan, Utah, USA
- Devinder Sandhu
  - US Salinity Laboratory (USDA-ARS), Riverside, California, USA
- Rajandeep S Sekhon
  - Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, USA
- Amita Kaundal
  - Plants, Soils, and Climate, College of Agricultural and Applied Sciences, Utah State University, Logan, Utah, USA
|
19
|
Akutsu H, Na’iem M, Widiyatno, Indrioko S, Sawitri, Purnomo S, Uchiyama K, Tsumura Y, Tani N. Comparing modeling methods of genomic prediction for growth traits of a tropical timber species, Shorea macrophylla. FRONTIERS IN PLANT SCIENCE 2023; 14:1241908. [PMID: 38023878 PMCID: PMC10644202 DOI: 10.3389/fpls.2023.1241908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 09/13/2023] [Indexed: 12/01/2023]
Abstract
Introduction Shorea macrophylla is a commercially important tropical tree species grown for timber and oil. It is amenable to plantation forestry due to its fast initial growth. Genomic selection (GS) has been used in tree breeding studies to shorten long breeding cycles but has not previously been applied to S. macrophylla. Methods To build genomic prediction models for GS, leaves and growth trait data were collected from a half-sib progeny population of S. macrophylla in Sari Bumi Kusuma forest concession, central Kalimantan, Indonesia. A total of 18,037 SNP markers were identified in two ddRAD-seq libraries. Genomic prediction models based on these SNPs were then generated for diameter at breast height and total height in the 7th year from planting (D7 and H7). Results and discussion These traits were chosen because of their relatively high narrow-sense genomic heritability and because seven years was considered long enough to assess initial growth. Genomic prediction models were built using six methods and their derivatives with the full set of identified SNPs and subsets of 48, 96, and 192 SNPs selected based on the results of a genome-wide association study (GWAS). The GBLUP and RKHS methods gave the highest predictive ability for D7 and H7 with the sets of selected SNPs and showed that D7 has an additive genetic architecture while H7 has an epistatic genetic architecture. LightGBM and CNN1D also achieved high predictive abilities for D7 with 48 and 96 selected SNPs, and for H7 with 96 and 192 selected SNPs, showing that gradient boosting decision trees and deep learning can be useful in genomic prediction. Predictive abilities for H7 were higher when smaller SNP subsets selected by GWAS p-value were used. However, D7 showed the opposite tendency, which might have originated from the difference in genetic architecture between primary and secondary growth of the species.
This study suggests that GS with GWAS-based SNP selection can be used in breeding for non-cultivated tree species to improve initial growth and reduce genotyping costs for next-generation seedlings.
Affiliation(s)
- Haruto Akutsu
  - Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Mohammad Na’iem
  - Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
- Widiyatno
  - Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
- Sapto Indrioko
  - Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
- Sawitri
  - Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
- Susilo Purnomo
  - PT. Sari Bumi Kusuma, Pontianak, West Kalimantan, Indonesia
- Kentaro Uchiyama
  - Department of Forest Molecular Genetics and Biotechnology, Forestry and Forest Products Research Institute, Tsukuba, Ibaraki, Japan
- Yoshihiko Tsumura
  - Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Naoki Tani
  - Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
  - Forestry Division, Japan International Research Center for Agricultural Sciences, Tsukuba, Ibaraki, Japan
|
20
|
Weber SE, Frisch M, Snowdon RJ, Voss-Fels KP. Haplotype blocks for genomic prediction: a comparative evaluation in multiple crop datasets. FRONTIERS IN PLANT SCIENCE 2023; 14:1217589. [PMID: 37731980 PMCID: PMC10507710 DOI: 10.3389/fpls.2023.1217589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 08/21/2023] [Indexed: 09/22/2023]
Abstract
In modern plant breeding, genomic selection is becoming the gold standard for selection of superior genotypes. The basis for genomic prediction models is a set of phenotyped lines along with their genotypic profile. With high marker density and linkage disequilibrium (LD) between markers, genotype data in breeding populations tends to exhibit considerable redundancy. Therefore, interest is growing in the use of haplotype blocks to overcome redundancy by summarizing co-inherited features. Moreover, haplotype blocks can help to capture local epistasis caused by interacting loci. Here, we compared genomic prediction methods that used either single SNPs or haplotype blocks with regard to their prediction accuracy for important traits in crop datasets. We used four published datasets from canola, maize, wheat and soybean. Different approaches to construct haplotype blocks were compared, including blocks based on LD, physical distance, number of adjacent markers and the algorithms implemented in the software "Haploview" and "HaploBlocker". The tested prediction methods included Genomic Best Linear Unbiased Prediction (GBLUP), Extended GBLUP to account for additive by additive epistasis (EGBLUP), Bayesian LASSO and Reproducing Kernel Hilbert Space (RKHS) regression. We found improved prediction accuracy in some traits when using haplotype blocks compared to SNP-based predictions; however, the magnitude of improvement was very trait- and model-specific. Especially in settings with low marker density, haplotype blocks can improve genomic prediction accuracy. In most cases, physically large haplotype blocks yielded a strong decrease in prediction accuracy. Especially when prediction accuracy varies greatly across different prediction models, prediction based on haplotype blocks can improve prediction accuracy of underperforming models. However, there is no "best" method to build haplotype blocks, since prediction accuracy varied considerably across methods and traits.
Hence, criteria used to define haplotype blocks should not be viewed as fixed biological parameters, but rather as hyperparameters that need to be adjusted for every dataset.
Affiliation(s)
- Sven E. Weber
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Matthias Frisch
  - Department of Biometry and Population Genetics, Justus Liebig University, Giessen, Germany
- Rod J. Snowdon
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Kai P. Voss-Fels
  - Institute for Grapevine Breeding, Hochschule Geisenheim University, Geisenheim, Germany
|
21
|
Morgante F, Carbonetto P, Wang G, Zou Y, Sarkar A, Stephens M. A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes. PLoS Genet 2023; 19:e1010539. [PMID: 37418505 PMCID: PMC10355440 DOI: 10.1371/journal.pgen.1010539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 06/02/2023] [Indexed: 07/09/2023] Open
Abstract
Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. With technological advances, it is now possible to measure multiple phenotypes in large samples. Multiple phenotypes can share their genetic component; therefore, modeling these phenotypes jointly may improve prediction accuracy by leveraging effects that are shared across phenotypes. However, effects can be shared across phenotypes in a variety of ways, so computationally efficient statistical methods are needed that can accurately and flexibly capture patterns of effect sharing. Here, we describe new Bayesian multivariate, multiple regression methods that, by using flexible priors, are able to model and adapt to different patterns of effect sharing and specificity across phenotypes. Simulation results show that these new methods are fast and improve prediction accuracy compared with existing methods in a wide range of settings where effects are shared. Further, in settings where effects are not shared, our methods still perform competitively with state-of-the-art methods. In real data analyses of expression data in the Genotype Tissue Expression (GTEx) project, our methods improve prediction performance on average for all tissues, with the greatest gains in tissues where effects are strongly shared, and in the tissues with smaller sample sizes. While we use gene expression prediction to illustrate our methods, the methods are generally applicable to any multi-phenotype applications, including prediction of polygenic scores and breeding values. Thus, our methods have the potential to provide improvements across fields and organisms.
Affiliation(s)
- Fabio Morgante
  - Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
  - Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Peter Carbonetto
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Research Computing Center, University of Chicago, Chicago, Illinois, United States of America
- Gao Wang
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Department of Neurology, Columbia University, New York, New York, United States of America
  - Gertrude H. Sergievsky Center, Columbia University, New York, New York, United States of America
- Yuxin Zou
  - Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
  - Regeneron Genetics Center, Regeneron Pharmaceuticals Inc., Tarrytown, New York, United States of America
- Abhishek Sarkar
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Matthew Stephens
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
|
22
|
Canal GB, Oliveira GF, de Almeida FAN, Péres MZ, Moro GLJ, Dos Santos Oliveira WB, Azevedo CF, Nascimento M, da Silva Ferreira MF, Ferreira A. Genomic studies of the additive and dominant genetic control on production traits of Euterpe edulis fruits. Sci Rep 2023; 13:9795. [PMID: 37328527 PMCID: PMC10276026 DOI: 10.1038/s41598-023-36970-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 06/13/2023] [Indexed: 06/18/2023] Open
Abstract
In forest genetic improvement programs for non-domesticated species, limited knowledge of kinship can compromise or make the estimation of variance components and genetic parameters of traits of interest unfeasible. We used mixed models and genomics (in the latter, considering additive and non-additive effects) to evaluate the genetic architecture of 12 traits in juçaizeiro for fruit production. A population of 275 genotypes without genetic relationship knowledge was phenotyped over three years and genotyped by whole genome SNP markers. We have verified superiority in the quality of the fits, the prediction accuracy for unbalanced data, and the possibility of unfolding the genetic effects into their additive and non-additive terms in the genomic models. Estimates of the variance components and genetic parameters obtained by the additive models may be overestimated since, when considering the dominance effect in the model, there are substantial reductions in them. The number of bunches, fresh fruit mass of bunch, rachis length, fresh mass of 25 fruits, and amount of pulp were strongly influenced by the dominance effect, showing that genomic models with such effect should be considered for these traits, which may result in selective improvements by being able to return more accurate genomic breeding values. The present study reveals the additive and non-additive genetic control of the evaluated traits and highlights the importance of genomic information-based approaches for populations without knowledge of kinship and experimental design. Our findings underscore the critical role of genomic data in elucidating the genetic control architecture of quantitative traits, thereby providing crucial insights for driving species' genetic improvement.
Affiliation(s)
- Guilherme Bravim Canal
  - Department of Agronomy, Federal University of Espírito Santo, Alegre, Espírito Santo, 29500-000, Brazil
- Marcello Zatta Péres
  - Department of Agronomy, Federal University of Espírito Santo, Alegre, Espírito Santo, 29500-000, Brazil
- Moysés Nascimento
  - Department of Statistics, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Adésio Ferreira
  - Department of Agronomy, Federal University of Espírito Santo, Alegre, Espírito Santo, 29500-000, Brazil
|
23
|
Alemu A, Batista L, Singh PK, Ceplitis A, Chawade A. Haplotype-tagged SNPs improve genomic prediction accuracy for Fusarium head blight resistance and yield-related traits in wheat. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:92. [PMID: 37009920 PMCID: PMC10068637 DOI: 10.1007/s00122-023-04352-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 03/21/2023] [Indexed: 06/19/2023]
Abstract
Linkage disequilibrium (LD)-based haplotyping with subsequent SNP tagging improved the genomic prediction accuracy up to 0.07 and 0.092 for Fusarium head blight resistance and spike width, respectively, across six different models. Genomic prediction is a powerful tool to enhance genetic gain in plant breeding. However, the method is accompanied by various complications leading to low prediction accuracy. One of the major challenges arises from the complex dimensionality of marker data. To overcome this issue, we applied two pre-selection methods for SNP markers viz. LD-based haplotype-tagging and GWAS-based trait-linked marker identification. Six different models were tested with preselected SNPs to predict the genomic estimated breeding values (GEBVs) of four traits measured in 419 winter wheat genotypes. Ten different sets of haplotype-tagged SNPs were selected by adjusting the level of LD thresholds. In addition, various sets of trait-linked SNPs were identified with different scenarios from the training-test combined and only from the training populations. The BRR and RR-BLUP models developed from haplotype-tagged SNPs had a higher prediction accuracy for FHB and SPW by 0.07 and 0.092, respectively, compared to the corresponding models developed without marker pre-selection. The highest prediction accuracy for SPW and FHB was achieved with tagged SNPs pruned at weak LD thresholds (r2 < 0.5), while stringent LD was required for spike length (SPL) and flag leaf area (FLA). Trait-linked SNPs identified only from training populations failed to improve the prediction accuracy of the four studied traits. Pre-selection of SNPs via LD-based haplotype-tagging could play a vital role in optimizing genomic selection and reducing genotyping costs. Furthermore, the method could pave the way for developing low-cost genotyping methods through customized genotyping platforms targeting key SNP markers tagged to essential haplotype blocks.
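The idea of pruning markers at an LD threshold such as r2 < 0.5 can be sketched as below. This is a simplified stand-in, not the haplotype-tagging procedure used in the study: it uses the squared correlation between 0/1/2 genotype codes as a proxy for LD, compares all marker pairs rather than markers within a chromosomal window, assumes every marker is polymorphic, and runs on simulated genotypes. The function name `ld_prune` is hypothetical.

```python
import numpy as np

def ld_prune(X, r2_max):
    """Greedy pruning: keep a marker only if its r^2 with every kept marker is below r2_max."""
    r2 = np.corrcoef(X, rowvar=False) ** 2  # marker-by-marker squared correlations
    kept = []
    for j in range(X.shape[1]):
        if all(r2[j, i] < r2_max for i in kept):
            kept.append(j)
    return kept

# Simulated genotypes (hypothetical): five independent markers, two of them duplicated
# so that columns 0/1 and 3/4 are in complete LD.
rng = np.random.default_rng(0)
base = rng.binomial(2, 0.4, size=(100, 5)).astype(float)
X = np.column_stack([base[:, 0], base[:, 0], base[:, 1],
                     base[:, 2], base[:, 2], base[:, 3], base[:, 4]])

kept = ld_prune(X, r2_max=0.5)
print("markers retained:", kept)
```

Lowering `r2_max` makes the pruning more stringent and retains fewer markers; the abstract reports that the best threshold differed between traits, so in practice it would be tuned per trait.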
Affiliation(s)
- Admas Alemu
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Pawan K Singh
  - International Maize and Wheat Improvement Center, Texcoco, Mexico
- Aakash Chawade
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
|
24
|
Mota LFM, Giannuzzi D, Pegolo S, Trevisi E, Ajmone-Marsan P, Cecchinato A. Integrating on-farm and genomic information improves the predictive ability of milk infrared prediction of blood indicators of metabolic disorders in dairy cows. Genet Sel Evol 2023; 55:23. [PMID: 37013482 PMCID: PMC10069109 DOI: 10.1186/s12711-023-00795-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 03/21/2023] [Indexed: 04/05/2023] Open
Abstract
BACKGROUND Blood metabolic profiles can be used to assess metabolic disorders and to evaluate the health status of dairy cows. Given that these analyses are time-consuming, expensive, and stressful for the cows, there has been increased interest in Fourier transform infrared (FTIR) spectroscopy of milk samples as a rapid, cost-effective alternative for predicting metabolic disturbances. The integration of FTIR data with other layers of information such as genomic and on-farm data (days in milk (DIM) and parity) has been proposed to further enhance the predictive ability of statistical methods. Here, we developed a phenotype prediction approach for a panel of blood metabolites based on a combination of milk FTIR data, on-farm data, and genomic information recorded on 1150 Holstein cows, using BayesB and gradient boosting machine (GBM) models, with tenfold, batch-out and herd-out cross-validation (CV) scenarios. RESULTS The predictive ability of these approaches was measured by the coefficient of determination (R2). The results show that, compared to the model that includes only FTIR data, integration of both on-farm (DIM and parity) and genomic information with FTIR data improves the R2 for blood metabolites across the three CV scenarios, especially with the herd-out CV: R2 values ranged from 5.9 to 17.8% for BayesB, from 8.2 to 16.9% for GBM with the tenfold random CV, from 3.8 to 13.5% for BayesB and from 8.6 to 17.5% for GBM with the batch-out CV, and from 8.4 to 23.0% for BayesB and from 8.1 to 23.8% for GBM with the herd-out CV. Overall, with the model that includes the three sources of data, GBM was more accurate than BayesB with accuracies across the CV scenarios increasing by 7.1% for energy-related metabolites, 10.7% for liver function/hepatic damage, 9.6% for oxidative stress, 6.1% for inflammation/innate immunity, and 11.4% for mineral indicators. 
CONCLUSIONS Our results show that, compared to using only milk FTIR data, a model integrating milk FTIR spectra with on-farm and genomic information improves the prediction of blood metabolic traits in Holstein cattle and that GBM is more accurate in predicting blood metabolites than BayesB, especially for the batch-out CV and herd-out CV scenarios.
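A herd-out cross-validation of the kind mentioned in this abstract can be sketched as a leave-one-group-out split, where all animals from a given herd are held out together. This is one plausible reading only: the abstract does not state how herds were assigned to folds, and the exact R2 definition the authors used is not given, so the `r_squared` helper below (1 minus residual over total sum of squares) is an assumption. All names and the toy herd labels are hypothetical.

```python
import numpy as np

def group_out_splits(groups):
    """Yield (group, train_idx, test_idx), holding out one whole group at a time."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        test_idx = np.flatnonzero(groups == g)
        train_idx = np.flatnonzero(groups != g)
        yield g, train_idx, test_idx

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy example: six cows from three herds.
herds = np.array(["A", "A", "B", "C", "C", "C"])
splits = list(group_out_splits(herds))
for g, train_idx, test_idx in splits:
    print(f"held-out herd {g}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Because no herd contributes to both training and testing, this kind of split is a harder test than random tenfold CV, which is consistent with the abstract treating the scenarios separately.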
Affiliation(s)
- Lucio F M Mota
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Diana Giannuzzi
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Sara Pegolo
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Erminio Trevisi
  - Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
  - Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Paolo Ajmone-Marsan
  - Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
  - Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Alessio Cecchinato
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
|
25
|
Qu J, Runcie D, Cheng H. Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits. Genetics 2023; 223:6931802. [PMID: 36529897 PMCID: PMC9991502 DOI: 10.1093/genetics/iyac183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 05/06/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022] Open
Abstract
Large-scale phenotype data are expected to increase the accuracy of genome-wide prediction and the power of genome-wide association analyses. However, genomic analyses of high-dimensional, highly correlated traits are challenging. We developed a method for implementing high-dimensional Bayesian multivariate regression to simultaneously analyze genetic variants underlying thousands of traits. As a demonstration, we implemented the BayesC prior in the R package MegaLMM. Applied to Genomic Prediction, MegaBayesC effectively integrated hyperspectral reflectance data from 620 hyperspectral wavelengths to improve the accuracy of genetic value prediction on grain yield in a wheat dataset. Applied to Genome-Wide Association Studies, we used simulations to show that MegaBayesC can accurately estimate the effect sizes of QTL across a range of genetic architectures and causes of correlations among traits. To apply MegaBayesC to a realistic scenario involving whole-genome marker data, we developed a 2-stage procedure involving a preliminary step of candidate marker selection prior to multivariate regression. We then used MegaBayesC to identify genetic associations with flowering time in Arabidopsis thaliana, leveraging expression data from 20,843 genes. MegaBayesC selected 15 single nucleotide polymorphisms as important for flowering time, with 13 located within 100 kb of known flowering-time related genes, a higher validation rate than achieved by a single-stage analysis using only the flowering time data itself. These results demonstrate that MegaBayesC can efficiently and effectively leverage high-dimensional phenotypes in genetic analyses.
Affiliation(s)
- Jiayi Qu
  - Department of Animal Science, University of California Davis, Davis, CA 95616, USA
- Daniel Runcie
  - Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
- Hao Cheng
  - Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
|
26
|
Jeon D, Kang Y, Lee S, Choi S, Sung Y, Lee TH, Kim C. Digitalizing breeding in plants: A new trend of next-generation breeding based on genomic prediction. FRONTIERS IN PLANT SCIENCE 2023; 14:1092584. [PMID: 36743488 PMCID: PMC9892199 DOI: 10.3389/fpls.2023.1092584] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/08/2022] [Accepted: 01/05/2023] [Indexed: 06/18/2023]
Abstract
As the world's population grows and food needs diversify, the demand for cereals and horticultural crops with beneficial traits increases. In order to meet a variety of demands, suitable cultivars and innovative breeding methods need to be developed. Breeding methods have changed over time following the advance of genetics. With the advent of new sequencing technology in the early 21st century, predictive breeding, such as genomic selection (GS), emerged when large-scale genomic information became available. GS shows good predictive ability for the selection of individuals with traits of interest, even for quantitative traits, by using various types of whole-genome markers, breaking away from the limitations of marker-assisted selection (MAS). In the current review, we briefly describe the history of breeding techniques, each breeding method, various statistical models applied to GS and methods to increase the GS efficiency. We also propose and define the term digital breeding. Digital breeding takes predictive breeding methods such as GS to a higher level, aiming to minimize human intervention by automating breeding design, the propagation of breeding populations, and selection in consideration of various environments, climates, and topography during the breeding process. We also classified the phases of digital breeding based on the technologies and methods applied to each phase. This review paper will provide an understanding and a direction for the final evolution of plant breeding in the future.
Affiliation(s)
- Donghyun Jeon
  - Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Yuna Kang
  - Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Solji Lee
  - Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Sehyun Choi
  - Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Yeonjun Sung
  - Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Tae-Ho Lee
  - Genomics Division, National Institute of Agricultural Sciences, Jeonju, Republic of Korea
- Changsoo Kim
  - Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
  - Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
|
27
|
Farooq M, van Dijk AD, Nijveen H, Mansoor S, de Ridder D. Genomic prediction in plants: opportunities for ensemble machine learning based approaches. F1000Res 2023; 11:802. [PMID: 37035464 PMCID: PMC10080209 DOI: 10.12688/f1000research.122437.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/04/2023] [Indexed: 01/12/2023] Open
Abstract
Background: Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler parametric methods, is still lacking. Predictive performance of GP models might depend on a plethora of factors including sample size, number of markers, population structure and genetic architecture. Methods: Here, we investigate which problem and dataset characteristics are related to good performance of ML methods for genomic prediction. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits under different genetic complexity levels determined by the number of Quantitative Trait Loci (QTLs), heritability (h2 and h2e), population structure and linkage disequilibrium between causal nucleotides and other SNPs. Results: Decision tree based ensemble ML methods are a better choice for nonlinear phenotypes and are comparable to Bayesian methods for linear phenotypes in the case of large effect Quantitative Trait Nucleotides (QTNs). Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive to low linkage disequilibrium than linear parametric methods. Conclusions: Overall, this provides insights into the role of ML in GP as well as guidelines for practitioners.
Affiliation(s)
- Muhammad Farooq
  - Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
  - Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Punjab, 38000, Pakistan
- Aalt D.J. van Dijk
  - Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Harm Nijveen
  - Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Shahid Mansoor
  - Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Punjab, 38000, Pakistan
- Dick de Ridder
  - Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
|
28
|
Lopes FB, Baldi F, Brunes LC, Oliveira E Costa MF, da Costa Eifert E, Rosa GJM, Lobo RB, Magnabosco CU. Genomic prediction for meat and carcass traits in Nellore cattle using a Markov blanket algorithm. J Anim Breed Genet 2023; 140:1-12. [PMID: 36239216 DOI: 10.1111/jbg.12740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 09/22/2022] [Indexed: 12/13/2022]
Abstract
This study was carried out to evaluate the advantage of preselecting SNP markers using a Markov blanket algorithm with regard to the accuracy of genomic prediction for carcass and meat quality traits in Nellore cattle. This study considered 3675, 3680, 3660 and 524 records of rib eye area (REA), back fat thickness (BF), rump fat (RF), and Warner-Bratzler shear force (WBSF), respectively, from the Nellore Brazil Breeding Program. The animals were genotyped using a low-density SNP panel (30 k) and subsequently imputed to arrays with 777 k SNPs. Four Bayesian specifications of genomic regression models, namely Bayes A, Bayes B, Bayes Cπ and Bayesian Ridge Regression, were compared in terms of prediction accuracy using five-fold cross-validation. Prediction accuracies for REA, BF and RF were similar across the Bayesian Alphabet models, ranging from 0.75 to 0.95. For WBSF, the predictive ability was higher using Bayes B (0.47) than other methods (0.39 to 0.42). Although the prediction accuracies using the Markov blanket of SNP markers were lower than those using all SNPs, for WBSF the relative gain was lower than 13%. With a subset of informative SNP markers identified using the Markov blanket, it is probably possible to capture a large proportion of the genetic variance for WBSF. The development of low-density and customized arrays using the Markov blanket might be a cost-effective way to perform genomic selection for this trait, increasing the number of evaluated animals, improving management decisions based on genomic information and applying genomic selection on a large scale.
Affiliation(s)
- Fernando Brito Lopes
  - São Paulo State University - Júlio de Mesquita Filho (UNESP), Department of Animal Science, Prof. Paulo Donato Castelane, Jaboticabal, Brazil
  - Embrapa Cerrados, Brasilia, Brazil
- Fernando Baldi
  - São Paulo State University - Júlio de Mesquita Filho (UNESP), Department of Animal Science, Prof. Paulo Donato Castelane, Jaboticabal, Brazil
- Guilherme Jordão Magalhães Rosa
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, Wisconsin, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
29
Nishio M, Inoue K, Arakawa A, Ichinoseki K, Kobayashi E, Okamura T, Fukuzawa Y, Ogawa S, Taniguchi M, Oe M, Takeda M, Kamata T, Konno M, Takagi M, Sekiya M, Matsuzawa T, Inoue Y, Watanabe A, Kobayashi H, Shibata E, Ohtani A, Yazaki R, Nakashima R, Ishii K. Application of linear and machine learning models to genomic prediction of fatty acid composition in Japanese Black cattle. Anim Sci J 2023; 94:e13883. [PMID: 37909231 DOI: 10.1111/asj.13883] [Received: 06/13/2023] [Revised: 08/29/2023] [Accepted: 09/15/2023] [Indexed: 11/02/2023]
Abstract
We collected 3180 records of oleic acid (C18:1) and monounsaturated fatty acid (MUFA) measured using gas chromatography (GC) and 6960 records of C18:1 and MUFA measured using near-infrared spectroscopy (NIRS) in intermuscular fat samples of Japanese Black cattle. We compared genomic prediction performance for four linear models (genomic best linear unbiased prediction [GBLUP], kinship-adjusted multiple loci [KAML], BayesC, and BayesLASSO) and five machine learning models (Gaussian kernel [GK], deep kernel [DK], random forest [RF], extreme gradient boost [XGB], and convolutional neural network [CNN]). For GC-based C18:1 and MUFA, KAML showed the highest accuracies, followed by BayesC, XGB, DK, GK, and BayesLASSO, with a gain in accuracy of more than 6% for KAML over GBLUP. Meanwhile, DK had the highest prediction accuracy for NIRS-based C18:1 and MUFA, but the difference in accuracy between DK and KAML was slight. For all traits, the accuracies of RF and CNN were lower than those of GBLUP. KAML extends GBLUP by weighting marker effects and involves only additive genetic effects, whereas machine learning methods capture non-additive genetic effects. Thus, KAML is the most suitable method for breeding for fatty acid composition in Japanese Black cattle.
Affiliation(s)
- Motohide Nishio
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Keiichi Inoue
  - National Livestock Breeding Center, Fukushima, Japan
  - University of Miyazaki, Miyazaki, Japan
- Aisaku Arakawa
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Eiji Kobayashi
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Yo Fukuzawa
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Shinichiro Ogawa
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Mika Oe
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
- Takehiro Kamata
  - Aomori Prefectural Industrial Technology Research Center, Tsugaru, Japan
- Masaru Konno
  - Iwate Agricultural Research Center Animal Industry Research Institute, Takizawa, Japan
- Michihiro Takagi
  - Miyagi Prefecture Animal Industry Experiment Station, Osaki, Japan
- Mario Sekiya
  - Akita Prefectural Livestock Experiment Station, Daisen, Japan
- Tamotsu Matsuzawa
  - Livestock Research Centre, Fukushima Agricultural Technology Centre, Fukushima, Japan
- Yoshinobu Inoue
  - Tottori Prefectural Livestock Research Center, Tottori, Japan
- Hiroshi Kobayashi
  - Institute of Animal Production Okayama Prefectural Technology Center for Agriculture, Forestry and Fisheries, Misaki, Japan
- Eri Shibata
  - Hiroshima Prefectural Technology Research Institute, Livestock Technology Research Center, Shobara, Japan
- Akihumi Ohtani
  - Yamaguchi Prefectural Agriculture and Forestry General Technology Center, Mine, Japan
- Ryu Yazaki
  - Oita Prefectural Agriculture, Forestry, and Fisheries Research Center, Takeda, Japan
- Ryotaro Nakashima
  - Cattle Breeding Development Institute of Kagoshima Prefecture, Soo, Japan
- Kazuo Ishii
  - Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
30
Gianola D, Fernando RL, Schön CC. Inference about quantitative traits under selection: a Bayesian revisitation for the post-genomic era. Genet Sel Evol 2022; 54:78. [PMID: 36460973 PMCID: PMC9716705 DOI: 10.1186/s12711-022-00765-z] [Received: 04/19/2022] [Accepted: 10/26/2022] [Indexed: 12/03/2022] [Open access]
Abstract
BACKGROUND Selection schemes distort inference when estimating differences between treatments or genetic associations between traits, and may degrade prediction of outcomes, e.g., the expected performance of the progeny of an individual with a certain genotype. If input and output measurements are not collected on random samples, inferences and predictions must be biased to some degree. Our paper revisits inference in quantitative genetics when using samples stemming from some selection process. The approach used integrates the classical notion of fitness with that of missing data. Treatment is fully Bayesian, with inference and prediction dealt with in a unified manner. While the focus is on animal and plant breeding, the concepts apply to natural selection as well. Examples based on real data and stylized models illustrate how selection can be accounted for in four different situations, sometimes without success. RESULTS Our flexible "soft selection" setting helps to diagnose the extent to which selection can be ignored. The clear connection between probability of missingness and the concept of fitness in stylized selection scenarios is highlighted. It is not realistic to assume that a fixed selection threshold t holds in conceptual replication, as the chance of selection depends on observed and unobserved data, and on unequal amounts of information over individuals, aspects that a "soft" selection representation addresses explicitly. There does not seem to be a general prescription to accommodate potential distortions due to selection. In structures that combine cross-sectional, longitudinal and multi-trait data such as in animal breeding, balance is the exception rather than the rule. The Bayesian approach provides an integrated answer to inference, prediction and model choice under selection that goes beyond the likelihood-based approach, where breeding values are inferred indirectly.
CONCLUSIONS The approach used here for inference and prediction under selection may or may not yield the best possible answers. One may believe that selection has been accounted for diligently, but the central problem of whether statistical inferences are good or bad does not have an unambiguous solution. On the other hand, the quality of predictions can be gauged empirically via appropriate training-testing of competing methods.
Affiliation(s)
- Daniel Gianola
  - Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI USA
- Rohan L. Fernando
  - Department of Animal Science, Iowa State University, Ames, IA USA
- Chris C. Schön
  - Department of Plant Breeding, Technical University of Munich, Freising, Germany
31
Cappa EP, Chen C, Klutsch JG, Sebastian-Azcona J, Ratcliffe B, Wei X, Da Ros L, Ullah A, Liu Y, Benowicz A, Sadoway S, Mansfield SD, Erbilgin N, Thomas BR, El-Kassaby YA. Multiple-trait analyses improved the accuracy of genomic prediction and the power of genome-wide association of productivity and climate change-adaptive traits in lodgepole pine. BMC Genomics 2022; 23:536. [PMID: 35870886 PMCID: PMC9308220 DOI: 10.1186/s12864-022-08747-7] [Received: 02/03/2022] [Accepted: 07/08/2022] [Indexed: 11/10/2022] [Open access]
Abstract
Background Genomic prediction (GP) and genome-wide association (GWA) analyses are currently being employed to accelerate breeding cycles and to identify alleles or genomic regions of complex traits in forest tree species. Here, 1490 interior lodgepole pine (Pinus contorta Dougl. ex. Loud. var. latifolia Engelm) trees from four open-pollinated progeny trials were genotyped with 25,099 SNPs, and phenotyped for 15 growth, wood quality, pest resistance, drought tolerance, and defense chemical (monoterpenes) traits. The main objectives of this study were to: (1) identify genetic markers associated with these traits, determine their genetic architecture, and compare the markers detected by single- (ST) and multiple-trait (MT) GWA models; and (2) evaluate and compare the accuracy and control of bias of the genomic predictions for these traits under different ST and MT parametric and non-parametric GP methods. For GWA, ST and MT analyses were compared using a linear transformation of genomic breeding values from the respective genomic best linear unbiased prediction (GBLUP) model. For GP, ST and MT parametric and non-parametric (Reproducing Kernel Hilbert Spaces, RKHS) models were compared in terms of prediction accuracy (PA) and control of bias. Results MT-GWA analyses identified more significant associations than ST. Some SNPs showed potential pleiotropic effects. Averaging across traits, PA from the studied ST-GP models did not differ significantly from each other, with generally a slight superiority of the RKHS method. MT-GP models showed significantly higher PA (and lower bias) than the ST models, with the PA (bias) of the RKHS approach generally significantly higher (lower) than that of GBLUP. Conclusions The power of GWA and the accuracy of GP were improved when MT models were used in this lodgepole pine population.
Given the number of GP and GWA models fitted and the traits assessed across four progeny trials, this work has produced the most comprehensive empirical genomic study across any lodgepole pine population to date. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08747-7.
32
Ashraf B, Hunter DC, Bérénos C, Ellis PA, Johnston SE, Pilkington JG, Pemberton JM, Slate J. Genomic prediction in the wild: A case study in Soay sheep. Mol Ecol 2022; 31:6541-6555. [PMID: 34719074 DOI: 10.1111/mec.16262] [Received: 07/29/2021] [Revised: 10/13/2021] [Accepted: 10/25/2021] [Indexed: 01/13/2023]
Abstract
Genomic prediction, the technique whereby an individual's genetic component of their phenotype is estimated from its genome, has revolutionised animal and plant breeding and medical genetics. However, despite being first introduced nearly two decades ago, it has hardly been adopted by the evolutionary genetics community studying wild organisms. Here, genomic prediction is performed on eight traits in a wild population of Soay sheep. The population has been the focus of a >30 year evolutionary ecology study and there is already considerable understanding of the genetic architecture of the focal Mendelian and quantitative traits. We show that the accuracy of genomic prediction is high for all traits, but especially those with loci of large effect segregating. Five different methods are compared, and the two methods that can accommodate zero-effect and large-effect loci in the same model tend to perform best. If the accuracy of genomic prediction is similar in other wild populations, then there is a real opportunity for pedigree-free molecular quantitative genetics research to be enabled in many more wild populations; currently the literature is dominated by studies that have required decades of field data collection to generate sufficiently deep pedigrees. Finally, some of the potential applications of genomic prediction in wild populations are discussed.
Affiliation(s)
- Bilal Ashraf
  - School of Biosciences, University of Sheffield, Sheffield, UK
  - Department of Anthropology, Durham University, Durham, UK
- Darren C Hunter
  - School of Biosciences, University of Sheffield, Sheffield, UK
  - School of Biology, University of St Andrews, St Andrews, UK
- Camillo Bérénos
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Philip A Ellis
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Susan E Johnston
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Jill G Pilkington
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Jon Slate
  - School of Biosciences, University of Sheffield, Sheffield, UK
33
Rooney TE, Kunze KH, Sorrells ME. Genome-wide marker effect heterogeneity is associated with a large effect dormancy locus in winter malting barley. Plant Genome 2022; 15:e20247. [PMID: 35971877 DOI: 10.1002/tpg2.20247] [Received: 03/14/2022] [Accepted: 06/20/2022] [Indexed: 06/15/2023]
Abstract
Prediction of trait values in plant breeding populations typically relies on assumptions about marker effect homogeneity across populations. Evidence is presented for winter malting barley (Hordeum vulgare L.) germination traits that a single, causative, large-effect gene in the Seed dormancy 1 region on Chromosome 5H, HvAlaAT1 (Qsd1), leads to heterogeneous estimated marker effects genome wide between groups of otherwise related individuals carrying different Qsd1 alleles. This led to reduced prediction accuracy across alleles when a model was trained either on individuals carrying both alleles or one allele. Several genomic prediction models were tested to increase prediction accuracy within the Qsd1 allele groups. Small gains (5-12%) in prediction accuracy were realized using structured genomic best linear unbiased predictor models when information about the Qsd1 allele was used to stratify the population. We concluded that a single large-effect locus can lead to heterogeneous marker effects in the same breeding family. Variance partitioning based on large-effect loci can be used to inform best practices in designing genomic prediction models; however, there are likely few cases for which it may be practical to do this. For malting barley, if germination traits are highly associated with malting quality traits, then similar steps should be considered for malting quality trait prediction.
Affiliation(s)
- Travis E Rooney
  - Plant Breeding and Genetics Section, School of Integrative Plant Sciences, Cornell Univ., Ithaca, NY, 14853, USA
- Karl H Kunze
  - Plant Breeding and Genetics Section, School of Integrative Plant Sciences, Cornell Univ., Ithaca, NY, 14853, USA
- Mark E Sorrells
  - Plant Breeding and Genetics Section, School of Integrative Plant Sciences, Cornell Univ., Ithaca, NY, 14853, USA
34
Li H, Wang Z, Xu L, Li Q, Gao H, Ma H, Cai W, Chen Y, Gao X, Zhang L, Gao H, Zhu B, Xu L, Li J. Genomic prediction of carcass traits using different haplotype block partitioning methods in beef cattle. Evol Appl 2022; 15:2028-2042. [PMID: 36540636 PMCID: PMC9753827 DOI: 10.1111/eva.13491] [Received: 04/20/2022] [Accepted: 09/18/2022] [Indexed: 09/22/2023] [Open access]
Abstract
Genomic prediction (GP) based on haplotype alleles can capture quantitative trait loci (QTL) effects and increase predictive ability because the haplotypes are expected to be in linkage disequilibrium (LD) with QTL. In this study, we constructed haploblocks in beef cattle genotyped with the Illumina BovineHD chip, using LD-based and fixed number of single nucleotide polymorphisms (fixed-SNP) methods. To evaluate the performance of the different haplotype block partitioning methods, we constructed haploblocks based on LD thresholds (from r² > 0.2 to r² > 0.8) and on fixed numbers of SNPs (5, 10, 20). The performance of predictive methods for three carcass traits, namely liveweight (LW), dressing percentage (DP), and longissimus dorsi muscle weight (LDMW), was evaluated using three approaches: GBLUP and BayesB models based on SNPs; GHBLUP and BayesBH models based on haploblocks; and GHBLUP+GBLUP and BayesBH+BayesB models based on the combined haploblocks and the nonblocked SNPs located between blocks. The LD-based and fixed-SNP haplotype Bayesian methods were more accurate than the Bayesian models (by up to 8.54 ± 7.44% and 5.74 ± 2.95%, respectively). GHBLUP showed a large improvement (up to 11.29 ± 9.87%) compared with GBLUP. The Bayesian models had higher accuracies than the BLUP models in most scenarios. The average computing time of the BayesBH+BayesB model can be reduced by 29.3% compared with the BayesB model. The LD-based haplotype method showed larger improvements in prediction accuracy than the fixed-SNP haplotype method. In addition, to avoid the influence of rare haplotypes generated during haplotype construction, we compared GP performance after filtering at four minor haplotype allele frequency (MHAF) thresholds (0.01, 0.025, 0.05, and 0.1) under different conditions (LD levels set at r² > 0.3, and a fixed number of 5 SNPs). The optimal MHAF threshold was 0.01 for LW and 0.025 for DP and LDMW.
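As a rough illustration of the fixed-SNP partitioning and MHAF filtering mentioned in this abstract, the sketch below splits markers into blocks of a fixed size, tallies haplotype alleles per block, and drops rare ones. The function names, the toy phased haplotypes, and the inclusive `>=` cut-off are assumptions made for illustration and are not taken from the paper; the toy threshold of 0.3 is far above the 0.01 to 0.1 range in the abstract only because the example holds four haplotypes.

```python
from collections import Counter

def fixed_snp_blocks(n_snps, block_size):
    """Return (start, end) index pairs for consecutive blocks of block_size SNPs."""
    return [(s, min(s + block_size, n_snps)) for s in range(0, n_snps, block_size)]

def haplotype_allele_freqs(haplotypes, start, end):
    """Frequency of each distinct haplotype allele observed within one block."""
    counts = Counter(tuple(h[start:end]) for h in haplotypes)
    total = len(haplotypes)
    return {allele: c / total for allele, c in counts.items()}

def filter_by_mhaf(freqs, mhaf):
    """Keep haplotype alleles whose frequency is at least mhaf (assumed inclusive)."""
    return {a: f for a, f in freqs.items() if f >= mhaf}

# Toy phased haplotypes (rows) over 10 SNPs (columns), coded 0/1; hypothetical data.
haplotypes = [
    [0, 0, 1, 0, 1, 1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 1, 1, 0, 0, 1],
]

blocks = fixed_snp_blocks(n_snps=10, block_size=5)
kept_per_block = {}
for start, end in blocks:
    freqs = haplotype_allele_freqs(haplotypes, start, end)
    kept_per_block[(start, end)] = filter_by_mhaf(freqs, mhaf=0.3)
    print((start, end), kept_per_block[(start, end)])
```

In a haplotype-based model, each retained allele would typically become one covariate column, so the threshold directly controls how many predictors enter the model.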
Affiliation(s)
- Hongwei Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Zezhao Wang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lei Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Qian Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Han Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Haoran Ma
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Wentao Cai
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yan Chen
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Huijiang Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Bo Zhu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
35
An Improved Bayesian Shrinkage Regression Algorithm for Genomic Selection. Genes (Basel) 2022; 13:genes13122193. [PMID: 36553460 PMCID: PMC9778053 DOI: 10.3390/genes13122193] [Received: 09/30/2022] [Revised: 11/14/2022] [Accepted: 11/18/2022] [Indexed: 11/25/2022] [Open access]
Abstract
Currently a hot topic, genomic selection (GS) has consistently provided powerful support for breeding studies and achieved more comprehensive and reliable selection in animal and plant breeding. GS estimates the effects of all single nucleotide polymorphisms (SNPs) and thereby predicts the genomic estimated breeding value (GEBV), accelerating breeding progress and overcoming the limitations of conventional breeding. The successful application of GS primarily depends on the accuracy of the GEBV. Adopting appropriate advanced algorithms to improve the accuracy of the GEBV is time-saving and efficient for breeders, and the available algorithms can be further improved in the big data era. In this study, we develop a new algorithm under the Bayesian Shrinkage Regression (BSR, which is called BayesA) framework: an improved expectation-maximization algorithm for BayesA (emBAI). The emBAI algorithm first corrects for polygenic and environmental noise and then calculates the GEBV by emBayesA. We conduct two simulation experiments and a real dataset analysis of flowering time-related Arabidopsis phenotypes to validate the new algorithm. Compared to established methods, emBAI is more powerful in terms of prediction accuracy, mean square error (MSE), mean absolute error (MAE), the area under the receiver operating characteristic curve (AUC) and correlation of prediction in simulation studies. In addition, emBAI performs well under an increasing genetic background. The analysis of the real Arabidopsis dataset further illustrates the benefits of emBAI for genomic prediction according to prediction accuracy, MSE, MAE and correlation of prediction. Furthermore, the new method shows advantages in significant loci detection and effect coefficient estimation, which are confirmed by The Arabidopsis Information Resource (TAIR) gene bank. In conclusion, the emBAI algorithm provides powerful support for GS in high-dimensional genomic datasets.
36
Nazzicari N, Biscarini F. Stacked kinship CNN vs. GBLUP for genomic predictions of additive and complex continuous phenotypes. Sci Rep 2022; 12:19889. [PMID: 36400808 PMCID: PMC9674857 DOI: 10.1038/s41598-022-24405-0] [Received: 08/05/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022] [Open access]
Abstract
Deep learning is impacting many fields of data science with often spectacular results. However, its application to whole-genome predictions in plant and animal science or in human biology has been rather limited, with mostly underwhelming results. While most works focus on exploring alternative network architectures, in this study we propose an innovative representation of marker genotype data and test it against the GBLUP (Genomic BLUP) benchmark with linear and nonlinear phenotypes. From publicly available cattle SNP genotype data, different types of genomic kinship matrices are stacked together in a 3D pile from which 2D grayscale slices are extracted and fed to a deep convolutional neural network (DNN). We simulated nine phenotype scenarios with combinations of additivity, dominance and epistasis, and compared the DNN to GBLUP-A (computed using only the additive kinship matrix) and GBLUP-optim (additive, dominance, and epistasis kinship matrices, as needed). Results varied depending on the accuracy metric employed, with the DNN performing better in terms of root mean squared error (1-12% lower than GBLUP-A; 1-9% lower than GBLUP-optim) but worse in terms of Pearson's correlation (0.505 for DNN compared to 0.672 and 0.669 for GBLUP-A and GBLUP-optim in the fully additive case; 0.274 for DNN, 0.279 for GBLUP-A, and 0.477 for GBLUP-optim in the fully dominant case). The proposed approach offers a basis to explore further the application of DNN to tabular data in whole-genome predictions.
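The observation that the ranking of methods depends on the metric can be reproduced with a toy example. The sketch below uses made-up numbers (not data from the study) and hypothetical helper names: predictor A is perfectly correlated with the phenotype but badly scaled, while predictor B is nearly constant around the mean, so B has the lower root mean squared error and A has the higher Pearson correlation.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson(x, y):
    """Pearson correlation coefficient, computed from centred sums."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical phenotypes and two hypothetical predictors.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_a = [2.0, 4.0, 6.0, 8.0, 10.0]  # right ranking, wrong scale
pred_b = [3.0, 3.0, 3.1, 3.0, 3.0]   # close to the mean, no ranking information

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, "RMSE =", round(rmse(y, pred), 3), "r =", round(pearson(y, pred), 3))
```

Correlation is insensitive to the scale and offset of predictions while RMSE penalises both, which is one reason the two metrics can disagree about which model is better.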
Affiliation(s)
- Nelson Nazzicari
  - CREA Council for Agricultural Research and Analysis of Agricultural Economics, Research Centre for Animal Production and Aquaculture, Viale Piacenza 29, 26900 Lodi, Italy
- Filippo Biscarini
  - CNR: National Research Council, Institute of Agricultural Biology and Biotechnology, Via Bassini 15, Milan, 20133 Italy
37
John M, Haselbeck F, Dass R, Malisi C, Ricca P, Dreischer C, Schultheiss SJ, Grimm DG. A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species. Front Plant Sci 2022; 13:932512. [PMID: 36407627 PMCID: PMC9673477 DOI: 10.3389/fpls.2022.932512] [Received: 04/29/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data, leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we can confirm results from other studies that there is no superiority of more complex neural network-based architectures for phenotype prediction compared to well-established methods. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.
Affiliation(s)
- Maura John
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
- Florian Haselbeck
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
- Dominik G. Grimm
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
  - Technical University of Munich, Department of Informatics, Garching, Germany
38
A divide-and-conquer approach for genomic prediction in rubber tree using machine learning. Sci Rep 2022; 12:18023. [PMID: 36289298 PMCID: PMC9605989 DOI: 10.1038/s41598-022-20416-z] [Received: 04/19/2022] [Accepted: 09/13/2022] [Indexed: 01/20/2023] [Open access]
Abstract
Rubber tree (Hevea brasiliensis) is the main feedstock for commercial rubber; however, its long vegetative cycle has hindered the development of more productive varieties via breeding programs. With the availability of H. brasiliensis genomic data, several linkage maps with associated quantitative trait loci have been constructed and suggested as a tool for marker-assisted selection. Nonetheless, novel genomic strategies are still needed, and genomic selection (GS) may facilitate rubber tree breeding programs aimed at reducing the required cycles for performance assessment. Even though such a methodology has already been shown to be a promising tool for rubber tree breeding, increased model predictive capabilities and practical application are still needed. Here, we developed a novel machine learning-based approach for predicting rubber tree stem circumference based on molecular markers. Through a divide-and-conquer strategy, we propose a neural network prediction system with two stages: (1) subpopulation prediction and (2) phenotype estimation. This approach yielded higher accuracies than traditional statistical models in a single-environment scenario. By delivering large accuracy improvements, our methodology represents a powerful tool for use in Hevea GS strategies. Therefore, the incorporation of machine learning techniques into rubber tree GS represents an opportunity to build more robust models and optimize Hevea breeding programs.
39
Zuffo LT, DeLima RO, Lübberstedt T. Combining datasets for maize root seedling traits increases the power of GWAS and genomic prediction accuracies. J Exp Bot 2022; 73:5460-5473. [PMID: 35608947 PMCID: PMC9467658 DOI: 10.1093/jxb/erac236] [Received: 12/11/2021] [Accepted: 06/06/2022] [Indexed: 05/13/2023]
Abstract
The identification of genomic regions associated with root traits and the genomic prediction of untested genotypes can increase the rate of genetic gain in maize breeding programs targeting root traits. Here, we combined two maize association panels with different genetic backgrounds, used a genome-wide association study (GWAS) to identify single nucleotide polymorphisms (SNPs) associated with root traits, and assessed the potential of genomic prediction for these traits in maize. For this, we evaluated 377 lines from the Ames panel and 302 from the Backcrossed Germplasm Enhancement of Maize (BGEM) panel in a combined panel of 679 lines. The lines were genotyped with 232 460 SNPs, and four root traits were collected from 14-day-old seedlings. We identified 30 SNPs significantly associated with root traits in the combined panel, whereas only two and six SNPs were detected in the Ames and BGEM panels, respectively. Those 38 SNPs were in linkage disequilibrium with 35 candidate genes. In addition, we found higher prediction accuracy in the combined panel than in the Ames or BGEM panel. We conclude that combining association panels appears to be a useful strategy to identify candidate genes associated with root traits in maize and improve the efficiency of genomic prediction.
Collapse
Affiliation(s)
- Leandro Tonello Zuffo
- Corteva Agriscience, Rio Verde, GO, Brazil
- Department of Agronomy, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Department of Agronomy, Iowa State University, Ames, IA, USA
|
40
|
A joint learning approach for genomic prediction in polyploid grasses. Sci Rep 2022; 12:12499. [PMID: 35864135 PMCID: PMC9304331 DOI: 10.1038/s41598-022-16417-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/20/2022] [Accepted: 07/11/2022] [Indexed: 12/20/2022]
Abstract
Poaceae, among the most abundant plant families, includes many economically important polyploid species, such as forage grasses and sugarcane (Saccharum spp.). These species have elevated genomic complexities and limited genetic resources, hindering the application of marker-assisted selection strategies. Currently, the most promising approach for increasing genetic gains in plant breeding is genomic selection. However, due to the polyploid nature of these species, more accurate models for incorporating genomic selection into breeding schemes are needed. This study aims to develop a machine learning method by using a joint learning approach to predict complex traits from genotypic data. Biparental populations of sugarcane and two species of forage grasses (Urochloa decumbens, Megathyrsus maximus) were genotyped, and several quantitative traits were measured. High-quality markers were used to predict several traits in different cross-validation scenarios. By combining classification and regression strategies, we developed a predictive system with promising results. Compared with traditional genomic prediction methods, the proposed strategy achieved accuracy improvements exceeding 50%. Our results suggest that the developed methodology could be implemented in breeding programs, helping reduce breeding cycles and increase genetic gains.
|
41
|
Farooq M, van Dijk AD, Nijveen H, Mansoor S, de Ridder D. Genomic prediction in plants: opportunities for ensemble machine learning based approaches. F1000Res 2022; 11:802. [PMID: 37035464 PMCID: PMC10080209 DOI: 10.12688/f1000research.122437.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 07/08/2022] [Indexed: 12/15/2022]
Abstract
Background: Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler parametric methods, is still lacking. Predictive performance of GP models might depend on a plethora of factors including sample size, number of markers, population structure and genetic architecture. Methods: Here, we investigate which problem and dataset characteristics are related to good performance of ML methods for genomic prediction. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits under different genetic complexity levels determined by the number of Quantitative Trait Loci (QTLs), heritability (h2 and h2e), population structure and linkage disequilibrium between causal nucleotides and other SNPs. Results: Decision tree based ensemble ML methods are a better choice for nonlinear phenotypes and are comparable to Bayesian methods for linear phenotypes in the case of large effect Quantitative Trait Nucleotides (QTNs). Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive to low linkage disequilibrium than linear parametric methods. Conclusions: Overall, this provides insights into the role of ML in GP as well as guidelines for practitioners.
Affiliation(s)
- Muhammad Farooq
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Punjab, 38000, Pakistan
- Aalt D.J. van Dijk
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Harm Nijveen
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Shahid Mansoor
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Punjab, 38000, Pakistan
- Dick de Ridder
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
|
42
|
Li P, Hao H, Mao X, Xu J, Lv Y, Chen W, Ge D, Zhang Z. Convolutional neural network-based applied research on the enrichment of heavy metals in the soil-rice system in China. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:53642-53655. [PMID: 35290576 DOI: 10.1007/s11356-022-19640-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/13/2021] [Accepted: 03/05/2022] [Indexed: 06/14/2023]
Abstract
The enrichment of heavy metals in the soil-rice system is affected by various factors, which hampers the prediction of heavy metal concentrations. In this research, a prediction model (CNN-HM) of heavy metal concentrations in rice was constructed based on convolutional neural network (CNN) technology and 17 environmental factors. For comparison, other machine learning models, such as multiple linear regression, Bayesian ridge regression, support vector machine, and backpropagation neural networks, were applied. Furthermore, the LH-OAT method was used to evaluate the sensitivity of CNN-HM to each environmental factor. The results showed that the R2 values of CNN-HM for Cd, Pb, Cr, As, and Hg were 0.818, 0.709, 0.688, 0.462, and 0.816, respectively, and both the MAE and RMAE values were acceptable. The sensitivity analysis showed that the concentrations of Cd and Pb, mechanical composition, soil pH, and altitude were the main sensitive features for CNN-HM. Compared with CNN-HM based on all input features, the performance of the quick prediction model that was based on the sensitive features did not degrade significantly, thereby indicating that CNN-HM has stronger stability and robustness. The quick prediction model has extensive application value for timely prediction of the enrichment of heavy metals in emergencies. This study demonstrated the effectiveness and practicability of CNNs in predicting heavy metal enrichment in the soil-rice system and provided a new perspective and solution for heavy metal prediction.
Affiliation(s)
- Panpan Li
- College of Computer, National University of Defense Technology, Changsha, 410005, People's Republic of China
- Huijuan Hao
- College of Resources and Environment, Hunan Agricultural University, Changsha, 410128, People's Republic of China
- Risk Assessment Laboratory for Environmental Factors of Agro-Product Quality Safety, Ministry of Agriculture and Villages, Changsha, 410005, People's Republic of China
- Xiaoguang Mao
- College of Computer, National University of Defense Technology, Changsha, 410005, People's Republic of China
- Jianjun Xu
- College of Computer, National University of Defense Technology, Changsha, 410005, People's Republic of China
- Yuntao Lv
- Risk Assessment Laboratory for Environmental Factors of Agro-Product Quality Safety, Ministry of Agriculture and Villages, Changsha, 410005, People's Republic of China
- Wanming Chen
- Risk Assessment Laboratory for Environmental Factors of Agro-Product Quality Safety, Ministry of Agriculture and Villages, Changsha, 410005, People's Republic of China
- Dabing Ge
- College of Resources and Environment, Hunan Agricultural University, Changsha, 410128, People's Republic of China
- Zhuo Zhang
- College of Information and Communication Technology, Guangzhou College of Commerce, Guangzhou, 510000, People's Republic of China.
|
43
|
Ye H, Zhang Z, Ren D, Cai X, Zhu Q, Ding X, Zhang H, Zhang Z, Li J. Genomic Prediction Using LD-Based Haplotypes in Combined Pig Populations. Front Genet 2022; 13:843300. [PMID: 35754827 PMCID: PMC9218795 DOI: 10.3389/fgene.2022.843300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2021] [Accepted: 05/02/2022] [Indexed: 11/13/2022]
Abstract
The size of the reference population is an important factor affecting genomic prediction. Thus, combining different populations in genomic prediction is an attractive way to improve prediction ability. However, simply combining multiple reference populations does not increase prediction accuracy as much as expected in pigs. This may be due to differences in linkage disequilibrium (LD) patterns between populations. In this study, we used imputed whole-genome sequencing (WGS) data to construct LD-based haplotypes for genomic prediction in a combined population, to explore the impact of single-nucleotide polymorphism (SNP) density, variant representation (SNPs or haplotype alleles), and reference population size on the prediction accuracy for reproduction traits. Our results showed that genomic best linear unbiased prediction (GBLUP) using the WGS data can improve prediction accuracy in the multi-population setting but not within population. The genomic prediction accuracy of both the haplotype method using 80 K chip data and GBLUP was higher in the multi-population setting (3.4–5.9%) than within population (1.2–4.3%). More importantly, we found that the haplotype method based on the WGS data gave better genomic prediction performance in the multi-population setting, and that building haploblocks in this scenario with a low LD threshold (r2 = 0.2–0.3) produced an optimal set of variables for reproduction traits in the Yorkshire pig population. Our results suggest that both the haplotype method based on chip data and GBLUP (the individual-SNP method) based on WGS data are beneficial for multi-population genomic prediction, while combining the haplotype method with WGS data is a better strategy for multi-population genomic evaluation.
Affiliation(s)
- Haoqiang Ye
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zipeng Zhang
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Duanyang Ren
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
- Xiaodian Cai
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
- Qianghui Zhu
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
- Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Hao Zhang
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhe Zhang
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
- Jiaqi Li
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
|
44
|
Mancin E, Mota LFM, Tuliozi B, Verdiglione R, Mantovani R, Sartori C. Improvement of Genomic Predictions in Small Breeds by Construction of Genomic Relationship Matrix Through Variable Selection. Front Genet 2022; 13:814264. [PMID: 35664297 PMCID: PMC9158133 DOI: 10.3389/fgene.2022.814264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Accepted: 03/22/2022] [Indexed: 11/13/2022]
Abstract
Genomic selection has been increasingly implemented in the animal breeding industry, and it is becoming a routine method in many livestock breeding contexts. However, its use is still limited in several small-population local breeds, which are, nonetheless, an important source of genetic variability of great economic value. A major roadblock for their genomic selection is accuracy when population size is limited: to improve breeding value accuracy, variable selection models that assume heterogeneous variance have been proposed over the last few years. However, while these models might outperform traditional and genomic predictions in terms of accuracy, they also carry a proportional increase of breeding value bias and dispersion. These mutual increases are especially striking when genomic selection is performed with a low number of phenotypes and high shrinkage value, which is precisely the situation that arises with small local breeds. In our study, we tested several alternative methods to improve the accuracy of genomic selection in a small population. First, we investigated the impact of using only a subset of informative markers on prediction accuracy, bias, and dispersion. We used different algorithms to select them, such as recursive feature elimination, penalized regression, and XGBoost. We compared our results with the predictions of pedigree-based BLUP, single-step genomic BLUP, and weighted single-step genomic BLUP in different simulated populations obtained by combining various parameters in terms of number of QTLs and effective population size. We also investigated these approaches on a real data set belonging to the small local Rendena breed. Our results show that the accuracy of GBLUP in small-sized populations increased when performed with SNPs selected via variable selection methods both in simulated and real data sets. In addition, the use of variable selection models, especially those using XGBoost, in our real data set did not impact bias and the dispersion of estimated breeding values. We have discussed possible explanations for our results and how our study can help estimate breeding values for future genomic selection in small breeds.
Affiliation(s)
- Enrico Mancin
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Lucio Flavio Macedo Mota
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Beniamino Tuliozi
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Rina Verdiglione
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Roberto Mantovani
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Cristina Sartori
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
|
45
|
Wolc A, Dekkers JCM. Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol 2022; 54:31. [PMID: 35562659 PMCID: PMC9103490 DOI: 10.1186/s12711-022-00724-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/28/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022]
Abstract
Background Bayesian genomic prediction methods were developed to simultaneously fit all genotyped markers to a set of available phenotypes for prediction of breeding values for quantitative traits, allowing for differences in the genetic architecture (distribution of marker effects) of traits. These methods also provide a flexible and reliable framework for genome-wide association (GWA) studies. The objective here was to review developments in Bayesian hierarchical and variable selection models for GWA analyses. Results By fitting all genotyped markers simultaneously, Bayesian GWA methods implicitly account for population structure and the multiple-testing problem of classical single-marker GWA. Implemented using Markov chain Monte Carlo methods, Bayesian GWA methods allow for control of error rates using probabilities obtained from posterior distributions. Power of GWA studies using Bayesian methods can be enhanced by using informative priors based on previous association studies, gene expression analyses, or functional annotation information. Applied to multiple traits, Bayesian GWA analyses can give insight into pleiotropic effects by multi-trait, structural equation, or graphical models. Bayesian methods can also be used to combine genomic, transcriptomic, proteomic, and other -omics data to infer causal genotype to phenotype relationships and to suggest external interventions that can improve performance. Conclusions Bayesian hierarchical and variable selection methods provide a unified and powerful framework for genomic prediction, GWA, integration of prior information, and integration of information from other -omics platforms to identify causal mutations for complex quantitative traits.
Affiliation(s)
- Anna Wolc
- Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA
- Hy-Line International, 2583 240th Street, Dallas Center, IA, 50063, USA
- Jack C M Dekkers
- Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA
|
46
|
Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:77-112. [PMID: 35451773 DOI: 10.1007/978-1-0716-2205-6_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
Abstract
The efficiency of genomic selection strongly depends on the prediction accuracy of the genetic merit of candidates. Numerous papers have shown that the composition of the calibration set is a key contributor to prediction accuracy. A poorly defined calibration set can result in low accuracies, whereas an optimized one can considerably increase accuracy compared to random sampling, for the same size. Alternatively, optimizing the calibration set can be a way of decreasing the costs of phenotyping by enabling similar levels of accuracy compared to random sampling but with fewer phenotypic units. We present here the different factors that have to be considered when designing a calibration set, and review the different criteria proposed in the literature. We classified these criteria into two groups: model-free criteria based on relatedness, and criteria derived from the linear mixed model. We introduce criteria targeting specific prediction objectives including the prediction of highly diverse panels, biparental families, or hybrids. We also review different ways of updating the calibration set, and different procedures for optimizing phenotyping experimental designs.
|
47
|
Infrared Predictions Are a Valuable Alternative to Actual Measures of Dry-Cured Ham Weight Loss in the Training of Genome-Enabled Prediction Models. Animals (Basel) 2022; 12:ani12070814. [PMID: 35405804 PMCID: PMC8996942 DOI: 10.3390/ani12070814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 03/18/2022] [Accepted: 03/21/2022] [Indexed: 11/17/2022]
Abstract
Selection to reduce ham weight losses during dry-curing (WL) requires individual traceability of hams throughout dry-curing, with high phenotyping costs and long generation intervals. Infrared spectroscopy enables cost-effective, high-throughput phenotyping for WL 24 h after slaughter. Direct genomic values (DGV) of crossbred pigs and their purebred sires were estimated, for observed (OB) and infrared-predicted WL (IR), through models developed from 640 and 956 crossbred pigs, respectively. Five Bayesian models and two pseudo-phenotypes (estimated breeding value, EBV, and adjusted phenotype) were tested in random cross-validation and leave-one-family-out validation. The use of EBV as pseudo-phenotypes resulted in the highest accuracies. Accuracies in leave-one-family-out validation were much lower than those obtained in random cross-validation but still satisfactory and very similar for both traits. For sires in the leave-one-family-out validation scenario, the correlation between the DGV for IR and EBV for OB was slightly lower (0.32) than the correlation between the DGV for OB and EBV for OB (0.38). While genomic prediction of OB and IR can be equally suggested to be incorporated in future selection programs aiming at reducing WL, the use of IR enables an early, cost-effective phenotyping, favoring the construction of larger reference populations, with accuracies comparable to those achievable using OB phenotype.
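The leave-one-family-out validation mentioned in this abstract can be illustrated with a minimal sketch. The function name `leave_one_family_out` and the `family_ids` labels are hypothetical, and how families were actually defined in the study (for example, by sire) is not stated here, so the grouping below is only an assumption.

```python
import numpy as np

def leave_one_family_out(family_ids):
    """Yield (family, train_idx, test_idx) with one whole family held out per fold.

    family_ids: one label per animal (hypothetical; e.g. a sire identifier).
    Unlike random cross-validation, no member of the held-out family
    appears in the training set.
    """
    family_ids = np.asarray(family_ids)
    for fam in np.unique(family_ids):
        test_idx = np.flatnonzero(family_ids == fam)
        train_idx = np.flatnonzero(family_ids != fam)
        yield fam, train_idx, test_idx

# Example with five animals from three families
folds = list(leave_one_family_out(["A", "B", "A", "C", "B"]))
```

Holding out entire families removes close relatives from the training data, which is one plausible reason the accuracies reported for this scheme are lower than those from random cross-validation.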
|
48
|
Sánchez-Mayor M, Riggio V, Navarro P, Gutiérrez-Gil B, Haley CS, De la Fuente LF, Arranz JJ, Pong-Wong R. Effect of genotyping strategies on the sustained benefit of single-step genomic BLUP over multiple generations. Genet Sel Evol 2022; 54:23. [PMID: 35303797 PMCID: PMC8931970 DOI: 10.1186/s12711-022-00712-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/22/2021] [Accepted: 02/28/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Single-step genomic best linear unbiased prediction (ssGBLUP) allows the inclusion of information from genotyped and ungenotyped individuals in a single analysis. This avoids the need to genotype all candidates with the potential benefit of reducing overall costs. The aim of this study was to assess the effect of genotyping strategies, the proportion of genotyped candidates and the genotyping criterion to rank candidates to be genotyped, when using ssGBLUP evaluation. A simulation study was carried out assuming selection over several discrete generations where a proportion of the candidates were genotyped and evaluation was done using ssGBLUP. The scenarios compared were: (i) three genotyping strategies defined by their protocol for choosing candidates to be genotyped (RANDOM: candidates were chosen at random; TOP: candidates with the best genotyping criterion were genotyped; and EXTREME: candidates with the best and worst criterion were genotyped); (ii) eight proportions of genotyped candidates (p); and (iii) two genotyping criteria to rank candidates to be genotyped (candidates' own phenotype or estimated breeding values). The criteria of the comparison were the cumulated gain and reliability of the genomic estimated breeding values (GEBV). RESULTS The genotyping strategy with the greatest cumulated gain was TOP followed by RANDOM, with EXTREME behaving as RANDOM at low p and as TOP with high p. However, the reliability of GEBV was higher with RANDOM than with TOP. This disparity between the trend of the gain and the reliability is due to the TOP scheme genotyping the candidates with the greater chances of being selected. The extra gain obtained with TOP increases when the accuracy of the selection criterion to rank candidates to be genotyped increases.
CONCLUSIONS The best strategy to maximise genetic gain when only a proportion of the candidates are to be genotyped is TOP, since it prioritises the genotyping of candidates which are more likely to be selected. However, the strategy with the greatest GEBV reliability does not achieve the largest gain, thus reliability cannot be considered as an absolute and sufficient criterion for determining the scheme which maximises genetic gain.
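The three genotyping strategies compared in this abstract (RANDOM, TOP, EXTREME) can be sketched as a simple selection rule over a ranking criterion. This is an illustrative sketch only: the function name `choose_candidates_to_genotype` is hypothetical, and the even split between the best and worst tails under EXTREME is an assumption, since the abstract does not say how the two tails were balanced.

```python
import numpy as np

def choose_candidates_to_genotype(criterion, p, strategy, rng=None):
    """Return sorted indices of the candidates to genotype.

    criterion: one value per candidate, higher is better
               (own phenotype or estimated breeding value).
    p:         proportion of candidates to genotype (0 < p <= 1).
    strategy:  "RANDOM", "TOP" or "EXTREME".
    """
    criterion = np.asarray(criterion, dtype=float)
    n = criterion.size
    k = int(round(p * n))            # number of candidates to genotype
    order = np.argsort(-criterion)   # candidate indices, best first
    if strategy == "RANDOM":
        rng = np.random.default_rng() if rng is None else rng
        chosen = rng.choice(n, size=k, replace=False)
    elif strategy == "TOP":
        chosen = order[:k]
    elif strategy == "EXTREME":
        # Assumption: split k evenly between both tails;
        # if k is odd, the extra candidate comes from the best tail.
        n_best = (k + 1) // 2
        n_worst = k - n_best
        chosen = np.concatenate([order[:n_best], order[n - n_worst:]])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return np.sort(chosen)

# Example: six candidates, half of them genotyped
crit = [0.1, 2.0, -1.5, 0.7, 1.2, -0.3]
top = choose_candidates_to_genotype(crit, 0.5, "TOP")
extreme = choose_candidates_to_genotype(crit, 0.5, "EXTREME")
```

Under TOP, every genotyped animal is a likely selection candidate, which matches the abstract's explanation of why TOP gives more gain even though RANDOM gives higher GEBV reliability.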
Affiliation(s)
- Valentina Riggio
- The Roslin Institute and R(D)SVS, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- Pau Navarro
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Chris S Haley
- The Roslin Institute and R(D)SVS, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Juan-José Arranz
- Dpto. Producción Animal, Universidad de León, 24071, León, Spain
- Ricardo Pong-Wong
- The Roslin Institute and R(D)SVS, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
|
49
|
Estimating genetic variance contributed by a quantitative trait locus: A random model approach. PLoS Comput Biol 2022; 18:e1009923. [PMID: 35275920 PMCID: PMC8942241 DOI: 10.1371/journal.pcbi.1009923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 03/23/2022] [Accepted: 02/13/2022] [Indexed: 11/20/2022]
Abstract
Detecting quantitative trait loci (QTL) and estimating QTL variances (represented by the squared QTL effects) are two main goals of QTL mapping and genome-wide association studies (GWAS). However, there are issues associated with estimated QTL variances and such issues have not attracted much attention from the QTL mapping community. Estimated QTL variances are usually biased upwards due to estimation being associated with significance tests. The phenomenon is called the Beavis effect. However, estimated variances of QTL without significance tests can also be biased upwards, which cannot be explained by the Beavis effect; rather, this bias is due to the fact that QTL variances are often estimated as the squares of the estimated QTL effects. The parameters are the QTL effects and the estimated QTL variances are obtained by squaring the estimated QTL effects. This square transformation failed to incorporate the errors of estimated QTL effects into the transformation. The consequence is biases in estimated QTL variances. To correct the biases, we can either reformulate the QTL model by treating the QTL effect as random and directly estimate the QTL variance (as a variance component) or adjust the bias by taking into account the error of the estimated QTL effect. A moment method of estimation has been proposed to correct the bias. The method has been validated via Monte Carlo simulation studies. The method has been applied to QTL mapping for the 10-week-body-weight trait from an F2 mouse population. One of the goals of QTL mapping and GWAS is to quantify the size of a QTL, which is measured by the QTL variance or the proportion of trait variance explained by the QTL. The effect of a QTL appears in a linear or linear mixed model as a regression coefficient and defined as a fixed effect. The estimated QTL variance in conventional QTL mapping studies takes the square of the estimated QTL effect. This is a biased estimate of QTL variance. 
An unbiased estimate of the QTL variance should be obtained by (1) treating the QTL effect as random and estimating the variance of the random effect or (2) adjusting the squared estimated QTL effect by the squared estimation error. We proved that the two methods are identical. We further proved that the usual R2 (goodness of fit) in regression analysis is equivalent to the biased QTL heritability while the adjusted R2 is equivalent to the bias corrected QTL heritability.
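The upward bias described in this abstract follows from a standard identity: for an unbiased effect estimate, E[b_hat^2] = b^2 + Var(b_hat), so the squared estimate overshoots b^2 by the sampling variance of the estimate. The Monte Carlo sketch below illustrates this under stated assumptions: a single F2-style marker coded -1/0/1, unit residual variance, ordinary least squares, and QTL variance taken as the squared effect as in the abstract. The function name and parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def simulate_qtl_variance_bias(beta=0.2, n=200, n_reps=2000, seed=1):
    """Return (mean naive estimate, mean corrected estimate) of beta**2."""
    rng = np.random.default_rng(seed)
    naive = np.empty(n_reps)
    corrected = np.empty(n_reps)
    for r in range(n_reps):
        # Assumed F2 genotype coding with 1:2:1 segregation
        x = rng.choice([-1.0, 0.0, 1.0], size=n, p=[0.25, 0.5, 0.25])
        y = beta * x + rng.normal(0.0, 1.0, size=n)
        xc = x - x.mean()
        yc = y - y.mean()
        sxx = xc @ xc
        b_hat = (xc @ yc) / sxx               # OLS estimate of the QTL effect
        resid = yc - b_hat * xc
        se2 = (resid @ resid) / (n - 2) / sxx  # squared standard error of b_hat
        naive[r] = b_hat ** 2                  # conventional (biased) estimate
        corrected[r] = b_hat ** 2 - se2        # squared effect minus squared error
    return naive.mean(), corrected.mean()

naive_mean, corrected_mean = simulate_qtl_variance_bias()
```

With these settings the true squared effect is 0.04 and the sampling variance of the estimate is roughly 0.01, so the naive average should sit near 0.05 while the corrected average should sit near 0.04. Individual corrected estimates can be negative, which is a known property of this kind of moment correction.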
|
50
|
Yang L, Qu Q, Hao Z, Sha K, Li Z, Li S. Powerful Identification of Large Quantitative Trait Loci Using Genome-wide R/glmnet-Based Regression. J Hered 2022; 113:472-478. [PMID: 35134967 DOI: 10.1093/jhered/esac006] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/11/2021] [Accepted: 02/02/2022] [Indexed: 11/14/2022]
Abstract
R/glmnet has been successfully applied to jointly map multiple quantitative trait loci in linkage analysis, with statistical inference for candidate quantitative trait loci with non-zero genetic effects carried out using R/lm for normally distributed traits, R/glm for discrete traits, and R/coxph for survival times. In this study, we extended R/glmnet to genome-wide association studies by means of parallel computation. A multi-locus genome-wide association study for high-throughput single nucleotide polymorphisms was implemented in the "Multi-Runking" software written within the R workspace. This software can detect common and large quantitative trait nucleotides better, and estimate them more accurately, than genome-wide mixed model analysis of one single nucleotide polymorphism at a time and than the linear mixed model-least absolute shrinkage and selection operator. Its applicability and utility were demonstrated by multi-locus genome-wide association studies of simulated and real normally distributed traits, binary traits, and survival times.
Affiliation(s)
- Li'ang Yang
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Qiannan Qu
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Zhiyu Hao
- College of Animal Science and Technology, Northeast Agricultural University, Harbin 150030, China
- Ke Sha
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Ziyu Li
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Shuling Li
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
|