Reference Citation Analysis

For:	Gianola D. Priors in whole-genome regression: the Bayesian alphabet returns. Genetics 2013;194:573-96. [PMID: 23636739 DOI: 10.1534/genetics.113.151753] [Citation(s) in RCA: 265] [Impact Index Per Article: 24.1] [Indexed: 01/17/2023] Open

Cited by Other Article(s)

Wang X, Zhang Z, Du H, Pfeiffer C, Mészáros G, Ding X. Predictive ability of multi-population genomic prediction methods of phenotypes for reproduction traits in Chinese and Austrian pigs. Genet Sel Evol 2024;56:49. [PMID: 38926647 PMCID: PMC11201905 DOI: 10.1186/s12711-024-00915-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 05/30/2024] [Indexed: 06/28/2024] Open

Abstract

BACKGROUND

Multi-population genomic prediction can rapidly expand the size of the reference population and improve genomic prediction ability. Machine learning (ML) algorithms have shown advantages in single-population genomic prediction of phenotypes. However, few studies have explored the effectiveness of ML methods for multi-population genomic prediction.

RESULTS

In this study, 3720 Yorkshire pigs from Austria and four breeding farms in China were used, and single-trait genomic best linear unbiased prediction (ST-GBLUP), multitrait GBLUP (MT-GBLUP), Bayesian Horseshoe (BayesHE), and three ML methods (support vector regression (SVR), kernel ridge regression (KRR) and AdaBoost.R2) were compared to explore the optimal method for joint genomic prediction of phenotypes of Chinese and Austrian pigs through 10 replicates of fivefold cross-validation. We tested the performance of the different methods in two scenarios: (i) including only one Austrian population and one Chinese pig population that were genetically linked based on principal component analysis (PCA) (designated as the "two-population scenario") and (ii) adding reference populations that are unrelated based on PCA to the above two populations (designated as the "multi-population scenario"). Our results show that the use of MT-GBLUP in the two-population scenario resulted in an improvement of 7.1% in predictive ability compared to ST-GBLUP, while the use of SVR and KRR yielded improvements in predictive ability of 4.5 and 5.3%, respectively, compared to MT-GBLUP. SVR and KRR also yielded lower mean square errors (MSE) in most population and trait combinations. In the multi-population scenario, improvements in predictive ability of 29.7, 24.4 and 11.1% were obtained compared to ST-GBLUP when using, respectively, SVR, KRR, and AdaBoost.R2. However, compared to MT-GBLUP, the potential of ML methods to improve predictive ability was not demonstrated.

CONCLUSIONS

Our study demonstrates that ML algorithms can achieve better prediction performance than multitrait GBLUP models in multi-population genomic prediction of phenotypes when the populations have similar genetic backgrounds; however, when reference populations that are unrelated based on PCA are added, the ML methods did not show a benefit. When the number of populations increased, only MT-GBLUP improved predictive ability in both validation populations, while the other methods showed improvement in only one population.

Li X, Chen X, Wang Q, Yang N, Sun C. Integrating Bioinformatics and Machine Learning for Genomic Prediction in Chickens. Genes (Basel) 2024;15:690. [PMID: 38927626 PMCID: PMC11202573 DOI: 10.3390/genes15060690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 05/12/2024] [Accepted: 05/23/2024] [Indexed: 06/28/2024] Open

Li C, Yang Q, Liu B, Shi X, Liu Z, Yang C, Wang T, Xiao F, Zhang M, Shi A, Yan L. Ability of Genomic Prediction to Bi-Parent-Derived Breeding Population Using Public Data for Soybean Oil and Protein Content. Plants (Basel) 2024;13:1260. [PMID: 38732474 PMCID: PMC11085238 DOI: 10.3390/plants13091260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 04/21/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]

Abstract

Genomic selection (GS) is a marker-based selection method used to improve the genetic gain of quantitative traits in plant breeding. A large number of breeding datasets are available in the soybean database, and the application of these public datasets in GS will improve breeding efficiency and reduce time and cost. However, the most important problem to be solved is how to improve the ability of across-population prediction. The objectives of this study were to perform genomic prediction (GP) and estimate the prediction ability (PA) for seed oil and protein contents in soybean using available public datasets to predict breeding populations in current, ongoing breeding programs. In this study, six public datasets of USDA GRIN soybean germplasm accessions with available phenotypic data of seed oil and protein contents from different experimental populations and their genotypic data of single-nucleotide polymorphisms (SNPs) were used to perform GP and to predict a bi-parent-derived breeding population in our experiment. The average PA was 0.55 and 0.50 for seed oil and protein contents, respectively, within the bi-parent-derived population according to the within-population prediction, and 0.45 for oil and 0.39 for protein content when the six USDA populations were combined and employed as training sets to predict the bi-parent-derived population. The results showed that four USDA-cultivated populations can be used as a training set individually or combined to predict oil and protein contents in GS when using 800 or more USDA germplasm accessions as a training set. The smaller the genetic distance between training population and testing population, the higher the PA. The PA increased as the population size increased. In across-population prediction, no significant difference was observed in PA for oil and protein content among different models. The PA increased as the SNP number increased until the marker set reached 10,000 SNPs. This study provides reasonable suggestions and methods for breeders to utilize public datasets for GS. It will aid breeders in developing GS-assisted breeding strategies to develop elite soybean cultivars with high oil and protein contents.

Affiliation(s)

Chenhui Li College of Life Sciences, Hebei Agricultural University, Baoding 071001, China; Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Qing Yang Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Bingqiang Liu Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Xiaolei Shi Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Zhi Liu Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Chunyan Yang Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Tao Wang Handan Academy of Agricultural Science, Handan 056001, China
Fuming Xiao Handan Academy of Agricultural Science, Handan 056001, China
Mengchen Zhang Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Ainong Shi Department of Horticulture, University of Arkansas, Fayetteville, AR 72701, USA
Long Yan Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China

Mota LFM, Giannuzzi D, Pegolo S, Sturaro E, Gianola D, Negrini R, Trevisi E, Ajmone Marsan P, Cecchinato A. Genomic prediction of blood biomarkers of metabolic disorders in Holstein cattle using parametric and nonparametric models. Genet Sel Evol 2024;56:31. [PMID: 38684971 PMCID: PMC11057143 DOI: 10.1186/s12711-024-00903-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 04/12/2024] [Indexed: 05/02/2024] Open

Abstract

BACKGROUND

Metabolic disturbances adversely impact productive and reproductive performance of dairy cattle due to changes in endocrine status and immune function, which increase the risk of disease. This may occur in the post-partum phase, but also throughout lactation, with sub-clinical symptoms. Recently, increased attention has been directed towards improved health and resilience in dairy cattle, and genomic selection (GS) could be a helpful tool for selecting animals that are more resilient to metabolic disturbances throughout lactation. Hence, we evaluated the genomic prediction of serum biomarker levels for metabolic distress in 1353 Holsteins genotyped with the 100K single nucleotide polymorphism (SNP) chip assay. The GS was evaluated using the parametric models genomic best linear unbiased prediction (GBLUP), Bayesian B (BayesB), and elastic net (ENET), and the nonparametric models gradient boosting machine (GBM) and stacking ensemble (Stack), which combines the ENET and GBM approaches.

RESULTS

The results show that the Stack approach outperformed other methods with a relative difference (RD), calculated as an increment in prediction accuracy, of approximately 18.0% compared to GBLUP, 12.6% compared to BayesB, 8.7% compared to ENET, and 4.4% compared to GBM. The highest RD in prediction accuracy between other models with respect to GBLUP was observed for haptoglobin (hapto) from 17.7% for BayesB to 41.2% for Stack; for Zn from 9.8% (BayesB) to 29.3% (Stack); for ceruloplasmin (CuCp) from 9.3% (BayesB) to 27.9% (Stack); for ferric reducing antioxidant power (FRAP) from 8.0% (BayesB) to 40.0% (Stack); and for total protein (PROTt) from 5.7% (BayesB) to 22.9% (Stack). Using a subset of top SNPs (1.5k) selected from the GBM approach improved the accuracy for GBLUP from 1.8 to 76.5%. However, for the other models reductions in prediction accuracy of 4.8% for ENET (average of 10 traits), 5.9% for GBM (average of 21 traits), and 6.6% for Stack (average of 16 traits) were observed.

CONCLUSIONS

Our results indicate that the Stack approach was more accurate in predicting metabolic disturbances than GBLUP, BayesB, ENET, and GBM and seemed to be competitive for predicting complex phenotypes with various degrees of mode of inheritance, i.e. additive and non-additive effects. Selecting markers based on GBM improved accuracy of GBLUP.

Affiliation(s)

Lucio F M Mota Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
Diana Giannuzzi Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
Sara Pegolo Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
Enrico Sturaro Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
Daniel Gianola Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, 53706, USA
Riccardo Negrini Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
Erminio Trevisi Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy; Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
Paolo Ajmone Marsan Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy; Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
Alessio Cecchinato Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy

Kjetså MV, Gjuvsland AB, Grindflek E, Meuwissen T. Effects of reference population size and structure on genomic prediction of maternal traits in two pig lines using whole-genome sequence-, high-density- and combined annotation-dependent depletion genotypes. J Anim Breed Genet 2024. [PMID: 38564181 DOI: 10.1111/jbg.12865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 03/14/2024] [Accepted: 03/16/2024] [Indexed: 04/04/2024]

Abstract

The aim of this study was to investigate the reference population size required to obtain substantial prediction accuracy within- and across-lines and the effect of using a multi-line reference population for genomic predictions of maternal traits in pigs. The data consisted of two nucleus pig populations, one pure-bred Landrace (L) and one Synthetic (S) Yorkshire/Large White line. Up to 30 K animals were genotyped in each line, and all had records on maternal traits. Prediction accuracy was tested with three different marker data sets: high-density SNP (HD), whole genome sequence (WGS), and markers derived from WGS based on pig combined annotation dependent depletion-score (pCADD). Also, two different genomic prediction methods (GBLUP and BayesGC) were compared for four maternal traits: total number of piglets born (TNB), total number of stillborn piglets (STB), Shoulder Lesion Score and Body Condition Score. The main results from this study showed that a reference population of 3 K-6 K animals for within-line prediction generally was sufficient to achieve high prediction accuracy. However, when the number of animals in the reference population was increased to 30 K, the prediction accuracy significantly increased for the traits TNB and STB. For multi-line prediction, the accuracy was most dependent on the number of within-line animals in the reference data. The S-line provided a generally higher prediction accuracy compared to the L-line. Using pCADD scores to reduce the number of markers from WGS data in combination with the GBLUP method generally reduced prediction accuracies relative to GBLUP using HD genotypes. The BayesGC method benefited from a large reference population and was less dependent on the different genotype marker datasets to achieve a high prediction accuracy.

Hong JK, Kim YM, Cho ES, Lee JB, Kim YS, Park HB. Application of deep learning with bivariate models for genomic prediction of sow lifetime productivity-related traits. Anim Biosci 2024;37:622-630. [PMID: 38228129 PMCID: PMC10915216 DOI: 10.5713/ab.23.0264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 08/31/2023] [Accepted: 11/03/2023] [Indexed: 01/18/2024] Open

Abstract

OBJECTIVE

Pig breeders cannot obtain phenotypic information at the time of selection for sow lifetime productivity (SLP). They would benefit from obtaining genetic information of candidate sows. Genomic data interpreted using deep learning (DL) techniques could contribute to the genetic improvement of SLP to maximize farm profitability because DL models capture nonlinear genetic effects such as dominance and epistasis more efficiently than conventional genomic prediction methods based on linear models. This study aimed to investigate the usefulness of DL for the genomic prediction of two SLP-related traits: lifetime number of litters (LNL) and lifetime pig production (LPP).

METHODS

Two bivariate DL models, convolutional neural network (CNN) and local convolutional neural network (LCNN), were compared with conventional bivariate linear models (i.e., genomic best linear unbiased prediction, Bayesian ridge regression, Bayes A, and Bayes B). Phenotype and pedigree data were collected from 40,011 sows that had husbandry records. Among these, 3,652 pigs were genotyped using the PorcineSNP60K BeadChip.

RESULTS

The best predictive correlation for LNL was obtained with CNN (0.28), followed by LCNN (0.26) and conventional linear models (approximately 0.21). For LPP, the best predictive correlation was also obtained with CNN (0.29), followed by LCNN (0.27) and conventional linear models (approximately 0.25). A similar trend was observed with the mean squared error of prediction for the SLP traits.

CONCLUSION

This study provides an example of a CNN that can outperform linear model-based genomic prediction approaches when nonlinear interaction components are important, because LNL and LPP exhibited strong epistatic interaction components. Additionally, our results suggest that applying bivariate DL models could also contribute to the prediction accuracy by utilizing the genetic correlation between LNL and LPP.

Mota LFM, Arikawa LM, Santos SWB, Fernandes Júnior GA, Alves AAC, Rosa GJM, Mercadante MEZ, Cyrillo JNSG, Carvalheiro R, Albuquerque LG. Benchmarking machine learning and parametric methods for genomic prediction of feed efficiency-related traits in Nellore cattle. Sci Rep 2024;14:6404. [PMID: 38493207 PMCID: PMC10944497 DOI: 10.1038/s41598-024-57234-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 03/15/2024] [Indexed: 03/18/2024] Open

Aalborg T, Sverrisdóttir E, Kristensen HT, Nielsen KL. The effect of marker types and density on genomic prediction and GWAS of key performance traits in tetraploid potato. Front Plant Sci 2024;15:1340189. [PMID: 38525152 PMCID: PMC10957621 DOI: 10.3389/fpls.2024.1340189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 02/14/2024] [Indexed: 03/26/2024]

Abstract

Genomic prediction and genome-wide association studies are becoming widely employed in potato key performance trait QTL identifications and to support potato breeding using genomic selection. Elite cultivars are tetraploid and highly heterozygous but also share many common ancestors and generation-spanning inbreeding events, resulting from the clonal propagation of potatoes through seed potatoes. Consequently, many SNP markers are not in a 1:1 relationship with a single allele variant but shared over several alleles that might exert varying effects on a given trait. The impact of such redundant "diluted" predictors on the statistical models underpinning genome-wide association studies (GWAS) and genomic prediction has scarcely been evaluated despite the potential impact on model accuracy and performance. We evaluated the impact of marker location, marker type, and marker density on the genomic prediction and GWAS of five key performance traits in tetraploid potato (chipping quality, dry matter content, length/width ratio, senescence, and yield). A 762-offspring panel of a diallel cross of 18 elite cultivars was genotyped by sequencing, and markers were annotated according to a reference genome. Genomic prediction models (GBLUP) were trained on four marker subsets [non-synonymous (29,553 SNPs), synonymous (31,229), non-coding (32,388), and a combination], and robustness to marker reduction was investigated. Single-marker regression GWAS was performed for each trait and marker subset. The best cross-validated prediction correlation coefficients of 0.54, 0.75, 0.49, 0.35, and 0.28 were obtained for chipping quality, dry matter content, length/width ratio, senescence, and yield, respectively. The trait prediction abilities were similar across all marker types, with only non-synonymous variants improving yield predictive ability by 16%. Marker reduction response did not depend on marker type but rather on trait. Traits with high predictive abilities, e.g., dry matter content, reached a plateau using fewer markers than traits with intermediate-low correlations, such as yield. The predictions were unbiased across all traits, marker types, and all marker densities >100 SNPs. Our results suggest that using non-synonymous variants does not enhance the performance of genomic prediction of most traits. The major known QTLs were identified by GWAS and were reproducible across exonic and whole-genome variant sets for dry matter content, length/width ratio, and senescence. In contrast, minor QTL detection was marker type dependent.

van Eijck CWF, Sabroso-Lasa S, Strijk GJ, Mustafa DAM, Fellah A, Koerkamp BG, Malats N, van Eijck CHJ. A liquid biomarker signature of inflammatory proteins accurately predicts early pancreatic cancer progression during FOLFIRINOX chemotherapy. Neoplasia 2024;49:100975. [PMID: 38335839 PMCID: PMC10873733 DOI: 10.1016/j.neo.2024.100975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 01/31/2024] [Indexed: 02/12/2024]

Abstract

BACKGROUND

Pancreatic ductal adenocarcinoma (PDAC) is often treated with FOLFIRINOX, a chemotherapy associated with high toxicity rates and variable efficacy. Therefore, it is crucial to identify patients at risk of early progression during treatment. This study aims to explore the potential of a multi-omics biomarker for predicting early PDAC progression by employing an in-depth mathematical modeling approach.

METHODS

Blood samples were collected from 58 PDAC patients undergoing FOLFIRINOX before and after the first cycle. These samples underwent gene (GEP) and inflammatory protein expression profiling (IPEP). We explored the predictive potential of exclusively IPEP through Stepwise (Backward) Multivariate Logistic Regression modeling. Additionally, we integrated GEP and IPEP using Bayesian Kernel Regression modeling, aiming to enhance predictive performance. Ultimately, the FOLFIRINOX IPEP (FFX-IPEP) signature was developed.

RESULTS

Our findings revealed that proteins exhibited higher predictive accuracy than genes. Consequently, the FFX-IPEP signature consisted of six proteins: AMN, BANK1, IL1RL2, ITGB6, MYO9B, and PRSS8. The signature effectively identified patients transitioning from disease control to progression early during FOLFIRINOX, achieving remarkable predictive accuracy with an AUC of 0.89 in an independent test set. Importantly, the FFX-IPEP signature outperformed the conventional CA19-9 tumor marker.

CONCLUSIONS

Our six-protein FFX-IPEP signature holds solid potential as a liquid biomarker for the early prediction of PDAC progression during toxic FOLFIRINOX chemotherapy. Further validation in an external cohort is crucial to confirm the utility of the FFX-IPEP signature. Future studies should expand to predict progression under different chemotherapies to enhance the guidance of personalized treatment selection in PDAC.

Chiaravallotti I, Lin J, Arief V, Jahufer Z, Osorno JM, McClean P, Jarquin D, Hoyos-Villegas V. Simulations of multiple breeding strategy scenarios in common bean for assessing genomic selection accuracy and model updating. Plant Genome 2024;17:e20388. [PMID: 38317595 DOI: 10.1002/tpg2.20388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 07/24/2023] [Accepted: 08/20/2023] [Indexed: 02/07/2024]

Abstract

The aim of this study was to evaluate the accuracy of the ridge regression best linear unbiased prediction model across different traits, parent population sizes, and breeding strategies when estimating breeding values in common bean (Phaseolus vulgaris). Genomic selection was implemented to make selections within a breeding cycle and compared across five different breeding strategies (single seed descent, mass selection, pedigree method, modified pedigree method, and bulk breeding) following 10 breeding cycles. The model was trained on a simulated population of recombinant inbreds genotyped for 1010 single nucleotide polymorphism markers including 38 known quantitative trait loci (QTL) identified in the literature. These QTL included 11 for seed yield, eight for white mold disease incidence, and 19 for days to flowering. Simulation results revealed that realized accuracies fluctuate depending on the factors investigated: trait genetic architecture, breeding strategy, and the number of initial parents used to begin the first breeding cycle. Trait architecture and breeding strategy appeared to have a larger impact on accuracy than the initial number of parents. Generally, maximum accuracies (in terms of the correlation between true and estimated breeding value) were consistently achieved under a mass selection strategy, pedigree method, and single seed descent method depending on the simulation parameters being tested. This study also investigated model updating, which involves retraining the prediction model with a new set of genotypes and phenotypes that have a closer relation to the population being tested. While it has been repeatedly shown that model updating generally improves prediction accuracy, it benefited some breeding strategies more than others. For low heritability traits (e.g., yield), conventional phenotype-based selection methods showed consistent rates of genetic gain, but genetic gain under genomic selection reached a plateau after fewer cycles. This plateauing is likely a result of faster fixation of alleles and diminishing genetic variance when selections are made based on estimated breeding value as opposed to phenotype.

Dong L, Xie Y, Zhang Y, Wang R, Sun X. Genomic dissection of additive and non-additive genetic effects and genomic prediction in an open-pollinated family test of Japanese larch. BMC Genomics 2024;25:11. [PMID: 38166605 PMCID: PMC10759612 DOI: 10.1186/s12864-023-09891-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 12/11/2023] [Indexed: 01/05/2024] Open

Alboali H, Moradi MH, Khaltabadi Farahani AH, Mohammadi H. Genome-wide association study for body weight and feed consumption traits in Japanese quail using Bayesian approaches. Poult Sci 2024;103:103208. [PMID: 37980758 PMCID: PMC10663954 DOI: 10.1016/j.psj.2023.103208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 10/12/2023] [Accepted: 10/13/2023] [Indexed: 11/21/2023] Open

Abstract

The aim of this study was to perform a genome-wide association study (GWAS) based on Bayes A and Bayes B statistical methods to identify genomic loci and candidate genes associated with body weight gain, feed intake, and feed conversion ratio in Japanese quail. For this purpose, genomic data obtained from the Illumina iSelect 4K quail SNP chip were utilized. After implementing various quality control steps, genotype data from a total of 875 birds for 2,015 SNP markers were used for subsequent analyses. The Bayesian analyses were performed using the hibayes package in R (version 4.3.1) with the Gibbs sampling algorithm. The results of the analyses showed that Bayes A accounted for 11.43, 11.65, and 11.39% of the phenotypic variance for body weight gain, feed intake, and feed conversion ratio, respectively, while the variance explained by Bayes B was 7.02, 8.61, and 6.48%, respectively. Therefore, in the current study, results obtained from Bayes A were used for further analyses. In order to perform the gene enrichment analysis and to identify the functional pathways and classes of genes that are over-represented in a large set of genes associated with each trait, all markers that accounted for more than 0.1% of the phenotypic variance for each trait were used. The results of this analysis revealed a total of 23, 38, and 14 SNP markers associated with body weight gain, feed intake, and feed conversion ratio in Japanese quail, respectively. The results of the gene enrichment analysis led to the identification of biological pathways (and candidate genes) related to lipid phosphorylation (TTC7A gene) and cell junction (FGFR4 and FLRT2 genes) associated with body weight gain, calcium signaling pathway (ADCY2 and CAMK1D genes) associated with feed intake, and glycerolipid metabolic process (LIPC gene), lipid metabolic process (ADGRF5 and ESR1 genes), and glutathione transferase (GSTK1 gene) associated with feed conversion ratio. Overall, the findings of this study can provide valuable insights into the genetic architecture of growth and feed consumption traits in Japanese quail.

Azevedo CF, Ferrão LFV, Benevenuto J, de Resende MDV, Nascimento M, Nascimento ACC, Munoz PR. Using visual scores for genomic prediction of complex traits in breeding programs. Theor Appl Genet 2023;137:9. [PMID: 38102495 DOI: 10.1007/s00122-023-04512-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023]

Abstract

KEY MESSAGE

An approach for handling visual scores with potential errors and subjectivity in scores was evaluated in simulated and blueberry recurrent selection breeding schemes to assist breeders in their decision-making.

Most genomic prediction methods are based on assumptions of normality due to their simplicity and ease of implementation. However, in plant and animal breeding, continuous traits are often visually scored as categorical traits and analyzed as Gaussian variables, thus violating the normality assumption, which could affect the prediction of breeding values and the estimation of genetic parameters. In this study, we examined the main challenges of visual scores for genomic prediction and genetic parameter estimation using mixed models, Bayesian, and machine learning methods. We evaluated these approaches using simulated and real breeding data sets. Our contribution in this study is a five-fold demonstration: (i) collecting data using an intermediate number of categories (1-3 and 1-5) is the best strategy, even considering errors associated with visual scores; (ii) Linear Mixed Models and Bayesian Linear Regression are robust to the normality violation, but marginal gains can be achieved when using Bayesian Ordinal Regression Models (BORM) and Random Forest Classification; (iii) genetic parameters are better estimated using BORM; (iv) our conclusions using simulated data are also applicable to real data in autotetraploid blueberry; and (v) a comparison of continuous and categorical phenotypes found that investing in the evaluation of 600-1000 categorical data points with low error, when it is not feasible to collect continuous phenotypes, is a strategy for improving predictive abilities. Our findings suggest the best approaches for effectively using visual score traits to explore genetic information in breeding programs and highlight the importance of investing in the training of evaluator teams and in high-quality phenotyping.

Doran BA, Chen RY, Giba H, Behera V, Barat B, Sundararajan A, Lin H, Sidebottom A, Pamer EG, Raman AS. An evolution-based framework for describing human gut bacteria. bioRxiv 2023:2023.12.04.569969. [PMID: 38105970 PMCID: PMC10723311 DOI: 10.1101/2023.12.04.569969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]

Meher PK, Gupta A, Rustgi S, Mir RR, Kumar A, Kumar J, Balyan HS, Gupta PK. Evaluation of eight Bayesian genomic prediction models for three micronutrient traits in bread wheat (Triticum aestivum L.). Plant Genome 2023;16:e20332. [PMID: 37122189 DOI: 10.1002/tpg2.20332] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/21/2022] [Revised: 02/21/2023] [Accepted: 03/13/2023] [Indexed: 06/19/2023]

Tanaka R, Wu D, Li X, Tibbs-Cortes LE, Wood JC, Magallanes-Lundback M, Bornowski N, Hamilton JP, Vaillancourt B, Li X, Deason NT, Schoenbaum GR, Buell CR, DellaPenna D, Yu J, Gore MA. Leveraging prior biological knowledge improves prediction of tocochromanols in maize grain. Plant Genome 2023;16:e20276. [PMID: 36321716 DOI: 10.1002/tpg2.20276] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]

Affiliation(s)

Ryokei Tanaka Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
Di Wu Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
Xiaowei Li Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
Laura E Tibbs-Cortes Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
Joshua C Wood Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
Maria Magallanes-Lundback Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
Nolan Bornowski Dep. of Plant Biology, Michigan State Univ., East Lansing, MI, 48824, USA
John P Hamilton Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
Brieanne Vaillancourt Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
Xianran Li USDA ARS, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA, 99164, USA
Nicholas T Deason Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
Gregory R Schoenbaum Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
C Robin Buell Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
Dean DellaPenna Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
Jianming Yu Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
Michael A Gore Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA

Warburton CL, Costilla R, Engle BN, Moore SS, Corbet NJ, Fordyce G, McGowan MR, Burns BM, Hayes BJ. Concurrently mapping quantitative trait loci associations from multiple subspecies within hybrid populations. Heredity (Edinb) 2023;131:350-360. [PMID: 37798326 PMCID: PMC10673866 DOI: 10.1038/s41437-023-00651-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 09/19/2023] [Accepted: 09/21/2023] [Indexed: 10/07/2023] Open

Singh V, Krause M, Sandhu D, Sekhon RS, Kaundal A. Salinity stress tolerance prediction for biomass-related traits in maize (Zea mays L.) using genome-wide markers. Plant Genome 2023;16:e20385. [PMID: 37667417 DOI: 10.1002/tpg2.20385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 07/18/2023] [Accepted: 08/14/2023] [Indexed: 09/06/2023]

Akutsu H, Na’iem M, Widiyatno, Indrioko S, Sawitri, Purnomo S, Uchiyama K, Tsumura Y, Tani N. Comparing modeling methods of genomic prediction for growth traits of a tropical timber species, Shorea macrophylla. Front Plant Sci 2023;14:1241908. [PMID: 38023878 PMCID: PMC10644202 DOI: 10.3389/fpls.2023.1241908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 09/13/2023] [Indexed: 12/01/2023]

Abstract

Introduction

Shorea macrophylla is a commercially important tropical tree species grown for timber and oil. It is amenable to plantation forestry due to its fast initial growth. Genomic selection (GS) has been used in tree breeding studies to shorten long breeding cycles but has not previously been applied to S. macrophylla.

Methods

To build genomic prediction models for GS, leaves and growth trait data were collected from a half-sib progeny population of S. macrophylla in Sari Bumi Kusuma forest concession, central Kalimantan, Indonesia. A total of 18,037 SNP markers were identified in two ddRAD-seq libraries. Genomic prediction models based on these SNPs were then generated for diameter at breast height and total height in the 7th year from planting (D7 and H7).

Results and discussion

These traits were chosen because of their relatively high narrow-sense genomic heritability and because seven years was considered long enough to assess initial growth. Genomic prediction models were built using six methods and their derivatives with the full set of identified SNPs and subsets of 48, 96, and 192 SNPs selected based on the results of a genome-wide association study (GWAS). The GBLUP and RKHS methods gave the highest predictive ability for D7 and H7 with the sets of selected SNPs and showed that D7 has an additive genetic architecture while H7 has an epistatic genetic architecture. LightGBM and CNN1D also achieved high predictive abilities for D7 with 48 and 96 selected SNPs, and for H7 with 96 and 192 selected SNPs, showing that gradient boosting decision trees and deep learning can be useful in genomic prediction. Predictive abilities for H7 were higher when smaller subsets of SNPs selected by GWAS p-value were used. However, D7 showed the opposite tendency, which might have originated from the difference in genetic architecture between primary and secondary growth of the species. This study suggests that GS with GWAS-based SNP selection can be used in breeding for non-cultivated tree species to improve initial growth and reduce genotyping costs for next-generation seedlings.

Weber SE, Frisch M, Snowdon RJ, Voss-Fels KP. Haplotype blocks for genomic prediction: a comparative evaluation in multiple crop datasets. Front Plant Sci 2023;14:1217589. [PMID: 37731980 PMCID: PMC10507710 DOI: 10.3389/fpls.2023.1217589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 08/21/2023] [Indexed: 09/22/2023]

Abstract

In modern plant breeding, genomic selection is becoming the gold standard for selection of superior genotypes. The basis for genomic prediction models is a set of phenotyped lines along with their genotypic profile. With high marker density and linkage disequilibrium (LD) between markers, genotype data in breeding populations tend to exhibit considerable redundancy. Therefore, interest is growing in the use of haplotype blocks to overcome redundancy by summarizing co-inherited features. Moreover, haplotype blocks can help to capture local epistasis caused by interacting loci. Here, we compared genomic prediction methods that used either single SNPs or haplotype blocks with regard to their prediction accuracy for important traits in crop datasets. We used four published datasets from canola, maize, wheat and soybean. Different approaches to construct haplotype blocks were compared, including blocks based on LD, physical distance, number of adjacent markers and the algorithms implemented in the software "Haploview" and "HaploBlocker". The tested prediction methods included Genomic Best Linear Unbiased Prediction (GBLUP), Extended GBLUP to account for additive by additive epistasis (EGBLUP), Bayesian LASSO and Reproducing Kernel Hilbert Space (RKHS) regression. We found improved prediction accuracy in some traits when using haplotype blocks compared to SNP-based predictions; however, the magnitude of improvement was very trait- and model-specific. Especially in settings with low marker density, haplotype blocks can improve genomic prediction accuracy. In most cases, physically large haplotype blocks yielded a strong decrease in prediction accuracy. Especially when prediction accuracy varies greatly across different prediction models, prediction based on haplotype blocks can improve prediction accuracy of underperforming models. However, there is no "best" method to build haplotype blocks, since prediction accuracy varied considerably across methods and traits. Hence, criteria used to define haplotype blocks should not be viewed as fixed biological parameters, but rather as hyperparameters that need to be adjusted for every dataset.

Morgante F, Carbonetto P, Wang G, Zou Y, Sarkar A, Stephens M. A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes. PLoS Genet 2023;19:e1010539. [PMID: 37418505 PMCID: PMC10355440 DOI: 10.1371/journal.pgen.1010539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 06/02/2023] [Indexed: 07/09/2023] Open

Canal GB, Oliveira GF, de Almeida FAN, Péres MZ, Moro GLJ, Dos Santos Oliveira WB, Azevedo CF, Nascimento M, da Silva Ferreira MF, Ferreira A. Genomic studies of the additive and dominant genetic control on production traits of Euterpe edulis fruits. Sci Rep 2023;13:9795. [PMID: 37328527 PMCID: PMC10276026 DOI: 10.1038/s41598-023-36970-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 06/13/2023] [Indexed: 06/18/2023] Open

Alemu A, Batista L, Singh PK, Ceplitis A, Chawade A. Haplotype-tagged SNPs improve genomic prediction accuracy for Fusarium head blight resistance and yield-related traits in wheat. Theor Appl Genet 2023;136:92. [PMID: 37009920 PMCID: PMC10068637 DOI: 10.1007/s00122-023-04352-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 03/21/2023] [Indexed: 06/19/2023]

Abstract

Linkage disequilibrium (LD)-based haplotyping with subsequent SNP tagging improved the genomic prediction accuracy by up to 0.07 and 0.092 for Fusarium head blight resistance and spike width, respectively, across six different models. Genomic prediction is a powerful tool to enhance genetic gain in plant breeding. However, the method is accompanied by various complications leading to low prediction accuracy. One of the major challenges arises from the complex dimensionality of marker data. To overcome this issue, we applied two pre-selection methods for SNP markers, viz. LD-based haplotype-tagging and GWAS-based trait-linked marker identification. Six different models were tested with preselected SNPs to predict the genomic estimated breeding values (GEBVs) of four traits measured in 419 winter wheat genotypes. Ten different sets of haplotype-tagged SNPs were selected by adjusting the level of LD thresholds. In addition, various sets of trait-linked SNPs were identified with different scenarios from the training-test combined and only from the training populations. The BRR and RR-BLUP models developed from haplotype-tagged SNPs had a higher prediction accuracy for FHB and SPW by 0.07 and 0.092, respectively, compared to the corresponding models developed without marker pre-selection. The highest prediction accuracy for SPW and FHB was achieved with tagged SNPs pruned at weak LD thresholds (r² < 0.5), while stringent LD was required for spike length (SPL) and flag leaf area (FLA). Trait-linked SNPs identified only from training populations failed to improve the prediction accuracy of the four studied traits. Pre-selection of SNPs via LD-based haplotype-tagging could play a vital role in optimizing genomic selection and reducing genotyping costs. Furthermore, the method could pave the way for developing low-cost genotyping methods through customized genotyping platforms targeting key SNP markers tagged to essential haplotype blocks.

Mota LFM, Giannuzzi D, Pegolo S, Trevisi E, Ajmone-Marsan P, Cecchinato A. Integrating on-farm and genomic information improves the predictive ability of milk infrared prediction of blood indicators of metabolic disorders in dairy cows. Genet Sel Evol 2023;55:23. [PMID: 37013482 PMCID: PMC10069109 DOI: 10.1186/s12711-023-00795-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 03/21/2023] [Indexed: 04/05/2023] Open

Abstract

BACKGROUND

Blood metabolic profiles can be used to assess metabolic disorders and to evaluate the health status of dairy cows. Given that these analyses are time-consuming, expensive, and stressful for the cows, there has been increased interest in Fourier transform infrared (FTIR) spectroscopy of milk samples as a rapid, cost-effective alternative for predicting metabolic disturbances. The integration of FTIR data with other layers of information such as genomic and on-farm data (days in milk (DIM) and parity) has been proposed to further enhance the predictive ability of statistical methods. Here, we developed a phenotype prediction approach for a panel of blood metabolites based on a combination of milk FTIR data, on-farm data, and genomic information recorded on 1150 Holstein cows, using BayesB and gradient boosting machine (GBM) models, with tenfold, batch-out and herd-out cross-validation (CV) scenarios.

RESULTS

The predictive ability of these approaches was measured by the coefficient of determination (R²). The results show that, compared to the model that includes only FTIR data, integration of both on-farm (DIM and parity) and genomic information with FTIR data improves the R² for blood metabolites across the three CV scenarios, especially with the herd-out CV: R² values ranged from 5.9 to 17.8% for BayesB, from 8.2 to 16.9% for GBM with the tenfold random CV, from 3.8 to 13.5% for BayesB and from 8.6 to 17.5% for GBM with the batch-out CV, and from 8.4 to 23.0% for BayesB and from 8.1 to 23.8% for GBM with the herd-out CV. Overall, with the model that includes the three sources of data, GBM was more accurate than BayesB with accuracies across the CV scenarios increasing by 7.1% for energy-related metabolites, 10.7% for liver function/hepatic damage, 9.6% for oxidative stress, 6.1% for inflammation/innate immunity, and 11.4% for mineral indicators.

CONCLUSIONS

Our results show that, compared to using only milk FTIR data, a model integrating milk FTIR spectra with on-farm and genomic information improves the prediction of blood metabolic traits in Holstein cattle and that GBM is more accurate in predicting blood metabolites than BayesB, especially for the batch-out CV and herd-out CV scenarios.

Qu J, Runcie D, Cheng H. Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits. Genetics 2023;223:6931802. [PMID: 36529897 PMCID: PMC9991502 DOI: 10.1093/genetics/iyac183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 05/06/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022] Open

Jeon D, Kang Y, Lee S, Choi S, Sung Y, Lee TH, Kim C. Digitalizing breeding in plants: A new trend of next-generation breeding based on genomic prediction. Front Plant Sci 2023;14:1092584. [PMID: 36743488 PMCID: PMC9892199 DOI: 10.3389/fpls.2023.1092584] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/08/2022] [Accepted: 01/05/2023] [Indexed: 06/18/2023]

Farooq M, van Dijk AD, Nijveen H, Mansoor S, de Ridder D. Genomic prediction in plants: opportunities for ensemble machine learning based approaches. F1000Res 2023;11:802. [PMID: 37035464 PMCID: PMC10080209 DOI: 10.12688/f1000research.122437.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/04/2023] [Indexed: 01/12/2023] Open

Lopes FB, Baldi F, Brunes LC, Oliveira E Costa MF, da Costa Eifert E, Rosa GJM, Lobo RB, Magnabosco CU. Genomic prediction for meat and carcass traits in Nellore cattle using a Markov blanket algorithm. J Anim Breed Genet 2023;140:1-12. [PMID: 36239216 DOI: 10.1111/jbg.12740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 09/22/2022] [Indexed: 12/13/2022]

Nishio M, Inoue K, Arakawa A, Ichinoseki K, Kobayashi E, Okamura T, Fukuzawa Y, Ogawa S, Taniguchi M, Oe M, Takeda M, Kamata T, Konno M, Takagi M, Sekiya M, Matsuzawa T, Inoue Y, Watanabe A, Kobayashi H, Shibata E, Ohtani A, Yazaki R, Nakashima R, Ishii K. Application of linear and machine learning models to genomic prediction of fatty acid composition in Japanese Black cattle. Anim Sci J 2023;94:e13883. [PMID: 37909231 DOI: 10.1111/asj.13883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 08/29/2023] [Accepted: 09/15/2023] [Indexed: 11/02/2023]

Affiliation(s)

Motohide Nishio Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
Keiichi Inoue National Livestock Breeding Center, Fukushima, Japan; University of Miyazaki, Miyazaki, Japan
Aisaku Arakawa Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
Kasumi Ichinoseki National Livestock Breeding Center, Fukushima, Japan
Eiji Kobayashi Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
Toshihiro Okamura Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
Yo Fukuzawa Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
Shinichiro Ogawa Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
Masaaki Taniguchi Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
Mika Oe Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan
Masayuki Takeda National Livestock Breeding Center, Fukushima, Japan
Takehiro Kamata Aomori Prefectural Industrial Technology Research Center, Tsugaru, Japan
Masaru Konno Iwate Agricultural Research Center Animal Industry Research Institute, Takizawa, Japan
Michihiro Takagi Miyagi Prefecture Animal Industry Experiment Station, Osaki, Japan
Mario Sekiya Akita Prefectural Livestock Experiment Station, Daisen, Japan
Tamotsu Matsuzawa Livestock Research Centre, Fukushima Agricultural Technology Centre, Fukushima, Japan
Yoshinobu Inoue Tottori Prefectural Livestock Research Center, Tottori, Japan
Akihiro Watanabe Shimane Prefectural Livestock Technology Center, Izumo, Japan
Hiroshi Kobayashi Institute of Animal Production Okayama Prefectural Technology Center for Agriculture, Forestry and Fisheries, Misaki, Japan
Eri Shibata Hiroshima Prefectural Technology Research Institute, Livestock Technology Research Center, Shobara, Japan
Akihumi Ohtani Yamaguchi Prefectural Agriculture and Forestry General Technology Center, Mine, Japan
Ryu Yazaki Oita Prefectural Agriculture, Forestry, and Fisheries Research Center, Takeda, Japan
Ryotaro Nakashima Cattle Breeding Development Institute of Kagoshima Prefecture, Soo, Japan
Kazuo Ishii Institute of Livestock and Grassland Science, NARO, Tsukuba, Japan

Gianola D, Fernando RL, Schön CC. Inference about quantitative traits under selection: a Bayesian revisitation for the post-genomic era. Genet Sel Evol 2022;54:78. [PMID: 36460973 PMCID: PMC9716705 DOI: 10.1186/s12711-022-00765-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 10/26/2022] [Indexed: 12/03/2022] Open

Abstract

BACKGROUND

Selection schemes distort inference when estimating differences between treatments or genetic associations between traits, and may degrade prediction of outcomes, e.g., the expected performance of the progeny of an individual with a certain genotype. If input and output measurements are not collected on random samples, inferences and predictions must be biased to some degree. Our paper revisits inference in quantitative genetics when using samples stemming from some selection process. The approach used integrates the classical notion of fitness with that of missing data. Treatment is fully Bayesian, with inference and prediction dealt with in a unified manner. While focus is on animal and plant breeding, concepts apply to natural selection as well. Examples based on real data and stylized models illustrate how selection can be accounted for in four different situations, and sometimes without success.

RESULTS

Our flexible "soft selection" setting helps to diagnose the extent to which selection can be ignored. The clear connection between probability of missingness and the concept of fitness in stylized selection scenarios is highlighted. It is not realistic to assume that a fixed selection threshold t holds in conceptual replication, as the chance of selection depends on observed and unobserved data, and on unequal amounts of information over individuals, aspects that a "soft" selection representation addresses explicitly. There does not seem to be a general prescription to accommodate potential distortions due to selection. In structures that combine cross-sectional, longitudinal and multi-trait data such as in animal breeding, balance is the exception rather than the rule. The Bayesian approach provides an integrated answer to inference, prediction and model choice under selection that goes beyond the likelihood-based approach, where breeding values are inferred indirectly.

CONCLUSIONS

The approach used here for inference and prediction under selection may or may not yield the best possible answers. One may believe that selection has been accounted for diligently, but the central problem of whether statistical inferences are good or bad does not have an unambiguous solution. On the other hand, the quality of predictions can be gauged empirically via appropriate training-testing of competing methods.

Cappa EP, Chen C, Klutsch JG, Sebastian-Azcona J, Ratcliffe B, Wei X, Da Ros L, Ullah A, Liu Y, Benowicz A, Sadoway S, Mansfield SD, Erbilgin N, Thomas BR, El-Kassaby YA. Multiple-trait analyses improved the accuracy of genomic prediction and the power of genome-wide association of productivity and climate change-adaptive traits in lodgepole pine. BMC Genomics 2022;23:536. [PMID: 35870886 PMCID: PMC9308220 DOI: 10.1186/s12864-022-08747-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Accepted: 07/08/2022] [Indexed: 11/10/2022] Open

Abstract

Background

Genomic prediction (GP) and genome-wide association (GWA) analyses are currently being employed to accelerate breeding cycles and to identify alleles or genomic regions of complex traits in forest tree species. Here, 1490 interior lodgepole pine (Pinus contorta Dougl. ex. Loud. var. latifolia Engelm) trees from four open-pollinated progeny trials were genotyped with 25,099 SNPs, and phenotyped for 15 growth, wood quality, pest resistance, drought tolerance, and defense chemical (monoterpenes) traits. The main objectives of this study were to: (1) identify genetic markers associated with these traits and determine their genetic architecture, and compare the markers detected by single- (ST) and multiple-trait (MT) GWA models; and (2) evaluate and compare the accuracy and control of bias of the genomic predictions for these traits under different ST and MT parametric and non-parametric GP methods. GWA, ST and MT analyses were compared using a linear transformation of genomic breeding values from the respective genomic best linear unbiased prediction (GBLUP) model. GP, ST and MT parametric and non-parametric (Reproducing Kernel Hilbert Spaces, RKHS) models were compared in terms of prediction accuracy (PA) and control of bias.

Results

MT-GWA analyses identified more significant associations than ST. Some SNPs showed potential pleiotropic effects. Averaging across traits, PA from the studied ST-GP models did not differ significantly from each other, with generally a slight superiority of the RKHS method. MT-GP models showed significantly higher PA (and lower bias) than the ST models, with the PA (bias) of the RKHS approach generally being significantly higher (lower) than that of GBLUP.

Conclusions

The power of GWA and the accuracy of GP were improved when MT models were used in this lodgepole pine population. Given the number of GP and GWA models fitted and the traits assessed across four progeny trials, this work has produced the most comprehensive empirical genomic study across any lodgepole pine population to date.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-022-08747-7.

Ashraf B, Hunter DC, Bérénos C, Ellis PA, Johnston SE, Pilkington JG, Pemberton JM, Slate J. Genomic prediction in the wild: A case study in Soay sheep. Mol Ecol 2022;31:6541-6555. [PMID: 34719074 DOI: 10.1111/mec.16262] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/29/2021] [Revised: 10/13/2021] [Accepted: 10/25/2021] [Indexed: 01/13/2023]

Rooney TE, Kunze KH, Sorrells ME. Genome-wide marker effect heterogeneity is associated with a large effect dormancy locus in winter malting barley. Plant Genome 2022;15:e20247. [PMID: 35971877 DOI: 10.1002/tpg2.20247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Accepted: 06/20/2022] [Indexed: 06/15/2023]

Li H, Wang Z, Xu L, Li Q, Gao H, Ma H, Cai W, Chen Y, Gao X, Zhang L, Gao H, Zhu B, Xu L, Li J. Genomic prediction of carcass traits using different haplotype block partitioning methods in beef cattle. Evol Appl 2022;15:2028-2042. [PMID: 36540636 PMCID: PMC9753827 DOI: 10.1111/eva.13491] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/20/2022] [Accepted: 09/18/2022] [Indexed: 09/22/2023] Open

Abstract

Genomic prediction (GP) based on haplotype alleles can capture quantitative trait loci (QTL) effects and increase predictive ability because the haplotypes are expected to be in linkage disequilibrium (LD) with QTL. In this study, we constructed haploblocks using LD-based methods and methods based on a fixed number of single nucleotide polymorphisms (fixed-SNP) with the Illumina BovineHD chip in beef cattle. To evaluate the performance of different haplotype block partitioning methods, we constructed haploblocks based on LD thresholds (from r² > 0.2 to r² > 0.8) and on fixed numbers of SNPs (5, 10, 20). The performance of predictive methods for three carcass traits, liveweight (LW), dressing percentage (DP), and longissimus dorsi muscle weight (LDMW), was evaluated using three approaches: GBLUP and BayesB models based on SNPs; GHBLUP and BayesBH models based on haploblocks; and GHBLUP+GBLUP and BayesBH+BayesB models based on the combination of haploblocks and the nonblocked SNPs located between blocks. We found that the LD-based and fixed-SNP haplotype Bayesian methods outperformed the SNP-based Bayesian models in accuracy (by up to 8.54 ± 7.44% and 5.74 ± 2.95%, respectively). GHBLUP showed a high improvement (up to 11.29 ± 9.87%) compared with GBLUP. The Bayesian models had higher accuracies than the BLUP models in most scenarios. The average computing time of the BayesBH+BayesB model can be reduced by 29.3% compared with the BayesB model. Prediction accuracies improved more with the LD-based haplotype method than with the fixed-SNP haplotype method. In addition, to avoid the influence of rare haplotypes generated during haplotype construction, we compared the performance of GP under four minor haplotype allele frequency (MHAF) filtering thresholds (0.01, 0.025, 0.05, and 0.1) under different conditions (LD level set at r² > 0.3, and the fixed number of SNPs set at 5). We found that the optimal MHAF threshold was 0.01 for LW and 0.025 for DP and LDMW.
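
The fixed-SNP partitioning and MHAF filtering described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the representation of phased haplotypes as 0/1 strings, the choice to keep a shorter final block, and the inclusive threshold are all assumptions made for illustration.

```python
from collections import Counter

def fixed_snp_blocks(n_snps, block_size):
    """Split SNP positions 0..n_snps-1 into consecutive (start, end) blocks.

    Assumption: a shorter final block is kept rather than dropped.
    """
    return [(s, min(s + block_size, n_snps)) for s in range(0, n_snps, block_size)]

def haplotype_allele_frequencies(haplotypes, start, end):
    """Frequency of each distinct haplotype allele within one block.

    `haplotypes` is a list of phased haplotypes, each a string of '0'/'1'.
    """
    counts = Counter(h[start:end] for h in haplotypes)
    total = len(haplotypes)
    return {allele: c / total for allele, c in counts.items()}

def filter_by_mhaf(freqs, mhaf):
    """Keep haplotype alleles whose frequency is at least `mhaf`.

    Assumption: the threshold is inclusive.
    """
    return {allele: f for allele, f in freqs.items() if f >= mhaf}

# Toy data: four haplotypes over six SNPs, partitioned into blocks of three.
toy_haplotypes = ["000111", "000111", "000110", "100111"]
blocks = fixed_snp_blocks(n_snps=6, block_size=3)
first_block_freqs = haplotype_allele_frequencies(toy_haplotypes, *blocks[0])
kept = filter_by_mhaf(first_block_freqs, mhaf=0.3)
```

In this toy example the rare allele `100` in the first block falls below the hypothetical 0.3 threshold and is dropped; the thresholds reported in the abstract (0.01 to 0.1) would be applied in the same way to real data.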

Affiliation(s)

Hongwei Li Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Zezhao Wang Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Lei Xu Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Qian Li Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Han Gao Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Haoran Ma Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Wentao Cai Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Yan Chen Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Xue Gao Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Lupei Zhang Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Huijiang Gao Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Bo Zhu Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Lingyang Xu Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Junya Li Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China

An Improved Bayesian Shrinkage Regression Algorithm for Genomic Selection. Genes (Basel) 2022;13:genes13122193. [PMID: 36553460 PMCID: PMC9778053 DOI: 10.3390/genes13122193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 11/14/2022] [Accepted: 11/18/2022] [Indexed: 11/25/2022]

Abstract

Currently a hot topic, genomic selection (GS) has consistently provided powerful support for breeding studies and achieved more comprehensive and reliable selection in animal and plant breeding. GS estimates the effects of all single nucleotide polymorphisms (SNPs) and thereby predicts the genomic estimation of breeding value (GEBV), accelerating breeding progress and overcoming the limitations of conventional breeding. The successful application of GS primarily depends on the accuracy of the GEBV. Adopting appropriate advanced algorithms to improve the accuracy of the GEBV is time-saving and efficient for breeders, and the available algorithms can be further improved in the big data era. In this study, we develop a new algorithm under the Bayesian Shrinkage Regression (BSR, which is called BayesA) framework, an improved expectation-maximization algorithm for BayesA (emBAI). The emBAI algorithm first corrects the polygenic and environmental noise and then calculates the GEBV by emBayesA. We conduct two simulation experiments and a real dataset analysis for flowering time-related Arabidopsis phenotypes to validate the new algorithm. Compared to established methods, emBAI is more powerful in terms of prediction accuracy, mean square error (MSE), mean absolute error (MAE), the area under the receiver operating characteristic curve (AUC) and correlation of prediction in simulation studies. In addition, emBAI performs well under the increasing genetic background. The analysis of the Arabidopsis real dataset further illustrates the benefits of emBAI for genomic prediction according to prediction accuracy, MSE, MAE and correlation of prediction. Furthermore, the new method shows the advantages of significant loci detection and effect coefficient estimation, which are confirmed by The Arabidopsis Information Resource (TAIR) gene bank. In conclusion, the emBAI algorithm provides powerful support for GS in high-dimensional genomic datasets.

Nazzicari N, Biscarini F. Stacked kinship CNN vs. GBLUP for genomic predictions of additive and complex continuous phenotypes. Sci Rep 2022;12:19889. [PMID: 36400808 PMCID: PMC9674857 DOI: 10.1038/s41598-022-24405-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022]

John M, Haselbeck F, Dass R, Malisi C, Ricca P, Dreischer C, Schultheiss SJ, Grimm DG. A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species. Front Plant Sci 2022;13:932512. [PMID: 36407627 PMCID: PMC9673477 DOI: 10.3389/fpls.2022.932512] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/29/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]

A divide-and-conquer approach for genomic prediction in rubber tree using machine learning. Sci Rep 2022;12:18023. [PMID: 36289298 PMCID: PMC9605989 DOI: 10.1038/s41598-022-20416-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2022] [Accepted: 09/13/2022] [Indexed: 01/20/2023]

Zuffo LT, DeLima RO, Lübberstedt T. Combining datasets for maize root seedling traits increases the power of GWAS and genomic prediction accuracies. J Exp Bot 2022;73:5460-5473. [PMID: 35608947 PMCID: PMC9467658 DOI: 10.1093/jxb/erac236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/11/2021] [Accepted: 06/06/2022] [Indexed: 05/13/2023]

A joint learning approach for genomic prediction in polyploid grasses. Sci Rep 2022;12:12499. [PMID: 35864135 PMCID: PMC9304331 DOI: 10.1038/s41598-022-16417-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/20/2022] [Accepted: 07/11/2022] [Indexed: 12/20/2022]

Farooq M, van Dijk AD, Nijveen H, Mansoor S, de Ridder D. Genomic prediction in plants: opportunities for ensemble machine learning based approaches. F1000Res 2022;11:802. [PMID: 37035464 PMCID: PMC10080209 DOI: 10.12688/f1000research.122437.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 07/08/2022] [Indexed: 12/15/2022]

Li P, Hao H, Mao X, Xu J, Lv Y, Chen W, Ge D, Zhang Z. Convolutional neural network-based applied research on the enrichment of heavy metals in the soil-rice system in China. Environ Sci Pollut Res Int 2022;29:53642-53655. [PMID: 35290576 DOI: 10.1007/s11356-022-19640-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/13/2021] [Accepted: 03/05/2022] [Indexed: 06/14/2023]

Ye H, Zhang Z, Ren D, Cai X, Zhu Q, Ding X, Zhang H, Zhang Z, Li J. Genomic Prediction Using LD-Based Haplotypes in Combined Pig Populations. Front Genet 2022;13:843300. [PMID: 35754827 PMCID: PMC9218795 DOI: 10.3389/fgene.2022.843300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2021] [Accepted: 05/02/2022] [Indexed: 11/13/2022]

Abstract

The size of the reference population is an important factor affecting genomic prediction. Thus, combining different populations in genomic prediction is an attractive way to improve prediction ability. However, simply combining multiple reference populations does not increase prediction accuracy as much as expected in pigs. This may be due to differences in linkage disequilibrium (LD) patterns between populations. In this study, we used imputed whole-genome sequencing (WGS) data to construct LD-based haplotypes for genomic prediction in a combined population, to explore the impact of different single-nucleotide polymorphism (SNP) densities, variant representation (SNPs or haplotype alleles), and reference population size on the prediction accuracy for reproduction traits. Our results showed that genomic best linear unbiased prediction (GBLUP) using the WGS data can improve prediction accuracy in multi-population but not within-population. Not only the genomic prediction accuracy of the haplotype method using 80 K chip data in multi-population but also GBLUP for the multi-population (3.4–5.9%) was higher than that within-population (1.2–4.3%). More importantly, we found that the haplotype method based on the WGS data in multi-population has better genomic prediction performance, and that building haploblocks in this scenario based on a low LD threshold (r² = 0.2–0.3) produced an optimal set of variables for reproduction traits in the Yorkshire pig population. Our results suggested that both the haplotype method based on the chip data and GBLUP (individual SNP method) based on the WGS data were beneficial for genomic prediction in multi-population, while simultaneously combining the haplotype method and WGS data was a better strategy for multi-population genomic evaluation.

Affiliation(s)

Haoqiang Ye Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
Zipeng Zhang Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
Duanyang Ren Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
Xiaodian Cai Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
Qianghui Zhu Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
Xiangdong Ding Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
Hao Zhang Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
Zhe Zhang Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China
Jiaqi Li Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China

Mancin E, Mota LFM, Tuliozi B, Verdiglione R, Mantovani R, Sartori C. Improvement of Genomic Predictions in Small Breeds by Construction of Genomic Relationship Matrix Through Variable Selection. Front Genet 2022;13:814264. [PMID: 35664297 PMCID: PMC9158133 DOI: 10.3389/fgene.2022.814264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Accepted: 03/22/2022] [Indexed: 11/13/2022]

Abstract

Genomic selection has been increasingly implemented in the animal breeding industry, and it is becoming a routine method in many livestock breeding contexts. However, its use is still limited in several small-population local breeds, which are, nonetheless, an important source of genetic variability of great economic value. A major roadblock for their genomic selection is accuracy when population size is limited: to improve breeding value accuracy, variable selection models that assume heterogeneous variance have been proposed over the last few years. However, while these models might outperform traditional and genomic predictions in terms of accuracy, they also carry a proportional increase of breeding value bias and dispersion. These mutual increases are especially striking when genomic selection is performed with a low number of phenotypes and a high shrinkage value, which is precisely the situation that arises with small local breeds. In our study, we tested several alternative methods to improve the accuracy of genomic selection in a small population. First, we investigated the impact of using only a subset of informative markers on prediction accuracy, bias, and dispersion. We used different algorithms to select them, such as recursive feature elimination, penalized regression, and XGBoost. We compared our results with the predictions of pedigree-based BLUP, single-step genomic BLUP, and weighted single-step genomic BLUP in different simulated populations obtained by combining various parameters in terms of number of QTLs and effective population size. We also investigated these approaches on a real data set belonging to the small local Rendena breed. Our results show that the accuracy of GBLUP in small-sized populations increased when performed with SNPs selected via variable selection methods, both in simulated and real data sets. In addition, the use of variable selection models, especially those using XGBoost, in our real data set did not impact the bias and dispersion of estimated breeding values. We have discussed possible explanations for our results and how our study can help estimate breeding values for future genomic selection in small breeds.

Wolc A, Dekkers JCM. Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol 2022;54:31. [PMID: 35562659 PMCID: PMC9103490 DOI: 10.1186/s12711-022-00724-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/28/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022]

Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches. Methods Mol Biol 2022;2467:77-112. [PMID: 35451773 DOI: 10.1007/978-1-0716-2205-6_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]

Infrared Predictions Are a Valuable Alternative to Actual Measures of Dry-Cured Ham Weight Loss in the Training of Genome-Enabled Prediction Models. Animals (Basel) 2022;12:ani12070814. [PMID: 35405804 PMCID: PMC8996942 DOI: 10.3390/ani12070814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 03/18/2022] [Accepted: 03/21/2022] [Indexed: 11/17/2022]

Sánchez-Mayor M, Riggio V, Navarro P, Gutiérrez-Gil B, Haley CS, De la Fuente LF, Arranz JJ, Pong-Wong R. Effect of genotyping strategies on the sustained benefit of single-step genomic BLUP over multiple generations. Genet Sel Evol 2022;54:23. [PMID: 35303797 PMCID: PMC8931970 DOI: 10.1186/s12711-022-00712-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/22/2021] [Accepted: 02/28/2022] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Single-step genomic best linear unbiased prediction (ssGBLUP) allows the inclusion of information from genotyped and ungenotyped individuals in a single analysis. This avoids the need to genotype all candidates, with the potential benefit of reducing overall costs. The aim of this study was to assess the effect of genotyping strategies, the proportion of genotyped candidates and the genotyping criterion to rank candidates to be genotyped, when using ssGBLUP evaluation. A simulation study was carried out assuming selection over several discrete generations where a proportion of the candidates were genotyped and evaluation was done using ssGBLUP. The scenarios compared were: (i) three genotyping strategies defined by their protocol for choosing candidates to be genotyped (RANDOM: candidates were chosen at random; TOP: candidates with the best genotyping criterion were genotyped; and EXTREME: candidates with the best and worst criterion were genotyped); (ii) eight proportions of genotyped candidates (p); and (iii) two genotyping criteria to rank candidates to be genotyped (candidates' own phenotype or estimated breeding values). The criteria of the comparison were the cumulated gain and the reliability of the genomic estimated breeding values (GEBV).

RESULTS

The genotyping strategy with the greatest cumulated gain was TOP followed by RANDOM, with EXTREME behaving as RANDOM at low p and as TOP with high p. However, the reliability of GEBV was higher with RANDOM than with TOP. This disparity between the trend of the gain and the reliability is due to the TOP scheme genotyping the candidates with the greater chances of being selected. The extra gain obtained with TOP increases when the accuracy of the selection criterion to rank candidates to be genotyped increases.

CONCLUSIONS

The best strategy to maximise genetic gain when only a proportion of the candidates are to be genotyped is TOP, since it prioritises the genotyping of candidates which are more likely to be selected. However, the strategy with the greatest GEBV reliability does not achieve the largest gain, thus reliability cannot be considered as an absolute and sufficient criterion for determining the scheme which maximises genetic gain.
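
The three genotyping strategies compared in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the simulation used in the study: the function name `choose_to_genotype` is hypothetical, a higher criterion value is assumed to be better, and EXTREME is assumed to split the genotyped candidates evenly between the best and the worst (the abstract does not state the split).

```python
import random

def choose_to_genotype(criterion, proportion, strategy, seed=0):
    """Return the sorted indices of the candidates chosen for genotyping.

    criterion  : list of values used to rank candidates (higher = better,
                 e.g. own phenotype or estimated breeding value)
    proportion : fraction p of candidates to genotype
    strategy   : "RANDOM", "TOP" or "EXTREME"
    """
    n = len(criterion)
    k = round(n * proportion)
    # Candidate indices ranked from best to worst criterion value.
    ranked = sorted(range(n), key=lambda i: criterion[i], reverse=True)

    if strategy == "RANDOM":
        chosen = random.Random(seed).sample(range(n), k)
    elif strategy == "TOP":
        chosen = ranked[:k]
    elif strategy == "EXTREME":
        # Assumption: half from the best tail, the remainder from the worst tail.
        n_best = (k + 1) // 2
        n_worst = k - n_best
        chosen = ranked[:n_best] + ranked[n - n_worst:]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(chosen)

# Toy example: ten candidates, 40% genotyped.
toy_criterion = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]
top_ids = choose_to_genotype(toy_criterion, 0.4, "TOP")
extreme_ids = choose_to_genotype(toy_criterion, 0.4, "EXTREME")
random_ids = choose_to_genotype(toy_criterion, 0.4, "RANDOM")
```

With the toy values, TOP picks the four highest-ranked candidates, while EXTREME picks the two highest and the two lowest.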

Estimating genetic variance contributed by a quantitative trait locus: A random model approach. PLoS Comput Biol 2022;18:e1009923. [PMID: 35275920 PMCID: PMC8942241 DOI: 10.1371/journal.pcbi.1009923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 03/23/2022] [Accepted: 02/13/2022] [Indexed: 11/20/2022]

Abstract

Detecting quantitative trait loci (QTL) and estimating QTL variances (represented by the squared QTL effects) are two main goals of QTL mapping and genome-wide association studies (GWAS). However, there are issues associated with estimated QTL variances and such issues have not attracted much attention from the QTL mapping community. Estimated QTL variances are usually biased upwards due to estimation being associated with significance tests. The phenomenon is called the Beavis effect. However, estimated variances of QTL without significance tests can also be biased upwards, which cannot be explained by the Beavis effect; rather, this bias is due to the fact that QTL variances are often estimated as the squares of the estimated QTL effects. The parameters are the QTL effects and the estimated QTL variances are obtained by squaring the estimated QTL effects. This square transformation failed to incorporate the errors of estimated QTL effects into the transformation. The consequence is biases in estimated QTL variances. To correct the biases, we can either reformulate the QTL model by treating the QTL effect as random and directly estimate the QTL variance (as a variance component) or adjust the bias by taking into account the error of the estimated QTL effect. A moment method of estimation has been proposed to correct the bias. The method has been validated via Monte Carlo simulation studies. The method has been applied to QTL mapping for the 10-week-body-weight trait from an F₂ mouse population.

One of the goals of QTL mapping and GWAS is to quantify the size of a QTL, which is measured by the QTL variance or the proportion of trait variance explained by the QTL. The effect of a QTL appears in a linear or linear mixed model as a regression coefficient and defined as a fixed effect. The estimated QTL variance in conventional QTL mapping studies takes the square of the estimated QTL effect. This is a biased estimate of QTL variance. An unbiased estimate of the QTL variance should be obtained by (1) treating the QTL effect as random and estimating the variance of the random effect or (2) adjusting the squared estimated QTL effect by the squared estimation error. We proved that the two methods are identical. We further proved that the usual R² (goodness of fit) in regression analysis is equivalent to the biased QTL heritability while the adjusted R² is equivalent to the bias corrected QTL heritability.
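
The upward bias described above follows from E[b̂²] = b² + Var(b̂): squaring an estimated effect also squares its estimation error. A minimal Monte Carlo sketch in Python is given below. It is not the paper's moment method; it assumes a normally distributed estimation error with a known standard error, and the function name and parameter values are hypothetical.

```python
import random
import statistics

def simulate_qtl_variance(true_effect=0.5, se=0.3, n_rep=20000, seed=1):
    """Compare naive and bias-adjusted estimates of the squared QTL effect.

    Assumption: the estimated effect equals the true effect plus normal
    estimation error with known standard error `se`.
    """
    rng = random.Random(seed)
    naive, adjusted = [], []
    for _ in range(n_rep):
        b_hat = rng.gauss(true_effect, se)     # estimated QTL effect
        naive.append(b_hat ** 2)               # squared estimate (biased upward)
        adjusted.append(b_hat ** 2 - se ** 2)  # subtract squared estimation error
    return statistics.mean(naive), statistics.mean(adjusted)

naive_mean, adjusted_mean = simulate_qtl_variance()
true_squared_effect = 0.5 ** 2
```

Under these assumptions the naive average settles near b² + se² (0.25 + 0.09), while the adjusted average settles near the true b² of 0.25. In practice the standard error is itself estimated, so the correction is only approximate.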

Yang L, Qu Q, Hao Z, Sha K, Li Z, Li S. Powerful Identification of Large Quantitative Trait Loci Using Genome-wide R/glmnet-Based Regression. J Hered 2022;113:472-478. [PMID: 35134967 DOI: 10.1093/jhered/esac006] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/11/2021] [Accepted: 02/02/2022] [Indexed: 11/14/2022]