1
Anilkumar C, Muhammed Azharudheen TP, Sah RP, Sunitha NC, Devanna BN, Marndi BC, Patra BC. Gene based markers improve precision of genome-wide association studies and accuracy of genomic predictions in rice breeding. Heredity (Edinb) 2023; 130:335-345. [PMID: 36792661 PMCID: PMC10163052 DOI: 10.1038/s41437-023-00599-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/18/2022] [Revised: 02/02/2023] [Accepted: 02/03/2023] [Indexed: 02/17/2023] Open
Abstract
It is hypothesized that genome-wide genic markers may increase the prediction accuracy of genomic selection for quantitative traits. To test this hypothesis, a set of candidate gene-based markers was custom-designed for cloned genes related to yield and grain traits across the rice genome. A multi-model, multi-locus genome-wide association study (GWAS) was performed using the new genic markers to test their effectiveness for gene discovery. Two multi-locus models, FarmCPU and mrMLM, along with a single-locus mixed linear model (MLM), identified 28 significant marker-trait associations. These associations revealed novel causative alleles for grain weight and pleiotropic associations with other traits. For instance, the marker YD91 derived from the gene OsAAP3 on chromosome 1 was consistently associated with grain weight, while the gene has a significant effect on grain yield. Furthermore, nine genomic selection methods, including regression-based and machine learning-based models, were used to predict grain weight using a leave-one-out five-fold cross-validation approach to optimize the genomic selection model with genic markers. Among the nine prediction models, Reproducing Kernel Hilbert Space (RKHS) regression was the best among regression-based models, and Random Forest Regression (RFR) was the best among machine learning-based models. Genomic prediction accuracies with and without GWAS-significant markers were compared to assess the effectiveness of the markers. The rapid decrease in prediction accuracy upon dropping GWAS-significant markers indicates the effectiveness of the new genic markers in genomic selection. Overall, the candidate gene-based markers were found to be more effective in genomic selection programs for better accuracy.

2
Klosa J, Simon N, Liebscher V, Wittenburg D. A Fitted Sparse-Group Lasso for Genome-Based Evaluations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:30-38. [PMID: 35254989 DOI: 10.1109/tcbb.2022.3156805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
In life sciences, high-throughput techniques typically lead to high-dimensional data and often the number of covariates is much larger than the number of observations. This inherently comes with multicollinearity challenging a statistical analysis in a linear regression framework. Penalization methods such as the lasso, ridge regression, the group lasso, and convex combinations thereof, which introduce additional conditions on regression variables, have proven themselves effective. In this study, we introduce a novel approach by combining the lasso and the standardized group lasso leading to meaningful weighting of the predicted ("fitted") outcome which is of primary importance, e.g., in breeding populations. This "fitted" sparse-group lasso was implemented as a proximal-averaged gradient descent method and is part of the R package "seagull" available at CRAN. For the evaluation of the novel method, we executed an extensive simulation study. We simulated genotypes and phenotypes which resemble data of a dairy cattle population. Genotypes at thousands of genomic markers were used as covariates to fit a quantitative response. The proximity of markers on a chromosome determined grouping. In the majority of simulated scenarios, the new method revealed improved prediction abilities compared to other penalization approaches and was able to localize the signals of simulated features.
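As a rough illustration of how a lasso penalty and a group lasso penalty can be combined, the sketch below implements the proximal step of the standard sparse-group lasso for non-overlapping groups. It is not the "fitted" variant described in the abstract, and it is not the implementation used in the R package seagull; the function names, the `alpha` mixing parameter, and the square-root group-size weights are assumptions made for this example.

```python
import math
from typing import List


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t (proximal operator of t * |x|)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def prox_sparse_group_lasso(beta: List[float], groups: List[List[int]],
                            step: float, lam: float, alpha: float) -> List[float]:
    """Proximal step for the (assumed) penalty
    lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2),
    where groups are non-overlapping lists of coefficient indices.
    """
    out = list(beta)
    for idx in groups:
        # Lasso part: element-wise soft-thresholding within the group.
        shrunk = [soft_threshold(beta[j], step * lam * alpha) for j in idx]
        # Group part: scale the whole group toward zero, or drop it entirely.
        norm = math.sqrt(sum(v * v for v in shrunk))
        t_group = step * lam * (1.0 - alpha) * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - t_group / norm) if norm > 0.0 else 0.0
        for j, v in zip(idx, shrunk):
            out[j] = scale * v
    return out


if __name__ == "__main__":
    # Two groups of neighbouring markers; the weak second group is removed.
    print(prox_sparse_group_lasso([3.0, 4.0, 0.1, -0.1],
                                  [[0, 1], [2, 3]],
                                  step=1.0, lam=1.0, alpha=0.5))
```

In a proximal gradient loop, this step would follow each gradient update of the least-squares loss; markers grouped by chromosomal proximity, as in the abstract, would supply the `groups` argument.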

3
Gowane GR, Alex R, Mukherjee A, Vohra V. Impact and utility of shallow pedigree using single-step genomic BLUP for prediction of unbiased genomic breeding values. Trop Anim Health Prod 2022; 54:339. [DOI: 10.1007/s11250-022-03340-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 10/04/2022] [Indexed: 11/28/2022]

4
Bonnett D, Li Y, Crossa J, Dreisigacker S, Basnet B, Pérez-Rodríguez P, Alvarado G, Jannink JL, Poland J, Sorrells M. Response to Early Generation Genomic Selection for Yield in Wheat. FRONTIERS IN PLANT SCIENCE 2022; 12:718611. [PMID: 35087542 PMCID: PMC8787636 DOI: 10.3389/fpls.2021.718611] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/01/2021] [Accepted: 10/22/2021] [Indexed: 06/14/2023]
Abstract
We investigated increasing genetic gain for grain yield using early generation genomic selection (GS). A training set of 1,334 elite wheat breeding lines tested over three field seasons was used to generate Genomic Estimated Breeding Values (GEBVs) for grain yield under irrigated conditions applying markers and three different prediction methods: (1) Genomic Best Linear Unbiased Predictor (GBLUP), (2) GBLUP with the imputation of missing genotypic data by Ridge Regression BLUP (rrGBLUP_imp), and (3) Reproducing Kernel Hilbert Space (RKHS) a.k.a. Gaussian Kernel (GK). F2 GEBVs were generated for 1,924 individuals from 38 biparental cross populations between 21 parents selected from the training set. Results showed that F2 GEBVs from the different methods were not correlated. Experiment 1 consisted of selecting F2s with the highest average GEBVs and advancing them to form genomically selected bulks and make intercross populations aiming to combine favorable alleles for yield. F4:6 lines were derived from genomically selected bulks, intercrosses, and conventional breeding methods with similar numbers from each. Results of field-testing for Experiment 1 did not find any difference in yield with genomic compared to conventional selection. Experiment 2 compared the predictive ability of the different GEBV calculation methods in F2 using a set of single plant-derived F2:4 lines from randomly selected F2 plants. Grain yield results from Experiment 2 showed a significant positive correlation between observed yields of F2:4 lines and predicted yield GEBVs of F2 single plants from GK (the predictive ability of 0.248, P < 0.001) and GBLUP (0.195, P < 0.01) but no correlation with rrGBLUP_imp. Results demonstrate the potential for the application of GS in early generations of wheat breeding and the importance of using the appropriate statistical model for GEBV calculation, which may not be the same as the best model for inbreds.
Affiliation(s)
- David Bonnett
  - International Maize and Wheat Improvement Center, Texcoco, Mexico
  - BASF Wheat Breeding, Sabin, MN, United States
- Yongle Li
  - School of Agriculture, Food and Wine, Faculty of Sciences, The University of Adelaide, Adelaide, SA, Australia
- Jose Crossa
  - International Maize and Wheat Improvement Center, Texcoco, Mexico
  - Colegio de Postgraduados, Texcoco, Mexico
- Bhoja Basnet
  - International Maize and Wheat Improvement Center, Texcoco, Mexico
- G. Alvarado
  - International Maize and Wheat Improvement Center, Texcoco, Mexico
- J. L. Jannink
  - USDA-ARS, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, United States
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Jesse Poland
  - Department of Plant Pathology, Kansas State University, Manhattan, KS, United States
- Mark Sorrells
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States

5
Rios EF, Andrade MHML, Resende MFR, Kirst M, de Resende MDV, de Almeida Filho JE, Gezan SA, Munoz P. Genomic prediction in family bulks using different traits and cross-validations in pine. G3-GENES GENOMES GENETICS 2021; 11:6321952. [PMID: 34544139 PMCID: PMC8496210 DOI: 10.1093/g3journal/jkab249] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/09/2021] [Accepted: 07/02/2021] [Indexed: 11/13/2022]
Abstract
Genomic prediction integrates statistical, genomic, and computational tools to improve the estimation of breeding values and increase genetic gain. Due to the broad diversity in mating systems, breeding schemes, propagation methods, and unit of selection, no universal genomic prediction approach can be applied in all crops. In a genome-wide family prediction (GWFP) approach, the family is the basic unit of selection. We tested GWFP in two loblolly pine (Pinus taeda L.) datasets: a breeding population composed of 63 full-sib families (5–20 individuals per family), and a simulated population with the same pedigree structure. In both populations, phenotypic and genomic data were pooled at the family level in silico. Marker effects were estimated to compute genomic estimated breeding values (GEBV) at the individual and family (GWFP) levels. Fewer than six individuals per family produced inaccurate estimates of family phenotypic performance and allele frequency. Tested across different scenarios, GWFP predictive ability was higher than that of GEBV in both populations. Validation sets composed of families with similar phenotypic mean and variance as the training population yielded predictions consistently higher and more accurate than other validation sets. Results revealed potential for applying GWFP in breeding programs whose selection unit is the family, and for systems where families can serve as training sets. The GWFP approach is well suited for crops that are routinely genotyped and phenotyped at the plot level, but it can be extended to other breeding programs. Higher predictive ability obtained with GWFP would motivate the application of genomic prediction in these situations.
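The in silico pooling step mentioned in the abstract can be pictured with a small sketch: individual records are collapsed to one allele frequency per marker and one mean phenotype per family. This is only an illustration under stated assumptions; the 0/1/2 allele-count genotype coding, the function name `pool_by_family`, and the data layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pool_by_family(families: List[str],
                   genotypes: List[List[int]],
                   phenotypes: List[float]
                   ) -> Dict[str, Tuple[List[float], float]]:
    """Collapse individual records to family-level records.

    genotypes[i][m] is assumed to be the allele count (0, 1 or 2) of
    individual i at marker m. Returns, per family, the allele frequency
    at each marker and the mean phenotype.
    """
    members: Dict[str, List[int]] = defaultdict(list)
    for i, fam in enumerate(families):
        members[fam].append(i)

    pooled: Dict[str, Tuple[List[float], float]] = {}
    n_markers = len(genotypes[0])
    for fam, idx in members.items():
        n = len(idx)
        # Each diploid individual carries two allele copies per marker.
        freqs = [sum(genotypes[i][m] for i in idx) / (2.0 * n)
                 for m in range(n_markers)]
        mean_pheno = sum(phenotypes[i] for i in idx) / n
        pooled[fam] = (freqs, mean_pheno)
    return pooled


if __name__ == "__main__":
    pooled = pool_by_family(["A", "A", "B"],
                            [[0, 2], [2, 2], [1, 0]],
                            [10.0, 14.0, 7.0])
    for fam, (freqs, mean_pheno) in sorted(pooled.items()):
        print(fam, freqs, mean_pheno)
```

The family-level frequencies and means would then replace individual genotypes and phenotypes as inputs to whatever prediction model is used; with very few individuals per family, both quantities are estimated poorly, which matches the caution raised in the abstract.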
Affiliation(s)
- Esteban F Rios
  - Agronomy Department, University of Florida, Gainesville, FL 32611, USA
- Marcio F R Resende
  - Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Matias Kirst
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611, USA
- Marcos D V de Resende
  - EMBRAPA Café/Department of Statistics, Federal University of Viçosa, Avenida PH Rolfs S/N, Viçosa 36570-000, Brazil
- Patricio Munoz
  - Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA

6
Casellas J, Martín de Hijas-Villalba M, Vázquez-Gómez M, Id-Lahoucine S. Low-coverage whole-genome sequencing in livestock species for individual traceability and parentage testing. Livest Sci 2021. [DOI: 10.1016/j.livsci.2021.104629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

7
Gaynor RC, Gorjanc G, Hickey JM. AlphaSimR: an R package for breeding program simulations. G3-GENES GENOMES GENETICS 2021; 11:6025179. [PMID: 33704430 PMCID: PMC8022926 DOI: 10.1093/g3journal/jkaa017] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 08/10/2020] [Accepted: 11/05/2020] [Indexed: 01/03/2023]
Abstract
This paper introduces AlphaSimR, an R package for stochastic simulations of plant and animal breeding programs. AlphaSimR is a highly flexible software package able to simulate a wide range of plant and animal breeding programs for diploid and autopolyploid species. AlphaSimR is ideal for testing the overall strategy and detailed design of breeding programs. AlphaSimR utilizes a scripting approach to building simulations that is particularly well suited for modeling highly complex breeding programs, such as commercial breeding programs. The primary benefit of this scripting approach is that it frees users from preset breeding program designs and allows them to model nearly any breeding program design. This paper lists the main features of AlphaSimR and provides a brief example simulation to show how to use the software.
Affiliation(s)
- R Chris Gaynor
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian EH25 9RG, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian EH25 9RG, UK
- John M Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian EH25 9RG, UK

8
Obšteter J, Jenko J, Hickey JM, Gorjanc G. Efficient use of genomic information for sustainable genetic improvement in small cattle populations. J Dairy Sci 2019; 102:9971-9982. [PMID: 31477287 DOI: 10.3168/jds.2019-16853] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/24/2019] [Accepted: 07/13/2019] [Indexed: 11/19/2022]
Abstract
In this study, we compared genetic gain, genetic variation, and the efficiency of converting variation into gain under different genomic selection scenarios with truncation or optimum contribution selection in a small dairy population by simulation. Breeding programs have to maximize genetic gain but also ensure sustainability by maintaining genetic variation. Numerous studies have shown that genomic selection increases genetic gain. Although genomic selection is a well-established method, small populations still struggle with choosing the most sustainable strategy to adopt this type of selection. We developed a simulator of a dairy population and simulated a model after the Slovenian Brown Swiss population with ∼10,500 cows. We compared different truncation selection scenarios by varying (1) the method of sire selection and their use on cows or bull-dams, and (2) selection intensity and the number of years a sire is in use. Furthermore, we compared different optimum contribution selection scenarios with optimization of sire selection and their usage. We compared scenarios in terms of genetic gain, selection accuracy, generation interval, genetic and genic variance, rate of coancestry, effective population size, and conversion efficiency. The results showed that early use of genomically tested sires increased genetic gain compared with progeny testing, as expected from changes in selection accuracy and generation interval. A faster turnover of sires from year to year and higher intensity increased the genetic gain even further but increased the loss of genetic variation per year. Although maximizing intensity gave the lowest conversion efficiency, faster turnover of sires gave an intermediate conversion efficiency. The largest conversion efficiency was achieved with the simultaneous use of genomically and progeny-tested sires that were used over several years. 
Compared with truncation selection, optimizing sire selection and their usage increased the conversion efficiency by achieving either comparable genetic gain for a smaller loss of genetic variation or higher genetic gain for a comparable loss of genetic variation. Our results will help breeding organizations implement sustainable genomic selection.
Affiliation(s)
- J Obšteter
  - Department of Animal Science, Agricultural Institute of Slovenia, Hacquetova ulica 17, 1000 Ljubljana, Slovenia.
- J Jenko
  - Department of Animal Science, Agricultural Institute of Slovenia, Hacquetova ulica 17, 1000 Ljubljana, Slovenia; Geno Breeding and A.I. Association, Storhamargata 44, 2317 Hamar, Norway
- J M Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, United Kingdom
- G Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, United Kingdom; Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, 1000 Ljubljana, Slovenia

9
Genomic Prediction of Additive and Non-additive Effects Using Genetic Markers and Pedigrees. G3-GENES GENOMES GENETICS 2019; 9:2739-2748. [PMID: 31263059 PMCID: PMC6686920 DOI: 10.1534/g3.119.201004] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022]
Abstract
The genetic merit of individuals can be estimated using models with dense markers and pedigree information. Early genomic models accounted only for additive effects. However, the prediction of non-additive effects is important for different forest breeding systems where the whole genotypic value can be captured through clonal propagation. In this study, we evaluated the integration of marker data with pedigree information, in models that included or ignored non-additive effects. We tested the models Reproducing Kernel Hilbert Spaces (RKHS) and BayesA, with additive and additive-dominance frameworks. Model performance was assessed for the traits tree height (HT), diameter at breast height (DBH) and rust resistance, measured in 923 pine individuals from a structured population of 71 full-sib families. We also simulated a population with similar genetic properties and evaluated the performance of models for six simulated traits with distinct genetic architectures. Different cross-validation strategies were evaluated, and the highest accuracies were achieved using within-family cross-validation. The inclusion of pedigree information in genomic prediction models did not yield higher accuracies. The different RKHS models resulted in similar prediction accuracies, and RKHS and BayesA generated substantially better predictions than pedigree-only models. The additive BayesA model resulted in higher accuracies than RKHS for rust incidence and in simulated additive-oligogenic traits. For DBH, HT and additive-dominance polygenic traits, the RKHS-based models showed slightly higher accuracies than BayesA. Our results indicate that BayesA performs best for traits with few genes with major effects, while RKHS-based models can best predict genotypic effects for clonal selection of complex traits.

10
Gowane GR, Lee SH, Clark S, Moghaddar N, Al-Mamun HA, van der Werf JHJ. Effect of selection and selective genotyping for creation of reference on bias and accuracy of genomic prediction. J Anim Breed Genet 2019; 136:390-407. [PMID: 31215699 DOI: 10.1111/jbg.12420] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/02/2019] [Revised: 05/22/2019] [Accepted: 05/23/2019] [Indexed: 01/17/2023]
Abstract
Reference populations for genomic selection usually involve selected individuals, which may result in biased prediction of estimated genomic breeding values (GEBV). In a simulation study, bias and accuracy of GEBV were explored for various genetic models with individuals selectively genotyped in a typical nucleus breeding program. We compared the performance of three existing methods, that is, Best Linear Unbiased Prediction of breeding values using pedigree-based relationships (PBLUP), genomic relationships for genotyped animals only (GBLUP) and a Single-Step approach (SSGBLUP) using both. For a scenario with no-selection and random mating (RR), prediction was unbiased. However, lower accuracy and bias were observed for scenarios with selection and random mating (SR) or selection and positive assortative mating (SA). As expected, bias disappeared when all individuals were genotyped and used in GBLUP. SSGBLUP showed higher accuracy compared to GBLUP, and bias of prediction was negligible with SR. However, PBLUP and SSGBLUP still showed bias in SA due to high inbreeding. SSGBLUP and PBLUP were unbiased provided that inbreeding was accounted for in the relationship matrices. Selective genotyping based on extreme phenotypic contrasts increased the prediction accuracy, but prediction was biased when using GBLUP. SSGBLUP could correct the biasedness while gaining higher accuracy than GBLUP. In a typical animal breeding program, where it is too expensive to genotype all animals, it would be appropriate to genotype phenotypically contrasting selection candidates and use a Single-Step approach to obtain accurate and unbiased prediction of GEBV.
Affiliation(s)
- Gopal R Gowane
  - Animal Genetics & Breeding Division, ICAR-Central Sheep & Wool Research Institute, Avikanagar, India
- Sang Hong Lee
  - Australian Centre for Precision Health, University of South Australia Cancer Research Institute, Adelaide, South Australia, Australia
- Sam Clark
  - School of Environmental and Rural Sciences, University of New England, Armidale, New South Wales, Australia
- Nasir Moghaddar
  - School of Environmental and Rural Sciences, University of New England, Armidale, New South Wales, Australia
- Julius H J van der Werf
  - School of Environmental and Rural Sciences, University of New England, Armidale, New South Wales, Australia

11
Mota RR, Vanderick S, Colinet FG, Hammami H, Wiggans GR, Gengler N. Additional considerations to the use of single-step genomic predictions in a dominance setting. J Anim Breed Genet 2019; 136:430-440. [PMID: 31161675 DOI: 10.1111/jbg.12406] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/05/2019] [Revised: 04/23/2019] [Accepted: 05/03/2019] [Indexed: 11/27/2022]
Abstract
Recent publications indicate that single-step models are suitable to estimate breeding values, dominance deviations and total genetic values with acceptable quality. Additive single-step methods implicitly extend known number of allele information from genotyped to non-genotyped animals. This theory is well derived in an additive setting. It was recently shown, at least empirically, that this basic strategy can be extended to dominance with reasonable prediction quality. Our study addressed two additional issues. It illustrated the theoretical basis for extension and validated genomic predictions to dominance based on single-step genomic best linear unbiased prediction theory. This development was then extended to include inbreeding into dominance relationships, which is a currently not yet solved issue. Different parametrizations of dominance relationship matrices were proposed. Five dominance single-step inverse matrices were tested and described as C1, C2, C3, C4 and C5. Genotypes were simulated for a real pig population (n = 11,943 animals). In order to avoid any confounding issues with additive effects, pseudo-records including only dominance deviations and residuals were simulated. SNP effects of heterozygous genotypes were summed up to generate true dominance deviations. We added random noise to those values and used them as phenotypes. Accuracy was defined as correlation between true and predicted dominance deviations. We conducted five replicates and estimated accuracies in three sets: between all (S1), non-genotyped (S2) and inbred non-genotyped (S3) animals. Potential bias was assessed by regressing true dominance deviations on predicted values. Matrices accounting for inbreeding (C3, C4 and C5) best fit. Accuracies were on average 0.77, 0.40 and 0.46 in S1, S2 and S3, respectively. In addition, C3, C4 and C5 scenarios have shown better accuracies than C1 and C2, and dominance deviations were less biased. Better matrix compatibility (accuracy and bias) was observed by re-scaling diagonal elements to 1 minus the inbreeding coefficient (C5).
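The evaluation quantities named in the abstract can be written down in a few lines. The sketch below builds a true dominance deviation as the sum of SNP effects at heterozygous loci, computes accuracy as the correlation between true and predicted values, and measures bias as the slope from regressing true values on predictions. The 0/1/2 genotype coding (1 taken as heterozygous), the function names, and the toy numbers are assumptions made for illustration; the random noise added in the study is left out so the example stays deterministic.

```python
import math
from typing import Sequence


def dominance_deviation(genotype: Sequence[int],
                        d_effects: Sequence[float]) -> float:
    """Sum dominance effects over heterozygous loci (genotype code 1)."""
    return sum(d for g, d in zip(genotype, d_effects) if g == 1)


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def accuracy(true: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation between true and predicted values."""
    mt, mp = _mean(true), _mean(pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(true, pred))
    var_t = sum((t - mt) ** 2 for t in true)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_t * var_p)


def bias_slope(true: Sequence[float], pred: Sequence[float]) -> float:
    """Slope of the regression of true values on predicted values.

    A slope of 1 indicates predictions on the right scale; a slope above 1
    indicates predictions that are shrunk too far toward the mean.
    """
    mt, mp = _mean(true), _mean(pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(true, pred))
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / var_p


if __name__ == "__main__":
    true_vals = [1.0, 2.0, 3.0, 4.0]
    pred_vals = [0.5, 1.0, 1.5, 2.0]  # perfectly ranked but under-dispersed
    print(accuracy(true_vals, pred_vals), bias_slope(true_vals, pred_vals))
```

The toy predictions rank animals perfectly yet are compressed by half, which is why accuracy and bias are reported separately.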
Affiliation(s)
- Rodrigo R Mota
  - TERRA Teaching and Research Centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
- Sylvie Vanderick
  - TERRA Teaching and Research Centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
- Frédéric G Colinet
  - TERRA Teaching and Research Centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
- Hedi Hammami
  - TERRA Teaching and Research Centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
- Nicolas Gengler
  - TERRA Teaching and Research Centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium

12
Pégard M, Rogier O, Bérard A, Faivre-Rampant P, Paslier MCL, Bastien C, Jorge V, Sánchez L. Sequence imputation from low density single nucleotide polymorphism panel in a black poplar breeding population. BMC Genomics 2019; 20:302. [PMID: 30999856 PMCID: PMC6471894 DOI: 10.1186/s12864-019-5660-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/10/2018] [Accepted: 03/29/2019] [Indexed: 12/30/2022] Open
Abstract
Background Genomic selection accuracy increases with the use of high SNP (single nucleotide polymorphism) coverage. However, such gains in coverage come at high costs, preventing their prompt operational implementation by breeders. Low density panels imputed to higher densities offer a cheaper alternative during the first stages of genomic resources development. Our study is the first to explore the imputation in a tree species: black poplar. About 1000 pure-breed Populus nigra trees from a breeding population were selected and genotyped with a 12K custom Infinium Bead-Chip. Forty-three of those individuals corresponding to nodal trees in the pedigree were fully sequenced (reference), while the remaining majority (target) was imputed from 8K to 1.4 million SNPs using FImpute. Each SNP and individual was evaluated for imputation errors by leave-one-out cross validation in the training sample of 43 sequenced trees. Some summary statistics such as Hardy-Weinberg Equilibrium exact test p-value, quality of sequencing, depth of sequencing per site and per individual, minor allele frequency, marker density ratio or SNP information redundancy were calculated. Principal component and Boruta analyses were used on all these parameters to rank the factors affecting the quality of imputation. Additionally, we characterize the impact of the relatedness between reference population and target population. Results During the imputation process, we used 7540 SNPs from the chip to impute 1,438,827 SNPs from sequences. At the individual level, imputation accuracy was high with a proportion of SNPs correctly imputed between 0.84 and 0.99. The variation in accuracies was mostly due to differences in relatedness between individuals. At a SNP level, the imputation quality depended on genotyped SNP density and on the original minor allele frequency. The imputation did not appear to result in an increase of linkage disequilibrium. 
The genotype densification not only brought a better distribution of markers all along the genome, but also we did not detect any substantial bias in annotation categories. Conclusions This study shows that it is possible to impute low-density marker panels to whole genome sequence with good accuracy under certain conditions that could be common to many breeding populations.
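The per-individual and per-SNP accuracy figures quoted in this abstract are proportions of correctly imputed genotypes. A minimal sketch of that bookkeeping is given below, assuming genotypes are stored as 0/1/2 allele counts in individual-by-SNP lists; the function names are hypothetical, and the imputation itself (done with FImpute in the study) is not reproduced here.

```python
from typing import List


def concordance_per_individual(true_geno: List[List[int]],
                               imputed_geno: List[List[int]]) -> List[float]:
    """Proportion of SNPs imputed correctly, one value per individual."""
    return [
        sum(1 for t, m in zip(t_row, m_row) if t == m) / len(t_row)
        for t_row, m_row in zip(true_geno, imputed_geno)
    ]


def concordance_per_snp(true_geno: List[List[int]],
                        imputed_geno: List[List[int]]) -> List[float]:
    """Proportion of individuals imputed correctly, one value per SNP."""
    n_ind = len(true_geno)
    n_snp = len(true_geno[0])
    return [
        sum(1 for i in range(n_ind)
            if true_geno[i][s] == imputed_geno[i][s]) / n_ind
        for s in range(n_snp)
    ]


if __name__ == "__main__":
    # In a leave-one-out check, each row of `imputed` would come from
    # masking one sequenced individual and imputing it from the others.
    true = [[0, 1, 2], [2, 2, 0]]
    imputed = [[0, 1, 1], [2, 2, 0]]
    print(concordance_per_individual(true, imputed))
    print(concordance_per_snp(true, imputed))
```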
Affiliation(s)
- Marie Pégard
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Odile Rogier
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Aurélie Bérard
  - Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000, 2 rue Gaston Crémieux, Evry, 9100, France
- Patricia Faivre-Rampant
  - Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000, 2 rue Gaston Crémieux, Evry, 9100, France
- Marie-Christine Le Paslier
  - Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000, 2 rue Gaston Crémieux, Evry, 9100, France
- Catherine Bastien
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Véronique Jorge
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Leopoldo Sánchez
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France.

13
Genomic Prediction Using Individual-Level Data and Summary Statistics from Multiple Populations. Genetics 2018; 210:53-69. [PMID: 30021793 DOI: 10.1534/genetics.118.301109] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/03/2018] [Accepted: 07/16/2018] [Indexed: 01/27/2023] Open
Abstract
This study presents a method for genomic prediction that uses individual-level data and summary statistics from multiple populations. Genome-wide markers are nowadays widely used to predict complex traits, and genomic prediction using multi-population data is an appealing approach to achieve higher prediction accuracies. However, sharing of individual-level data across populations is not always possible. We present a method that enables integration of summary statistics from separate analyses with the available individual-level data. The data can consist of individuals with either single or multiple (weighted) phenotype records per individual. We developed a method based on a hypothetical joint analysis model and absorption of population-specific information. We show that population-specific information is fully captured by estimated allele substitution effects and the accuracy of those estimates, i.e., the summary statistics. The method gives the same result as the joint analysis of all individual-level data when complete summary statistics are available. We provide a series of easy-to-use approximations that can be used when complete summary statistics are not available or are impractical to share. Simulations show that the approximations enable integration of different sources of information across a wide range of settings, yielding accurate predictions. The method can be readily extended to multiple traits. In summary, the developed method enables integration of genome-wide data in the form of individual-level data or summary statistics from multiple populations to obtain more accurate estimates of allele substitution effects and genomic predictions.

14
A method for allocating low-coverage sequencing resources by targeting haplotypes rather than individuals. Genet Sel Evol 2017; 49:78. [PMID: 29070022 PMCID: PMC5655873 DOI: 10.1186/s12711-017-0353-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/03/2017] [Accepted: 10/18/2017] [Indexed: 11/25/2022] Open
Abstract
Background This paper describes a heuristic method for allocating low-coverage sequencing resources by targeting haplotypes rather than individuals. Low-coverage sequencing assembles high-coverage sequence information for every individual by accumulating data from the genome segments that they share with many other individuals into consensus haplotypes. Deriving the consensus haplotypes accurately is critical for achieving a high phasing and imputation accuracy. In order to enable accurate phasing and imputation of sequence information for the whole population, we allocate the available sequencing resources among individuals with existing phased genomic data by targeting the sequencing coverage of their haplotypes. Results Our method, called AlphaSeqOpt, prioritizes haplotypes using a score function that is based on the frequency of the haplotypes in the sequencing set relative to the target coverage. AlphaSeqOpt has two steps: (1) selection of an initial set of individuals by iteratively choosing the individuals that have the maximum score conditional on the current set, and (2) refinement of the set through several rounds of exchanges of individuals. AlphaSeqOpt is very effective for distributing a fixed amount of sequencing resources evenly across haplotypes, which results in a reduction of the proportion of haplotypes that are sequenced below the target coverage. AlphaSeqOpt can provide a greater proportion of haplotypes sequenced at the target coverage by sequencing fewer individuals, as compared with other methods that use a score function based on haplotype frequencies in the population. A refinement of the initially selected set can provide a larger, more diverse set with more unique individuals, which is beneficial in the context of low-coverage sequencing. We extend the method with an approach for filtering rare haplotypes based on their flanking haplotypes, so that only those that are likely to derive from a recombination event are targeted.
Conclusions We present a method for allocating sequencing resources so that a greater proportion of haplotypes are sequenced at a coverage that is sufficiently high for population-based imputation with low-coverage sequencing. The haplotype score function, the refinement step, and the new approach for filtering rare haplotypes make AlphaSeqOpt more effective for that purpose than previously reported methods for reducing sequencing redundancy. Electronic supplementary material The online version of this article (doi:10.1186/s12711-017-0353-y) contains supplementary material, which is available to authorized users.
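The two-step structure described above (greedy initial selection, then rounds of exchanges) can be sketched as follows. The score used here, accumulated haplotype coverage capped at the target, is a stand-in chosen for illustration; the abstract does not give the actual AlphaSeqOpt score function, and all names and data below are hypothetical.

```python
from collections import Counter

def set_score(selected, carriers, target):
    """Sum over haplotypes of accumulated coverage, capped at the target."""
    coverage = Counter()
    for ind in selected:
        for hap in carriers[ind]:
            coverage[hap] += 1  # assumption: 1x coverage per carried copy
    return sum(min(c, target) for c in coverage.values())

def greedy_select(carriers, n_select, target):
    """Step 1: add the individual with the best score given the current set."""
    selected = []
    remaining = sorted(carriers)
    for _ in range(n_select):
        best = max(remaining,
                   key=lambda ind: set_score(selected + [ind], carriers, target))
        selected.append(best)
        remaining.remove(best)
    return selected

def refine(selected, carriers, target, rounds=3):
    """Step 2: exchange a selected individual for an unselected one if it helps."""
    selected = list(selected)
    for _ in range(rounds):
        improved = False
        for i in range(len(selected)):
            current = set_score(selected, carriers, target)
            for cand in sorted(set(carriers) - set(selected)):
                trial = selected[:i] + [cand] + selected[i + 1:]
                if set_score(trial, carriers, target) > current:
                    selected = trial
                    improved = True
                    break
        if not improved:
            break
    return selected

# Toy data: each individual carries two haplotypes at one genome segment.
carriers = {
    "A": ["h1", "h2"], "B": ["h1", "h2"], "C": ["h3", "h4"],
    "D": ["h1", "h3"], "E": ["h5", "h5"],
}
initial = greedy_select(carriers, n_select=3, target=2)
final = refine(initial, carriers, target=2)
```

Capping at the target is what discourages piling further coverage onto haplotypes that are already well covered, which is the behaviour the abstract attributes to scoring relative to the target coverage.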
|
15
|
Gonen S, Battagin M, Johnston SE, Gorjanc G, Hickey JM. The potential of shifting recombination hotspots to increase genetic gain in livestock breeding. Genet Sel Evol 2017; 49:55. [PMID: 28676070 PMCID: PMC5496647 DOI: 10.1186/s12711-017-0330-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/10/2017] [Accepted: 06/26/2017] [Indexed: 01/01/2023] Open
Abstract
Background This study uses simulation to explore and quantify the potential effect of shifting recombination hotspots on genetic gain in livestock breeding programs. Methods We simulated three scenarios that differed in the locations of quantitative trait nucleotides (QTN) and recombination hotspots in the genome. In scenario 1, QTN were randomly distributed along the chromosomes and recombination was restricted to occur within specific genomic regions (i.e. recombination hotspots). In the other two scenarios, both QTN and recombination hotspots were located in specific regions, but differed in whether the QTN occurred outside of (scenario 2) or inside (scenario 3) recombination hotspots. We split each chromosome into 250, 500 or 1000 regions per chromosome of which 10% were recombination hotspots and/or contained QTN. The breeding program was run for 21 generations of selection, after which recombination hotspot regions were kept the same or were shifted to adjacent regions for a further 80 generations of selection. We evaluated the effect of shifting recombination hotspots on genetic gain, genetic variance and genic variance. Results Our results show that shifting recombination hotspots reduced the decline of genetic and genic variance by releasing standing allelic variation in the form of new allele combinations. This in turn resulted in larger increases in genetic gain. However, the benefit of shifting recombination hotspots for increased genetic gain was only observed when QTN were initially outside recombination hotspots. If QTN were initially inside recombination hotspots then shifting them decreased genetic gain. Discussion Shifting recombination hotspots to regions of the genome where recombination had not occurred for 21 generations of selection (i.e. recombination deserts) released more of the standing allelic variation available in each generation and thus increased genetic gain. 
However, whether and how much increase in genetic gain was achieved by shifting recombination hotspots depended on the distribution of QTN in the genome, the number of recombination hotspots and whether QTN were initially inside or outside recombination hotspots. Conclusions Our findings show future scope for targeted modification of recombination hotspots e.g. through changes in zinc-finger motifs of the PRDM9 protein to increase genetic gain in production species. Electronic supplementary material The online version of this article (doi:10.1186/s12711-017-0330-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Serap Gonen: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Mara Battagin: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Susan E Johnston: Institute of Evolutionary Biology, The University of Edinburgh, Charlotte Auerbach Road, Edinburgh, EH9 3FL, UK
- Gregor Gorjanc: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- John M Hickey: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
|
16
|
Casellas J, Cañas-Álvarez JJ, Fina M, Piedrafita J, Cecchinato A. Fine mapping by composite genome-wide association analysis. Genet Res (Camb) 2017; 99:e4. [PMID: 28583209 PMCID: PMC6865146 DOI: 10.1017/s0016672317000027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2016] [Revised: 02/22/2017] [Accepted: 03/07/2017] [Indexed: 11/06/2022] Open
Abstract
Genome-wide association (GWA) studies play a key role in current genetics research, unravelling genomic regions linked to phenotypic traits of interest in multiple species. Nevertheless, the extent of linkage disequilibrium (LD) may provide confounding results when significant genetic markers span along several contiguous cM. In this study, we have adapted the composite interval mapping approach to the GWA framework (composite GWA), in order to evaluate the impact of including competing (possibly linked) genetic markers when testing for the additive allelic effect inherent to a given genetic marker. We tested model performance on simulated data sets under different scenarios (i.e., qualitative trait loci effects, LD between genetic markers and width of the genomic region involved in the analysis). Our results showed that the genomic region had a small impact on the number of competing single nucleotide polymorphisms (SNPs) as well as on the precision of the composite GWA analysis. A similar conclusion was derived from the preferable range of LD between the tested SNP and competing SNPs, although moderate-to-high LD seemed to attenuate the loss of statistical power. The composite GWA improved specificity and reduced the number of significant genetic markers. The composite GWA model contributes a novel point of view for GWA analyses where testing circumscribed to the genomic region flanking each SNP (delimited by the nearest competing SNPs) and conditioning on linked markers increases the precision to locate causal mutations, but possibly at the expense of power.
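The effect of conditioning on a competing, linked marker can be illustrated with a small regression sketch. This is an ordinary least-squares stand-in under simulated data, not the composite GWA model of the paper; the function name, effect size and linkage scheme are assumptions made for the example.

```python
import numpy as np

def snp_t_stat(y, tested, competing=None):
    """t statistic for the additive effect of `tested`, optionally
    conditioning on competing markers given as an (n, k) array."""
    n = len(y)
    cols = [np.ones(n), tested]
    if competing is not None:
        cols.extend(competing.T)  # one column per competing marker
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(7)
n = 500
causal = rng.binomial(2, 0.5, size=n).astype(float)
# A linked marker: identical to the causal SNP except for ~10% redrawn genotypes.
redraw = rng.random(n) < 0.1
linked = np.where(redraw, rng.binomial(2, 0.5, size=n), causal).astype(float)
y = 1.0 * causal + rng.normal(size=n)

t_marginal = snp_t_stat(y, linked)
t_conditional = snp_t_stat(y, linked, competing=causal.reshape(-1, 1))
```

Tested alone, the linked marker picks up the causal signal through LD; once the causal SNP enters as a competing marker, its statistic shrinks. This is the gain in specificity, and the possible cost in power, that the abstract describes.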
Affiliation(s)
- Joaquim Casellas: Grup de Recerca en Millora Genètica Molecular Veterinària, Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Jhon Jacobo Cañas-Álvarez: Grup de Recerca en Remugants, Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Marta Fina: Grup de Recerca en Remugants, Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Jesús Piedrafita: Grup de Recerca en Remugants, Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Alessio Cecchinato: Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell'Università 16, 35020 Legnaro, Italy
|
17
|
Gonen S, Ros-Freixedes R, Battagin M, Gorjanc G, Hickey JM. A method for the allocation of sequencing resources in genotyped livestock populations. Genet Sel Evol 2017; 49:47. [PMID: 28521728 PMCID: PMC5437657 DOI: 10.1186/s12711-017-0322-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 08/10/2016] [Accepted: 05/12/2017] [Indexed: 11/18/2022] Open
Abstract
Background This paper describes a method, called AlphaSeqOpt, for the allocation of sequencing resources in livestock populations with existing phased genomic data to maximise the ability to phase and impute sequenced haplotypes into the whole population. Methods We present two algorithms. The first selects focal individuals that collectively represent the maximum possible portion of the haplotype diversity in the population. The second allocates a fixed sequencing budget among the families of focal individuals to enable phasing of their haplotypes at the sequence level. We tested the performance of the two algorithms in simulated pedigrees. For each pedigree, we evaluated the proportion of population haplotypes that are carried by the focal individuals and compared our results to a variant of the widely-used key ancestors approach and to two haplotype-based approaches. We calculated the expected phasing accuracy of the haplotypes of a focal individual at the sequence level given the proportion of the fixed sequencing budget allocated to its family. Results AlphaSeqOpt maximises the ability to capture and phase the most frequent haplotypes in a population in three ways. First, it selects focal individuals that collectively represent a larger portion of the population haplotype diversity than existing methods. Second, it selects focal individuals from across the pedigree whose haplotypes can be easily phased using family-based phasing and imputation algorithms, thus maximising the ability to impute sequence into the rest of the population. Third, it allocates more of the fixed sequencing budget to focal individuals whose haplotypes are more frequent in the population than to focal individuals whose haplotypes are less frequent. Unlike existing methods, we additionally present an algorithm to allocate part of the sequencing budget to the families (i.e. immediate ancestors) of focal individuals to ensure that their haplotypes can be phased at the sequence level, which is essential for enabling and maximising subsequent sequence imputation. Conclusions We present a new method for the allocation of a fixed sequencing budget to focal individuals and their families such that the final sequenced haplotypes, when phased at the sequence level, represent the maximum possible portion of the haplotype diversity in the population that can be sequenced and phased at that budget. Electronic supplementary material The online version of this article (doi:10.1186/s12711-017-0322-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Serap Gonen: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Roger Ros-Freixedes: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Mara Battagin: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Gregor Gorjanc: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- John M Hickey: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
|
18
|
Garcia-Baccino CA, Legarra A, Christensen OF, Misztal I, Pocrnic I, Vitezica ZG, Cantet RJC. Metafounders are related to Fst fixation indices and reduce bias in single-step genomic evaluations. Genet Sel Evol 2017; 49:34. [PMID: 28283016 PMCID: PMC5439149 DOI: 10.1186/s12711-017-0309-2] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 10/25/2016] [Accepted: 03/03/2017] [Indexed: 01/03/2023] Open
Abstract
Background Metafounders are pseudo-individuals that encapsulate genetic heterozygosity and relationships within and across base pedigree populations, i.e. ancestral populations. This work addresses the estimation and usefulness of metafounder relationships in single-step genomic best linear unbiased prediction (ssGBLUP). Results We show that ancestral relationship parameters are proportional to standardized covariances of base allelic frequencies across populations, such as $F_{\text{st}}$ fixation indexes. These covariances of base allelic frequencies can be estimated from marker genotypes of related recent individuals and pedigree. Simple methods for their estimation include naïve computation of allele frequencies from marker genotypes or a method of moments that equates average pedigree-based and marker-based relationships. Complex methods include generalized least squares (best linear unbiased estimator (BLUE)) or maximum likelihood based on pedigree relationships. To our knowledge, methods to infer $F_{\text{st}}$ coefficients from marker data have not been developed for related individuals. We derived a genomic relationship matrix, compatible with pedigree relationships, that is constructed as a cross-product of {−1,0,1} codes and that is equivalent (apart from scale factors) to an identity-by-state relationship matrix at genome-wide markers. Using a simulation with a single population under selection in which only males and youngest animals are genotyped, we observed that generalized least squares or maximum likelihood gave accurate and unbiased estimates of the ancestral relationship parameter (true value: 0.40) whereas the naïve method and the method of moments were biased (average estimates of 0.43 and 0.35). We also observed that genomic evaluation by ssGBLUP using metafounders was less biased in terms of estimates of genetic trend (bias of 0.01 instead of 0.12), resulted in less overdispersed (0.94 instead of 0.99) and as accurate (0.74) estimates of breeding values than ssGBLUP without metafounders and provided consistent estimates of heritability. Conclusions Estimation of metafounder relationships can be achieved using BLUP-like methods with pedigree and markers. Inclusion of metafounder relationships reduces bias of genomic predictions with no loss in accuracy.
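The genomic relationship matrix built as a cross-product of {−1,0,1} codes can be sketched in a few lines. The scale factor used below (number of markers divided by two, which corresponds to assuming allele frequencies of 0.5) is an assumption of this example, since the abstract only says the matrix is defined up to scale factors; the function name is hypothetical.

```python
import numpy as np

def cross_product_grm(genotypes):
    """Relationship matrix from allele counts coded 0, 1, 2.

    Genotypes are recoded to -1, 0, 1 and the cross-product is divided
    by n_markers / 2 (assumed scaling, see the text above).
    """
    M = np.asarray(genotypes, dtype=float)
    Z = M - 1.0                      # 0, 1, 2  ->  -1, 0, 1
    n_markers = M.shape[1]
    return Z @ Z.T / (n_markers / 2.0)

# Three individuals, four markers (toy data).
G = cross_product_grm([
    [0, 1, 2, 2],
    [2, 1, 0, 0],
    [1, 1, 1, 1],
])
```

In this toy case the first two individuals are opposite homozygotes at three markers, so their off-diagonal element is negative, while the all-heterozygous third individual has zero cross-products with everyone, itself included.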
Affiliation(s)
- Carolina A Garcia-Baccino: Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, C1417DSE, Buenos Aires, Argentina; Instituto de Investigaciones en Producción Animal - Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Andres Legarra: GenPhySE, INRA, INPT, ENVT, Université de Toulouse, 31326, Castanet-Tolosan, France
- Ole F Christensen: Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark
- Ignacy Misztal: Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Ivan Pocrnic: Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Zulma G Vitezica: GenPhySE, INRA, INPT, ENVT, Université de Toulouse, 31326, Castanet-Tolosan, France
- Rodolfo J C Cantet: Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, C1417DSE, Buenos Aires, Argentina; Instituto de Investigaciones en Producción Animal - Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
|
19
|
Antolín R, Nettelblad C, Gorjanc G, Money D, Hickey JM. A hybrid method for the imputation of genomic data in livestock populations. Genet Sel Evol 2017; 49:30. [PMID: 28253858 PMCID: PMC5439152 DOI: 10.1186/s12711-017-0300-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 09/05/2016] [Accepted: 02/13/2017] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND This paper describes a combined heuristic and hidden Markov model (HMM) method to accurately impute missing genotypes in livestock datasets. Genomic selection in breeding programs requires high-density genotyping of many individuals, making algorithms that economically generate this information crucial. There are two common classes of imputation methods: heuristic methods and probabilistic methods, the latter being largely based on hidden Markov models. Heuristic methods are robust, but fail to impute markers in regions where the thresholds of heuristic rules are not met, or the pedigree is inconsistent. Hidden Markov models are probabilistic methods which typically do not require specific family structures or pedigree information, making them very flexible, but they are computationally expensive and, in some cases, less accurate. RESULTS We implemented a new hybrid imputation method that combined heuristic and HMM methods, AlphaImpute and MaCH, and compared the computation time and imputation accuracy of the three methods. AlphaImpute was the fastest, followed by the hybrid method and then the HMM. The computation time of the hybrid method and the HMM increased linearly with the number of iterations used in the hidden Markov model; however, the computation time of the hybrid method increased almost linearly and that of the HMM quadratically with the number of template haplotypes. The hybrid method was the most accurate imputation method for low-density panels when pedigree information was missing, especially if minor allele frequency was also low. The accuracy of the hybrid method and the HMM increased with the number of template haplotypes. The imputation accuracy of all three methods increased with the marker density of the low-density panels. Excluding the pedigree information reduced imputation accuracy for the hybrid method and AlphaImpute. Finally, the imputation accuracy of the three methods decreased with decreasing minor allele frequency.
CONCLUSIONS The hybrid heuristic and probabilistic imputation method is able to impute all markers for all individuals in a population, as the HMM does. The hybrid method is usually more accurate and never significantly less accurate than a purely heuristic method or a purely probabilistic method and is faster than a standard probabilistic method.
Affiliation(s)
- Roberto Antolín: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Research Centre, Midlothian, EH25 9RG Scotland, UK
- Carl Nettelblad: Division of Scientific Computing, Department of Information Technology, Science for Life Laboratory, Uppsala University, Lägerhyddsvägen 2, Box 337, 751 05 Uppsala, Sweden
- Gregor Gorjanc: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Research Centre, Midlothian, EH25 9RG Scotland, UK
- Daniel Money: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Research Centre, Midlothian, EH25 9RG Scotland, UK
- John M. Hickey: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Research Centre, Midlothian, EH25 9RG Scotland, UK
|
20
|
Gonen S, Jenko J, Gorjanc G, Mileham AJ, Whitelaw CBA, Hickey JM. Potential of gene drives with genome editing to increase genetic gain in livestock breeding programs. Genet Sel Evol 2017; 49:3. [PMID: 28093068 PMCID: PMC5240390 DOI: 10.1186/s12711-016-0280-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 07/20/2016] [Accepted: 12/14/2016] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND This paper uses simulation to explore how gene drives can increase genetic gain in livestock breeding programs. Gene drives are naturally occurring phenomena that cause a mutation on one chromosome to copy itself onto its homologous chromosome. METHODS We simulated nine different breeding and editing scenarios with a common overall structure. Each scenario began with 21 generations of selection, followed by 20 generations of selection based on true breeding values where the breeder used selection alone, selection in combination with genome editing, or selection with genome editing and gene drives. In the scenarios that used gene drives, we varied the probability of successfully incorporating the gene drive. For each scenario, we evaluated genetic gain, genetic variance [Formula: see text], rate of change in inbreeding ([Formula: see text]), number of distinct quantitative trait nucleotides (QTN) edited, rate of increase in favourable allele frequencies of edited QTN and the time to fix favourable alleles. RESULTS Gene drives enhanced the benefits of genome editing in seven ways: (1) they amplified the increase in genetic gain brought about by genome editing; (2) they amplified the rate of increase in the frequency of favourable alleles and reduced the time it took to fix them; (3) they enabled more rapid targeting of QTN with lesser effect for genome editing; (4) they distributed fixed editing resources across a larger number of distinct QTN across generations; (5) they focussed editing on a smaller number of QTN within a given generation; (6) they reduced the level of inbreeding when editing a subset of the sires; and (7) they increased the efficiency of converting genetic variation into genetic gain. CONCLUSIONS Genome editing in livestock breeding results in short-, medium- and long-term increases in genetic gain. The increase in genetic gain occurs because editing increases the frequency of favourable alleles in the population. 
Gene drives accelerate the increase in allele frequency caused by editing, which results in even higher genetic gain over a shorter period of time with no impact on inbreeding.
Affiliation(s)
- Serap Gonen: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Janez Jenko: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Gregor Gorjanc: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- C. Bruce A. Whitelaw: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- John M. Hickey: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
|
21
|
Faux AM, Gorjanc G, Gaynor RC, Battagin M, Edwards SM, Wilson DL, Hearne SJ, Gonen S, Hickey JM. AlphaSim: Software for Breeding Program Simulation. The Plant Genome 2016; 9. [PMID: 27902803 DOI: 10.3835/plantgenome2016.02.0013] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 05/18/2023]
Abstract
This paper describes AlphaSim, a software package for simulating plant and animal breeding programs. AlphaSim enables the simulation of multiple aspects of breeding programs with a high degree of flexibility. AlphaSim simulates breeding programs in a series of steps: (i) simulate haplotype sequences and pedigree; (ii) drop haplotypes into the base generation of the pedigree and select single-nucleotide polymorphism (SNP) and quantitative trait nucleotide (QTN); (iii) assign QTN effects, calculate genetic values, and simulate phenotypes; (iv) drop haplotypes into the burn-in generations; and (v) perform selection and simulate new generations. The program is flexible in terms of historical population structure and diversity, recent pedigree structure, trait architecture, and selection strategy. It integrates biotechnologies such as doubled-haploids (DHs) and gene editing and allows the user to simulate multiple traits and multiple environments, specify recombination hot spots and cold spots, specify gene jungles and deserts, perform genomic predictions, and apply optimal contribution selection. AlphaSim also includes restart functionalities, which increase its flexibility by allowing the simulation process to be paused so that the parameters can be changed or to import an externally created pedigree, trial design, or results of an analysis of previously simulated data. By combining the options, a user can simulate simple or complex breeding programs with several generations, variable population structures and variable breeding decisions over time. In conclusion, AlphaSim is a flexible and computationally efficient software package to simulate biotechnology enhanced breeding programs with the aim of performing rapid, low-cost, and objective in silico comparison of breeding technologies.
|
22
|
Battagin M, Gorjanc G, Faux AM, Johnston SE, Hickey JM. Effect of manipulating recombination rates on response to selection in livestock breeding programs. Genet Sel Evol 2016; 48:44. [PMID: 27335010 PMCID: PMC4917950 DOI: 10.1186/s12711-016-0221-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 11/27/2015] [Accepted: 06/07/2016] [Indexed: 11/10/2022] Open
Abstract
Background In this work, we performed simulations to explore the potential of manipulating recombination rates to increase response to selection in livestock breeding programs. Methods We carried out ten replicates of several scenarios that followed a common overall structure but differed in the average rate of recombination along the genome (expressed as the length of a chromosome in Morgan), the genetic architecture of the trait under selection, and the selection intensity under truncation selection (expressed as the proportion of males selected). Recombination rates were defined by simulating nine different chromosome lengths: 0.10, 0.25, 0.50, 1, 2, 5, 10, 15 and 20 Morgan, respectively. One Morgan was considered to be the typical chromosome length for current livestock species. The genetic architecture was defined by the number of quantitative trait variants (QTV) that affected the trait under selection. Either a large (10,000) or a small (1000 or 500) number of QTV was simulated. Finally, the proportions of males selected under truncation selection as sires for the next generation were equal to 1.2, 2.4, 5, or 10 %. Results Increasing recombination rate increased the overall response to selection and decreased the loss of genetic variance. The difference in cumulative response between low and high recombination rates increased over generations. At low recombination rates, cumulative response to selection tended to asymptote sooner and the genetic variance was completely eroded. If the trait under selection was affected by few QTV, differences between low and high recombination rates still existed, but the selection limit was reached at all rates of recombination. Conclusions Higher recombination rates can enhance the efficiency of breeding programs to turn genetic variation into response to selection. However, to increase response to selection significantly, the recombination rate would need to be increased 10- or 20-fold. 
The biological feasibility and consequences of such large increases in recombination rates are unknown. Electronic supplementary material The online version of this article (doi:10.1186/s12711-016-0221-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mara Battagin: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Gregor Gorjanc: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Anne-Michelle Faux: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Susan E Johnston: Institute of Evolutionary Biology, University of Edinburgh, Charlotte Auerbach Road, Edinburgh, EH9 3FL, UK
- John M Hickey: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
|
23
|
de Almeida Filho JE, Guimarães JFR, E Silva FF, de Resende MDV, Muñoz P, Kirst M, Resende MFR. The contribution of dominance to phenotype prediction in a pine breeding and simulated population. Heredity (Edinb) 2016; 117:33-41. [PMID: 27118156 PMCID: PMC4901355 DOI: 10.1038/hdy.2016.23] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 07/09/2015] [Revised: 12/07/2015] [Accepted: 03/04/2016] [Indexed: 02/01/2023] Open
Abstract
Pedigrees and dense marker panels have been used to predict the genetic merit of individuals in plant and animal breeding, accounting primarily for the contribution of additive effects. However, nonadditive effects may also affect trait variation in many breeding systems, particularly when specific combining ability is explored. Here we used models with different priors, including additive-only and additive plus dominance effects, to predict polygenic (height) and oligogenic (fusiform rust resistance) traits in a structured breeding population of loblolly pine (Pinus taeda L.). Models were largely similar in predictive ability, and the inclusion of dominance only modestly improved the predictions for tree height. Next, we simulated a genetically similar population to assess the ability to predict polygenic and oligogenic traits controlled by different levels of dominance. The simulation showed an overall decrease in the accuracy of total genomic predictions as dominance increases, regardless of the method used for prediction. Thus, dominance effects may not be accounted for as effectively in prediction models compared with traits controlled by additive alleles only. When the ratio of dominance to total phenotypic variance reached 0.2, the additive-dominance prediction models were significantly better than the additive-only models. However, in the prediction of the subsequent progeny population, this accuracy increase was only observed for the oligogenic trait.
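The difference between additive-only and additive plus dominance models comes down to which covariates are built from the genotypes. A minimal sketch of one common coding follows; the abstract does not state which coding the authors used, so the function names and the 0/1/2 and heterozygote-indicator codes are assumptions of this example.

```python
def encode(genotypes):
    """Split allele counts (0, 1, 2) into additive and dominance covariates."""
    additive = [float(g) for g in genotypes]                  # allele dosage
    dominance = [1.0 if g == 1 else 0.0 for g in genotypes]   # heterozygote flag
    return additive, dominance

def genotypic_value(genotypes, a, d):
    """Single-locus genetic value: a per allele copy plus d for heterozygotes."""
    additive, dominance = encode(genotypes)
    return [a * x + d * h for x, h in zip(additive, dominance)]

# Genotypes 0, 1, 2 at one locus with additive effect 1.0 and dominance effect 0.5.
values = genotypic_value([0, 1, 2], a=1.0, d=0.5)
```

An additive-only model fits just the dosage column, so with `d` different from zero the heterozygote departs from the straight line through the two homozygotes and that departure can only be captured by the extra dominance covariate.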
Affiliation(s)
- J E de Almeida Filho: School of Forest Resources and Conservation, University of Florida, Gainesville, FL, USA; Graduate Program in Genetics and Improvement, Federal University of Viçosa, Avenida PH Rolfs S/N, Viçosa, Brazil; Department of Zootecnia, Federal University of Viçosa, Avenida PH Rolfs S/N, Viçosa, Brazil
- J F R Guimarães: School of Forest Resources and Conservation, University of Florida, Gainesville, FL, USA; Graduate Program in Genetics and Improvement, Federal University of Viçosa, Avenida PH Rolfs S/N, Viçosa, Brazil; Department of Zootecnia, Federal University of Viçosa, Avenida PH Rolfs S/N, Viçosa, Brazil
- F F E Silva: Department of Zootecnia, Federal University of Viçosa, Avenida PH Rolfs S/N, Viçosa, Brazil
- M D V de Resende: EMBRAPA Florestas/Department of Statistics, Federal University of Viçosa, Avenida PH Rolfs S/N, Viçosa, Brazil
- P Muñoz: Agronomy Department, University of Florida, Gainesville, FL, USA
- M Kirst: University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA; School of Forest Resources and Conservation, University of Florida, Gainesville, FL, USA
|
24
|
Pérez-Enciso M, Legarra A. A combined coalescence gene-dropping tool for evaluating genomic selection in complex scenarios (ms2gs). J Anim Breed Genet 2016; 133:85-91. [PMID: 26995218 DOI: 10.1111/jbg.12200] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/28/2015] [Accepted: 12/07/2015] [Indexed: 11/28/2022]
Abstract
We present ms2gs, a combined coalescence - gene dropping (i.e. backward-forward) simulator for complex traits. It therefore aims at combining the advantages of both approaches. It is primarily conceived for very short term, recent scenarios such as those that are of interest in animal and plant breeding. It is very flexible in terms of defining QTL architecture and SNP ascertainment bias, and it allows for easy modelling of alternative markers such as RADs. It can use real sequence or chip data or generate molecular polymorphisms via the coalescence. It can generate QTL conditional on extant molecular information, such as low-density genotyping. It models (simplistically) sequence, imputation or genotyping errors. It requires as input both genotypic data in plink or ms formats, and a pedigree that is used to perform the gene dropping. By default, it compares accuracy for BLUP, SNP ascertained data, sequence, and causal SNPs. It employs VanRaden's linear (GBLUP) and nonlinear method for incorporating molecular information. To illustrate the program, we present a small application in a half-sib population and a multiparental (MAGIC) cross. The program, manual and examples are available at https://github.com/mperezenciso/ms2gs.
Affiliation(s)
- M Pérez-Enciso: Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB Consortium, Bellaterra, Spain; Institut Català de Recerca i Estudis Avançats (ICREA), Carrer de Lluís Companys, Barcelona, Spain
- A Legarra: UMR 1388 GENPHYSE, Génétique, Physiologie et Systèmes d'Elevage, INRA, Castanet-Tolosan, France
|
25
|
Gorjanc G, Jenko J, Hearne SJ, Hickey JM. Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace populations. BMC Genomics 2016; 17:30. [PMID: 26732811 PMCID: PMC4702314 DOI: 10.1186/s12864-015-2345-z] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 10/14/2015] [Accepted: 12/21/2015] [Indexed: 11/23/2022] Open
Abstract
Background: The limited genetic diversity of elite maize germplasms raises concerns about the potential to breed for new challenges. Initiatives have been formed over the years to identify and utilize useful diversity from landraces to overcome this issue. The aim of this study was to evaluate the proposed designs to initiate a pre-breeding program within the Seeds of Discovery (SeeD) initiative with emphasis on harnessing polygenic variation from landraces using genomic selection. We evaluated these designs with stochastic simulation to provide decision support about the effect of several design factors on the quality of resulting (pre-bridging) germplasm. The evaluated design factors were: (i) the approach to initiate a pre-breeding program from the selected landraces, doubled haploids of the selected landraces, or testcrosses of the elite hybrid and selected landraces, (ii) the genetic parameters of landraces and phenotypes, and (iii) logistical factors related to the size and management of a pre-breeding program. Results: The results suggest a pre-breeding program should be initiated directly from landraces. Initiating from testcrosses leads to a rapid reconstruction of the elite donor genome during further improvement of the pre-bridging germplasm. The analysis of accuracy of genomic predictions across the various design factors indicates the power of genomic selection for pre-breeding programs with large genetic diversity and constrained resources for data recording. The joint effect of design factors was summarized with decision trees with easy-to-follow guidelines to optimize pre-breeding efforts of SeeD and similar initiatives. Conclusions: Results of this study provide guidelines for SeeD and similar initiatives on how to initiate pre-breeding programs that aim to harness polygenic variation from landraces.
Affiliation(s)
- Gregor Gorjanc: Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia; The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Janez Jenko: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK; Agricultural Institute of Slovenia, 1000 Ljubljana, Slovenia
- Sarah J Hearne: Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Apdo, 06600, México, D.F., México
- John M Hickey: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
|
26
|
Casellas J, Piedrafita J. Accuracy and expected genetic gain under genetic or genomic evaluation in sheep flocks with different amounts of pedigree, genomic and phenotypic data. Livest Sci 2015. [DOI: 10.1016/j.livsci.2015.10.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/01/2023]
|
27
|
Gorjanc G, Bijma P, Hickey JM. Reliability of pedigree-based and genomic evaluations in selected populations. Genet Sel Evol 2015; 47:65. [PMID: 26271246 PMCID: PMC4536753 DOI: 10.1186/s12711-015-0145-1] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 08/31/2014] [Accepted: 07/29/2015] [Indexed: 11/14/2022] Open
Abstract
Background: Reliability is an important parameter in breeding. It measures the precision of estimated breeding values (EBV) and, thus, potential response to selection on those EBV. The precision of EBV is commonly measured by relating the prediction error variance (PEV) of EBV to the base population additive genetic variance (base PEV reliability), while the potential for response to selection is commonly measured by the squared correlation between the EBV and breeding values (BV) on selection candidates (reliability of selection). While these two measures are equivalent for unselected populations, they are not equivalent for selected populations. The aim of this study was to quantify the effect of selection on these two measures of reliability and to show how this affects comparison of breeding programs using pedigree-based or genomic evaluations. Methods: Two scenarios with random and best linear unbiased prediction (BLUP) selection were simulated, where the EBV of selection candidates were estimated using only pedigree, pedigree and phenotype, genome-wide marker genotypes and phenotype, or only genome-wide marker genotypes. The base PEV reliabilities of these EBV were compared to the corresponding reliabilities of selection. Realized genetic selection intensity was evaluated to quantify the potential of selection on the different types of EBV and, thus, to validate differences in reliabilities. Finally, the contribution of different underlying processes to changes in additive genetic variance and reliabilities was quantified. Results: The simulations showed that, for selected populations, the base PEV reliability substantially overestimates the reliability of selection of EBV that are mainly based on old information from the parental generation, as is the case with pedigree-based prediction. Selection on such EBV gave very low realized genetic selection intensities, confirming the overestimation and importance of genotyping both male and female selection candidates. The two measures of reliability matched when the reductions in additive genetic variance due to the Bulmer effect, selection, and inbreeding were taken into account. Conclusions: For populations under selection, EBV based on genome-wide information are more valuable than suggested by the comparison of the base PEV reliabilities between the different types of EBV. This implies that genome-wide marker information is undervalued for selected populations and that genotyping un-phenotyped female selection candidates should be reconsidered.
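As a rough illustration of the two measures this abstract contrasts, the Python sketch below computes a base PEV reliability and a reliability of selection. The function names, the formula 1 - PEV / (base additive variance) as one common way of relating PEV to the base variance, and all numbers are assumptions made for illustration; none of them are taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def base_pev_reliability(pev, base_additive_variance):
    """Assumed form: 1 - PEV / sigma_a^2, with sigma_a^2 from the base population."""
    return 1.0 - pev / base_additive_variance


def reliability_of_selection(ebv, bv):
    """Squared correlation between EBV and true BV among the selection candidates."""
    return pearson(ebv, bv) ** 2


# Hypothetical values, for illustration only.
ebv = [0.1, 0.4, -0.2, 0.3, 0.0]
bv = [0.3, 0.2, -0.5, 0.9, -0.1]
print("base PEV reliability:", base_pev_reliability(pev=0.6, base_additive_variance=1.0))
print("reliability of selection:", round(reliability_of_selection(ebv, bv), 3))
```

The first measure needs only a variance ratio, whereas the second needs true breeding values, which is why it is usually available only in simulation.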
Affiliation(s)
- Gregor Gorjanc: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Piter Bijma: Wageningen University, Animal Breeding and Genomics Centre, Wageningen, The Netherlands
- John M Hickey: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
|
28
|
Jenko J, Gorjanc G, Cleveland MA, Varshney RK, Whitelaw CBA, Woolliams JA, Hickey JM. Potential of promotion of alleles by genome editing to improve quantitative traits in livestock breeding programs. Genet Sel Evol 2015; 47:55. [PMID: 26133579 PMCID: PMC4487592 DOI: 10.1186/s12711-015-0135-3] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Received: 11/25/2014] [Accepted: 06/15/2015] [Indexed: 12/29/2022] Open
Abstract
Background: Genome editing (GE) is a method that enables specific nucleotides in the genome of an individual to be changed. To date, use of GE in livestock has focussed on simple traits that are controlled by a few quantitative trait nucleotides (QTN) with large effects. The aim of this study was to evaluate the potential of GE to improve quantitative traits that are controlled by many QTN, referred to here as promotion of alleles by genome editing (PAGE). Methods: Multiple scenarios were simulated to test alternative PAGE strategies for a quantitative trait. They differed in (i) the number of edits per sire (0 to 100), (ii) the number of edits per generation (0 to 500), and (iii) the extent of use of PAGE (i.e. editing all sires or only a proportion of them). The baseline scenario involved selecting individuals on true breeding values for several generations (genomic selection only (GS only), i.e. genomic selection with perfect accuracy). Alternative scenarios complemented this baseline scenario with PAGE (GS + PAGE). The effect of different PAGE strategies was quantified by comparing response to selection, changes in allele frequencies, the number of distinct QTN edited, the sum of absolute effects of the edited QTN per generation, and inbreeding. Results: Response to selection after 20 generations was between 1.08 and 4.12 times higher with GS + PAGE than with GS only. Increases in response to selection were larger with more edits per sire and more sires edited. When the total resources for PAGE were limited, editing a few sires for many QTN resulted in greater response to selection and inbreeding compared to editing many sires for a few QTN. Between the scenarios GS only and GS + PAGE, there was little difference in the average change in QTN allele frequencies, but there was a major difference for the QTN with the largest effects. The sum of the effects of the edited QTN decreased across generations. Conclusions: This study showed that PAGE has great potential for application in livestock breeding programs, but inbreeding needs to be managed.
Affiliation(s)
- Janez Jenko: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Gregor Gorjanc: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Matthew A Cleveland: Genus plc, 100 Bluegrass Commons Blvd., Suite 2200, Hendersonville, TN, 37075, USA
- Rajeev K Varshney: International Crop Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- C Bruce A Whitelaw: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- John A Woolliams: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- John M Hickey: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
|
29
|
Gorjanc G, Cleveland MA, Houston RD, Hickey JM. Potential of genotyping-by-sequencing for genomic selection in livestock populations. Genet Sel Evol 2015; 47:12. [PMID: 25887531 PMCID: PMC4344748 DOI: 10.1186/s12711-015-0102-z] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 05/11/2014] [Accepted: 01/29/2015] [Indexed: 12/12/2022] Open
Abstract
Background: Next-generation sequencing techniques, such as genotyping-by-sequencing (GBS), provide alternatives to single nucleotide polymorphism (SNP) arrays. The aim of this work was to evaluate the potential of GBS compared to SNP array genotyping for genomic selection in livestock populations. Methods: The value of GBS was quantified by simulation analyses in which three parameters were varied: (i) genome-wide sequence read depth (x) per individual from 0.01x to 20x or using SNP array genotyping; (ii) number of genotyped markers from 3000 to 300 000; and (iii) size of training and prediction sets from 500 to 50 000 individuals. The latter was achieved by distributing the total available x of 1000x, 5000x, or 10 000x per genotyped locus among the varying number of individuals. With SNP arrays, genotypes were called from sequence data directly. With GBS, genotypes were called from sequence reads that varied between loci and individuals according to a Poisson distribution with mean equal to x. Simulated data were analyzed with ridge regression and the accuracy and bias of genomic predictions and response to selection were quantified under the different scenarios. Results: Accuracies of genomic predictions using GBS data or SNP array data were comparable when large numbers of markers were used and x per individual was ~1x or higher. The bias of genomic predictions was very high at a very low x. When the total available x was distributed among the training individuals, the accuracy of prediction was maximized when a large number of individuals was used that had GBS data with low x for a large number of markers. Similarly, response to selection was maximized under the same conditions due to increasing both accuracy and selection intensity. Conclusions: GBS offers great potential for developing genomic selection in livestock populations because it makes it possible to cover large fractions of the genome and to vary the sequence read depth per individual. Thus, the accuracy of predictions is improved by increasing the size of training populations and the intensity of selection is increased by genotyping a larger number of selection candidates.
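The read-depth model described in this abstract (read counts per locus and individual drawn from a Poisson distribution with mean x) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the error-free binomial sampling of alleles, and the naive dosage call are hypothetical choices and are not the simulation or genotype-calling procedure used in the paper.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_reads(true_genotype, depth, rng):
    """Return (ref_reads, alt_reads) for one locus of one individual.

    true_genotype is the count of the alternative allele (0, 1 or 2);
    reads are assumed error-free and each allele equally likely to be sampled.
    """
    n_reads = poisson(depth, rng)
    alt = sum(rng.random() < true_genotype / 2 for _ in range(n_reads))
    return n_reads - alt, alt


def naive_dosage(ref_reads, alt_reads):
    """Alternative-allele dosage estimated from read counts; None if there are no reads."""
    total = ref_reads + alt_reads
    return None if total == 0 else 2 * alt_reads / total


rng = random.Random(42)  # fixed seed so the sketch is repeatable
for depth in (0.5, 1.0, 5.0, 20.0):
    calls = [naive_dosage(*simulate_reads(1, depth, rng)) for _ in range(1000)]
    missing = sum(c is None for c in calls) / len(calls)
    print(f"depth {depth}x: {missing:.2f} of heterozygous loci have no reads")
```

At low depth many loci receive no reads at all, and heterozygotes covered by a single read are necessarily called as homozygotes, which is one intuition for the bias at very low x reported in the abstract.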
Affiliation(s)
- Gregor Gorjanc: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Matthew A Cleveland: Genus Plc, 100 Bluegrass Commons Blvd., Suite 2200, Hendersonville, TN, 37075, USA
- Ross D Houston: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- John M Hickey: The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
|
30
|
Zhang Z, Erbe M, He J, Ober U, Gao N, Zhang H, Simianer H, Li J. Accuracy of whole-genome prediction using a genetic architecture-enhanced variance-covariance matrix. G3 (Bethesda) 2015; 5:615-27. [PMID: 25670771 PMCID: PMC4390577 DOI: 10.1534/g3.114.016261] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Received: 01/08/2015] [Accepted: 02/05/2015] [Indexed: 01/22/2023]
Abstract
Obtaining accurate predictions of unobserved genetic or phenotypic values for complex traits in animal, plant, and human populations is possible through whole-genome prediction (WGP), a combined analysis of genotypic and phenotypic data. Because the underlying genetic architecture of the trait of interest is an important factor affecting model selection, we propose a new strategy, termed BLUP|GA (BLUP-given genetic architecture), which can use genetic architecture information within the dataset at hand rather than from public sources. This is achieved by using a trait-specific covariance matrix (T), which is a weighted sum of a genetic architecture part (S matrix) and the realized relationship matrix (G). The algorithm of BLUP|GA is provided and illustrated with real and simulated datasets. Predictive ability of BLUP|GA was validated with three model traits in a dairy cattle dataset and 11 traits in three public datasets with a variety of genetic architectures and compared with GBLUP and other approaches. Results show that BLUP|GA outperformed GBLUP in 20 of 21 scenarios in the dairy cattle dataset and outperformed GBLUP, BayesA, and BayesB in 12 of 13 traits in the analyzed public datasets. Further analyses showed that the difference in accuracy between BLUP|GA and GBLUP correlates significantly with the distance between the T and G matrices. The new strategy applied in BLUP|GA is a favorable and flexible alternative to the standard GBLUP model, making it possible to account for the genetic architecture of the quantitative trait under consideration when necessary. This feature is mainly due to the increased similarity between the trait-specific relationship matrix (T matrix) and the genetic relationship matrix at unobserved causal loci. Applying BLUP|GA in WGP would ease the burden of model selection.
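One way to write the "weighted sum" mentioned in this abstract is shown below. The weight symbol and the constraint that the two weights sum to one are assumptions made here for illustration; the abstract itself states only that the trait-specific matrix is a weighted sum of the S and G matrices.

```latex
T = \omega\, S + (1 - \omega)\, G, \qquad 0 \le \omega \le 1
```

Under this assumed form, setting the weight to zero gives T = G, so the model would reduce to standard GBLUP, and larger weights move T further from G.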
Affiliation(s)
- Zhe Zhang: National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China; Department of Animal Sciences, Animal Breeding and Genetics Group, Georg-August-Universität Göttingen, Göttingen 37075, Germany
- Malena Erbe: Department of Animal Sciences, Animal Breeding and Genetics Group, Georg-August-Universität Göttingen, Göttingen 37075, Germany
- Jinlong He: National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Ulrike Ober: Department of Animal Sciences, Animal Breeding and Genetics Group, Georg-August-Universität Göttingen, Göttingen 37075, Germany
- Ning Gao: National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Hao Zhang: National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Henner Simianer: Department of Animal Sciences, Animal Breeding and Genetics Group, Georg-August-Universität Göttingen, Göttingen 37075, Germany
- Jiaqi Li: National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
|
31
|
Onogi A, Ideta O, Inoshita Y, Ebana K, Yoshioka T, Yamasaki M, Iwata H. Exploring the areas of applicability of whole-genome prediction methods for Asian rice (Oryza sativa L.). Theor Appl Genet 2015; 128:41-53. [PMID: 25341369 DOI: 10.1007/s00122-014-2411-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 07/09/2014] [Accepted: 10/03/2014] [Indexed: 05/25/2023]
Abstract
Our simulation results clarify the areas of applicability of nine prediction methods and suggest the factors that affect their accuracy at predicting empirical traits. Whole-genome prediction is used to predict genetic value from genome-wide markers. The choice of method is important for successful prediction. We compared nine methods using empirical data for eight phenological and morphological traits of Asian rice cultivars (Oryza sativa L.) and data simulated from real marker genotype data. The methods were genomic BLUP (GBLUP), reproducing kernel Hilbert spaces regression (RKHS), Lasso, elastic net, random forest (RForest), Bayesian lasso (Blasso), extended Bayesian lasso (EBlasso), weighted Bayesian shrinkage regression (wBSR), and the average of all methods (Ave). The objectives were to evaluate the predictive ability of these methods in a cultivar population, to characterize them by exploring the area of applicability of each method using simulation, and to investigate the causes of their different accuracies for empirical traits. GBLUP was the most accurate for one trait, RKHS and Ave for two, and RForest for three traits. In the simulation, Blasso, EBlasso, and Ave showed stable performance across the simulated scenarios, whereas the other methods, except wBSR, had specific areas of applicability; wBSR performed poorly in most scenarios. For each method, the accuracy ranking for the empirical traits was largely consistent with that in one of the simulated scenarios, suggesting that the simulation conditions reflected the factors that affected the method accuracy for the empirical results. This study will be useful for genomic prediction not only in Asian rice, but also in populations from other crops with relatively small training sets and strong linkage disequilibrium structures.
Affiliation(s)
- Akio Onogi: Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan
|
32
|
Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications. Animal 2014; 8:1743-53. [PMID: 25045914 DOI: 10.1017/s1751731114001803] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 11/06/2022] Open
Abstract
In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals that are genotyped with low-density SNP panels. The objective of this paper is to review different measures of correctness of imputation, and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and imputation error rate, which counts the number of incorrectly imputed alleles, are commonly used measures of imputation correctness. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than imputation error rate, because imputation accuracy does not depend on minor allele frequency (MAF), whereas imputation error rate does. Imputation accuracy can therefore be compared more readily across loci with different MAF. Imputation accuracy depends on the ability to identify the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals with genotypes at the high-density panel, the SNP density on the low- and high-density panel, the MAF of the imputed SNP, and whether imputed SNP are located at the end of a chromosome or not. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used that are computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.
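The two measures compared in this review can be computed side by side, as in the Python sketch below. The genotype vectors are hypothetical, and the allele error rate is taken here to be the number of wrongly imputed alleles divided by the total number of alleles, which is one plausible reading of the abstract's wording rather than a definition quoted from the review.

```python
import math


def imputation_accuracy(true_gt, imputed_gt):
    """Pearson correlation between true and imputed genotypes (coded 0/1/2)."""
    n = len(true_gt)
    mt, mi = sum(true_gt) / n, sum(imputed_gt) / n
    cov = sum((t - mt) * (i - mi) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mt) ** 2 for t in true_gt)
    var_i = sum((i - mi) ** 2 for i in imputed_gt)
    return cov / math.sqrt(var_t * var_i)


def allele_error_rate(true_gt, imputed_gt):
    """Assumed definition: wrongly imputed alleles divided by total alleles (2 per animal)."""
    wrong = sum(abs(t - i) for t, i in zip(true_gt, imputed_gt))
    return wrong / (2 * len(true_gt))


# Hypothetical rare-allele locus: 2 carriers among 20 animals, one carrier missed.
rare_true = [0] * 18 + [1, 1]
rare_imputed = [0] * 18 + [1, 0]

# Hypothetical common locus: the same single allele error.
common_true = [0] * 5 + [1] * 10 + [2] * 5
common_imputed = [0] * 5 + [0] + [1] * 9 + [2] * 5

for name, t, i in (("rare", rare_true, rare_imputed),
                   ("common", common_true, common_imputed)):
    print(name, "error rate:", allele_error_rate(t, i),
          "accuracy:", round(imputation_accuracy(t, i), 3))
```

In this toy case both loci have the same allele error rate, yet the correlation is clearly lower at the rare-allele locus, which illustrates why an error rate on its own is hard to compare across loci with different MAF.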
|
33
|
Hickey JM, Gorjanc G, Hearne S, Huang BE. AlphaMPSim: flexible simulation of multi-parent crosses. Bioinformatics 2014; 30:2686-8. [DOI: 10.1093/bioinformatics/btu206] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open
|
34
|
Abstract
In genomic prediction, common analysis methods rely on a linear mixed-model framework to estimate SNP marker effects and breeding values of animals or plants. Ridge regression-best linear unbiased prediction (RR-BLUP) is based on the assumptions that SNP marker effects are normally distributed, are uncorrelated, and have equal variances. We propose DAIRRy-BLUP, a parallel, Distributed-memory RR-BLUP implementation, based on single-trait observations (Y), that uses the Average Information algorithm for restricted maximum-likelihood estimation of the variance components. The goal of DAIRRy-BLUP is to enable the analysis of large-scale data sets to provide more accurate estimates of marker effects and breeding values. A distributed-memory framework is required since the dimensionality of the problem, determined by the number of SNP markers, can become too large to be analyzed by a single computing node. Initial results show that DAIRRy-BLUP enables the analysis of very large-scale data sets (up to 1,000,000 individuals and 360,000 SNPs) and indicate that increasing the number of phenotypic and genotypic records has a more significant effect on the prediction accuracy than increasing the density of SNP arrays.
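For orientation, the RR-BLUP assumptions listed in this abstract correspond to the textbook mixed model and mixed-model equations below. The notation (y, X, Z, β, u, e, λ) is the standard one and is an assumption here; it is not taken from the DAIRRy-BLUP paper, which may parameterize the problem differently.

```latex
y = X\beta + Zu + e, \qquad u \sim N(0, I\sigma_u^2), \qquad e \sim N(0, I\sigma_e^2)

\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \lambda I \end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y \end{bmatrix},
\qquad \lambda = \sigma_e^2 / \sigma_u^2
```

With m markers, the block Z'Z + λI is an m × m matrix, which is consistent with the abstract's remark that the dimensionality of the problem is determined by the number of SNP markers.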
|
35
|
Bouwman AC, Hickey JM, Calus MPL, Veerkamp RF. Imputation of non-genotyped individuals based on genotyped relatives: assessing the imputation accuracy of a real case scenario in dairy cattle. Genet Sel Evol 2014; 46:6. [PMID: 24490796 PMCID: PMC3929150 DOI: 10.1186/1297-9686-46-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 07/15/2013] [Accepted: 01/07/2014] [Indexed: 11/10/2022] Open
Abstract
Background: Imputation of genotypes for ungenotyped individuals could enable the use of valuable phenotypes created before the genomic era in analyses that require genotypes. The objective of this study was to investigate the accuracy of imputation of non-genotyped individuals using genotype information from relatives. Methods: Genotypes were simulated for all individuals in the pedigree of a real (historical) dataset of phenotyped dairy cows and with part of the pedigree genotyped. The software AlphaImpute was used for imputation in its standard settings but also without phasing, i.e. using basic inheritance rules and segregation analysis only. Three scenarios were evaluated: (1) the real data scenario, (2) addition of genotypes of sires and maternal grandsires of the ungenotyped individuals, and (3) addition of one, two, or four genotyped offspring of the ungenotyped individuals to the reference population. Results: The imputation accuracy using AlphaImpute in its standard settings was lower than without phasing. Including genotypes of sires and maternal grandsires in the reference population improved imputation accuracy, i.e. the correlation of the true genotypes with the imputed genotype dosages, corrected for mean gene content, across all animals increased from 0.47 (real situation) to 0.60. Including one, two and four genotyped offspring increased the accuracy of imputation across all animals from 0.57 (no offspring) to 0.73, 0.82, and 0.92, respectively. Conclusions: At present, the use of basic inheritance rules and segregation analysis appears to be the best imputation method for ungenotyped individuals. Comparison of our empirical animal-specific imputation accuracies to predictions based on selection index theory suggested that not correcting for mean gene content considerably overestimates the true accuracy. Imputation of ungenotyped individuals can help to include valuable phenotypes for genome-wide association studies or for genomic prediction, especially when the ungenotyped individuals have genotyped offspring.
Affiliation(s)
- Aniek C Bouwman: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 135, Wageningen 6700 AC, Netherlands
|
36
|
Lourenco DAL, Misztal I, Wang H, Aguilar I, Tsuruta S, Bertrand JK. Prediction accuracy for a simulated maternally affected trait of beef cattle using different genomic evaluation models. J Anim Sci 2013; 91:4090-8. [PMID: 23893997 DOI: 10.2527/jas.2012-5826] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open
Abstract
Different methods for genomic evaluation were compared for accuracy and feasibility of evaluation using phenotypic, pedigree, and genomic information for a trait influenced by a maternal effect. A simulated population was constructed that included 15,800 animals in 5 generations. Genotypes from 45,000 SNP were available for 1,500 animals in the last 3 generations. Genotyped animals in the last generation had no phenotypes. Weaning weight data were simulated using an animal model with direct and maternal effects. Additive direct and maternal effects were considered either noncorrelated (formula in text) or negatively correlated (formula in text). Methods of analysis were traditional BLUP, BayesC using phenotypes and ignoring maternal effects (BayesCPR), BayesC using deregressed EBV (BayesCDEBV), and single-step genomic BLUP (ssGBLUP). Whereas BayesCPR can be used when phenotypes of only genotyped animals are available, BayesCDEBV can be used when BLUP EBV of genotyped animals are available, and ssGBLUP is suitable when genotypes, phenotypes, and pedigrees are jointly available. For all genotyped and young genotyped animals, mean accuracies from BayesCPR and BayesCDEBV were lower than accuracies from BLUP for direct and maternal effects. The differences in mean accuracy were greater when genetic correlation was negative. Gains in accuracy were observed when ssGBLUP was compared with BLUP; for the direct (maternal) effect the average gain was 0.01 (0.02) for all genotyped animals and 0.03 (0.02) for young genotyped animals without phenotypes. Similar gains were observed for 0 and negative genetic correlation. Accuracy with BayesCPR was affected by ignoring phenotypes of nongenotyped animals and maternal effect and by not accounting for parent average. Accuracy with BayesCDEBV was affected by approximations needed for deregression, not accounting for parent average, and sequential rather than simultaneous fitting of genomic and nongenomic information. 
Whereas BayesCDEBV presented a considerable bias, especially for maternal effect, ssGBLUP was unbiased for both effects. The computing time was 1 s for BLUP, 44 s for ssGBLUP, and over 2,000 s for BayesC. Greatest computational efficiency and accuracy of genomic prediction for a maternally affected trait was obtained when information from all nongenotyped but related individuals was included and phenotypes, pedigree, and genotypes were available and considered jointly. Increasing the gain in accuracy of genomic predictions obtained by ssGBLUP over BLUP may require an increase in the number of genotyped animals.
Affiliation(s)
- D A L Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602-2771, USA.
|
37
|
Casellas J, Esquivelzeta C, Legarra A. Short communication: Accounting for new mutations in genomic prediction models. J Dairy Sci 2013; 96:5398-402. [PMID: 23746579 DOI: 10.3168/jds.2012-6468] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/11/2012] [Accepted: 04/22/2013] [Indexed: 11/19/2022]
Abstract
Genomic evaluation models so far do not allow for accounting of newly generated genetic variation due to mutation. The main target of this research was to extend current genomic BLUP models with mutational relationships (model AM), and compare them against standard genomic BLUP models (model A) by analyzing simulated data. Model performance and precision of the predicted breeding values were evaluated under different population structures and heritabilities. The deviance information criterion (DIC) clearly favored the mutational relationship model under large heritabilities or populations with moderate-to-deep pedigrees contributing phenotypic data (i.e., differences equal to or larger than 10 DIC units); this model provided slightly higher correlation coefficients between simulated and predicted genomic breeding values. On the other hand, null DIC differences, or even relevant advantages for the standard genomic BLUP model, were reported under small heritabilities and shallow pedigrees, although precision of the genomic breeding values did not differ across models at a significant level. This method allows for slightly more accurate genomic predictions and handling of newly created variation; moreover, this approach does not require additional genotyping or phenotyping efforts, but a more accurate handling of available data.
Affiliation(s)
- Joaquim Casellas
- Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain.
|
38
|
Abstract
As the molecular marker density grows, there is a strong need in both genome-wide association studies and genomic selection to fit models with a large number of parameters. Here we present a computationally efficient generalized ridge regression (RR) algorithm for situations in which the number of parameters largely exceeds the number of observations. The computationally demanding parts of the method depend mainly on the number of observations and not the number of parameters. The algorithm was implemented in the R package bigRR based on the previously developed package hglm. Using such an approach, a heteroscedastic effects model (HEM) was also developed, implemented, and tested. The efficiency for different data sizes was evaluated via simulation. The method was tested for a bacteria-hypersensitive trait in a publicly available Arabidopsis data set including 84 inbred lines and 216,130 SNPs. The computation of all the SNP effects required <10 sec using a single 2.7-GHz core. The advantage in run time makes permutation testing feasible for such a whole-genome model, so that a genome-wide significance threshold can be obtained. HEM was found to be more robust than ordinary RR (a.k.a. SNP-best linear unbiased prediction) in terms of QTL mapping, because SNP-specific shrinkage was applied instead of a common shrinkage. The proposed algorithm was also assessed for genomic evaluation and was shown to give better predictions than ordinary RR.
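The statement that the demanding computations depend on the number of observations rather than the number of parameters can be illustrated with the standard dual form of ridge regression. The sketch below is not the bigRR implementation or its API; the function names, the absence of fixed effects, and the fixed shrinkage parameter `lam` are simplifying assumptions. The optional vector `d` stands in for SNP-specific weights, loosely in the spirit of SNP-specific shrinkage.

```python
import numpy as np

def ridge_dual(Z, y, lam, d=None):
    """Ridge SNP effects computed by solving an n x n system (n = observations).

    Z: (n, p) marker matrix, y: (n,) phenotypes, lam: shrinkage parameter,
    d: optional (p,) positive weights; larger d[j] means less shrinkage for SNP j.
    """
    n, p = Z.shape
    d = np.ones(p) if d is None else np.asarray(d, dtype=float)
    ZD = Z * d                                   # same as Z @ diag(d), without forming diag(d)
    K = ZD @ Z.T                                 # n x n matrix Z D Z'
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    return ZD.T @ alpha                          # D Z' (Z D Z' + lam I)^-1 y

def ridge_primal(Z, y, lam, d=None):
    """Reference solution from the p x p system; only practical for small p."""
    p = Z.shape[1]
    d = np.ones(p) if d is None else np.asarray(d, dtype=float)
    return np.linalg.solve(Z.T @ Z + lam * np.diag(1.0 / d), Z.T @ y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((8, 30))             # 8 observations, 30 markers
    y = rng.standard_normal(8)
    print(np.allclose(ridge_dual(Z, y, 1.5), ridge_primal(Z, y, 1.5)))
```

Both functions return the same effects because D Z'(Z D Z' + λI)⁻¹ = (Z'Z + λD⁻¹)⁻¹ Z'; the dual form only ever factorizes an n × n matrix, which is what keeps the cost manageable when markers greatly outnumber lines.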
|
39
|
Hickey JM, Kinghorn BP, Tier B, Clark SA, van der Werf JHJ, Gorjanc G. Genomic evaluations using similarity between haplotypes. J Anim Breed Genet 2012; 130:259-69. [PMID: 23855628 DOI: 10.1111/jbg.12020] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 03/27/2012] [Accepted: 11/07/2012] [Indexed: 10/27/2022]
Abstract
Long-range phasing and haplotype library imputation methodologies are accurate and efficient methods to provide haplotype information that could be used in prediction of breeding value or phenotype. Modelling long haplotypes as independent effects in genomic prediction would be inefficient because of the many effects that need to be estimated, and phasing errors, even if relatively low in frequency, exacerbate this problem. One approach to overcome this is to use similarity between haplotypes to model covariance of genomic effects by region or of animal breeding values. We developed a simple method to do this and tested its impact on genomic prediction by simulation. Results show that the diagonal and off-diagonal elements of a genomic relationship matrix constructed using the haplotype similarity method had higher correlations with the true relationship between pairs of individuals than genomic relationship matrices built using unphased genotypes or assumed unrelated haplotypes. However, the prediction accuracy of such haplotype-based prediction methods was not higher than that of methods based on unphased genotype information.
Affiliation(s)
- J M Hickey
- School of Environmental and Rural Science, University of New England, Armidale, Australia
|
40
|
Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 2012; 193:347-65. [PMID: 23222650 DOI: 10.1534/genetics.112.147983] [Citation(s) in RCA: 239] [Impact Index Per Article: 19.9] [Indexed: 12/26/2022] Open
Abstract
The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
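The two performance indicators highlighted in this abstract, accuracy and bias, are commonly computed in a cross-validation as the correlation between observed and predicted values and as the slope of the regression of observed on predicted values (a slope of 1 indicating no inflation or deflation). The sketch below follows that common convention, which is an assumption here rather than a definition quoted from the review; the ridge predictor, the function names, and the simulated data are likewise illustrative.

```python
import numpy as np

def accuracy_and_bias(observed, predicted):
    """Accuracy = correlation; bias = slope of observed regressed on predicted."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    accuracy = np.corrcoef(observed, predicted)[0, 1]
    slope = np.cov(observed, predicted)[0, 1] / np.var(predicted, ddof=1)
    return accuracy, slope

def kfold_accuracy_and_bias(Z, y, lam, k=5, seed=0):
    """k-fold cross-validation of a simple ridge (SNP-BLUP-like) predictor."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    predicted = np.empty(n)
    for val in folds:
        train = np.setdiff1d(np.arange(n), val)
        col_means = Z[train].mean(axis=0)        # center markers using training data only
        Zt = Z[train] - col_means
        mu = y[train].mean()
        alpha = np.linalg.solve(Zt @ Zt.T + lam * np.eye(len(train)), y[train] - mu)
        effects = Zt.T @ alpha                   # ridge marker effects
        predicted[val] = mu + (Z[val] - col_means) @ effects
    return accuracy_and_bias(y, predicted)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    Z = rng.binomial(2, 0.5, size=(200, 100)).astype(float)   # simulated genotypes
    y = Z @ rng.normal(0, 0.1, 100) + rng.normal(0, 0.35, 200)
    print(kfold_accuracy_and_bias(Z, y, lam=12.0))
```

Reporting both numbers for a baseline such as this next to those of a new method gives the kind of side-by-side comparison the abstract recommends.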
|
41
|
|
42
|
Setting the Standard: A Special Focus on Genomic Selection in GENETICS and G3. G3-GENES GENOMES GENETICS 2012; 2:423. [PMID: 22540032 PMCID: PMC3337469 DOI: 10.1534/g3.112.002295] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
|
43
|
Abstract
Hierarchical mixed effects models have been demonstrated to be powerful for predicting genomic merit of livestock and plants, on the basis of high-density single-nucleotide polymorphism (SNP) marker panels, and their use is being increasingly advocated for genomic predictions in human health. Two particularly popular approaches, labeled BayesA and BayesB, are based on specifying all SNP-associated effects to be independent of each other. BayesB extends BayesA by allowing a large proportion of SNP markers to be associated with null effects. We further extend these two models to specify SNP effects as being spatially correlated due to the chromosomally proximal effects of causal variants. These two models, which we dub ante-BayesA and ante-BayesB, respectively, are based on a first-order nonstationary antedependence specification between SNP effects. In a simulation study involving 20 replicate data sets, each analyzed at six different SNP marker densities with average LD levels ranging from r² = 0.15 to 0.31, the antedependence methods had significantly (P < 0.01) higher accuracies than their corresponding classical counterparts at higher LD levels (r² > 0.24), with differences exceeding 3%. A cross-validation study was also conducted on the heterogeneous stock mice data resource (http://mus.well.ox.ac.uk/mouse/HS/) using 6-week body weights as the phenotype. The antedependence methods increased cross-validation prediction accuracies by up to 3.6% compared to their classical counterparts (P < 0.001). Finally, we applied our method to other benchmark data sets and demonstrated that the antedependence methods were more accurate than their classical counterparts for genomic predictions, even for individuals several generations beyond the training data.
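A first-order antedependence specification of the kind described here can be written as follows. The notation is an assumption made for illustration (g_j for the effect of the j-th SNP in map order, t_{j,j-1} for the position-specific antedependence coefficient, δ_j for the innovation, m for the number of SNPs) and is not quoted from the paper:

```latex
g_1 = \delta_1, \qquad
g_j = t_{j,j-1}\, g_{j-1} + \delta_j, \quad j = 2, \dots, m, \qquad
\delta_j \sim N\!\left(0, \sigma^2_{\delta_j}\right) \ \text{independently}
```

Under this sketch, "nonstationary" corresponds to allowing a different coefficient t_{j,j-1} for each pair of neighboring SNPs, and setting every t_{j,j-1} to zero returns mutually independent SNP effects, as in the classical BayesA and BayesB specifications.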
|