251
Müller D, Technow F, Melchinger AE. Shrinkage estimation of the genomic relationship matrix can improve genomic estimated breeding values in the training set. Theoretical and Applied Genetics 2015; 128:693-703. [PMID: 25735232 DOI: 10.1007/s00122-015-2464-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/17/2014] [Accepted: 01/10/2015] [Indexed: 05/18/2023]
Abstract
We evaluated several methods for computing shrinkage estimates of the genomic relationship matrix and demonstrated their potential to enhance the reliability of genomic estimated breeding values of training set individuals. In genomic prediction in plant breeding, the training set constitutes a large fraction of the total number of genotypes assayed and is itself subject to selection. The objective of our study was to investigate whether genomic estimated breeding values (GEBVs) of individuals in the training set can be enhanced by shrinkage estimation of the genomic relationship matrix. We simulated two different population types: a diversity panel of unrelated individuals and a biparental family of doubled haploid lines. For different training set sizes (50, 100, 200), number of markers (50, 100, 200, 500, 2,500) and heritabilities (0.25, 0.5, 0.75), shrinkage coefficients were computed by four different methods. Two of these methods are novel and based on measures of LD, the other two were previously described in the literature, one of which was extended by us. Our results showed that shrinkage estimation of the genomic relationship matrix can significantly improve the reliability of the GEBVs of training set individuals, especially for a low number of markers. We demonstrate that the number of markers is the primary determinant of the optimum shrinkage coefficient maximizing the reliability and we recommend methods eligible for routine usage in practical applications.
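As a rough illustration of the idea in the abstract above, the sketch below shrinks a genomic relationship matrix toward a scaled identity target. The abstract does not say which relationship estimator or which shrinkage target the authors used, so the VanRaden-style estimator, the 0/1/2 genotype coding, the identity target, and the function names `genomic_relationship` and `shrink` are all assumptions made for illustration. The shrinkage coefficient `delta` is supplied by hand here; it is not computed by any of the four methods the study compares.

```python
def genomic_relationship(markers):
    """Build a VanRaden-style relationship matrix (an assumed estimator).

    markers: list of genotype rows, one per individual, coded 0/1/2
    (assumed coding: copies of one allele at each marker).
    """
    n = len(markers)
    m = len(markers[0])
    # Sample allele frequency at each marker.
    freqs = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic; matrix is undefined")
    # Center each genotype by twice the allele frequency.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in markers]
    return [
        [sum(a * b for a, b in zip(ri, rk)) / scale for rk in centered]
        for ri in centered
    ]


def shrink(grm, delta):
    """Return (1 - delta) * G + delta * T, with T = mean(diag(G)) * I.

    The scaled-identity target T is an assumption of this sketch.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    n = len(grm)
    mean_diag = sum(grm[i][i] for i in range(n)) / n
    return [
        [
            (1.0 - delta) * grm[i][k] + (delta * mean_diag if i == k else 0.0)
            for k in range(n)
        ]
        for i in range(n)
    ]


# Toy data: 4 individuals, 3 markers (hypothetical values).
toy_markers = [[0, 1, 2], [2, 1, 0], [1, 1, 1], [1, 2, 0]]
toy_grm = genomic_relationship(toy_markers)
toy_shrunk = shrink(toy_grm, 0.3)
```

With this target, `delta = 0` returns the original matrix and `delta = 1` discards all off-diagonal relationships. Intermediate values pull noisy off-diagonal estimates toward zero, which is the case the abstract reports as most useful when few markers are available.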
252
Selection for silage yield and composition did not affect genomic diversity within the Wisconsin Quality Synthetic maize population. G3: Genes, Genomes, Genetics 2015; 5:541-9. [PMID: 25645532 PMCID: PMC4390570 DOI: 10.1534/g3.114.015263] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/17/2022]
Abstract
Maize silage is forage of high quality and yield, and represents the second most important use of maize in the United States. The Wisconsin Quality Synthetic (WQS) maize population has undergone five cycles of recurrent selection for silage yield and composition, resulting in a genetically improved population. The application of high-density molecular markers allows breeders and geneticists to identify important loci through association analysis and selection mapping, as well as to monitor changes in the distribution of genetic diversity across the genome. The objectives of this study were to identify loci controlling variation for maize silage traits through association analysis and the assessment of selection signatures and to describe changes in the genomic distribution of gene diversity through selection and genetic drift in the WQS recurrent selection program. We failed to find any significant marker-trait associations using the historical phenotypic data from WQS breeding trials combined with 17,719 high-quality, informative single nucleotide polymorphisms. Likewise, no strong genomic signatures were left by selection on silage yield and quality in the WQS despite genetic gain for these traits. These results could be due to the genetic complexity underlying these traits, or the role of selection on standing genetic variation. Variation in loss of diversity through drift was observed across the genome. Some large regions experienced much greater loss in diversity than what is expected, suggesting limited recombination combined with small populations in recurrent selection programs could easily lead to fixation of large swaths of the genome.
253
Isidro J, Jannink JL, Akdemir D, Poland J, Heslot N, Sorrells ME. Training set optimization under population structure in genomic selection. Theoretical and Applied Genetics 2015; 128:145-58. [PMID: 25367380 PMCID: PMC4282691 DOI: 10.1007/s00122-014-2418-4] [Citation(s) in RCA: 100] [Impact Index Per Article: 11.1] [Received: 05/07/2014] [Accepted: 10/12/2014] [Indexed: 05/17/2023]
Abstract
Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the coefficient of determination (CDmean), mean of predictor error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling, were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, the most phenotypic variation captured by a sampling method in the TRS is desirable. The wheat dataset showed mild population structure, and CDmean and stratified CDmean methods showed the highest accuracies for all the traits except for test weight and heading date. The rice dataset had strong population structure and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS, maximizing the relationship between TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion used to optimize the TRS seems to depend on the interaction of trait architecture and population structure.
254
Hess JE, Caudill CC, Keefer ML, McIlraith BJ, Moser ML, Narum SR. Genes predict long distance migration and large body size in a migratory fish, Pacific lamprey. Evol Appl 2014; 7:1192-208. [PMID: 25558280 PMCID: PMC4275091 DOI: 10.1111/eva.12203] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 06/23/2014] [Accepted: 08/17/2014] [Indexed: 12/20/2022]
Abstract
Elucidation of genetic mechanisms underpinning migratory behavior could help predict how changes in genetic diversity may affect future spatiotemporal distribution of a migratory species. This ability would benefit conservation of one such declining species, anadromous Pacific lamprey (Entosphenus tridentatus). Nonphilopatric migration of adult Pacific lamprey has homogenized population-level neutral variation but has maintained adaptive variation that differentiates groups based on geography, run-timing and adult body form. To investigate causes for this adaptive divergence, we examined 647 adult lamprey sampled at a fixed location on the Columbia River and radiotracked during their subsequent upstream migration. We tested whether genetic variation [94 neutral and adaptive single nucleotide polymorphisms (SNPs) previously identified from a genomewide association study] was associated with phenotypes of migration distance, migration timing, or morphology. Three adaptive markers were strongly associated with morphology, and one marker also correlated with upstream migration distance and timing. Genes physically linked with these markers plausibly influence differences in body size, which is also consistently associated with migration distance in Pacific lamprey. Pacific lamprey conservation implications include the potential to predict an individual's upstream destination based on its genotype. More broadly, the results suggest a genetic basis for intrapopulation variation in migration distance in migratory species.
Affiliation(s)
- Jon E Hess: Columbia River Inter-Tribal Fish Commission, Hagerman, ID, USA
- Christopher C Caudill: Department of Fish and Wildlife Sciences, College of Natural Resources, University of Idaho, Moscow, ID, USA
- Matthew L Keefer: Department of Fish and Wildlife Sciences, College of Natural Resources, University of Idaho, Moscow, ID, USA
- Mary L Moser: Fish Ecology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, WA, USA
- Shawn R Narum: Columbia River Inter-Tribal Fish Commission, Hagerman, ID, USA
255
Auvray B, McEwan JC, Newman SAN, Lee M, Dodds KG. Genomic prediction of breeding values in the New Zealand sheep industry using a 50K SNP chip. J Anim Sci 2014; 92:4375-89. [PMID: 25149326 DOI: 10.2527/jas.2014-7801] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 01/04/2023]
Abstract
The aim of genomic prediction is to predict breeding value from genomic data. We describe the development of genomic prediction equations and accuracies for molecular breeding values (MBV) for industry use, focusing on the methodology used to deal with predictions for the New Zealand sheep population structure. This is made up of a mixture of pure and crossbred animals, but principally Romney based. In particular, we used pedigree-based EBV for 8 traits (weaning weight as a direct effect, weaning weight as a maternal effect, live weight at 8 mo, live weight at 12 mo, greasy fleece weight at 12 mo, lamb fleece weight, adult fleece weight, and number of lambs born) and Illumina OvineSNP50 BeadChip genotypes from 13,420 animals to investigate BLUP with different genomic relationship matrices (GRM) based on SNP markers and to investigate varying sets of older animals (training sets) to predict the MBV of younger animals (validation sets). The GRM tested included modifications to account for allele frequency differences between breeds, rescaling so that the mean GRM is equal to the mean of the traditional pedigree numerator relationship matrix A, and combining the GRM with A using a convex combination with a weight estimated by maximizing a conditional restricted likelihood. We found that these modifications were beneficial and recommend using a breed-adjusted GRM combined with A. Training data sets with Romney, Coopworth, and Perendale animals all together usually predicted better than using just a pure breed training data set for all traits. But predictions for the breed Perendale were more accurate with a Perendale training set for 3 of the 8 traits. We concluded that using a mixed-breed training set for all combinations of traits and breeds was best but advise that increasing the number of Perendale animals genotyped should be a priority to increase the MBV accuracies obtained for that breed.
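The convex combination of a genomic relationship matrix with the pedigree matrix A mentioned in the abstract above can be sketched in a few lines. This is only an illustration: the function name `convex_combine`, the choice to put the weight on the GRM rather than on A, and the toy matrices are assumptions. In the study the weight is estimated by maximizing a conditional restricted likelihood, which is not attempted here; the weight is simply passed in.

```python
def convex_combine(grm, a_mat, weight):
    """Return weight * G + (1 - weight) * A, element by element.

    grm and a_mat are square matrices (lists of rows) for the same
    individuals in the same order. Placing `weight` on G is an
    assumption of this sketch.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(grm) != len(a_mat) or any(
        len(gr) != len(ar) for gr, ar in zip(grm, a_mat)
    ):
        raise ValueError("matrices must have the same shape")
    return [
        [weight * g + (1.0 - weight) * a for g, a in zip(g_row, a_row)]
        for g_row, a_row in zip(grm, a_mat)
    ]


# Hypothetical two-animal example: half-sibs by pedigree (A = 0.25),
# with a somewhat lower marker-based relationship.
toy_grm = [[1.02, 0.18], [0.18, 0.97]]
toy_a = [[1.0, 0.25], [0.25, 1.0]]
blended = convex_combine(toy_grm, toy_a, 0.8)
```

A weight of 1 uses the marker-based matrix alone and a weight of 0 falls back to the pedigree matrix, so the blend interpolates between the two sources of relationship information.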
Affiliation(s)
- B Auvray: Animal Productivity Group, AgResearch Limited, Mosgiel 9053, New Zealand
- J C McEwan: Animal Productivity Group, AgResearch Limited, Mosgiel 9053, New Zealand
- S-A N Newman: Animal Productivity Group, AgResearch Limited, Mosgiel 9053, New Zealand
- M Lee: Animal Productivity Group, AgResearch Limited, Mosgiel 9053, New Zealand
- K G Dodds: Animal Productivity Group, AgResearch Limited, Mosgiel 9053, New Zealand
256
Budde KB, Heuertz M, Hernández-Serrano A, Pausas JG, Vendramin GG, Verdú M, González-Martínez SC. In situ genetic association for serotiny, a fire-related trait, in Mediterranean maritime pine (Pinus pinaster). The New Phytologist 2014; 201:230-241. [PMID: 24015853 DOI: 10.1111/nph.12483] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 06/11/2013] [Accepted: 07/28/2013] [Indexed: 05/28/2023]
Abstract
Wildfire is a major ecological driver of plant evolution. Understanding the genetic basis of plant adaptation to wildfire is crucial, because impending climate change will involve fire regime changes worldwide. We studied the molecular genetic basis of serotiny, a fire-related trait, in Mediterranean maritime pine using association genetics. A single nucleotide polymorphism (SNP) set was used to identify genotype : phenotype associations in situ in an unstructured natural population of maritime pine (eastern Iberian Peninsula) under a mixed-effects model framework. RR-BLUP was used to build predictive models for serotiny in this region. Model prediction power outside the focal region was tested using independent range-wide serotiny data. Seventeen SNPs were potentially associated with serotiny, explaining approximately 29% of the trait phenotypic variation in the eastern Iberian Peninsula. Similar prediction power was found for nearby geographical regions from the same maternal lineage, but not for other genetic lineages. Association genetics for ecologically relevant traits evaluated in situ is an attractive approach for forest trees provided that traits are under strong genetic control and populations are unstructured, with large phenotypic variability. This will help to extend the research focus to ecological keystone non-model species in their natural environments, where polymorphisms acquired their adaptive value.
Affiliation(s)
- Katharina B Budde: Department of Forest Ecology and Genetics, INIA Forest Research Centre, 28040, Madrid, Spain
- Myriam Heuertz: Department of Forest Ecology and Genetics, INIA Forest Research Centre, 28040, Madrid, Spain
- Ana Hernández-Serrano: Centro de Investigaciones sobre Desertificación (CIDE-CSIC/UV/GV), 46113, Moncada, Valencia, Spain
- Juli G Pausas: Centro de Investigaciones sobre Desertificación (CIDE-CSIC/UV/GV), 46113, Moncada, Valencia, Spain
- Giovanni G Vendramin: Plant Genetics Institute, National Research Council, 50019, Sesto Fiorentino, Florence, Italy
- Miguel Verdú: Centro de Investigaciones sobre Desertificación (CIDE-CSIC/UV/GV), 46113, Moncada, Valencia, Spain
257
Huang X, Han B. Natural variations and genome-wide association studies in crop plants. Annual Review of Plant Biology 2014; 65:531-51. [PMID: 24274033 DOI: 10.1146/annurev-arplant-050213-035715] [Citation(s) in RCA: 360] [Impact Index Per Article: 36.0] [Indexed: 05/18/2023]
Abstract
Natural variants of crops are generated from wild progenitor plants under both natural and human selection. Diverse crops that are able to adapt to various environmental conditions are valuable resources for crop improvements to meet the food demands of the increasing human population. With the completion of reference genome sequences, the advent of high-throughput sequencing technology now enables rapid and accurate resequencing of a large number of crop genomes to detect the genetic basis of phenotypic variations in crops. Comprehensive maps of genome variations facilitate genome-wide association studies of complex traits and functional investigations of evolutionary changes in crops. These advances will greatly accelerate studies on crop designs via genomics-assisted breeding. Here, we first discuss crop genome studies and describe the development of sequencing-based genotyping and genome-wide association studies in crops. We then review sequencing-based crop domestication studies and offer a perspective on genomics-driven crop designs.
Affiliation(s)
- Xuehui Huang: National Center for Gene Research, Shanghai Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200233, China
258
Ertl J, Edel C, Emmerling R, Pausch H, Fries R, Götz KU. On the limited increase in validation reliability using high-density genotypes in genomic best linear unbiased prediction: observations from Fleckvieh cattle. J Dairy Sci 2013; 97:487-96. [PMID: 24210491 DOI: 10.3168/jds.2013-6855] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 03/26/2013] [Accepted: 09/16/2013] [Indexed: 01/11/2023]
Abstract
This study investigated reliability of genomic predictions using medium-density (40,089; 50K) or high-density (HD; 388,951) marker sets. We developed an approximate method to test differences in validation reliability for significance. Model-based reliability and the effect of HD genotypes on inflation of predictions were analyzed additionally. Genomic breeding values were predicted for at least 1,321 validation bulls based on phenotypes and genotypes of at least 5,324 calibration bulls by means of a linear model in milk, fat, and protein yield; somatic cell score; milkability; muscling; udder, feet, and legs score as well as stature. In total, 1,485 bulls were actually HD genotyped and HD genotypes of the other animals were imputed from 50K genotypes using FImpute software. Validation reliability was measured as the coefficient of determination of the weighted regression of daughter yield deviations on predicted breeding values divided by the reliability of daughter yield deviations and inflation was evaluated by the slope of this regression. Model-based reliability was calculated from the model. Distributions for validation reliability of 50K markers were derived by repeated sampling of 50,000-marker samples from HD to test differences in validation reliability statistically. Additionally, the benefit of HD genotypes in validation reliability was tested by repeated sampling of validation groups and calculation of the difference in validation reliability between HD and 50K genotypes for the sampled groups of bulls. The mean benefit in validation reliability of HD genotypes was 0.015 compared with real 50K genotypes and 0.028 compared with 50K samples from HD affected by imputation error and was significant for all traits. The model-based reliability was, on average, 0.036 lower and the regression coefficient was 0.036 closer to the expected value with HD genotypes. 
The observed gain in validation reliability with HD genotypes was similar to expectations based on the number of markers and the effective number of segregating chromosome segments. Sampling error in the marker-based relationship coefficients causing overestimation of the model-based reliability was smaller with HD genotypes. Inflation of the genomic predictions was reduced with HD genotypes, accordingly. Similar effects on model-based reliability and inflation, but not on the validation reliability, were obtained by shrinkage estimation of the realized relationship matrix from 50K genotypes.
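The validation statistics defined in the abstract above (the coefficient of determination of a weighted regression of daughter yield deviations on predicted breeding values, divided by the reliability of the daughter yield deviations, with the slope used to gauge inflation) can be sketched as follows. The function names, the use of a single average reliability value for the daughter yield deviations, and the toy numbers are assumptions for illustration; the abstract does not describe how the weights were derived, so they are simply passed in.

```python
def weighted_regression(x, y, w):
    """Weighted simple linear regression of y on x.

    Returns (slope, r_squared), where r_squared is the weighted
    coefficient of determination.
    """
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("x and y must both vary")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def validation_reliability(predicted, dyd, weights, dyd_reliability):
    """Return (reliability, slope) for a set of validation animals.

    reliability = R^2 of the weighted regression / dyd_reliability.
    A slope below 1 suggests inflated predictions, since 1 is the
    expected value of the regression coefficient.
    """
    slope, r_squared = weighted_regression(predicted, dyd, weights)
    return r_squared / dyd_reliability, slope


# Hypothetical values for five validation bulls.
toy_predicted = [-1.2, -0.4, 0.1, 0.6, 1.5]
toy_dyd = [-0.9, -0.6, 0.3, 0.2, 1.1]
toy_weights = [40.0, 25.0, 60.0, 35.0, 50.0]
reliability, slope = validation_reliability(toy_predicted, toy_dyd, toy_weights, 0.9)
```

Dividing by the reliability of the daughter yield deviations corrects for the fact that they are themselves noisy measurements of true breeding value, so the raw coefficient of determination would otherwise understate the reliability of the predictions.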
Affiliation(s)
- J Ertl: Institute of Animal Breeding, Bavarian State Research Centre for Agriculture, 85586 Poing, Germany
- C Edel: Institute of Animal Breeding, Bavarian State Research Centre for Agriculture, 85586 Poing, Germany
- R Emmerling: Institute of Animal Breeding, Bavarian State Research Centre for Agriculture, 85586 Poing, Germany
- H Pausch: Chair of Animal Breeding, Technische Universität München, 85354 Freising, Germany
- R Fries: Chair of Animal Breeding, Technische Universität München, 85354 Freising, Germany
- K-U Götz: Institute of Animal Breeding, Bavarian State Research Centre for Agriculture, 85586 Poing, Germany