1
Cai W, Hu J, Fan W, Xu Y, Tang J, Xie M, Zhang Y, Guo Z, Zhou Z, Hou S. Genetic parameters and genomic prediction of growth and breast morphological traits in a crossbreed duck population. Evol Appl 2024; 17:e13638. [PMID: 38333555 PMCID: PMC10848588 DOI: 10.1111/eva.13638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 09/02/2023] [Accepted: 12/07/2023] [Indexed: 02/10/2024]
Abstract
Genomic selection (GS) has great potential to increase genetic gain in poultry breeding. However, the performance of genomic prediction for duck growth and breast morphological (BM) traits remains largely unknown. The objective of this study was to evaluate the benefits of genomic prediction for duck growth and BM traits using GBLUP, single-step GBLUP, Bayesian models, and different marker densities. Phenotypic data for 14 growth and BM traits were collected in a crossbreed population of 1893 Pekin duck × mallard individuals, of which 941 were genotyped. Estimated heritabilities were high for body weight (0.54-0.72), moderate to high for average daily gain traits (0.21-0.57), and low to moderate for BM traits (0.18-0.39). The prediction ability of GS for growth and BM traits was on average 7.6% higher than that of the pedigree-based BLUP method. Single-step GBLUP outperformed GBLUP for most traits, with an average of 0.3% higher reliability. Most of the Bayesian models achieved better predictive reliability, with the exception of BayesR. BayesN emerged as the top-performing model for genomic prediction of both growth and BM traits, with an average increase in reliability of 3.0% compared to GBLUP. Permutation studies revealed that 50 K markers achieved ideal prediction reliability, while 3 K markers still achieved 90.8% predictive capability, which would further reduce the cost for duck growth and BM traits. This study provides promising evidence for the application of GS in improving duck growth and BM traits. Our findings offer useful strategies for optimizing the predictive ability of GS for growth and BM traits and provide a theoretical foundation for designing a low-density panel in ducks.
Affiliation(s)
- Wentao Cai
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Jian Hu
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Wenlei Fan
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
  - College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, China
- Yaxi Xu
  - College of Animal Science and Technology, Beijing University of Agriculture, Beijing, China
- Jing Tang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Ming Xie
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Yunsheng Zhang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhanbao Guo
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhengkui Zhou
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Shuisheng Hou
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
2
Degen B, Müller NA. A simulation study comparing advanced marker-assisted selection with genomic selection in tree breeding programs. G3 (Bethesda) 2023; 13:jkad164. [PMID: 37494068 PMCID: PMC10542556 DOI: 10.1093/g3journal/jkad164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/27/2023]
Abstract
Advances in DNA sequencing technologies allow the sequencing of whole genomes of thousands of individuals and provide several million single nucleotide polymorphisms (SNPs) per individual. These data, combined with precise and high-throughput phenotyping, enable genome-wide association studies (GWAS) and the identification of SNPs underlying traits with complex genetic architectures. The identified causal SNPs and estimated allelic effects could then be used for advanced marker-assisted selection (MAS) in breeding programs. But could such MAS compete with the broadly used genomic selection (GS)? This question is of particular interest for the lengthy tree breeding strategies. Here, with our new software "SNPscan breeder," we simulated a simple tree breeding program and compared the impact of different selection criteria on genetic gain and inbreeding. Further, we assessed different genetic architectures and different levels of kinship among individuals of the breeding population. Interestingly, apart from progeny testing, GS using GBLUP performed best under almost all simulated scenarios. MAS based on GWAS results outperformed GS only if the allelic effects were estimated in large populations (ca. 10,000 individuals) of unrelated individuals. Notably, GWAS using 3,000 extreme phenotypes performed as well as GWAS using 10,000 phenotypes. GS increased inbreeding, and thus reduced genetic diversity, more strongly than progeny testing and GWAS-based selection. We discuss the practical implications for tree breeding programs. In conclusion, our analyses further support the potential of GS for forest tree breeding and improvement, although MAS may gain relevance with decreasing sequencing costs in the future.
Affiliation(s)
- Bernd Degen
  - Thünen Institute of Forest Genetics, Sieker Landstrasse 2, 22927, Grosshansdorf, Schleswig-Holstein, Germany
- Niels A Müller
  - Thünen Institute of Forest Genetics, Sieker Landstrasse 2, 22927, Grosshansdorf, Schleswig-Holstein, Germany
3
Cai W, Hu J, Fan W, Xu Y, Tang J, Xie M, Zhang Y, Guo Z, Zhou Z, Hou S. Strategies to improve genomic predictions for 35 duck carcass traits in an F2 population. J Anim Sci Biotechnol 2023; 14:74. [PMID: 37147656 PMCID: PMC10163724 DOI: 10.1186/s40104-023-00875-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 04/02/2023] [Indexed: 05/07/2023]
Abstract
BACKGROUND Carcass traits are crucial for broiler ducks, but they can only be measured postmortem. Genomic selection (GS) is an effective approach in animal breeding to improve selection and reduce costs. However, the performance of genomic prediction for duck carcass traits remains largely unknown. RESULTS In this study, we estimated genetic parameters, performed GS using different models and marker densities, and compared the estimation performance of GS and conventional BLUP on 35 carcass traits in an F2 population of ducks. Most of the cut weight traits and intestine length traits were estimated to have high and moderate heritabilities, respectively, while the heritabilities of percentage slaughter traits were dynamic. The reliability of genomic prediction using GBLUP increased by an average of 0.06 compared to the conventional BLUP method. Permutation studies revealed that 50K markers achieved ideal prediction reliability, while 3K markers still achieved 90.7% predictive capability, which would further reduce the cost for duck carcass traits. The genomic relationship matrix normalized by our true variance method instead of the widely used [Formula: see text] achieved an increase in prediction reliability for most traits. Most of the Bayesian models performed better, especially BayesN. Compared to GBLUP, BayesN further improved predictive reliability by an average of 0.06 for duck carcass traits. CONCLUSION This study demonstrates that genomic selection for duck carcass traits is promising. Genomic prediction can be further improved by modifying the genomic relationship matrix using our proposed true variance method and by using several Bayesian models. The permutation study provides a theoretical basis for using low-density arrays to reduce genotyping costs in duck genomic selection.
Affiliation(s)
- Wentao Cai
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Jian Hu
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
  - Shandong New Hope Liuhe Group Co., Ltd., Qingdao, 266108, China
- Wenlei Fan
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
  - College of Animal Science and Technology, Qingdao Agricultural University, Qingdao, 266109, China
- Yaxi Xu
  - College of Animal Science and Technology, Beijing University of Agriculture, Beijing, 102206, China
- Jing Tang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Ming Xie
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Yunsheng Zhang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Zhanbao Guo
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Zhengkui Zhou
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Shuisheng Hou
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
4
Zhang R, Zhang Y, Liu T, Jiang B, Li Z, Qu Y, Chen Y, Li Z. Utilizing Variants Identified with Multiple Genome-Wide Association Study Methods Optimizes Genomic Selection for Growth Traits in Pigs. Animals (Basel) 2023; 13:ani13040722. [PMID: 36830509 PMCID: PMC9952664 DOI: 10.3390/ani13040722] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/25/2022] [Revised: 02/09/2023] [Accepted: 02/15/2023] [Indexed: 02/22/2023]
Abstract
Improving the prediction accuracies of economically important traits in genomic selection (GS) is a main objective for researchers and breeders in the livestock industry. This study aims at utilizing potentially functional SNPs and QTLs identified with various genome-wide association study (GWAS) models in GS of pig growth traits. We used three well-established GWAS methods, including the mixed linear model, Bayesian model and meta-analysis, as well as 60K SNP-chip and whole genome sequence (WGS) data from 1734 Yorkshire and 1123 Landrace pigs to detect SNPs related to four growth traits: average daily gain, backfat thickness, body weight and birth weight. A total of 1485 significant loci and 24 candidate genes which are involved in skeletal muscle development, fatty deposition, lipid metabolism and insulin resistance were identified. Compared with using all SNP-chip data, GS with the pre-selected functional SNPs in the standard genomic best linear unbiased prediction (GBLUP), and a two-kernel based GBLUP model yielded average gains in accuracy by 4 to 46% (from 0.19 ± 0.07 to 0.56 ± 0.07) and 5 to 27% (from 0.16 ± 0.06 to 0.57 ± 0.05) for the four traits, respectively, suggesting that the prioritization of preselected functional markers in GS models had the potential to improve prediction accuracies for certain traits in livestock breeding.
Affiliation(s)
- Ruifeng Zhang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Yi Zhang
  - Institute of Neuroscience, Panzhihua University, Panzhihua 617000, China
- Tongni Liu
  - Genetic Data Center, Faculty of Forestry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Bo Jiang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Zhenyang Li
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Youping Qu
  - Guangdong IPIG Technology Co., Ltd., Guangzhou 510006, China
- Yaosheng Chen
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Zhengcao Li
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
5
Zhao R, Pei S, Yau SST. New Genome Sequence Detection via Natural Vector Convex Hull Method. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1782-1793. [PMID: 33237867 DOI: 10.1109/tcbb.2020.3040706] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
It remains challenging to find existing but undiscovered genome sequence mutations, or to predict potential genome sequence mutations, based on real sequence data. Motivated by this, we develop approaches to detect new, undiscovered genome sequences. Because discovering new genome sequences through biological experiments is resource-intensive, we aim to accomplish the new genome sequence detection task mathematically. However, little literature describes how to detect new, undiscovered genome sequence mutations mathematically. We form a new framework based on the natural vector convex hull method, which conducts alignment-free sequence analysis. Our two newly developed approaches, Random-permutation Algorithm with Penalty (RAP) and Random-permutation Algorithm with Penalty and COnstrained Search (RAPCOS), use the geometric properties captured by natural vectors. In our experiment, we discover a mathematically new human immunodeficiency virus (HIV) genome sequence using some real HIV genome sequences. Significantly, the proposed methods are applicable to the new genome sequence detection challenge and have many good properties, such as robustness, rapid convergence, and fast computation.
6
Chen CJ, Garrick D, Fernando R, Karaman E, Stricker C, Keehan M, Cheng H. XSim version 2: simulation of modern breeding programs. G3 (Bethesda) 2022; 12:6542309. [PMID: 35244161 PMCID: PMC8982375 DOI: 10.1093/g3journal/jkac032] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/28/2021] [Accepted: 01/06/2022] [Indexed: 11/25/2022]
Abstract
Simulation can be an efficient approach to design, evaluate, and optimize breeding programs. In the era of modern agriculture, breeding programs can benefit from a simulator that integrates various sources of big data and accommodates state-of-the-art statistical models. The initial release of XSim, in which stochastic descendants can be efficiently simulated with a drop-down strategy, has mainly been used to validate genomic selection results. In this article, we present XSim Version 2, an open-source tool that has been extensively redesigned with additional features to meet the needs of modern breeding programs. It seamlessly incorporates multiple statistical models for genetic evaluations, such as GBLUP, Bayesian alphabets, and neural networks, and, with the aid of its modular design, it can effortlessly simulate successive generations of descendants based on complex mating schemes. Case studies are presented to demonstrate the flexibility of XSim Version 2 in simulating crossbreeding in animal and plant populations. Modern biotechnologies, including double haploids and embryo transfer, can all be simultaneously integrated into the mating plans that drive the simulation. From a computing perspective, XSim Version 2 is implemented in Julia, a computer language that retains the readability of scripting languages (e.g. R and Python) without sacrificing much computational speed compared to compiled languages (e.g. C). This makes XSim Version 2 a simulation tool that is relatively easy for both champions and community members to maintain, modify, or extend in order to improve their breeding programs. Functions and operators are overloaded for a better user interface, so users may concatenate, subset, summarize, and organize simulated populations at each breeding step.
With the strong and foreseeable demands in the community, XSim Version 2 will serve as a modern simulator bridging the gaps between theories and experiments with its flexibility, extensibility, and friendly interface.
Affiliation(s)
- Chunpeng James Chen
  - Department of Animal Science, University of California, Davis, CA 95616, USA
- Rohan Fernando
  - Department of Animal Science, Iowa State University, Ames, IA 50010, USA
- Emre Karaman
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8830, Denmark
- Chris Stricker
  - agn Genetics GmbH, Davos-Dorf, Graubünden 7260, Switzerland
- Hao Cheng
  - Department of Animal Science, University of California, Davis, CA 95616, USA
7
Pérez-Enciso M, Zingaretti LM, Ramayo-Caldas Y, de Los Campos G. Opportunities and limits of combining microbiome and genome data for complex trait prediction. Genet Sel Evol 2021; 53:65. [PMID: 34362312 PMCID: PMC8344190 DOI: 10.1186/s12711-021-00658-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/25/2021] [Accepted: 07/20/2021] [Indexed: 12/12/2022]
Abstract
Background Analysis and prediction of complex traits using microbiome data combined with host genomic information is a topic of utmost interest. However, numerous questions remain to be answered: how useful can the microbiome be for complex trait prediction? Are estimates of microbiability reliable? Can the underlying biological links between the host’s genome, microbiome, and phenome be recovered? Methods Here, we address these issues by (i) developing a novel simulation strategy that uses real microbiome and genotype data as inputs, and (ii) using variance-component approaches (Bayesian Reproducing Kernel Hilbert Space (RKHS) and Bayesian variable selection methods (Bayes C)) to quantify the proportion of phenotypic variance explained by the genome and the microbiome. The proposed simulation approach can mimic genetic links between the microbiome and genotype data by a permutation procedure that retains the distributional properties of the data. Results Using real genotype and rumen microbiota abundances from dairy cattle, simulation results suggest that microbiome data can significantly improve the accuracy of phenotype predictions, regardless of whether some microbiota abundances are under direct genetic control by the host or not. This improvement depends logically on the microbiome being stable over time. Overall, random-effects linear methods appear robust for variance components estimation, in spite of the typically highly leptokurtic distribution of microbiota abundances. The predictive performance of Bayes C was higher but more sensitive to the number of causative effects than RKHS. Accuracy with Bayes C depended, in part, on the number of microorganisms’ taxa that influence the phenotype. 
Conclusions While we conclude that, overall, genome-microbiome-links can be characterized using variance component estimates, we are less optimistic about the possibility of identifying the causative host genetic effects that affect microbiota abundances, which would require much larger sample sizes than are typically available for genome-microbiome-phenome studies. The R code to replicate the analyses is in https://github.com/miguelperezenciso/simubiome. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-021-00658-7.
Affiliation(s)
- Miguel Pérez-Enciso
  - ICREA, Passeig de Lluís Companys 23, 08010, Barcelona, Spain
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193, Bellaterra, Barcelona, Spain
  - Dept. of Epidemiology & Biostatistics, and Dept. of Statistics & Probability, Michigan State University, East Lansing, MI, 48824, USA
- Laura M Zingaretti
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193, Bellaterra, Barcelona, Spain
  - Dept. of Epidemiology & Biostatistics, and Dept. of Statistics & Probability, Michigan State University, East Lansing, MI, 48824, USA
- Yuliaxis Ramayo-Caldas
  - Animal Breeding and Genetics Program, Institute for Research and Technology in Food and Agriculture (IRTA), Torre Marimon, 08140, Caldes de Montbui, Barcelona, Spain
- Gustavo de Los Campos
  - Dept. of Epidemiology & Biostatistics, and Dept. of Statistics & Probability, Michigan State University, East Lansing, MI, 48824, USA
8
Impact of Marker Pruning Strategies Based on Different Measurements of Marker Distance on Genomic Prediction in Dairy Cattle. Animals (Basel) 2021; 11:ani11071992. [PMID: 34359120 PMCID: PMC8300388 DOI: 10.3390/ani11071992] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/06/2021] [Revised: 06/27/2021] [Accepted: 06/28/2021] [Indexed: 11/16/2022]
Abstract
Simple Summary: The usefulness of genomic prediction (GP) has been widely demonstrated by breeding analyses in livestock, plants and aquatic populations. It is well known that ‘marker density’ is a critical factor affecting the accuracy of GP; however, how to properly measure ‘marker density’ in GP is yet to be determined. With population-level whole-genome sequence data or high-density single nucleotide polymorphism (SNP) data available, this question can be answered more convincingly. In this study, we investigated and discussed the impact of four ‘marker density’ measures that reflect genetic or physical distances between SNPs on the accuracy of GP in a German Holstein dairy cattle population. Our results showed that the degree of variation of physical distance between adjacent SNPs had significant effects on the accuracy of GP, while the genetic distance between SNPs had no relationship with the accuracy of GP. Therefore, for studies based on high-density SNP data, the default strategy of pruning SNPs based on genetic distance is detrimental to heritability estimation and genomic prediction. The results extend the community's knowledge of ‘marker density’ and provide useful suggestions for the application of, and research on, genomic prediction.
Abstract: With the availability of high-density single-nucleotide polymorphism (SNP) data and the development of genotype imputation methods, high-density panel-based genomic prediction (GP) has become possible in livestock breeding. It is generally considered that the accuracy of the genomic estimated breeding value (GEBV) increases with marker density, although studies have shown that GEBV accuracy does not increase, or even decreases, when high-density panels are used. Therefore, in addition to the SNP number, other measurements of ‘marker density’ seem to affect GEBV accuracy, and exploring the relationship between GEBV accuracy and measurements of ‘marker density’ based on high-density SNP or whole-genome sequence data is important for the field of GP. In this study, we constructed different SNP panels with fixed SNP numbers (e.g., 1 k) by using the physical distance (PhyD), genetic distance (GenD) and random distance (RanD) between SNPs, respectively, based on the high-density SNP data of a German Holstein dairy cattle population. There are therefore three different panels at each SNP number level. These panels were used to construct GP models to predict fat percentage, milk yield and somatic cell score. Meanwhile, the mean ($\bar{d}$) and variance ($\sigma_d^2$) of the physical distance between SNPs and the mean ($\overline{r^2}$) and variance ($\sigma_{r^2}^2$) of the genetic distance between SNPs in each panel were used as marker density-related measurements, and their influence on GEBV accuracy was investigated. At the same SNP number level, $\bar{d}$ is basically the same for all panels, but $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ differ. Therefore, we only investigated the effects of $\sigma_d^2$, $\overline{r^2}$ and $\sigma_{r^2}^2$ on GEBV accuracy. The results showed that, at a given SNP number level, GEBV accuracy was negatively correlated with $\sigma_d^2$, but not with $\overline{r^2}$ or $\sigma_{r^2}^2$. Compared with GenD and RanD, the $\sigma_d^2$ of panels constructed by PhyD is smaller. The low- and moderate-density panels (< 50 k) constructed by RanD or GenD have large $\sigma_d^2$, which is not conducive to genomic prediction. The GEBV accuracy of the low- and moderate-density panels constructed by PhyD is 3.8-34.8% higher than that of the low- and moderate-density panels constructed by RanD and GenD. Panels with 20-30 k SNPs constructed by PhyD can achieve the same or slightly higher GEBV accuracy than high-density SNP panels for all three traits.
In summary, the smaller the variation in physical distance between adjacent SNPs, the higher the GEBV accuracy. Low- and moderate-density panels constructed by physical distance are beneficial to genomic prediction, while pruning high-density SNP data based on genetic distance is detrimental to genomic prediction. The results provide suggestions for the development of SNP panels and for research on genomic prediction based on whole-genome sequence data.
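The spacing statistics discussed in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`spacing_stats`, `prune_even`), the simulated SNP positions, and the nearest-to-target pruning rule are all assumptions made for illustration. The sketch compares the mean and variance of adjacent-SNP physical distances for an evenly spaced panel and a randomly drawn panel of the same size.

```python
import random
from bisect import bisect_left
from statistics import mean, pvariance


def spacing_stats(positions):
    """Return (mean, variance) of physical distances between adjacent SNPs."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return mean(gaps), pvariance(gaps)


def prune_even(positions, n_keep):
    """Hypothetical physical-distance pruning: keep the SNP nearest to each of
    n_keep equally spaced target coordinates (requires n_keep >= 2)."""
    ordered = sorted(positions)
    lo, hi = ordered[0], ordered[-1]
    step = (hi - lo) / (n_keep - 1)
    kept = []
    for i in range(n_keep):
        target = lo + i * step
        idx = bisect_left(ordered, target)
        # Candidates are the SNPs immediately left and right of the target.
        candidates = ordered[max(idx - 1, 0):idx + 1]
        nearest = min(candidates, key=lambda pos: abs(pos - target))
        if not kept or nearest != kept[-1]:  # skip duplicates in sparse regions
            kept.append(nearest)
    return kept


# Simulated chromosome (assumption): 5,000 SNPs scattered over 1 Mb.
rng = random.Random(42)
snps = rng.sample(range(1_000_000), 5000)

even_panel = prune_even(snps, 100)      # physical-distance style panel
random_panel = rng.sample(snps, 100)    # random panel of the same size

even_mean, even_var = spacing_stats(even_panel)
rand_mean, rand_var = spacing_stats(random_panel)
print(f"even panel:   mean gap {even_mean:.0f} bp, variance {even_var:.0f}")
print(f"random panel: mean gap {rand_mean:.0f} bp, variance {rand_var:.0f}")
```

In this toy setting the two panels have a similar mean gap, but the evenly spaced panel has a much smaller gap variance, which is the quantity the abstract reports as negatively correlated with GEBV accuracy.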
9
Methe BA, Hiltbrand D, Roach J, Xu W, Gordon SG, Goodner BW, Stapleton AE. Functional gene categories differentiate maize leaf drought-related microbial epiphytic communities. PLoS One 2020; 15:e0237493. [PMID: 32946440 PMCID: PMC7500591 DOI: 10.1371/journal.pone.0237493] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/18/2019] [Accepted: 07/11/2020] [Indexed: 11/18/2022]
Abstract
The phyllosphere epiphytic microbiome is composed of microorganisms that colonize the external aerial portions of plants. Relationships of plant responses to specific microorganisms–both pathogenic and beneficial–have been examined, but the phyllosphere microbiome functional and metabolic profile responses are not well described. Changing crop growth conditions, such as increased drought, can have profound impacts on crop productivity. Also, epiphytic microbial communities provide a new target for crop yield optimization. We compared Zea mays leaf microbiomes collected under drought and well-watered conditions by examining functional gene annotation patterns across three physically disparate locations each with and without drought treatment, through the application of short read metagenomic sequencing. Drought samples exhibited different functional sequence compositions at each of the three field sites. Maize phyllosphere functional profiles revealed a wide variety of metabolic and regulatory processes that differed in drought and normal water conditions and provide key baseline information for future selective breeding.
Affiliation(s)
- Barbara A. Methe
  - J Craig Venter Institute, Medical Center Drive, Rockville, MD, United States of America
  - Department of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- David Hiltbrand
  - Department of Biology and Marine Biology, University of North Carolina Wilmington, Wilmington, NC, United States of America
- Jeffrey Roach
  - Research Computing, University of North Carolina Chapel Hill, Chapel Hill, NC, United States of America
- Wenwei Xu
  - Agricultural and Extension Center, Texas A and M AgriLife Research, Lubbock, TX, United States of America
- Stuart G. Gordon
  - Biology Department, Presbyterian College, Clinton, SC, United States of America
- Brad W. Goodner
  - Department, Hiram College, Hiram, OH, United States of America
- Ann E. Stapleton
  - Department of Biology and Marine Biology, University of North Carolina Wilmington, Wilmington, NC, United States of America
10
Teng J, Huang S, Chen Z, Gao N, Ye S, Diao S, Ding X, Yuan X, Zhang H, Li J, Zhang Z. Optimizing genomic prediction model given causal genes in a dairy cattle population. J Dairy Sci 2020; 103:10299-10310. [PMID: 32952023 DOI: 10.3168/jds.2020-18233] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/20/2020] [Accepted: 07/07/2020] [Indexed: 01/15/2023]
Abstract
As genotypic data move from SNP chips toward whole-genome sequence, the accuracy of genomic prediction (GP) exhibits only a marginal gain, although all genetic variation, including causal genes, is contained in whole-genome sequence data. Meanwhile, genetic analyses of complex traits, such as genome-wide association studies, have identified an increasing number of genomic regions, including potential causal genes, which would be reliable prior knowledge for GP. Many studies have tried to improve the performance of GP by modifying the prediction model to incorporate prior knowledge. Although several plausible results have been obtained from model modification or strategy optimization, most of them were validated in a specific empirical population with a limited variety of genetic architectures for complex traits. An alternative approach is to use simulated genetic architecture with known causal genes (e.g., simulated causative SNP) to evaluate different GP models with given causal genes. Our objectives were to (1) evaluate the performance of GP under a variety of genetic architectures with a subset of known causal genes and (2) compare different GP models modified by highlighting causal genes and different strategies to weight causal genes. In this study, we simulated pseudo-phenotypes under a variety of genetic architectures based on the real genotypes and phenotypes of a dairy cattle population. Besides classical genomic best linear unbiased prediction, we evaluated 3 modified GP models that highlight causal genes as follows: (1) by treating them as fixed effects, (2) by treating them as a separate random component, and (3) by combining them into the genomic relationship matrix as random effects. Our results showed that highlighting the known causal genes, which explained a considerable proportion of genetic variance in the GP models, increased the predictive accuracy.
Combining all given causal genes into the genomic relationship matrix was the optimal strategy under all the scenarios validated, and treating causal genes as a separate random component is also recommended, when more than 20% of genetic variance was explained by known causal genes. Moreover, assigning differential weights to each causal gene further improved the predictive accuracy.
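One way to picture "combining causal genes into the genomic relationship matrix" with differential weights is a marker-weighted relationship matrix. The sketch below is an illustration in plain Python under stated assumptions, not the model used in the study: the function name `weighted_grm`, the toy genotypes, and the weights are hypothetical, and it uses one possible scaling, G = Z D Z' / Σ_j d_j · 2 p_j (1 − p_j), where Z holds allele counts centred by twice the sample allele frequency and D is a diagonal matrix of marker weights.

```python
def allele_freqs(genotypes):
    """Sample allele frequency per marker from 0/1/2 allele counts."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]


def weighted_grm(genotypes, weights=None):
    """Marker-weighted genomic relationship matrix (illustrative scaling).

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    weights:   one non-negative weight per marker; None means equal weights.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    if weights is None:
        weights = [1.0] * m
    p = allele_freqs(genotypes)
    # Centre each marker by twice its allele frequency.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    # Weighted sum of expected marker variances 2p(1-p).
    scale = sum(w * 2 * pj * (1 - pj) for w, pj in zip(weights, p))
    if scale == 0:
        raise ValueError("all weighted markers are monomorphic")
    return [
        [sum(weights[j] * z[a][j] * z[b][j] for j in range(m)) / scale
         for b in range(n)]
        for a in range(n)
    ]


# Toy data (assumption): 4 individuals, 3 markers; the third marker is
# treated as a known causal variant and given a larger weight.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
g_equal = weighted_grm(genotypes)
g_causal = weighted_grm(genotypes, weights=[1.0, 1.0, 5.0])
for row in g_causal:
    print([round(value, 3) for value in row])
```

With equal weights this reduces to an unweighted relationship matrix; raising the weight of a marker makes relationships depend more strongly on sharing alleles at that marker. How weights should be chosen for real causal genes is outside the scope of this sketch.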
Affiliation(s)
- Jinyan Teng
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Shuwen Huang
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Zitao Chen
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Ning Gao
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou 510006, China
- Shaopan Ye
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Shuqi Diao
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Xiangdong Ding
  - National Engineering Laboratory for Animal Breeding, Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Xiaolong Yuan
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Hao Zhang
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Jiaqi Li
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
- Zhe Zhang
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
11
|
Pook T, Schlather M, Simianer H. MoBPS - Modular Breeding Program Simulator. G3 (Bethesda, Md.) 2020; 10:1915-1918. [PMID: 32229505 PMCID: PMC7263682 DOI: 10.1534/g3.120.401193] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 02/28/2020] [Accepted: 03/27/2020] [Indexed: 11/22/2022]
Abstract
The R package MoBPS provides a computationally efficient and flexible framework to simulate complex breeding programs and compare their economic and genetic impact. Simulations are performed at the level of individuals. MoBPS uses a highly efficient implementation with bit-wise data storage and matrix multiplications from the associated R package miraculix, which allows it to handle large-scale populations. Individual haplotypes are not stored but are instead derived automatically from points of recombination and mutations. The modular structure of MoBPS allows rather coarse simulations, as needed to generate founder populations, to be combined with very detailed modeling of today's complex breeding programs, making use of all available biotechnologies. MoBPS provides pre-implemented functions for common breeding practices such as optimum genetic contributions and single-step GBLUP, but also allows the user to replace certain steps with personalized and/or self-written solutions.
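The idea of deriving haplotypes from recombination points instead of storing them can be sketched as follows. This is an illustration in Python under simplifying assumptions (one chromosome, biallelic 0/1 markers, mutations modeled as allele flips); the function and argument names are hypothetical and do not correspond to the MoBPS API.

```python
import numpy as np

def derive_haplotype(founder_haps, marker_pos, seg_starts, seg_founders,
                     mutated=()):
    """Rebuild one haplotype from its stored recombination history.

    founder_haps: (n_founder_haplotypes, n_markers) array of 0/1 alleles.
    marker_pos:   marker positions along the chromosome.
    seg_starts:   sorted start positions of the inherited segments;
                  the first entry must not exceed the first marker position.
    seg_founders: founder haplotype index copied within each segment.
    mutated:      marker indices whose allele is flipped (hypothetical model).
    """
    founder_haps = np.asarray(founder_haps)
    marker_pos = np.asarray(marker_pos, dtype=float)
    # For every marker, find the segment that contains it.
    seg_idx = np.searchsorted(seg_starts, marker_pos, side="right") - 1
    origin = np.asarray(seg_founders)[seg_idx]
    # Copy each allele from the founder haplotype of origin.
    hap = founder_haps[origin, np.arange(marker_pos.size)].copy()
    for m in mutated:
        hap[m] = 1 - hap[m]
    return hap
```

Under this layout, the storage per haplotype grows with the number of recombination points and mutations rather than with the number of markers.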
Affiliation(s)
- Torsten Pook
  - Department of Animal Sciences, Animal Breeding and Genetics Group, University of Goettingen, 37075 Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
- Martin Schlather
  - Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
  - Stochastics and Its Applications Group, University of Mannheim, 68131 Mannheim, Germany
- Henner Simianer
  - Department of Animal Sciences, Animal Breeding and Genetics Group, University of Goettingen, 37075 Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
|
12
|
Toxo: a library for calculating penetrance tables of high-order epistasis models. BMC Bioinformatics 2020; 21:138. [PMID: 32272874 PMCID: PMC7147067 DOI: 10.1186/s12859-020-3456-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/11/2019] [Accepted: 03/18/2020] [Indexed: 12/12/2022] Open
Abstract
Background Epistasis is defined as the interaction between different genes when expressing a specific phenotype. The most common way to characterize an epistatic relationship is using a penetrance table, which contains the probability of expressing the phenotype under study given a particular allele combination. Available simulators can only create penetrance tables for well-known epistasis models involving a small number of genes and under a large number of limitations. Results Toxo is a MATLAB library designed to calculate penetrance tables of epistasis models of any interaction order which resemble real data more closely. The user specifies the desired heritability (or prevalence) and the program maximizes the table’s prevalence (or heritability) according to the input epistatic model boundaries. Conclusions Toxo extends the capabilities of existing simulators that define epistasis using penetrance tables. These tables can be directly used as input for software simulators such as GAMETES so that they are able to generate data samples with larger interactions and more realistic prevalences/heritabilities.
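To make the relationship between a penetrance table, prevalence, and heritability concrete, the sketch below computes both quantities for a given table. It assumes independent biallelic loci in Hardy-Weinberg equilibrium and the commonly used definitions K = Σ P(g) f(g) and h² = Σ P(g) (f(g) − K)² / (K (1 − K)); it is written in Python for illustration, is not part of Toxo (a MATLAB library), and the function names are hypothetical.

```python
import itertools
import numpy as np

def genotype_probs(mafs):
    """Joint genotype probabilities for independent loci under HWE.

    Each locus has genotypes ordered as 0, 1, 2 copies of the minor allele;
    combinations are enumerated with the last locus varying fastest.
    """
    per_locus = [np.array([(1 - m) ** 2, 2 * m * (1 - m), m ** 2])
                 for m in mafs]
    return np.array([np.prod(combo)
                     for combo in itertools.product(*per_locus)])

def prevalence_and_heritability(penetrance, mafs):
    """Return (K, h2) for a penetrance table with 3**k entries.

    penetrance may be a flat sequence or an array of shape (3,) * k whose
    i-th axis indexes the genotype at locus i.
    """
    f = np.asarray(penetrance, dtype=float).ravel()
    p = genotype_probs(mafs)
    if f.shape != p.shape:
        raise ValueError("penetrance table must have 3**len(mafs) entries")
    K = float(np.sum(p * f))                       # prevalence
    h2 = float(np.sum(p * (f - K) ** 2) / (K * (1 - K)))
    return K, h2
```

For example, a single locus with minor allele frequency 0.5 and a fully penetrant recessive table `[0, 0, 1]` gives a prevalence of 0.25 and a heritability of 1 under these definitions.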
|
13
|
Pérez-Enciso M, Ramírez-Ayala LC, Zingaretti LM. SeqBreed: a python tool to evaluate genomic prediction in complex scenarios. Genet Sel Evol 2020; 52:7. [PMID: 32039696 PMCID: PMC7008576 DOI: 10.1186/s12711-020-0530-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/27/2019] [Accepted: 01/29/2020] [Indexed: 11/28/2022] Open
Abstract
Background Genomic prediction (GP) is a method whereby DNA polymorphism information is used to predict breeding values for complex traits. Although GP can significantly enhance predictive accuracy, it can be expensive and difficult to implement. To help design optimum breeding programs and experiments, including genome-wide association studies and genomic selection experiments, we have developed SeqBreed, a generic and flexible forward simulator programmed in python3. Results SeqBreed accommodates sex and mitochondrion chromosomes as well as autopolyploidy. It can simulate any number of complex phenotypes that are determined by any number of causal loci. SeqBreed implements several GP methods, including genomic best linear unbiased prediction (GBLUP), single-step GBLUP, pedigree-based BLUP, and mass selection. We illustrate its functionality with Drosophila genome reference panel (DGRP) sequence data and with tetraploid potato genotype data. Conclusions SeqBreed is a flexible and easy to use tool that can be used to optimize GP or genome-wide association studies. It incorporates some of the most popular GP methods and includes several visualization tools. Code is open and can be freely modified. Software, documentation, and examples are available at https://github.com/miguelperezenciso/SeqBreed.
Affiliation(s)
- Miguel Pérez-Enciso
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193, Bellaterra, Barcelona, Spain
  - ICREA, Passeig de Lluís Companys 23, 08010, Barcelona, Spain
- Lino C Ramírez-Ayala
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193, Bellaterra, Barcelona, Spain
- Laura M Zingaretti
  - Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193, Bellaterra, Barcelona, Spain
  - Universidad Nacional de Villa María, IAPBCyA-IAPCH Villa María, Córdoba, Argentina
|
14
|
Bustos-Korts D, Malosetti M, Chenu K, Chapman S, Boer MP, Zheng B, van Eeuwijk FA. From QTLs to Adaptation Landscapes: Using Genotype-To-Phenotype Models to Characterize G×E Over Time. Frontiers in Plant Science 2019; 10:1540. [PMID: 31867027 PMCID: PMC6904366 DOI: 10.3389/fpls.2019.01540] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/28/2019] [Accepted: 11/04/2019] [Indexed: 05/18/2023]
Abstract
Genotype by environment interaction (G×E) for the target trait, e.g. yield, is an emerging property of agricultural systems and results from the interplay between a hierarchy of secondary traits involving the capture and allocation of environmental resources during the growing season. This hierarchy of secondary traits ranges from basic traits that correspond to response mechanisms/sensitivities, to intermediate traits that integrate a larger number of processes over time and therefore show a larger amount of G×E. Traits underlying yield differ in their contribution to adaptation across environmental conditions and have different levels of G×E. Here, we provide a framework to study the performance of genotype to phenotype (G2P) modeling approaches. We generate and analyze response surfaces, or adaptation landscapes, for yield and yield related traits, emphasizing the organization of the traits in a hierarchy and their development and interactions over time. We use the crop growth model APSIM-wheat with genotype-dependent parameters as a tool to simulate non-linear trait responses over time with complex trait dependencies and apply it to wheat crops in Australia. For biological realism, APSIM parameters were given a genetic basis of 300 QTLs sampled from a gamma distribution whose shape and rate parameters were estimated from real wheat data. In the simulations, the hierarchical organization of the traits and their interactions over time cause G×E for yield even when underlying traits do not show G×E. Insight into how G×E arises during growth and development helps to improve the accuracy of phenotype predictions within and across environments and to optimize trial networks. We produced a tangible simulated adaptation landscape for yield that we first investigated for its biological credibility by statistical models for G×E that incorporate genotypic and environmental covariables. 
Subsequently, the simulated trait data were used to evaluate statistical genotype-to-phenotype models for multiple traits and environments and to characterize relationships between traits over time and across environments, as a way to identify traits that could be useful to select for specific adaptation. Designed appropriately, these types of simulated landscapes might also serve as a basis to train other methodologies, such as deep learning, in order to transfer such network models to real-world situations.
Affiliation(s)
- Marcos Malosetti
  - Biometris, Wageningen University and Research Centre, Wageningen, Netherlands
- Karine Chenu
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Toowoomba, QLD, Australia
- Scott Chapman
  - Agriculture and Food, CSIRO, Queensland Bioscience Precinct, St Lucia, QLD, Australia
  - School of Agriculture and Food Sciences, The University of Queensland, Gatton, QLD, Australia
- Martin P. Boer
  - Biometris, Wageningen University and Research Centre, Wageningen, Netherlands
- Bangyou Zheng
  - Agriculture and Food, CSIRO, Queensland Bioscience Precinct, St Lucia, QLD, Australia
- Fred A. van Eeuwijk
  - Biometris, Wageningen University and Research Centre, Wageningen, Netherlands
|
15
|
Lowe JWE, Bruce A. Genetics without genes? The centrality of genetic markers in livestock genetics and genomics. History and Philosophy of the Life Sciences 2019; 41:50. [PMID: 31659490 DOI: 10.1007/s40656-019-0290-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/05/2019] [Accepted: 10/18/2019] [Indexed: 05/23/2023]
Abstract
In this paper, rather than focusing on genes as an organising concept around which historical considerations of theory and practice in genetics are elucidated, we place genetic markers at the heart of our analysis. This reflects their central role in the subject of our account, livestock genetics concerning the domesticated pig, Sus scrofa. We define a genetic marker as a (usually material) element existing in different forms in the genome, that can be identified and mapped using a variety (and often combination) of quantitative, classical and molecular genetic techniques. The conjugation of pig genome researchers around the common object of the marker from the early-1990s allowed the distinctive theories and approaches of quantitative and molecular genetics concerning the size and distribution of gene effects to align (but never fully integrate) in projects to populate genome maps. Critical to this was the nature of markers as ontologically inert, internally heterogeneous and relational. Though genes as an organising and categorising principle remained important, the particular concatenation of limitations, opportunities, and intended research goals of the pig genetics community, meant that a progressively stronger focus on the identification and mapping of markers rather than genes per se became a hallmark of the community. We therefore detail a different way of doing genetics to more gene-centred accounts. By doing so, we reveal the presence of practices, concepts and communities that would otherwise be hidden.
Affiliation(s)
- James W E Lowe
  - Science, Technology and Innovation Studies, University of Edinburgh, Old Surgeons' Hall, High School Yards, Edinburgh, EH1 1LZ, UK
- Ann Bruce
  - Science, Technology and Innovation Studies, University of Edinburgh, Old Surgeons' Hall, High School Yards, Edinburgh, EH1 1LZ, UK
|
16
|
Song H, Ye S, Jiang Y, Zhang Z, Zhang Q, Ding X. Using imputation-based whole-genome sequencing data to improve the accuracy of genomic prediction for combined populations in pigs. Genet Sel Evol 2019; 51:58. [PMID: 31638889 PMCID: PMC6805481 DOI: 10.1186/s12711-019-0500-8] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/03/2019] [Accepted: 10/07/2019] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND For genomic selection in populations with a small reference population, combining populations of the same breed or populations of related breeds is an effective way to increase the size of the reference population. However, genomic predictions based on single nucleotide polymorphism (SNP)-chip genotype data using combined populations with different genetic backgrounds or from different breeds have not shown a clear advantage over using within-population or within-breed predictions. The increasing availability of whole-genome sequencing (WGS) data provides new opportunities for combined population genomic prediction. Our objective was to investigate the accuracy of genomic prediction using imputation-based WGS data from combined populations in pigs. Using 80K SNP panel genotypes, WGS genotypes, or genotypes on WGS variants that were pruned based on linkage disequilibrium (LD), three methods [genomic best linear unbiased prediction (GBLUP), single-step (ss)GBLUP, and genomic feature (GF)BLUP] were implemented with different prior information to identify the best method to improve the accuracy of genomic prediction for combined populations in pigs. RESULTS In total, 2089 and 2043 individuals with production and reproduction phenotypes, respectively, from three Yorkshire populations with different genetic backgrounds were genotyped with the PorcineSNP80 panel. Imputation accuracy from 80K to WGS variants reached 92%. The results showed that use of the WGS data compared to the 80K SNP panel did not increase the accuracy of genomic prediction in a single population, but using WGS data with LD pruning and GFBLUP with prior information did yield higher accuracy than the 80K SNP panel. For the 80K SNP panel genotypes, using the combined population resulted in a slight improvement, no change, or even a slight decrease in accuracy in comparison with the single population for GBLUP and ssGBLUP, while accuracy increased by 1 to 2.4% when using WGS data. 
Notably, the GFBLUP method did not perform well for either the combined population or the single populations. CONCLUSIONS The use of WGS data was beneficial for combined population genomic prediction. Simply increasing the number of SNPs to the WGS level did not increase accuracy for a single population, while using pruned WGS data based on LD and GFBLUP with prior information could yield higher accuracy than the 80K SNP panel.
Affiliation(s)
- Hailiang Song
  - Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Shaopan Ye
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Yifan Jiang
  - Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Zhe Zhang
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Qin Zhang
  - Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, Shandong Agricultural University, Taian, China
- Xiangdong Ding
  - Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
|
17
|
Santos DJA, Cole JB, Lawlor TJ, VanRaden PM, Tonhati H, Ma L. Variance of gametic diversity and its application in selection programs. J Dairy Sci 2019; 102:5279-5294. [PMID: 30981488 DOI: 10.3168/jds.2018-15971] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/10/2018] [Accepted: 02/27/2019] [Indexed: 11/19/2022]
Abstract
The variance of gametic diversity ($\sigma^2_{gamete}$) can be used to find individuals that more likely produce progeny with extreme breeding values. The aim of this study was to obtain this variance for individuals from routine genomic evaluations, and to apply gametic variance in a selection criterion in conjunction with breeding values to improve genetic progress. An analytical approach was developed to estimate $\sigma^2_{gamete}$ by the sum of binomial variances of all individual quantitative trait loci across the genome. Simulation was used to verify the predictability of this variance in a range of scenarios. The accuracy of prediction ranged from 0.49 to 0.85, depending on the scenario and model used. Compared with sequence data, SNP data are sufficient for estimating $\sigma^2_{gamete}$. Results also suggested that markers with low minor allele frequency and the covariance between markers should be included in the estimation. To incorporate $\sigma^2_{gamete}$ into selective breeding programs, we proposed a new index, relative predicted transmitting ability, which better utilizes the genetic potential of individuals than traditional predicted transmitting ability. Simulation with a small genome showed an additional genetic gain of up to 16% in 10 generations, depending on the number of quantitative trait loci and selection intensity. Finally, we applied $\sigma^2_{gamete}$ to the US genomic evaluations for Holstein and Jersey cattle. As expected, the DGAT1 gene had a strong effect on the estimation of $\sigma^2_{gamete}$ for several production traits. However, inbreeding had a small impact on gametic variability, with greater effect for more polygenic traits. In conclusion, gametic variance, a potentially important parameter for selection programs, can be easily computed and is useful for improving genetic progress and controlling genetic diversity.
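The "sum of binomial variances" idea can be illustrated with a simplified sketch. It assumes unlinked loci and allele substitution effects on the gamete scale, so that each heterozygous locus contributes α²/4 (a Bernoulli variance of 1/4 times the squared effect) and homozygous loci contribute nothing. The covariance between linked markers, which the abstract says should be included, is deliberately omitted, and the function name and scaling are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def gametic_variance_independent(genotypes, effects):
    """Gametic variance of one individual, ignoring linkage.

    genotypes: per-locus genotype coded 0/1/2 (1 = heterozygous).
    effects:   per-locus allele substitution effects (assumed known).
    """
    g = np.asarray(genotypes)
    a = np.asarray(effects, dtype=float)
    het = g == 1                      # only heterozygous loci segregate
    return float(np.sum(0.25 * a[het] ** 2))
```

Two individuals with the same breeding value can therefore differ in this quantity if they are heterozygous at different numbers of large-effect loci.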
Affiliation(s)
- D J A Santos
  - Department of Animal and Avian Sciences, University of Maryland, College Park 20742
  - Departamento de Zootecnia, Universidade Estadual Paulista, Jaboticabal, 14884-900, Brazil
- J B Cole
  - Henry A. Wallace Beltsville Agricultural Research Center, Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
- T J Lawlor
  - Holstein Association USA, Brattleboro, VT 05302-0808
- P M VanRaden
  - Henry A. Wallace Beltsville Agricultural Research Center, Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
- H Tonhati
  - Departamento de Zootecnia, Universidade Estadual Paulista, Jaboticabal, 14884-900, Brazil
- L Ma
  - Department of Animal and Avian Sciences, University of Maryland, College Park 20742
|
18
|
Korani W, Vaughn JN. Crossword: A data-driven simulation language for the design of genetic-mapping experiments and breeding strategies. Sci Rep 2019; 9:4386. [PMID: 30867436 PMCID: PMC6416259 DOI: 10.1038/s41598-018-38348-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2018] [Accepted: 12/17/2018] [Indexed: 11/15/2022] Open
Abstract
Quantitative genetic simulations can save time and resources by optimizing the logistics of an experiment. Current tools are difficult to use by those unfamiliar with programming, and these tools rarely address the actual genetic structure of the population under study. Here, we introduce crossword, which utilizes the widely available re-sequencing and genomics data to create more realistic simulations and to reduce user burden. The software was written in R, to simplify installation and implementation. Because crossword is a domain-specific language, it allows complex and unique simulations to be performed, but the language is supported by a graphical interface that guides users through functions and options. We first show crossword’s utility in QTL-seq design, where its output accurately reflects empirical data. By introducing the concept of levels to reflect family relatedness, crossword can simulate a broad range of breeding programs and crops. Using levels, we further illustrate crossword’s capabilities by examining the effect of family size and number of selfing generations on phenotyping accuracy and genomic selection. Additionally, we explore the ramifications of large phenotypic difference between parents in a QTL mapping cross, a scenario that is common in crop genetics but often difficult to simulate.
Affiliation(s)
- Walid Korani
  - Center for Applied Genetic Technologies, The University of Georgia, Athens, GA, 30602, USA
- Justin N Vaughn
  - United States Department of Agriculture, Athens, GA, 30602, USA
|
19
|
pSBVB: A Versatile Simulation Tool To Evaluate Genomic Selection in Polyploid Species. G3: Genes, Genomes, Genetics 2019; 9:327-334. [PMID: 30573468 PMCID: PMC6385978 DOI: 10.1534/g3.118.200942] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
Abstract
Genomic Selection (GS) is the procedure whereby molecular information is used to predict complex phenotypes and it is standard in many animal and plant breeding schemes. However, only a small number of studies have been reported in horticultural crops, and in polyploid species in particular. In this paper, we have developed a versatile forward simulation tool, called polyploid Sequence Based Virtual Breeding (pSBVB), to evaluate GS strategies in polyploids; pSBVB is an efficient gene dropping software that can simulate any number of complex phenotypes, allowing a very flexible modeling of phenotypes suited to polyploids. As input, it takes genotype data from the founder population, which can vary from single nucleotide polymorphisms (SNP) chips up to sequence, a list of causal variants for every trait and their heritabilities, and the pedigree. Recombination rates between homeologous chromosomes can be specified, so that both allo- and autopolyploid species can be considered. The program outputs phenotype and genotype data for all individuals in the pedigree. Optionally, it can produce several genomic relationship matrices that consider exact or approximate genotype values. pSBVB can therefore be used to evaluate GS strategies in polyploid species (say varying SNP density, genetic architecture or population size, among other factors), or to optimize experimental designs for association studies. We illustrate pSBVB with SNP data from tetraploid potato and partial sequence data from octoploid strawberry, and we show that GS is a promising breeding strategy for polyploid species but that the actual advantage critically depends on the underlying genetic architecture. Source code, examples and a complete manual are freely available in GitHub https://github.com/lauzingaretti/pSBVB.
|
20
|
A survey of functional genomic variation in domesticated chickens. Genet Sel Evol 2018; 50:17. [PMID: 29661130 PMCID: PMC5902831 DOI: 10.1186/s12711-018-0390-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 12/07/2017] [Accepted: 04/04/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Deleterious genetic variation can increase in frequency as a result of mutations, genetic drift, and genetic hitchhiking. Although individual effects are often small, the cumulative effect of deleterious genetic variation can impact population fitness substantially. In this study, we examined the genome of commercial purebred chicken lines for deleterious and functional variations, combining genotype and whole-genome sequence data. RESULTS We analysed over 22,000 animals that were genotyped on a 60 K SNP chip from four purebred lines (two white egg and two brown egg layer lines) and two crossbred lines. We identified 79 haplotypes that showed a significant deficit in homozygous carriers. This deficit was assumed to stem from haplotypes that potentially harbour lethal recessive variations. To identify potentially deleterious mutations, a catalogue of over 10 million variants was derived from 250 whole-genome sequenced animals from three purebred white-egg layer lines. Out of 4219 putative deleterious variants, 152 mutations were identified that likely induce embryonic lethality in the homozygous state. Inferred deleterious variation showed evidence of purifying selection and deleterious alleles were generally overrepresented in regions of low recombination. Finally, we found evidence that mutations, which were inferred to be evolutionally intolerant, likely have positive effects in commercial chicken populations. CONCLUSIONS We present a comprehensive genomic perspective on deleterious and functional genetic variation in egg layer breeding lines, which are under intensive selection and characterized by a small effective population size. We show that deleterious variation is subject to purifying selection and that there is a positive relationship between recombination rate and purging efficiency. In addition, multiple putative functional coding variants were discovered in selective sweep regions, which are likely under positive selection. 
Together, this study provides a unique molecular perspective on functional and deleterious variation in commercial egg-laying chickens, which can enhance current genomic breeding practices to lower the frequency of undesirable variants in the population.
|
21
|
Zhang C, Kemp RA, Stothard P, Wang Z, Boddicker N, Krivushin K, Dekkers J, Plastow G. Genomic evaluation of feed efficiency component traits in Duroc pigs using 80K, 650K and whole-genome sequence variants. Genet Sel Evol 2018; 50:14. [PMID: 29625549 PMCID: PMC5889553 DOI: 10.1186/s12711-018-0387-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/01/2017] [Accepted: 03/27/2018] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Increasing marker density was proposed to have potential to improve the accuracy of genomic prediction for quantitative traits; whole-sequence data is expected to give the best accuracy of prediction, since all causal mutations that underlie a trait are expected to be included. However, in cattle and chicken, this assumption is not supported by empirical studies. Our objective was to compare the accuracy of genomic prediction of feed efficiency component traits in Duroc pigs using single nucleotide polymorphism (SNP) panels of 80K, imputed 650K, and whole-genome sequence variants using GBLUP, BayesB and BayesRC methods, with the ultimate purpose to determine the optimal method to increase genetic gain for feed efficiency in pigs. RESULTS Phenotypes of average daily feed intake (ADFI), average daily gain (ADG), ultrasound backfat depth (FAT), and loin muscle depth (LMD) were available for 1363 Duroc boars from a commercial breeding program. Genotype imputation accuracies reached 92.1% from 80K to 650K and 85.6% from 650K to whole-genome sequence variants. Average accuracies across methods and marker densities of genomic prediction of ADFI, FAT, LMD and ADG were 0.40, 0.65, 0.30 and 0.15, respectively. For ADFI and FAT, BayesB outperformed GBLUP, but increasing marker density had little advantage for genomic prediction. For ADG and LMD, GBLUP outperformed BayesB, while BayesRC based on whole-genome sequence data gave the best accuracies and reached up to 0.35 for LMD and 0.25 for ADG. CONCLUSIONS Use of genomic information was beneficial for prediction of ADFI and FAT but not for that of ADG and LMD compared to pedigree-based estimates. BayesB based on 80K SNPs gave the best genomic prediction accuracy for ADFI and FAT, while BayesRC based on whole-genome sequence data performed best for ADG and LMD. 
We suggest that these differences between traits in the effect of marker density and method on accuracy of genomic prediction are mainly due to the underlying genetic architecture of the traits.
Affiliation(s)
- Chunyan Zhang
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Paul Stothard
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Zhiquan Wang
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Kirill Krivushin
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Jack Dekkers
  - Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Graham Plastow
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada
|
22
|
Forneris NS, Vitezica ZG, Legarra A, Pérez-Enciso M. Influence of epistasis on response to genomic selection using complete sequence data. Genet Sel Evol 2017; 49:66. [PMID: 28841821 PMCID: PMC5574158 DOI: 10.1186/s12711-017-0340-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/04/2017] [Accepted: 08/15/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The effect of epistasis on response to selection is a highly debated topic. Here, we investigated the impact of epistasis on response to sequence-based selection via genomic best linear prediction (GBLUP) in a regime of strong non-symmetrical epistasis under divergent selection, using real Drosophila sequence data. We also explored the possible advantage of including epistasis in the evaluation model and/or of knowing the causal mutations. RESULTS Response to selection was almost exclusively due to changes in allele frequency at a few loci with a large effect. Response was highly asymmetric (about four phenotypic standard deviations higher for upward than downward selection) due to the highly skewed site frequency spectrum. Epistasis accentuated this asymmetry and affected response to selection by modulating the additive genetic variance, which was sustained for longer under upward selection whereas it eroded rapidly under downward selection. Response to selection was quite insensitive to the evaluation model, especially under an additive scenario. Nevertheless, including epistasis in the model when there was none eventually led to lower accuracies as selection proceeded. Accounting for epistasis in the model, if it existed, was beneficial but only in the medium term. There was not much gain in response if causal mutations were known, compared to using sequence data, which is likely due to strong linkage disequilibrium, high heritability and availability of phenotypes on candidates. CONCLUSIONS Epistatic interactions affect the response to genomic selection by modulating the additive genetic variance used for selection. Epistasis releases additive variance that may increase response to selection compared to a pure additive genetic action. Furthermore, genomic evaluation models and, in particular, GBLUP are robust, i.e. adding complexity to the model did not modify substantially the response (for a given architecture).
Affiliation(s)
- Natalia S Forneris
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB Consortium, 08193, Bellaterra, Barcelona, Spain; Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, C1417DSE, Buenos Aires, Argentina
- Zulma G Vitezica
- GenPhySE, INRA, INPT, ENVT, Université de Toulouse, 31326, Castanet-Tolosan, France
- Andres Legarra
- GenPhySE, INRA, INPT, ENVT, Université de Toulouse, 31326, Castanet-Tolosan, France
- Miguel Pérez-Enciso
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB Consortium, 08193, Bellaterra, Barcelona, Spain; Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain; ICREA, Passeig de Lluís Companys 23, 08010, Barcelona, Spain
|
23
|
Fragomeni BO, Lourenco DAL, Masuda Y, Legarra A, Misztal I. Incorporation of causative quantitative trait nucleotides in single-step GBLUP. Genet Sel Evol 2017; 49:59. [PMID: 28747171 PMCID: PMC5530494 DOI: 10.1186/s12711-017-0335-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 03/28/2017] [Accepted: 07/17/2017] [Indexed: 11/23/2022] Open
Abstract
Background Much effort is put into identifying causative quantitative trait nucleotides (QTN) in animal breeding, empowered by the availability of dense single nucleotide polymorphism (SNP) information. Genomic selection using traditional SNP information is easily implemented for any number of genotyped individuals using single-step genomic best linear unbiased predictor (ssGBLUP) with the algorithm for proven and young (APY). Our aim was to investigate whether ssGBLUP is useful for genomic prediction when some or all QTN are known. Methods Simulations included 180,000 animals across 11 generations. Phenotypes were available for all animals in generations 6 to 10. Genotypes for 60,000 SNPs across 10 chromosomes were available for 29,000 individuals. The genetic variance was fully accounted for by 100 or 1000 biallelic QTN. Raw genomic relationship matrices (GRM) were computed from (a) unweighted SNPs, (b) unweighted SNPs and causative QTN, (c) SNPs and causative QTN weighted with results obtained with genome-wide association studies, (d) unweighted SNPs and causative QTN with simulated weights, (e) only unweighted causative QTN, (f–h) as in (b–d) but using only the top 10% causative QTN, and (i) using only causative QTN with simulated weight. Predictions were computed by pedigree-based BLUP (PBLUP) and ssGBLUP. Raw GRM were blended with 1 or 5% of the numerator relationship matrix, or 1% of the identity matrix. Inverses of GRM were obtained directly or with APY. Results Accuracy of breeding values for 5000 genotyped animals in the last generation with PBLUP was 0.32, and for ssGBLUP it increased to 0.49 with an unweighted GRM, 0.53 after adding unweighted QTN, 0.63 when QTN weights were estimated, and 0.89 when QTN weights were based on true effects known from the simulation. When the GRM was constructed from causative QTN only, accuracy was 0.95 and 0.99 with blending at 5 and 1%, respectively. Accuracies simulating 1000 QTN were generally lower, with a similar trend. Accuracies using the APY inverse were equal or higher than those with a regular inverse. Conclusions Single-step GBLUP can account for causative QTN via a weighted GRM. Accuracy gains are maximum when variances of causative QTN are known and blending is at 1%.
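The weighting and blending steps described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a VanRaden-style construction, G = Z D Z' / (2 Σ p(1 − p)), with genotypes coded 0/1/2, and the function names `weighted_grm` and `blend` are hypothetical. The toy genotypes are random, and the identity matrix stands in for the blending target because no pedigree is simulated here.

```python
import numpy as np

def weighted_grm(M, weights=None):
    """Raw GRM from 0/1/2 genotypes (rows = animals, columns = markers).

    Assumed form: G = Z D Z' / (2 * sum(p * (1 - p))), where Z is the
    genotype matrix centered by twice the allele frequency and D is a
    diagonal matrix of marker weights (all ones if weights is None).
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                  # observed allele frequencies
    Z = M - 2.0 * p                           # center each marker column
    if weights is None:
        weights = np.ones(M.shape[1])
    d = np.asarray(weights, dtype=float)
    scale = 2.0 * np.sum(p * (1.0 - p))
    return (Z * d) @ Z.T / scale              # Z D Z' without forming D

def blend(G, B, beta):
    """Blend a raw GRM with a fraction beta of another matrix B."""
    return (1.0 - beta) * G + beta * B

# Toy data (hypothetical): 6 animals, 40 markers.
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(6, 40))

G_raw = weighted_grm(M)                       # unweighted GRM
G_blend = blend(G_raw, np.eye(6), 0.01)       # 1% identity blending

# Centering makes the raw GRM singular; blending makes it invertible.
G_inv = np.linalg.inv(G_blend)
```

Because each marker column is centered, the raw matrix has a zero eigenvalue and cannot be inverted directly, which is one practical reason for blending before inversion. Blending with the numerator relationship matrix would use the same `blend` call with a pedigree-derived matrix in place of the identity.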
Affiliation(s)
- Breno O Fragomeni
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Daniela A L Lourenco
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Yutaka Masuda
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
- Andres Legarra
- GenPhySE, INRA, INPT, INP-ENVT, Université de Toulouse, 31326, Castanet-Tolosan, France
- Ignacy Misztal
- Edgar L. Rhodes Center for Animal and Dairy Science, University of Georgia, Athens, GA, USA
|