1
Atanda SA, Bandillo N. Genomic-inferred cross-selection methods for multi-trait improvement in a recurrent selection breeding program. Plant Methods 2024; 20:133. [PMID: 39218896 PMCID: PMC11367796 DOI: 10.1186/s13007-024-01258-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 08/05/2024] [Indexed: 09/04/2024]
Abstract
The major drawback of implementing genomic selection in a breeding program is the long-term decrease in additive genetic variance, which is the trade-off for rapid genetic improvement in the short term. Balancing increased genetic gain against retention of additive genetic variance requires careful optimization of this trade-off. In this study, we propose an integrated index selection approach within the genomic inferred cross-selection (GCS) framework to maximize genetic gain across multiple traits. With this method, we identified optimal crosses that simultaneously maximize progeny performance and maintain genetic variance for multiple traits. Using a stochastic simulation of a recurrent breeding program over a 40-year period, we evaluated different GCS methods along with other factors, such as the number of parents, crosses, and progeny per cross, that influence genetic gain in a pulse crop breeding program. Across all breeding scenarios, the posterior mean variance consistently enhanced genetic gain compared with other methods, such as the usefulness criterion, optimal haploid value, mean genomic estimated breeding value, and mean index selection value of the superior parents. In addition, we provide a detailed strategy to optimize the number of parents, crosses, and progeny per cross that can potentially maximize short- and long-term genetic gain in a public breeding program.
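One of the cross-selection criteria named in this abstract, the usefulness criterion, is commonly written as the progeny mean plus selection intensity times the progeny standard deviation. The sketch below illustrates that textbook form only; it is not the authors' implementation, and the function names, the 10% selected fraction, and the two example crosses are hypothetical.

```python
from statistics import NormalDist


def selection_intensity(selected_fraction):
    """Standardized selection intensity for truncation selection on a normal trait."""
    std_normal = NormalDist()
    # Truncation point above which the best `selected_fraction` of progeny lie.
    threshold = std_normal.inv_cdf(1.0 - selected_fraction)
    return std_normal.pdf(threshold) / selected_fraction


def usefulness_criterion(progeny_mean, progeny_sd, selected_fraction=0.1, accuracy=1.0):
    """Textbook usefulness criterion: mean + intensity * accuracy * progeny SD."""
    return progeny_mean + selection_intensity(selected_fraction) * accuracy * progeny_sd


# Hypothetical crosses given as (predicted progeny mean, predicted progeny SD).
crosses = {"A": (10.0, 1.0), "B": (9.5, 1.5)}
ranked = sorted(crosses, key=lambda name: usefulness_criterion(*crosses[name]), reverse=True)
print(ranked)
```

With these made-up numbers, cross B ranks above cross A even though its mean is lower, because its larger progeny variance raises the expected value of the selected fraction.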
Affiliation(s)
- Sikiru Adeniyi Atanda
  - Agricultural Data Analytics Unit, North Dakota State University, Fargo, ND, 58105-6050, USA.
- Nonoy Bandillo
  - Department of Plant Sciences, North Dakota State University, Fargo, ND, 58108-6050, USA.
2
Shang J, Xu A, Bi M, Zhang Y, Li F, Liu JX. A review: simulation tools for genome-wide interaction studies. Brief Funct Genomics 2024:elae034. [PMID: 39173096 DOI: 10.1093/bfgp/elae034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 07/25/2024] [Accepted: 08/10/2024] [Indexed: 08/24/2024]
Abstract
Genome-wide association study (GWAS) is essential for investigating the genetic basis of complex diseases; nevertheless, it usually ignores the interaction of multiple single nucleotide polymorphisms (SNPs). Genome-wide interaction studies provide crucial means for exploring complex genetic interactions that GWAS may miss. Although many interaction methods have been proposed, challenges still persist, including the lack of epistasis models and the inconsistency of benchmark datasets. SNP data simulation is a pivotal intermediary between interaction methods and real applications. Therefore, it is important to obtain epistasis models and benchmark datasets by simulation tools, which is helpful for further improving interaction methods. At present, many simulation tools have been widely employed in the field of population genetics. According to their basic principles, these existing tools can be divided into four categories: coalescent simulation, forward-time simulation, resampling simulation, and other simulation frameworks. In this paper, their basic principles and representative simulation tools are compared and analyzed in detail. Additionally, this paper provides a discussion and summary of the advantages and disadvantages of these frameworks and tools, offering technical insights for the design of new methods, and serving as valuable reference tools for researchers to comprehensively understand GWAS and genome-wide interaction studies.
Affiliation(s)
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Anqi Xu
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Mingyuan Bi
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266033, China
- Feng Li
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jin-Xing Liu
  - School of Health and Life Sciences, University of Health and Rehabilitation Sciences, Qingdao 266114, China
3
Peixoto MA, Coelho IF, Leach KA, Lübberstedt T, Bhering LL, Resende MFR. Use of simulation to optimize a sweet corn breeding program: implementing genomic selection and doubled haploid technology. G3 (Bethesda, Md.) 2024; 14:jkae128. [PMID: 38869242 PMCID: PMC11304600 DOI: 10.1093/g3journal/jkae128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 04/06/2024] [Accepted: 05/21/2024] [Indexed: 06/14/2024]
Abstract
Genomic selection and doubled haploids hold significant potential to enhance genetic gains and shorten breeding cycles across various crops. Here, we utilized stochastic simulations to investigate the best strategies for optimizing a sweet corn breeding program. We assessed the effects of incorporating varying proportions of old and new parents into the crossing block (3:1, 1:1, 1:3, and 0:1 ratio, representing different degrees of parental substitution), as well as the implementation of genomic selection in two distinct pipelines: one calibrated using the phenotypes of testcross parents (GSTC scenario) and another using F1 individuals (GSF1). Additionally, we examined scenarios with doubled haploids, both with (DHGS) and without (DH) genomic selection. Across 20 years of simulated breeding, we evaluated scenarios considering traits with varying heritabilities, the presence or absence of genotype-by-environment effects, and two program sizes (50 vs 200 crosses per generation). We also assessed parameters such as parental genetic mean, average genetic variance, hybrid mean, and implementation costs for each scenario. Results indicated that within a conventional selection program, a 1:3 parental substitution ratio (replacing 75% of parents each generation with new lines) yielded the highest performance. Furthermore, the GSTC model outperformed the GSF1 model in enhancing genetic gain. The DHGS model emerged as the most effective, reducing cycle time from 5 to 4 years and enhancing hybrid gains despite increased costs. In conclusion, our findings strongly advocate for the integration of genomic selection and doubled haploids into sweet corn breeding programs, offering accelerated genetic gains and efficiency improvements.
Affiliation(s)
- Marco Antônio Peixoto
  - Laboratório de Biometria, Universidade Federal de Viçosa, Viçosa, Minas Gerais 36570-900, Brazil
  - Sweet Corn Breeding and Genomics Lab, University of Florida, Gainesville, FL 32611, USA
- Igor Ferreira Coelho
  - Laboratório de Biometria, Universidade Federal de Viçosa, Viçosa, Minas Gerais 36570-900, Brazil
  - Sweet Corn Breeding and Genomics Lab, University of Florida, Gainesville, FL 32611, USA
- Kristen A Leach
  - Sweet Corn Breeding and Genomics Lab, University of Florida, Gainesville, FL 32611, USA
- Leonardo Lopes Bhering
  - Laboratório de Biometria, Universidade Federal de Viçosa, Viçosa, Minas Gerais 36570-900, Brazil
- Márcio F R Resende
  - Sweet Corn Breeding and Genomics Lab, University of Florida, Gainesville, FL 32611, USA
4
Pocrnic I, Lourenco D, Misztal I. Single nucleotide polymorphism profile for quantitative trait nucleotide in populations with small effective size and its impact on mapping and genomic predictions. Genetics 2024; 227:iyae103. [PMID: 38913695 PMCID: PMC11304960 DOI: 10.1093/genetics/iyae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 06/07/2024] [Accepted: 06/16/2024] [Indexed: 06/26/2024]
Abstract
Increasing SNP density by incorporating sequence information only marginally increases prediction accuracies of breeding values in livestock. To find out why, we used statistical models and simulations to investigate the shape of distribution of estimated SNP effects (a profile) around quantitative trait nucleotides (QTNs) in populations with a small effective population size (Ne). A QTN profile created by averaging SNP effects around each QTN was similar to the shape of expected pairwise linkage disequilibrium (PLD) based on Ne and genetic distance between SNP, with a distinct peak for the QTN. Populations with smaller Ne showed lower but wider QTN profiles. However, adding more genotyped individuals with phenotypes dragged the profile closer to the QTN. The QTN profile was higher and narrower for populations with larger compared to smaller Ne. Assuming the PLD curve for the QTN profile, 80% of the additive genetic variance explained by each QTN was contained in ± 1/Ne Morgan interval around the QTN, corresponding to 2 Mb in cattle and 5 Mb in pigs and chickens. With such large intervals, identifying QTN is difficult even if all of them are in the data and the assumed genetic architecture is simplistic. Additional complexity in QTN detection arises from confounding of QTN profiles with signals due to relationships, overlapping profiles with closely spaced QTN, and spurious signals. However, small Ne allows for accurate predictions with large data even without QTN identification because QTNs are accounted for by QTN profiles if SNP density is sufficient to saturate the segments.
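The interval arithmetic in this abstract can be illustrated with a short sketch. It assumes Sved's classical approximation for expected pairwise LD, E[r²] = 1/(1 + 4·Ne·c), and a recombination rate of about 1 cM per Mb; neither assumption is stated in the abstract, and the Ne values of 100 and 40 are hypothetical choices that happen to reproduce the 2 Mb and 5 Mb figures.

```python
def expected_r2(ne, distance_morgans):
    """Sved's approximation of expected LD (r^2) at a given genetic distance."""
    return 1.0 / (1.0 + 4.0 * ne * distance_morgans)


def interval_width_mb(ne, mb_per_cm=1.0):
    """Total width in Mb of the +/- 1/Ne Morgan interval around a QTN.

    The interval spans 2/Ne Morgans, i.e. 200/Ne centimorgans; `mb_per_cm`
    converts genetic distance to physical distance (an assumed constant rate).
    """
    return 200.0 * mb_per_cm / ne


for label, ne in [("Ne = 100", 100), ("Ne = 40", 40)]:
    edge_ld = expected_r2(ne, 1.0 / ne)
    print(label, "width (Mb):", interval_width_mb(ne), "expected r^2 at edge:", round(edge_ld, 3))
```

Under this approximation the expected r² at the edge of the interval is 0.2 regardless of Ne, which is why a smaller Ne gives a wider profile in physical distance.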
Affiliation(s)
- Ivan Pocrnic
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
5
Dubey R, Zustovi R, Landschoot S, Dewitte K, Verlinden G, Haesaert G, Maenhout S. Harnessing monocrop breeding strategies for intercrops. Frontiers in Plant Science 2024; 15:1394413. [PMID: 38799097 PMCID: PMC11119317 DOI: 10.3389/fpls.2024.1394413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 04/22/2024] [Indexed: 05/29/2024]
Abstract
Intercropping is considered advantageous for many reasons, including increased yield stability, nutritional value and the provision of various regulating ecosystem services. However, intercropping also introduces diverse competition effects between the mixing partners, which can negatively impact their agronomic performance. Therefore, selecting complementary intercropping partners is the key to realizing a well-mixed crop production. Several specialized intercrop breeding concepts have been proposed to support the development of complementary varieties, but their practical implementation still needs to be improved. To lower this adoption threshold, we explore the potential of introducing minor adaptations to commonly used monocrop breeding strategies as an initial stepping stone towards implementing dedicated intercrop breeding schemes. While we acknowledge that recurrent selection for reciprocal mixing abilities is likely a more effective breeding paradigm to obtain genetic progress for intercrops, a well-considered adaptation of monoculture breeding strategies is far less intrusive concerning the design of the breeding programme and allows for balancing genetic gain for both monocrop and intercrop performance. The main idea is to develop compatible variety combinations by improving the monocrop performance in the two breeding pools in parallel and testing for intercrop performance in the later stages of selection. We show that the optimal stage for switching from monocrop to intercrop testing should be adapted to the specificity of the crop and the heritability of the traits involved. However, the genetic correlation between the monocrop and intercrop trait performance is the primary driver of the intercrop breeding scheme optimization process.
Affiliation(s)
- Steven Maenhout
  - Department of Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
6
Legarra A, Bermann M, Mei Q, Christensen OF. Estimating genomic relationships of metafounders across and within breeds using maximum likelihood, pseudo-expectation-maximization maximum likelihood and increase of relationships. Genet Sel Evol 2024; 56:35. [PMID: 38698347 DOI: 10.1186/s12711-024-00892-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 03/18/2024] [Indexed: 05/05/2024]
Abstract
BACKGROUND The theory of "metafounders" proposes a unified framework for relationships across base populations within breeds (e.g. unknown parent groups), and base populations across breeds (crosses) together with a sensible compatibility with genomic relationships. Considering metafounders might be advantageous in pedigree best linear unbiased prediction (BLUP) or single-step genomic BLUP. Existing methods to estimate relationships across metafounders Γ are not well adapted to highly unbalanced data, genotyped individuals far from base populations, or many unknown parent groups (within breed per year of birth). METHODS We derive likelihood methods to estimate Γ. For a single metafounder, summary statistics of pedigree and genomic relationships allow deriving a cubic equation with the real root being the maximum likelihood (ML) estimate of Γ. This equation is tested with Lacaune sheep data. For several metafounders, we split the first derivative of the complete likelihood in a term related to Γ, and a second term related to Mendelian sampling variances. Approximating the first derivative by its first term results in a pseudo-EM algorithm that iteratively updates the estimate of Γ by the corresponding block of the H-matrix. The method extends to complex situations with groups defined by year of birth, modelling the increase of Γ using estimates of the rate of increase of inbreeding (ΔF), resulting in an expanded Γ and in a pseudo-EM+ΔF algorithm. We compare these methods with the generalized least squares (GLS) method using simulated data: complex crosses of two breeds in equal or unsymmetrical proportions; and in two breeds, with 10 groups per year of birth within breed. We simulate genotyping in all generations or in the last ones. RESULTS For a single metafounder, the ML estimates of the Lacaune data corresponded to the maximum. For simulated data, when genotypes were spread across all generations, both GLS and pseudo-EM(+ΔF) methods were accurate. With genotypes only available in the most recent generations, the GLS method was biased, whereas the pseudo-EM(+ΔF) approach yielded more accurate and unbiased estimates. CONCLUSIONS We derived ML, pseudo-EM and pseudo-EM+ΔF methods to estimate Γ in many realistic settings. Estimates are accurate in real and simulated data and have a low computational cost.
Affiliation(s)
- Matias Bermann
  - Animal and Dairy Science, University of Georgia, 425 River Rd, Athens, GA, 30602, USA
- Quanshun Mei
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, 02118, USA
- Ole F Christensen
  - Center for Quantitative Genetics and Genomics, Aarhus University, C. F. Møllers Allé 3, bld. 1130, 8000, Aarhus C, Denmark
7
Azevedo CF, Ferrão LFV, Benevenuto J, de Resende MDV, Nascimento M, Nascimento ACC, Munoz PR. Using visual scores for genomic prediction of complex traits in breeding programs. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2023; 137:9. [PMID: 38102495 DOI: 10.1007/s00122-023-04512-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023]
Abstract
KEY MESSAGE An approach for handling visual scores with potential errors and subjectivity in scores was evaluated in simulated and blueberry recurrent selection breeding schemes to assist breeders in their decision-making. Most genomic prediction methods are based on assumptions of normality due to their simplicity and ease of implementation. However, in plant and animal breeding, continuous traits are often visually scored as categorical traits and analyzed as Gaussian variables, thus violating the normality assumption, which could affect the prediction of breeding values and the estimation of genetic parameters. In this study, we examined the main challenges of visual scores for genomic prediction and genetic parameter estimation using mixed models, Bayesian, and machine learning methods. We evaluated these approaches using simulated and real breeding data sets. Our contribution in this study is a five-fold demonstration: (i) collecting data using an intermediate number of categories (1-3 and 1-5) is the best strategy, even considering errors associated with visual scores; (ii) Linear Mixed Models and Bayesian Linear Regression are robust to the normality violation, but marginal gains can be achieved when using Bayesian Ordinal Regression Models (BORM) and Random Forest Classification; (iii) genetic parameters are better estimated using BORM; (iv) our conclusions using simulated data are also applicable to real data in autotetraploid blueberry; and (v) a comparison of continuous and categorical phenotypes found that investing in the evaluation of 600-1000 categorical data points with low error, when it is not feasible to collect continuous phenotypes, is a strategy for improving predictive abilities. Our findings suggest the best approaches for effectively using visual score traits to explore genetic information in breeding programs and highlight the importance of investing in the training of evaluator teams and in high-quality phenotyping.
Affiliation(s)
- Camila Ferreira Azevedo
  - Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
  - Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
- Luis Felipe Ventorim Ferrão
  - Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
- Juliana Benevenuto
  - Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA
- Marcos Deon Vilela de Resende
  - Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
  - Department of Forestry Engineering, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
  - Embrapa Café, Brasília, Distrito Federal, Brazil
- Moyses Nascimento
  - Statistics Department, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Patricio R Munoz
  - Horticultural Sciences Department, Blueberry Breeding and Genomics Lab, University of Florida, Gainesville, FL, USA.
8
Fritsche-Neto R, Ali J, De Asis EJ, Allahgholipour M, Labroo MR. Improving hybrid rice breeding programs via stochastic simulations: number of parents, number of hybrids, tester update, and genomic prediction of hybrid performance. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2023; 137:3. [PMID: 38085288 PMCID: PMC10716074 DOI: 10.1007/s00122-023-04508-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 11/18/2023] [Indexed: 12/18/2023]
Abstract
KEY MESSAGE Schemes that use genomic prediction outperform others, updating testers increases hybrid genetic gain, and larger population sizes tend to have higher genetic gain and less depletion of genetic variance. One of the most common methods to improve hybrid performance is reciprocal recurrent selection (RRS). Genomic prediction (GP) can be used to increase genetic gain in RRS by reducing cycle length, but it is also possible to use GP to predict single-cross hybrid performance. The impact of the latter method on genetic gain has not previously been reported. Therefore, we compared via stochastic simulations various phenotypic and genomics-assisted RRS breeding schemes which used GP to predict hybrid performance rather than to reduce cycle length, which allows minimal changes to traditional breeding schemes. We also compared three breeding-size scenarios that varied the number of genotypes crossed within heterotic pools, the number of genotypes crossed between heterotic pools, the number of hybrids evaluated, and the number of genomic predicted hybrids. Our results demonstrated that schemes that used genomic prediction of hybrid performance outperformed the others for the average interpopulation hybrid population and the best hybrid performance. Furthermore, updating the testers increased hybrid genetic gain with phenotypic RRS. As expected, the largest breeding size tested had the highest rates of genetic improvement and the lowest decrease in additive genetic variance due to drift. Therefore, this study demonstrates the usefulness of single-cross prediction, which may be easier to implement than rapid-cycling RRS and cyclical updating of testers. We also reiterate that larger population sizes tend to have higher genetic gain and less depletion of genetic variance.
Affiliation(s)
- Roberto Fritsche-Neto
  - International Rice Research Institute (IRRI), Los Banos, Philippines.
  - H. Rouse Caffey Rice Research Station, LSU AgCenter, Rayne, USA.
- Jauhar Ali
  - International Rice Research Institute (IRRI), Los Banos, Philippines.
- Erik Jon De Asis
  - International Rice Research Institute (IRRI), Los Banos, Philippines
- Marlee Rose Labroo
  - Excellence in Breeding Platform, Consultative Group of International Agricultural Research, Texcoco, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
9
Ayala NM, Genetti M, Corbett-Detig R. Inferring multi-locus selection in admixed populations. PLoS Genet 2023; 19:e1011062. [PMID: 38015992 PMCID: PMC10707604 DOI: 10.1371/journal.pgen.1011062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 12/08/2023] [Accepted: 11/13/2023] [Indexed: 11/30/2023]
Abstract
Admixture, the exchange of genetic information between distinct source populations, is thought to be a major source of adaptive genetic variation. Unlike mutation events, which periodically generate single alleles, admixture can introduce many selected alleles simultaneously. As such, the effects of linkage between selected alleles may be especially pronounced in admixed populations. However, existing tools for identifying selected mutations within admixed populations only account for selection at a single site, overlooking phenomena such as linkage among proximal selected alleles. Here, we develop and extensively validate a method for identifying and quantifying the individual effects of multiple linked selected sites on a chromosome in admixed populations. Our approach numerically calculates the expected local ancestry landscape in an admixed population for a given multi-locus selection model, and then maximizes the likelihood of the model. After applying this method to admixed populations of Drosophila melanogaster and Passer italiae, we found that the impacts between linked sites may be an important contributor to natural selection in admixed populations. Furthermore, for the situations we considered, the selection coefficients and number of selected sites are overestimated in analyses that do not consider the effects of linkage among selected sites. Our results imply that linkage among selected sites may be an important evolutionary force in admixed populations. This tool provides a powerful generalized method to investigate these crucial phenomena in diverse populations.
Affiliation(s)
- Nicolas M. Ayala
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, California, United States of America
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, California, United States of America
- Maximilian Genetti
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, California, United States of America
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, California, United States of America
- Russell Corbett-Detig
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, California, United States of America
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, California, United States of America
10
Aoki S, Ishihama F, Fukasawa K. Robustness of genetic diversity measures under spatial sampling and a new frequency-independent measure. PeerJ 2023; 11:e16027. [PMID: 37744217 PMCID: PMC10512937 DOI: 10.7717/peerj.16027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 08/13/2023] [Indexed: 09/26/2023]
Abstract
The genetic diversity of a taxon has often been estimated by genetic diversity measures. However, these measures assume random sampling of individuals, which is often inapplicable. Except when the distribution of the taxon is limited, researchers conventionally choose several sampling locations from the known distribution and then collect individuals from each location. Spatial sampling is a formalized version of this conventional sampling, which objectively provides geographically even sampling locations to cover genetic variation in a taxon assuming isolation by distance. To evaluate the validity of spatial sampling in estimating genetic diversity, we conducted coalescent simulation experiments. The sampling locations were selected by spatial sampling and one sample was collected from each location for the sake of theoretical simplicity. We also devised a new measure of genetic diversity, ς, which assumes spatial sampling and is independent of allele frequency. This new measure places an emphasis on rare and phylogenetically distant alleles, which have a relatively small effect on nucleotide diversity. Therefore, it can serve as a complementary measure for conservation studies, although it cannot be used to estimate population mutation rate. We compared ς with the other diversity measures in the experiments. Nucleotide diversity, expected heterozygosity and ς showed relative biases within 3% on average, while Watterson's theta was overestimated by 31% on average. Thus, genetic diversity measures other than Watterson's theta were robust under spatial sampling.
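Two of the classical measures compared in this abstract have short textbook definitions, sketched below for a set of aligned haplotypes. This is not the authors' code and does not implement their new measure ς; the function names and the four-sequence sample are hypothetical, and both quantities are computed per sequence rather than per site.

```python
from itertools import combinations


def segregating_sites(haplotypes):
    """Count alignment columns in which more than one allele is observed."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)


def nucleotide_diversity(haplotypes):
    """Mean number of differences over all pairs of sequences."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def wattersons_theta(haplotypes):
    """Watterson's estimator: segregating sites divided by the harmonic number a_n."""
    n = len(haplotypes)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(haplotypes) / a_n


sample = ["AATG", "AACG", "TACG", "AACG"]  # hypothetical aligned haplotypes
print(nucleotide_diversity(sample), wattersons_theta(sample))
```

Watterson's theta depends only on the number of segregating sites, so every rare allele counts fully, whereas pairwise nucleotide diversity weights alleles by their frequency; this is one reason the two can respond differently to a non-random sampling scheme.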
Affiliation(s)
- Satoshi Aoki
  - Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Ibaraki, Japan
- Fumiko Ishihama
  - Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Ibaraki, Japan
- Keita Fukasawa
  - Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Ibaraki, Japan
11
Grohmann CJ, Shull CM, Crum TE, Schwab C, Safranski TJ, Decker JE. Analysis of polygenic selection in purebred and crossbred pig genomes using generation proxy selection mapping. Genet Sel Evol 2023; 55:62. [PMID: 37710159 PMCID: PMC10500877 DOI: 10.1186/s12711-023-00836-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 08/25/2023] [Indexed: 09/16/2023]
Abstract
BACKGROUND Artificial selection on quantitative traits using breeding values and selection indices in commercial livestock breeding populations causes changes in allele frequency over time at hundreds or thousands of causal loci and the surrounding genomic regions. In population genetics, this type of selection is called polygenic selection. Researchers and managers of pig breeding programs are motivated to understand the genetic basis of phenotypic diversity across genetic lines, breeds, and populations using selection mapping analyses. Here, we applied generation proxy selection mapping (GPSM), a genome-wide association analysis of single nucleotide polymorphism (SNP) genotypes (38,294-46,458 markers) of birth date, in four pig populations (15,457, 15,772, 16,595 and 8447 pigs per population) to identify loci responding to artificial selection over a period of five to ten years. Gene-drop simulation analyses were conducted to provide context for the GPSM results. Selected loci within and across each population of pigs were compared in the context of swine breeding objectives. RESULTS The GPSM identified 49 to 854 loci as under selection (Q-values less than 0.10) across 15 subsets of pigs based on combinations of populations. The number of significant associations increased when data were pooled across populations. In addition, several significant associations were identified in more than one population. These results indicate concurrent selection objectives, similar genetic architectures, and shared causal variants responding to selection across these pig populations. Negligible error rates (less than or equal to 0.02%) of false-positive associations were found when testing GPSM on gene-drop simulated genotypes, suggesting that GPSM distinguishes selection from random genetic drift in actual pig populations. CONCLUSIONS This work confirms the efficacy and the negligible error rates of the GPSM method in detecting selected loci in commercial pig populations. Our results suggest shared selection objectives and genetic architectures across swine populations. The identified polygenic selection highlights loci that are important to swine production.
12
Bonizzoni P, Boucher C, Cozzi D, Gagie T, Köppl D, Rossi M. Data Structures for SMEM-Finding in the PBWT. International Symposium on String Processing and Information Retrieval : SPIRE ... : Proceedings. SPIRE (Symposium) 2023; 14240:89-101. [PMID: 39149146 PMCID: PMC11325217 DOI: 10.1007/978-3-031-43980-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
The positional Burrows-Wheeler Transform (PBWT) was presented as a means to find set-maximal exact matches (SMEMs) in haplotype data via the computation of the divergence array. Although run-length encoding the PBWT has been previously considered, storing the divergence array along with the PBWT in a compressed manner has not been as rigorously studied. We define two queries that can be used in combination to compute SMEMs, allowing us to define smaller data structures that support one or both of these queries. We combine these data structures, enabling the PBWT and the divergence array to be stored in a manner that allows for finding SMEMs. We estimate and compare the memory usage of these data structures, leading to one data structure that is most memory efficient. Lastly, we implement this data structure and compare its performance to prior methods using various datasets taken from the 1000 Genomes Project data.
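For readers unfamiliar with the two arrays this abstract refers to, the sketch below builds the positional prefix array and the divergence array for a small binary haplotype panel, following the standard uncompressed construction from the original PBWT description. It is not the compressed data structure proposed in the paper and does not perform SMEM-finding; the function name `build_pbwt` and the example panel are hypothetical.

```python
def build_pbwt(haplotypes):
    """Return (prefix_array, divergence_array) after processing every site.

    prefix_array lists haplotype indices sorted by their reversed prefixes.
    divergence_array[i] is the start of the longest match, ending at the last
    site, between prefix_array[i] and prefix_array[i - 1].
    """
    n_haps, n_sites = len(haplotypes), len(haplotypes[0])
    prefix = list(range(n_haps))
    divergence = [0] * n_haps
    for k in range(n_sites):
        zeros, ones, div_zeros, div_ones = [], [], [], []
        p = q = k + 1  # running match starts for the 0-allele and 1-allele groups
        for index, match_start in zip(prefix, divergence):
            p = max(p, match_start)
            q = max(q, match_start)
            if haplotypes[index][k] == 0:
                zeros.append(index)
                div_zeros.append(p)
                p = 0
            else:
                ones.append(index)
                div_ones.append(q)
                q = 0
        # Stable partition: haplotypes carrying allele 0 precede those carrying 1.
        prefix = zeros + ones
        divergence = div_zeros + div_ones
    return prefix, divergence


panel = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]]  # hypothetical haplotypes
prefix_array, divergence_array = build_pbwt(panel)
print(prefix_array, divergence_array)
```

Storing both arrays uncompressed costs one integer per haplotype per site, which is the memory overhead that compressed representations of the PBWT and the divergence array aim to reduce.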
13
Hu W, Hao Z, Du P, Di Vincenzo F, Manzi G, Cui J, Fu YX, Pan YH, Li H. Genomic inference of a severe human bottleneck during the Early to Middle Pleistocene transition. Science 2023; 381:979-984. [PMID: 37651513 DOI: 10.1126/science.abq7487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 07/11/2023] [Indexed: 09/02/2023]
Abstract
Population size history is essential for studying human evolution. However, ancient population size history during the Pleistocene is notoriously difficult to unravel. In this study, we developed a fast infinitesimal time coalescent process (FitCoal) to circumvent this difficulty and calculated the composite likelihood for present-day human genomic sequences of 3154 individuals. Results showed that human ancestors went through a severe population bottleneck with about 1280 breeding individuals between around 930,000 and 813,000 years ago. The bottleneck lasted for about 117,000 years and brought human ancestors close to extinction. This bottleneck is congruent with a substantial chronological gap in the available African and Eurasian fossil record. Our results provide new insights into our ancestry and suggest a coincident speciation event.
Affiliation(s)
- Wangjie Hu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai, China
- Ziqian Hao
- College of Artificial Intelligence and Big Data for Medical Sciences, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Pengyuan Du
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- College of Artificial Intelligence and Big Data for Medical Sciences, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Giorgio Manzi
- Department of Environmental Biology, Sapienza University of Rome, Rome, Italy
- Jialong Cui
- Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai, China
- Yun-Xin Fu
- Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Key Laboratory for Conservation and Utilization of Bioresources, Yunnan University, Kunming, China
- Yi-Hsuan Pan
- Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai, China
- Haipeng Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
14
Labroo MR, Endelman JB, Gemenet DC, Werner CR, Gaynor RC, Covarrubias-Pazaran GE. Clonal diploid and autopolyploid breeding strategies to harness heterosis: insights from stochastic simulation. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:147. [PMID: 37291402 DOI: 10.1007/s00122-023-04377-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 05/05/2023] [Indexed: 06/10/2023]
Abstract
KEY MESSAGE Reciprocal recurrent selection sometimes increases genetic gain per unit cost in clonal diploids with heterosis due to dominance, but it typically does not benefit autopolyploids. Breeding can change the dominance as well as additive genetic value of populations, thus utilizing heterosis. A common hybrid breeding strategy is reciprocal recurrent selection (RRS), in which parents of hybrids are typically recycled within pools based on general combining ability. However, the relative performances of RRS and other breeding strategies have not been thoroughly compared. RRS can have relatively increased costs and longer cycle lengths, but these are sometimes outweighed by its ability to harness heterosis due to dominance. Here, we used stochastic simulation to compare genetic gain per unit cost of RRS, terminal crossing, recurrent selection on breeding value, and recurrent selection on cross performance considering different amounts of population heterosis due to dominance, relative cycle lengths, time horizons, estimation methods, selection intensities, and ploidy levels. In diploids with phenotypic selection at high intensity, whether RRS was the optimal breeding strategy depended on the initial population heterosis. However, in diploids with rapid-cycling genomic selection at high intensity, RRS was the optimal breeding strategy after 50 years over almost all amounts of initial population heterosis under the study assumptions. Diploid RRS required more population heterosis to outperform other strategies as its relative cycle length increased and as selection intensity and time horizon decreased. The optimal strategy depended on selection intensity, a proxy for inbreeding rate. Use of diploid fully inbred parents vs. outbred parents with RRS typically did not affect genetic gain. In autopolyploids, RRS typically did not outperform one-pool strategies regardless of the initial population heterosis.
Affiliation(s)
- Marlee R Labroo
- Excellence in Breeding Platform, Consultative Group of International Agricultural Research, Texcoco, Mexico
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Jeffrey B Endelman
- Department of Horticulture, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Dorcus C Gemenet
- Excellence in Breeding Platform, Consultative Group of International Agricultural Research, Texcoco, Mexico
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Christian R Werner
- Excellence in Breeding Platform, Consultative Group of International Agricultural Research, Texcoco, Mexico
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Giovanny E Covarrubias-Pazaran
- Excellence in Breeding Platform, Consultative Group of International Agricultural Research, Texcoco, Mexico.
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico.
15
Pocrnic I, Obšteter J, Gaynor RC, Wolc A, Gorjanc G. Assessment of long-term trends in genetic mean and variance after the introduction of genomic selection in layers: a simulation study. Front Genet 2023; 14:1168212. [PMID: 37234871 PMCID: PMC10206274 DOI: 10.3389/fgene.2023.1168212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 05/02/2023] [Indexed: 05/28/2023]
Abstract
Nucleus-based breeding programs are characterized by intense selection that results in high genetic gain, which inevitably means reduction of genetic variation in the breeding population. Therefore, genetic variation in such breeding systems is typically managed systematically, for example, by avoiding mating the closest relatives to limit progeny inbreeding. However, intense selection requires maximum effort to make such breeding programs sustainable in the long-term. The objective of this study was to use simulation to evaluate the long-term impact of genomic selection on genetic mean and variance in an intense layer chicken breeding program. We developed a large-scale stochastic simulation of an intense layer chicken breeding program to compare conventional truncation selection to genomic truncation selection optimized with either minimization of progeny inbreeding or full-scale optimal contribution selection. We compared the programs in terms of genetic mean, genic variance, conversion efficiency, rate of inbreeding, effective population size, and accuracy of selection. Our results confirmed that genomic truncation selection has immediate benefits compared to conventional truncation selection in all specified metrics. A simple minimization of progeny inbreeding after genomic truncation selection did not provide any significant improvements. Optimal contribution selection was successful in having better conversion efficiency and effective population size compared to genomic truncation selection, but it must be fine-tuned for balance between loss of genetic variance and genetic gain. In our simulation, we measured this balance using trigonometric penalty degrees between truncation selection and a balanced solution and concluded that the best results were between 45° and 65°. This balance is specific to the breeding program and depends on how much immediate genetic gain a breeding program may risk vs. save for the future. 
Furthermore, our results show that the persistence of accuracy is better with optimal contribution selection compared to truncation selection. In general, our results show that optimal contribution selection can ensure long-term success in intensive breeding programs using genomic selection.
Affiliation(s)
- Ivan Pocrnic
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom
- Jana Obšteter
- Agricultural Institute of Slovenia, Ljubljana, Slovenia
- R. Chris Gaynor
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom
- Anna Wolc
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Hy-Line International, Dallas Center, IA, United States
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom
16
Obšteter J, Strachan LK, Bubnič J, Prešern J, Gorjanc G. SIMplyBee: an R package to simulate honeybee populations and breeding programs. Genet Sel Evol 2023; 55:31. [PMID: 37161307 PMCID: PMC10169377 DOI: 10.1186/s12711-023-00798-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 03/31/2023] [Indexed: 05/11/2023]
Abstract
BACKGROUND The Western honeybee is an economically important species globally, but has been experiencing colony losses that lead to economical damage and decreased genetic variability. This situation is spurring additional interest in honeybee breeding and conservation programs. Stochastic simulators are essential tools for rapid and low-cost testing of breeding programs and methods, yet no existing simulator allows for a detailed simulation of honeybee populations. Here we describe SIMplyBee, a holistic simulator of honeybee populations and breeding programs. SIMplyBee is an R package and hence freely available for installation from CRAN http://cran.r-project.org/package=SIMplyBee . IMPLEMENTATION SIMplyBee builds upon the stochastic simulator AlphaSimR that simulates individuals with their corresponding genomes and quantitative genetic values. To enable honeybee-specific simulations, we extended AlphaSimR by developing classes for global simulation parameters, SimParamBee, for a honeybee colony, Colony, and multiple colonies, MultiColony. We also developed functions to address major honeybee specificities: honeybee genome, haplodiploid inheritance, social organisation, complementary sex determination, polyandry, colony events, and quantitative genetics at the individual- and colony-levels. RESULTS We describe its implementation for simulating a honeybee genome, creating a honeybee colony and its members, addressing haplodiploid inheritance and complementary sex determination, simulating colony events, creating and managing multiple colonies at the same time, and obtaining genomic data and honeybee quantitative genetics. Further documentation, available at http://www.SIMplyBee.info , provides details on these operations and describes additional operations related to genomics, quantitative genetics, and other functionalities. DISCUSSION SIMplyBee is a holistic simulator of honeybee populations and breeding programs. 
It simulates individual honeybees with their genomes, colonies with colony events, and individual- and colony-level genetic and breeding values. Regarding the latter, SIMplyBee takes a user-defined function to combine individual- into colony-level values and hence allows for modeling any type of interaction within a colony. SIMplyBee provides a research platform for testing breeding and conservation strategies and their effect on future genetic gain and genetic variability. Future developments of SIMplyBee will focus on improving the simulation of honeybee genomes, optimizing the simulator's performance, and including spatial awareness in mating functions and phenotype simulation. We invite the honeybee genetics and breeding community to join us in the future development of SIMplyBee.
Affiliation(s)
- Jana Obšteter
- Department of Animal Science, The Agricultural Institute of Slovenia, Ljubljana, Slovenia.
- Laura K Strachan
- The Roslin Institute and Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, UK
- Jernej Bubnič
- Department of Animal Science, The Agricultural Institute of Slovenia, Ljubljana, Slovenia
- Janez Prešern
- Department of Animal Science, The Agricultural Institute of Slovenia, Ljubljana, Slovenia
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, UK
- Biotechnical Faculty, Department of Animal Science, The University of Ljubljana, Ljubljana, Slovenia
17
Werner CR, Gaynor RC, Sargent DJ, Lillo A, Gorjanc G, Hickey JM. Genomic selection strategies for clonally propagated crops. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:74. [PMID: 36952013 PMCID: PMC10036424 DOI: 10.1007/s00122-023-04300-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/25/2022] [Accepted: 01/14/2023] [Indexed: 05/27/2023]
Abstract
KEY MESSAGE For genomic selection in clonally propagated crops with diploid (-like) meiotic behavior to be effective, crossing parents should be selected based on genomic predicted cross-performance unless dominance is negligible. For genomic selection (GS) in clonal breeding programs to be effective, parents should be selected based on genomic predicted cross-performance unless dominance is negligible. Genomic prediction of cross-performance enables efficient exploitation of the additive and dominance value simultaneously. Here, we compared different GS strategies for clonally propagated crops with diploid (-like) meiotic behavior, using strawberry as an example. We used stochastic simulation to evaluate six combinations of three breeding programs and two parent selection methods. The three breeding programs included (1) a breeding program that introduced GS in the first clonal stage, and (2) two variations of a two-part breeding program with one and three crossing cycles per year, respectively. The two parent selection methods were (1) parent selection based on genomic estimated breeding values (GEBVs) and (2) parent selection based on genomic predicted cross-performance (GPCP). Selection of parents based on GPCP produced faster genetic gain than selection of parents based on GEBVs because it reduced inbreeding when the dominance degree increased. The two-part breeding programs with one and three crossing cycles per year using GPCP always produced the most genetic gain unless dominance was negligible. We conclude that (1) in clonal breeding programs with GS, parents should be selected based on GPCP, and (2) a two-part breeding program with parent selection based on GPCP to rapidly drive population improvement has great potential to improve breeding clonally propagated crops.
Affiliation(s)
- Christian R Werner
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, Easter Bush Research Centre, University of Edinburgh, Midlothian, EH25 9RG, UK.
- R Chris Gaynor
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, Easter Bush Research Centre, University of Edinburgh, Midlothian, EH25 9RG, UK
- Daniel J Sargent
- NIAB EMR, New Road, East Malling, Kent, ME19 6BJ, UK
- East Malling Enterprise Centre, Driscoll's Genetics Ltd, New Road, East Malling, Kent, ME19 6BJ, UK
- Alessandra Lillo
- East Malling Enterprise Centre, Driscoll's Genetics Ltd, New Road, East Malling, Kent, ME19 6BJ, UK
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, Easter Bush Research Centre, University of Edinburgh, Midlothian, EH25 9RG, UK
- John M Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, Easter Bush Research Centre, University of Edinburgh, Midlothian, EH25 9RG, UK
18
Lubanga N, Massawe F, Mayes S, Gorjanc G, Bančič J. Genomic selection strategies to increase genetic gain in tea breeding programs. THE PLANT GENOME 2023; 16:e20282. [PMID: 36349831 DOI: 10.1002/tpg2.20282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2022] [Accepted: 10/01/2022] [Indexed: 05/10/2023]
Abstract
Tea [Camellia sinensis (L.) O. Kuntze] is mainly grown in low- to middle-income countries (LMIC) and is a global commodity. Breeding programs in these countries face the challenge of increasing genetic gain because the accuracy of selecting superior genotypes is low and resources are limited. Phenotypic selection (PS) is traditionally the primary method of developing improved tea varieties and can take over 16 yr. Genomic selection (GS) can be used to improve the efficiency of tea breeding by increasing selection accuracy and shortening the generation interval and breeding cycle. Our main objective was to investigate the potential of implementing GS in tea-breeding programs to speed up genetic progress despite the low cost of PS in LMIC. We used stochastic simulations to compare three GS-breeding programs with a Pedigree and PS program. The PS program mimicked a practical commercial tea-breeding program over a 40-yr breeding period. All the GS programs achieved at least 1.65 times higher genetic gains than the PS program and 1.4 times higher gains than the Seed-Ped program. Seed-GSc was the most cost-effective strategy of implementing GS in tea-breeding programs. It introduces GS at the seedling stage to increase selection accuracy early in the program and reduces the generation interval to 2 yr. The Seed-Ped program outperformed PS by 1.2 times and could be implemented where it is not possible to use GS. Our results indicate that GS could be used to improve genetic gain per unit time and cost even in cost-constrained tea-breeding programs.
Affiliation(s)
- Nelson Lubanga
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The Univ. of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- School of Biosciences, The Univ. of Nottingham Malaysia, Jalan Broga, Semenyih, Selangor Darul Ehsan, 43500, Malaysia
- Festo Massawe
- School of Biosciences, The Univ. of Nottingham Malaysia, Jalan Broga, Semenyih, Selangor Darul Ehsan, 43500, Malaysia
- Sean Mayes
- School of Biosciences, The Univ. of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire, LE12 5RD, UK
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The Univ. of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Jon Bančič
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The Univ. of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
19
Silva JM, Qi W, Pinho AJ, Pratas D. AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data. Gigascience 2022; 12:giad101. [PMID: 38091509 PMCID: PMC10716826 DOI: 10.1093/gigascience/giad101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/29/2023] [Accepted: 11/07/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND Low-complexity data analysis is the area that addresses the search and quantification of regions in sequences of elements that contain low-complexity or repetitive elements. For example, these can be tandem repeats, inverted repeats, homopolymer tails, GC-biased regions, similar genes, and hairpins, among many others. Identifying these regions is crucial because of their association with regulatory and structural characteristics. Moreover, their identification provides positional and quantity information where standard assembly methodologies face significant difficulties because of substantial higher depth coverage (mountains), ambiguous read mapping, or where sequencing or reconstruction defects may occur. However, the capability to distinguish low-complexity regions (LCRs) in genomic and proteomic sequences is a challenge that depends on the model's ability to find them automatically. Low-complexity patterns can be implicit through specific or combined sources, such as algorithmic or probabilistic, and recurring to different spatial distances-namely, local, medium, or distant associations. FINDINGS This article addresses the challenge of automatically modeling and distinguishing LCRs, providing a new method and tool (AlcoR) for efficient and accurate segmentation and visualization of these regions in genomic and proteomic sequences. The method enables the use of models with different memories, providing the ability to distinguish local from distant low-complexity patterns. The method is reference and alignment free, providing additional methodologies for testing, including a highly flexible simulation method for generating biological sequences (DNA or protein) with different complexity levels, sequence masking, and a visualization tool for automatic computation of the LCR maps into an ideogram style. We provide illustrative demonstrations using synthetic, nearly synthetic, and natural sequences showing the high efficiency and accuracy of AlcoR. 
As large-scale results, we use AlcoR to unprecedentedly provide a whole-chromosome low-complexity map of a recent complete human genome and the haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar. CONCLUSIONS The AlcoR method provides the ability of fast sequence characterization through data complexity analysis, ideally for scenarios entangling the presence of new or unknown sequences. AlcoR is implemented in C language using multithreading to increase the computational speed, is flexible for multiple applications, and does not contain external dependencies. The tool accepts any sequence in FASTA format. The source code is freely provided at https://github.com/cobilab/alcor.
Affiliation(s)
- Jorge M Silva
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
- Weihong Qi
- Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse, 190, 8057, Zurich, Switzerland
- SIB, Swiss Institute of Bioinformatics, 1202, Geneva, Switzerland
- Armando J Pinho
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
- Diogo Pratas
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
- Department of Virology, University of Helsinki, Haartmaninkatu, 3, 00014 Helsinki, Finland
20
DoVale JC, Carvalho HF, Sabadin F, Fritsche-Neto R. Genotyping marker density and prediction models effects in long-term breeding schemes of cross-pollinated crops. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:4523-4539. [PMID: 36261658 DOI: 10.1007/s00122-022-04236-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2022] [Accepted: 10/09/2022] [Indexed: 06/16/2023]
Abstract
KEY MESSAGE In genomic recurrent selection, the more markers, the better because they buffer the linkage disequilibrium losses caused by recombination over cycles, and consequently, provide higher responses to selection. Reductions of genotyping marker density have been extensively evaluated as potential strategies to reduce the genotyping costs of genomic selection (GS). Low-density marker panels are appealing in GS because they entail lower multicollinearity and computing time and allow more individuals to be genotyped for the same cost. However, statistical models used in GS are usually evaluated with empirical data, using "static" training sets and populations. This may be adequate for making predictions during a breeding program's initial cycles but not for the long-term. Moreover, studies that focus on long selective breeding cycles generally do not consider GS models with the effect of dominance, which is particularly important for breeding outcomes in cross-pollinated crops. Hence, dominance effects are important and unexplored in GS for long-term programs involving allogamous species. To address this gap, we employed two approaches: analysis of empirical maize datasets and simulations of long-term breeding applying phenotypic and genomic recurrent selection (intrapopulation and reciprocal schemes). In both schemes, we simulated twenty breeding cycles and assessed the effect of marker density reduction on the population mean, the best crosses, additive variance, selective accuracy, and response to selection with models [additive, additive-dominant, general (GCA), and this plus specific combining ability (GCA + SCA)]. Our results indicate that marker reduction based on linkage disequilibrium levels provides useful predictions only within a cycle, as accuracy significantly decreases over cycles. In the long-term, without training set updating, high-marker density provides the best responses to selection. 
The model to be used depends on the breeding scheme: additive for intrapopulation and additive-dominant or GCA + SCA for reciprocal.
Affiliation(s)
- Júlio César DoVale
- Department of Crop Science, Federal University of Ceará, Fortaleza, CE, Brazil.
- Felipe Sabadin
- Virginia Tech: Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
21
Estimating the genome-wide mutation rate from thousands of unrelated individuals. Am J Hum Genet 2022; 109:2178-2184. [PMID: 36370709 PMCID: PMC9748258 DOI: 10.1016/j.ajhg.2022.10.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 10/15/2022] [Indexed: 11/13/2022]
Abstract
We provide a method for estimating the genome-wide mutation rate from sequence data on unrelated individuals by using segments of identity by descent (IBD). The length of an IBD segment indicates the time to shared ancestor of the segment, and mutations that have occurred since the shared ancestor result in discordances between the two IBD haplotypes. Previous methods for IBD-based estimation of mutation rate have required the use of family data for accurate phasing of the genotypes. This has limited the scope of application of IBD-based mutation rate estimation. Here, we develop an IBD-based method for mutation rate estimation from population data, and we apply it to whole-genome sequence data on 4,166 European American individuals from the TOPMed Framingham Heart Study, 2,996 European American individuals from the TOPMed My Life, Our Future study, and 1,586 African American individuals from the TOPMed Hypertension Genetic Epidemiology Network study. Although mutation rates may differ between populations as a result of genetic factors, demographic factors such as average parental age, and environmental exposures, our results are consistent with equal genome-wide average mutation rates across these three populations. Our overall estimate of the average genome-wide mutation rate per 10^8 base pairs per generation for single-nucleotide variants is 1.24 (95% CI 1.18-1.33).
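The link between IBD segments and mutation rate can be illustrated with a deliberately simplified relation. This is a sketch under stated assumptions, not the estimator developed in the paper: it assumes the time to the shared ancestor $t$ (in generations) is known exactly, mutations accumulate independently on both lineages, and sequencing and phasing errors are ignored. The symbols $D$, $t$, $L$ and the example values of $t$ and $L$ are chosen here for illustration only.

```latex
% Expected discordances D between two haplotypes that are IBD over L base pairs,
% with a shared ancestor t generations ago and per-base mutation rate \mu:
\mathbb{E}[D] = 2\,\mu\,t\,L
\quad\Longrightarrow\quad
\hat{\mu} = \frac{D}{2\,t\,L}

% Illustrative values: \mu = 1.24 \times 10^{-8}, t = 50, L = 2 \times 10^{6}
\mathbb{E}[D] = 2 \times (1.24 \times 10^{-8}) \times 50 \times (2 \times 10^{6}) = 2.48
```

The factor of 2 reflects the two lineages descending from the shared ancestor, each spanning $t$ generations. Because a single segment is expected to carry only a few discordances under these values, pooling across many segments and individuals is what makes such an estimate precise.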
22
Guo Y, Betzen B, Salcedo A, He F, Bowden RL, Fellers JP, Jordan KW, Akhunova A, Rouse MN, Szabo LJ, Akhunov E. Population genomics of Puccinia graminis f.sp. tritici highlights the role of admixture in the origin of virulent wheat rust races. Nat Commun 2022; 13:6287. [PMID: 36271077 PMCID: PMC9587050 DOI: 10.1038/s41467-022-34050-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 10/12/2022] [Indexed: 12/25/2022]
Abstract
Puccinia graminis f.sp. tritici (Pgt) causes stem rust disease in wheat that can result in severe yield losses. The factors driving the evolution of its virulence and adaptation remain poorly characterized. We utilize long-read sequencing to develop a haplotype-resolved genome assembly of a U.S. isolate of Pgt. Using Pgt haplotypes as a reference, we characterize the structural variants (SVs) and single nucleotide polymorphisms in a diverse panel of isolates. SVs impact the repertoire of predicted effectors, secreted proteins involved in host-pathogen interaction, and show evidence of purifying selection. By analyzing global and local genomic ancestry we demonstrate that the origin of 8 out of 12 Pgt clades is linked with either somatic hybridization or sexual recombination between the diverged donor populations. Our study shows that SVs and admixture events appear to play an important role in broadening Pgt virulence and the origin of highly virulent races, creating a resource for studying the evolution of Pgt virulence and preventing future epidemic outbreaks.
Affiliation(s)
- Yuanwen Guo
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Bliss Betzen
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Present Address: USDA-APHIS-PPQ Field Operations, Kansas State University, Manhattan, KS, USA
- Andres Salcedo
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Present Address: Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, USA
- Fei He
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Present Address: State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Robert L. Bowden
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Manhattan, KS, USA
- John P. Fellers
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Manhattan, KS, USA
- Katherine W. Jordan
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- USDA-ARS, Hard Winter Wheat Genetics Research Unit, Manhattan, KS, USA
- Alina Akhunova
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Integrated Genomics Facility, Kansas State University, Manhattan, KS, USA
- Mathew N. Rouse
- Department of Plant Pathology, University of Minnesota & USDA-ARS, Cereal Disease Lab, St. Paul, MN, USA
- Les J. Szabo
- Department of Plant Pathology, University of Minnesota & USDA-ARS, Cereal Disease Lab, St. Paul, MN, USA
- Eduard Akhunov
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Wheat Genetics Resource Center, Kansas State University, Manhattan, KS, USA
Collapse
|
23
|
Sabadin F, DoVale JC, Platten JD, Fritsche-Neto R. Optimizing self-pollinated crop breeding employing genomic selection: From schemes to updating training sets. FRONTIERS IN PLANT SCIENCE 2022; 13:935885. [PMID: 36275547 PMCID: PMC9583387 DOI: 10.3389/fpls.2022.935885] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2022] [Accepted: 09/12/2022] [Indexed: 06/16/2023]
Abstract
Long-term breeding schemes using genomic selection (GS) can boost the response to selection per year. Although several studies have shown that GS delivers a higher response to selection, only a few analyze at which stage GS produces better results and how to update the training population to maintain prediction accuracy. We used stochastic simulation to compare five GS breeding schemes in a self-pollinated long-term breeding program. We also evaluated four strategies, using distinct methods and sizes, to update the training set (TS). Finally, regarding breeding schemes, we proposed a new approach using GS to select the best individuals in each F2 progeny, based on genomic estimated breeding values and genetic divergence, to cross them and generate a new recombination event. Our results showed that the best scenario was using GS in F2, followed by the phenotypic selection of new parents in F4. For TS updating, adding new data every cycle (over 768) to update the TS maintains the prediction accuracy at satisfactory levels for more breeding cycles. However, only the last three generations can be kept in the TS, optimizing the genetic relationship between TS and the targeted population and reducing the computing demand and risks. Hence, we believe that our results may help breeders optimize GS in their programs and improve genetic gain in long-term schemes.
Affiliation(s)
- Felipe Sabadin
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
- Julio César DoVale
- Department of Crop Science, Federal University of Ceará, Fortaleza, Ceará, Brazil
- Roberto Fritsche-Neto
- International Rice Research Institute (IRRI), Los Baños, Philippines
- H. Rouse Caffey Rice Research Station, Louisiana State University (LSU) AgCenter, Rayne, LA, United States
24
Wangkumhang P, Greenfield M, Hellenthal G. An efficient method to identify, date, and describe admixture events using haplotype information. Genome Res 2022; 32:1553-1564. [PMID: 35794007 PMCID: PMC9435750 DOI: 10.1101/gr.275994.121] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/23/2021] [Accepted: 06/28/2022] [Indexed: 11/24/2022]
Abstract
We present fastGLOBETROTTER, an efficient new haplotype-based technique to identify, date, and describe admixture events using genome-wide autosomal data. With simulations, we show how fastGLOBETROTTER reduces computation time by an order of magnitude relative to the related technique GLOBETROTTER without suffering loss of accuracy. We apply fastGLOBETROTTER to a cohort of more than 6000 Europeans from 10 countries, revealing previously unreported admixture signals. In particular, we infer multiple periods of admixture related to East Asian or Siberian-like sources, starting >2000 yr ago, in people living in countries north of the Baltic Sea. In contrast, we infer admixture related to West Asian, North African, and/or Southern European sources in populations south of the Baltic Sea, including admixture dated to ∼300-700 CE, overlapping the fall of the Roman Empire, in people from Belgium, France, and parts of Germany. Our new approach scales to analyzing hundreds to thousands of individuals from a putatively admixed population and, hence, is applicable to emerging large-scale cohorts of genetically homogeneous populations.
Affiliation(s)
- Pongsakorn Wangkumhang
- University College London Genetics Institute (UGI), Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, United Kingdom
- National Biobank of Thailand, National Science and Technology Development Agency, Pathum Thani 12120, Thailand
- Matthew Greenfield
- University College London Genetics Institute (UGI), Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, United Kingdom
- Garrett Hellenthal
- University College London Genetics Institute (UGI), Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, United Kingdom
25
Joint inference of ancestry and genotypes of parents from children. iScience 2022; 25:104768. [PMID: 35942102 PMCID: PMC9356179 DOI: 10.1016/j.isci.2022.104768] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/11/2022] [Revised: 05/18/2022] [Accepted: 07/11/2022] [Indexed: 12/02/2022]
Abstract
In this paper, we address a problem: can we perform ancestry inference for parents from one or more children’s DNA samples? That is, suppose the parents’ genomes consist of segments of different ancestry, and our goal is to infer parental ancestry and, at the same time, call parental genotypes from the children’s genetic data. Such ancestry inference may provide insights into recent ancestors from children’s genomes, and potentially has applications in understanding genetic traits. At present, there exists no method for this inference problem. We present parMix, a method based on a hidden Markov model (HMM) that can jointly infer parental ancestry and call parental genotypes from data of a small number of children. Simulation results show that parMix performs well in practice. It can provide reasonably accurate parental inference given data from a small number (say three) of children. parMix becomes more accurate when data from more children are used.
- Presented a method for inferring ancestry and genotypes of parents from children
- Recombination events can be detected using parMix
- parMix can deal with genotypes with phasing errors
- parMix can be used to infer admixture proportion of parents
26
Perera M, Montserrat DM, Barrabes M, Geleta M, Giro-I-Nieto X, Ioannidis AG. Generative Moment Matching Networks for Genotype Simulation. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2022; 2022:1379-1383. [PMID: 36086656 DOI: 10.1109/embc48229.2022.9871045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
The generation of synthetic genomic sequences using neural networks has potential to ameliorate privacy and data sharing concerns and to mitigate potential bias within datasets due to under-representation of some population groups. However, there is not a consensus on which architectures, training procedures, and evaluation metrics should be used when simulating single nucleotide polymorphism (SNP) sequences with neural networks. In this paper, we explore the use of Generative Moment Matching Networks (GMMNs) for SNP simulation, we present some architectural and procedural changes to properly train the networks, and we introduce an evaluation scheme to qualitatively and quantitatively assess the quality of the simulated sequences.
27
Baller JL, Kachman SD, Kuehn LA, Spangler ML. Using pooled data for genomic prediction in a bivariate framework with missing data. J Anim Breed Genet 2022; 139:489-501. [PMID: 35698863 PMCID: PMC9544112 DOI: 10.1111/jbg.12727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 05/21/2022] [Indexed: 11/29/2022]
Abstract
Pooling samples to derive group genotypes can enable the economically efficient use of commercial animals within genetic evaluations. To test a multivariate framework for genetic evaluations using pooled data, simulation was used to mimic a beef cattle population including two moderately heritable traits with varying genetic correlations, genotypes and pedigree data. There were 15 generations (n = 32,000; random selection and mating), and the last generation was subjected to genotyping through pooling. Missing records were induced in two ways: (a) sequential culling and (b) random missing records. Gaps in genotyping were also explored whereby genotyping occurred through generation 13 or 14. Pools of 1, 20, 50 and 100 animals were constructed randomly or by minimizing phenotypic variation. The EBV was estimated using a bivariate single-step genomic best linear unbiased prediction model. Pools of 20 animals constructed by minimizing phenotypic variation generally led to accuracies that were not different than using individual progeny data. Gaps in genotyping led to significantly different EBV accuracies (p < .05) for sires and dams born in the generation nearest the pools. Pooling of any size generally led to larger accuracies than no information from generation 15 regardless of the way missing records arose, the percentage of records available or the genetic correlation. Pooling to aid in the use of commercial data in genetic evaluations can be utilized in multivariate cases with varying relationships between the traits and in the presence of systematic and randomly missing phenotypes.
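As an illustration of the two pool-construction strategies named in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes that "minimizing phenotypic variation" can be approximated by sorting animals on phenotype and cutting the sorted list into consecutive pools, and that a pool genotype can be summarized as the mean allele dosage per marker. All function names and the simulated data are hypothetical.

```python
import random
from statistics import mean, pvariance


def pools_by_phenotype(phenotypes, pool_size):
    """Assumed strategy: sort animals by phenotype, then cut into consecutive pools."""
    order = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    return [order[k:k + pool_size] for k in range(0, len(order), pool_size)]


def random_pools(n_animals, pool_size, rng):
    """Assign animals to pools at random."""
    order = list(range(n_animals))
    rng.shuffle(order)
    return [order[k:k + pool_size] for k in range(0, len(order), pool_size)]


def pooled_dosage(genotypes, pool):
    """Mean allele dosage per marker across the animals in one pool."""
    n_markers = len(genotypes[0])
    return [mean(genotypes[i][m] for i in pool) for m in range(n_markers)]


def mean_within_pool_variance(phenotypes, pools):
    """Average phenotypic variance inside pools (lower means more homogeneous pools)."""
    return mean(pvariance([phenotypes[i] for i in pool]) for pool in pools)


# Hypothetical toy data: 100 animals, 5 markers coded 0/1/2.
rng = random.Random(1)
phenotypes = [rng.gauss(0.0, 1.0) for _ in range(100)]
genotypes = [[rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(100)]

sorted_pools = pools_by_phenotype(phenotypes, 20)
shuffled_pools = random_pools(len(phenotypes), 20, rng)
first_pool_genotype = pooled_dosage(genotypes, sorted_pools[0])
```

Comparing `mean_within_pool_variance` for the two pool sets shows how much more homogeneous the phenotype-sorted pools are than random ones.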
Affiliation(s)
- Johnna L Baller
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Stephen D Kachman
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Larry A Kuehn
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, Nebraska, USA
- Matthew L Spangler
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
28
Chintalapati M, Patterson N, Moorjani P. The spatiotemporal patterns of major human admixture events during the European Holocene. eLife 2022; 11:77625. [PMID: 35635751 PMCID: PMC9293011 DOI: 10.7554/elife.77625] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/05/2022] [Accepted: 05/29/2022] [Indexed: 11/16/2022]
Abstract
Recent studies have shown that admixture has been pervasive throughout human history. While several methods exist for dating admixture in contemporary populations, they are not suitable for sparse, low coverage ancient genomic data. Thus, we developed DATES (Distribution of Ancestry Tracts of Evolutionary Signals) that leverages ancestry covariance patterns across the genome of a single individual to infer the timing of admixture. DATES provides reliable estimates under various demographic scenarios and outperforms available methods for ancient DNA applications. Using DATES on ~1100 ancient genomes from sixteen regions in Europe and west Asia, we reconstruct the chronology of the formation of the ancestral populations and the fine-scale details of the spread of Neolithic farming and Steppe pastoralist-related ancestry across Europe. By studying the genetic formation of Anatolian farmers, we infer that gene flow related to Iranian Neolithic farmers occurred before 9600 BCE, predating the advent of agriculture in Anatolia. Contrary to the archaeological evidence, we estimate that early Steppe pastoralist groups (Yamnaya and Afanasievo) were genetically formed more than a millennium before the start of Steppe pastoralism. Our analyses provide new insights on the origins and spread of farming and Indo-European languages, highlighting the power of genomic dating methods to elucidate the legacy of human migrations.
Affiliation(s)
- Manjusha Chintalapati
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
- Nick Patterson
- Program in Medical and Population Genetics, Broad Institute, Cambridge, United States
- Priya Moorjani
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
29
See GM, Fix JS, Schwab CR, Spangler ML. Imputation of non-genotyped F1 dams to improve genetic gain in swine crossbreeding programs. J Anim Sci 2022; 100:6572187. [PMID: 35451025 PMCID: PMC9126202 DOI: 10.1093/jas/skac148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Accepted: 04/20/2022] [Indexed: 11/12/2022]
Abstract
This study investigated using imputed genotypes from non-genotyped animals which were not in the pedigree for the purpose of genetic selection and improving genetic gain for economically relevant traits. Simulations were used to mimic a 3-breed crossbreeding system that resembled a modern swine breeding scheme. The simulation consisted of three purebred (PB) breeds A, B, and C each with 25 and 425 mating males and females, respectively. Males from A and females from B were crossed to produce AB females (n = 1,000), which were crossed with males from C to produce crossbreds (CB; n = 10,000). The genome consisted of three chromosomes with 300 quantitative trait loci and ~9,000 markers. Lowly heritable reproductive traits were simulated for A, B, and AB (h2 = 0.2, 0.2, and 0.15, respectively), whereas a moderately heritable carcass trait was simulated for C (h2 = 0.4). Genetic correlations between reproductive traits in A, B, and AB were moderate (rg = 0.65). The goal trait of the breeding program was AB performance. Selection was practiced for four generations where AB and CB animals were first produced in generations 1 and 2, respectively. Non-genotyped AB dams were imputed using FImpute beginning in generation 2. Genotypes of PB and CB were used for imputation. Imputation strategies differed by three factors: 1) AB progeny genotyped per generation (2, 3, 4, or 6), 2) known or unknown mates of AB dams, and 3) genotyping rate of females from breeds A and B (0% or 100%). PB selection candidates from A and B were selected using estimated breeding values for AB performance, whereas candidates from C were selected by phenotype. Response to selection using imputed genotypes of non-genotyped animals was then compared to the scenarios where true AB genotypes (trueGeno) or no AB genotypes/phenotypes (noGeno) were used in genetic evaluations. The simulation was replicated 20 times. 
The average increase in genotype concordance between unknown and known sire imputation strategies was 0.22. Genotype concordance increased as the number of genotyped CB increased with little additional gain beyond 9 progeny. When mates of AB were known and more than 4 progeny were genotyped per generation, the phenotypic response in AB did not differ (P > 0.05) from trueGeno yet was greater (P < 0.05) than noGeno. Imputed genotypes of non-genotyped animals can be used to increase performance when 4 or more progeny are genotyped and sire pedigrees of CB animals are known.
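The abstract reports results in terms of genotype concordance without defining it. A common definition, assumed here, is the fraction of marker calls at which the imputed genotype equals the true genotype. The sketch below illustrates that definition on made-up data; the function name and the example vectors are hypothetical and are not taken from the study.

```python
from typing import Sequence


def genotype_concordance(true_calls: Sequence[int], imputed_calls: Sequence[int]) -> float:
    """Fraction of loci where the imputed genotype (0/1/2) equals the true genotype."""
    if len(true_calls) != len(imputed_calls):
        raise ValueError("genotype vectors must have the same length")
    if not true_calls:
        raise ValueError("at least one locus is required")
    matches = sum(t == i for t, i in zip(true_calls, imputed_calls))
    return matches / len(true_calls)


# Hypothetical example: 8 markers, 2 of which were imputed incorrectly.
true_geno = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_geno = [0, 1, 2, 2, 0, 2, 0, 1]
concordance = genotype_concordance(true_geno, imputed_geno)  # 6 of 8 calls agree
```

Under this definition, an increase in concordance of 0.22 would mean that an additional 22% of marker calls match the true genotype.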
Affiliation(s)
- Garrett M See
- Department of Animal Science, University of Nebraska - Lincoln, Lincoln, NE 68588, USA
- Matthew L Spangler
- Department of Animal Science, University of Nebraska - Lincoln, Lincoln, NE 68588, USA
30
Zhao R, Pei S, Yau SST. New Genome Sequence Detection via Natural Vector Convex Hull Method. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1782-1793. [PMID: 33237867 DOI: 10.1109/tcbb.2020.3040706] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
It remains challenging to find existing but undiscovered genome sequence mutations or to predict potential genome sequence mutations based on real sequence data. Motivated by this, we develop approaches to detect new, undiscovered genome sequences. Because discovering new genome sequences through biological experiments is resource-intensive, we want to achieve the new genome sequence detection task mathematically. However, little literature tells us how to detect new, undiscovered genome sequence mutations mathematically. We form a new framework based on the natural vector convex hull method that conducts alignment-free sequence analysis. Our two newly developed approaches, Random-permutation Algorithm with Penalty (RAP) and Random-permutation Algorithm with Penalty and COnstrained Search (RAPCOS), use the geometric properties captured by natural vectors. In our experiment, we discover a mathematically new human immunodeficiency virus (HIV) genome sequence using some real HIV genome sequences. Significantly, the proposed methods are applicable to the new genome sequence detection challenge and have many good properties, such as robustness, rapid convergence, and fast computation.
31
Baumdicker F, Bisschop G, Goldstein D, Gower G, Ragsdale AP, Tsambos G, Zhu S, Eldon B, Ellerman EC, Galloway JG, Gladstein AL, Gorjanc G, Guo B, Jeffery B, Kretzschumar WW, Lohse K, Matschiner M, Nelson D, Pope NS, Quinto-Cortés CD, Rodrigues MF, Saunack K, Sellinger T, Thornton K, van Kemenade H, Wohns AW, Wong Y, Gravel S, Kern AD, Koskela J, Ralph PL, Kelleher J. Efficient ancestry and mutation simulation with msprime 1.0. Genetics 2022; 220:iyab229. [PMID: 34897427 PMCID: PMC9176297 DOI: 10.1093/genetics/iyab229] [Citation(s) in RCA: 104] [Impact Index Per Article: 52.0] [Received: 09/22/2021] [Accepted: 12/03/2021] [Indexed: 11/13/2022]
Abstract
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime's many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
Affiliation(s)
- Franz Baumdicker
- Cluster of Excellence “Controlling Microbes to Fight Infections”, Mathematical and Computational Population Genetics, University of Tübingen, 72076 Tübingen, Germany
- Gertjan Bisschop
- Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
- Daniel Goldstein
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Graham Gower
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
- Aaron P Ragsdale
- Department of Integrative Biology, University of Wisconsin–Madison, Madison, WI 53706, USA
- Georgia Tsambos
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia
- Sha Zhu
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Bjarki Eldon
- Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin 10115, Germany
- Jared G Galloway
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
- Ariella L Gladstein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7264, USA
- Embark Veterinary, Inc., Boston, MA 02111, USA
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh EH25 9RG, UK
- Bing Guo
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Ben Jeffery
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Warren W Kretzschumar
- Center for Hematology and Regenerative Medicine, Karolinska Institute, 141 83 Huddinge, Sweden
- Konrad Lohse
- Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
- Dominic Nelson
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Nathaniel S Pope
- Department of Entomology, Pennsylvania State University, State College, PA 16802, USA
- Consuelo D Quinto-Cortés
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico
- Murillo F Rodrigues
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Thibaut Sellinger
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, 85354 Freising, Germany
- Kevin Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
- Anthony W Wohns
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Yan Wong
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Simon Gravel
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Andrew D Kern
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Jere Koskela
- Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Peter L Ralph
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Department of Mathematics, University of Oregon, Eugene, OR 97403-5289, USA
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
32
Omer EA, Hinrichs D, Addo S, Roessler R. Development of a breeding program for improving the milk yield performance of Butana cattle under smallholder production conditions using a stochastic simulation approach. J Dairy Sci 2022; 105:5261-5270. [DOI: 10.3168/jds.2021-21307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Accepted: 01/20/2022] [Indexed: 11/19/2022]
33
Batista LG, Mello VH, Souza AP, Margarido GRA. Genomic prediction with allele dosage information in highly polyploid species. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:723-739. [PMID: 34800132 DOI: 10.1007/s00122-021-03994-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/24/2021] [Accepted: 11/06/2021] [Indexed: 06/13/2023]
Abstract
Including allele dosage can improve genomic selection in highly polyploid species under higher frequency of different heterozygous genotypic classes and high dominance degree levels. Several studies have shown how to leverage allele dosage information to improve the accuracy of genomic selection models in autotetraploids. In this study, we expanded the methodology used for genomic selection in autotetraploids to higher (and mixed) ploidy levels. We adapted the models to build covariance matrices of both additive and digenic dominance effects that are subsequently used in genomic selection models. We applied these models using estimates of ploidy and allele dosage to sugarcane and sweet potato datasets and validated our results by also applying the models to simulated data. For the simulated datasets, including allele dosage information led to up to 140% higher mean predictive abilities in comparison to using diploidized markers. Including dominance effects was highly advantageous when using diploidized markers, leading to mean predictive abilities which were up to 115% higher in comparison to only including additive effects. When the frequency of heterozygous genotypes in the population was low, such as in the sugarcane and sweet potato datasets, there was little advantage in including allele dosage information in the models. Overall, we show that including allele dosage can improve genomic selection in highly polyploid species under higher frequency of different heterozygous genotypic classes and high dominance degree levels.
Affiliation(s)
- Lorena G Batista
- "Luiz de Queiroz" College of Agriculture, University of São Paulo, Piracicaba, SP, 13418-900, Brazil
- Victor H Mello
- "Luiz de Queiroz" College of Agriculture, University of São Paulo, Piracicaba, SP, 13418-900, Brazil
- Anete P Souza
- Center of Molecular Biology and Genetic Engineering, University of Campinas, Campinas, SP, 13083-970, Brazil
- Gabriel R A Margarido
- "Luiz de Queiroz" College of Agriculture, University of São Paulo, Piracicaba, SP, 13418-900, Brazil.
34
Covarrubias-Pazaran G, Gebeyehu Z, Gemenet D, Werner C, Labroo M, Sirak S, Coaldrake P, Rabbi I, Kayondo SI, Parkes E, Kanju E, Mbanjo EGN, Agbona A, Kulakow P, Quinn M, Debaene J. Breeding Schemes: What Are They, How to Formalize Them, and How to Improve Them? FRONTIERS IN PLANT SCIENCE 2022; 12:791859. [PMID: 35126417 PMCID: PMC8813775 DOI: 10.3389/fpls.2021.791859] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/09/2021] [Accepted: 12/10/2021] [Indexed: 05/26/2023]
Abstract
Formalized breeding schemes are a key component of breeding program design and a gateway to conducting plant breeding as a quantitative process. Unfortunately, breeding schemes are rarely defined, expressed in a quantifiable format, or stored in a database. Furthermore, the continuous review and improvement of breeding schemes is not routinely conducted in many breeding programs. Given the rapid development of novel breeding methodologies, it is important to adopt a philosophy of continuous improvement regarding breeding scheme design. Here, we discuss terms and definitions that are relevant to formalizing breeding pipelines, market segments and breeding schemes, and we present a software tool, Breeding Pipeline Manager, that can be used to formalize and continuously improve breeding schemes. In addition, we detail the use of continuous improvement methods and tools such as genetic simulation through a case study in the International Institute of Tropical Agriculture (IITA) Cassava east-Africa pipeline. We successfully deploy these tools and methods to optimize the program size as well as allocation of resources to the number of parents used, number of crosses made, and number of progeny produced. We propose a structured approach to improve breeding schemes which will help to sustain the rates of response to selection and help to deliver better products to farmers and consumers.
Collapse
Affiliation(s)
- Giovanny Covarrubias-Pazaran
  - Excellence in Breeding Platform, Consultative Group on International Agricultural Research, Texcoco, Mexico
  - Independent Researcher, Addis Ababa, Ethiopia
- Dorcus Gemenet
  - Excellence in Breeding Platform, Consultative Group on International Agricultural Research, Texcoco, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Christian Werner
  - Excellence in Breeding Platform, Consultative Group on International Agricultural Research, Texcoco, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Marlee Labroo
  - Excellence in Breeding Platform, Consultative Group on International Agricultural Research, Texcoco, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Solomon Sirak
  - Excellence in Breeding Platform, Consultative Group on International Agricultural Research, Texcoco, Mexico
- Peter Coaldrake
  - Excellence in Breeding Platform, Consultative Group on International Agricultural Research, Texcoco, Mexico
- Ismail Rabbi
  - International Institute for Tropical Agriculture (IITA), Ibadan, Nigeria
- Elizabeth Parkes
  - International Institute for Tropical Agriculture (IITA), Ibadan, Nigeria
- Edward Kanju
  - International Institute for Tropical Agriculture (IITA), Ibadan, Nigeria
- Afolabi Agbona
  - International Institute for Tropical Agriculture (IITA), Ibadan, Nigeria
- Peter Kulakow
  - International Institute for Tropical Agriculture (IITA), Ibadan, Nigeria
- Michael Quinn
  - Excellence in Breeding Platform, Consultative Group on International Agricultural Research, Texcoco, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Jan Debaene
  - Excellence in Breeding Platform, Consultative Group on International Agricultural Research, Texcoco, Mexico
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
35
Legarra A, Garcia-Baccino CA, Wientjes YCJ, Vitezica ZG. The correlation of substitution effects across populations and generations in the presence of nonadditive functional gene action. Genetics 2021; 219:iyab138. [PMID: 34718531 PMCID: PMC8664574 DOI: 10.1093/genetics/iyab138] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/30/2020] [Accepted: 08/19/2021] [Indexed: 11/14/2022] Open
Abstract
Allele substitution effects at quantitative trait loci (QTL) are part of the basis of quantitative genetics theory and applications such as association analysis and genomic prediction. In the presence of nonadditive functional gene action, substitution effects are not constant across populations. We develop an original approach to model the difference in substitution effects across populations as a first order Taylor series expansion from a "focal" population. This expansion involves the difference in allele frequencies and second-order statistical effects (additive by additive and dominance). The change in allele frequencies is a function of relationships (or genetic distances) across populations. As a result, it is possible to estimate the correlation of substitution effects across two populations using three elements: magnitudes of additive, dominance, and additive by additive variances; relationships (Nei's minimum distances or Fst indexes); and assumed heterozygosities. Similarly, the theory applies as well to distinct generations in a population, in which case the distance across generations is a function of increase of inbreeding. Simulation results confirmed our derivations. Slight biases were observed, depending on the nonadditive mechanism and the reference allele. Our derivations are useful to understand and forecast the possibility of prediction across populations and the similarity of GWAS effects.
Affiliation(s)
- Andres Legarra
  - INRAE/INP, UMR 1388 GenPhySE, Castanet-Tolosan 31326, France
- Carolina A. Garcia-Baccino
  - INRAE/INP, UMR 1388 GenPhySE, Castanet-Tolosan 31326, France
  - Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires C1417DSQ, Argentina
  - SAS NUCLEUS, Le Rheu 35650, France
- Yvonne C. J. Wientjes
  - Wageningen University & Research, Animal Breeding and Genomics, Wageningen 6700 AH, the Netherlands
36
Lu CW, Yao CT, Hung CM. Domestication obscures genomic estimates of population history. Mol Ecol 2021; 31:752-766. [PMID: 34779057 DOI: 10.1111/mec.16277] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/19/2021] [Revised: 11/05/2021] [Accepted: 11/08/2021] [Indexed: 11/28/2022]
Abstract
Domesticated species are valuable models to examine phenotypic evolution, and knowledge on domestication history is critical for understanding the trajectories of evolutionary changes. Sequentially Markov Coalescent models are often used to infer domestication history. However, domestication practices may obscure the signal left by population history, affecting demographic inference. Here we assembled the genomes of a recently domesticated species, the society finch, and its parent species, the white-rumped munia, to examine its domestication history. We applied genomic analyses to two society finch breeds and white-rumped munias to test whether domestication of the former resulted from inbreeding or hybridization. The society finch showed longer and more runs of homozygosity and lower genomic heterozygosity than the white-rumped munia, supporting an inbreeding origin in the former. Blocks of white-rumped munia and other ancestry in society finch genomes showed similar genetic distance between the two taxa, inconsistent with the hybridization origin hypothesis. We then applied two Sequentially Markov Coalescent models, psmc and smc++, to infer the demographic histories of both. Surprisingly, the two models did not reveal a recent population bottleneck, but instead the psmc model showed a specious, dramatic population increase in the society finch. Subsequently, we used simulated genomes based on an array of demographic scenarios to demonstrate that recent inbreeding, not hybridization, caused the distorted psmc population trajectory. Such analyses could have misled our understanding of the domestication process. Our findings stress caution when interpreting the histories of recently domesticated species inferred by psmc, arguing that these histories require multiple analyses to validate.
Affiliation(s)
- Chia-Wei Lu
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Cheng-Te Yao
  - Division of Zoology, Endemic Species Research Institute, Nantou, Taiwan
- Chih-Ming Hung
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
37
Powell O, Mrode R, Gaynor RC, Johnsson M, Gorjanc G, Hickey JM. Genomic evaluations using data recorded on smallholder dairy farms in low- to middle-income countries. JDS COMMUNICATIONS 2021; 2:366-370. [PMID: 36337118 PMCID: PMC9623656 DOI: 10.3168/jdsc.2021-0092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/07/2021] [Accepted: 07/14/2021] [Indexed: 12/02/2022]
Abstract
Genomic evaluations outperformed pedigree-based genetic evaluations. Shared haplotypes captured "hidden" genetic relationships to strengthen connectedness in genomic evaluations. Genomic evaluations were possible using LMIC smallholder records from herds with ≤4 cows. Modelling herd as a random effect produced EBVs with the highest accuracies.
Breeding has increased genetic gain for dairy cattle in advanced economies but has had limited success in improving dairy cattle in low- to middle-income countries (LMIC). Genetic evaluations are a central component of delivering genetic gain, because they separate the genetic and environmental effects of animals' phenotypes. Genetic evaluations have been successful in advanced economies because of large data sets and strong genetic connectedness, provided by the widespread use of artificial insemination (AI) and accurate recording of pedigree information. In smallholder dairy production systems of many LMICs, the limited use of AI and small herd sizes result in a data structure with insufficient genetic connectedness between herds to facilitate genetic evaluations based on pedigree. Genomic information keeps track of shared haplotypes rather than shared relatives captured by pedigree records. Therefore, genomic information could capture "hidden" genetic relationships that are not captured by pedigree information, strengthening genetic connectedness in LMIC smallholder dairy data sets. This study's objective was to use simulation to quantify the power of genomic information to enable genetic evaluation using LMIC smallholder dairy data sets. The results from this study show that (1) genetic evaluations using genomic information were more accurate than those using pedigree information in populations with a high effective population size and weak genetic connectedness; and (2) genetic evaluations modeling herd as a random effect had accuracy higher than or equal to those modeling herd as a fixed effect. This demonstrates the potential of genomic information to be an enabling technology in LMIC smallholder dairy production systems by facilitating genetic evaluations with in situ records collected from herds of ≤4 cows.
The establishment of routine genomic evaluations could allow the development of LMIC breeding programs comprising an informal set of nucleus animals distributed across many small herds within the target environment. These nucleus animals could be used for genetic evaluation, and the best animals could be disseminated to participating smallholder dairy farms. Together, this could increase the productivity, profitability, and sustainability of LMIC smallholder dairy production systems.
Affiliation(s)
- Owen Powell
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, United Kingdom
  - Corresponding author
- Raphael Mrode
  - Scotland's Rural College (SRUC), Easter Bush, Midlothian, EH25 9RG, United Kingdom
  - International Livestock Research Institute (ILRI), Nairobi 00100, Kenya
- R. Chris Gaynor
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, United Kingdom
- Martin Johnsson
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, United Kingdom
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Box 7023, 750 07, Uppsala, Sweden
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, United Kingdom
- John M. Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, United Kingdom
38
Rios EF, Andrade MHML, Resende MFR, Kirst M, de Resende MDV, de Almeida Filho JE, Gezan SA, Munoz P. Genomic prediction in family bulks using different traits and cross-validations in pine. G3-GENES GENOMES GENETICS 2021; 11:6321952. [PMID: 34544139 PMCID: PMC8496210 DOI: 10.1093/g3journal/jkab249] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/09/2021] [Accepted: 07/02/2021] [Indexed: 11/13/2022]
Abstract
Genomic prediction integrates statistical, genomic, and computational tools to improve the estimation of breeding values and increase genetic gain. Due to the broad diversity in mating systems, breeding schemes, propagation methods, and unit of selection, no universal genomic prediction approach can be applied in all crops. In a genome-wide family prediction (GWFP) approach, the family is the basic unit of selection. We tested GWFP in two loblolly pine (Pinus taeda L.) datasets: a breeding population composed of 63 full-sib families (5–20 individuals per family), and a simulated population with the same pedigree structure. In both populations, phenotypic and genomic data were pooled at the family level in silico. Marker effects were estimated to compute genomic estimated breeding values (GEBV) at the individual and family (GWFP) levels. Fewer than six individuals per family produced inaccurate estimates of family phenotypic performance and allele frequency. Tested across different scenarios, GWFP predictive ability was higher than that for GEBV in both populations. Validation sets composed of families with similar phenotypic mean and variance as the training population yielded predictions consistently higher and more accurate than other validation sets. Results revealed potential for applying GWFP in breeding programs whose selection unit is the family, and for systems where families can serve as training sets. The GWFP approach is well suited for crops that are routinely genotyped and phenotyped at the plot-level, but it can be extended to other breeding programs. Higher predictive ability obtained with GWFP would motivate the application of genomic prediction in these situations.
Affiliation(s)
- Esteban F Rios
  - Agronomy Department, University of Florida, Gainesville, FL 32611, USA
- Marcio F R Resende
  - Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Matias Kirst
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611, USA
- Marcos D V de Resende
  - EMBRAPA Café/Department of Statistics, Federal University of Viçosa, Avenida PH Rolfs S/N, Viçosa 36570-000, Brazil
- Patricio Munoz
  - Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
39
da Silva ÉDB, Xavier A, Faria MV. Impact of Genomic Prediction Model, Selection Intensity, and Breeding Strategy on the Long-Term Genetic Gain and Genetic Erosion in Soybean Breeding. Front Genet 2021; 12:637133. [PMID: 34539725 PMCID: PMC8440908 DOI: 10.3389/fgene.2021.637133] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/02/2020] [Accepted: 08/05/2021] [Indexed: 11/21/2022] Open
Abstract
Genomic-assisted breeding has become an important tool in soybean breeding. However, the impact of different genomic selection (GS) approaches on short- and long-term gains is not well understood. Such gains are conditional on the breeding design and may vary with a combination of the prediction model, family size, selection strategies, and selection intensity. To address these open questions, we evaluated various scenarios through a simulated closed soybean breeding program over 200 breeding cycles. Genomic prediction was performed using genomic best linear unbiased prediction (GBLUP), Bayesian methods, and random forest, benchmarked against selection on phenotypic values, true breeding values (TBV), and random selection. Breeding strategies included selections within family (WF), across family (AF), and within pre-selected families (WPSF), with selection intensities of 2.5, 5.0, 7.5, and 10.0%. Selections were performed at the F4 generation, where individuals were phenotyped and genotyped with a 6K single nucleotide polymorphism (SNP) array. Initial genetic parameters for the simulation were estimated from the SoyNAM population. WF selections provided the most significant long-term genetic gains. GBLUP and Bayesian methods outperformed random forest and provided most of the genetic gains within the first 100 generations, being outperformed by phenotypic selection after generation 100. All methods provided similar performances under WPSF selections. A faster decay in genetic variance was observed when individuals were selected AF and WPSF, as 80% of the genetic variance was depleted within 28-58 cycles, whereas WF selections preserved the variance up to cycle 184. Surprisingly, the selection intensity had less impact on long-term gains than did the breeding strategies. The study supports that genetic gains can be optimized in the long term with specific combinations of prediction models, family size, selection strategies, and selection intensity. 
A combination of strategies may be necessary for balancing the short-, medium-, and long-term genetic gains in breeding programs while preserving the genetic variance.
Affiliation(s)
- Alencar Xavier
  - Department of Biostatistics, Corteva Agriscience, Johnston, IA, United States
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Marcos Ventura Faria
  - Department of Agronomy, Universidade Estadual do Centro-Oeste, Guarapuava, Brazil
40
Evidence for opposing selective forces operating on human-specific duplicated TCAF genes in Neanderthals and humans. Nat Commun 2021; 12:5118. [PMID: 34433829 PMCID: PMC8387397 DOI: 10.1038/s41467-021-25435-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/14/2020] [Accepted: 08/04/2021] [Indexed: 11/30/2022] Open
Abstract
TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated given the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations. Conversely, in all archaic-hominin samples the fixation for a specific H4 haplotype without duplication is likely due to positive selection. Here, our results of TCAF copy number expansion, selection signals in hominins, and differential TCAF2 expression between haplogroups and high TCAF2 and TRPM8 expression in liver and prostate in modern-day humans imply TCAF diversification among hominins potentially in response to cold or dietary adaptations. Duplications of gene segments can allow novel physiological adaptations to evolve. A detailed analysis of the TCAF gene family in primates and archaic humans suggests that rapid duplication and diversification in this gene family is associated with cold or dietary adaptations.
41
Vargas Jurado N, Kuehn LA, Keele JW, Lewis RM. Accuracy of GEBV of sires based on pooled allele frequency of their progeny. G3-GENES GENOMES GENETICS 2021; 11:6321233. [PMID: 34510188 DOI: 10.1093/g3journal/jkab231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Accepted: 06/17/2021] [Indexed: 11/12/2022]
Abstract
Despite decreasing genotyping costs, in some cases individually genotyping animals is not economically feasible (e.g., in small ruminants). An alternative is to pool DNA, using the pooled allele frequency (PAF) to garner information on performance. Still, the use of PAF for prediction (estimation of genomic breeding values; GEBVs) has been limited. Two potential sources of error on accuracy of GEBV of sires, obtained from PAF of their progeny themselves lacking pedigree information, were tested: (i) pool construction error (unequal contribution of DNA from animals in pools), and (ii) technical error (variability when reading the array). Pooling design (random, extremes, K-means), pool size (5, 10, 25, 50, and 100 individuals), and selection scenario (random, phenotypic) also were considered. These factors were tested by simulating a sheep population. Accuracy of GEBV (the correlation between true and estimated values) was not substantially affected by pool construction or technical error, or selection scenario. A significant interaction, however, between pool size and design was found. Still, regardless of design, mean accuracy was higher for pools of 10 or fewer individuals. Mean accuracy of GEBV was 0.174 (SE 0.001) for random pooling, and 0.704 (SE 0.004) and 0.696 (SE 0.004) for extreme and K-means pooling, respectively. Non-random pooling resulted in moderate accuracy of GEBV. Overall, pooled genotypes can be used in conjunction with individual genotypes of sires for moderately accurate predictions of their genetic merit with little effect of pool construction or technical error.
Affiliation(s)
- Larry A Kuehn
  - Genetics, Breeding, and Animal Health Research Unit, U.S. Meat Animal Research Center, USDA-ARS, Clay Center, NE 68933, USA
- John W Keele
  - Genetics, Breeding, and Animal Health Research Unit, U.S. Meat Animal Research Center, USDA-ARS, Clay Center, NE 68933, USA
- Ronald M Lewis
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
42
Hasan AR, Ness RW. Recombination Rate Variation and Infrequent Sex Influence Genetic Diversity in Chlamydomonas reinhardtii. Genome Biol Evol 2021; 12:370-380. [PMID: 32181819 PMCID: PMC7186780 DOI: 10.1093/gbe/evaa057] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Accepted: 03/13/2020] [Indexed: 12/12/2022] Open
Abstract
Recombination confers a major evolutionary advantage by breaking up linkage disequilibrium between harmful and beneficial mutations, thereby facilitating selection. However, in species that are only periodically sexual, such as many microbial eukaryotes, the realized rate of recombination is also affected by the frequency of sex, meaning that infrequent sex can increase the effects of selection at linked sites despite high recombination rates. Despite this, the rate of sex of most facultatively sexual species is unknown. Here, we use genomewide patterns of linkage disequilibrium to infer fine-scale recombination rate variation in the genome of the facultatively sexual green alga Chlamydomonas reinhardtii. We observe recombination rate variation of up to two orders of magnitude and find evidence of recombination hotspots across the genome. Recombination rate is highest flanking genes, consistent with trends observed in other nonmammalian organisms, though intergenic recombination rates vary by intergenic tract length. We also find a positive relationship between nucleotide diversity and physical recombination rate, suggesting a widespread influence of selection at linked sites in the genome. Finally, we use estimates of the effective rate of recombination to calculate the rate of sex that occurs in natural populations, estimating a sexual cycle roughly every 840 generations. We argue that the relatively infrequent rate of sex and large effective population size creates a population genetic environment that increases the influence of selection on linked sites across the genome.
Affiliation(s)
- Ahmed R Hasan
  - Department of Cell and Systems Biology, University of Toronto, Ontario, Canada
  - Department of Biology, University of Toronto Mississauga, Ontario, Canada
- Rob W Ness
  - Department of Cell and Systems Biology, University of Toronto, Ontario, Canada
  - Department of Biology, University of Toronto Mississauga, Ontario, Canada
43
Gaynor RC, Gorjanc G, Hickey JM. AlphaSimR: an R package for breeding program simulations. G3-GENES GENOMES GENETICS 2021; 11:6025179. [PMID: 33704430 PMCID: PMC8022926 DOI: 10.1093/g3journal/jkaa017] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 08/10/2020] [Accepted: 11/05/2020] [Indexed: 01/03/2023]
Abstract
This paper introduces AlphaSimR, an R package for stochastic simulations of plant and animal breeding programs. AlphaSimR is a highly flexible software package able to simulate a wide range of plant and animal breeding programs for diploid and autopolyploid species. AlphaSimR is ideal for testing the overall strategy and detailed design of breeding programs. AlphaSimR utilizes a scripting approach to building simulations that is particularly well suited for modeling highly complex breeding programs, such as commercial breeding programs. The primary benefit of this scripting approach is that it frees users from preset breeding program designs and allows them to model nearly any breeding program design. This paper lists the main features of AlphaSimR and provides a brief example simulation to show how to use the software.
Affiliation(s)
- R Chris Gaynor
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian EH25 9RG, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian EH25 9RG, UK
- John M Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian EH25 9RG, UK
44
Rowan TN, Durbin HJ, Seabury CM, Schnabel RD, Decker JE. Powerful detection of polygenic selection and evidence of environmental adaptation in US beef cattle. PLoS Genet 2021; 17:e1009652. [PMID: 34292938 PMCID: PMC8297814 DOI: 10.1371/journal.pgen.1009652] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/22/2020] [Accepted: 06/09/2021] [Indexed: 12/19/2022] Open
Abstract
Selection on complex traits can rapidly drive evolution, especially in stressful environments. This polygenic selection does not leave intense sweep signatures on the genome, rather many loci experience small allele frequency shifts, resulting in large cumulative phenotypic changes. Directional selection and local adaptation are changing populations; but, identifying loci underlying polygenic or environmental selection has been difficult. We use genomic data on tens of thousands of cattle from three populations, distributed over time and landscapes, in linear mixed models with novel dependent variables to map signatures of selection on complex traits and local adaptation. We identify 207 genomic loci associated with an animal's birth date, representing ongoing selection for monogenic and polygenic traits. Additionally, hundreds of additional loci are associated with continuous and discrete environments, providing evidence for historical local adaptation. These candidate loci highlight the nervous system's possible role in local adaptation. While advanced technologies have increased the rate of directional selection in cattle, it has likely been at the expense of historically generated local adaptation, which is especially problematic in changing climates. When applied to large, diverse cattle datasets, these selection mapping methods provide an insight into how selection on complex traits continually shapes the genome. Further, understanding the genomic loci implicated in adaptation may help us breed more adapted and efficient cattle, and begin to understand the basis for mammalian adaptation, especially in changing climates. These selection mapping approaches help clarify selective forces and loci in evolutionary, model, and agricultural contexts.
Affiliation(s)
- Troy N. Rowan
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
  - Genetics Area Program, University of Missouri, Columbia, Missouri, United States of America
  - Department of Animal Science, University of Tennessee, Knoxville, Tennessee, United States of America
  - College of Veterinary Medicine, Large Animal Clinical Science, University of Tennessee, Knoxville, Tennessee, United States of America
- Harly J. Durbin
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
  - Genetics Area Program, University of Missouri, Columbia, Missouri, United States of America
- Christopher M. Seabury
  - Department of Veterinary Pathobiology, Texas A&M University, College Station, Texas, United States of America
- Robert D. Schnabel
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
  - Genetics Area Program, University of Missouri, Columbia, Missouri, United States of America
  - Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, United States of America
- Jared E. Decker
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
  - Genetics Area Program, University of Missouri, Columbia, Missouri, United States of America
  - Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, United States of America
45
Gonen S, Wimmer V, Gaynor RC, Byrne E, Gorjanc G, Hickey JM. Phasing and imputation of single nucleotide polymorphism data of missing parents of biparental plant populations. CROP SCIENCE 2021; 61:2243-2253. [PMID: 34413534 PMCID: PMC8362159 DOI: 10.1002/csc2.20409] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/02/2020] [Accepted: 11/07/2020] [Indexed: 06/13/2023]
Abstract
This paper presents an extension to a heuristic method for phasing and imputation of genotypes of descendants in biparental populations so that it can phase and impute genotypes of parents that are ungenotyped or partially genotyped. The imputed genotypes of the parent are used to impute low-density (Ld) genotyped descendants to high density (Hd). The extension was implemented as part of the AlphaPlantImpute software and works in three steps. First, it identifies whether a parent has no or Ld genotypes and identifies its relatives that have Hd genotypes. Second, using the Hd genotypes of relatives, it determines whether the parent is homozygous or heterozygous for a given locus. Third, it phases heterozygous positions of the parent by matching haplotypes to its relatives. We measured the accuracy (correlation between true and imputed genotypes) of imputing parent genotypes in simulated biparental populations from different scenarios. We tested the imputation accuracy of the missing parent's descendants using the true genotype of the parent and compared this with using the imputed genotypes of the parent. Across all scenarios, the imputation accuracy of a parent was >0.98 and did not drop below ∼0.96. The imputation accuracy of a parent was always higher when it was inbred than outbred. Including ancestors of the parent at Hd, increasing the number of crosses and the number of Hd descendants increased the imputation accuracy. The high imputation accuracy achieved for the parent translated to little or no impact on the imputation accuracy of its descendants.
Affiliation(s)
- Serap Gonen
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian, EH25 9RG, UK
- R. Chris Gaynor
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian, EH25 9RG, UK
- Ed Byrne
  - KWS-UK Ltd, 56 Church Street, Thriplow, Hertfordshire, SG8 7RE, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian, EH25 9RG, UK
- John M. Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian, EH25 9RG, UK
46
Johnsson M, Whalen A, Ros-Freixedes R, Gorjanc G, Chen CY, Herring WO, de Koning DJ, Hickey JM. Genetic variation in recombination rate in the pig. Genet Sel Evol 2021; 53:54. [PMID: 34171988 PMCID: PMC8235837 DOI: 10.1186/s12711-021-00643-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/25/2020] [Accepted: 06/02/2021] [Indexed: 11/10/2022] Open
Abstract
Background: Meiotic recombination results in the exchange of genetic material between homologous chromosomes. Recombination rate varies between different parts of the genome, between individuals, and is influenced by genetics. In this paper, we assessed the genetic variation in recombination rate along the genome and between individuals in the pig using multilocus iterative peeling on 150,000 individuals across nine genotyped pedigrees. We used these data to estimate the heritability of recombination and perform a genome-wide association study of recombination in the pig. Results: Our results confirmed known features of the recombination landscape of the pig genome, including differences in genetic length of chromosomes and marked sex differences. The recombination landscape was repeatable between lines, but at the same time, there were differences in average autosome-wide recombination rate between lines. The heritability of autosome-wide recombination rate was low but not zero (on average 0.07 for females and 0.05 for males). We found six genomic regions that are associated with recombination rate, among which five harbour known candidate genes involved in recombination: RNF212, SHOC1, SYCP2, MSH4 and HFM1. Conclusions: Our results on the variation in recombination rate in the pig genome agree with those reported for other vertebrates, with a low but nonzero heritability, and the identification of a major quantitative trait locus for recombination rate that is homologous to that detected in several other species. This work also highlights the utility of using large-scale livestock data to understand biological processes. Supplementary Information: The online version contains supplementary material available at 10.1186/s12711-021-00643-0.
Affiliation(s)
- Martin Johnsson
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK; Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, P.O. Box 7023, 750 07, Uppsala, Sweden
- Andrew Whalen
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK
- Roger Ros-Freixedes
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK; Departament de Ciència Animal, Universitat de Lleida-Agrotecnio-CERCA Center, Lleida, Spain
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK
- Ching-Yi Chen
- Pig Improvement Company, Genus plc, 100 Bluegrass Commons Blvd., Ste2200, Hendersonville, TN, 37075, USA
- William O Herring
- Pig Improvement Company, Genus plc, 100 Bluegrass Commons Blvd., Ste2200, Hendersonville, TN, 37075, USA
- Dirk-Jan de Koning
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, P.O. Box 7023, 750 07, Uppsala, Sweden
- John M Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK
|
47
|
Korgaonkar A, Han C, Lemire AL, Siwanowicz I, Bennouna D, Kopec RE, Andolfatto P, Shigenobu S, Stern DL. A novel family of secreted insect proteins linked to plant gall development. Curr Biol 2021; 31:1836-1849.e12. [PMID: 33657407 PMCID: PMC8119383 DOI: 10.1016/j.cub.2021.01.104] [Received: 11/10/2020] [Revised: 12/23/2020] [Accepted: 01/28/2021] [Indexed: 12/17/2022]
Abstract
In an elaborate form of inter-species exploitation, many insects hijack plant development to induce novel plant organs called galls that provide the insect with a source of nutrition and a temporary home. Galls result from dramatic reprogramming of plant cell biology driven by insect molecules, but the roles of specific insect molecules in gall development have not yet been determined. Here, we study the aphid Hormaphis cornu, which makes distinctive "cone" galls on leaves of witch hazel Hamamelis virginiana. We found that derived genetic variants in the aphid gene determinant of gall color (dgc) are associated with strong downregulation of dgc transcription in aphid salivary glands, upregulation in galls of seven genes involved in anthocyanin synthesis, and deposition of two red anthocyanins in galls. We hypothesize that aphids inject DGC protein into galls and that this results in differential expression of a small number of plant genes. dgc is a member of a large, diverse family of novel predicted secreted proteins characterized by a pair of widely spaced cysteine-tyrosine-cysteine (CYC) residues, which we named BICYCLE proteins. bicycle genes are most strongly expressed in the salivary glands specifically of galling aphid generations, suggesting that they may regulate many aspects of gall development. bicycle genes have experienced unusually frequent diversifying selection, consistent with their potential role controlling gall development in a molecular arms race between aphids and their host plants.
Affiliation(s)
- Aishwarya Korgaonkar
- Janelia Research Campus of the Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
- Clair Han
- Janelia Research Campus of the Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
- Andrew L Lemire
- Janelia Research Campus of the Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
- Igor Siwanowicz
- Janelia Research Campus of the Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
- Djawed Bennouna
- Human Nutrition Program, Department of Human Sciences, The Ohio State University, 262G Campbell Hall, 1787 Neil Avenue, Columbus, OH 43210, USA
- Rachel E Kopec
- Human Nutrition Program, Department of Human Sciences, The Ohio State University, 262G Campbell Hall, 1787 Neil Avenue, Columbus, OH 43210, USA; Ohio State University's Foods for Health Discovery Theme, The Ohio State University, 262G Campbell Hall, 1787 Neil Avenue, Columbus, OH 43210, USA
- Peter Andolfatto
- Department of Biology, Columbia University, 600 Fairchild Center, New York, NY 10027, USA
- Shuji Shigenobu
- Laboratory of Evolutionary Genomics, Center for the Development of New Model Organism, National Institute for Basic Biology, Okazaki 444-8585, Japan; NIBB Research Core Facilities, National Institute for Basic Biology, Okazaki 444-8585, Japan; Department of Basic Biology, School of Life Science, SOKENDAI (The Graduate University for Advanced Studies), 38 Nishigonaka, Myodaiji, Okazaki 444-8585, Japan
- David L Stern
- Janelia Research Campus of the Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
|
48
|
Korgaonkar A, Han C, Lemire AL, Siwanowicz I, Bennouna D, Kopec RE, Andolfatto P, Shigenobu S, Stern DL. A novel family of secreted insect proteins linked to plant gall development. Curr Biol 2021. [PMID: 33974861 DOI: 10.1101/2020.10.28.359562] [Indexed: 05/13/2023]
Abstract
In an elaborate form of inter-species exploitation, many insects hijack plant development to induce novel plant organs called galls that provide the insect with a source of nutrition and a temporary home. Galls result from dramatic reprogramming of plant cell biology driven by insect molecules, but the roles of specific insect molecules in gall development have not yet been determined. Here we study the aphid Hormaphis cornu, which makes distinctive “cone” galls on leaves of witch hazel Hamamelis virginiana. We found that derived genetic variants in the aphid gene determinant of gall color (dgc) are associated with strong downregulation of dgc transcription in aphid salivary glands, upregulation in galls of seven genes involved in anthocyanin synthesis, and deposition of two red anthocyanins in galls. We hypothesize that aphids inject DGC protein into galls, and that this results in differential expression of a small number of plant genes. Dgc is a member of a large, diverse family of novel predicted secreted proteins characterized by a pair of widely spaced cysteine-tyrosine-cysteine (CYC) residues, which we named BICYCLE proteins. Bicycle genes are most strongly expressed in the salivary glands specifically of galling aphid generations, suggesting that they may regulate many aspects of gall development. Bicycle genes have experienced unusually frequent diversifying selection, consistent with their potential role controlling gall development in a molecular arms race between aphids and their host plants. One Sentence Summary: Aphid bicycle genes, which encode diverse secreted proteins, contribute to plant gall development.
|
49
|
Long-term comparison between index selection and optimal independent culling in plant breeding programs with genomic prediction. PLoS One 2021; 16:e0235554. [PMID: 33970915 PMCID: PMC8109766 DOI: 10.1371/journal.pone.0235554] [Received: 02/17/2020] [Accepted: 01/20/2021] [Indexed: 11/19/2022]
Abstract
In the context of genomic selection, we evaluated and compared breeding programs using either index selection or independent culling for recurrent selection of parents. We simulated a clonally propagated crop breeding program for 20 cycles using either independent culling or an economic index with two unfavourably correlated traits under selection. Cycle time from crossing to selection of parents was kept the same for both strategies. Both methods led to increasingly unfavourable genetic correlations between traits and, compared to independent culling, index selection led to larger changes in the genetic correlation between the two traits. When linkage disequilibrium was not considered, the two methods had similar losses of genetic diversity. Two independent culling approaches were evaluated, one using optimal culling levels and one using the same selection intensity for both traits. Optimal culling levels outperformed the same selection intensity even when traits had the same economic importance. Therefore, accurately estimating optimal culling levels is essential for maximizing gains when independent culling is performed. Once optimal culling levels are achieved, independent culling and index selection lead to comparable genetic gains.
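The two selection strategies compared in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation: the function names, the equal index weights, the culling thresholds of 0.5, and the independent, normally distributed trait values are all assumptions chosen for illustration (the study itself used two unfavourably correlated traits and optimised culling levels).

```python
import random


def index_selection(candidates, weights, n_select):
    """Rank candidates by a weighted sum of trait values; keep the top n_select."""
    ranked = sorted(
        candidates,
        key=lambda traits: sum(w * t for w, t in zip(weights, traits)),
        reverse=True,
    )
    return ranked[:n_select]


def independent_culling(candidates, thresholds):
    """Keep only candidates that meet or exceed the threshold on every trait."""
    return [
        traits
        for traits in candidates
        if all(t >= th for t, th in zip(traits, thresholds))
    ]


# Hypothetical population: 200 candidates, two traits drawn independently
# from a standard normal (an assumption; the study used correlated traits).
rng = random.Random(0)
candidates = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]

# Illustrative settings only: equal economic weights, arbitrary culling levels.
by_index = index_selection(candidates, weights=(1.0, 1.0), n_select=20)
by_culling = independent_culling(candidates, thresholds=(0.5, 0.5))
```

In this sketch, index selection can keep a candidate that is weak on one trait if it is strong enough on the other, whereas independent culling rejects any candidate that falls below the level set for a single trait.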
|
50
|
Svedberg J, Shchur V, Reinman S, Nielsen R, Corbett-Detig R. Inferring Adaptive Introgression Using Hidden Markov Models. Mol Biol Evol 2021; 38:2152-2165. [PMID: 33502512 PMCID: PMC8097282 DOI: 10.1093/molbev/msab014] [Indexed: 12/29/2022]
Abstract
Adaptive introgression, the flow of adaptive genetic variation between species or populations, has attracted significant interest in recent years and has been implicated in a number of cases of adaptation, from pesticide resistance and immunity to local adaptation. Despite this, methods for identification of adaptive introgression from population genomic data are lacking. Here, we present Ancestry_HMM-S, a hidden Markov model-based method for identifying genes undergoing adaptive introgression and quantifying the strength of selection acting on them. Through extensive validation, we show that this method performs well on moderately sized data sets for realistic population and selection parameters. We apply Ancestry_HMM-S to a data set of an admixed Drosophila melanogaster population from South Africa and we identify 17 loci which show signatures of adaptive introgression, four of which have previously been shown to confer resistance to insecticides. Ancestry_HMM-S provides a powerful method for inferring adaptive introgression in data sets that are typically collected when studying admixed populations. This method will enable powerful insights into the genetic consequences of admixture across diverse populations. Ancestry_HMM-S can be downloaded from https://github.com/jesvedberg/Ancestry_HMM-S/.
Affiliation(s)
- Jesper Svedberg
- Department of Biomolecular Engineering, Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Vladimir Shchur
- National Research University Higher School of Economics, Moscow, Russian Federation
- Solomon Reinman
- Department of Biomolecular Engineering, Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Rasmus Nielsen
- National Research University Higher School of Economics, Moscow, Russian Federation
- Department of Integrative Biology and Department of Statistics, UC Berkeley, Berkeley, CA, USA
- Center for GeoGenetics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Russell Corbett-Detig
- Department of Biomolecular Engineering, Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- National Research University Higher School of Economics, Moscow, Russian Federation
|