1. Weber SE, Chawla HS, Ehrig L, Hickey LT, Frisch M, Snowdon RJ. Accurate prediction of quantitative traits with failed SNP calls in canola and maize. Frontiers in Plant Science 2023; 14:1221750. [PMID: 37936929 PMCID: PMC10627008 DOI: 10.3389/fpls.2023.1221750] [Received: 05/12/2023] [Accepted: 10/05/2023] [Indexed: 11/09/2023]
Abstract
In modern plant breeding, genomic selection is becoming the gold standard for selecting superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection candidates. For this purpose, SNP arrays with moderate to high marker density represent a robust and cost-effective tool to generate reproducible, easy-to-handle, high-throughput genotype data from large-scale breeding populations. However, SNP arrays are prone to technical errors that lead to failed allele calls. To overcome this problem, failed calls are often imputed, based on the assumption that failed SNP calls are purely technical. However, this ignores biological causes of failed calls (for example, deletions), and there is increasing evidence that gene presence-absence and other kinds of genome structural variants can play a role in phenotypic expression. Because deletions are frequently not in linkage disequilibrium with their flanking SNPs, imputation of missing SNP calls can potentially obscure valuable marker-trait associations. In this study, we analyze published datasets for canola and maize using four parametric and two machine learning models and demonstrate that failed allele calls in genomic prediction are highly predictive for important agronomic traits. We present two statistical pipelines, based on population structure and linkage disequilibrium, that enable the filtering of failed SNP calls that are likely caused by biological reasons. For the population and trait examined, prediction accuracy based on these filtered failed allele calls was competitive with standard SNP-based prediction, underlining the potential value of missing data in genomic prediction approaches. The combination of SNPs with all failed allele calls or the filtered allele calls did not outperform SNP-only prediction, owing to redundancy in genomic relationship estimates.
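As a rough illustration of the idea that a failed call may carry information not tagged by its flanking SNPs, the sketch below codes failed calls as a 0/1 indicator and measures its squared correlation (r²) with a flanking SNP. This is not the authors' pipeline: the function names, the use of `None` for a failed call, and the toy genotype vectors are assumptions made for illustration only.

```python
def failed_call_indicator(calls):
    """Return 1 where a genotype call failed (None) and 0 where it succeeded."""
    return [1 if c is None else 0 for c in calls]


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length numeric vectors.

    Returns 0.0 when either vector is constant, since the correlation
    is undefined in that case.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: one SNP with three failed calls across eight lines,
# and a fully called flanking SNP coded as 0/2 (homozygous inbred lines).
target_calls = [0, None, 2, None, 0, 2, None, 0]
flanking_snp = [0, 2, 2, 2, 0, 2, 2, 0]

indicator = failed_call_indicator(target_calls)
ld = r_squared(indicator, flanking_snp)
print(f"r^2 between failed-call indicator and flanking SNP: {ld:.2f}")
```

A low r² would suggest that the failed-call pattern is poorly tagged by the flanking SNP, so imputing it from that SNP could discard signal. Any cut-off used to separate "low" from "high" would be a further assumption.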
Affiliation(s)
- Sven E. Weber: Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lennard Ehrig: Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lee T. Hickey: Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
- Matthias Frisch: Department of Biometry and Population Genetics, Justus Liebig University, Giessen, Germany
- Rod J. Snowdon: Department of Plant Breeding, Justus Liebig University, Giessen, Germany
2. Della Coletta R, Fernandes SB, Monnahan PJ, Mikel MA, Bohn MO, Lipka AE, Hirsch CN. Importance of genetic architecture in marker selection decisions for genomic prediction. Theoretical and Applied Genetics 2023; 136:220. [PMID: 37819415 DOI: 10.1007/s00122-023-04469-w] [Received: 02/28/2023] [Accepted: 09/25/2023] [Indexed: 10/13/2023]
Abstract
KEY MESSAGE: We demonstrate potential for improved multi-environment genomic prediction accuracy using structural variant markers. However, the degree of observed improvement is highly dependent on the genetic architecture of the trait. Breeders commonly use genetic markers to predict the performance of untested individuals as a way to improve the efficiency of breeding programs. These genomic prediction models have almost exclusively used single nucleotide polymorphisms (SNPs) as their source of genetic information, even though other types of markers exist, such as structural variants (SVs). Given that SVs are associated with environmental adaptation and not all of them are in linkage disequilibrium (LD) with SNPs, SVs have the potential to bring additional information to multi-environment prediction models that is not captured by SNPs alone. Here, we evaluated the effect of different marker types (SNPs and/or SVs) on prediction accuracy across a range of genetic architectures for simulated traits across multiple environments. Our results show that SVs can improve prediction accuracy, but the improvement is highly dependent on the genetic architecture of the trait and the relative gain in accuracy is minimal. When SVs are the only causative variant type, SV predictors outperform SNP predictors 70% of the time. However, the improvement in accuracy in these instances is only 1.5% on average. Further simulations with predictors in varying degrees of LD with causative variants of different types (e.g., SNPs, SVs, SNPs and SVs) showed that prediction accuracy increased as LD between causative variants and predictors increased, regardless of the marker type. This study demonstrates that, when deciding which markers to use for large-scale genomic prediction modeling in a breeding program, knowing the genetic architecture of the trait is more important than the type of marker used.
Affiliation(s)
- Rafael Della Coletta: Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Samuel B Fernandes: Department of Crop, Soil and Environmental Sciences at University of Arkansas, Fayetteville, AR, 72701, USA
- Patrick J Monnahan: Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Mark A Mikel: Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Martin O Bohn: Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Alexander E Lipka: Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Candice N Hirsch: Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
3. Nguyen TV, Vander Jagt CJ, Wang J, Daetwyler HD, Xiang R, Goddard ME, Nguyen LT, Ross EM, Hayes BJ, Chamberlain AJ, MacLeod IM. In it for the long run: perspectives on exploiting long-read sequencing in livestock for population scale studies of structural variants. Genet Sel Evol 2023; 55:9. [PMID: 36721111 PMCID: PMC9887926 DOI: 10.1186/s12711-023-00783-5] [Received: 09/27/2022] [Accepted: 01/23/2023] [Indexed: 02/02/2023]
Abstract
Studies have demonstrated that structural variants (SV) play a substantial role in the evolution of species and have an impact on Mendelian traits in the genome. However, unlike small variants (< 50 bp), it has been challenging to accurately identify and genotype SV at the population scale using short-read sequencing. Long-read sequencing technologies are becoming competitively priced and can address several of the disadvantages of short-read sequencing for the discovery and genotyping of SV. In livestock species, analysis of SV at the population scale still faces challenges due to the lack of resources, high costs, technological barriers, and computational limitations. In this review, we summarize recent progress in the characterization of SV in the major livestock species, the obstacles that still need to be overcome, as well as the future directions in this growing field. It seems timely that research communities pool resources to build global population-scale long-read sequencing consortiums for the major livestock species for which the application of genomic tools has become cost-effective.
Affiliation(s)
- Tuan V. Nguyen: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Christy J. Vander Jagt: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Jianghui Wang: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
- Hans D. Daetwyler: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia
- Ruidong Xiang: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia; Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052 Australia
- Michael E. Goddard: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia; Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052 Australia
- Loan T. Nguyen: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia
- Elizabeth M. Ross: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia
- Ben J. Hayes: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072 Australia
- Amanda J. Chamberlain: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083 Australia
- Iona M. MacLeod: Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083 Australia
4. Blaj I, Tetens J, Bennewitz J, Thaller G, Falker-Gieske C. Structural variants and tandem repeats in the founder individuals of four F2 pig crosses and implications to F2 GWAS results. BMC Genomics 2022; 23:631. [PMID: 36057580 PMCID: PMC9440560 DOI: 10.1186/s12864-022-08716-0] [Received: 02/01/2022] [Accepted: 06/23/2022] [Indexed: 12/03/2022]
Abstract
BACKGROUND: Structural variants and tandem repeats are relevant sources of genomic variation that are not routinely analyzed in genome-wide association studies, mainly because they are difficult to identify and genotype. Here, we profiled these variants via state-of-the-art strategies in the founder animals of four F2 pig crosses using whole-genome sequence data (20x coverage). The variants were compared at the founder level with the commonly screened SNPs and small indels. At the F2 level, we carried out an association study using imputed structural variants and tandem repeats with four growth and carcass traits, followed by a comparison with a previously conducted association study based on SNPs and small indels. RESULTS: A total of 13,201 high-confidence structural variants and 103,730 polymorphic tandem repeats (with a repeat length of 2-20 bp) were profiled in the founders. We observed a moderate to high (r from 0.48 to 0.57) level of co-localization between SNPs or small indels and structural variants or tandem repeats. In the association step, 56.56% of the significant variants were not in high LD with significantly associated SNPs and small indels identified for the same traits in the earlier study and thus would presumably not be tagged in a standard association study. For the four growth and carcass traits investigated, many of the candidate genes already proposed in our previous studies were confirmed and additional ones were identified. Interestingly, a common pattern emerged in how structural variants or tandem repeats regulate the phenotypic traits. Many of the significant variants were embedded in or near long non-coding RNAs, drawing attention to their functional importance. The specific mechanisms through which the identified long non-coding RNAs and their associated structural variants or tandem repeats contribute to quantitative trait variation will need further investigation. CONCLUSIONS: The current study provides insights into the characteristics of structural variants and tandem repeats and their role in association studies. A systematic incorporation of these variants into genome-wide association studies is advised. While not of immediate interest for genomic prediction purposes, this will be particularly beneficial for elucidating the biological mechanisms driving complex trait variation.
Affiliation(s)
- Iulia Blaj: Institute of Animal Breeding and Husbandry, Kiel University, Kiel, Germany
- Jens Tetens: Department of Animal Sciences, Georg-August-University, Göttingen, Germany; Center for Integrated Breeding Research, Georg-August-University, Göttingen, Germany
- Jörn Bennewitz: Institute of Animal Husbandry and Breeding, University of Hohenheim, Stuttgart, Germany
- Georg Thaller: Institute of Animal Breeding and Husbandry, Kiel University, Kiel, Germany
5. Ruigrok M, Xue B, Catanach A, Zhang M, Jesson L, Davy M, Wellenreuther M. The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus. Genes (Basel) 2022; 13:genes13071129. [PMID: 35885912 PMCID: PMC9320665 DOI: 10.3390/genes13071129] [Received: 04/25/2022] [Revised: 06/08/2022] [Accepted: 06/20/2022] [Indexed: 02/04/2023]
Abstract
Background: Genetic diversity provides the basic substrate for evolution. Genetic variation consists of changes ranging from single base pairs (single-nucleotide polymorphisms, or SNPs) to larger-scale structural variants, such as inversions, deletions, and duplications. SNPs have long been used as the general currency for investigations into how genetic diversity fuels evolution. However, structural variants can affect more base pairs in the genome than SNPs and can be responsible for adaptive phenotypes due to their impact on linkage and recombination. In this study, we investigate the first steps needed to explore the genetic basis of an economically important growth trait in the marine teleost finfish Chrysophrys auratus using both SNP and structural variant data. Specifically, we use feature selection methods in machine learning to explore the relative predictive power of both types of genetic variants in explaining growth and discuss the feature selection results of the evaluated methods. Methods: SNP and structural variant callers were used to generate catalogues of variant data from 32 individual fish at ages 1 and 3 years. Three feature selection algorithms (ReliefF, Chi-square, and a mutual-information-based method) were used to reduce the dataset by selecting the most informative features. Following this selection process, the subset of variants was used as features to classify fish into small, medium, or large size categories using KNN, naïve Bayes, random forest, and logistic regression. The top-scoring features in each feature selection method were subsequently mapped to annotated genomic regions in the zebrafish genome, and a permutation test was conducted to see if the number of mapped regions was greater than when random sampling was applied. Results: Without feature selection, the prediction accuracies ranged from 0 to 0.5 for both structural variants and SNPs. Following feature selection, the prediction accuracy increased only slightly to between 0 and 0.65 for structural variants and between 0 and 0.75 for SNPs. The highest prediction accuracy for the logistic regression was achieved for age 3 fish using SNPs, although generally predictions for age 1 and 3 fish were very similar (ranging from 0 to 0.65 for both SNPs and structural variants). The Chi-square feature selection of SNP data was the only method that had a significantly higher number of matches to annotated genomic regions of zebrafish than would be explained by chance alone. Conclusions: Predicting a complex polygenic trait such as growth using data collected from a low number of individuals remains challenging. While we demonstrate that both SNPs and structural variants provide important information to help understand the genetic basis of phenotypic traits such as fish growth, the full complexities that exist within a genome cannot be easily captured by classical machine learning techniques. When using high-dimensional data, feature selection shows some increase in the prediction accuracy of classification models and provides the potential to identify unknown genomic correlates with growth. Our results show that both SNPs and structural variants significantly impact growth, and we therefore recommend that researchers interested in the genotype–phenotype map should strive to go beyond SNPs and incorporate structural variants in their studies as well. We discuss how our machine learning models can be further expanded to serve as a test bed to inform evolutionary studies and the applied management of species.
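To make the Chi-square feature selection step mentioned in the Methods more concrete, here is a minimal sketch that scores each variant by the Pearson chi-square statistic of its genotype-by-size-class contingency table and keeps the top-scoring variants. It is an illustration under assumptions, not the study's code: the statistic shown is the textbook contingency-table form (the software used by the authors may compute it differently), and the variant names, genotype codes, and labels are hypothetical.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic for a categorical feature against class labels."""
    n = len(feature)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Expected cell count under independence of feature and label.
            expected = f_count * l_count / n
            stat += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return stat


def top_k_features(features, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(values, labels) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: two variants genotyped in four fish.
size_class = ["small", "small", "large", "large"]
variants = {
    "variant_a": [0, 0, 1, 1],  # tracks size class perfectly
    "variant_b": [0, 1, 0, 1],  # independent of size class
}

print(top_k_features(variants, size_class, k=1))
```

The selected subset would then be passed to a classifier such as KNN or logistic regression. With as few individuals as in the study, scores of this kind are noisy, which is consistent with the modest accuracies reported.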
Affiliation(s)
- Mike Ruigrok: The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand
- Bing Xue: Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand
- Andrew Catanach: The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand
- Mengjie Zhang: Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand
- Linley Jesson: The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand
- Marcus Davy: The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand
- Maren Wellenreuther: The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
6. Li T, Chen B, Wei C, Hou D, Qin P, Jing Z, Ma H, Niu X, Wang C, Han R, Li H, Liu X, Xu H, Kang X, Li Z. A 104-bp Structural Variation of the ADPRHL1 Gene Is Associated With Growth Traits in Chickens. Front Genet 2021; 12:691272. [PMID: 34512719 PMCID: PMC8427608 DOI: 10.3389/fgene.2021.691272] [Received: 04/06/2021] [Accepted: 07/29/2021] [Indexed: 11/13/2022]
Abstract
Marker-assisted breeding is an important method in modern molecular breeding. Recent studies suggest that a large number of molecular markers help to explain the "lost heritability" of human height. Therefore, it is necessary to locate molecular marker sites in poultry and investigate the possible molecular mechanisms governing their effects. In this study, we found a 104-bp insertion/deletion polymorphism in the 5'UTR of the ADPRHL1 gene through resequencing. In cross-designed F2 resource groups, the indel was significantly associated with weight at 0, 2, 4, 6, and 10 weeks and a number of other traits [carcass weight (CW), semi-evisceration weight (SEW), evisceration weight (EW), claw weight (CLW), wings weight (DWW), gizzard weight (GW), pancreas weight (PW), chest muscle weight (CMW), leg weight (LW), leg muscle weight (LMW), shedding weight (SW), liver rate (LR), and leg muscle rate (LMR)] (P < 0.05). In brief, the insertion-insertion (II) genotype was significantly associated with the highest values for growth traits and meat quality traits, whereas the values associated with the insertion-deletion (ID) genotype were the lowest in the F2 reciprocal cross chickens. The mutation sites were genotyped in 4,526 individuals from 12 different chicken breeds and cross-designed F2 resource groups. The II genotype is the most important genotype in commercial broilers, and the I allele frequency observed in these breeds is relatively high. Deletion mutations tend to be fixed in commercial broilers. However, there is still considerable potential for breeding in dual-purpose chickens and commercial laying hens. A luciferase reporter assay showed that the II genotype of the ADPRHL1 gene possessed 2.49-fold higher promoter activity than the DD genotype (P < 0.05). We hypothesized that this indel might affect the transcriptional activity of ADPRHL1, thereby affecting the growth traits of chickens. These findings may help to elucidate the function of the ADPRHL1 gene and facilitate enhanced reproduction in the chicken industry.
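The allele frequencies referred to in the abstract follow directly from genotype counts at a biallelic indel: each II individual carries two I alleles and each ID individual carries one. A minimal sketch of that arithmetic is given below. The function name and the genotype counts are hypothetical and are not taken from the study.

```python
def allele_frequencies(n_ii, n_id, n_dd):
    """Return (freq_I, freq_D) for a biallelic indel from genotype counts."""
    total_alleles = 2 * (n_ii + n_id + n_dd)
    # II individuals contribute two I alleles, ID individuals contribute one.
    freq_i = (2 * n_ii + n_id) / total_alleles
    return freq_i, 1.0 - freq_i


# Hypothetical counts for one breed: 50 II, 40 ID, 10 DD.
freq_i, freq_d = allele_frequencies(50, 40, 10)
print(f"I allele: {freq_i:.2f}, D allele: {freq_d:.2f}")
```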
Affiliation(s)
- Tong Li: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Bingjie Chen: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Chengjie Wei: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Dan Hou: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Panpan Qin: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Zhenzhu Jing: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Haoran Ma: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Xinran Niu: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Chunxiu Wang: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Ruili Han: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Hong Li: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Xiaojun Liu: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China; Henan Innovative Engineering Research Center of Poultry Germplasm Resource, Henan Agricultural University, Zhengzhou, China
- Huifen Xu: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China
- Xiangtao Kang: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China; Henan Innovative Engineering Research Center of Poultry Germplasm Resource, Henan Agricultural University, Zhengzhou, China
- Zhuanjian Li: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, China; Henan Innovative Engineering Research Center of Poultry Germplasm Resource, Henan Agricultural University, Zhengzhou, China