1
Weber SE, Roscher-Ehrig L, Kox T, Abbadi A, Stahl A, Snowdon RJ. Genomic prediction in Brassica napus: evaluating the benefit of imputed whole-genome sequencing data. Genome 2024; 67:210-222. [PMID: 38708850 DOI: 10.1139/gen-2023-0126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2024]
Abstract
Advances in sequencing technology allow whole plant genomes to be sequenced with high quality. Combining genotypic and phenotypic data in genomic prediction helps breeders to select crossing partners in partially phenotyped populations. In plant breeding programs, the cost of sequencing entire breeding populations still exceeds available genotyping budgets. Hence, the method for genotyping is still mainly single nucleotide polymorphism (SNP) arrays; however, arrays are unable to assess the entire genome- and population-wide diversity. A compromise involves genotyping the entire population using an SNP array and a subset of the population with whole-genome sequencing. Both datasets can then be used to impute markers from whole-genome sequencing onto the entire population. Here, we evaluate whether imputation of whole-genome sequencing data enhances genomic predictions, using data from a nested association mapping population of rapeseed (Brassica napus). Employing two cross-validation schemes that mimic scenarios for the prediction of close and distant relatives, we show that imputed marker data do not significantly improve prediction accuracy, likely due to redundancy in relationship estimates and imputation errors. In simulation studies, only small improvements were observed, further corroborating the findings. We conclude that SNP arrays are already equipped with the information that is added by imputation through relationship and linkage disequilibrium.
Affiliation(s)
- Sven E Weber
  - Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
- Lennard Roscher-Ehrig
  - Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
- Andreas Stahl
  - Julius Kuehn Institute (JKI), Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Quedlinburg, Germany
- Rod J Snowdon
  - Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
2
Bhérer C, Eveleigh R, Trajanoska K, St-Cyr J, Paccard A, Nadukkalam Ravindran P, Caron E, Bader Asbah N, McClelland P, Wei C, Baumgartner I, Schindewolf M, Döring Y, Perley D, Lefebvre F, Lepage P, Bourgey M, Bourque G, Ragoussis J, Mooser V, Taliun D. A cost-effective sequencing method for genetic studies combining high-depth whole exome and low-depth whole genome. NPJ Genom Med 2024; 9:8. [PMID: 38326393 PMCID: PMC10850497 DOI: 10.1038/s41525-024-00390-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 12/07/2023] [Indexed: 02/09/2024]
Abstract
Whole genome sequencing (WGS) at high depth (30X) allows the accurate discovery of variants in coding and non-coding DNA regions and helps elucidate the genetic underpinnings of human health and disease. Yet, due to the prohibitive cost of high-depth WGS, most large-scale genetic association studies use genotyping arrays or high-depth whole exome sequencing (WES). Here we propose a cost-effective method, which we call "Whole Exome Genome Sequencing" (WEGS), that combines low-depth WGS and high-depth WES with up to 8 samples pooled and sequenced simultaneously (multiplexed). We experimentally assess the performance of WEGS with four different depth-of-coverage and sample-multiplexing configurations. We show that the optimal WEGS configurations are 1.7-2.0 times cheaper than standard WES (no-plexing) and 1.8-2.1 times cheaper than high-depth WGS, reach recall and precision rates similar to WES in detecting coding variants, and capture more population-specific variants in the rest of the genome that are difficult to recover with genotype imputation methods. We apply WEGS to 862 patients with peripheral artery disease and show that it directly assesses more known disease-associated variants than a typical genotyping array, as well as thousands of non-imputable variants per disease-associated locus.
Affiliation(s)
- Claude Bhérer
  - Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Robert Eveleigh
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- Katerina Trajanoska
  - Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Janick St-Cyr
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Antoine Paccard
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Praveen Nadukkalam Ravindran
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Elizabeth Caron
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Nimara Bader Asbah
  - Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Peyton McClelland
  - Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Clare Wei
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Iris Baumgartner
  - Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- Marc Schindewolf
  - Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- Yvonne Döring
  - Division of Angiology, Swiss Cardiovascular Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
  - Institute for Cardiovascular Prevention (IPEK), Ludwig-Maximilians University Munich, Pettenkoferstr 9, 80336, Munich, Germany
- Danielle Perley
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- François Lefebvre
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- Pierre Lepage
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Guillaume Bourque
  - Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canadian Centre for Computational Genomics, McGill University, Montréal, Québec, Canada
- Jiannis Ragoussis
  - Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
- Vincent Mooser
  - Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
- Daniel Taliun
  - Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec, Canada
  - Canada Excellence Research Chair in Genomic Medicine, McGill University, Montréal, Québec, Canada
3
Baldrighi GN, Nova A, Bernardinelli L, Fazia T. A Pipeline for Phasing and Genotype Imputation on Mixed Human Data (Parents-Offspring Trios and Unrelated Subjects) by Reviewing Current Methods and Software. Life (Basel) 2022; 12:life12122030. [PMID: 36556394 PMCID: PMC9781110 DOI: 10.3390/life12122030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 12/01/2022] [Accepted: 12/02/2022] [Indexed: 12/09/2022]
Abstract
Genotype imputation has become an essential prerequisite when performing association analysis. It is a computational technique that allows us to infer genetic markers that have not been directly genotyped, thereby increasing statistical power in subsequent association studies, which consequently has a crucial impact on the identification of causal variants. Many features need to be considered when choosing the proper algorithm for imputation, including the target sample on which it is performed, i.e., related individuals, unrelated individuals, or both. Problems could arise when dealing with a target sample made up of mixed data, composed of both related and unrelated individuals, especially since the scientific literature on this topic is not sufficiently clear. To shed light on this issue, we examined existing algorithms and software for performing phasing and imputation on mixed human data from SNP arrays, specifically when related subjects belong to trios. By discussing the advantages and limitations of the current algorithms, we identified LD-based methods as being the most suitable for reconstruction of haplotypes in this specific context, and we proposed a feasible pipeline that can be used for imputing genotypes in both phased and unphased human data.
4
Xu P, Li D, Wu Z, Ni L, Liu J, Tang Y, Yu T, Ren J, Zhao X, Huang M. An imputation-based genome-wide association study for growth and fatness traits in Sujiang pigs. Animal 2022; 16:100591. [PMID: 35872387 DOI: 10.1016/j.animal.2022.100591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/01/2022]
Abstract
Sujiang pigs are a synthetic breed derived from Jiangquhai, Fengjing, and Duroc pigs. In this study, we sequenced the genomes of 62 pigs at a coverage depth of 10× to 20×, including 27 Sujiang and 35 founder breed pigs, and we collected genome sequence data for 360 pigs worldwide from public databases, including 39 Duroc pigs. We obtained a high-quality variant dataset of 365 Sujiang pigs by imputing the porcine 80K single nucleotide polymorphism (SNP) Beadchip to the whole-genome scale with a total of 422 pigs as a reference panel. The dataset of 365 imputed Sujiang pigs was used to perform single-trait genome-wide association studies (GWAS) and meta-analyses for growth and fatness traits. Single-trait GWAS identified 1,907, 18, and 14 SNPs surpassing the suggestively significant threshold for backfat thickness, chest circumference, and chest width, respectively. Meta-analyses identified 2,400 genome-wide significant SNPs and 520 suggestively significant SNPs for backfat thickness and chest circumference, and 719 genome-wide significant SNPs and 1,225 suggestively significant SNPs for all seven traits. According to the meta-analysis of backfat thickness and chest circumference, a remarkable region of 2.69 Mb on Sus scrofa chromosome 4 containing FAM110B, IMPAD1, LYN, MOS, PENK, PLAG1, SDR16C5 and XKR4 was identified as a candidate region. The haplotype heat map of the 2.69 Mb region verified that Sujiang pigs were derived from Duroc and Chinese indigenous pigs, especially Jiangquhai pigs. The Kruskal-Wallis test showed that haplotypes of the 2.69 Mb region significantly affected backfat thickness and chest circumference. We then focused on PLAG1, an important growth-related gene, and identified two synonymous SNPs in PLAG1 with obvious differences among breeds. We then genotyped 365 Sujiang, 150 Duroc, 95 Jiangquhai, and 100 Fengjing pigs to confirm this result and verified that the two variants significantly affected growth and fatness phenotypes. Our findings not only provide insights into the genetic architecture of porcine growth and fatness traits but also provide potential markers for selective breeding of these traits in Sujiang pigs.
Affiliation(s)
- Pan Xu
  - School of Animal Science and Technology, Jiangsu Agri-animal Husbandry Vocational College, Taizhou, PR China
- Desen Li
  - College of Animal Science, South China Agricultural University, Guangzhou, PR China
- Zhongping Wu
  - Zhongkai University of Agriculture and Engineering, Guangzhou, PR China
- Ligang Ni
  - School of Animal Science and Technology, Jiangsu Agri-animal Husbandry Vocational College, Taizhou, PR China
- Jiaxing Liu
  - School of Animal Science and Technology, Jiangsu Agri-animal Husbandry Vocational College, Taizhou, PR China
- Ying Tang
  - School of Animal Science and Technology, Jiangsu Agri-animal Husbandry Vocational College, Taizhou, PR China
- Tongshun Yu
  - School of Animal Science and Technology, Jiangsu Agri-animal Husbandry Vocational College, Taizhou, PR China
- Jun Ren
  - College of Animal Science, South China Agricultural University, Guangzhou, PR China
- Xuting Zhao
  - School of Animal Science and Technology, Jiangsu Agri-animal Husbandry Vocational College, Taizhou, PR China
- Min Huang
  - College of Animal Science, South China Agricultural University, Guangzhou, PR China
5
Avery CL, Howard AG, Ballou AF, Buchanan VL, Collins JM, Downie CG, Engel SM, Graff M, Highland HM, Lee MP, Lilly AG, Lu K, Rager JE, Staley BS, North KE, Gordon-Larsen P. Strengthening Causal Inference in Exposomics Research: Application of Genetic Data and Methods. Environ Health Perspect 2022; 130:55001. [PMID: 35533073 PMCID: PMC9084332 DOI: 10.1289/ehp9098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Background: Advances in technologies to measure a broad set of exposures have led to a range of exposome research efforts. Yet, these efforts have insufficiently integrated methods that incorporate genetic data to strengthen causal inference, despite evidence that many exposome-associated phenotypes are heritable. Objective: We demonstrate how integration of methods and study designs that incorporate genetic data can strengthen causal inference in exposomics research by helping address six challenges: reverse causation and unmeasured confounding, comprehensive examination of phenotypic effects, low efficiency, replication, multilevel data integration, and characterization of tissue-specific effects. Examples are drawn from studies of biomarkers and health behaviors, exposure domains where the causal inference methods we describe are most often applied. Discussion: Technological, computational, and statistical advances in genotyping, imputation, and analysis, combined with broad data sharing and cross-study collaborations, offer multiple opportunities to strengthen causal inference in exposomics research. Full application of these opportunities will require an expanded understanding of genetic variants that predict exposome phenotypes as well as an appreciation that the utility of genetic variants for causal inference will vary by exposure and may depend on large sample sizes. However, several of these challenges can be addressed through international scientific collaborations that prioritize data sharing. Ultimately, we anticipate that efforts to better integrate methods that incorporate genetic data will extend the reach of exposomics research by helping address the challenges of comprehensively measuring the exposome and its health effects across studies, the life course, and in varied contexts and diverse populations. https://doi.org/10.1289/EHP9098.
Affiliation(s)
- Christy L Avery
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Annie Green Howard
  - Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Anna F Ballou
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Victoria L Buchanan
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Jason M Collins
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina G Downie
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Stephanie M Engel
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Mariaelisa Graff
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Heather M Highland
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Moa P Lee
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Adam G Lilly
  - Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Sociology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kun Lu
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Julia E Rager
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Brooke S Staley
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kari E North
  - Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Penny Gordon-Larsen
  - Department of Nutrition, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
6
Yatagai Y, Oshima H, Sakamoto T, Shigemasa R, Kitazawa H, Hyodo K, Masuko H, Iijima H, Naito T, Saito T, Hirota T, Tamari M, Hizawa N. Expression quantitative trait loci for ETV4 and MEOX1 are associated with adult asthma in Japanese populations. Sci Rep 2021; 11:18791. [PMID: 34552174 PMCID: PMC8458279 DOI: 10.1038/s41598-021-98348-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2021] [Accepted: 09/06/2021] [Indexed: 12/04/2022]
Abstract
ETS variant transcription factor 4 (ETV4) is a recently identified transcription factor that regulates gene expression-based biomarkers of asthma and IL6 production in an airway epithelial cell line. Given that ETV4 has not yet been implicated in asthma genetics, we performed genetic association studies of adult asthma in the ETV4 region using two independent Japanese cohorts (a total of 1532 controls and 783 cases). SNPs located between ETV4 and mesenchyme homeobox 1 (MEOX1) were significantly associated with adult asthma, including rs4792901 and rs2880540 (P = 5.63E−5 and 2.77E−5, respectively). The CC haplotype of these two SNPs was also significantly associated with adult asthma (P = 8.43E−7). Even when both SNPs were included in a logistic regression model, the association of either rs4792901 or rs2880540 remained significant (P = 0.013 or 0.007, respectively), suggesting that the two SNPs may have independent effects on the development of asthma. Both SNPs were expression quantitative trait loci, and the asthma risk alleles at both SNPs were correlated with increased levels of ETV4 mRNA expression. In addition, the asthma risk allele at rs4792901 was associated with increased serum IL6 levels (P = 0.041) in 651 healthy adults. Our findings imply that ETV4 is involved in the pathogenesis of asthma, possibly through the heightened production of IL6.
Affiliation(s)
- Yohei Yatagai
  - Department of Pulmonary Medicine, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
- Hisayuki Oshima
  - Department of Pulmonary Medicine, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
- Tohru Sakamoto
  - Department of Pulmonary Medicine, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
- Rie Shigemasa
  - Department of Pulmonary Medicine, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
- Haruna Kitazawa
  - Department of Pulmonary Medicine, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
- Kentaro Hyodo
  - Department of Pulmonary Medicine, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
- Hironori Masuko
  - Department of Pulmonary Medicine, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
- Takefumi Saito
  - National Hospital Organization Ibaraki Higashi National Hospital, Ibaraki, Japan
- Tomomitsu Hirota
  - Research Center for Medical Science, The Jikei University School of Medicine, Tokyo, Japan
- Mayumi Tamari
  - Research Center for Medical Science, The Jikei University School of Medicine, Tokyo, Japan
- Nobuyuki Hizawa
  - Department of Pulmonary Medicine, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
7
Joukhadar R, Thistlethwaite R, Trethowan R, Keeble-Gagnère G, Hayden MJ, Ullah S, Daetwyler HD. Meta-analysis of genome-wide association studies reveal common loci controlling agronomic and quality traits in a wide range of normal and heat stressed environments. Theor Appl Genet 2021; 134:2113-2127. [PMID: 33768282 DOI: 10.1007/s00122-021-03809-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/13/2020] [Accepted: 03/01/2021] [Indexed: 06/12/2023]
Abstract
Several stable QTL were detected using metaGWAS analysis for different agronomic and quality traits under 26 normal and heat stressed environments. Heat stress, exacerbated by global warming, has a negative influence on wheat production worldwide and climate resilient cultivars can help mitigate these impacts. Selection decisions should therefore depend on multi-environment experiments representing a range of temperatures at critical stages of development. Here, we applied a meta-genome wide association analysis (metaGWAS) approach to detect stable QTL with significant effects across multiple environments. The metaGWAS was applied to 11 traits scored in 26 trials that were sown at optimal or late times of sowing (TOS1 and TOS2, respectively) at five locations. A total of 2571 unique wheat genotypes (13,959 genotypes across all environments) were included and the analysis conducted on TOS1, TOS2 and both times of sowing combined (TOS1&2). The germplasm was genotyped using a 90 k Infinium chip and imputed to exome sequence level, resulting in 341,195 single nucleotide polymorphisms (SNPs). The average accuracy across all imputed SNPs was high (92.4%). The three metaGWAS analyses revealed 107 QTL for the 11 traits, of which 16 were detected in all three analyses and 23 were detected in TOS1&2 only. The remaining QTL were detected in either TOS1 or TOS2 with or without TOS1&2, reflecting the complex interactions between the environments and the detected QTL. Eight QTL were associated with grain yield and seven with multiple traits. The identified QTL provide an important resource for gene enrichment and fine mapping to further understand the mechanisms of gene × environment interaction under both heat stressed and unstressed conditions.
Affiliation(s)
- Reem Joukhadar
  - Agriculture Victoria, Centre for AgriBioscience, AgriBio, Bundoora, VIC, Australia
- Rebecca Thistlethwaite
  - School of Life and Environmental Sciences, Plant Breeding Institute, Sydney Institute of Agriculture, The University of Sydney, Narrabri, NSW, Australia
- Richard Trethowan
  - School of Life and Environmental Sciences, Plant Breeding Institute, Sydney Institute of Agriculture, The University of Sydney, Narrabri, NSW, Australia
  - School of Life and Environmental Sciences, Plant Breeding Institute, Sydney Institute of Agriculture, The University of Sydney, Cobbitty, NSW, Australia
- Matthew J Hayden
  - Agriculture Victoria, Centre for AgriBioscience, AgriBio, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Smi Ullah
  - School of Life and Environmental Sciences, Plant Breeding Institute, Sydney Institute of Agriculture, The University of Sydney, Narrabri, NSW, Australia
- Hans D Daetwyler
  - Agriculture Victoria, Centre for AgriBioscience, AgriBio, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
8
Quick C, Anugu P, Musani S, Weiss ST, Burchard EG, White MJ, Keys KL, Cucca F, Sidore C, Boehnke M, Fuchsberger C. Sequencing and imputation in GWAS: Cost-effective strategies to increase power and genomic coverage across diverse populations. Genet Epidemiol 2020; 44:537-549. [PMID: 32519380 PMCID: PMC7449570 DOI: 10.1002/gepi.22326] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/07/2019] [Revised: 04/02/2020] [Accepted: 05/22/2020] [Indexed: 01/03/2023]
Abstract
A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to fully capture genetic variation, but remains prohibitively expensive for large sample sizes. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture a wider set of variants. However, imputation quality depends crucially on reference panel size and genetic distance from the target population. Here, we consider sequencing a subset of GWAS participants and imputing the rest using a reference panel that includes both sequenced GWAS participants and an external reference panel. We investigate how imputation quality and GWAS power are affected by the number of participants sequenced for admixed populations (African and Latino Americans) and European population isolates (Sardinians and Finns), and identify powerful, cost-effective GWAS designs given current sequencing and array costs. For populations that are well-represented in existing reference panels, we find that array genotyping alone is cost-effective and well-powered to detect common- and rare-variant associations. For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power.
Collapse
Affiliation(s)
- Corbin Quick
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan
- Pramod Anugu
  - University of Mississippi Medical Center, Jackson, Mississippi
- Solomon Musani
  - University of Mississippi Medical Center, Jackson, Mississippi
- Scott T. Weiss
  - Harvard Medical School, Boston, Massachusetts
  - Channing Department of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts
  - Partners HealthCare Personalized Medicine, Boston, Massachusetts
- Esteban G. Burchard
  - Department of Medicine, University of California San Francisco, San Francisco, California
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California
- Marquitta J. White
  - Department of Medicine, University of California San Francisco, San Francisco, California
- Kevin L. Keys
  - Department of Medicine, University of California San Francisco, San Francisco, California
- Francesco Cucca
  - Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
  - Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Carlo Sidore
  - Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
- Michael Boehnke
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan
- Christian Fuchsberger
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan
  - Department of Genetics and Pharmacology, Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
  - Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, Bolzano, Italy
9
Easing US restrictions on mitochondrial replacement therapy would protect research interests but grease the slippery slope. J Assist Reprod Genet 2019; 36:1781-1785. [PMID: 31463871 DOI: 10.1007/s10815-019-01529-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2019] [Accepted: 07/08/2019] [Indexed: 10/26/2022] Open
10
Pégard M, Rogier O, Bérard A, Faivre-Rampant P, Paslier MCL, Bastien C, Jorge V, Sánchez L. Sequence imputation from low density single nucleotide polymorphism panel in a black poplar breeding population. BMC Genomics 2019; 20:302. [PMID: 30999856 PMCID: PMC6471894 DOI: 10.1186/s12864-019-5660-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/10/2018] [Accepted: 03/29/2019] [Indexed: 12/30/2022] Open
Abstract
Background: Genomic selection accuracy increases with the use of high SNP (single nucleotide polymorphism) coverage. However, such gains in coverage come at high costs, preventing their prompt operational implementation by breeders. Low density panels imputed to higher densities offer a cheaper alternative during the first stages of genomic resources development. Our study is the first to explore the imputation in a tree species: black poplar. About 1000 pure-breed Populus nigra trees from a breeding population were selected and genotyped with a 12K custom Infinium Bead-Chip. Forty-three of those individuals corresponding to nodal trees in the pedigree were fully sequenced (reference), while the remaining majority (target) was imputed from 8K to 1.4 million SNPs using FImpute. Each SNP and individual was evaluated for imputation errors by leave-one-out cross validation in the training sample of 43 sequenced trees. Some summary statistics such as Hardy-Weinberg Equilibrium exact test p-value, quality of sequencing, depth of sequencing per site and per individual, minor allele frequency, marker density ratio or SNP information redundancy were calculated. Principal component and Boruta analyses were used on all these parameters to rank the factors affecting the quality of imputation. Additionally, we characterize the impact of the relatedness between reference population and target population. Results: During the imputation process, we used 7540 SNPs from the chip to impute 1,438,827 SNPs from sequences. At the individual level, imputation accuracy was high with a proportion of SNPs correctly imputed between 0.84 and 0.99. The variation in accuracies was mostly due to differences in relatedness between individuals. At a SNP level, the imputation quality depended on genotyped SNP density and on the original minor allele frequency. The imputation did not appear to result in an increase of linkage disequilibrium. The genotype densification brought a better distribution of markers along the genome, and we did not detect any substantial bias in annotation categories. Conclusions: This study shows that it is possible to impute low-density marker panels to whole genome sequence with good accuracy under certain conditions that could be common to many breeding populations.
Affiliation(s)
- Marie Pégard
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Odile Rogier
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Aurélie Bérard
  - Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000, 2 rue Gaston Crémieux, Evry, 9100, France
- Patricia Faivre-Rampant
  - Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000, 2 rue Gaston Crémieux, Evry, 9100, France
- Marie-Christine Le Paslier
  - Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000, 2 rue Gaston Crémieux, Evry, 9100, France
- Catherine Bastien
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Véronique Jorge
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
- Leopoldo Sánchez
  - BioForA, INRA, ONF, 45075, Orléans, France, 2163 Avenue de la Pomme de Pin CS 40001 ARDON, Orléans Cedex 2, 45075, France
11
Comparison of genotype imputation strategies using a combined reference panel for chicken population. Animal 2018; 13:1119-1126. [PMID: 30370890 DOI: 10.1017/s1751731118002860] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/11/2022] Open
Abstract
Whole-genome sequence (WGS) data are expected to be optimal for genome-wide association studies and genomic predictions. However, sequencing thousands of individuals of interest is expensive. Imputation from single nucleotide polymorphism panels to WGS data is an attractive approach to obtain highly reliable WGS data at low cost. Here, we conducted a genotype imputation study with a combined reference panel in a yellow-feather dwarf broiler population. The combined reference panel was assembled by sequencing 24 key individuals of a yellow-feather dwarf broiler population (internal reference panel) and WGS data from 311 chickens in public databases (external reference panel). Three scenarios were investigated to determine how different factors affect the accuracy of imputation from 600 K array data to WGS data, including: genotype imputation with internal, external and combined reference panels; the number of internal reference individuals in the combined reference panel; and different reference sizes and selection strategies of an external reference panel. Results showed that imputation accuracies from 600 K to WGS data were 0.834±0.012, 0.920±0.007 and 0.982±0.003 for the internal, external and combined reference panels, respectively. Increasing the reference size from 50 to 250 improved the accuracy of genotype imputation from 0.848 to 0.974 for the combined reference panel and from 0.647 to 0.917 for the external reference panel. The selection strategies for the external reference panel had no impact on the accuracy of imputation using the combined reference panel. However, if only an external reference panel with reference size >50 was used, the selection strategy of minimizing the average distance to the closest leaf had the greatest imputation accuracy compared with other methods. Generally, using a combined reference panel provided greater imputation accuracy, especially for low-frequency variants. 
In conclusion, the optimal imputation strategy with a combined reference panel should comprehensively consider genetic diversity of the study population, availability and properties of external reference panels, sequencing and computing costs, and frequency of imputed variants. This work sheds light on how to design and execute genotype imputation with a combined external reference panel in a livestock population.
12
Vergara C, Parker MM, Franco L, Cho MH, Valencia-Duarte AV, Beaty TH, Duggal P. Genotype imputation performance of three reference panels using African ancestry individuals. Hum Genet 2018; 137:281-292. [PMID: 29637265 PMCID: PMC6209094 DOI: 10.1007/s00439-018-1881-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 01/10/2018] [Accepted: 03/31/2018] [Indexed: 12/22/2022]
Abstract
Genotype imputation estimates unobserved genotypes from genome-wide markers, to increase genome coverage and power for genome-wide association studies. Imputation has been successful for European ancestry populations in which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We compared the performance of these reference panels when imputing variation in 3747 African Americans (AA) from two cohorts (HCV and COPDGene) genotyped using Illumina Omni microarrays. The haplotypes of 2504 (1000G), 883 (CAAPA) and 32,470 individuals (HRC) were used as reference. We compared the number of variants, imputation quality, imputation accuracy and coverage between panels. In both cohorts, 1000G imputed 1.5-1.6× more variants than CAAPA and 1.2× more than HRC. Similar findings were observed for variants with imputation R² > 0.5 and for rare, low-frequency, and common variants. When merging imputed variants of the three panels, the total number was 62-63 M with 20 M overlapping variants imputed by all three panels, and a range of 5-15 M variants imputed exclusively with one of them. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. 1000G, HRC and CAAPA provided high performance and accuracy for imputation of African American individuals, increasing the number of variants available for subsequent analyses. These panels are complementary and would benefit from the development of an integrated African reference panel.
Affiliation(s)
- Margaret M Parker
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Liliana Franco
  - National School of Public Health, Universidad de Antioquia, Medellín, Colombia
  - School of Medicine, Universidad Pontificia Bolivariana, Medellín, Colombia
- Michael H Cho
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Terri H Beaty
  - Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD, USA
- Priya Duggal
  - Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD, USA
13
Herzig AF, Nutile T, Babron MC, Ciullo M, Bellenguez C, Leutenegger AL. Strategies for phasing and imputation in a population isolate. Genet Epidemiol 2018; 42:201-213. [PMID: 29319195 DOI: 10.1002/gepi.22109] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/26/2017] [Revised: 11/16/2017] [Accepted: 11/16/2017] [Indexed: 11/05/2022]
Abstract
In the search for genetic associations with complex traits, population isolates offer the advantage of reduced genetic and environmental heterogeneity. In addition, cost-efficient next-generation association approaches have been proposed in these populations where only a subsample of representative individuals is sequenced and then genotypes are imputed into the rest of the population. Gene mapping in such populations thus requires high-quality genetic imputation and preliminary phasing. To identify an effective study design, we compare by simulation a range of phasing and imputation software and strategies. We simulated 1,115,604 variants on chromosome 10 for 477 members of the large complex pedigree of Campora, a village within the established isolate of Cilento in southern Italy. We assessed the phasing performance of identical by descent based software ALPHAPHASE and SLRP, LD-based software SHAPEIT2, SHAPEIT3, and BEAGLE, and new software EAGLE that combines both methodologies. For imputation we compared IMPUTE2, IMPUTE4, MINIMAC3, BEAGLE, and new software PBWT. Genotyping errors and missing genotypes were simulated to observe their effects on the performance of each software. Highly accurate phased data were achieved by all software with SHAPEIT2, SHAPEIT3, and EAGLE2 providing the most accurate results. MINIMAC3, IMPUTE4, and IMPUTE2 all performed strongly as imputation software and our study highlights the considerable gain in imputation accuracy provided by a genome sequenced reference panel specific to the population isolate.
Affiliation(s)
- Anthony Francis Herzig
  - Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
  - Inserm, U946, Genetic Variation and Human Diseases, Paris, France
- Teresa Nutile
  - Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- Marie-Claude Babron
  - Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
  - Inserm, U946, Genetic Variation and Human Diseases, Paris, France
- Marina Ciullo
  - Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
  - IRCCS Neuromed, Pozzilli, Isernia, Italy
- Céline Bellenguez
  - Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
  - Institut Pasteur de Lille, Lille, France
  - Université de Lille, U1167-Excellence Laboratory LabEx DISTALZ, Lille, France
- Anne-Louise Leutenegger
  - Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
  - Inserm, U946, Genetic Variation and Human Diseases, Paris, France
14
Zhou W, Fritsche LG, Das S, Zhang H, Nielsen JB, Holmen OL, Chen J, Lin M, Elvestad MB, Hveem K, Abecasis GR, Kang HM, Willer CJ. Improving power of association tests using multiple sets of imputed genotypes from distributed reference panels. Genet Epidemiol 2017; 41:744-755. [PMID: 28861891 DOI: 10.1002/gepi.22067] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/17/2017] [Revised: 06/16/2017] [Accepted: 07/10/2017] [Indexed: 11/09/2022]
Abstract
The accuracy of genotype imputation depends upon two factors: the sample size of the reference panel and the genetic similarity between the reference panel and the target samples. When consent restrictions prevent multiple reference panels from being combined, it is unclear how to combine the imputation results to optimize the power of genetic association studies. We compared the accuracy of 9,265 Norwegian genomes imputed from three reference panels: 1000 Genomes phase 3 (1000G), Haplotype Reference Consortium (HRC), and a reference panel built from low-pass genome sequencing of 2,201 Norwegian participants in the population-based Nord Trøndelag Health Study (HUNT). We observed that the population-matched reference panel allowed for imputation of more population-specific variants with lower frequency (minor allele frequency (MAF) between 0.05% and 0.5%). The overall imputation accuracy from the population-specific panel was substantially higher than 1000G and was comparable with HRC, despite HRC being 15-fold larger. These results recapitulate the value of population-specific reference panels for genotype imputation. We also evaluated different strategies to utilize multiple sets of imputed genotypes to increase the power of association studies. We observed that testing association for all variants imputed from any panel results in higher power to detect association than the alternative strategy of including only one version of each genetic variant, selected for having the highest imputation quality metric. This was particularly true for lower frequency variants (MAF < 1%), even after adjusting for the additional multiple testing burden.
Affiliation(s)
- Wei Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Lars G Fritsche
  - K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Sayantan Das
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- He Zhang
  - Department of Internal Medicine, Division of Cardiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Jonas B Nielsen
  - Department of Internal Medicine, Division of Cardiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Oddgeir L Holmen
  - K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
  - St. Olav Hospital, Trondheim University Hospital, Trondheim, Norway
- Jin Chen
  - Department of Internal Medicine, Division of Cardiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Maoxuan Lin
  - Department of Internal Medicine, Division of Cardiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Maiken B Elvestad
  - K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
- Kristian Hveem
  - HUNT Research Centre, Department of Public Health and General Practice, Norwegian University of Science and Technology, Levanger, Norway
  - Department of Medicine, Levanger Hospital, Nord-Trøndelag Health Trust, Levanger, Norway
- Goncalo R Abecasis
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Hyun Min Kang
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Cristen J Willer
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Internal Medicine, Division of Cardiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
15
Shi F, Tibbits J, Pasam RK, Kay P, Wong D, Petkowski J, Forrest KL, Hayes BJ, Akhunova A, Davies J, Webb S, Spangenberg GC, Akhunov E, Hayden MJ, Daetwyler HD. Exome sequence genotype imputation in globally diverse hexaploid wheat accessions. Theor Appl Genet 2017; 130:1393-1404. [PMID: 28378053 DOI: 10.1007/s00122-017-2895-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 11/29/2016] [Accepted: 03/18/2017] [Indexed: 06/07/2023]
Abstract
Imputing genotypes from the 90K SNP chip to exome sequence in wheat was moderately accurate. We investigated the factors that affect imputation and propose several strategies to improve accuracy. Imputing genetic marker genotypes from low to high density has been proposed as a cost-effective strategy to increase the power of downstream analyses (e.g. genome-wide association studies and genomic prediction) for a given budget. However, imputation is often imperfect and its accuracy depends on several factors. Here, we investigate the effects of reference population selection algorithms, marker density and imputation algorithms (Beagle4 and FImpute) on the accuracy of imputation from low SNP density (9K array) to the Infinium 90K single-nucleotide polymorphism (SNP) array for a collection of 837 hexaploid wheat Watkins landrace accessions. Based on these results, we then used the best performing reference selection and imputation algorithms to investigate imputation from 90K to exome sequence for a collection of 246 globally diverse wheat accessions. Accession-to-nearest-entry and genomic relationship-based methods were the best performing selection algorithms, and FImpute resulted in higher accuracy and was more efficient than Beagle4. The accuracy of imputing exome capture SNPs was comparable to imputing from 9K to 90K at approximately 0.71. This relatively low imputation accuracy is in part due to inconsistency between 90K and exome sequence formats. We also found the accuracy of imputation could be substantially improved to 0.82 when choosing an equivalent number of exome SNPs, instead of 90K SNPs on the existing array, as the lower density set. We present a number of recommendations to increase the accuracy of exome imputation.
Affiliation(s)
- Fan Shi
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Josquin Tibbits
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Raj K Pasam
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Pippa Kay
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Debbie Wong
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Joanna Petkowski
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Kerrie L Forrest
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Ben J Hayes
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Alina Akhunova
  - Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
  - Integrated Genomics Facility, Kansas State University, Manhattan, KS, USA
- German C Spangenberg
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Eduard Akhunov
  - Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Matthew J Hayden
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Hans D Daetwyler
  - Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
16
Comparing performance of modern genotype imputation methods in different ethnicities. Sci Rep 2016; 6:34386. [PMID: 27698363 PMCID: PMC5048136 DOI: 10.1038/srep34386] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/30/2015] [Accepted: 09/05/2016] [Indexed: 11/19/2022] Open
Abstract
A variety of modern software packages are available for genotype imputation relying on advanced concepts such as pre-phasing of the target dataset or utilization of admixed reference panels. In this study, we performed a comprehensive evaluation of the accuracy of modern imputation methods on the basis of the publicly available POPRES samples. Good quality genotypes were masked and re-imputed by different imputation frameworks: namely MaCH, IMPUTE2, MaCH-Minimac, SHAPEIT-IMPUTE2 and MaCH-Admix. Results were compared to evaluate the relative merit of pre-phasing and the usage of admixed references. We showed that the pre-phasing framework SHAPEIT-IMPUTE2 can overestimate the certainty of genotype distributions resulting in the lowest percentage of correctly imputed genotypes in our case. MaCH-Minimac performed better than SHAPEIT-IMPUTE2. Pre-phasing always reduced imputation accuracy. IMPUTE2 and MaCH-Admix, both relying on admixed-reference panels, showed comparable results. MaCH showed superior results if well-matched references were available (Nei’s GST ≤ 0.010). For small to medium datasets, frameworks using genetically closest reference panel are recommended if the genetic distance between target and reference data set is small. Our results are valid for small to medium data sets. As shown on a larger data set of population based German samples, the disadvantage of pre-phasing decreases for larger sample sizes.