1. Guenther DT, Follett J, Amouri R, Sassi SB, Hentati F, Farrer MJ. The Evolution of Genetic Variability at the LRRK2 Locus. Genes (Basel) 2024; 15:878. [PMID: 39062657 PMCID: PMC11275506 DOI: 10.3390/genes15070878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 06/28/2024] [Accepted: 07/01/2024] [Indexed: 07/28/2024]
Abstract
Leucine-rich repeat kinase 2 (LRRK2) c.6055G>A (p.G2019S) is a frequent cause of Parkinson's disease (PD), accounting for >30% of Tunisian Arab-Berber patients. LRRK2 is widely expressed in the immune system and its kinase activity confers a survival advantage against infection in animal models. Here, we assess haplotype variability in cis and in trans of the LRRK2 c.6055G>A mutation, define the age of the pathogenic allele, explore its relationship to the age of disease onset (AOO), and provide evidence for its positive selection.
Affiliation(s)
- Dylan T. Guenther
  - Department of Neurology, University of Florida, Gainesville, FL 32610, USA
- Jordan Follett
  - Department of Neurology, University of Florida, Gainesville, FL 32610, USA
- Rim Amouri
  - Mongi Ben Hamida National Institute of Neurology, Av. de la Rabta, Tunis 1007, Tunisia
- Samia Ben Sassi
  - Mongi Ben Hamida National Institute of Neurology, Av. de la Rabta, Tunis 1007, Tunisia
- Faycel Hentati
  - Mongi Ben Hamida National Institute of Neurology, Av. de la Rabta, Tunis 1007, Tunisia
- Matthew J. Farrer
  - Department of Neurology, University of Florida, Gainesville, FL 32610, USA
2. Yeon J, Le NT, Heo J, Sim SC. Low-density SNP markers with high prediction accuracy of genomic selection for bacterial wilt resistance in tomato. Frontiers in Plant Science 2024; 15:1402693. [PMID: 38872894 PMCID: PMC11169939 DOI: 10.3389/fpls.2024.1402693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 05/07/2024] [Indexed: 06/15/2024]
Abstract
Bacterial wilt (BW) is a soil-borne disease that leads to severe damage in tomato. Host resistance against BW is considered polygenic and effective in controlling this destructive disease. In this study, genomic selection (GS), which is a promising breeding strategy to improve quantitative traits, was investigated for BW resistance. Two tomato collections, TGC1 (n = 162) and TGC2 (n = 191), were used as training populations. Disease severity was assessed using three seedling assays in each population, and the best linear unbiased prediction (BLUP) values were obtained. The 31,142 SNP data were generated using the 51K Axiom array™ in the training populations. With these data, six GS models were trained to predict genomic estimated breeding values (GEBVs) in three populations (TGC1, TGC2, and combined). The parametric models Bayesian LASSO and RR-BLUP resulted in higher levels of prediction accuracy compared with all the non-parametric models (RKHS, SVM, and random forest) in two training populations. To identify low-density markers, two subsets of 1,557 SNPs were filtered based on marker effects (Bayesian LASSO) and variable importance values (random forest) in the combined population. An additional subset was generated using 1,357 SNPs from a genome-wide association study. These subsets showed prediction accuracies of 0.699 to 0.756 in Bayesian LASSO and 0.670 to 0.682 in random forest, which were higher relative to the 31,142 SNPs (0.625 and 0.614). Moreover, high prediction accuracies (0.743 and 0.702) were found with a common set of 135 SNPs derived from the three subsets. The resulting low-density SNPs will be useful to develop a cost-effective GS strategy for BW resistance in tomato breeding programs.
Affiliation(s)
- Jeyun Yeon
  - Department of Bioindustry and Bioresource Engineering, Sejong University, Seoul, Republic of Korea
- Ngoc Thi Le
  - Department of Bioindustry and Bioresource Engineering, Sejong University, Seoul, Republic of Korea
- Jaehun Heo
  - Department of Bioindustry and Bioresource Engineering, Sejong University, Seoul, Republic of Korea
- Sung-Chur Sim
  - Department of Bioindustry and Bioresource Engineering, Sejong University, Seoul, Republic of Korea
  - Plant Engineering Research Institute, Sejong University, Seoul, Republic of Korea
3. Yun JS, Jung SH, Lee SN, Jung SM, Won HH, Kim D, Choi JA. Polygenic risk score-based phenome-wide association for glaucoma and its impact on disease susceptibility in two large biobanks. J Transl Med 2024; 22:355. [PMID: 38622600 PMCID: PMC11020996 DOI: 10.1186/s12967-024-05152-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 04/01/2024] [Indexed: 04/17/2024]
Abstract
BACKGROUND: Glaucoma is a leading cause of irreversible blindness worldwide. Considerable uncertainty remains regarding the association between a variety of phenotypes and the genetic risk of glaucoma, as well as the impact they exert on glaucoma development. METHODS: We investigated the associations of genetic liability for primary open angle glaucoma (POAG) with a wide range of potential risk factors and assessed its impact on the risk of incident glaucoma. The phenome-wide association study (PheWAS) approach was applied to determine the association of POAG polygenic risk score (PRS) with a wide range of phenotypes in 377,852 participants from the UK Biobank study and 43,623 participants from the Penn Medicine Biobank study, all of European ancestry. Participants were stratified into four risk tiers: low, intermediate, high, and very high risk. Cox proportional hazard models assessed the relationship of POAG PRS and ocular factors with new glaucoma events. RESULTS: In both the discovery and replication sets in the PheWAS, a higher genetic predisposition to POAG was specifically correlated with ocular disease phenotypes. The POAG PRS exhibited correlations with low corneal hysteresis, refractive error, and ocular hypertension, demonstrating a strong association with the onset of glaucoma. Individuals carrying a high genetic burden exhibited a 9.20-fold, 11.88-fold, and 28.85-fold increase in glaucoma incidence when associated with low corneal hysteresis, high myopia, and elevated intraocular pressure, respectively. CONCLUSION: Genetic susceptibility to POAG primarily influences ocular conditions, with limited systemic associations. Notably, the baseline polygenic risk for POAG robustly associates with new glaucoma events, revealing a large combined effect of genetic and ocular risk factors on glaucoma incidence.
Affiliation(s)
- Jae-Seung Yun
  - Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Sang-Hyuk Jung
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Su-Nam Lee
  - Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Seung Min Jung
  - Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Hong-Hee Won
  - Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Samsung Medical Center, Sungkyunkwan University, Seoul, Republic of Korea
  - Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea
- Dokyoon Kim
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jin A Choi
  - Department of Ophthalmology, College of Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
4. Jung SH, Lee YC, Shivakumar M, Kim J, Yun JS, Park WY, Won HH, Kim D. Association between genetic risk and adherence to healthy lifestyle for developing age-related hearing loss. BMC Med 2024; 22:141. [PMID: 38532472 DOI: 10.1186/s12916-024-03364-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 03/18/2024] [Indexed: 03/28/2024]
Abstract
BACKGROUND: Previous studies have shown that lifestyle/environmental factors could accelerate the development of age-related hearing loss (ARHL). However, no study has yet investigated the joint association among genetics, lifestyle/environmental factors, and adherence to a healthy lifestyle for risk of ARHL. We aimed to assess the association between ARHL genetic variants, lifestyle/environmental factors, and adherence to a healthy lifestyle as it pertains to risk of ARHL. METHODS: This case-control study included 376,464 European individuals aged 40 to 69 years, enrolled between 2006 and 2010 in the UK Biobank (UKBB). As a replication set, we also included a total of 26,523 individuals considered of European ancestry and 9834 individuals considered of African-American ancestry through the Penn Medicine Biobank (PMBB). The polygenic risk score (PRS) for ARHL was derived from a sensorineural hearing loss genome-wide association study from the FinnGen Consortium and categorized as low, intermediate, high, and very high. We selected lifestyle/environmental factors that have been previously studied in association with hearing loss. A composite healthy lifestyle score was determined using seven selected lifestyle behaviors and one environmental factor. RESULTS: Of the 376,464 participants in the UKBB, 87,066 (23.1%) cases belonged to the ARHL group, and 289,398 (76.9%) individuals comprised the control group. Participants with a very high PRS for ARHL had a 49% higher risk of ARHL than those with a low PRS (adjusted OR, 1.49; 95% CI, 1.36-1.62; P < .001), which was replicated in the PMBB cohort. A very poor lifestyle was also associated with risk of ARHL (adjusted OR, 3.03; 95% CI, 2.75-3.35; P < .001). These risk factors showed joint effects with the risk of ARHL. Conversely, adherence to a healthy lifestyle in relation to hearing mostly attenuated the risk of ARHL even in individuals with very high PRS (adjusted OR, 0.21; 95% CI, 0.09-0.52; P < .001). CONCLUSIONS: Our findings demonstrated a significant joint association between genetic and lifestyle factors regarding ARHL. In addition, our analysis suggested that lifestyle adherence in individuals with high genetic risk could reduce the risk of ARHL.
Affiliation(s)
- Sang-Hyuk Jung
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Young Chan Lee
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Kyung Hee University, Kyung Hee University Hospital at Gangdong, Seoul, Republic of Korea
- Manu Shivakumar
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jaeyoung Kim
  - Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea
- Jae-Seung Yun
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Woong-Yang Park
  - Samsung Genome Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Hong-Hee Won
  - Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea
  - Samsung Genome Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Dokyoon Kim
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA
5. Lee YC, Jung SH, Shivakumar M, Cha S, Park WY, Won HH, Eun YG, Biobank PM, Kim D. Polygenic risk score-based phenome-wide association study of head and neck cancer across two large biobanks. BMC Med 2024; 22:120. [PMID: 38486201 PMCID: PMC10941505 DOI: 10.1186/s12916-024-03305-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Accepted: 02/15/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND: Numerous observational studies have highlighted associations of genetic predisposition to head and neck squamous cell carcinoma (HNSCC) with diverse risk factors, but these findings are constrained by the design limitations of observational studies. In this study, we utilized a phenome-wide association study (PheWAS) approach, incorporating a polygenic risk score (PRS) derived from a wide array of genomic variants, to systematically investigate phenotypes associated with genetic predisposition to HNSCC. Furthermore, we validated our findings across heterogeneous cohorts, enhancing the robustness and generalizability of our results. METHODS: We derived PRSs for HNSCC and its subgroups, oropharyngeal cancer and oral cancer, using large-scale genome-wide association study summary statistics from the Genetic Associations and Mechanisms in Oncology Network. We conducted a comprehensive investigation, leveraging genotyping data and electronic health records from 308,492 individuals in the UK Biobank and 38,401 individuals in the Penn Medicine Biobank (PMBB), and subsequently performed PheWAS to elucidate the associations between PRS and a wide spectrum of phenotypes. RESULTS: The HNSCC PRS showed significant associations with phenotypes related to tobacco use disorder (OR, 1.06; 95% CI, 1.05-1.08; P = 3.50 × 10⁻¹⁵), alcoholism (OR, 1.06; 95% CI, 1.04-1.09; P = 6.14 × 10⁻⁹), alcohol-related disorders (OR, 1.08; 95% CI, 1.05-1.11; P = 1.09 × 10⁻⁸), emphysema (OR, 1.11; 95% CI, 1.06-1.16; P = 5.48 × 10⁻⁶), chronic airway obstruction (OR, 1.05; 95% CI, 1.03-1.07; P = 2.64 × 10⁻⁵), and cancer of bronchus (OR, 1.08; 95% CI, 1.04-1.13; P = 4.68 × 10⁻⁵). These findings were replicated in the PMBB cohort, and sensitivity analyses, including the exclusion of HNSCC cases and the major histocompatibility complex locus, confirmed the robustness of these associations. Additionally, we identified significant associations between HNSCC PRS and lifestyle factors related to smoking and alcohol consumption. CONCLUSIONS: The study demonstrated the potential of PRS-based PheWAS in revealing associations between genetic risk factors for HNSCC and various phenotypic traits. The findings emphasized the importance of considering genetic susceptibility in understanding HNSCC and highlighted shared genetic bases between HNSCC and other health conditions and lifestyles.
Affiliation(s)
- Young Chan Lee
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Kyung Hee University, Seoul, Republic of Korea
- Sang-Hyuk Jung
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Manu Shivakumar
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Soojin Cha
  - Hanyang University Institute for Rheumatology Research, Seoul, Republic of Korea
- Woong-Yang Park
  - Samsung Genome Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Hong-Hee Won
  - Samsung Genome Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Samsung Medical Center, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, Republic of Korea
- Young-Gyu Eun
  - Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Kyung Hee University, Seoul, Republic of Korea
- Penn Medicine Biobank
  - Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Dokyoon Kim
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
6. Childebayeva A, Zavala EI. Review: Computational analysis of human skeletal remains in ancient DNA and forensic genetics. iScience 2023; 26:108066. [PMID: 37927550 PMCID: PMC10622734 DOI: 10.1016/j.isci.2023.108066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2023]
Abstract
Degraded DNA is used to answer questions in the fields of ancient DNA (aDNA) and forensic genetics. While aDNA studies typically center around human evolution and past history, and forensic genetics is often more concerned with identifying a specific individual, scientists in both fields face similar challenges. The overlap in source material has prompted periodic discussions and studies on the advantages of collaboration between fields toward mutually beneficial methodological advancements. However, most have been centered around wet laboratory methods (sampling, DNA extraction, library preparation, etc.). In this review, we focus on the computational side of the analytical workflow. We discuss limitations and points to consider when working with degraded DNA. We hope this review provides researchers new to computational workflows with a framework for thinking about the analysis of highly degraded DNA, and that it prompts increased collaboration between the forensic genetics and aDNA fields.
Affiliation(s)
- Ainash Childebayeva
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Anthropology, University of Kansas, Lawrence, KS, USA
- Elena I. Zavala
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Department of Biology, University of Oregon, Eugene, OR, USA
7. Kriaridou C, Tsairidou S, Fraslin C, Gorjanc G, Looseley ME, Johnston IA, Houston RD, Robledo D. Evaluation of low-density SNP panels and imputation for cost-effective genomic selection in four aquaculture species. Front Genet 2023; 14:1194266. [PMID: 37252666 PMCID: PMC10213886 DOI: 10.3389/fgene.2023.1194266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2023] [Accepted: 04/26/2023] [Indexed: 05/31/2023]
Abstract
Genomic selection can accelerate genetic progress in aquaculture breeding programmes, particularly for traits measured on siblings of selection candidates. However, it is not widely implemented in most aquaculture species, and remains expensive due to high genotyping costs. Genotype imputation is a promising strategy that can reduce genotyping costs and facilitate the broader uptake of genomic selection in aquaculture breeding programmes. Genotype imputation can predict ungenotyped SNPs in populations genotyped at a low density (LD), using a reference population genotyped at a high density (HD). In this study, we used datasets of four aquaculture species (Atlantic salmon, turbot, common carp and Pacific oyster), phenotyped for different traits, to investigate the efficacy of genotype imputation for cost-effective genomic selection. The four datasets had been genotyped at HD, and eight LD panels (300-6,000 SNPs) were generated in silico. SNPs were selected to be: i) evenly distributed according to physical position, ii) selected to minimise the linkage disequilibrium between adjacent SNPs, or iii) randomly selected. Imputation was performed with three different software packages (AlphaImpute2, FImpute v.3 and findhap v.4). The results revealed that FImpute v.3 was faster and achieved higher imputation accuracies. Imputation accuracy increased with increasing panel density for both SNP selection methods, reaching correlations greater than 0.95 in the three fish species and 0.80 in Pacific oyster. In terms of genomic prediction accuracy, the LD and the imputed panels performed similarly, reaching values very close to the HD panels, except in the Pacific oyster dataset, where the LD panel performed better than the imputed panel. In the fish species, when LD panels were used for genomic prediction without imputation, selection of markers based on either physical or genetic distance (instead of randomly) resulted in a high prediction accuracy, whereas imputation achieved near maximal prediction accuracy independently of the LD panel, showing higher reliability. Our results suggest that, in fish species, well-selected LD panels may achieve near maximal genomic selection prediction accuracy, and that the addition of imputation will result in maximal accuracy independently of the LD panel. These strategies represent effective and affordable methods to incorporate genomic selection into most aquaculture settings.
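The first marker-selection strategy named in the abstract above (SNPs evenly distributed by physical position) can be illustrated with a short sketch. This is not the authors' procedure: the function name `evenly_spaced_panel`, the single-chromosome input and the nearest-to-grid rule are all assumptions made for illustration.

```python
from bisect import bisect_left


def evenly_spaced_panel(positions, n_markers):
    """Return indices of SNPs closest to an even grid over one chromosome.

    Hypothetical helper: `positions` holds base-pair coordinates of the
    high-density SNPs; `n_markers` is the desired low-density panel size.
    Two grid points can map to the same SNP, so fewer than `n_markers`
    indices may be returned.
    """
    if n_markers <= 0 or not positions:
        return []
    # Work on a position-sorted view but remember the original indices.
    order = sorted(range(len(positions)), key=positions.__getitem__)
    sorted_pos = [positions[i] for i in order]
    first, last = sorted_pos[0], sorted_pos[-1]
    step = (last - first) / (n_markers - 1) if n_markers > 1 else 0.0

    chosen = set()
    for k in range(n_markers):
        target = first + k * step
        j = bisect_left(sorted_pos, target)
        # The nearest SNP is either just left or just right of the target.
        candidates = [c for c in (j - 1, j) if 0 <= c < len(sorted_pos)]
        nearest = min(candidates, key=lambda c: abs(sorted_pos[c] - target))
        chosen.add(order[nearest])
    return sorted(chosen, key=positions.__getitem__)


panel = evenly_spaced_panel([100, 200, 300, 400, 500, 600, 700, 800, 900], 3)
```

A real panel design would repeat this per chromosome and usually apply quality filters first; the sketch only shows the spacing logic.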
Affiliation(s)
- Christina Kriaridou
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Smaragda Tsairidou
  - Global Academy of Agriculture and Food Systems, University of Edinburgh, Edinburgh, United Kingdom
- Clémence Fraslin
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Ross D. Houston
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
  - Benchmark Genetics, Penicuik, United Kingdom
- Diego Robledo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
8. Wienbrandt L, Ellinghaus D. EagleImp: fast and accurate genome-wide phasing and imputation in a single tool. Bioinformatics 2022; 38:4999-5006. [PMID: 36130053 PMCID: PMC9665855 DOI: 10.1093/bioinformatics/btac637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2022] [Revised: 09/15/2022] [Accepted: 09/19/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. RESULTS: We developed EagleImp, software based on the methods used in the existing tools Eagle2 and PBWT, which allows accurate and accelerated phasing and imputation in a single tool through algorithmic and technical improvements and new features. We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with 1 million reference genomes. EagleImp was 2-30 times faster (depending on the single or multiprocessor configuration selected and the size of the reference panel) than Eagle2 combined with PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical genome-wide association studies, EagleImp provided the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. Additional features include automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files and various user-configurable algorithm and output options. Due to the technical optimizations, EagleImp can perform fast and accurate reference-based phasing and imputation and is ready for future large reference panels on the order of 1 million genomes. AVAILABILITY AND IMPLEMENTATION: EagleImp is implemented in C++ and freely available for download at https://github.com/ikmb/eagleimp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David Ellinghaus
  - Institute of Clinical Molecular Biology, Kiel University, 24105 Kiel, Germany
  - Novo Nordisk Foundation Center for Protein Research, Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
9. Speck A, Trouvé JP, Enjalbert J, Geffroy V, Joets J, Moreau L. Genetic Architecture of Powdery Mildew Resistance Revealed by a Genome-Wide Association Study of a Worldwide Collection of Flax (Linum usitatissimum L.). Frontiers in Plant Science 2022; 13:871633. [PMID: 35812909 PMCID: PMC9263915 DOI: 10.3389/fpls.2022.871633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 04/22/2022] [Indexed: 06/15/2023]
Abstract
Powdery mildew is one of the most important diseases of flax and is particularly detrimental to its yield and oil or fiber quality. This disease, caused by the obligate biotrophic ascomycete Oïdium lini, is progressing in France. Genetic resistance of varieties is critical for the control of this disease, but very few resistance genes have been identified so far. It is therefore necessary to identify new powdery mildew resistance genes suited to the local context of pathogenicity. For this purpose, we studied a worldwide diversity panel composed of 311 flax genotypes that were both phenotyped for powdery mildew resistance over 2 years of field trials in France and resequenced. Sequence reads were mapped to the CDC Bethune reference genome, revealing 1,693,910 high-quality SNPs, which were then used for both population structure analysis and genome-wide association studies (GWASs). Four major genetic groups were identified, separating oil flax accessions from America or Europe from those from Asia or the Middle East, and fiber flax accessions originating from Eastern Europe from those from Western Europe. Eight QTLs were detected at the false discovery rate threshold of 5%, located on chromosomes 1, 2, 4, 13, and 14. Taking advantage of the moderate linkage disequilibrium present in the flax panel, and using the available genome annotation, we identified potential candidate genes. Our study shows the existence of new resistance alleles against powdery mildew in our diversity panel, which are of high interest for flax breeding programs.
Affiliation(s)
- Jérôme Enjalbert
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France
- Valérie Geffroy
  - Université Paris-Saclay, CNRS, INRAE, Université Evry, Institute of Plant Sciences Paris-Saclay (IPS2), Gif-sur-Yvette, France
  - Université de Paris, Institute of Plant Sciences Paris-Saclay (IPS2), Gif-sur-Yvette, France
- Johann Joets
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France
- Laurence Moreau
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution - Le Moulon, Gif-sur-Yvette, France
10. Ausmees K, Sanchez-Quinto F, Jakobsson M, Nettelblad C. An empirical evaluation of genotype imputation of ancient DNA. G3 (Bethesda) 2022; 12:6575448. [PMID: 35482488 PMCID: PMC9157144 DOI: 10.1093/g3journal/jkac089] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2022] [Accepted: 04/05/2022] [Indexed: 12/12/2022]
Abstract
With capabilities of sequencing ancient DNA to high coverage often limited by sample quality or cost, imputation of missing genotypes presents a possibility to increase the power of inference as well as cost-effectiveness for the analysis of ancient data. However, the high degree of uncertainty often associated with ancient DNA poses several methodological challenges, and performance of imputation methods in this context has not been fully explored. To gain further insights, we performed a systematic evaluation of imputation of ancient data using Beagle v4.0 and reference data from phase 3 of the 1000 Genomes project, investigating the effects of coverage, phased reference, and study sample size. Making use of five ancient individuals with high-coverage data available, we evaluated imputed data for accuracy, reference bias, and genetic affinities as captured by principal component analysis. We obtained genotype concordance levels of over 99% for data with 1× coverage, and similar levels of accuracy and reference bias at levels as low as 0.75×. Our findings suggest that using imputed data can be a realistic option for various population genetic analyses even for data in coverage ranges below 1×. We also show that a large and varied phased reference panel as well as the inclusion of low- to moderate-coverage ancient individuals in the study sample can increase imputation performance, particularly for rare alleles. In-depth analysis of imputed data with respect to genetic variants and allele frequencies gave further insight into the nature of errors arising during imputation, and can provide practical guidelines for postprocessing and validation prior to downstream analysis.
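The abstract above reports genotype concordance without stating how it was computed. As a rough illustration only, one common definition is the fraction of sites at which the imputed genotype equals the high-coverage call. The function name, the 0/1/2 alternate-allele coding and the use of `None` for missing calls are assumptions of this sketch, not details taken from the paper.

```python
def genotype_concordance(imputed, truth):
    """Fraction of jointly called sites where imputed and true genotypes agree.

    Hypothetical helper: genotypes are coded as alternate-allele counts
    (0, 1 or 2), and None marks a site with no call in either data set.
    """
    compared = 0
    matches = 0
    for imp, ref in zip(imputed, truth):
        if imp is None or ref is None:
            continue  # skip sites that cannot be compared
        compared += 1
        if imp == ref:
            matches += 1
    if compared == 0:
        raise ValueError("no sites with calls in both data sets")
    return matches / compared


# Four comparable sites, three of which agree.
score = genotype_concordance([0, 1, 2, 1, None], [0, 1, 1, 1, 2])
```

Concordance computed this way is dominated by common homozygous-reference sites, which is one reason evaluations often report it separately by allele frequency or genotype class.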
Affiliation(s)
- Kristiina Ausmees
  - Department of Information Technology, Uppsala University, Uppsala 751 05, Sweden
- Federico Sanchez-Quinto
  - Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City 14610, Mexico
  - Human Evolution, Department of Organismal Biology, Uppsala University, Uppsala 752 36, Sweden
- Mattias Jakobsson
  - Human Evolution, Department of Organismal Biology, Uppsala University, Uppsala 752 36, Sweden
- Carl Nettelblad
  - Department of Information Technology, Uppsala University, Uppsala 751 05, Sweden
11. Genome-wide analysis reveals associations between climate and regional patterns of adaptive divergence and dispersal in American pikas. Heredity (Edinb) 2021; 127:443-454. [PMID: 34537819 PMCID: PMC8551249 DOI: 10.1038/s41437-021-00472-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/31/2021] [Revised: 09/06/2021] [Accepted: 09/06/2021] [Indexed: 02/07/2023]
Abstract
Understanding the role of adaptation in species' responses to climate change is important for evaluating the evolutionary potential of populations and informing conservation efforts. Population genomics provides a useful approach for identifying putative signatures of selection and the underlying environmental factors or biological processes that may be involved. Here, we employed a population genomic approach within a space-for-time study design to investigate the genetic basis of local adaptation and reconstruct patterns of movement across rapidly changing environments in a thermally sensitive mammal, the American pika (Ochotona princeps). Using genotypic data at 49,074 single-nucleotide polymorphisms (SNPs), we analyzed patterns of genome-wide diversity, structure, and migration along three independent elevational transects located at the northern extent (Tweedsmuir South Provincial Park, British Columbia, Canada) and core (North Cascades National Park, Washington, USA) of the Cascades lineage. We identified 899 robust outlier SNPs within- and among-transects. Of those annotated to genes with known function, many were linked with cellular processes related to climate stress including ATP-binding, ATP citrate synthase activity, ATPase activity, hormone activity, metal ion-binding, and protein-binding. Moreover, we detected evidence for contrasting patterns of directional migration along transects across geographic regions that suggest an increased propensity for American pikas to disperse among lower elevation populations at higher latitudes where environments are generally cooler. Ultimately, our data indicate that fine-scale demographic patterns and adaptive processes may vary among populations of American pikas, providing an important context for evaluating biotic responses to climate change in this species and other alpine-adapted mammals.
|
12
|
Meger J, Ulaszewski B, Burczyk J. Genomic signatures of natural selection at phenology-related genes in a widely distributed tree species Fagus sylvatica L. BMC Genomics 2021; 22:583. [PMID: 34332553 PMCID: PMC8325806 DOI: 10.1186/s12864-021-07907-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/25/2021] [Accepted: 07/20/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: Diversity among phenology-related genes is predicted to be a contributing factor in local adaptations seen in widely distributed plant species that grow in climatically variable geographic areas, such as forest trees. European beech (Fagus sylvatica L.) is widespread, and is one of the most important broadleaved tree species in Europe; however, its potential for adaptation to climate change is a matter of uncertainty, and little is known about the molecular basis of climate change-relevant traits like bud burst.
RESULTS: We explored single nucleotide polymorphisms (SNPs) at candidate genes related to bud burst in beech individuals sampled across 47 populations from Europe. SNP diversity was monitored for 380 candidate genes using a sequence capture approach, providing 2909 unlinked SNP loci. We used two complementary analytical methods to find loci significantly associated with geographic variables, climatic variables (expressed as principal components), or phenotypic variables (spring and autumn phenology, height, survival). Redundancy analysis (RDA) was used to detect candidate markers across two spatial scales (entire study area and within subregions). We revealed 201 candidate SNPs at the broadest scale, 53.2% of which were associated with phenotypic variables. Additive polygenic scores, which provide a measure of the cumulative signal across significant candidate SNPs, were correlated with a climate variable (first principal component, PC1) related to temperature and precipitation availability, and spring phenology. However, different genotype-environment associations were identified within Southeastern Europe as compared to the entire geographic range of European beech.
CONCLUSIONS: Environmental conditions play important roles as drivers of genetic diversity of phenology-related genes that could influence local adaptation in European beech. Selection in beech favors genotypes with earlier bud burst under warmer and wetter habitats within its range; however, selection pressures may differ across spatial scales.
Affiliation(s)
- Joanna Meger
  - Department of Genetics, Faculty of Biological Sciences, Kazimierz Wielki University, Chodkiewicza 30, 85-064, Bydgoszcz, Poland
- Bartosz Ulaszewski
  - Department of Genetics, Faculty of Biological Sciences, Kazimierz Wielki University, Chodkiewicza 30, 85-064, Bydgoszcz, Poland
- Jaroslaw Burczyk
  - Department of Genetics, Faculty of Biological Sciences, Kazimierz Wielki University, Chodkiewicza 30, 85-064, Bydgoszcz, Poland
|
13
|
Jenkins CA, Schofield EC, Mellersh CS, De Risio L, Ricketts SL. Improving the resolution of canine genome-wide association studies using genotype imputation: A study of two breeds. Anim Genet 2021; 52:703-713. [PMID: 34252218 PMCID: PMC8514152 DOI: 10.1111/age.13117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Revised: 05/07/2021] [Accepted: 06/24/2021] [Indexed: 01/08/2023]
Abstract
Genotype imputation using a reference panel that combines high-density array data and publicly available whole genome sequence consortium variant data is potentially a cost-effective method to increase the density of extant lower-density array datasets. In this study, three datasets (two Border Collie; one Italian Spinone) generated using a legacy array (Illumina CanineHD, 173 662 SNPs) were utilised to assess the feasibility and accuracy of this approach and to gather additional evidence for the efficacy of canine genotype imputation. The cosmopolitan reference panels used to impute genotypes comprised dogs of 158 breeds, mixed breed dogs, wolves and Chinese indigenous dogs, as well as breed-specific individuals genotyped using the Axiom Canine HD array. The two Border Collie reference panels comprised 808 individuals including 79 Border Collies and 426 326 or 426 332 SNPs; and the Italian Spinone reference panel comprised 807 individuals including 38 Italian Spinoni and 476 313 SNPs. A high accuracy for imputation was observed, with the lowest accuracy observed for one of the Border Collie datasets (mean R2 = 0.94) and the highest for the Italian Spinone dataset (mean R2 = 0.97). This study’s findings demonstrate that imputation of a legacy array study set using a reference panel comprising both breed-specific array data and multi-breed variant data derived from whole genomes is effective and accurate. The process of canine genotype imputation, using the valuable growing resource of publicly available canine genome variant datasets alongside breed-specific data, is described in detail to facilitate and encourage use of this technique in canine genetics.
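The mean R2 figures above summarise per-SNP imputation accuracy. One common way to obtain such a number, sketched below, is to take the squared Pearson correlation between masked true genotypes and imputed dosages for each SNP and then average across SNPs. The abstract does not say exactly how R2 was computed, so this definition, the function name, and the toy data are all assumptions.

```python
from statistics import mean


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a monomorphic SNP")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-SNP data: (masked true genotypes, imputed dosages).
snps = {
    "snp_a": ([0, 1, 2, 1, 0], [0.1, 0.9, 1.8, 1.2, 0.0]),
    "snp_b": ([2, 2, 1, 0, 1], [2.0, 2.0, 1.0, 0.0, 1.0]),
}
per_snp_r2 = {name: squared_correlation(t, d) for name, (t, d) in snps.items()}
mean_r2 = mean(per_snp_r2.values())
print(f"mean R2 = {mean_r2:.3f}")
```

Monomorphic SNPs raise an error rather than silently contributing a zero, since how to treat them is a choice that changes the mean.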
Affiliation(s)
- Christopher A Jenkins
  - Department of Veterinary Medicine, Kennel Club Genetics Centre, University of Cambridge, Cambridge, UK
  - Division of Population Health, Health Services Research & Primary Care, University of Manchester, Manchester, UK
- Ellen C Schofield
  - Department of Veterinary Medicine, Kennel Club Genetics Centre, University of Cambridge, Cambridge, UK
- Cathryn S Mellersh
  - Department of Veterinary Medicine, Kennel Club Genetics Centre, University of Cambridge, Cambridge, UK
- Luisa De Risio
  - Neurology/Neurosurgery Service, Centre for Small Animal Studies, Animal Health Trust, Newmarket, Suffolk, UK
- Sally L Ricketts
  - Department of Veterinary Medicine, Kennel Club Genetics Centre, University of Cambridge, Cambridge, UK
  - Division of Population Health, Health Services Research & Primary Care, University of Manchester, Manchester, UK
|
14
|
Charon C, Allodji R, Meyer V, Deleuze JF. Impact of pre- and post-variant filtration strategies on imputation. Sci Rep 2021; 11:6214. [PMID: 33737531 PMCID: PMC7973508 DOI: 10.1038/s41598-021-85333-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2020] [Accepted: 02/22/2021] [Indexed: 01/04/2023] Open
Abstract
Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1089 NCBI recorded individuals for additional validation. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the range of very rare (5E-04-1E-03) and rare variants (1E-03-5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) < 0.001 2.5 fold with or without QC pre-filtration and halved the number of very rare variants (5E-04). Thus, to maintain confidence and enough SNVs, we propose here a two-step filtering procedure which allows less stringent filtering prior to imputation and post-imputation in order to increase the number of very rare and rare variants compared to conservative filtration methods.
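The two-step idea described above can be sketched as a lenient filter before imputation followed by a quality-score filter afterwards. The example below is a minimal illustration rather than the authors' procedure. The `Variant` fields, the call-rate criterion, and its 0.90 cutoff are hypothetical, while the 0.3 and 0.8 quality-score cutoffs are the two values compared in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    maf: float        # minor allele frequency
    call_rate: float  # fraction of samples genotyped (pre-imputation)
    info: float       # imputation quality score (post-imputation)


def pre_filter(variants, min_call_rate=0.90):
    """Step 1 (lenient, assumed cutoff): drop only poorly genotyped variants."""
    return [v for v in variants if v.call_rate >= min_call_rate]


def post_filter(variants, min_info=0.3):
    """Step 2: keep variants whose imputation quality score meets the cutoff."""
    return [v for v in variants if v.info >= min_info]


# Hypothetical variants, several of them in the rare and very rare range.
variants = [
    Variant("v1", 0.0007, 0.99, 0.35),
    Variant("v2", 0.0030, 0.95, 0.85),
    Variant("v3", 0.2000, 0.85, 0.95),
    Variant("v4", 0.0008, 0.97, 0.75),
    Variant("v5", 0.0500, 0.99, 0.20),
]
kept_pre = pre_filter(variants)
kept_lenient = post_filter(kept_pre, min_info=0.3)
kept_strict = post_filter(kept_pre, min_info=0.8)
print(len(kept_lenient), "variants kept at 0.3;", len(kept_strict), "kept at 0.8")
```

In this toy set the stricter cutoff removes both very rare variants, which mirrors the trade-off the abstract describes between confidence and the number of rare variants retained.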
Affiliation(s)
- Céline Charon
  - CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
- Rodrigue Allodji
  - Radiation Epidemiology Group CESP, Inserm Unit 1018, Gustave Roussy Université Paris Saclay, 114 rue Edouard Vaillant, Villejuif, 94805, France
- Vincent Meyer
  - CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
- Jean-François Deleuze
  - CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
|
15
|
Genome-wide haplotype association study in imaging genetics using whole-brain sulcal openings of 16,304 UK Biobank subjects. Eur J Hum Genet 2021; 29:1424-1437. [PMID: 33664500 PMCID: PMC8440755 DOI: 10.1038/s41431-021-00827-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/17/2020] [Revised: 12/18/2020] [Accepted: 02/04/2021] [Indexed: 11/29/2022] Open
Abstract
Neuroimaging-genetics cohorts gather two types of data: brain imaging and genetic data. They allow the discovery of associations between genetic variants and brain imaging features, and they are invaluable resources for studying the influence of genetics and environment on the variance in brain features observed in normal and pathological populations. This study presents a genome-wide haplotype analysis of 123 brain sulcus opening values (a measure of sulcal width) across the whole brain in 16,304 subjects from UK Biobank. Using genetic maps, we defined 119,548 blocks of low recombination rate distributed along the 22 autosomal chromosomes and analyzed 1,051,316 haplotypes. To test associations between haplotypes and complex traits, we designed three statistical approaches. Two of them use a model that includes all the haplotypes for a single block, while the last approach considers each haplotype independently. All the statistics produced were assessed as rigorously as possible. Thanks to the rich imaging dataset at hand, we used resampling techniques to assess the false positive rate of each statistical approach in a genome-wide and brain-wide context. The results on real data show that genome-wide haplotype analyses are more sensitive than the single-SNP approach and account for local complex linkage disequilibrium (LD) structure, which makes genome-wide haplotype analysis an interesting and statistically sound alternative to its single-SNP counterpart.
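The abstract says only that blocks of low recombination were defined from genetic maps. One simple way to do this, sketched here purely as an assumption, is to walk along markers sorted by genetic-map position and start a new block whenever the distance to the previous marker exceeds a cutoff. The function name, the 0.001 cM cutoff, and the positions are hypothetical.

```python
def low_recombination_blocks(cm_positions, max_gap_cm=0.001):
    """Group consecutive markers into blocks, splitting wherever the
    genetic-map distance between neighbours exceeds max_gap_cm."""
    blocks = []
    current = []
    previous = None
    for index, position in enumerate(cm_positions):
        if previous is not None and position < previous:
            raise ValueError("positions must be sorted in ascending order")
        if previous is not None and position - previous > max_gap_cm:
            # The gap is too large: close the current block and open a new one.
            blocks.append(current)
            current = []
        current.append(index)
        previous = position
    if current:
        blocks.append(current)
    return blocks


# Hypothetical genetic-map positions (in cM) for six markers on one chromosome.
positions = [0.0000, 0.0004, 0.0009, 0.0100, 0.0105, 0.0500]
blocks = low_recombination_blocks(positions)
print(blocks)
```

Each block is returned as a list of marker indices, so the haplotypes within a block can then be enumerated from phased genotypes at those markers.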
|
16
|
Negisho K, Shibru S, Pillen K, Ordon F, Wehner G. Genetic diversity of Ethiopian durum wheat landraces. PLoS One 2021; 16:e0247016. [PMID: 33596260 PMCID: PMC7888639 DOI: 10.1371/journal.pone.0247016] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/06/2020] [Accepted: 01/30/2021] [Indexed: 01/27/2023] Open
Abstract
Genetic diversity and population structure assessment in crops is essential for marker trait association, marker assisted breeding and crop germplasm conservation. We analyzed a set of 285 durum wheat accessions comprising 215 Ethiopian durum wheat landraces, 10 released durum wheat varieties, 10 advanced durum wheat lines from Ethiopia, and 50 durum wheat lines from CIMMYT. We investigated the genetic diversity and population structure for the complete panel as well as for the 215 landraces separately, based on 11,919 SNP markers with known physical positions. The whole panel was clustered into two populations representing on the one hand mainly the landraces, and on the other hand mainly released, advanced and CIMMYT lines. Further population structure analysis of the landraces uncovered 4 subgroups, emphasizing the high degree of genetic diversity within Ethiopian durum landraces. Population structure based AMOVA for both sets unveiled significant (P < 0.001) variation between populations and within populations. Total variation within population accessions (81%, 76%) was higher than total variation between populations (19%, 24%) for both sets. In the population structure analysis, genetic differentiation (FST) was 0.19 for the whole set and 0.24 for the Ethiopian landraces, and gene flow (Nm) was 1.04 and 0.81, respectively, indicating high genetic differentiation and limited gene flow. Diversity indices verify that the landrace panel was more diverse (I = 0.7, He = 0.46, uHe = 0.46) than the advanced lines (I = 0.6, He = 0.42, uHe = 0.42). Similarly, differences within the landrace clusters were observed. In summary, a high genetic diversity within Ethiopian durum wheat landraces was detected, which may be a target for national and international wheat improvement programs to exploit valuable traits for biotic and abiotic stresses.
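The FST and Nm values reported above are linked by a standard approximation. Under Wright's island model, Nm is roughly (1 - FST) / (4 FST). The abstract does not state which formula the authors used, so the sketch below is an assumption, and the function name is hypothetical.

```python
def gene_flow_from_fst(fst: float) -> float:
    """Wright's island-model approximation: Nm = (1 - FST) / (4 * FST)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


# FST values as rounded in the abstract (whole panel, landraces only).
for label, fst in [("whole panel", 0.19), ("landraces", 0.24)]:
    print(f"{label}: FST = {fst:.2f}, Nm ~ {gene_flow_from_fst(fst):.2f}")
```

With the rounded FST values this gives Nm of about 1.07 and 0.79, close to the reported 1.04 and 0.81. The small gap is consistent with the published FST values having been rounded to two decimals.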
Affiliation(s)
- Kefyalew Negisho
  - Ethiopian Institute of Agricultural Research (EIAR), National Agricultural Biotechnology Research Center, Holeta, Ethiopia
- Surafel Shibru
  - Ethiopian Institute of Agricultural Research (EIAR), Melkassa Research Center, Melkassa, Ethiopia
- Klaus Pillen
  - Martin-Luther-University, Institute of Agricultural and Nutritional Sciences, Halle (Saale), Germany
- Frank Ordon
  - Julius Kühn Institute (JKI), Institute for Resistance Research and Stress Tolerance, Quedlinburg, Germany
- Gwendolin Wehner
  - Julius Kühn Institute (JKI), Institute for Resistance Research and Stress Tolerance, Quedlinburg, Germany
|
17
|
Scott MF, Ladejobi O, Amer S, Bentley AR, Biernaskie J, Boden SA, Clark M, Dell'Acqua M, Dixon LE, Filippi CV, Fradgley N, Gardner KA, Mackay IJ, O'Sullivan D, Percival-Alwyn L, Roorkiwal M, Singh RK, Thudi M, Varshney RK, Venturini L, Whan A, Cockram J, Mott R. Multi-parent populations in crops: a toolbox integrating genomics and genetic mapping with breeding. Heredity (Edinb) 2020; 125:396-416. [PMID: 32616877 PMCID: PMC7784848 DOI: 10.1038/s41437-020-0336-6] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 01/27/2020] [Revised: 06/16/2020] [Accepted: 06/16/2020] [Indexed: 11/21/2022] Open
Abstract
Crop populations derived from experimental crosses enable the genetic dissection of complex traits and support modern plant breeding. Among these, multi-parent populations now play a central role. By mixing and recombining the genomes of multiple founders, multi-parent populations combine many commonly sought beneficial properties of genetic mapping populations. For example, they have high power and resolution for mapping quantitative trait loci, high genetic diversity and minimal population structure. Many multi-parent populations have been constructed in crop species, and their inbred germplasm and associated phenotypic and genotypic data serve as enduring resources. Their utility has grown from being a tool for mapping quantitative trait loci to a means of providing germplasm for breeding programmes. Genomics approaches, including de novo genome assemblies and gene annotations for the population founders, have allowed the imputation of rich sequence information into the descendent population, expanding the breadth of research and breeding applications of multi-parent populations. Here, we report recent successes from multi-parent populations in crops. We also propose an ideal genotypic, phenotypic and germplasm 'package' that multi-parent populations should feature to optimise their use as powerful community resources for crop research, development and breeding.
Affiliation(s)
- Samer Amer
  - University of Reading, Reading, RG6 6AH, UK
  - Faculty of Agriculture, Alexandria University, Alexandria, 23714, Egypt
- Alison R Bentley
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Jay Biernaskie
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Scott A Boden
  - School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, 5064, Australia
- Laura E Dixon
  - Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
- Carla V Filippi
  - Instituto de Agrobiotecnología y Biología Molecular (IABIMO), INTA-CONICET, Nicolas Repetto y Los Reseros s/n, 1686, Hurlingham, Buenos Aires, Argentina
- Nick Fradgley
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Keith A Gardner
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Ian J Mackay
  - SRUC, West Mains Road, Kings Buildings, Edinburgh, EH9 3JG, UK
- Manish Roorkiwal
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Rakesh Kumar Singh
  - International Center for Biosaline Agriculture, Academic City, Dubai, United Arab Emirates
- Mahendar Thudi
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Rajeev Kumar Varshney
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Alex Whan
  - CSIRO, GPO Box 1700, Canberra, ACT, 2601, Australia
- James Cockram
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Richard Mott
  - UCL Genetics Institute, Gower Street, London, WC1E 6BT, UK
|
18
|
Cubry P, Pidon H, Ta KN, Tranchant-Dubreuil C, Thuillet AC, Holzinger M, Adam H, Kam H, Chrestin H, Ghesquière A, François O, Sabot F, Vigouroux Y, Albar L, Jouannic S. Genome Wide Association Study Pinpoints Key Agronomic QTLs in African Rice Oryza glaberrima. RICE (NEW YORK, N.Y.) 2020; 13:66. [PMID: 32936396 PMCID: PMC7494698 DOI: 10.1186/s12284-020-00424-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/08/2020] [Accepted: 08/31/2020] [Indexed: 05/08/2023]
Abstract
BACKGROUND: African rice, Oryza glaberrima, is an invaluable resource for rice cultivation and for the improvement of biotic and abiotic resistance properties. Since its domestication in the inner Niger delta ca. 2500 years BP, African rice has colonized a variety of ecologically and climatically diverse regions. However, little is known about the genetic basis of quantitative traits and adaptive variation of agricultural interest for this species.
RESULTS: Using a reference set of 163 fully re-sequenced accessions, we report the results of a Genome Wide Association Study carried out for African rice. We investigated a diverse panel of traits, including flowering date, panicle architecture and resistance to Rice yellow mottle virus. For this, we devised a pipeline using complementary statistical association methods. First, using flowering time as a target trait, we found several association peaks, one of which co-localised with a well described gene in the Asian rice flowering pathway, OsGi, and identified new genomic regions that deserve further study. Then we applied our pipeline to panicle- and resistance-related traits, highlighting some interesting genomic regions and candidate genes. Lastly, using a high-resolution climate database, we performed an association analysis based on climatic variables, searching for genomic regions that might be involved in adaptation to climatic variations.
CONCLUSION: Our results collectively provide insights into the extent to which adaptive variation is governed by sequence diversity within the O. glaberrima genome, paving the way for in-depth studies of the genetic basis of traits of interest that might be useful to the rice breeding community.
Affiliation(s)
- Hélène Pidon
  - DIADE, Univ Montpellier, IRD, Montpellier, France
  - Present address: Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Kim Nhung Ta
  - LMI RICE, AGI, IRD, Univ Montpellier, CIRAD, USTH, Hanoi, Vietnam
  - Present address: National Institute of Genetics, Mishima, Shizuoka, Japan
- Hélène Adam
  - DIADE, Univ Montpellier, IRD, Montpellier, France
- Olivier François
  - Université Grenoble-Alpes, Centre National de la Recherche Scientifique, Grenoble, France
- Stefan Jouannic
  - DIADE, Univ Montpellier, IRD, Montpellier, France
  - LMI RICE, AGI, IRD, Univ Montpellier, CIRAD, USTH, Hanoi, Vietnam
|
19
|
Soifer L, Fong NL, Yi N, Ireland AT, Lam I, Sooknah M, Paw JS, Peluso P, Concepcion GT, Rank D, Hastie AR, Jojic V, Ruby JG, Botstein D, Roy MA. Fully Phased Sequence of a Diploid Human Genome Determined de Novo from the DNA of a Single Individual. G3 (BETHESDA, MD.) 2020; 10:2911-2925. [PMID: 32631951 PMCID: PMC7466960 DOI: 10.1534/g3.119.400995] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/12/2019] [Accepted: 06/26/2020] [Indexed: 12/17/2022]
Abstract
In recent years, improved sequencing technology and computational tools have made de novo genome assembly more accessible. Many approaches, however, generate either an unphased or only partially resolved representation of a diploid genome, in which polymorphisms are detected but not assigned to one or the other of the homologous chromosomes. Yet chromosomal phase information is invaluable for the understanding of phenotypic trait inheritance in the cases of compound heterozygosity, allele-specific expression or cis-acting variants. Here we use a combination of tools and sequencing technologies to generate a de novo diploid assembly of the human primary cell line WI-38. First, data from PacBio single molecule sequencing and Bionano Genomics optical mapping were combined to generate an unphased assembly. Next, 10x Genomics linked reads were combined with the hybrid assembly to generate a partially phased assembly. Lastly, we developed and optimized methods to use short-read (Illumina) sequencing of flow cytometry-sorted metaphase chromosomes to provide phase information. The final genome assembly was almost fully (94%) phased with the addition of approximately 2.5-fold coverage of Illumina data from the sequenced metaphase chromosomes. The diploid nature of the final de novo genome assembly improved the resolution of structural variants between the WI-38 genome and the human reference genome. The phased WI-38 sequence data are available for browsing and download at wi38.research.calicolabs.com. Our work shows that assembling a completely phased diploid genome de novo from the DNA of a single individual is now readily achievable.
Affiliation(s)
- Llya Soifer
  - Calico Life Sciences LLC, South San Francisco, CA 94080
- Nicole L Fong
  - Calico Life Sciences LLC, South San Francisco, CA 94080
- Nelda Yi
  - Calico Life Sciences LLC, South San Francisco, CA 94080
- Irene Lam
  - Calico Life Sciences LLC, South San Francisco, CA 94080
- David Rank
  - Pacific Biosciences, Menlo Park, CA 94025
- J Graham Ruby
  - Calico Life Sciences LLC, South San Francisco, CA 94080
|
20
|
Akdemir D, Knox R, Isidro y Sánchez J. Combining Partially Overlapping Multi-Omics Data in Databases Using Relationship Matrices. FRONTIERS IN PLANT SCIENCE 2020; 11:947. [PMID: 32765543 PMCID: PMC7381228 DOI: 10.3389/fpls.2020.00947] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2020] [Accepted: 06/10/2020] [Indexed: 05/08/2023]
Abstract
Private and public breeding programs, as well as companies and universities, have developed different genomics technologies that have resulted in the generation of unprecedented amounts of sequence data, which bring new challenges in terms of data management, query, and analysis. The magnitude and complexity of these datasets also offer an opportunity to use the available data as a whole. Detailed phenotype data, combined with increasing amounts of genomic data, have an enormous potential to accelerate the identification of key traits to improve our understanding of quantitative genetics. Data harmonization enables cross-national and international comparative research, facilitating the extraction of new scientific knowledge. In this paper, we address the complex issue of combining high dimensional and unbalanced omics data. More specifically, we propose a covariance-based method for combining partial datasets in the genotype to phenotype spectrum. This method can be used to combine partially overlapping relationship/covariance matrices. Here, we show with applications that our approach might be advantageous compared with feature imputation-based approaches; we demonstrate how this method can be used in genomic prediction using heterogeneous marker data and also how to combine the data from multiple phenotypic experiments to make inferences about previously unobserved trait relationships. Our results demonstrate that it is possible to harmonize datasets to improve available information across gene-banks, data repositories, or other data resources.
Affiliation(s)
- Deniz Akdemir
  - Agriculture & Food Science Centre, Animal and Crop Science Division, University College Dublin, Dublin, Ireland
- Ron Knox
  - SCRDC-CRDSW, Swift Current Research and Developmental Centre, Swift Current, SK, Canada
- Julio Isidro y Sánchez
  - Agriculture & Food Science Centre, Animal and Crop Science Division, University College Dublin, Dublin, Ireland
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM – INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, Madrid, Spain
|
21
|
Magdy T, Kuo HH, Burridge PW. Precise and Cost-Effective Nanopore Sequencing for Post-GWAS Fine-Mapping and Causal Variant Identification. iScience 2020; 23:100971. [PMID: 32203907 PMCID: PMC7096756 DOI: 10.1016/j.isci.2020.100971] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/30/2019] [Revised: 01/13/2020] [Accepted: 03/05/2020] [Indexed: 01/01/2023] Open
Abstract
Fine-mapping of interesting loci discovered by genome-wide association study (GWAS) is mandatory to pinpoint causal variants. Traditionally, this fine-mapping is completed through increasing the genotyping density at candidate loci, for which imputation is the current standard approach. Although imputation is a useful technique, it has a number of limitations that impede accuracy. In this work, we describe the development of a precise and cost-effective Nanopore sequencing-based pipeline that provides comprehensive and accurate information at candidate loci to identify potential causal single-nucleotide polymorphisms (SNPs). We demonstrate the utility of this technique via the fine-mapping of a GWAS positive hit comprising a synonymous SNP that is associated with doxorubicin-induced cardiotoxicity. In this work, we provide a proof of principle for the application of Nanopore sequencing in post-GWAS fine-mapping and pinpointing of potential causal SNPs with a minimal cost of just ~$10/100 kb/sample.
Affiliation(s)
- Tarek Magdy
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
  - Center for Pharmacogenomics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Hui-Hsuan Kuo
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
  - Center for Pharmacogenomics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Paul W Burridge
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
  - Center for Pharmacogenomics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
|
22
|
Zhang F, Wang Y, Mukiibi R, Chen L, Vinsky M, Plastow G, Basarab J, Stothard P, Li C. Genetic architecture of quantitative traits in beef cattle revealed by genome wide association studies of imputed whole genome sequence variants: I: feed efficiency and component traits. BMC Genomics 2020; 21:36. [PMID: 31931702 PMCID: PMC6956504 DOI: 10.1186/s12864-019-6362-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 06/21/2019] [Accepted: 12/02/2019] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND Genome wide association studies (GWAS) on residual feed intake (RFI) and its component traits including daily dry matter intake (DMI), average daily gain (ADG), and metabolic body weight (MWT) were conducted in a population of 7573 animals from multiple beef cattle breeds based on 7,853,211 imputed whole genome sequence variants. The GWAS results were used to elucidate genetic architectures of the feed efficiency related traits in beef cattle. RESULTS The DNA variant allele substitution effects approximated a bell-shaped distribution for all the traits while the distribution of additive genetic variances explained by single DNA variants followed a scaled inverse chi-squared distribution to a greater extent. With a threshold of P-value < 1.00E-05, 16, 72, 88, and 116 lead DNA variants on multiple chromosomes were significantly associated with RFI, DMI, ADG, and MWT, respectively. In addition, lead DNA variants with potentially large pleiotropic effects on DMI, ADG, and MWT were found on chromosomes 6, 14 and 20. On average, missense, 3'UTR, 5'UTR, and other regulatory region variants exhibited larger allele substitution effects in comparison to other functional classes. Intergenic and intron variants captured smaller proportions of additive genetic variance per DNA variant. Instead 3'UTR and synonymous variants explained a greater amount of genetic variance per DNA variant for all the traits examined while missense, 5'UTR and other regulatory region variants accounted for relatively more additive genetic variance per sequence variant for RFI and ADG, respectively. In total, 25 to 27 enriched cellular and molecular functions were identified with lipid metabolism and carbohydrate metabolism being the most significant for the feed efficiency traits. CONCLUSIONS RFI is controlled by many DNA variants with relatively small effects whereas DMI, ADG, and MWT are influenced by a few DNA variants with large effects and many DNA variants with small effects. 
Nucleotide polymorphisms in regulatory region and synonymous functional classes play a more important role per sequence variant in determining variation of the feed efficiency traits. The genetic architecture as revealed by the GWAS of the imputed 7,853,211 DNA variants will improve our understanding of the genetic control of feed efficiency traits in beef cattle.
Affiliation(s)
- Feng Zhang
- Lacombe Research and Development Centre, Agriculture and Agri-Food Canada, Lacombe, AB, Canada; Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada; State Key Laboratory for Swine Genetics, Breeding and Production Technology, Jiangxi Agricultural University, Nanchang, Jiangxi, China; Present Address: Institute of Translational Medicine, Nanchang University, Nanchang, Jiangxi, China
| | - Yining Wang
- Lacombe Research and Development Centre, Agriculture and Agri-Food Canada, Lacombe, AB, Canada; Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - Robert Mukiibi
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - Liuhong Chen
- Lacombe Research and Development Centre, Agriculture and Agri-Food Canada, Lacombe, AB, Canada; Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - Michael Vinsky
- Lacombe Research and Development Centre, Agriculture and Agri-Food Canada, Lacombe, AB, Canada
| | - Graham Plastow
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - John Basarab
- Alberta Agriculture and Forestry, Lacombe Research and Development Centre, 6000 C&E Trail, Lacombe, AB, Canada
| | - Paul Stothard
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - Changxi Li
- Lacombe Research and Development Centre, Agriculture and Agri-Food Canada, Lacombe, AB, Canada; Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| |
|
23
|
Sure independence screening in the presence of missing data. Stat Pap (Berl) 2019. [DOI: 10.1007/s00362-019-01115-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
24
|
Statistical methods for genome-wide association studies. Semin Cancer Biol 2019; 55:53-60. [DOI: 10.1016/j.semcancer.2018.04.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 12/02/2017] [Revised: 04/27/2018] [Accepted: 04/28/2018] [Indexed: 12/12/2022]
|
25
|
Ghoreishifar SM, Moradi-Shahrbabak H, Moradi-Shahrbabak M, Nicolazzi EL, Williams JL, Iamartino D, Nejati-Javaremi A. Accuracy of imputation of single-nucleotide polymorphism marker genotypes for water buffaloes (Bubalus bubalis) using different reference population sizes and imputation tools. Livest Sci 2018. [DOI: 10.1016/j.livsci.2018.08.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023]
|
26
|
Wu Y, Hormozdiari F, Joo JWJ, Eskin E. Improving Imputation Accuracy by Inferring Causal Variants in Genetic Studies. J Comput Biol 2018; 26:1203-1213. [PMID: 30272994 DOI: 10.1089/cmb.2018.0139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Genotype imputation has been widely utilized for two reasons in the analysis of genome-wide association studies (GWAS). One reason is to increase the power for association studies when causal single nucleotide polymorphisms are not collected in the GWAS. The second reason is to aid the interpretation of a GWAS result by predicting the association statistics at untyped variants. In this article, we show that prediction of association statistics at untyped variants that have an influence on the trait is overly conservative. Current imputation methods assume that none of the variants in a region (a locus consisting of multiple variants) affect the trait, which is often inconsistent with the observed data. In this article, we propose a new method, CAUSAL-Imp, which can impute the association statistics at untyped variants while taking into account variants in the region that may affect the trait. Our method builds on recent methods that impute the marginal statistics for GWAS by utilizing the fact that marginal statistics follow a multivariate normal distribution. We utilize both simulated and real data sets to assess the performance of our method. We show that traditional imputation approaches underestimate the association statistics for variants involved in the trait, and our results demonstrate that our approach provides less biased estimates of these association statistics.
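The multivariate-normal idea mentioned in this abstract can be illustrated with a minimal sketch. It shows only the conventional null-model imputation that the abstract describes as conservative, not CAUSAL-Imp itself, whose details are not given here. The function names, the `ridge` parameter, and the toy LD values are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def impute_z(z_typed, ld_typed, ld_untyped_typed, ridge=0.0):
    """Conditional mean and variance of an untyped z-score given typed z-scores.

    Assumes z-scores are jointly normal with the LD (correlation) matrix as
    covariance and that no variant affects the trait (the null model).
    `ridge` is a hypothetical regularisation term added to the diagonal.
    """
    n = len(z_typed)
    sigma_tt = [[ld_typed[i][j] + (ridge if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    weights = solve(sigma_tt, ld_untyped_typed)  # Sigma_tt^{-1} Sigma_tu
    mean = sum(w * z for w, z in zip(weights, z_typed))
    var = 1.0 - sum(w * s for w, s in zip(weights, ld_untyped_typed))
    return mean, var


# One typed variant with z = 5 and LD correlation 0.8 to the untyped variant.
mean, var = impute_z([5.0], [[1.0]], [0.8])
```

In this toy case the imputed statistic is pulled toward zero relative to the typed one, which is the kind of shrinkage the abstract refers to when the untyped variant itself influences the trait.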
Affiliation(s)
- Yue Wu
- Department of Computer Science, University of California Los Angeles, Los Angeles, California
| | - Farhad Hormozdiari
- Department of Computer Science, University of California Los Angeles, Los Angeles, California; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts
| | - Jong Wha J Joo
- Department of Computer Science and Engineering, Dongguk University, Seoul, South Korea
| | - Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, Los Angeles, California; Department of Human Genetics, University of California Los Angeles, Los Angeles, California
| |
|
27
|
Song Q, Xu W, Li W, He S, Liu J, Wang G, Ma L. Accurate haplotype imputation with individualized ancestry-adjusted reference panels. Genomics 2018; 110:329-335. [PMID: 29198611 DOI: 10.1016/j.ygeno.2017.11.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/18/2017] [Revised: 11/01/2017] [Accepted: 11/29/2017] [Indexed: 11/23/2022]
Abstract
Accurate data imputation requires ethnicity-matched reference panels. However, recent admixtures have created mosaic human genomes in which different chromosomal segments have different ethnic backgrounds, so it is impossible for a single-ethnicity reference panel to be matched for data imputation. In this study, we explored a novel strategy for imputation. We created an individualized mosaic reference panel for each person according to his/her ethnic backgrounds at each genomic locus. We tested it on datasets with 70% missing values on haplotypes and 50% missing values on genotypes. Results showed that imputation with mosaic references steadily yielded high imputation accuracy and outperformed the other strategies. With the mosaic reference panels, the imputation accuracy was 98.8±0.1% (CEU), 98.7±0.1% (YRI), 98.5±0.1% (CHB), 98.6±0.1% (ASW), 97.3±0.1% (MKK) and 98.2±0.1% (MXL). Mosaic reference panels will be one option for future missing-value imputation in the big data era.
Affiliation(s)
- Qing Song
- Center of Big Data and Bioinformatics, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, No. 277 Yanta Xi Street, Xi'an, Shaanxi 710061, China; Cardiovascular Research Institute, Department of Medicine, Morehouse School of Medicine, 720 Westview Drive SW, Atlanta, GA 30310, USA; 4DGENOME, 2360 Elon Way, Decatur, GA 30033, USA; Shapiro Cardiovascular Center, Brigham and Women's Hospital, Harvard Medical School, 75 Francis St., Boston, MA 02115, USA.
| | - Wei Xu
- Center of Big Data and Bioinformatics, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, No. 277 Yanta Xi Street, Xi'an, Shaanxi 710061, China; Cardiovascular Research Institute, Department of Medicine, Morehouse School of Medicine, 720 Westview Drive SW, Atlanta, GA 30310, USA
| | - Wenzhi Li
- Center of Big Data and Bioinformatics, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, No. 277 Yanta Xi Street, Xi'an, Shaanxi 710061, China; Cardiovascular Research Institute, Department of Medicine, Morehouse School of Medicine, 720 Westview Drive SW, Atlanta, GA 30310, USA
| | - Shaohua He
- 4DGENOME, 2360 Elon Way, Decatur, GA 30033, USA
| | - Jiankang Liu
- Shapiro Cardiovascular Center, Brigham and Women's Hospital, Harvard Medical School, 75 Francis St., Boston, MA 02115, USA
| | - Guangming Wang
- 4DGENOME, 2360 Elon Way, Decatur, GA 30033, USA; Genetic Test Center, First Affiliated Hospital of Dali University, Dali City, Yunnan 671000, China.
| | - Li Ma
- Cardiovascular Research Institute, Department of Medicine, Morehouse School of Medicine, 720 Westview Drive SW, Atlanta, GA 30310, USA; 4DGENOME, 2360 Elon Way, Decatur, GA 30033, USA.
| |
|
28
|
Abstract
Genotype imputation has become a standard tool in genome-wide association studies because it enables researchers to inexpensively approximate whole-genome sequence data from genome-wide single-nucleotide polymorphism array data. Genotype imputation increases statistical power, facilitates fine mapping of causal variants, and plays a key role in meta-analyses of genome-wide association studies. Only variants that were previously observed in a reference panel of sequenced individuals can be imputed. However, the rapid increase in the number of deeply sequenced individuals will soon make it possible to assemble enormous reference panels that greatly increase the number of imputable variants. In this review, we present an overview of genotype imputation and describe the computational techniques that make it possible to impute genotypes from reference panels with millions of individuals.
Affiliation(s)
- Sayantan Das
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109-2029, USA
| | - Gonçalo R Abecasis
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109-2029, USA
| | - Brian L Browning
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington 98195-7720, USA
| |
|
29
|
Liao B, Wang X, Zhu W, Li X, Cai L, Chen H. New multilocus linkage disequilibrium measure for tag SNP selection. J Bioinform Comput Biol 2017; 15:1750001. [DOI: 10.1142/s0219720017500019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Numerous approaches have been proposed for selecting an optimal tag single-nucleotide polymorphism (SNP) set. Most of these approaches are based on linkage disequilibrium (LD). Classical LD measures, such as D′ and r², are frequently used to quantify pairwise linkage disequilibrium between two markers. Despite their successful use in many applications, these measures cannot be used to measure the LD among multiple markers. These LD measures need information about the frequencies of alleles collected from a haplotype dataset. In this study, a clustering algorithm is proposed to cluster SNPs according to a multilocus LD measure that is based on information theory. After that, tag SNPs are selected in each cluster, optimized by the number of tag SNPs, prediction accuracy and so on. The experimental results show that this new LD measure can be directly applied to genotype datasets collected from the HapMap project, so that it saves the cost of haplotyping. More importantly, the proposed method significantly improves the efficiency and prediction accuracy of tag SNP selection.
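For readers unfamiliar with the classical pairwise measures named in this abstract, the following is a minimal sketch of how D, |D′| and r² are computed from haplotype and allele frequencies. It covers only the standard two-marker definitions, not the multilocus information-theoretic measure proposed by the authors. The function name and example frequencies are illustrative assumptions.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and B at marker 2
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # D is normalised by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Allele A (frequency 0.3) always occurs on haplotypes carrying B (frequency 0.6).
d, d_prime, r2 = pairwise_ld(0.3, 0.3, 0.6)
```

The inputs are haplotype frequencies, which is the dependence on phased data that the abstract points out as a limitation of these measures.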
Affiliation(s)
- Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Xiangjun Wang
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Wen Zhu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Xiong Li
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Haowen Chen
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| |
|
30
|
|
31
|
Abstract
The Hardy-Weinberg principle, one of the most important principles in population genetics, was originally developed for the study of allele frequency changes in a population over generations. It is now, however, widely used in studies of human diseases to detect inbreeding, population stratification, and genotyping errors. For assessment of deviation from Hardy-Weinberg proportions in data, the most popular approaches include the asymptotic Pearson's chi-squared goodness-of-fit test and the exact test. Pearson's chi-squared goodness-of-fit test is simple and straightforward, but is very sensitive to a small sample size or rare allele frequency. The exact test of Hardy-Weinberg proportions is preferable in these situations. The exact test can be performed through complete enumeration of heterozygote genotypes or on the basis of the Markov chain Monte Carlo procedure. In this chapter, we describe the Hardy-Weinberg principle and the commonly used Hardy-Weinberg proportion tests and their applications, and we demonstrate how the chi-squared test and exact test of Hardy-Weinberg proportions can be performed step-by-step using the popular software programs SAS, R, and PLINK, which have been widely used in genetic association studies, along with numerical examples. We also discuss approaches for testing Hardy-Weinberg proportions in case-control study designs that are better than traditional approaches for testing Hardy-Weinberg proportions in controls only. Finally, we note that deviation from the Hardy-Weinberg proportions in affected individuals can provide evidence for an association between genetic variants and diseases.
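As a rough illustration of the asymptotic test described in this abstract, the sketch below computes Pearson's chi-squared goodness-of-fit statistic for Hardy-Weinberg proportions at a biallelic marker. It is not taken from the chapter, which works through SAS, R, and PLINK; the function name and the genotype counts are illustrative assumptions. The p-value uses one degree of freedom, for which the chi-squared tail probability equals `erfc(sqrt(x / 2))`.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-squared test of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi-squared statistic, asymptotic p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 degree of freedom
    return chi2, p_value


# Hypothetical counts with a deficit of heterozygotes relative to expectation.
chi2, p_value = hwe_chi_square(30, 40, 30)
```

As the abstract notes, this asymptotic approximation is unreliable for small samples or rare alleles, where an exact test is preferable.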
Affiliation(s)
- Jian Wang
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Sanjay Shete
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
| |
|
32
|
Xiang T, Christensen OF, Vitezica ZG, Legarra A. Genomic evaluation by including dominance effects and inbreeding depression for purebred and crossbred performance with an application in pigs. Genet Sel Evol 2016; 48:92. [PMID: 27887565 PMCID: PMC5123321 DOI: 10.1186/s12711-016-0271-4] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 04/11/2016] [Accepted: 11/15/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Improved performance of crossbred animals is partly due to heterosis. One of the major genetic bases of heterosis is dominance, but it is seldom used in pedigree-based genetic evaluation of livestock. Recently, a trivariate genomic best linear unbiased prediction (GBLUP) model including dominance was developed, which can distinguish purebreds from crossbred animals explicitly. The objectives of this study were: (1) methodological, to show that inclusion of marker-based inbreeding accounts for directional dominance and inbreeding depression in purebred and crossbred animals, to revisit variance components of additive and dominance genetic effects using this model, and to develop marker-based estimators of genetic correlations between purebred and crossbred animals and of correlations of allele substitution effects between breeds; (2) to evaluate the impact of accounting for dominance effects and inbreeding depression on predictive ability for total number of piglets born (TNB) in a pig dataset composed of two purebred populations and their crossbreds. We also developed an equivalent model that makes the estimation of variance components tractable. RESULTS For TNB in Danish Landrace and Yorkshire populations and their reciprocal crosses, the estimated proportions of dominance genetic variance to additive genetic variance ranged from 5 to 11%. Genetic correlations between breeding values for purebred and crossbred performances for TNB ranged from 0.79 to 0.95 for Landrace and from 0.43 to 0.54 for Yorkshire across models. The estimated correlation of allele substitution effects between Landrace and Yorkshire was low for purebred performances, but high for crossbred performances. Predictive ability for crossbred animals was similar with or without dominance. 
The inbreeding depression effect increased predictive ability and the estimated inbreeding depression parameter was more negative for Landrace than for Yorkshire animals and was in between for crossbred animals. CONCLUSIONS Methodological developments led to closed-form estimators of inbreeding depression, variance components and correlations that can be easily interpreted in a quantitative genetics context. Our results confirm that genetic correlations of breeding values between purebred and crossbred performances within breed are positive and moderate. Inclusion of dominance in the GBLUP model does not improve predictive ability for crossbred animals, whereas inclusion of inbreeding depression does.
Affiliation(s)
- Tao Xiang
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark; UR1388 GenPhySE, INRA, CS-52627, 31326, Castanet-Tolosan, France.
| | - Ole Fredslund Christensen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark
| | | | - Andres Legarra
- UR1388 GenPhySE, INRA, CS-52627, 31326, Castanet-Tolosan, France
| |
|
33
|
SparRec: An effective matrix completion framework of missing data imputation for GWAS. Sci Rep 2016; 6:35534. [PMID: 27762341 PMCID: PMC5071878 DOI: 10.1038/srep35534] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/31/2016] [Accepted: 09/30/2016] [Indexed: 11/08/2022] Open
Abstract
Genome-wide association studies present computational challenges for missing data imputation, while advances in genotyping technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework, SparRec (Sparse Recovery), for imputation, with the following properties: (1) The optimization models of SparRec, based on low rank and a low number of co-clusters of matrices, are different from current statistical methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, like other matrix completion methods, can be flexibly applied to missing data imputation for large meta-analyses with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics-based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art statistical methods including Beagle and fastPhase.
|
34
|
Brandariz SP, González Reymúndez A, Lado B, Malosetti M, Garcia AAF, Quincke M, von Zitzewitz J, Castro M, Matus I, del Pozo A, Castro AJ, Gutiérrez L. Ascertainment bias from imputation methods evaluation in wheat. BMC Genomics 2016; 17:773. [PMID: 27716058 PMCID: PMC5050639 DOI: 10.1186/s12864-016-3120-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/24/2016] [Accepted: 09/23/2016] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Whole-genome genotyping techniques like Genotyping-by-sequencing (GBS) are being used for genetic studies such as Genome-Wide Association (GWAS) and Genomewide Selection (GS), where different strategies for imputation have been developed. Nevertheless, imputation error may lead to poor performance (i.e. smaller power or higher false positive rate) when complete data are not required, as is the case for GWAS, where markers are taken one at a time. The aim of this study was to compare the performance of GWAS analysis for Quantitative Trait Loci (QTL) of major and minor effect using different imputation methods when no reference panel is available in a wheat GBS panel. RESULTS In this study, we compared the power and false positive rate of dissecting quantitative traits for imputed and not-imputed marker score matrices in: (1) a complete molecular marker barley panel array, and (2) a GBS wheat panel with missing data. We found that there is an ascertainment bias in imputation method comparisons. Simulating over a complete matrix and creating missing data at random proved that imputation methods have a poorer performance. Furthermore, we found that when QTL were simulated with imputed data, the imputation methods performed better than the not-imputed ones. On the other hand, when QTL were simulated with not-imputed data, the not-imputed method and one of the imputation methods performed better for dissecting quantitative traits. Moreover, larger differences between imputation methods were detected for QTL of major effect than QTL of minor effect. We also compared the different marker score matrices for GWAS analysis in a real wheat phenotype dataset, and we found minimal differences indicating that imputation did not improve the GWAS performance when a reference panel was not available. CONCLUSIONS Poorer GWAS performance was found when an imputed marker score matrix was used and no reference panel was available in a wheat GBS panel.
Affiliation(s)
- Sofía P. Brandariz
- Statistics Department, Facultad de Agronomía, Universidad de la República, Garzón 780, Montevideo, 12900 Uruguay
| | - Agustín González Reymúndez
- Statistics Department, Facultad de Agronomía, Universidad de la República, Garzón 780, Montevideo, 12900 Uruguay
| | - Bettina Lado
- Statistics Department, Facultad de Agronomía, Universidad de la República, Garzón 780, Montevideo, 12900 Uruguay
| | - Marcos Malosetti
- Biometris - Applied Statistics, Department of Plant Science, Wageningen University and Research Center, P.O. Box 16, 6700 AA Wageningen, Netherlands
| | - Antonio Augusto Franco Garcia
- Departamento de Ciências Exatas, Escola Superior de Agricultura “Luiz de Queiroz” (ESALQ), Universidade de São Paulo (USP), CP 9, CEP 13400-970 Piracicaba, SP Brazil
| | - Martín Quincke
- Programa Nacional de Investigación Cultivos de Secano, Instituto Nacional de investigación Agropecuaria, Est. Exp. La Estanzuela, Colonia, 70000 Uruguay
| | | | - Marina Castro
- Programa Nacional de Investigación Cultivos de Secano, Instituto Nacional de investigación Agropecuaria, Est. Exp. La Estanzuela, Colonia, 70000 Uruguay
| | - Iván Matus
- Instituto de Investigaciones Agropecuarias, Centro Regional de Investigación Quilamapu, Casilla 426, Chillán, Chile
| | - Alejandro del Pozo
- Facultad de Ciencias Agrarias, Universidad de Talca, Casilla 747, Talca, Chile
| | - Ariel J. Castro
- Department of Plant Production, Facultad de Agronomía, Universidad de la República, Ruta 3, Km.363, Paysandú, 60000 Uruguay
| | - Lucía Gutiérrez
- Statistics Department, Facultad de Agronomía, Universidad de la República, Garzón 780, Montevideo, 12900 Uruguay
- Department of Agronomy, University of Wisconsin-Madison, 1575 Linden Dr, Madison, WI 53706 USA
| |
|
35
|
Grünwald NJ, McDonald BA, Milgroom MG. Population Genomics of Fungal and Oomycete Pathogens. ANNUAL REVIEW OF PHYTOPATHOLOGY 2016; 54:323-46. [PMID: 27296138 DOI: 10.1146/annurev-phyto-080614-115913] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Indexed: 05/03/2023]
Abstract
We are entering a new era in plant pathology in which whole-genome sequences of many individuals of a pathogen species are becoming readily available. Population genomics aims to discover genetic mechanisms underlying phenotypes associated with adaptive traits such as pathogenicity, virulence, fungicide resistance, and host specialization, as genome sequences or large numbers of single nucleotide polymorphisms become readily available from multiple individuals of the same species. This emerging field encompasses detailed genetic analyses of natural populations, comparative genomic analyses of closely related species, identification of genes under selection, and linkage analyses involving association studies in natural populations or segregating populations resulting from crosses. The era of pathogen population genomics will provide new opportunities and challenges, requiring new computational and analytical tools. This review focuses on conceptual and methodological issues as well as the approaches to answering questions in population genomics. The major steps start with defining relevant biological and evolutionary questions, followed by sampling, genotyping, and phenotyping, and ending in analytical methods and interpretations. We provide examples of recent applications of population genomics to fungal and oomycete plant pathogens.
Affiliation(s)
- Niklaus J Grünwald
- Horticultural Crops Research Laboratory, USDA Agricultural Research Service, Corvallis, Oregon 97330
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331
- Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, New York 14853
| | - Bruce A McDonald
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
| | - Michael G Milgroom
- Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, New York 14853
| |
|
36
|
Hormozdiari F, Kang EY, Bilow M, Ben-David E, Vulpe C, McLachlan S, Lusis AJ, Han B, Eskin E. Imputing Phenotypes for Genome-wide Association Studies. Am J Hum Genet 2016; 99:89-103. [PMID: 27292110 PMCID: PMC5005435 DOI: 10.1016/j.ajhg.2016.04.013] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 01/20/2016] [Accepted: 04/28/2016] [Indexed: 01/23/2023] Open
Abstract
Genome-wide association studies (GWASs) have been successful in detecting variants correlated with phenotypes of clinical interest. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected, and for phenotypes that are difficult to collect, the sample size might be insufficient to achieve the desired statistical power. The phenotype of interest is often difficult to collect, whereas surrogate phenotypes or related phenotypes are easier to collect and have already been collected in very large samples. This paper demonstrates how we take advantage of these additional related phenotypes to impute the phenotype of interest or target phenotype and then perform association analysis. Our approach leverages the correlation structure between phenotypes to perform the imputation. The correlation structure can be estimated from a smaller complete dataset for which both the target and related phenotypes have been collected. Under some assumptions, the statistical power can be computed analytically given the correlation structure of the phenotypes used in imputation. In addition, our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes. Thus, our method is applicable to datasets for which we have access only to summary statistics and not to the raw genotypes. We illustrate our approach by analyzing loci associated with triglycerides (TGs), body mass index (BMI), and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset.
Affiliation(s)
- Farhad Hormozdiari
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Eun Yong Kang
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Michael Bilow
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Eyal Ben-David
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Chris Vulpe
- Department of Nutritional Science and Toxicology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Stela McLachlan
- Centre for Population Health Sciences, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh EH8 9AG, UK
| | - Aldons J Lusis
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Buhm Han
- Department of Convergence Medicine, University of Ulsan College of Medicine & Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Republic of Korea.
| | - Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA.
| |
|
37
|
Palmer C, Pe’er I. Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation. PLoS Genet 2016; 12:e1006091. [PMID: 27310603 PMCID: PMC4910998 DOI: 10.1371/journal.pgen.1006091] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/12/2015] [Accepted: 05/09/2016] [Indexed: 11/22/2022] Open
Abstract
Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data. Genetic research has been focused on the analysis of datapoints that are assumed to be deterministically known. However, the majority of current, high throughput data is only probabilistically known, and proper methods for handling such uncertain genotypes are limited. Here, we build on existing theory from the field of statistics to introduce a general framework for handling probabilistic genotype data obtained through genotype imputation. This framework, called Multiple Imputation, matches or improves upon existing methods for handling uncertainty in basic analysis of genetic association. 
As opposed to such methods, our work furthermore extends to more advanced analysis, such as mixed-effects models, with no additional complication. Importantly, it generates posterior probabilities of association that are intrinsically weighted by the certainty of the underlying data, a feature unmatched by other existing methods. Multiple Imputation is also fully compatible with meta-analysis. Finally, our analysis of probabilistic genotype data brings into focus the accuracy and unreliability of imputation’s estimated probabilities. Taken together, these results substantially increase the utility of imputed genotypes in statistical genetics, and may have strong implications for analysis of sequencing data moving forward.
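As a rough illustration of the Multiple Imputation idea described above, per-imputation association estimates are conventionally pooled with Rubin's rules from the general MI literature. The abstract does not state which pooling procedure the authors used, so the sketch below is an assumption: the function name `pool_rubin` and its inputs (one effect estimate and one standard error per imputed dataset) are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool per-imputation effect estimates with Rubin's rules.

    estimates  -- one effect estimate per imputed dataset (hypothetical input)
    std_errors -- the matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching standard errors")

    pooled = mean(estimates)                     # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # average within-imputation variance
    between = variance(estimates)                # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between       # total variance under Rubin's rules
    return pooled, total ** 0.5
```

The between-imputation term is what carries genotype uncertainty into the final standard error: a SNP whose imputed genotypes vary widely across imputations gets a wider pooled interval than one that is imputed consistently.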
Affiliation(s)
- Cameron Palmer
- Department of Systems Biology, Columbia University Medical Center, New York, New York, United States of America
| | - Itsik Pe’er
- Department of Systems Biology, Columbia University Medical Center, New York, New York, United States of America
- Department of Computer Science, Columbia University, New York, New York, United States of America
| |
|
38
|
|
39
|
Xiang T, Nielsen B, Su G, Legarra A, Christensen OF. Application of single-step genomic evaluation for crossbred performance in pig. J Anim Sci 2016; 94:936-48. [DOI: 10.2527/jas.2015-9930] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 01/28/2023]
Affiliation(s)
- T. Xiang
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark
- INRA, UR1388 GenPhyse, CS-52627, F-31326 Castanet-Tolosan, France
| | - B. Nielsen
- SEGES, Pig Research Centre, DK-1609 Copenhagen, Denmark
| | - G. Su
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark
| | - A. Legarra
- INRA, UR1388 GenPhyse, CS-52627, F-31326 Castanet-Tolosan, France
| | - O. F. Christensen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark
| |
|
40
|
Boettger LM, Salem RM, Handsaker RE, Peloso GM, Kathiresan S, Hirschhorn JN, McCarroll SA. Recurring exon deletions in the HP (haptoglobin) gene contribute to lower blood cholesterol levels. Nat Genet 2016; 48:359-66. [PMID: 26901066 PMCID: PMC4811681 DOI: 10.1038/ng.3510] [Citation(s) in RCA: 76] [Impact Index Per Article: 9.5] [Received: 08/11/2015] [Accepted: 01/20/2016] [Indexed: 02/08/2023]
Abstract
Two exons of the human haptoglobin (HP) gene exhibit copy number variation that affects HP multimerization and underlies one of the first protein polymorphisms identified in humans. The evolutionary origins and medical significance of this polymorphism have been uncertain. Here we show that this variation has likely arisen from the recurring reversion of an ancient hominin-specific duplication of these exons. Though this polymorphism has been largely invisible to genome-wide genetic studies to date, we describe a way to analyze it by imputation from SNP haplotypes and find among 22,288 individuals that these HP exonic deletions associate with reduced LDL and total cholesterol levels. We show that these deletions, and a SNP that affects HP expression, are the likely drivers of the strong but complex association of cholesterol levels to SNPs near HP. Recurring exonic deletions in the haptoglobin gene likely enhance human health by lowering cholesterol levels in the blood.
Affiliation(s)
- Linda M Boettger
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Rany M Salem
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Division of Endocrinology, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Robert E Handsaker
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Gina M Peloso
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Sekar Kathiresan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Joel N Hirschhorn
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Division of Endocrinology, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Steven A McCarroll
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| |
|
41
|
Abstract
Participants in the family-based analysis group at Genetic Analysis Workshop 19 addressed diverse topics, all of which used the family data. Topics addressed included questions of study design and data quality control (QC), genotype imputation to augment available sequence data, and linkage and/or association analyses. Results show that pedigree-based tests that are sensitive to genotype error may be useful for QC. Imputation quality improved with inclusion of small amounts of pedigree information used to phase the data in evaluation of 5 commonly used approaches for imputation in samples of (typically) unrelated subjects. It improved still further when pedigree-based imputation using larger pedigrees was also added. An important distinction was made between methods that do versus do not make use of Mendelian transmission in pedigrees, because this serves as a key difference between underlying models and assumptions. Methods that model relatedness generally had higher power in association testing than did analyses that carry out testing in the presence of a transmission model, but this may reflect details of implementation and/or ability of more general methods to jointly include data from larger pedigrees. In either case, for single nucleotide polymorphism-set approaches, weights that incorporate information on functional effects may be more useful than those that are based only on allele frequencies. The overall results demonstrate that family data continue to provide important information in the search for trait loci.
Affiliation(s)
- Ellen M Wijsman
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98195, USA.
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA.
| |
|
42
|
Li W, Xu W, Fu G, Ma L, Richards J, Rao W, Bythwood T, Guo S, Song Q. High-accuracy haplotype imputation using unphased genotype data as the references. Gene 2015; 572:279-84. [PMID: 26232609 PMCID: PMC5373555 DOI: 10.1016/j.gene.2015.07.082] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/17/2015] [Revised: 06/24/2015] [Accepted: 07/28/2015] [Indexed: 12/19/2022]
Abstract
Rapidly growing genomic datasets pose a new challenge for missing data imputation, a notoriously resource-demanding task. Haplotype imputation requires ethnicity-matched references. However, to date, haplotype references are not available for the majority of populations in the world. We explored the use of existing unphased genotype datasets as references; if successful, this approach would cover almost all of the populations in the world. The results showed that our HiFi software yields 99.43% accuracy with unphased genotype references. Our method provides a cost-effective way to break through the bottleneck of limited reference availability for haplotype imputation in the big data era.
Affiliation(s)
- Wenzhi Li
- Department of Neurosurgery, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, Xi'an, Shaanxi, China; Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA
| | - Wei Xu
- Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA
| | | | - Li Ma
- Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA; 4DGenome Inc, Atlanta, GA, USA
| | - Jendai Richards
- Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA
| | | | - Tameka Bythwood
- Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA
| | - Shiwen Guo
- Department of Neurosurgery, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
| | - Qing Song
- Cardiovascular Research Institute, Morehouse School of Medicine, Atlanta, GA, USA; 4DGenome Inc, Atlanta, GA, USA; First Affiliated Hospital of Medical School, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
| |
|
43
|
Abstract
Imputation is a powerful in silico approach for filling in missing values in large datasets. This process requires a reference panel, a collection of data from which the missing information can be extracted and imputed. Haplotype imputation requires ethnicity-matched references; a mismatched reference panel will significantly reduce the quality of imputation. However, existing large datasets cover only a small number of ethnicities, and ethnicity-matched references are lacking for many populations in the world, which has hampered haplotype imputation and its downstream applications. To solve this issue, several approaches have been proposed and explored, including the mixed reference panel, the internal reference panel, and the genotype-converted reference panel. This review article describes and compares these approaches. Increasing evidence shows that gene activity and function are dictated not by one or two genetic elements but by cis-interactions of multiple elements. Cis-interactions require the interacting elements to be on the same chromosome molecule; therefore, haplotype analysis is essential for investigating cis-interactions among multiple genetic variants at different loci, and appears to be especially important for studying common diseases. It will be valuable in a wide spectrum of applications, from academic research to clinical diagnosis, prevention, treatment, and the pharmaceutical industry.
Affiliation(s)
- Wenzhi Li
- Center of Big Data and Bioinformatics, First Affiliated Hospital of Medicine School, Xi'an Jiaotong University, Xi'an, Shaanxi, China; Cardiovascular Research Institute and Department of Medicine, Morehouse School of Medicine, Atlanta, Georgia, USA
| | - Wei Xu
- Center of Big Data and Bioinformatics, First Affiliated Hospital of Medicine School, Xi'an Jiaotong University, Xi'an, Shaanxi, China; Cardiovascular Research Institute and Department of Medicine, Morehouse School of Medicine, Atlanta, Georgia, USA
| | - Qiling Li
- Center of Big Data and Bioinformatics, First Affiliated Hospital of Medicine School, Xi'an Jiaotong University, Xi'an, Shaanxi, China
| | - Li Ma
- Cardiovascular Research Institute and Department of Medicine, Morehouse School of Medicine, Atlanta, Georgia, USA; 4DGenome Inc, Atlanta, Georgia, USA
| | - Qing Song
- Center of Big Data and Bioinformatics, First Affiliated Hospital of Medicine School, Xi'an Jiaotong University, Xi'an, Shaanxi, China; Cardiovascular Research Institute and Department of Medicine, Morehouse School of Medicine, Atlanta, Georgia, USA; 4DGenome Inc, Atlanta, Georgia, USA
| |
|
44
|
Wang Y, Wylie T, Stothard P, Lin G. Whole genome SNP genotype piecemeal imputation. BMC Bioinformatics 2015; 16:340. [PMID: 26498158 PMCID: PMC4619096 DOI: 10.1186/s12859-015-0770-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/19/2015] [Accepted: 10/09/2015] [Indexed: 11/10/2022]
Abstract
Background: Despite ongoing reductions in the cost of sequencing technologies, whole genome SNP genotype imputation is often used as an alternative for obtaining abundant SNP genotypes for genome wide association studies. Several existing genotype imputation methods can be efficient for this purpose, while achieving various levels of imputation accuracy. Recent empirical results have shown that the two-step imputation may improve accuracy by imputing the low density genotyped study animals to a medium density array first and then to the target density. We are interested in building a series of staircase arrays that lead the low density array to the high density array or even the whole genome, such that genotype imputation along these staircases can achieve the highest accuracy. Results: For genotype imputation from a lower density to a higher density, we first show how to select untyped SNPs to construct a medium density array. Subsequently, we determine for each selected SNP those untyped SNPs to be imputed in the add-one two-step imputation, and lastly how the clusters of imputed genotype are pieced together as the final imputation result. We design extensive empirical experiments using several hundred sequenced and genotyped animals to demonstrate that our novel two-step piecemeal imputation always achieves an improvement compared to the one-step imputation by the state-of-the-art methods Beagle and FImpute. Using the two-step piecemeal imputation, we present some preliminary success on whole genome SNP genotype imputation for genotyped animals via a series of staircase arrays. Conclusions: From a low SNP density to the whole genome, intermediate pseudo-arrays can be computationally constructed by selecting the most informative SNPs for untyped SNP genotype imputation. Such pseudo-array staircases are able to impute more accurately than the classic one-step imputation.
Affiliation(s)
- Yining Wang
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
| | - Tim Wylie
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada; currently with Department of Computer Science, University of Texas - Rio Grande Valley, Edinburg, 78539, Texas, USA.
| | - Paul Stothard
- Department of Agricultural, Food, and Nutritional Science, University of Alberta, Edmonton, T6G 2C8, Alberta, Canada.
| | - Guohui Lin
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
| |
|
45
|
Xiang T, Ma P, Ostersen T, Legarra A, Christensen OF. Imputation of genotypes in Danish purebred and two-way crossbred pigs using low-density panels. Genet Sel Evol 2015; 47:54. [PMID: 26122927 PMCID: PMC4486706 DOI: 10.1186/s12711-015-0134-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/28/2014] [Accepted: 06/13/2015] [Indexed: 01/30/2023]
Abstract
Background: Genotype imputation is commonly used as an initial step in genomic selection because accurately imputed genotypes can replace actual genotypes at a lower cost without reducing the accuracy of genomic selection. Performance of imputation has rarely been investigated in crossbred animals and, in particular, in pigs. The extent and pattern of linkage disequilibrium differ in crossbred versus purebred animals, which may impact the performance of imputation. In this study, first we compared different scenarios of imputation from 5 K to 8 K single nucleotide polymorphisms (SNPs) in genotyped Danish Landrace and Yorkshire and crossbred Landrace-Yorkshire datasets and, second, we compared imputation from 8 K to 60 K SNPs in genotyped purebred and simulated crossbred datasets. All imputations were done using the software Beagle version 3.3.2. Then, we investigated the reasons that could explain the differences observed. Results: Genotype imputation performs as well in crossbred animals as in purebred animals when both parental breeds are included in the reference population. When the size of the reference population is very large, it is not necessary to use a reference population that combines the two breeds to impute the genotypes of purebred animals because a within-breed reference population can provide a very high level of imputation accuracy (correct rate ≥ 0.99, correlation ≥ 0.95). However, to ensure that similar imputation accuracies are obtained for crossbred animals, a reference population that combines both parental purebred animals is required. Imputation accuracies are higher when a larger proportion of haplotypes are shared between the reference population and the validation (imputed) populations. Conclusions: The results from both real data and pedigree-based simulated data demonstrate that genotype imputation from low-density panels to medium-density panels is highly accurate in both purebred and crossbred pigs. 
In crossbred pigs, combining the parental purebred animals in the reference population is necessary to obtain high imputation accuracy.
Affiliation(s)
- Tao Xiang
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, DK-8830, Denmark; INRA, UR1388 GenPhySE, CS-52627, Castanet-Tolosan, F-31326, France.
| | - Peipei Ma
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, DK-8830, Denmark.
| | - Tage Ostersen
- Pig Research Centre, Danish Agricultural and Food Council, Copenhagen, DK-1609, Denmark.
| | - Andres Legarra
- INRA, UR1388 GenPhySE, CS-52627, Castanet-Tolosan, F-31326, France.
| | - Ole F Christensen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, DK-8830, Denmark.
| |
|
46
|
Verma SS, de Andrade M, Tromp G, Kuivaniemi H, Pugh E, Namjou-Khales B, Mukherjee S, Jarvik GP, Kottyan LC, Burt A, Bradford Y, Armstrong GD, Derr K, Crawford DC, Haines JL, Li R, Crosslin D, Ritchie MD. Imputation and quality control steps for combining multiple genome-wide datasets. Front Genet 2014; 5:370. [PMID: 25566314 PMCID: PMC4263197 DOI: 10.3389/fgene.2014.00370] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 04/02/2014] [Accepted: 10/03/2014] [Indexed: 12/16/2022]
Abstract
The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated squared correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
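To make the allelic R-squared metric mentioned above concrete, here is a minimal sketch of its empirical counterpart: the squared Pearson correlation between imputed allele dosages and known genotypes at one SNP. This is an assumption-laden illustration, not the eMERGE pipeline. Imputation software reports an estimate of this quantity without access to true genotypes, whereas the hypothetical function `squared_correlation` below presumes a masked validation set in which the true genotypes are available.

```python
def squared_correlation(dosages, genotypes):
    """Empirical R-squared between imputed dosages (0..2) and true genotypes (0, 1, 2).

    Both arguments are hypothetical per-sample lists for a single SNP.
    Returns NaN when either vector has no variation (e.g. a monomorphic SNP).
    """
    n = len(dosages)
    if n != len(genotypes) or n < 2:
        raise ValueError("need two equal-length vectors with at least two samples")

    mean_d = sum(dosages) / n
    mean_g = sum(genotypes) / n
    # Sums of squares and cross-products around the means.
    s_dg = sum((d - mean_d) * (g - mean_g) for d, g in zip(dosages, genotypes))
    s_dd = sum((d - mean_d) ** 2 for d in dosages)
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)

    if s_dd == 0 or s_gg == 0:
        return float("nan")
    return (s_dg * s_dg) / (s_dd * s_gg)
```

Computing this value per SNP and binning by minor allele frequency is one way to examine the R-squared versus allele frequency relationship the abstract refers to.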
Affiliation(s)
- Shefali S Verma
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University Pennsylvania, PA, USA
| | - Mariza de Andrade
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic Rochester, MN, USA
| | - Gerard Tromp
- The Sigfried and Janet Weis Center for Research, Geisinger Health System Danville, PA, USA
| | - Helena Kuivaniemi
- The Sigfried and Janet Weis Center for Research, Geisinger Health System Danville, PA, USA
| | - Elizabeth Pugh
- Center for Inherited Disease Research, Johns Hopkins University Baltimore, MD, USA
| | | | | | - Gail P Jarvik
- Department of Medicine, University of Washington Seattle, WA, USA
| | - Leah C Kottyan
- Cincinnati Children's Hospital Medical Center Cincinnati, OH, USA
| | - Amber Burt
- Department of Medicine, University of Washington Seattle, WA, USA
| | - Yuki Bradford
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University Pennsylvania, PA, USA
| | - Gretta D Armstrong
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University Pennsylvania, PA, USA
| | - Kimberly Derr
- The Sigfried and Janet Weis Center for Research, Geisinger Health System Danville, PA, USA
| | - Dana C Crawford
- Center for Human Genetics Research, Vanderbilt University Nashville, TN, USA; Department of Epidemiology and Biostatistics, Case Western University Cleveland, OH, USA
| | - Jonathan L Haines
- Department of Epidemiology and Biostatistics, Case Western University Cleveland, OH, USA
| | - Rongling Li
- Division of Genomic Medicine, National Human Genome Research Institute Bethesda, MD, USA
| | - David Crosslin
- Department of Medicine, University of Washington Seattle, WA, USA
| | - Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University Pennsylvania, PA, USA
| |
|
47
|
Zeng P, Zhao Y, Qian C, Zhang L, Zhang R, Gou J, Liu J, Liu L, Chen F. Statistical analysis for genome-wide association study. J Biomed Res 2014; 29:285-97. [PMID: 26243515 PMCID: PMC4547377 DOI: 10.7555/jbr.29.20140007] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 01/15/2014] [Revised: 06/07/2014] [Accepted: 09/27/2014] [Indexed: 12/19/2022]
Abstract
In the past few years, genome-wide association studies (GWAS) have achieved great success in identifying genetic susceptibility loci underlying many complex diseases and traits. The findings provide important genetic insights into the pathogenesis of diseases. In this paper, we present an overview of widely used approaches and strategies for the analysis of GWAS and offer general considerations for dealing with GWAS data. The issues regarding data quality control, population structure, association analysis, multiple comparison and visual presentation of GWAS results are discussed; other advanced topics, including the issue of missing heritability, meta-analysis, set-based association analysis, copy number variation analysis and GWAS cohort analysis, are also briefly introduced.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu 221004, China
| | - Yang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Cheng Qian
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Liwei Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Ruyang Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Jianwei Gou
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Jin Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Liya Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
| | - Feng Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China.
| |
|
48
|
Kauwe JSK, Bailey MH, Ridge PG, Perry R, Wadsworth ME, Hoyt KL, Staley LA, Karch CM, Harari O, Cruchaga C, Ainscough BJ, Bales K, Pickering EH, Bertelsen S, Fagan AM, Holtzman DM, Morris JC, Goate AM. Genome-wide association study of CSF levels of 59 Alzheimer's disease candidate proteins: significant associations with proteins involved in amyloid processing and inflammation. PLoS Genet 2014; 10:e1004758. [PMID: 25340798 PMCID: PMC4207667 DOI: 10.1371/journal.pgen.1004758] [Citation(s) in RCA: 96] [Impact Index Per Article: 9.6] [Received: 04/28/2014] [Accepted: 09/16/2014] [Indexed: 01/25/2023]
Abstract
Cerebrospinal fluid (CSF) 42 amino acid species of amyloid beta (Aβ42) and tau levels are strongly correlated with the presence of Alzheimer's disease (AD) neuropathology including amyloid plaques and neurodegeneration and have been successfully used as endophenotypes for genetic studies of AD. Additional CSF analytes may also serve as useful endophenotypes that capture other aspects of AD pathophysiology. Here we have conducted a genome-wide association study of CSF levels of 59 AD-related analytes. All analytes were measured using the Rules Based Medicine Human DiscoveryMAP Panel, which includes analytes relevant to several disease-related processes. Data from two independently collected and measured datasets, the Knight Alzheimer's Disease Research Center (ADRC) and Alzheimer's Disease Neuroimaging Initiative (ADNI), were analyzed separately, and combined results were obtained using meta-analysis. We identified genetic associations with CSF levels of 5 proteins (Angiotensin-converting enzyme (ACE), Chemokine (C-C motif) ligand 2 (CCL2), Chemokine (C-C motif) ligand 4 (CCL4), Interleukin 6 receptor (IL6R) and Matrix metalloproteinase-3 (MMP3)) with study-wide significant p-values (p < 1.46 × 10⁻¹⁰) and significant, consistent evidence for association in both the Knight ADRC and the ADNI samples. These proteins are involved in amyloid processing and pro-inflammatory signaling. SNPs associated with ACE, IL6R and MMP3 protein levels are located within the coding regions of the corresponding structural gene. The SNPs associated with CSF levels of CCL4 and CCL2 are located in known chemokine binding proteins. The genetic associations reported here are novel and suggest mechanisms for genetic control of CSF and plasma levels of these disease-related proteins. Significant SNPs in ACE and MMP3 also showed association with AD risk. Our findings suggest that these proteins/pathways may be valuable therapeutic targets for AD. 
Robust associations in cognitively normal individuals suggest that these SNPs also influence regulation of these proteins more generally and may therefore be relevant to other diseases.

The use of quantitative endophenotypes from cerebrospinal fluid has led to the identification of several genetic variants that alter risk or rate of progression of Alzheimer's disease. Here we have analyzed the levels of 58 disease-related proteins in the cerebrospinal fluid for association with millions of variants across the human genome. We have identified significant, replicable associations with 5 analytes, Angiotensin-converting enzyme, Chemokine (C-C motif) ligand 2, Chemokine (C-C motif) ligand 4, Interleukin 6 receptor and Matrix metalloproteinase-3. Our results suggest that these variants play a regulatory role in the respective protein levels and are relevant to the inflammatory and amyloid processing pathways. Variants associated with ACE and those associated with MMP3 levels also show association with risk for Alzheimer's disease in the expected directions. These associations are consistent in cerebrospinal fluid and plasma and in samples with only cognitively normal individuals, suggesting that they are relevant in the regulation of these protein levels beyond the context of Alzheimer's disease.
Affiliation(s)
- John S. K. Kauwe
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| | - Matthew H. Bailey
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| | - Perry G. Ridge
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| | - Rachel Perry
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| | - Mark E. Wadsworth
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| | - Kaitlyn L. Hoyt
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| | - Lyndsay A. Staley
- Department of Biology, Brigham Young University, Provo, Utah, United States of America
| | - Celeste M. Karch
- Department of Psychiatry, Washington University School of Medicine, St Louis, Missouri, United States of America
- Hope Center for Neurological Disorders, Washington University School of Medicine, St Louis, Missouri, United States of America
| | - Oscar Harari
- Department of Psychiatry, Washington University School of Medicine, St Louis, Missouri, United States of America
| | - Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine, St Louis, Missouri, United States of America
- Hope Center for Neurological Disorders, Washington University School of Medicine, St Louis, Missouri, United States of America
| | - Benjamin J. Ainscough
- The Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
| | - Kelly Bales
- Neuroscience Research Unit, Worldwide Research and Development, Pfizer Inc., Groton, Connecticut, United States of America
| | - Eve H. Pickering
- Neuroscience Research Unit, Worldwide Research and Development, Pfizer Inc., Groton, Connecticut, United States of America
| | - Sarah Bertelsen
- Department of Psychiatry, Washington University School of Medicine, St Louis, Missouri, United States of America
| | | | - Anne M. Fagan
- Hope Center for Neurological Disorders, Washington University School of Medicine, St Louis, Missouri, United States of America
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, St Louis, Missouri, United States of America
- Department of Neurology, Washington University School of Medicine, St Louis, Missouri, United States of America
| | - David M. Holtzman
- Hope Center for Neurological Disorders, Washington University School of Medicine, St Louis, Missouri, United States of America
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, St Louis, Missouri, United States of America
- Department of Neurology, Washington University School of Medicine, St Louis, Missouri, United States of America
- Department of Developmental Biology, Washington University School of Medicine, St Louis, Missouri, United States of America
| | - John C. Morris
- Hope Center for Neurological Disorders, Washington University School of Medicine, St Louis, Missouri, United States of America
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, St Louis, Missouri, United States of America
- Department of Neurology, Washington University School of Medicine, St Louis, Missouri, United States of America
- Department of Pathology and Immunology, Washington University School of Medicine, St Louis, Missouri, United States of America
| | - Alison M. Goate
- Department of Psychiatry, Washington University School of Medicine, St Louis, Missouri, United States of America
- Hope Center for Neurological Disorders, Washington University School of Medicine, St Louis, Missouri, United States of America
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, St Louis, Missouri, United States of America
- Department of Neurology, Washington University School of Medicine, St Louis, Missouri, United States of America
- Department of Genetics, Washington University School of Medicine, St Louis, Missouri, United States of America
| |
|
49
|
Impact of pre-imputation SNP-filtering on genotype imputation results. BMC Genet 2014; 15:88. [PMID: 25112433 PMCID: PMC4236550 DOI: 10.1186/s12863-014-0088-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 02/25/2014] [Accepted: 07/18/2014] [Indexed: 11/10/2022] Open
Abstract
Background: Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research and understanding of the impact of initial SNP-data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different strategies of pre-imputation quality filtering on the performance of the widely used imputation algorithms MaCH and IMPUTE.
Results: We considered three scenarios: imputation of partially missing genotypes with usage of an external reference panel, without usage of an external reference panel, as well as imputation of completely un-typed SNPs using an external reference panel. We first created various datasets applying different SNP quality filters and masking certain percentages of randomly selected high-quality SNPs. We imputed these SNPs and compared the results between the different filtering scenarios by using established and newly proposed measures of imputation quality. While the established measures assess certainty of imputation results, our newly proposed measures focus on the agreement with true genotypes. These measures showed that pre-imputation SNP-filtering might be detrimental regarding imputation quality. Moreover, the strongest drivers of imputation quality were in general the burden of missingness and the number of SNPs used for imputation. We also found that using a reference panel always improves imputation quality of partially missing genotypes. MaCH performed slightly better than IMPUTE2 in most of our scenarios. Again, these results were more pronounced when using our newly defined measures of imputation quality.
Conclusion: Even a moderate filtering has a detrimental effect on the imputation quality. Therefore little or no SNP filtering prior to imputation appears to be the best strategy for imputing small to moderately sized datasets. Our results also showed that for these datasets, MaCH performs slightly better than IMPUTE2 in most scenarios at the cost of increased computing time.
|
50
|
Efficiency of haplotype-based methods to fine-map QTLs and embryonic lethal variants affecting fertility: Illustration with a deletion segregating in Nordic Red cattle. Livest Sci 2014. [DOI: 10.1016/j.livsci.2014.04.030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
|