1
Abney M, ElSherbiny A. Kinpute: using identity by descent to improve genotype imputation. Bioinformatics 2019; 35:4321-4326. [PMID: 30918937; PMCID: PMC6821425; DOI: 10.1093/bioinformatics/btz221] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/23/2018] [Revised: 02/21/2019] [Accepted: 03/26/2019] [Indexed: 11/13/2022] [Open access]
Abstract
MOTIVATION: Genotype imputation, though generally accurate, often results in many genotypes being poorly imputed, particularly in studies where the individuals are not well represented by standard reference panels. When individuals in the study share regions of the genome identical by descent (IBD), it is possible to use this information in combination with a study-specific reference panel (SSRP) to improve the imputation results. Kinpute uses IBD information (due to recent, familial relatedness or distant, unknown ancestors) in conjunction with the output from linkage disequilibrium (LD) based imputation methods to compute more accurate genotype probabilities. Kinpute uses a novel method for IBD imputation, which works even in the absence of a pedigree, and results in substantially improved imputation quality.
RESULTS: Given initial estimates of average IBD between subjects in the study sample, Kinpute uses a novel algorithm to select an optimal set of individuals to sequence and use as an SSRP. Kinpute is designed to use as input both this SSRP and the genotype probabilities output from other LD-based imputation software, and uses a new method to combine the LD-imputed genotype probabilities with IBD configurations to substantially improve imputation. We tested Kinpute on a human population isolate where 98 individuals have been sequenced. In half of this sample, whose sequence data were masked, we used Impute2 to perform LD-based imputation, and Kinpute was used to obtain higher-accuracy genotype probabilities. Measures of imputation accuracy improved significantly, particularly for those genotypes that Impute2 imputed with low certainty.
AVAILABILITY AND IMPLEMENTATION: Kinpute is an open-source and freely available C++ software package that can be downloaded from.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark Abney
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Aisha ElSherbiny
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
2
Wang X, Long Y, Wang N, Zou J, Ding G, Broadley MR, White PJ, Yuan P, Zhang Q, Luo Z, Liu P, Zhao H, Zhang Y, Cai H, King GJ, Xu F, Meng J, Shi L. Breeding histories and selection criteria for oilseed rape in Europe and China identified by genome wide pedigree dissection. Sci Rep 2017; 7:1916. [PMID: 28507329; PMCID: PMC5432491; DOI: 10.1038/s41598-017-02188-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/02/2016] [Accepted: 04/13/2017] [Indexed: 12/17/2022] [Open access]
Abstract
Selection breeding has played a key role in the improvement of seed yield and quality in oilseed rape (Brassica napus L.). We genotyped Tapidor (European), Ningyou7 (Chinese) and their progenitors with the Brassica 60K Illumina Infinium SNP array and mapped a total of 29,347 SNP markers onto the reference genome of Darmor-bzh. Identity by descent (IBD) refers to a haplotype segment of a chromosome inherited from a shared common ancestor. IBD segments identified on the C subgenome were larger than those on the A subgenome within both the Tapidor and Ningyou7 pedigrees. IBD number and length were greater in the Ningyou7 pedigree than in the Tapidor pedigree. Seventy-nine QTLs for flowering time, seed quality and root morphology traits were identified in the IBD segments of Tapidor and Ningyou7. Many more candidate genes had been selected within the Ningyou7 pedigree than within the Tapidor pedigree. These results highlight differences in the transfer of favorable gene clusters controlling key traits during selection breeding in Europe and China.
Affiliation(s)
- Xiaohua Wang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Key Lab of Cultivated Land Conservation, Ministry of Agriculture, Microelement Research Centre, Huazhong Agricultural University, Wuhan, 430070, China
- Yan Long
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Nian Wang
  - College of Horticulture & Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
- Jun Zou
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Guangda Ding
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Key Lab of Cultivated Land Conservation, Ministry of Agriculture, Microelement Research Centre, Huazhong Agricultural University, Wuhan, 430070, China
- Martin R Broadley
  - Plant and Crop Sciences Division, School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, LE12 5RD, United Kingdom
- Philip J White
  - The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, United Kingdom
  - King Saud University, Riyadh, 11451, Saudi Arabia
- Pan Yuan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Key Lab of Cultivated Land Conservation, Ministry of Agriculture, Microelement Research Centre, Huazhong Agricultural University, Wuhan, 430070, China
- Qianwen Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Key Lab of Cultivated Land Conservation, Ministry of Agriculture, Microelement Research Centre, Huazhong Agricultural University, Wuhan, 430070, China
- Ziliang Luo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Peifa Liu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Hua Zhao
  - College of Horticulture & Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
- Ying Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Key Lab of Cultivated Land Conservation, Ministry of Agriculture, Microelement Research Centre, Huazhong Agricultural University, Wuhan, 430070, China
- Hongmei Cai
  - Key Lab of Cultivated Land Conservation, Ministry of Agriculture, Microelement Research Centre, Huazhong Agricultural University, Wuhan, 430070, China
- Graham J King
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Southern Cross Plant Science, Southern Cross University, Lismore, NSW 2480, Australia
- Fangsen Xu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Key Lab of Cultivated Land Conservation, Ministry of Agriculture, Microelement Research Centre, Huazhong Agricultural University, Wuhan, 430070, China
- Jinling Meng
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Lei Shi
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Key Lab of Cultivated Land Conservation, Ministry of Agriculture, Microelement Research Centre, Huazhong Agricultural University, Wuhan, 430070, China
3
Saad M, Nato AQ, Grimson FL, Lewis SM, Brown LA, Blue EM, Thornton TA, Thompson EA, Wijsman EM. Identity-by-descent estimation with population- and pedigree-based imputation in admixed family data. BMC Proc 2016; 10:295-301. [PMID: 27980652; PMCID: PMC5133511; DOI: 10.1186/s12919-016-0046-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 05/29/2023] [Open access]
Abstract
Background: In the past few years, imputation approaches have been used mainly in population-based designs of genome-wide association studies, although both family- and population-based imputation methods have been proposed. With the recent surge of family-based designs, family-based imputation has become more important. Imputation methods for both designs are based on identity-by-descent (IBD) information. Apart from imputation, IBD information is also commonly used in several types of genetic analysis, including pedigree-based linkage analysis.
Methods: We compared the performance of several family- and population-based imputation methods in large pedigrees provided by Genetic Analysis Workshop 19 (GAW19). We also evaluated the performance of a new IBD mapping approach that we propose, which combines IBD information from known pedigrees with information from unrelated individuals.
Results: Different combinations of the imputation methods yielded different imputation accuracies. Moreover, our IBD mapping approach showed gains from using both known pedigrees and unrelated individuals over using known pedigrees only.
Conclusions: Our results report the accuracies of different combinations of imputation methods, which may be useful for data sets similar to the GAW19 pedigree data. Our IBD mapping approach, which uses both known pedigrees and unrelated individuals, performed better than classical linkage analysis.
Affiliation(s)
- Mohamad Saad
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Alejandro Q Nato
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Fiona L Grimson
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Steven M Lewis
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Lisa A Brown
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Elizabeth M Blue
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Ellen M Wijsman
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
4
Saad M, Wijsman EM. Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees. Genet Epidemiol 2014; 38:579-90. [PMID: 25132070; PMCID: PMC4190076; DOI: 10.1002/gepi.21844] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 02/28/2014] [Revised: 05/24/2014] [Accepted: 06/27/2014] [Indexed: 12/27/2022]
Abstract
In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant identification. Because of the high cost of sequencing technologies, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), is able to handle large pedigrees and accurately impute rare variants, but does less well for common variants where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC and several other existing burden and kernel association tests. In simulated pedigree sequence data, our results show an increase of imputation accuracy from use of our combining approach. Also, they show an increase of power of the association tests with this approach over the use of either family- or population-based imputation methods alone, in the context of rare and common variants. Moreover, our results show better performance of famSKAT-RC compared to the other considered tests, in most scenarios investigated here.
Affiliation(s)
- Mohamad Saad
  - Division of Medical Genetics, Department of Medicine; and Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Ellen M. Wijsman
  - Division of Medical Genetics, Department of Medicine; and Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
5
Blue EM, Sun L, Tintle NL, Wijsman EM. Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond. Genet Epidemiol 2014; 38 Suppl 1:S21-8. [PMID: 25112184; DOI: 10.1002/gepi.21821] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]
Abstract
When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models.
Affiliation(s)
- Elizabeth M Blue
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, United States of America