1. Jurcic EJ, Villalba PV, Pathauer PS, Palazzini DA, Oberschelp GPJ, Harrand L, Garcia MN, Aguirre NC, Acuña CV, Martínez MC, Rivas JG, Cisneros EF, López JA, Poltri SNM, Munilla S, Cappa EP. Single-step genomic prediction of Eucalyptus dunnii using different identity-by-descent and identity-by-state relationship matrices. Heredity (Edinb) 2021; 127:176-189. [PMID: 34145424 PMCID: PMC8322403 DOI: 10.1038/s41437-021-00450-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/12/2021] [Revised: 06/07/2021] [Accepted: 06/07/2021] [Indexed: 02/05/2023]
Abstract
Genomic selection based on the single-step genomic best linear unbiased prediction (ssGBLUP) approach is becoming an important tool in forest tree breeding. The quality of the variance components and the predictive ability of the genomic estimated breeding values (GEBV) depend on how well marker-based genomic relationships describe the actual genetic relationships at unobserved causal loci. We investigated the performance of GEBV obtained when fitting models with genomic covariance matrices based on two identity-by-descent (IBD) and two identity-by-state (IBS) relationship measures. Multiple-trait multiple-site ssGBLUP models were fitted to diameter and stem straightness in five open-pollinated progeny trials of Eucalyptus dunnii, genotyped using the EUChip60K. We also fitted the conventional ABLUP model with a pedigree-based covariance matrix. Estimated relationships from the IBD estimators displayed consistently lower standard deviations than those from the IBS approaches. Although ssGBLUP based on IBS estimators resulted in higher trait-site heritabilities, the gain in accuracy of the relationships using IBD estimators resulted in higher predictive ability and lower bias of GEBV, especially for low-heritability trait-site combinations. ssGBLUP based on IBS and IBD approaches performed considerably better than the traditional ABLUP. In summary, our results advocate the use of the ssGBLUP approach jointly with the IBD relationship matrix in open-pollinated forest tree evaluation.
Affiliation(s)
- Esteban J Jurcic
  - Instituto Nacional de Tecnología Agropecuaria (INTA), Instituto de Recursos Biológicos, Centro de Investigación en Recursos Naturales, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Pamela V Villalba
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Pablo S Pathauer
  - Instituto Nacional de Tecnología Agropecuaria (INTA), Instituto de Recursos Biológicos, Centro de Investigación en Recursos Naturales, Buenos Aires, Argentina
- Dino A Palazzini
  - Instituto Nacional de Tecnología Agropecuaria (INTA), Instituto de Recursos Biológicos, Centro de Investigación en Recursos Naturales, Buenos Aires, Argentina
- Gustavo P J Oberschelp
  - Instituto Nacional de Tecnología Agropecuaria (INTA), Estación Experimental Agropecuaria Concordia, Entre Ríos, Argentina
- Leonel Harrand
  - Instituto Nacional de Tecnología Agropecuaria (INTA), Estación Experimental Agropecuaria Concordia, Entre Ríos, Argentina
- Martín N Garcia
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Natalia C Aguirre
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Cintia V Acuña
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- María C Martínez
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Juan G Rivas
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Esteban F Cisneros
  - Facultad de Ciencias Forestales, Universidad Nacional de Santiago del Estero, Santiago del Estero, Argentina
- Juan A López
  - Instituto Nacional de Tecnología Agropecuaria (INTA), Estación Experimental Agropecuaria Bella Vista, Corrientes, Argentina
- Susana N Marcucci Poltri
  - Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Sebastián Munilla
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
  - Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
- Eduardo P Cappa
  - Instituto Nacional de Tecnología Agropecuaria (INTA), Instituto de Recursos Biológicos, Centro de Investigación en Recursos Naturales, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
2. Herzig AF, Nutile T, Ruggiero D, Ciullo M, Perdry H, Leutenegger AL. Detecting the dominance component of heritability in isolated and outbred human populations. Sci Rep 2018; 8:18048. [PMID: 30575761 PMCID: PMC6303332 DOI: 10.1038/s41598-018-36050-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/29/2018] [Accepted: 11/10/2018] [Indexed: 11/21/2022]
Abstract
Inconsistencies between published estimates of dominance heritability from studies of human genetic isolates and of human outbred populations incite investigation into whether such differences result from particular trait architectures or specific population structures. We analyse simulated datasets, characteristic of genetic isolates and of unrelated individuals, before analysing the isolate of Cilento for various commonly studied traits. We show the strengths of using genetic relationship matrices for variance decomposition over identity-by-descent based methods in a population isolate, and that heritability estimates in isolates will avoid the downward biases that may occur in studies of samples of unrelated individuals, irrespective of the simulated distribution of causal variants. Yet, we also show that precise estimates of dominance in isolates are demonstrably problematic in the presence of shared environmental effects, and such effects should be accounted for. Nevertheless, we demonstrate how studying isolates can help determine the existence or non-existence of dominance for complex traits, and we find strong indications of non-zero dominance for low-density lipoprotein level in Cilento. Finally, we recommend future study designs to analyse trait variance decomposition from ensemble data across multiple population isolates.
Affiliation(s)
- Anthony F Herzig
  - Inserm, U946, Genetic variation and Human diseases, Paris, France
  - Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
- Teresa Nutile
  - Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy
- Daniela Ruggiero
  - Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy
  - IRCCS Neuromed, Pozzilli, Isernia, Italy
- Marina Ciullo
  - Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy
  - IRCCS Neuromed, Pozzilli, Isernia, Italy
- Hervé Perdry
  - Université Paris-Saclay, Univ. Paris-Sud, Inserm, CESP, Villejuif, France
- Anne-Louise Leutenegger
  - Inserm, U946, Genetic variation and Human diseases, Paris, France
  - Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
3. Gonzales NM, Seo J, Hernandez Cordero AI, St Pierre CL, Gregory JS, Distler MG, Abney M, Canzar S, Lionikas A, Palmer AA. Genome wide association analysis in a mouse advanced intercross line. Nat Commun 2018; 9:5162. [PMID: 30514929 PMCID: PMC6279738 DOI: 10.1038/s41467-018-07642-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 05/16/2018] [Accepted: 11/15/2018] [Indexed: 12/14/2022]
Abstract
The LG/J x SM/J advanced intercross line of mice (LG x SM AIL) is a multigenerational outbred population. High minor allele frequencies, a simple genetic background, and the fully sequenced LG and SM genomes make it a powerful population for genome-wide association studies. Here we use 1,063 AIL mice to identify 126 significant associations for 50 traits relevant to human health and disease. We also identify thousands of cis- and trans-eQTLs in the hippocampus, striatum, and prefrontal cortex of ~200 mice. We replicate an association between locomotor activity and Csmd1, which we identified in an earlier generation of this AIL, and show that Csmd1 mutant mice recapitulate the locomotor phenotype. Our results demonstrate the utility of the LG x SM AIL as a mapping population, identify numerous novel associations, and shed light on the genetic architecture of mammalian behavior.
Affiliation(s)
- Natalia M Gonzales
  - Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Jungkyun Seo
  - Center for Genomic & Computational Biology, Duke University, Durham, NC, 27708, USA
  - Graduate Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, 27708, USA
- Ana I Hernandez Cordero
  - School of Medicine, Medical Sciences and Nutrition, College of Life Sciences and Medicine, University of Aberdeen, Aberdeen, AB25 2ZD, UK
- Celine L St Pierre
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, 63108, USA
- Jennifer S Gregory
  - School of Medicine, Medical Sciences and Nutrition, College of Life Sciences and Medicine, University of Aberdeen, Aberdeen, AB25 2ZD, UK
- Margaret G Distler
  - Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Mark Abney
  - Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Stefan Canzar
  - Gene Center, Ludwig-Maximilians-Universität München, 81377, Munich, Germany
- Arimantas Lionikas
  - School of Medicine, Medical Sciences and Nutrition, College of Life Sciences and Medicine, University of Aberdeen, Aberdeen, AB25 2ZD, UK
- Abraham A Palmer
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
  - Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, 92093, USA
4. Peralta JM, Blackburn NB, Porto A, Blangero J, Charlesworth J. Genome-wide linkage scan for loci influencing plasma triglyceride levels. BMC Proc 2018; 12:52. [PMID: 30275898 PMCID: PMC6157192 DOI: 10.1186/s12919-018-0137-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
We conducted a genome-wide linkage scan to detect loci that influence the levels of fasting triglycerides in plasma. Fasting triglyceride levels were available at 4 time points (visits), 2 pre- and 2 post-fenofibrate intervention. Multipoint identity-by-descent (MIBD) matrices were derived from genotypes using IBDLD. Variance-component linkage analyses were then conducted using SOLAR (Sequential Oligogenic Linkage Analysis Routines). We found evidence of linkage (logarithm of odds [LOD] ≥3) at 5 chromosomal regions with triglyceride levels in plasma. The highest LOD scores were observed for linkage to the estimated genetic value (additive genetic component) of the log-normalized triglyceride levels in plasma. Our results suggest that a chromosome 10 locus at 37 cM (LODpre = 3.01, LODpost = 3.72) influences fasting triglyceride levels in plasma regardless of the fenofibrate intervention, and that loci on chromosomes 1 at 170 cM and 4 at 24 cM cease to affect triglyceride levels when fenofibrate is present, while the regions on chromosomes 6 at 136 to 162 cM and 11 at 39 to 40 cM appear to influence triglyceride levels in response to fenofibrate.
Affiliation(s)
- Juan M. Peralta
  - South Texas Diabetes and Obesity Institute, University of Texas at the Rio Grande Valley, One West University Blvd, Brownsville, TX 78520 USA
  - Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, TAS 7000 Australia
- Nicholas B. Blackburn
  - South Texas Diabetes and Obesity Institute, University of Texas at the Rio Grande Valley, One West University Blvd, Brownsville, TX 78520 USA
  - Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, TAS 7000 Australia
- Arthur Porto
  - South Texas Diabetes and Obesity Institute, University of Texas at the Rio Grande Valley, One West University Blvd, Brownsville, TX 78520 USA
- John Blangero
  - South Texas Diabetes and Obesity Institute, University of Texas at the Rio Grande Valley, One West University Blvd, Brownsville, TX 78520 USA
  - Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, TAS 7000 Australia
- Jac Charlesworth
  - Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, TAS 7000 Australia
5. Hormozdiari F, Kang EY, Bilow M, Ben-David E, Vulpe C, McLachlan S, Lusis AJ, Han B, Eskin E. Imputing Phenotypes for Genome-wide Association Studies. Am J Hum Genet 2016; 99:89-103. [PMID: 27292110 PMCID: PMC5005435 DOI: 10.1016/j.ajhg.2016.04.013] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 01/20/2016] [Accepted: 04/28/2016] [Indexed: 01/23/2023]
Abstract
Genome-wide association studies (GWASs) have been successful in detecting variants correlated with phenotypes of clinical interest. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected, and for phenotypes that are difficult to collect, the sample size might be insufficient to achieve the desired statistical power. The phenotype of interest is often difficult to collect, whereas surrogate phenotypes or related phenotypes are easier to collect and have already been collected in very large samples. This paper demonstrates how we take advantage of these additional related phenotypes to impute the phenotype of interest or target phenotype and then perform association analysis. Our approach leverages the correlation structure between phenotypes to perform the imputation. The correlation structure can be estimated from a smaller complete dataset for which both the target and related phenotypes have been collected. Under some assumptions, the statistical power can be computed analytically given the correlation structure of the phenotypes used in imputation. In addition, our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes. Thus, our method is applicable to datasets for which we have access only to summary statistics and not to the raw genotypes. We illustrate our approach by analyzing associated loci to triglycerides (TGs), body mass index (BMI), and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset.
Affiliation(s)
- Farhad Hormozdiari
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eun Yong Kang
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Michael Bilow
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eyal Ben-David
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Chris Vulpe
  - Department of Nutritional Science and Toxicology, University of California, Berkeley, Berkeley, CA 94720, USA
- Stela McLachlan
  - Centre for Population Health Sciences, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh EH8 9AG, UK
- Aldons J Lusis
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Buhm Han
  - Department of Convergence Medicine, University of Ulsan College of Medicine & Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Republic of Korea
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
6. Park DS, Brown B, Eng C, Huntsman S, Hu D, Torgerson DG, Burchard EG, Zaitlen N. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses. Bioinformatics 2015; 31:i181-9. [PMID: 26072481 PMCID: PMC4553832 DOI: 10.1093/bioinformatics/btv230] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
Abstract
Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics-based methods rely on global ‘best guess’ reference panels to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure and is not feasible when appropriate reference panels are missing or small. Here, we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics-based methods in arbitrary populations.
Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics-based methods: imputation and joint-testing. When using our method as opposed to the current standard of ‘best guess’ reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing.
Availability and implementation: Our method is publicly available in a software package called ADAPT-Mix at https://github.com/dpark27/adapt_mix.
Contact: noah.zaitlen@ucsf.edu
Affiliation(s)
- Danny S Park
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, Department of Computer Science, University of California Berkeley, Berkeley and Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Brielin Brown
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, Department of Computer Science, University of California Berkeley, Berkeley and Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Celeste Eng
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, Department of Computer Science, University of California Berkeley, Berkeley and Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Scott Huntsman
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, Department of Computer Science, University of California Berkeley, Berkeley and Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Donglei Hu
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, Department of Computer Science, University of California Berkeley, Berkeley and Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Dara G Torgerson
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, Department of Computer Science, University of California Berkeley, Berkeley and Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Esteban G Burchard
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, Department of Computer Science, University of California Berkeley, Berkeley and Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Noah Zaitlen
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, Department of Computer Science, University of California Berkeley, Berkeley and Department of Medicine, University of California San Francisco, San Francisco, CA, USA
7. FAPI: Fast and accurate P-value Imputation for genome-wide association study. Eur J Hum Genet 2015; 24:761-6. [PMID: 26306642 DOI: 10.1038/ejhg.2015.190] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/11/2014] [Revised: 06/08/2015] [Accepted: 07/03/2015] [Indexed: 11/08/2022]
Abstract
Imputing individual-level genotypes (or genotype imputation) is now a standard procedure in genome-wide association studies (GWAS) to examine disease associations at untyped common genetic variants. Meta-analysis of publicly available GWAS summary statistics can allow more disease-associated loci to be discovered, but these data are usually provided for various variant sets. Thus imputing these summary statistics of different variant sets into a common reference panel for meta-analyses is impossible using traditional genotype imputation methods. Here we develop a fast and accurate P-value imputation (FAPI) method that utilizes summary statistics of common variants only. Its computational cost is linear with the number of untyped variants and has similar accuracy compared with IMPUTE2 with prephasing, one of the leading methods in genotype imputation. In addition, based on the FAPI idea, we develop a metric to detect abnormal association at a variant and showed that it had a significantly greater power compared with LD-PAC, a method that quantifies the evidence of spurious associations based on likelihood ratio. Our method is implemented in a user-friendly software tool, which is available at http://statgenpro.psychiatry.hku.hk/fapi.
8. Xu Z, Duan Q, Yan S, Chen W, Li M, Lange E, Li Y. DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics 2015; 31:2434-42. [PMID: 25810429 DOI: 10.1093/bioinformatics/btv168] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/09/2014] [Accepted: 03/17/2015] [Indexed: 01/09/2023]
Abstract
BACKGROUND: Imputation of individual level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual level genotype data are not available. Two methods (DIST and ImpG-Summary/LD), that assume a multivariate Gaussian distribution for the association summary statistics, have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates.
METHODS: We analytically show that in the absence of covariates, correlation among association summary statistics is indeed the same as that among the corresponding genotypes, thus serving as a theoretical justification for the recently proposed methods. We continue to prove that in the presence of covariates, correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO).
RESULTS: We consider two real-life scenarios where the correlation and partial correlation likely make practical difference: (i) association studies in admixed populations; (ii) association studies in presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows at least comparable, if not better, performance compared with existing correlation-based methods, particularly for lower frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9-15.2% for variants with minor allele frequency <5%.
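The distinction between correlation and partial correlation drawn in this abstract can be illustrated numerically. The following is a minimal sketch, not the DISSCO implementation: it assumes a single binary covariate (a hypothetical ancestry indicator), two toy genotype vectors coded 0/1/2, and the standard first-order partial correlation formula. All names (`pearson`, `partial_corr`, `ancestry`, `snp_a`, `snp_b`) are illustrative and do not come from the paper or its software.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for one covariate z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (an assumption for illustration): a binary covariate plus two
# independent allele indicators, so both SNPs depend on the covariate.
ancestry = [0, 0, 0, 0, 1, 1, 1, 1]
allele_a = [0, 1, 0, 1, 0, 1, 0, 1]
allele_b = [0, 0, 1, 1, 0, 0, 1, 1]
snp_a = [z + e for z, e in zip(ancestry, allele_a)]  # genotypes coded 0/1/2
snp_b = [z + e for z, e in zip(ancestry, allele_b)]

marginal = pearson(snp_a, snp_b)                 # correlation ignoring the covariate
adjusted = partial_corr(snp_a, snp_b, ancestry)  # correlation given the covariate
print(f"marginal correlation: {marginal:.3f}")
print(f"partial correlation:  {adjusted:.3f}")
```

In this toy data the two SNPs are correlated only through the shared covariate, so the marginal correlation is 0.5 while the partial correlation is approximately zero. This is the kind of gap between the two quantities that the abstract says matters when covariates are present.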
Affiliation(s)
- Zheng Xu
  - Department of Biostatistics, Department of Genetics, Department of Computer Science
- Qing Duan
  - Department of Genetics, Curriculum in Bioinformatics and Computational Biology, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599, USA
- Song Yan
  - Department of Biostatistics, Department of Genetics, Department of Computer Science
- Wei Chen
  - Division of Pediatric Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh School of Medicine, Department of Biostatistics, Department of Human Genetics, University of Pittsburgh School of Public Health, Pittsburgh, PA 15224, USA
- Mingyao Li
  - Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA
- Ethan Lange
  - Department of Biostatistics, Department of Genetics
- Yun Li
  - Department of Biostatistics, Department of Genetics, Department of Computer Science
9. Zheng C, Kuhner MK, Thompson EA. Joint inference of identity by descent along multiple chromosomes from population samples. J Comput Biol 2014; 21:185-200. [PMID: 24606562 DOI: 10.1089/cmb.2013.0140] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]
Abstract
There has been much interest in detecting genomic identity by descent (IBD) segments from modern dense genetic marker data and in using them to identify human disease susceptibility loci. Here we present a novel Bayesian framework using Markov chain Monte Carlo (MCMC) realizations to jointly infer IBD states among multiple individuals not known to be related, together with the allelic typing error rate and the IBD process parameters. The data are phased single nucleotide polymorphism (SNP) haplotypes. We model changes in latent IBD state along homologous chromosomes by a continuous time Markov model having the Ewens sampling formula as its stationary distribution. We show by simulation that this model for the IBD process fits quite well with the coalescent predictions. Using simulation data sets of 40 haplotypes over regions of 1 and 10 million base pairs (Mbp), we show that the jointly estimated IBD states are very close to the true values, although the presence of linkage disequilibrium decreases the accuracy. We also present comparisons with the ibd_haplo program, which estimates IBD among sets of four haplotypes. Our new IBD detection method focuses on the scale between genome-wide methods using simple IBD models and complex coalescent-based methods that are limited to short genome segments. At the scale of a few Mbp, our approach offers potentially more power for fine-scale IBD association mapping.
Affiliation(s)
- Chaozhi Zheng
  - Department of Statistics, University of Washington, Seattle, Washington
10. Identity-by-descent mapping in a Scandinavian multiple sclerosis cohort. Eur J Hum Genet 2014; 23:688-92. [PMID: 25159868 PMCID: PMC4402631 DOI: 10.1038/ejhg.2014.155] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/08/2014] [Revised: 06/16/2014] [Accepted: 07/01/2014] [Indexed: 01/15/2023]
Abstract
In an attempt to map chromosomal regions carrying rare gene variants contributing to the risk of multiple sclerosis (MS), we identified segments shared identical-by-descent (IBD) using the refined IBD analysis of the software BEAGLE 4.0. IBD mapping aims at identifying segments inherited from a common ancestor and shared more frequently in case–case pairs. A total of 2106 MS patients of Nordic origin and 624 matched controls were genotyped on Illumina Human Quad 660 chip and an additional 1352 ethnically matched controls typed on Illumina HumanHap 550 and Illumina 1M were added. The quality control left a total of 441 731 markers for the analysis. After identification of segments shared by descent and significance testing, a filter function for markers with low IBD sharing was applied. Four regions on chromosomes 5, 9, 14 and 19 were found to be significantly associated with the risk for MS. However, all markers but one were located telomerically, including the very distal markers. For methodological reasons, such segments have a low sharing of IBD signals and are prone to be false positives. One marker on chromosome 19 reached genome-wide significance and was not one of the distal markers. This marker was located within the GNA11 gene, which has no previously reported association with MS. We conclude that IBD mapping is not sufficiently powered to identify MS risk loci even in ethnically relatively homogeneous populations, or that alternatively rare variants are not adequately present.
11. Gauvin H, Moreau C, Lefebvre JF, Laprise C, Vézina H, Labuda D, Roy-Gagnon MH. Genome-wide patterns of identity-by-descent sharing in the French Canadian founder population. Eur J Hum Genet 2013; 22:814-21. [PMID: 24129432 PMCID: PMC4023206 DOI: 10.1038/ejhg.2013.227] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 03/25/2013] [Revised: 08/07/2013] [Accepted: 09/04/2013] [Indexed: 12/16/2022]
Abstract
In genetics the ability to accurately describe the familial relationships among a group of individuals can be very useful. Recent statistical tools succeeded in assessing the degree of relatedness up to 6-7 generations with good power using dense genome-wide single-nucleotide polymorphism data to estimate the extent of identity-by-descent (IBD) sharing. It is therefore important to describe genome-wide patterns of IBD sharing for more remote and complex relatedness between individuals, such as that observed in a founder population like Quebec, Canada. Taking advantage of the extended genealogical records of the French Canadian founder population, we first compared different tools to identify regions of IBD in order to best describe genome-wide IBD sharing and its correlation with genealogical characteristics. Results showed that the extent of IBD sharing identified with FastIBD correlates best with relatedness measured using genealogical data. Total length of IBD sharing explained 85% of the genealogical kinship's variance. In addition, we observed significantly higher sharing in pairs of individuals with at least one inbred ancestor compared with those without any. Furthermore, patterns of IBD sharing and average sharing were different across regional populations, consistent with the settlement history of Quebec. Our results suggest that, as expected, the complex relatedness present in founder populations is reflected in patterns of IBD sharing. Using these patterns, it is thus possible to gain insight on the types of distant relationships in a sample from a founder population like Quebec.
Affiliation(s)
- Héloïse Gauvin
- [1] Département de médecine sociale et préventive, Université de Montréal, Montréal, Québec, Canada [2] Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
- Claudia Moreau
- Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
- Jean-François Lefebvre
- Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
- Catherine Laprise
- Département des sciences fondamentales, Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada
- Hélène Vézina
- Département des sciences humaines, Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada
- Damian Labuda
- [1] Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada [2] Département de pédiatrie, Université de Montréal, Montréal, Québec, Canada
- Marie-Hélène Roy-Gagnon
- [1] Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada [2] Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada
|
12
|
Campbell CD, Chong JX, Malig M, Ko A, Dumont BL, Han L, Vives L, O’Roak BJ, Sudmant PH, Shendure J, Abney M, Ober C, Eichler EE. Estimating the human mutation rate using autozygosity in a founder population. Nat Genet 2012; 44:1277-81. [PMID: 23001126 PMCID: PMC3483378 DOI: 10.1038/ng.2418] [Citation(s) in RCA: 176] [Impact Index Per Article: 14.7] [Received: 07/06/2012] [Accepted: 08/30/2012] [Indexed: 01/30/2023]
Abstract
Knowledge of the rate and pattern of new mutation is critical to the understanding of human disease and evolution. We used extensive autozygosity in a genealogically well-defined population of Hutterites to estimate the human sequence mutation rate over multiple generations. We sequenced whole genomes from 5 parent-offspring trios and identified 44 segments of autozygosity. Using the number of meioses separating each pair of autozygous alleles and the 72 validated heterozygous single-nucleotide variants (SNVs) from 512 Mb of autozygous DNA, we obtained an SNV mutation rate of 1.20 × 10⁻⁸ (95% confidence interval 0.89-1.43 × 10⁻⁸) mutations per base pair per generation. The mutation rate for bases within CpG dinucleotides (9.72 × 10⁻⁸) was 9.5-fold that of non-CpG bases, and there was strong evidence (P = 2.67 × 10⁻⁴) for a paternal bias in the origin of new mutations (85% paternal). We observed a non-uniform distribution of heterozygous SNVs (both newly identified and known) in the autozygous segments (P = 0.001), which is suggestive of mutational hotspots or sites of long-range gene conversion.
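As a rough illustration of the kind of calculation this abstract describes, a per-generation rate can be sketched as the number of validated heterozygous SNVs divided by the total "exposure" of the autozygous segments, where each segment contributes its length multiplied by the number of meioses separating the two copies. This simple ratio is an assumption made for illustration, not necessarily the estimator used in the study; the function name `snv_mutation_rate` and the toy segment values are hypothetical and are not data from the paper.

```python
def snv_mutation_rate(n_snvs, segments):
    """Crude per-base, per-generation mutation rate estimate.

    n_snvs   : number of validated heterozygous SNVs found in autozygous DNA
    segments : iterable of (length_bp, n_meioses) pairs, one per autozygous
               segment; n_meioses is the number of meioses separating the
               two copies of that segment
    """
    # Each segment contributes length x meioses "base-generations" of exposure.
    exposure = sum(length_bp * n_meioses for length_bp, n_meioses in segments)
    if exposure <= 0:
        raise ValueError("total exposure must be positive")
    return n_snvs / exposure


# Hypothetical toy input (NOT values from the study):
toy_segments = [(10_000_000, 10), (5_000_000, 20)]
toy_rate = snv_mutation_rate(3, toy_segments)  # 3 / (1e8 + 1e8) = 1.5e-8
```

Weighting each segment by its own number of meioses matters because segments inherited through longer genealogical paths have had more opportunities to accumulate new mutations.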
Affiliation(s)
- Jessica X. Chong
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637
- Maika Malig
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Arthur Ko
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Beth L. Dumont
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Lide Han
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637
- Laura Vives
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Brian J. O’Roak
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Peter H. Sudmant
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Mark Abney
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637
- Carole Ober
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637
- Department of Obstetrics and Gynecology, The University of Chicago, Chicago, IL 60637
- Evan E. Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Howard Hughes Medical Institute, Seattle, WA 98195
|
13
|
Browning SR, Browning BL. Identity by descent between distant relatives: detection and applications. Annu Rev Genet 2012; 46:617-33. [PMID: 22994355 DOI: 10.1146/annurev-genet-110711-155534] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Indexed: 11/09/2022]
Abstract
Short segments of identity by descent (IBD) between individuals with no known relationship can be detected using genome-wide single nucleotide polymorphism data and recently developed statistical methodology. Emerging applications for the detected IBD segments include IBD mapping, haplotype phase inference, genotype imputation, and inference of population structure. In this review, we explain the principles behind methods for IBD segment detection, describe recently developed methods, discuss approaches to comparing methods, and give an overview of applications.
Affiliation(s)
- Sharon R Browning
- Department of Statistics, University of Washington, Seattle, Washington 98195, USA.
|
14
|
Using identity by descent estimation with dense genotype data to detect positive selection. Eur J Hum Genet 2012; 21:205-11. [PMID: 22781100 DOI: 10.1038/ejhg.2012.148] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 01/17/2023] Open
Abstract
Identification of genomic loci and segments that are identical by descent (IBD) allows inference on problems such as relatedness detection, IBD disease mapping, heritability estimation and detection of recent or ongoing positive selection. Here, employing a novel statistical method, we use IBD to find signals of selection in the Maasai from Kinyawa, Kenya (MKK). In doing so, we demonstrate the advantage of statistical tools that can probabilistically estimate IBD sharing without having to thin genotype data because of linkage disequilibrium (LD), and that allow for both inbreeding and more than one allele to be shared IBD. We use our novel method, GIBDLD, to estimate IBD sharing between all pairs of individuals at all genotyped SNPs in the MKK, and, by looking for genomic regions showing excess IBD sharing in unrelated pairs, find loci that are known to have undergone recent selection (eg, the LCT gene and the HLA region) as well as many novel loci. Intriguingly, those loci that show the highest amount of excess IBD, with the exception of HLA, also show a substantial number of unrelated pairs sharing all four of their alleles IBD. In contrast to other IBD detection methods, GIBDLD provides accurate probabilistic estimates at each locus for all nine possible IBD sharing states between a pair of individuals, thus allowing for consanguinity, while also modeling LD, thus removing the need to thin SNPs. These characteristics will prove valuable for those doing genetic studies, and estimating IBD, in the wide variety of human populations.
|
15
|
Zuvich RL, Armstrong LL, Bielinski SJ, Bradford Y, Carlson CS, Crawford DC, Crenshaw AT, de Andrade M, Doheny KF, Haines JL, Hayes MG, Jarvik GP, Jiang L, Kullo IJ, Li R, Ling H, Manolio TA, Matsumoto ME, McCarty CA, McDavid AN, Mirel DB, Olson LM, Paschall JE, Pugh EW, Rasmussen LV, Rasmussen-Torvik LJ, Turner SD, Wilke RA, Ritchie MD. Pitfalls of merging GWAS data: lessons learned in the eMERGE network and quality control procedures to maintain high data quality. Genet Epidemiol 2012; 35:887-98. [PMID: 22125226 DOI: 10.1002/gepi.20639] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Indexed: 12/14/2022]
Abstract
Genome-wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient reuse of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute-supported consortium composed of five sites to perform various genetic association studies using DNA repositories and EMR systems. Each eMERGE site has developed EMR-based algorithms to comprise a core set of 14 phenotypes for extraction of study samples from each site's DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or at the Center for Inherited Disease Research using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample and marker quality and various batch effects. Upon completion of the genotyping and QC analyses for each site's primary study, eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset reentered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here, we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion to eMERGE QC pipeline for merged datasets. 
These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II, and also serve as a starting point for investigators merging multiple genotype datasets accessible through the National Center for Biotechnology Information in the database of Genotypes and Phenotypes. Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process.
Affiliation(s)
- Rebecca L Zuvich
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
|
16
|
Fan YH, Song YQ. IPGWAS: an integrated pipeline for rational quality control and association analysis of genome-wide genetic studies. Biochem Biophys Res Commun 2012; 422:363-8. [PMID: 22564732 DOI: 10.1016/j.bbrc.2012.04.117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/14/2012] [Accepted: 04/21/2012] [Indexed: 01/03/2023]
Abstract
Large numbers of samples and marker loci are tested for association in genome-wide association studies (GWAS). Hence, quality control (QC) that removes individuals or markers with low genotyping quality is of utmost importance to minimize potential false-positive associations. IPGWAS was developed to facilitate the identification of rational QC thresholds for GWAS datasets and to support association analysis, Manhattan plots, quantile-quantile (QQ) plots, and format conversion for downstream genetic analyses such as meta-analysis, genotype phasing, and imputation. IPGWAS is a multiplatform application written in Perl with a graphical user interface (GUI) and is available for free at http://sourceforge.net/projects/ipgwas/.
Affiliation(s)
- Yan-Hui Fan
- Department of Biochemistry, The University of Hong Kong, Pokfulam, Hong Kong.
|
17
|
Abstract
We propose a novel aggregating U-test for gene-based association analysis. The method considers both rare and common variants. It adaptively searches for potential disease-susceptibility rare variants and collapses them into a single “supervariant.” A forward U-test is then used to assess the joint association of the supervariant and other common variants with quantitative traits. Using 200 simulated replicates from the Genetic Analysis Workshop 17 mini-exome data, we compare the performance of the proposed method with that of a commonly used approach, QuTie. We find that our method has an equivalent or greater power than QuTie to detect nine genes that influence the quantitative trait Q1. This new approach provides a powerful tool for detecting both common and rare variants associated with quantitative traits.
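The collapsing step mentioned in this abstract can be illustrated with a minimal sketch. It does not reproduce the adaptive search or the forward U-test from the paper. Instead it assumes a fixed minor-allele-frequency cutoff (the `maf_threshold` parameter, a hypothetical choice) and genotypes coded as minor-allele counts, and it marks an individual as carrying the "supervariant" if they carry at least one rare allele in the gene. The function name `collapse_rare_variants` is likewise hypothetical.

```python
import numpy as np


def collapse_rare_variants(genotypes, maf_threshold=0.01):
    """Collapse rare variants in a gene into one binary 'supervariant'.

    genotypes     : array-like of shape (n_individuals, n_variants) holding
                    minor-allele counts (0, 1 or 2); assumed coding
    maf_threshold : variants with allele frequency below this are treated
                    as rare (fixed cutoff, an assumption for this sketch)

    Returns (supervariant, common, rare_mask).
    """
    genotypes = np.asarray(genotypes)
    # Allele frequency of the counted allele at each variant.
    freq = genotypes.mean(axis=0) / 2.0
    rare_mask = freq < maf_threshold
    # 1 if the individual carries at least one rare allele, else 0.
    supervariant = (genotypes[:, rare_mask].sum(axis=1) > 0).astype(int)
    # Common variants are kept as separate columns for joint testing.
    common = genotypes[:, ~rare_mask]
    return supervariant, common, rare_mask


# Toy genotype matrix: 5 individuals x 3 variants (illustrative only).
toy = [[0, 1, 0],
       [0, 2, 0],
       [1, 1, 0],
       [0, 0, 0],
       [0, 1, 1]]
sv, common, rare = collapse_rare_variants(toy, maf_threshold=0.2)
```

In this toy matrix the first and third variants fall below the cutoff and are merged, while the second variant is retained as a common variant that could be tested jointly with the supervariant.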
Affiliation(s)
- Ming Li
- Department of Epidemiology, Michigan State University, B601 West Fee Hall, East Lansing, MI 48824, USA.
|