51
|
Mackay TFC, Huang W. Charting the genotype-phenotype map: lessons from the Drosophila melanogaster Genetic Reference Panel. WILEY INTERDISCIPLINARY REVIEWS. DEVELOPMENTAL BIOLOGY 2018; 7:10.1002/wdev.289. [PMID: 28834395 PMCID: PMC5746472 DOI: 10.1002/wdev.289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2017] [Revised: 07/11/2017] [Accepted: 07/13/2017] [Indexed: 01/20/2024]
Abstract
Understanding the genetic architecture (causal molecular variants, their effects, and frequencies) of quantitative traits is important for precision agriculture and medicine and predicting adaptive evolution, but is challenging in most species. The Drosophila melanogaster Genetic Reference Panel (DGRP) is a collection of 205 inbred strains with whole genome sequences derived from a single wild population in Raleigh, NC, USA. The large amount of quantitative genetic variation, lack of population structure, and rapid local decay of linkage disequilibrium in the DGRP and outbred populations derived from DGRP lines present a favorable scenario for performing genome-wide association (GWA) mapping analyses to identify candidate causal genes, polymorphisms, and pathways affecting quantitative traits. The many GWA studies utilizing the DGRP have revealed substantial natural genetic variation for all reported traits, little evidence for variants with large effects but enrichment for variants with low P-values, and a tendency for lower frequency variants to have larger effects than more common variants. The variants detected in the GWA analyses rarely overlap those discovered using mutagenesis, and often are the first functional annotations of computationally predicted genes. Variants implicated in GWA analyses typically have sex-specific and genetic background-specific (epistatic) effects, as well as pleiotropic effects on other quantitative traits. Studies in the DGRP reveal substantial genetic control of environmental variation. Taking account of genetic architecture can greatly improve genomic prediction in the DGRP. These features of the genetic architecture of quantitative traits are likely to apply to other species, including humans. WIREs Dev Biol 2018, 7:e289. doi: 10.1002/wdev.289 This article is categorized under: Invertebrate Organogenesis > Flies.
Affiliation(s)
- Trudy F C Mackay
- Program in Genetics, W. M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA
- Wen Huang
- Program in Genetics, W. M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA
|
52
|
Abstract
While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions to access rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next generation sequencing makes whole exome and genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TopMed, are likely to significantly advance this field. The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants. One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods have been developed for this. Model performance depends on the genetic architecture of the region of interest.
Affiliation(s)
- Karoline Kuchenbaecker
- Wellcome Trust Sanger Institute, Cambridge, UK; University College London, London, UK
- Emil Vincent Rosenbaum Appel
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section for Metabolic Genetics, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
|
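The aggregate testing strategy mentioned in the abstract of entry 52 can be illustrated with a minimal burden-style sketch. It is not taken from the cited review: the function names, the 1% minor allele frequency cutoff, the toy genotypes, and the use of a simple permutation test are all assumptions made for illustration.

```python
import random


def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Per-individual count of minor alleles at variants rarer than the cutoff.

    genotypes: list of rows, one per individual, with 0/1/2 minor-allele counts.
    mafs: minor allele frequency of each variant (assumed known in advance).
    """
    rare = [j for j, freq in enumerate(mafs) if freq < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def permutation_pvalue(scores, is_case, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the case-control difference in mean burden."""
    def mean_diff(labels):
        cases = [s for s, c in zip(scores, labels) if c]
        controls = [s for s, c in zip(scores, labels) if not c]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = abs(mean_diff(is_case))
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling keeps the numbers of cases and controls fixed
        if abs(mean_diff(labels)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 6 individuals, 4 variants, of which variants 0 and 2 are rare.
genotypes = [
    [1, 0, 1, 2],
    [0, 1, 1, 0],
    [1, 2, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 2],
    [0, 2, 0, 0],
]
mafs = [0.005, 0.2, 0.008, 0.3]
is_case = [True, True, True, False, False, False]

scores = burden_scores(genotypes, mafs)
print(scores, permutation_pvalue(scores, is_case))
```

Collapsing variants into one score trades per-variant resolution for power, which is why, as the abstract notes, performance depends on the genetic architecture of the region.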
53
|
Mackay TFC, Huang W. Charting the genotype-phenotype map: lessons from the Drosophila melanogaster Genetic Reference Panel. WILEY INTERDISCIPLINARY REVIEWS. DEVELOPMENTAL BIOLOGY 2018; 7:10.1002/wdev.289. [PMID: 28834395 PMCID: PMC5746472 DOI: 10.1002/wdev.289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/10/2017] [Revised: 07/11/2017] [Accepted: 07/13/2017] [Indexed: 11/30/2023]
|
54
|
Chen Z, Lu Y, Lin T, Liu Q, Wang K. Gene-based genetic association test with adaptive optimal weights. Genet Epidemiol 2017; 42:95-103. [DOI: 10.1002/gepi.22098] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 06/19/2017] [Accepted: 10/22/2017] [Indexed: 12/13/2022]
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics; School of Public Health; Indiana University Bloomington; Bloomington, Indiana, United States of America
- Yan Lu
- Department of Mathematics and Statistics; University of New Mexico; Albuquerque, New Mexico, United States of America
- Tong Lin
- The Key Laboratory of Machine Perception (Ministry of Education); School of EECS; Peking University; Beijing, China
- Qingzhong Liu
- Department of Computer Science; Sam Houston State University; Huntsville, Texas, United States of America
- Kai Wang
- Department of Biostatistics; College of Public Health; University of Iowa; Iowa City, Iowa, United States of America
|
55
|
A Powerful Variant-Set Association Test Based on Chi-Square Distribution. Genetics 2017; 207:903-910. [PMID: 28912342 DOI: 10.1534/genetics.117.300287] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/04/2017] [Accepted: 09/10/2017] [Indexed: 01/19/2023] Open
Abstract
Detecting the association between a set of variants and a given phenotype has attracted a large amount of attention in the scientific community, although it is a difficult task. Recently, several related statistical approaches have been proposed in the literature; powerful statistical tests are still highly desired and yet to be developed in this area. In this paper, we propose a powerful test that combines information from each individual single nucleotide polymorphism (SNP) based on principal component analysis without relying on the eigenvalues associated with the principal components. We compare the proposed approach with some popular tests through a simulation study and real data applications. Our results show that, in general, the new test is more powerful than its competitors considered in this study; the gain in detecting power can be substantial in many situations.
|
56
|
Jadhav S, Tong X, Lu Q. A functional U-statistic method for association analysis of sequencing data. Genet Epidemiol 2017; 41:636-643. [PMID: 28850771 DOI: 10.1002/gepi.22063] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/09/2017] [Revised: 06/06/2017] [Accepted: 07/10/2017] [Indexed: 11/08/2022]
Abstract
Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence.
Affiliation(s)
- Sneha Jadhav
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, United States of America
- Xiaoran Tong
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
|
57
|
Mackay TFC, Huang W. Charting the genotype-phenotype map: lessons from the Drosophila melanogaster Genetic Reference Panel. WILEY INTERDISCIPLINARY REVIEWS-DEVELOPMENTAL BIOLOGY 2017; 7. [PMID: 28834395 DOI: 10.1002/wdev.289] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 05/10/2017] [Revised: 07/11/2017] [Accepted: 07/13/2017] [Indexed: 11/08/2022]
|
58
|
Persyn E, Karakachoff M, Le Scouarnec S, Le Clézio C, Campion D, Consortium FE, Schott JJ, Redon R, Bellanger L, Dina C. DoEstRare: A statistical test to identify local enrichments in rare genomic variants associated with disease. PLoS One 2017; 12:e0179364. [PMID: 28742119 PMCID: PMC5524342 DOI: 10.1371/journal.pone.0179364] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/26/2017] [Accepted: 05/29/2017] [Indexed: 01/01/2023] Open
Abstract
Next-generation sequencing technologies have made it possible to assay the effect of rare variants on complex diseases. As an extension of the "common disease-common variant" paradigm, rare variant studies are necessary to get a more complete insight into the genetic architecture of human traits. Association studies of these rare variations pose new challenges in terms of statistical analysis. Due to their low frequency, rare variants must be tested in groups. This approach is then hindered by the fact that an unknown proportion of the variants could be neutral. The risk level of a rare variation may be determined by its impact but also by its position in the protein sequence. More generally, the molecular mechanisms underlying the disease architecture may involve specific protein domains or inter-genic regulatory regions. While a large variety of methods optimize functionality weights for each single marker, few evaluate variant position differences between cases and controls. Here, we propose a test called DoEstRare, which aims to simultaneously detect clusters of disease risk variants and global allele frequency differences in genomic regions. This test estimates, for cases and controls, variant position densities in the genetic region by a kernel method, weighted by a function of allele frequencies. We compared DoEstRare with previously published strategies through simulation studies as well as re-analysis of real datasets. Based on simulation under various scenarios, DoEstRare was the only test to consistently show the highest performance in terms of type I error and power, whether or not variants were clustered. DoEstRare was also applied to Brugada syndrome and early-onset Alzheimer's disease data and provided complementary results to other existing tests. DoEstRare, by integrating variant position information, gives new opportunities to explain disease susceptibility. DoEstRare is implemented in a user-friendly R package.
Affiliation(s)
- Elodie Persyn
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- Matilde Karakachoff
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- CHU Nantes, l’institut du thorax, Nantes, France
- Camille Le Clézio
- Inserm U1079, Rouen University, Normandy Center for Genomic Medicine and Personalized Medicine, Normandy University, Rouen, France
- Dominique Campion
- Inserm U1079, Rouen University, Normandy Center for Genomic Medicine and Personalized Medicine, Normandy University, Rouen, France
- Jean-Jacques Schott
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- CHU Nantes, l’institut du thorax, Nantes, France
- Richard Redon
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- CHU Nantes, l’institut du thorax, Nantes, France
- Lise Bellanger
- Laboratoire de Mathématiques Jean Leray, UMR CNRS 6629, Nantes, France
- Christian Dina
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- CHU Nantes, l’institut du thorax, Nantes, France
|
59
|
Abstract
Despite thousands of genetic loci identified to date, a large proportion of genetic variation predisposing to complex disease and traits remains unaccounted for. Advances in sequencing technology enable focused explorations on the contribution of low-frequency and rare variants to human traits. Here we review experimental approaches and current knowledge on the contribution of these genetic variants in complex disease and discuss challenges and opportunities for personalised medicine.
Affiliation(s)
- Lorenzo Bomba
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK
- Klaudia Walter
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK
- Nicole Soranzo
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK; Department of Haematology, University of Cambridge, Hills Rd, Cambridge, CB2 0AH, UK; The National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics at the University of Cambridge, University of Cambridge, Strangeways Research Laboratory, Wort's Causeway, Cambridge, CB1 8RN, UK
|
60
|
Salehe BR, Jones CI, Di Fatta G, McGuffin LJ. RAPIDSNPs: A new computational pipeline for rapidly identifying key genetic variants reveals previously unidentified SNPs that are significantly associated with individual platelet responses. PLoS One 2017; 12:e0175957. [PMID: 28441463 PMCID: PMC5404774 DOI: 10.1371/journal.pone.0175957] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/14/2016] [Accepted: 04/03/2017] [Indexed: 01/14/2023] Open
Abstract
Advances in omics technologies have led to the discovery of genetic markers, or single nucleotide polymorphisms (SNPs), that are associated with particular diseases or complex traits. Although there have been significant improvements in the approaches used to analyse associations of SNPs with disease, further optimised and rapid techniques are needed to keep up with the rate of SNP discovery, which has exacerbated the 'missing heritability' problem. Here, we have devised a novel, integrated, heuristic-based, hybrid analytical computational pipeline, for rapidly detecting novel or key genetic variants that are associated with diseases or complex traits. Our pipeline is particularly useful in genetic association studies where the genotyped SNP data are highly dimensional, and the complex trait phenotype involved is continuous. In particular, the pipeline is more efficient for investigating small sets of genotyped SNPs defined in high dimensional spaces that may be associated with continuous phenotypes, rather than for the investigation of whole genome variants. The pipeline, which employs a consensus approach based on the random forest, was able to rapidly identify previously unseen key SNPs, that are significantly associated with the platelet response phenotype, which was used as our complex trait case study. Several of these SNPs, such as rs6141803 of COMMD7 and rs41316468 in PKT2B, have independently confirmed associations with cardiovascular diseases (CVDs) according to other unrelated studies, suggesting that our pipeline is robust in identifying key genetic variants. Our new pipeline provides an important step towards addressing the problem of 'missing heritability' through enhanced detection of key genetic variants (SNPs) that are associated with continuous complex traits/disease phenotypes.
Affiliation(s)
- Chris Ian Jones
- School of Biological Sciences, University of Reading, Reading, United Kingdom
- Giuseppe Di Fatta
- Department of Computer Science, University of Reading, Reading, United Kingdom
- Liam James McGuffin
- School of Biological Sciences, University of Reading, Reading, United Kingdom
|
61
|
Wang C, Sun J, Guillaume B, Ge T, Hibar DP, Greenwood CMT, Qiu A. A Set-Based Mixed Effect Model for Gene-Environment Interaction and Its Application to Neuroimaging Phenotypes. Front Neurosci 2017; 11:191. [PMID: 28428742 PMCID: PMC5382297 DOI: 10.3389/fnins.2017.00191] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/19/2017] [Accepted: 03/21/2017] [Indexed: 11/23/2022] Open
Abstract
Imaging genetics is an emerging field for the investigation of neuro-mechanisms linked to genetic variation. Although imaging genetics has recently shown great promise in understanding biological mechanisms for brain development and psychiatric disorders, studying the link between genetic variants and neuroimaging phenotypes remains statistically challenging due to the high-dimensionality of both genetic and neuroimaging data. This becomes even more challenging when studying gene-environment interaction (G×E) on neuroimaging phenotypes. In this study, we proposed a set-based mixed effect model for gene-environment interaction (MixGE) on neuroimaging phenotypes, such as structural volumes and tensor-based morphometry (TBM). MixGE incorporates both fixed and random effects of G×E to investigate homogeneous and heterogeneous contributions of multiple genetic variants and their interaction with environmental risks to phenotypes. We discuss the construction of score statistics for the terms associated with fixed and random effects of G×E to avoid direct parameter estimation in the MixGE model, which would greatly increase computational cost. We also describe how the score statistics can be combined into a single significance value to increase statistical power. We evaluated MixGE using simulated and real Alzheimer's Disease Neuroimaging Initiative (ADNI) data, and showed statistical power superior to other burden and variance component methods. We then demonstrated the use of MixGE for exploring the voxelwise effect of G×E on TBM, made feasible by the computational efficiency of MixGE. Through this, we discovered a potential interaction effect of gene ABCA7 and cardiovascular risk on local volume change of the right superior parietal cortex, which warrants further investigation.
Affiliation(s)
- Changqing Wang
- NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore, Singapore
- Jianping Sun
- Department of Epidemiology, Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Montreal, QC, Canada
- Bryan Guillaume
- Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
- Tian Ge
- Athinoula A. Martinos Center for Biomedical Imaging, Harvard Medical School, Massachusetts General Hospital, Boston, MA, USA; Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
- Derrek P Hibar
- Imaging Genetics Center, Institute for Neuroimaging and Informatics, Keck School of Medicine of the University of Southern California, Los Angeles, CA, USA
- Celia M T Greenwood
- Department of Epidemiology, Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Montreal, QC, Canada; Departments of Oncology, Epidemiology, Biostatistics and Occupational Health, and Human Genetics, McGill University, Montreal, QC, Canada
- Anqi Qiu
- Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore; Clinical Imaging Research Centre, National University of Singapore, Singapore, Singapore; Singapore Institute for Clinical Sciences, Agency for Science, Technology, and Research, Singapore, Singapore
|
62
|
Longitudinal data analysis for rare variants detection with penalized quadratic inference function. Sci Rep 2017; 7:650. [PMID: 28381821 PMCID: PMC5429681 DOI: 10.1038/s41598-017-00712-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/09/2016] [Accepted: 03/08/2017] [Indexed: 11/08/2022] Open
Abstract
Longitudinal genetic data provide more information regarding genetic effects over time compared with cross-sectional data. Coupled with next-generation sequencing technologies, it has become possible to identify important genes containing both rare and common variants in a longitudinal design. In this work, we adopted a weighted sum statistic (WSS) to collapse multiple variants in a gene region to form a gene score. When multiple genes in a pathway were considered together, a penalized longitudinal model under the quadratic inference function (QIF) framework was applied for efficient gene selection. We evaluated the estimation accuracy and model selection performance under different model settings, then applied the method to a real dataset from the Genetic Analysis Workshop 18 (GAW18). Compared with the unpenalized QIF method, the penalized QIF (pQIF) method achieved better estimation accuracy and higher selection efficiency. The pQIF remained optimal even when the working correlation structure was mis-specified. The real data analysis identified one important gene, angiotensin II receptor type 1 (AGTR1), in the Ca2+/AT-IIR/α-AR signaling pathway. The estimated effect implied that AGTR1 may have a protective effect for hypertension. Our pQIF method provides a general tool for longitudinal sequencing studies involving large numbers of genetic variants.
|
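The collapsing step described in the abstract of entry 62, where a weighted sum statistic turns the variants of a gene region into a single gene score, might be sketched as follows. The abstract does not state the weights used, so the frequency-based weight below (rarer variants weigh more, with a smoothed allele frequency estimated from all individuals) is an assumption, as are the function name and the toy genotypes.

```python
import math


def wss_gene_scores(genotypes):
    """Collapse the variants of one gene region into a per-individual gene score.

    genotypes: list of rows, one per individual, with 0/1/2 minor-allele counts.
    Assumed weighting: w_j = sqrt(n * q_j * (1 - q_j)), with the smoothed
    frequency q_j = (m_j + 1) / (2n + 2), so rarer variants contribute more.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    weights = []
    for j in range(n_variants):
        minor_count = sum(row[j] for row in genotypes)
        q = (minor_count + 1) / (2 * n + 2)  # smoothing keeps q strictly between 0 and 1
        weights.append(math.sqrt(n * q * (1 - q)))
    return [
        sum(row[j] / weights[j] for j in range(n_variants))
        for row in genotypes
    ]


# Hypothetical toy data: 4 individuals, 2 variants; variant 0 is the rarer one.
toy_genotypes = [
    [1, 0],
    [0, 0],
    [0, 1],
    [0, 1],
]
gene_scores = wss_gene_scores(toy_genotypes)
print(gene_scores)
```

In a longitudinal analysis such as the one the abstract describes, a score like this would then enter the model as a single covariate per gene, which is what makes penalized selection across the genes of a pathway tractable.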
63
|
Sofer T, Schifano ED, Christiani DC, Lin X. Weighted pseudolikelihood for SNP set analysis with multiple secondary outcomes in case-control genetic association studies. Biometrics 2017; 73:1210-1220. [PMID: 28346824 DOI: 10.1111/biom.12680] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/01/2015] [Revised: 01/01/2017] [Accepted: 02/01/2017] [Indexed: 11/29/2022]
Abstract
We propose a weighted pseudolikelihood method for analyzing the association of a SNP set, for example, SNPs in a gene or a genetic pathway or network, with multiple secondary phenotypes in case-control genetic association studies. To boost analysis power, we assume that the SNP-specific effects are shared across all secondary phenotypes using a scaled mean model. We estimate regression parameters using Inverse Probability Weighted (IPW) estimating equations obtained from the weighted pseudolikelihood, which accounts for case-control sampling to prevent potential ascertainment bias. To test the effect of a SNP set, we propose a weighted variance component pseudo-score test. We also propose a penalized IPW pseudolikelihood method for selecting a subset of SNPs that are associated with the multiple secondary phenotypes. We show that the proposed variable selection procedure has the oracle properties and is robust to misspecification of the correlation structure among secondary phenotypes. We select the tuning parameter using a weighted Bayesian Information-like Criterion (wBIC). We evaluate the finite sample performance of the proposed methods via simulations, and illustrate the methods by the analysis of the multiple secondary smoking behavior outcomes in a lung cancer case-control genetic association study.
Affiliation(s)
- Tamar Sofer
- Department of Biostatistics, University of Washington, Seattle, Washington 98105, U.S.A
- Elizabeth D Schifano
- Department of Statistics, University of Connecticut, Storrs, Connecticut 06269, U.S.A
- David C Christiani
- Department of Environmental Health, Harvard School of Public Health, Boston, Massachusetts 02115, U.S.A
- Xihong Lin
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, U.S.A
|
64
|
Rytova AI, Khlebus EY, Shevtsov AE, Kutsenko VA, Shcherbakova NV, Zharikova AA, Ershova AI, Kiseleva AV, Boytsov SA, Yarovaya EB, Meshkov AN. Modern probabilistic and statistical approaches to search for nucleotide sequence options associated with integrated diseases. RUSS J GENET+ 2017. [DOI: 10.1134/s1022795417100088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
|
65
|
Yang X, Wang S, Zhang S, Sha Q. Detecting association of rare and common variants based on cross-validation prediction error. Genet Epidemiol 2017; 41:233-243. [PMID: 28176359 DOI: 10.1002/gepi.22034] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/15/2016] [Revised: 11/22/2016] [Accepted: 11/26/2016] [Indexed: 12/13/2022]
Abstract
Despite the extensive discovery of disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants may explain additional disease risk or trait variability. Although sequencing technology provides a supreme opportunity to investigate the roles of rare variants in complex diseases, detection of these variants in sequencing-based association studies presents substantial challenges. In this article, we propose novel statistical tests of the association between rare and common variants in a genomic region and a complex trait of interest, based on cross-validation prediction error (PE). We first propose a PE method based on Ridge regression. Based on PE, we also propose two further tests, PE-WS and PE-TOW, which test a weighted combination of variants under two different weighting schemes. PE-WS is the PE version of the test based on the weighted sum statistic (WS) and PE-TOW is the PE version of the test based on the optimally weighted combination of variants (TOW). Using extensive simulation studies, we show that (1) PE-TOW and PE-WS are consistently more powerful than TOW and WS, respectively, and (2) PE is the most powerful test when causal variants contain both common and rare variants.
Affiliation(s)
- Xinlan Yang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
| | | | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
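The abstract of entry 65 describes a test built on the cross-validation prediction error (PE) of a Ridge regression. The listing does not give the exact statistic or how its significance is assessed, so the following is only a minimal sketch under stated assumptions: the PE is taken as the mean squared K-fold prediction error, significance is assessed by permuting the trait, and the function names, the penalty `lam`, and the fold count are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_cv_prediction_error(G, y, lam=1.0, n_folds=5, seed=0):
    """Mean squared K-fold cross-validation prediction error of ridge regression.

    G is an (n, p) genotype matrix, y an (n,) quantitative trait.
    lam and n_folds are illustrative defaults, not values from the paper.
    """
    n, p = G.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    squared_error = 0.0
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        g_mean, y_mean = G[train].mean(axis=0), y[train].mean()
        Gc = G[train] - g_mean  # centre with training-fold means only
        beta = np.linalg.solve(Gc.T @ Gc + lam * np.eye(p), Gc.T @ (y[train] - y_mean))
        pred = y_mean + (G[test_idx] - g_mean) @ beta
        squared_error += np.sum((y[test_idx] - pred) ** 2)
    return squared_error / n


def permutation_p_value(G, y, n_perm=200, seed=0, **kwargs):
    """Assumed significance procedure: a small PE is evidence of association."""
    rng = np.random.default_rng(seed)
    observed = ridge_cv_prediction_error(G, y, **kwargs)
    hits = sum(
        ridge_cv_prediction_error(G, rng.permutation(y), **kwargs) <= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

Under this sketch, a region whose variants predict the trait yields a PE well below the PE obtained after shuffling the trait, which is what drives the permutation P-value down.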
|
66
|
Choi SH, Labadorf AT, Myers RH, Lunetta KL, Dupuis J, DeStefano AL. Evaluation of logistic regression models and effect of covariates for case-control study in RNA-Seq analysis. BMC Bioinformatics 2017; 18:91. [PMID: 28166718 PMCID: PMC5294900 DOI: 10.1186/s12859-017-1498-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/25/2016] [Accepted: 01/27/2017] [Indexed: 02/06/2023] Open
Abstract
Background: Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. Results: When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth’s logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. Conclusions: We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth’s logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
Affiliation(s)
- Seung Hoan Choi
- Department of Biostatistics, Boston University, 801 Massachusetts Avenue, Boston, Massachusetts, USA
| | - Adam T Labadorf
- Department of Neurology, Boston University, 72 East Concord Street, Boston, Massachusetts, USA
| | - Richard H Myers
- Department of Neurology, Boston University, 72 East Concord Street, Boston, Massachusetts, USA
| | - Kathryn L Lunetta
- Department of Biostatistics, Boston University, 801 Massachusetts Avenue, Boston, Massachusetts, USA
| | - Josée Dupuis
- Department of Biostatistics, Boston University, 801 Massachusetts Avenue, Boston, Massachusetts, USA
| | - Anita L DeStefano
- Department of Biostatistics, Boston University, 801 Massachusetts Avenue, Boston, Massachusetts, USA; Department of Neurology, Boston University, 72 East Concord Street, Boston, Massachusetts, USA
|
67
|
Chen Z, Han S, Wang K. Genetic association test based on principal component analysis. Stat Appl Genet Mol Biol 2017; 16:189-198. [DOI: 10.1515/sagmb-2016-0061] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022]
Abstract
Many gene- and pathway-based association tests have been proposed in the literature. Among them, the SKAT is widely used, especially for rare variants association studies. In this paper, we investigate the connection between SKAT and a principal component analysis. This investigation leads to a procedure that encompasses SKAT as a special case. Through simulation studies and real data applications, we compare the proposed method with some existing tests.
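The link between SKAT and principal components mentioned in the abstract of entry 67 can be illustrated with a standard algebraic identity. This is a sketch, not the authors' procedure: it assumes an unweighted linear kernel, a column-centred genotype matrix, and null-model residuals `resid`, and the function names are hypothetical. With the singular value decomposition G = U D V', the statistic r'GG'r equals the sum over components of d_k^2 (u_k'r)^2, that is, an eigenvalue-weighted sum of squared principal-component scores of the residuals.

```python
import numpy as np


def skat_linear_kernel(G, resid):
    """Unweighted linear-kernel SKAT-type statistic Q = r' G G' r."""
    s = G.T @ resid
    return float(s @ s)


def skat_via_principal_components(G, resid):
    """The same statistic written component by component.

    Columns of U are the principal-component directions in sample space
    (for column-centred G) and d**2 are the corresponding eigenvalues of G G'.
    """
    U, d, _ = np.linalg.svd(G, full_matrices=False)
    scores = U.T @ resid
    return float(np.sum(d**2 * scores**2))
```

Replacing the weights `d**2` by other functions of the eigenvalues gives other quadratic-form tests on the same components; whether this is the generalization the paper proposes is not stated in the abstract.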
|
68
|
Wang Z, Xu K, Zhang X, Wu X, Wang Z. Longitudinal SNP-set association analysis of quantitative phenotypes. Genet Epidemiol 2017; 41:81-93. [PMID: 27859628 PMCID: PMC5154867 DOI: 10.1002/gepi.22016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/15/2016] [Revised: 08/10/2016] [Accepted: 09/19/2016] [Indexed: 02/06/2023]
Abstract
Many genetic epidemiological studies collect repeated measurements over time. This design not only provides a more accurate assessment of disease condition, but allows us to explore the genetic influence on disease development and progression. Thus, it is of great interest to study the longitudinal contribution of genes to disease susceptibility. Most association testing methods for longitudinal phenotypes are developed for single variant, and may have limited power to detect association, especially for variants with low minor allele frequency. We propose Longitudinal SNP-set/sequence kernel association test (LSKAT), a robust, mixed-effects method for association testing of rare and common variants with longitudinal quantitative phenotypes. LSKAT uses several random effects to account for the within-subject correlation in longitudinal data, and allows for adjustment for both static and time-varying covariates. We also present a longitudinal trait burden test (LBT), where we test association between the trait and the burden score in linear mixed models. In simulation studies, we demonstrate that LBT achieves high power when variants are almost all deleterious or all protective, while LSKAT performs well in a wide range of genetic models. By making full use of trait values from repeated measures, LSKAT is more powerful than several tests applied to a single measurement or average over all time points. Moreover, LSKAT is robust to misspecification of the covariance structure. We apply the LSKAT and LBT methods to detect association with longitudinally measured body mass index in the Framingham Heart Study, where we are able to replicate association with a circadian gene NR1D2.
Affiliation(s)
- Zhong Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Baker Institute for Animal Health, Cornell University, Ithaca, New York, United States of America
- Center for Computational Biology, Beijing Forestry University, Beijing, China
| | - Ke Xu
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
- VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
| | - Xinyu Zhang
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
- VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
| | - Xiaowei Wu
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
| | - Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
|
69
|
Zhu H, Wang Z, Wang X, Sha Q. A novel statistical method for rare-variant association studies in general pedigrees. BMC Proc 2016; 10:193-196. [PMID: 27980635 PMCID: PMC5133499 DOI: 10.1186/s12919-016-0029-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
Both population-based and family-based designs are commonly used in genetic association studies to identify rare variants that underlie complex diseases. For any type of study design, the statistical power will be improved if rare variants can be enriched in the samples. Family-based designs, with ascertainment based on phenotype, may enrich the sample for causal rare variants and thus can be more powerful than population-based designs. Therefore, it is important to develop family-based statistical methods that can account for ascertainment. In this paper, we develop a novel statistical method for rare-variant association studies in general pedigrees for quantitative traits. This method uses a retrospective view that treats the traits as fixed and the genotypes as random, which allows us to account for complex and undefined ascertainment of families. We then apply the newly developed method to the Genetic Analysis Workshop 19 data set and compare the power of the new method with two other methods for general pedigrees. The results show that the newly proposed method has higher power than the other two methods in most of the cases we consider.
Affiliation(s)
- Huanhuan Zhu
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
| | - Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
| | - Xuexia Wang
- Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203-5017 USA
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
|
70
|
Yoo YJ, Sun L, Poirier JG, Paterson AD, Bull SB. Multiple linear combination (MLC) regression tests for common variants adapted to linkage disequilibrium structure. Genet Epidemiol 2016; 41:108-121. [PMID: 27885705 PMCID: PMC5245123 DOI: 10.1002/gepi.22024] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/31/2016] [Revised: 05/25/2016] [Accepted: 09/27/2016] [Indexed: 11/21/2022]
Abstract
By jointly analyzing multiple variants within a gene, instead of one at a time, gene‐based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster‐specific effects in a quadratic sum of squares and cross‐products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well‐powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P‐value, variance‐component, and principal‐component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene‐specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome‐wide analysis. The cluster construction of the MLC test statistics helps reveal within‐gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations.
Affiliation(s)
- Yun Joo Yoo
- Department of Mathematics Education, Seoul National University, Seoul, South Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
| | - Lei Sun
- Department of Statistical Sciences, University of Toronto, Toronto, Canada; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
| | - Julia G Poirier
- Prosserman Centre for Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Canada
| | - Andrew D Paterson
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada; Program in Genetics and Genome Biology, Hospital for Sick Children Research Institute, Toronto, Canada
| | - Shelley B Bull
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada; Prosserman Centre for Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Canada
|
71
|
Sha Q, Zhang K, Zhang S. A Nonparametric Regression Approach to Control for Population Stratification in Rare Variant Association Studies. Sci Rep 2016; 6:37444. [PMID: 27857226 PMCID: PMC5114546 DOI: 10.1038/srep37444] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/19/2016] [Accepted: 10/28/2016] [Indexed: 01/31/2023] Open
Abstract
Recently, there has been increasing interest in detecting associations between rare variants and complex traits. Rare variant association studies usually need large sample sizes due to the rarity of the variants, and large sample sizes typically require combining information from different geographic locations within and across countries. Although several statistical methods have been developed to control for population stratification in common variant association studies, these methods do not necessarily control for population stratification in rare variant association studies. Thus, new statistical methods that can control for population stratification in rare variant association studies are needed. In this article, we propose a principal component based nonparametric regression (PC-nonp) approach to control for population stratification in rare variant association studies. Our simulations show that the proposed PC-nonp can control for population stratification well in all scenarios, while existing methods fail to control for population stratification in at least some scenarios. Simulations also show that PC-nonp's robustness to population stratification does not come at the cost of reduced power. Furthermore, we illustrate our proposed method by using whole genome sequencing data from Genetic Analysis Workshop 18 (GAW18).
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
| | - Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
|
72
|
Wang P, Rahman M, Jin L, Xiong M. A new statistical framework for genetic pleiotropic analysis of high dimensional phenotype data. BMC Genomics 2016; 17:881. [PMID: 27821073 PMCID: PMC5100198 DOI: 10.1186/s12864-016-3169-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/10/2015] [Accepted: 10/18/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The widely used genetic pleiotropic analyses of multiple phenotypes are often designed for examining the relationship between common variants and a few phenotypes. They are not suited for both high dimensional phenotypes and high dimensional genotype (next-generation sequencing) data. To overcome limitations of the traditional genetic pleiotropic analysis of multiple phenotypes, we develop sparse structural equation models (SEMs) as a general framework for a new paradigm of genetic analysis of multiple phenotypes. To incorporate both common and rare variants into the analysis, we extend the traditional multivariate SEMs to sparse functional SEMs. To deal with high dimensional phenotype and genotype data, we employ functional data analysis and the alternating direction method of multipliers (ADMM) to reduce data dimension and improve computational efficiency. RESULTS Using large scale simulations we showed that the proposed methods have higher power to detect true causal genetic pleiotropic structure than other existing methods. Simulations also demonstrate that the gene-based pleiotropic analysis has higher power than the single variant-based pleiotropic analysis. The proposed method is applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) with 11 phenotypes, which identifies a network with 137 genes connected to 11 phenotypes and 341 edges. Among them, 114 genes showed pleiotropic genetic effects and 45 genes were reported to be associated with phenotypes in the analysis or other cardiovascular disease (CVD) related phenotypes in the literature. CONCLUSIONS Our proposed sparse functional SEMs can incorporate both common and rare variants into the analysis and the ADMM algorithm can efficiently solve the penalized SEMs. Using this model we can jointly infer genetic architecture and causal phenotype network structure, and decompose the genetic effect into direct, indirect and total effect.
Affiliation(s)
- Panpan Wang
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA; State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200433, China
| | - Mohammad Rahman
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA
| | - Li Jin
- State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200433, China.
| | - Momiao Xiong
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA; Human Genetics Center, The University of Texas Health Science Center at Houston, P.O. Box 20186, Houston, TX, 77225, USA
|
73
|
Block-based association tests for rare variants using Kullback–Leibler divergence. J Hum Genet 2016; 61:965-975. [DOI: 10.1038/jhg.2016.90] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2015] [Revised: 05/03/2016] [Accepted: 06/17/2016] [Indexed: 11/09/2022]
|
74
|
Postula M, Janicki PK, Rosiak M, Eyileten C, Zaremba M, Kaplon-Cieslicka A, Sugino S, Kosior DA, Opolski G, Filipiak KJ, Mirowska-Guzel D. Targeted deep resequencing of ALOX5 and ALOX5AP in patients with diabetes and association of rare variants with leukotriene pathways. Exp Ther Med 2016; 12:415-421. [PMID: 27347071 PMCID: PMC4906979 DOI: 10.3892/etm.2016.3334] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/29/2015] [Accepted: 02/11/2016] [Indexed: 02/07/2023] Open
Abstract
The aim of the present study was to investigate a possible association between the accumulation of rare coding variants in the genes for arachidonate 5-lipoxygenase (ALOX5) and ALOX5-activating protein (ALOX5AP), and corresponding production of leukotrienes (LTs) in patients with type 2 diabetes mellitus (T2DM) receiving acetylsalicylic therapy. Twenty exons and corresponding introns of the selected genes were resequenced in 303 DNA samples from patients with T2DM using pooled polymerase chain reaction amplification and next-generation sequencing on an Illumina HiSeq 2000 sequencing system. The observed non-synonymous variants were further confirmed by individual genotyping of DNA samples comprising all individuals from the original discovery pools. The investigated phenotypes were based on LTB4 and LTE4 concentrations, and their association with the accumulation of rare missense variants (genetic burden) in the investigated genes was evaluated using statistical collapsing tests. A total of 10 exonic variants were identified for each resequenced gene, including 5 missense and 5 synonymous variants. The rare missense variants did not exhibit statistically significant differences in the accumulation pattern between the patients with low and high LT concentrations. As the present study only included patients with T2DM, it is unclear whether the absence of observed association between the accumulation of rare missense variants in investigated genes and LT production is associated with diabetic populations only or may also be applied to other populations.
Affiliation(s)
- MAREK POSTULA
- Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw 02-097, Poland
- Perioperative Genomics Laboratory, Penn State University, College of Medicine, Hershey, PA 17033, USA
| | - PIOTR KAZIMIERZ JANICKI
- Perioperative Genomics Laboratory, Penn State University, College of Medicine, Hershey, PA 17033, USA
| | - MAREK ROSIAK
- Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw 02-097, Poland
- Department of Cardiology and Hypertension, Central Clinical Hospital, The Ministry of the Interior, Warsaw 02-507, Poland
| | - CEREN EYILETEN
- Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw 02-097, Poland
| | - MAŁGORZATA ZAREMBA
- Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw 02-097, Poland
| | | | - SHIGEKAZU SUGINO
- Perioperative Genomics Laboratory, Penn State University, College of Medicine, Hershey, PA 17033, USA
| | - DARIUSZ ARTUR KOSIOR
- Department of Cardiology and Hypertension, Central Clinical Hospital, The Ministry of the Interior, Warsaw 02-507, Poland
- Department of Applied Physiology, Mossakowski Medical Research Centre, Polish Academy of Sciences, Warsaw 02-106, Poland
| | - GRZEGORZ OPOLSKI
- Department of Cardiology, Medical University of Warsaw, Warsaw 02-091, Poland
| | | | - DAGMARA MIROWSKA-GUZEL
- Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw 02-097, Poland
|
75
|
Yan Q, Weeks DE, Tiwari HK, Yi N, Zhang K, Gao G, Lin WY, Lou XY, Chen W, Liu N. Rare-Variant Kernel Machine Test for Longitudinal Data from Population and Family Samples. Hum Hered 2016; 80:126-38. [PMID: 27161037 DOI: 10.1159/000445057] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/26/2015] [Accepted: 02/24/2016] [Indexed: 01/12/2023] Open
Abstract
OBJECTIVE The kernel machine (KM) test reportedly performs well in the set-based association test of rare variants. Many studies have been conducted to measure phenotypes at multiple time points, but the standard KM methodology has only been available for phenotypes at a single time point. In addition, family-based designs have been widely used in genetic association studies; therefore, the data analysis method used must appropriately handle familial relatedness. A rare-variant test does not currently exist for longitudinal data from family samples. Therefore, in this paper, we aim to introduce an association test for rare variants, which includes multiple longitudinal phenotype measurements for either population or family samples. METHODS This approach uses KM regression based on the linear mixed model framework and is applicable to longitudinal data from either population (L-KM) or family samples (LF-KM). RESULTS In our population-based simulation studies, L-KM has good control of Type I error rate and increased power in all the scenarios we considered compared with other competing methods. Conversely, in the family-based simulation studies, we found an inflated Type I error rate when L-KM was applied directly to the family samples, whereas LF-KM retained the desired Type I error rate and had the best power performance overall. Finally, we illustrate the utility of our proposed LF-KM approach by analyzing data from an association study between rare variants and blood pressure from the Genetic Analysis Workshop 18 (GAW18). CONCLUSION We propose a method for rare-variant association testing in population and family samples using phenotypes measured at multiple time points for each subject. The proposed method has the best power performance compared to competing approaches in our simulation study.
Affiliation(s)
- Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa., USA
|
76
|
Abstract
Over the past few years, interest in the identification of rare variants that influence human phenotype has led to the development of many statistical methods for testing for association between sets of rare variants and binary or quantitative traits. Here, I review some of the most important ideas that underlie these methods and the most relevant issues when choosing a method for analysis. In addition to the tests for association, I review crucial issues in performing a rare variant study, from experimental design to interpretation and validation. I also discuss the many challenges of these studies, some of their limitations, and future research directions.
Affiliation(s)
- Dan L Nicolae
- Departments of Medicine and Statistics, University of Chicago, Chicago, Illinois 60637;
|
77
|
A uniform survey of allele-specific binding and expression over 1000-Genomes-Project individuals. Nat Commun 2016; 7:11101. [PMID: 27089393 PMCID: PMC4837449 DOI: 10.1038/ncomms11101] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 06/21/2015] [Accepted: 02/19/2016] [Indexed: 02/07/2023] Open
Abstract
Large-scale sequencing in the 1000 Genomes Project has revealed multitudes of single nucleotide variants (SNVs). Here, we provide insights into the functional effect of these variants using allele-specific behaviour. This can be assessed for an individual by mapping ChIP-seq and RNA-seq reads to a personal genome, and then measuring 'allelic imbalances' between the numbers of reads mapped to the paternal and maternal chromosomes. We annotate variants associated with allele-specific binding and expression in 382 individuals by uniformly processing 1,263 functional genomics data sets, developing approaches to reduce the heterogeneity between data sets due to overdispersion and mapping bias. Since many allelic variants are rare, aggregation across multiple individuals is necessary to identify broadly applicable 'allelic elements'. We also found SNVs for which we can anticipate allelic imbalance from the disruption of a binding motif. Our results serve as an allele-specific annotation for the 1000 Genomes variant catalogue and are distributed as an online resource (alleledb.gersteinlab.org).
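The abstract of entry 77 measures allelic imbalance by comparing read counts mapped to the two parental alleles. As a deliberately simplified sketch, the comparison can be framed as an exact two-sided binomial test against an expected reference fraction of 0.5. The abstract states that the authors additionally account for overdispersion and mapping bias, which this plain binomial version ignores; the function name and the `p_ref` parameter are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial P-value for allelic imbalance at one site.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference-read count under Binomial(n, p_ref). Overdispersion
    is NOT modelled here, unlike in the cited study.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so ties are not lost to floating-point noise.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))
```

For a site with 10 reference reads and 0 alternative reads, only the two extreme outcomes are as unlikely as the observation, giving a P-value of 2/1024; a balanced 5:5 split gives a P-value of 1.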
|
78
|
Lin WY. Beyond Rare-Variant Association Testing: Pinpointing Rare Causal Variants in Case-Control Sequencing Study. Sci Rep 2016; 6:21824. [PMID: 26903168 PMCID: PMC4763184 DOI: 10.1038/srep21824] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 09/16/2015] [Accepted: 02/01/2016] [Indexed: 12/31/2022] Open
Abstract
Rare-variant association testing usually requires some method of aggregation. The next important step is to pinpoint individual rare causal variants among a large number of variants within a genetic region. Recently, Ionita-Laza et al. proposed a backward elimination (BE) procedure that can identify individual causal variants among the many variants in a gene. The BE procedure removes a variant if excluding this variant leads to a smaller P-value for the BURDEN test (referred to as "BE-BURDEN") or the SKAT test (referred to as "BE-SKAT"). Here, we use the adaptive combination of P-values (ADA) method to pinpoint causal variants. Unlike most gene-based association tests, the ADA statistic is built upon per-site P-values of individual variants. It is straightforward to select important variants given the optimal P-value truncation threshold found by ADA. We performed comprehensive simulations to compare ADA with BE-SKAT and BE-BURDEN. Ranking these three approaches according to positive predictive values (PPVs), the percentage of truly causal variants among the total selected variants, we found ADA > BE-SKAT > BE-BURDEN across all simulation scenarios. We therefore recommend using ADA to pinpoint plausible rare causal variants in a gene.
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
| |
Collapse
|
79
|
Abstract
Empirical studies and evolutionary theory support a role for rare variants in the etiology of complex traits. Given this motivation and increasing affordability of whole-exome and whole-genome sequencing, methods for rare variant association have been an active area of research for the past decade. Here, we provide a survey of the current literature and developments from the Genetics Analysis Workshop 19 (GAW19) Collapsing Rare Variants working group. In particular, we present the generalized linear regression framework and associated score statistic for the 2 major types of methods: burden and variance components methods. We further show that by simply modifying weights within these frameworks we arrive at many of the popular existing methods, for example, the cohort allelic sums test and sequence kernel association test. Meta-analysis techniques are also described. Next, we describe the 6 contributions from the GAW19 Collapsing Rare Variants working group. These included development of new methods, such as a retrospective likelihood for family data, a method using genomic structure to compare cases and controls, a haplotype-based meta-analysis, and a permutation-based method for combining different statistical tests. In addition, one contribution compared a mega-analysis of family-based and population-based data to meta-analysis. Finally, the power of existing family-based methods for binary traits was compared. We conclude with suggestions for open research questions.
Affiliation(s)
- Stephanie A Santorico
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
- Audrey E Hendricks
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
|
80
|
Meta-analysis of Complex Diseases at Gene Level with Generalized Functional Linear Models. Genetics 2015; 202:457-70. [PMID: 26715663 DOI: 10.1534/genetics.115.180869] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/18/2015] [Accepted: 12/09/2015] [Indexed: 11/18/2022]
Abstract
We developed generalized functional linear models (GFLMs) to perform a meta-analysis of multiple case-control studies to evaluate the relationship of genetic data to dichotomous traits adjusting for covariates. Unlike the previously developed meta-analysis for sequence kernel association tests (MetaSKATs), which are based on mixed-effect models to make the contributions of major gene loci random, GFLMs are fixed models; i.e., genetic effects of multiple genetic variants are fixed. Based on GFLMs, we developed chi-squared-distributed Rao's efficient score test and likelihood-ratio test (LRT) statistics to test for an association between a complex dichotomous trait and multiple genetic variants. We then performed extensive simulations to evaluate the empirical type I error rates and power performance of the proposed tests. The Rao's efficient score test statistics of GFLMs are very conservative and have higher power than MetaSKATs when some causal variants are rare and some are common. When the causal variants are all rare [i.e., minor allele frequencies (MAF) < 0.03], the Rao's efficient score test statistics have similar or slightly lower power than MetaSKATs. The LRT statistics generate accurate type I error rates for homogeneous genetic-effect models and may inflate type I error rates for heterogeneous genetic-effect models owing to the large numbers of degrees of freedom and have similar or slightly higher power than the Rao's efficient score test statistics. GFLMs were applied to analyze genetic data of 22 gene regions of type 2 diabetes data from a meta-analysis of eight European studies and detected significant association for 18 genes (P < 3.10 × 10⁻⁶), tentative association for 2 genes (HHEX and HMGA2; P ≈ 10⁻⁵), and no association for 2 genes, while MetaSKATs detected none. In addition, the traditional additive-effect model detects association at gene HHEX. GFLMs and related tests can analyze rare or common variants or a combination of the two and can be useful in whole-genome and whole-exome association studies.
|
81
|
Postula M, Janicki PK, Eyileten C, Rosiak M, Kaplon-Cieslicka A, Sugino S, Wilimski R, Kosior DA, Opolski G, Filipiak KJ, Mirowska-Guzel D. Next-generation re-sequencing of genes involved in increased platelet reactivity in diabetic patients on acetylsalicylic acid. Platelets 2015; 27:357-64. [PMID: 26599574 DOI: 10.3109/09537104.2015.1109071] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/10/2023]
Abstract
The objective of this study was to investigate whether rare missense genetic variants in several genes related to platelet functions and acetylsalicylic acid (ASA) response are associated with platelet reactivity in patients with type 2 diabetes (T2DM) on ASA therapy. Fifty-eight exons and corresponding introns of eight selected genes, including PTGS1, PTGS2, TXBAS1, PTGIS, ADRA2A, ADRA2B, TXBA2R, and P2RY1, were re-sequenced in 230 DNA samples from T2DM patients by using pooled PCR amplification and next-generation sequencing on the Illumina HiSeq2000. The observed non-synonymous variants were confirmed by individual genotyping of 384 DNA samples comprising the individuals from the original discovery pools and an additional verification cohort of 154 ASA-treated T2DM patients. The association between the investigated phenotypes (ASA-induced changes in platelet reactivity by PFA-100, VerifyNow and serum thromboxane B2 level [sTxB2]) and the accumulation of rare missense variants (genetic burden) in the investigated genes was tested using statistical collapsing tests. We identified a total of 35 exonic variants, including 3 common missense variants, 15 rare missense variants, and 17 synonymous variants in the 8 investigated genes. The rare missense variants exhibited a statistically significant difference in the accumulation pattern between the groups of patients with increased and normal platelet reactivity based on the PFA-100 assay. Our study suggests that the genetic burden of rare functional variants in eight genes may contribute to differences in platelet reactivity measured with the PFA-100 assay in T2DM patients treated with ASA.
Affiliation(s)
- Marek Postula
- Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw, Poland
- Perioperative Genomics Laboratory, Penn State College of Medicine, Hershey, PA, USA
- Piotr K Janicki
- Perioperative Genomics Laboratory, Penn State College of Medicine, Hershey, PA, USA
- Ceren Eyileten
- Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw, Poland
- Marek Rosiak
- Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw, Poland
- Department of Cardiology and Hypertension, Central Clinical Hospital, The Ministry of the Interior, Warsaw, Poland
- Shigekazu Sugino
- Perioperative Genomics Laboratory, Penn State College of Medicine, Hershey, PA, USA
- Radosław Wilimski
- Department of Cardiac Surgery, Medical University of Warsaw, Warsaw, Poland
- Dariusz A Kosior
- Department of Cardiology and Hypertension, Central Clinical Hospital, The Ministry of the Interior, Warsaw, Poland
- Department of Applied Physiology, Mossakowski Medical Research Centre, Polish Academy of Sciences, Warsaw, Poland
- Grzegorz Opolski
- Department of Cardiology, Medical University of Warsaw, Warsaw, Poland
- Dagmara Mirowska-Guzel
- Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw, Poland
|
82
|
Associating Multivariate Quantitative Phenotypes with Genetic Variants in Family Samples with a Novel Kernel Machine Regression Method. Genetics 2015; 201:1329-39. [PMID: 26482791 DOI: 10.1534/genetics.115.178590] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/27/2015] [Accepted: 10/04/2015] [Indexed: 11/18/2022]
Abstract
The recent development of sequencing technology allows identification of association between the whole spectrum of genetic variants and complex diseases. Over the past few years, a number of association tests for rare variants have been developed. Jointly testing for association between genetic variants and multiple correlated phenotypes may increase the power to detect causal genes in family-based studies, but familial correlation needs to be appropriately handled to avoid an inflated type I error rate. Here we propose a novel approach for multivariate family data using kernel machine regression (denoted as MF-KM) that is based on a linear mixed-model framework and can be applied to a large range of studies with different types of traits. In our simulation studies, the usual kernel machine test has inflated type I error rates when applied directly to familial data, while our proposed MF-KM method preserves the expected type I error rates. Moreover, the MF-KM method has increased power compared to methods that either analyze each phenotype separately while considering family structure or use only unrelated founders from the families. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.
|
83
|
Kim S, Lee K, Sun H. Statistical selection strategy for risk and protective rare variants associated with complex traits. J Comput Biol 2015; 22:1034-43. [PMID: 26469994 DOI: 10.1089/cmb.2015.0091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/05/2023]
Abstract
In genetic association studies with deep sequencing data, it is a challenging statistical problem to precisely locate rare variants associated with complex diseases or traits due to the limited number of observed genetic mutations. In particular, both risk and protective rare variants can be present in the same gene or genetic region. Very few statistical methods currently exist to separate causal rare variants from noncausal variants within a disease/trait-related gene or genetic region, whereas there are relatively many statistical tests to detect a phenotypic association of a group of rare variants such as a gene or a genetic region. In this article, we propose a new statistical selection strategy that is able to locate causal rare variants within a disease/trait-related gene or genetic region. The proposed procedure linearly combines potential risk and protective variants in order to find the combination of rare variants with the strongest association signal. It is also computationally very efficient since the procedure is based on forward selection. In simulation studies we demonstrate that the proposed procedure selects causal variants more accurately than other existing methods when both risk and protective variants are present. We also applied it to real sequencing data on the ANGPTL gene family from the Dallas Heart Study.
Affiliation(s)
- Sera Kim
- Department of Statistics, Pusan National University, Busan, Korea
- Kyeongjun Lee
- Department of Statistics, Pusan National University, Busan, Korea
- Hokeun Sun
- Department of Statistics, Pusan National University, Busan, Korea
|
84
|
Schmidt EM, Willer CJ. Insights into blood lipids from rare variant discovery. Curr Opin Genet Dev 2015; 33:25-31. [PMID: 26241468 DOI: 10.1016/j.gde.2015.06.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/19/2015] [Revised: 06/19/2015] [Accepted: 06/22/2015] [Indexed: 12/18/2022]
Abstract
Large-scale genome wide screens have discovered over 160 common variants associated with plasma lipids, which are risk factors often linked to heart disease. A large fraction of lipid heritability remains unexplained, and it is hypothesized that rare variants of functional consequence may account for some of the missing heritability. Finding lipid-associated variants that occur less frequently in the human population poses a challenge, primarily due to lack of power and difficulties to identify and test them. Interrogation of the protein-coding regions of the genome using array and sequencing techniques has led to important discoveries of rare variants that affect lipid levels and related disease risk. Here, we summarize the latest methods and findings that contribute to our current understanding of rare variant lipid genetics.
Affiliation(s)
- Ellen M Schmidt
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Cristen J Willer
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|
85
|
Fouladi R, Bessonov K, Van Lishout F, Van Steen K. Model-Based Multifactor Dimensionality Reduction for Rare Variant Association Analysis. Hum Hered 2015. [PMID: 26201701 DOI: 10.1159/000381286] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
Abstract
Genome-wide association studies have revealed a vast amount of common loci associated to human complex diseases. Still, a large proportion of heritability remains unexplained. The extent to which rare genetic variants (RVs) are able to explain a relevant portion of the genetic heritability for complex traits leaves room for several debates and paves the way to the collection of RV databases and the development of novel analytic tools to analyze these. To date, several statistical methods have been proposed to uncover the association of RVs with complex diseases, but none of them is the clear winner in all possible scenarios of study design and assumed underlying disease model. The latter may involve differences in the distributions of effect sizes, proportions of causal variants, and ratios of protective to deleterious variants at distinct regions throughout the genome. Therefore, there is a need for robust scalable methods with acceptable overall performance in terms of power and type I error under various realistic scenarios. In this paper, we propose a novel RV association analysis strategy, which satisfies several of the desired properties that a RV analysis tool should exhibit.
Affiliation(s)
- Ramouna Fouladi
- Systems and Modeling Unit, Montefiore Institute, and Bioinformatics and Modeling, GIGA-R, University of Liège, Liège, Belgium
|
86
|
Svishcheva GR, Belonogova NM, Axenovich TI. Region-Based Association Test for Familial Data under Functional Linear Models. PLoS One 2015; 10:e0128999. [PMID: 26111046 PMCID: PMC4481467 DOI: 10.1371/journal.pone.0128999] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/27/2014] [Accepted: 05/04/2015] [Indexed: 12/22/2022]
Abstract
Region-based association analysis is a more powerful tool for gene mapping than testing of individual genetic variants, particularly for rare genetic variants. The most powerful methods for regional mapping are based on the functional data analysis approach, which assumes that the regional genome of an individual may be considered as a continuous stochastic function that contains information about both linkage and linkage disequilibrium. Here, we extend this powerful approach, earlier applied only to independent samples, to samples of related individuals. To this end, we additionally include a random polygenic effect in the functional linear model used for testing association between quantitative traits and multiple genetic variants in the region. We compare the statistical power of different methods using Genetic Analysis Workshop 17 mini-exome family data and a wide range of simulation scenarios. Our method increases the power of regional association analysis of quantitative traits compared with burden-based and kernel-based methods for the majority of the scenarios. In addition, we estimate the statistical power of our method using regions with a small number of genetic variants, and show that our method retains its advantage over burden-based and kernel-based methods in this case as well. The new method is implemented as the R function 'famFLM' using two types of basis functions: the B-spline and Fourier bases. We compare the properties of the new method using models that differ from each other in the type of their function basis. The models based on the Fourier basis functions have an advantage in terms of speed and power over the models that use the B-spline basis functions and those that combine B-spline and Fourier basis functions. The 'famFLM' function is distributed under the GPLv3 license and is freely available at http://mga.bionet.nsc.ru/soft/famFLM/.
Affiliation(s)
- Gulnara R. Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Nadezhda M. Belonogova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Tatiana I. Axenovich
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
|
87
|
Zeng P, Wang T. Detecting the Genomic Signature of Divergent Selection in Presence of Gene Flow. Curr Genomics 2015; 16:203-12. [PMID: 26069460 PMCID: PMC4460224 DOI: 10.2174/1389202916666150313230943] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/16/2014] [Revised: 02/23/2015] [Accepted: 03/09/2015] [Indexed: 11/22/2022]
Abstract
In this paper, the detection of rare-variant association with continuous phenotypes of interest is investigated via the likelihood-ratio-based variance component test under the framework of linear mixed models. The hypothesis testing is challenging and nonstandard, since under the null the variance component is located on the boundary of its parameter space. In this situation the usual asymptotic chi-square distribution of the likelihood ratio statistic does not necessarily hold. To circumvent the derivation of the null distribution, we resort to the bootstrap method because of its generic applicability and ease of implementation. Both parametric and nonparametric bootstrap likelihood ratio tests are studied. Numerical studies are carried out to evaluate the performance of the proposed bootstrap likelihood ratio test and to compare it with some existing methods for the identification of rare variants. To reduce the computational time of the bootstrap likelihood ratio test, we propose an effective mixture approximation for the bootstrap null distribution. The GAW17 data are used to illustrate the proposed test.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, and Center of Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, 221004, P. R. China
- Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, 221004, P. R. China
|
88
|
Wang Y, Liu A, Mills JL, Boehnke M, Wilson AF, Bailey-Wilson JE, Xiong M, Wu CO, Fan R. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genet Epidemiol 2015; 39:259-75. [PMID: 25809955 PMCID: PMC4443751 DOI: 10.1002/gepi.21895] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 08/01/2014] [Revised: 01/28/2015] [Accepted: 01/28/2015] [Indexed: 10/23/2022]
Abstract
In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case.
Affiliation(s)
- Yifan Wang
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Aiyi Liu
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- James L. Mills
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Michael Boehnke
- Department of Biostatistics, School of Public Health, The University of Michigan, Ann Arbor, Michigan, United States of America
- Alexander F. Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Joan E. Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Momiao Xiong
- Human Genetics Center, University of Texas - Houston, Houston, Texas, United States of America
- Colin O. Wu
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
|
89
|
Wang Q, Lu Q, Zhao H. A review of study designs and statistical methods for genomic epidemiology studies using next generation sequencing. Front Genet 2015; 6:149. [PMID: 25941534 PMCID: PMC4403555 DOI: 10.3389/fgene.2015.00149] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 01/25/2015] [Accepted: 03/30/2015] [Indexed: 12/22/2022]
Abstract
Results from numerous linkage and association studies have greatly deepened scientists’ understanding of the genetic basis of many human diseases, yet some important questions remain unanswered. For example, although a large number of disease-associated loci have been identified from genome-wide association studies in the past 10 years, it is challenging to interpret these results as most disease-associated markers have no clear functional roles in disease etiology, and all the identified genomic factors only explain a small portion of disease heritability. With the help of next-generation sequencing (NGS), diverse types of genomic and epigenetic variations can be detected with high accuracy. More importantly, instead of using linkage disequilibrium to detect association signals based on a set of pre-set probes, NGS allows researchers to directly study all the variants in each individual, therefore promises opportunities for identifying functional variants and a more comprehensive dissection of disease heritability. Although the current scale of NGS studies is still limited due to the high cost, the success of several recent studies suggests the great potential for applying NGS in genomic epidemiology, especially as the cost of sequencing continues to drop. In this review, we discuss several pioneer applications of NGS, summarize scientific discoveries for rare and complex diseases, and compare various study designs including targeted sequencing and whole-genome sequencing using population-based and family-based cohorts. Finally, we highlight recent advancements in statistical methods proposed for sequencing analysis, including group-based association tests, meta-analysis techniques, and annotation tools for variant prioritization.
Affiliation(s)
- Qian Wang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Veterans Affairs Cooperative Studies Program Coordinating Center, West Haven, CT, USA
|
90
|
Kao CF, Liu JR, Hung H, Kuo PH. A robust GWSS method to simultaneously detect rare and common variants for complex disease. PLoS One 2015; 10:e0120873. [PMID: 25880329 PMCID: PMC4399906 DOI: 10.1371/journal.pone.0120873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2014] [Accepted: 01/26/2015] [Indexed: 11/19/2022]
Abstract
The rapid advances in sequencing technologies and the resulting next-generation sequencing data provide the opportunity to detect disease-associated variants with a better solution, in particular for low-frequency variants. Although both common and rare variants might exert their independent effects on the risk for the trait of interest, previous methods to detect the association effects rarely consider them simultaneously. We proposed a class of test statistics, the generalized weighted-sum statistic (GWSS), to detect disease associations in the presence of common and rare variants with a case-control study design. Information of rare variants was aggregated using a weighted sum method, while signal directions and strength of the variants were considered at the same time. Permutations were performed to obtain the empirical p-values of the test statistics. Our simulation showed that, compared to the existing methods, the GWSS method had better performance in most of the scenarios. The GWSS (in particular VDWSS-t) method is particularly robust for opposite association directions, association strength, and varying distributions of minor-allele frequencies. It is therefore promising for detecting disease-associated loci. For empirical data application, we also applied our GWSS method to the Genetic Analysis Workshop 17 data, and the results were consistent with the simulation, suggesting good performance of our method. As re-sequencing studies become more popular to identify putative disease loci, we recommend the use of this newly developed GWSS to detect associations with both common and rare variants.
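The general pattern the abstract above describes (aggregate variants with a weighted sum, then obtain an empirical p-value by permutation) can be sketched as follows. This is a generic weighted-sum burden statistic under stated assumptions, not the GWSS or VDWSS-t statistic itself: it ignores signal direction, and the weights, function names, and the mean-difference test statistic are hypothetical choices made for illustration.

```python
import random


def weighted_burden(genotypes, weights):
    """Per-individual burden score: weighted sum of minor-allele counts."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def permutation_pvalue(genotypes, is_case, weights, n_perm=1000, seed=0):
    """Empirical p-value for the case/control difference in mean burden.

    Assumes at least one case and one control. Case labels are shuffled
    n_perm times to build the null distribution of the statistic.
    """
    scores = weighted_burden(genotypes, weights)

    def stat(labels):
        cases = [s for s, c in zip(scores, labels) if c]
        controls = [s for s, c in zip(scores, labels) if not c]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = stat(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if stat(labels) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: rows are individuals, columns are variants (minor-allele counts).
    geno = [[1, 0], [0, 1], [1, 0], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
    case = [True, True, True, True, False, False, False, False]
    print(permutation_pvalue(geno, case, weights=[1.0, 1.0]))
```

The add-one correction in the final line is a common convention for permutation tests; without it, a statistic never matched under permutation would report a p-value of exactly zero.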
Affiliation(s)
- Chung-Feng Kao
- Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Jia-Rou Liu
- Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Department of Public Health, Chang Gung University, Taoyuan, Taiwan
- Hung Hung
- Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Research Center for Genes, Environment and Human Health, National Taiwan University, Taipei, Taiwan
- Po-Hsiu Kuo
- Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Research Center for Genes, Environment and Human Health, National Taiwan University, Taipei, Taiwan
|
91
|
Zeng P, Zhao Y, Liu J, Liu L, Zhang L, Wang T, Huang S, Chen F. Likelihood ratio tests in rare variant detection for continuous phenotypes. Ann Hum Genet 2015; 78:320-32. [PMID: 25117149 DOI: 10.1111/ahg.12071] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/27/2013] [Accepted: 04/22/2014] [Indexed: 12/30/2022]
Abstract
It is believed that rare variants play an important role in human phenotypes; however, the detection of rare variants is extremely challenging due to their very low minor allele frequency. In this paper, the likelihood ratio test (LRT) and restricted likelihood ratio test (ReLRT) are proposed to test the association of rare variants based on the linear mixed effects model, where a group of rare variants are treated as random effects. Like the sequence kernel association test (SKAT), a state-of-the-art method for rare variant detection, LRT and ReLRT can effectively overcome the problem of directionality of effect inherent in the burden test in practice. By taking full advantage of the spectral decomposition, exact finite sample null distributions for LRT and ReLRT are obtained by simulation. We perform extensive numerical studies to evaluate the performance of LRT and ReLRT, and compare them with the burden test, SKAT and SKAT-O. The simulations have shown that LRT and ReLRT can correctly control the type I error, and that this control is robust to the weights chosen and the number of rare variants under study. LRT and ReLRT behave similarly to the burden test when all the causal rare variants share the same direction of effect, and outperform SKAT across various situations. When both positive and negative effects exist, LRT and ReLRT suffer little power reduction compared with the other two competing methods; in this case, an additional finding from our simulations is that SKAT-O is no longer the optimal test, and its power is even lower than that of SKAT. The exome sequencing SNP data from Genetic Analysis Workshop 17 were employed to illustrate the proposed methods, and interesting results are described.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, 211166, P. R. China; Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, 221004, P. R. China
|
92
|
Yan Q, Tiwari HK, Yi N, Gao G, Zhang K, Lin WY, Lou XY, Cui X, Liu N. A Sequence Kernel Association Test for Dichotomous Traits in Family Samples under a Generalized Linear Mixed Model. Hum Hered 2015; 79:60-8. [PMID: 25791389 DOI: 10.1159/000375409] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 08/28/2014] [Accepted: 01/21/2015] [Indexed: 01/15/2023]
Abstract
OBJECTIVE The existing methods for identifying multiple rare variants underlying complex diseases in family samples are underpowered. Therefore, we aim to develop a new set-based method for an association study of dichotomous traits in family samples. METHODS We introduce a framework for testing the association of genetic variants with diseases in family samples based on a generalized linear mixed model. Our proposed method is based on a kernel machine regression and can be viewed as an extension of the sequence kernel association test (SKAT and famSKAT) for application to family data with dichotomous traits (F-SKAT). RESULTS Our simulation studies show that the original SKAT has inflated type I error rates when applied directly to family data. By contrast, our proposed F-SKAT has the correct type I error rate. Furthermore, in all of the considered scenarios, F-SKAT, which uses all family data, has higher power than both SKAT, which uses only unrelated individuals from the family data, and another method, which uses all family data. CONCLUSION We propose a set-based association test that can be used to analyze family data with dichotomous phenotypes while handling genetic variants with the same or opposite directions of effects as well as any types of family relationships.
Affiliation(s)
- Qi Yan
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Ala., USA
|
93
|
Wang X, Zhang S, Li Y, Li M, Sha Q. A powerful approach to test an optimally weighted combination of rare variants in admixed populations. Genet Epidemiol 2015; 39:294-305. [PMID: 25758547 DOI: 10.1002/gepi.21894] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/04/2014] [Revised: 01/09/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Population stratification has long been recognized as an issue in genetic association studies because unrecognized population stratification can lead to both false-positive and false-negative findings and can obscure true association signals if not appropriately corrected. This issue can be even worse in rare variant association analyses because rare variants often demonstrate stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we propose a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA), in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA up-weights rare variants and variants that are strongly associated with the phenotype. Additionally, it can adjust for the direction of the association, and it allows for local ancestry differences among study subjects. Extensive simulations show that the type I error rate of TOWA is under control in the presence of population stratification and that it is more powerful than existing methods. We have also applied TOWA to a real sequencing data set. Our simulation studies as well as real data analysis results indicate that TOWA is a useful tool for rare variant association analyses in admixed populations.
Affiliation(s)
- Xuexia Wang
- Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America
|
94
|
Cheng KF, Lee JY. Detecting disease association signals with multiple genetic variants and covariates. Stat Methods Med Res 2015; 26:1281-1294. [DOI: 10.1177/0962280215574541] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023]
Abstract
Due to improvements in the efficiency of resequencing technologies, discoveries and analyses of rare variants in sequencing-based association studies at the gene level, or even exome-wide, are becoming increasingly feasible. Powerful association tests have been suggested in the literature for testing whether a group of variants in a gene region is associated with a particular disease of interest. Their performance depends on correct specification of the regression model and on conditions such as the size of the case and control samples, the numbers of causal and noncausal variants (rare or common), variant frequency, effect size and directionality, the rate of missing genotypes, etc. Most of these model-based tests require genotype data to be complete at each variant. Our previous results showed that, in the absence of covariates, the power of these tests can be greatly affected when genotypes are missing and only simple imputation is used. In this paper, we demonstrate by simulations that in the presence of covariates, the type I errors of these approaches might be inflated, even when the genotype missing rate is very small. We present an association test based on testing for a zero proportion of causal variants in the gene region, and show this test to be almost uniformly most powerful among the competing tests under very general simulation conditions. This test does not require complete genotype data and hence is robust to missing genotypes. We discuss how to adjust for population stratification based on principal components and show that the power loss of this approach is small when the population stratification effect is moderate. We use data from the Shanghai Breast Cancer Study to demonstrate application of the tests and show that the proposed test is more powerful in detecting variants related to breast cancer and is robust to the inclusion of noncausal variants.
Affiliation(s)
- KF Cheng
- Biostatistics Center and School of Public Health, Taipei Medical University, Taipei, Taiwan
- JY Lee
- Biostatistics Center and School of Public Health, Taipei Medical University, Taipei, Taiwan
|
95
|
Garner C. Confounded by sequencing depth in association studies of rare alleles. Genet Epidemiol 2011; 35:261-8. [PMID: 21328616 DOI: 10.1002/gepi.20574] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 11/01/2010] [Accepted: 01/12/2011] [Indexed: 11/12/2022]
Abstract
Next-generation DNA sequencing technologies are facilitating large-scale association studies of rare genetic variants. The depth of the sequence read coverage is an important experimental variable in the next-generation technologies and it is a major determinant of the quality of genotype calls generated from sequence data. When case and control samples are sequenced separately or in different proportions across batches, they are unlikely to be matched on sequencing read depth and a differential misclassification of genotypes can result, causing confounding and an increased false-positive rate. Data from Pilot Study 3 of the 1000 Genomes project was used to demonstrate that a difference between the mean sequencing read depth of case and control samples can result in false-positive association for rare and uncommon variants, even when the mean coverage depth exceeds 30× in both groups. The degree of the confounding and inflation in the false-positive rate depended on the extent to which the mean depth was different in the case and control groups. A logistic regression model was used to test for association between case-control status and the cumulative number of alleles in a collapsed set of rare and uncommon variants. Including each individual's mean sequence read depth across the variant sites in the logistic regression model nearly eliminated the confounding effect and the inflated false-positive rate. Furthermore, accounting for the potential error by modeling the probability of the heterozygote genotype calls in the regression analysis had a relatively minor but beneficial effect on the statistical results.
Affiliation(s)
- Chad Garner
- Department of Epidemiology, University of California, Irvine, CA 92697-3905, USA.
|
96
|
Chen R, Wei Q, Zhan X, Zhong X, Sutcliffe JS, Cox NJ, Cook EH, Li C, Chen W, Li B. A haplotype-based framework for group-wise transmission/disequilibrium tests for rare variant association analysis. Bioinformatics 2015; 31:1452-9. [PMID: 25568282 DOI: 10.1093/bioinformatics/btu860] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 09/12/2014] [Accepted: 12/23/2014] [Indexed: 12/30/2022]
Abstract
MOTIVATION A major focus of current sequencing studies in human genetics is to identify rare variants associated with complex diseases. Aside from the reduced power to detect associated rare variants, controlling for population stratification is particularly challenging for rare variants. Transmission/disequilibrium tests (TDT) based on family designs are robust to population stratification and admixture, and therefore provide an effective approach for rare variant association studies that eliminates spurious associations. To increase the power of rare variant association analysis, gene-based collapsing methods have become standard approaches for analyzing rare variants. Existing methods that extend this strategy to rare variants in families usually combine TDT statistics at individual variants and therefore lack the flexibility to incorporate other genetic models. RESULTS In this study, we describe a haplotype-based framework for group-wise TDT (gTDT) that is flexible enough to encompass a variety of genetic models such as additive, dominant and compound heterozygous (CH) (i.e. recessive) models as well as other complex interactions. Unlike existing methods, gTDT constructs haplotypes by transmission when possible and inherently takes into account the linkage disequilibrium among variants. Through extensive simulations we showed that type I error was correctly controlled for rare variants under all models investigated, and this remained true in the presence of population stratification. Under a variety of genetic models, gTDT showed increased power compared with the single marker TDT. Application of gTDT to autism exome sequencing data from 118 trios identified potentially interesting candidate genes with CH rare variants. AVAILABILITY AND IMPLEMENTATION We implemented gTDT in C++ and the source code and the detailed usage are available on the authors' website (https://medschool.vanderbilt.edu/cgg).
CONTACT bingshan.li@vanderbilt.edu or wei.chen@chp.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rui Chen, Qiang Wei, Xiaowei Zhan, Xue Zhong, James S Sutcliffe, Nancy J Cox, Edwin H Cook, Chun Li, Wei Chen, Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
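As a point of reference for the abstract above, the single-marker TDT that gTDT is compared against can be sketched as a McNemar-type statistic. This is a minimal illustration of the classic test only, not the authors' haplotype-based gTDT; the function name `tdt` and the counts `b` and `c` are hypothetical names chosen for this sketch.

```python
import math

def tdt(b, c):
    """Classic single-marker TDT (an illustrative sketch, not gTDT).

    b: number of heterozygous parents who transmitted the variant allele
       to an affected child.
    c: number of heterozygous parents who did not transmit it.
    Returns the chi-square statistic (1 df) and its p-value.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Example with made-up counts: 30 transmissions versus 10 non-transmissions.
chi2, p_value = tdt(30, 10)
print(chi2, p_value)
```

Because only transmissions within families are compared, a statistic of this form is not affected by allele frequency differences between subpopulations, which is the robustness property the abstract relies on.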
|
97
|
Urrutia E, Lee S, Maity A, Zhao N, Shen J, Li Y, Wu MC. Rare variant testing across methods and thresholds using the multi-kernel sequence kernel association test (MK-SKAT). Statistics and Its Interface 2015; 8:495-505. [PMID: 26740853 PMCID: PMC4698916 DOI: 10.4310/sii.2015.v8.n4.a8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Analysis of rare genetic variants has focused on region-based analysis wherein a subset of the variants within a genomic region is tested for association with a complex trait. Two important practical challenges have emerged. First, it is difficult to choose which test to use. Second, it is unclear which group of variants within a region should be tested. Both depend on the unknown true state of nature. Therefore, we develop the Multi-Kernel SKAT (MK-SKAT) which tests across a range of rare variant tests and groupings. Specifically, we demonstrate that several popular rare variant tests are special cases of the sequence kernel association test which compares pair-wise similarity in trait value to similarity in the rare variant genotypes between subjects as measured through a kernel function. Choosing a particular test is equivalent to choosing a kernel. Similarly, choosing which group of variants to test also reduces to choosing a kernel. Thus, MK-SKAT uses perturbation to test across a range of kernels. Simulations and real data analyses show that our framework controls type I error while maintaining high power across settings: MK-SKAT loses power when compared to the kernel for a particular scenario but has much greater power than poor choices.
Affiliation(s)
- Eugene Urrutia
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48105, USA
- Arnab Maity
- Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, NC 27695, USA
- Ni Zhao
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Judong Shen
- Quantitative Sciences, R&D, GlaxoSmithKline, 5 Moore Drive, Research Triangle Park, NC 27709, USA
- Yun Li
- Department of Genetics and Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael C. Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
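The MK-SKAT abstract above states that choosing a rare variant test is equivalent to choosing a kernel. A minimal sketch of that idea follows, assuming a continuous trait with an intercept-only null model and the usual quadratic-form score statistic Q = r'Kr, where r is the centered trait vector. The function names, the toy genotype matrix, and the flat weights are hypothetical, and the perturbation step that MK-SKAT uses to test across kernels is not shown.

```python
def center(y):
    """Residuals under an intercept-only null model."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]

def weighted_linear_kernel(G, w):
    """SKAT-style kernel K = G diag(w^2) G'."""
    n, m = len(G), len(w)
    return [[sum(w[j] ** 2 * G[i][j] * G[k][j] for j in range(m))
             for k in range(n)] for i in range(n)]

def burden_kernel(G, w):
    """Burden-style kernel K = (Gw)(Gw)', built from each subject's weighted allele count."""
    s = [sum(wj * g for wj, g in zip(w, row)) for row in G]
    return [[si * sk for sk in s] for si in s]

def kernel_score(y, K):
    """Quadratic-form score statistic Q = r' K r."""
    r = center(y)
    n = len(r)
    return sum(r[i] * K[i][k] * r[k] for i in range(n) for k in range(n))

# Toy data (made up): 4 subjects, 2 rare variants with opposite effects.
G = [[1, 0], [0, 1], [0, 0], [1, 1]]
y = [1.0, -1.0, 0.0, 0.0]
w = [1.0, 1.0]

print(kernel_score(y, weighted_linear_kernel(G, w)))
print(kernel_score(y, burden_kernel(G, w)))
```

In this toy example the two variants act in opposite directions, so the burden-style kernel cancels the signal while the weighted linear kernel retains it, which illustrates why the best kernel depends on the unknown true state of nature.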
|
98
|
Lin WY. Adaptive combination of P-values for family-based association testing with sequence data. PLoS One 2014; 9:e115971. [PMID: 25541952 PMCID: PMC4277421 DOI: 10.1371/journal.pone.0115971] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/19/2014] [Accepted: 12/01/2014] [Indexed: 12/24/2022]
Abstract
Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an 'adaptive combination of P-values method' (abbreviated as 'ADA'). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
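The core idea in the ADA abstract above, discarding variants with large per-site P-values before combining the rest, can be sketched as a truncated combination evaluated at several thresholds, with a permutation-based minimum over thresholds. This is a simplified, unweighted illustration and not the author's implementation; the function names and the threshold values are hypothetical, and the permuted P-value sets are assumed to be supplied by a pedigree-aware permutation procedure that is not shown here.

```python
import math

def truncated_stat(pvalues, theta):
    """Sum of -log(p) over variants whose per-site P-value is below theta."""
    return sum(-math.log(p) for p in pvalues if p < theta)

def ada_pvalue(observed, permuted, thetas):
    """Permutation P-value for the best truncation threshold.

    observed: per-site P-values from the real data.
    permuted: list of per-site P-value lists, one per permutation replicate
              (assumed to be generated elsewhere).
    thetas:   candidate truncation thresholds.
    """
    all_sets = [observed] + list(permuted)
    stats = [[truncated_stat(ps, t) for t in thetas] for ps in all_sets]
    n = len(all_sets)
    min_p = []
    for b in range(n):
        per_theta = []
        for j in range(len(thetas)):
            # Empirical P-value of replicate b at threshold j.
            ge = sum(1 for k in range(n) if stats[k][j] >= stats[b][j])
            per_theta.append(ge / n)
        min_p.append(min(per_theta))
    # Compare the observed minimum P-value with the replicate minima.
    return sum(1 for b in range(n) if min_p[b] <= min_p[0]) / n

# Made-up example: two small observed P-values, nine null-like replicates.
observed = [0.001, 0.002, 0.8]
permuted = [[0.3, 0.5, 0.9] for _ in range(9)]
print(ada_pvalue(observed, permuted, thetas=[0.05, 0.1]))
```

Taking the minimum over thresholds and recalibrating it against the same permutations avoids having to fix one truncation threshold in advance, at the computational cost the abstract lists as a limitation.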
|
99
|
Xuan J, Yang L, Wu Z. Higher criticism approach to detect rare variants using whole genome sequencing data. BMC Proc 2014; 8:S14. [PMID: 25519367 PMCID: PMC4145405 DOI: 10.1186/1753-6561-8-s1-s14] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/02/2022]
Abstract
Because of the low statistical power of single-variant tests for whole genome sequencing (WGS) data, the association test for variant groups is a key approach for genetic mapping. The higher criticism (HC) approach has been proposed for detecting sparse and weak genetic effects and has been proven theoretically optimal for that purpose. Here we develop a strategy to apply the HC approach to WGS data in which rare variants are the majority. By using Genetic Analysis Workshop 18 "dose" genetic data with simulated phenotypes, we assess the performance of HC under a variety of strategies for grouping variants and collapsing rare variants. The HC approach is compared with the minimal p-value method and the sequence kernel association test. The results show that the HC approach is preferred for detecting weak genetic effects.
Affiliation(s)
- Jing Xuan
- Department of Mathematical Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609-2280, USA
- Li Yang
- Department of Mathematical Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609-2280, USA
- Zheyang Wu
- Department of Mathematical Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609-2280, USA
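The higher criticism statistic referred to in the abstract above can be sketched in its common Donoho-Jin form, computed from the sorted per-variant p-values of a group. This is a minimal illustration under that assumption; the function name and the `alpha0` fraction (the share of the smallest p-values searched) are choices made for this sketch, and the grouping and collapsing strategies studied in the paper are not reproduced.

```python
import math

def higher_criticism(pvalues, alpha0=0.5):
    """Higher criticism statistic over the smallest alpha0 fraction of p-values.

    For sorted p-values p_(1) <= ... <= p_(n), returns the maximum of
    sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    """
    p = sorted(pvalues)
    n = len(p)
    limit = max(1, int(alpha0 * n))
    best = float("-inf")
    for i in range(1, limit + 1):
        p_i = p[i - 1]
        if p_i <= 0.0 or p_i >= 1.0:
            continue  # skip degenerate values to avoid division by zero
        z = math.sqrt(n) * (i / n - p_i) / math.sqrt(p_i * (1.0 - p_i))
        best = max(best, z)
    return best

# Made-up example: one very small p-value among otherwise unremarkable ones.
print(higher_criticism([0.001, 0.5, 0.6, 0.9]))
```

The statistic measures how far the empirical fraction of small p-values exceeds what uniform null p-values would give, so a few moderately small p-values among many null ones can still produce a large value.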
|
100
|
Abstract
Rare variants may, in part, explain some of the heritability missing in current genome-wide association studies. Many gene-based rare-variant analysis approaches proposed in recent years are aimed at population-based samples, although analysis strategies for family-based samples are clearly warranted because the family-based design has the potential to enrich for rare causal variants. We have recently developed the generalized least squares, sequence kernel association test, or GLS-SKAT, approach for rare-variant analyses in family samples, in which the kinship matrix computed from the high-dimensional genetic data is used to decorrelate the family structure. We then apply the SKAT-O approach for gene-/region-based inference in the decorrelated data. In this study, we applied this GLS-SKAT method to the systolic blood pressure data in the simulated family sample distributed by the Genetic Analysis Workshop 18. We compared the GLS-SKAT approach to the rare-variant analysis approach implemented in family-based association test-v1 and demonstrated that the GLS-SKAT approach provides superior power and good control of type I error rate.
Affiliation(s)
- Dalin Li
- Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA ; David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Jerome I Rotter
- David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA ; Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute and Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Xiuqing Guo
- David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA ; Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute and Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA 90502, USA
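The decorrelation step described in the GLS-SKAT abstract above can be sketched as a generalized least squares whitening transform. The sketch assumes a trait covariance of the form 2 * var_g * kinship + var_e * I with variance components that are already known; that covariance form, the function names, and the toy values are assumptions of this illustration rather than details given in the abstract.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def forward_solve(low, b):
    """Solve low * x = b for lower-triangular low."""
    x = []
    for i in range(len(b)):
        s = sum(low[i][k] * x[k] for k in range(i))
        x.append((b[i] - s) / low[i][i])
    return x

def decorrelate(y, kinship, var_g, var_e):
    """Whiten y using the assumed covariance 2*var_g*kinship + var_e*I."""
    n = len(y)
    sigma = [[2.0 * var_g * kinship[i][j] + (var_e if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    return forward_solve(cholesky(sigma), y)

# Made-up example: two full siblings (kinship 0.25, self-kinship 0.5).
kinship = [[0.5, 0.25], [0.25, 0.5]]
print(decorrelate([2.0, 4.0], kinship, var_g=1.0, var_e=1.0))
```

In a full analysis the same transform would also be applied to the covariates and genotypes so that a test designed for unrelated samples can be run on the transformed data.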
|