1
Chien LC. Testing for association between ordinal traits and genetic variants in pedigree-structured samples by collapsing and kernel methods. Int J Biostat 2023; 0:ijb-2022-0123. [PMID: 37743670 DOI: 10.1515/ijb-2022-0123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 07/28/2023] [Indexed: 09/26/2023]
Abstract
In genome-wide association studies (GWAS), logistic regression is one of the most popular analytic methods for binary traits. Multinomial regression is an extension of binary logistic regression that allows for multiple categories. However, many GWAS methods have been limited in application to binary traits. These methods have often been improperly applied to ordinal traits, which causes inappropriate type I error rates and poor statistical power. Owing to the lack of analysis methods, GWAS of ordinal traits is known to be problematic and is gaining attention. In this paper, we develop a general framework for identifying ordinal traits associated with genetic variants in pedigree-structured samples by collapsing and kernel methods. We use the local odds ratios GEE technique to account for complicated correlation structures between family members and ordered categorical traits. We use the retrospective idea to treat the genetic markers as random variables for calculating genetic correlations among markers. The proposed genetic association method can accommodate ordinal traits and allows for covariate adjustment. We conduct simulation studies to compare the proposed tests with the existing models for analyzing ordered categorical data under various configurations. We illustrate application of the proposed tests by simultaneously analyzing a family study and a cross-sectional study from the Genetic Analysis Workshop 19 (GAW19) data.
Affiliation(s)
- Li-Chu Chien
- Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
2
Riggs K, Chen HS, Rotunno M, Li B, Simonds NI, Mechanic LE, Peng B. On the application, reporting, and sharing of in silico simulations for genetic studies. Genet Epidemiol 2020; 45:131-141. [PMID: 33063887 PMCID: PMC7984380 DOI: 10.1002/gepi.22362] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/10/2020] [Revised: 09/11/2020] [Accepted: 09/14/2020] [Indexed: 12/31/2022]
Abstract
In silico simulations play an indispensable role in the development and application of statistical models and methods for genetic studies. Simulation tools allow for the evaluation of methods and investigation of models in a controlled manner. With the growing popularity of evolutionary models and simulation‐based statistical methods, genetic simulations have been applied to a wide variety of research disciplines such as population genetics, evolutionary genetics, genetic epidemiology, ecology, and conservation biology. In this review, we surveyed 1409 articles from five journals that publish on major application areas of genetic simulations. We identified 432 papers in which genetic simulations were used and examined the targets and applications of simulation studies and how these simulation methods and simulated data sets are reported and shared. Whereas a large proportion (30%) of the surveyed articles reported the use of genetic simulations, only 28% of these genetic simulation studies used existing simulation software, 2% used existing simulated data sets, and 19% and 12% made source code and simulated data sets publicly available, respectively. Moreover, 15% of articles provided no information on how simulation studies were performed. These findings suggest a need to encourage sharing and reuse of existing simulation software and data sets, as well as providing more information regarding the performance of simulations.
Affiliation(s)
- Kaleigh Riggs
- Department of Statistics, Rice University, Houston, Texas, USA
- Huann-Sheng Chen
- Division of Cancer Control and Population Sciences, Statistical Research and Applications Branch, Surveillance Research Program, National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, USA
- Melissa Rotunno
- Division of Cancer Control and Population Sciences, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, NCI, NIH, Bethesda, Maryland, USA
- Bing Li
- Department of Biostatistics, Brown University, Providence, Rhode Island, USA
- Leah E Mechanic
- Division of Cancer Control and Population Sciences, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, NCI, NIH, Bethesda, Maryland, USA
- Bo Peng
- Department of Medicine, Baylor College of Medicine, Houston, Texas, USA
3
Li G, Hou L, Liu X, Wu C. A weighted empirical Bayes risk prediction model using multiple traits. Stat Appl Genet Mol Biol 2020; 19:/j/sagmb.ahead-of-print/sagmb-2019-0056/sagmb-2019-0056.xml. [PMID: 32887211 DOI: 10.1515/sagmb-2019-0056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2019] [Accepted: 07/06/2020] [Indexed: 11/15/2022]
Abstract
With rapid advances in high-throughput sequencing technology, millions of single-nucleotide variants (SNVs) can be simultaneously genotyped in a sequencing study. SNVs residing in functional genomic regions such as exons may play a crucial role in biological processes of the body. In particular, non-synonymous SNVs are closely related to the protein sequence and its function, which are important in understanding the biological mechanism of sequence evolution. Although statistically challenging, models incorporating such SNV annotation information can improve the estimation of genetic effects, and multiple responses may further strengthen the signals of these variants in the assessment of disease risk. In this work, we develop a new weighted empirical Bayes method to integrate SNV annotation information in a multi-trait design. The performance of the proposed model is evaluated in simulation as well as in a real sequencing data set, where the proposed method shows improved prediction accuracy compared to other approaches.
Affiliation(s)
- Gengxin Li
- Department of Mathematics and Statistics, University of Michigan Dearborn, 4901 Evergreen Rd, Dearborn, MI 48128, USA
- Lin Hou
- Center for Statistical Science, Tsinghua University, 30 Shuangqing Rd, Haidian District, Beijing 100084, China
- Xiaoyu Liu
- Department of Mathematics and Statistics, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH 45435, USA
- Cen Wu
- Department of Statistics, Kansas State University, 1116 Mid-Campus Drive N., Manhattan, KS 66506, USA
4
Svishcheva GR, Belonogova NM, Zorkoltseva IV, Kirichenko AV, Axenovich TI. Gene-based association tests using GWAS summary statistics. Bioinformatics 2020; 35:3701-3708. [PMID: 30860568 DOI: 10.1093/bioinformatics/btz172] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/15/2018] [Revised: 02/12/2019] [Accepted: 03/11/2019] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION: The huge number of genome-wide association study (GWAS) summary statistics freely available in databases provides new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data have been adapted for practical use with GWAS summary statistics as input. RESULTS: We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. AVAILABILITY AND IMPLEMENTATION: The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gulnara R Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Vavilov Institute of General Genetics, the Russian Academy of Sciences, Moscow, Russia
- Nadezhda M Belonogova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Irina V Zorkoltseva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anatoly V Kirichenko
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Tatiana I Axenovich
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia; Department of Biotechnology, L.K. Ernst Federal Center for Animal Husbandry, Dubrovitsy, Russia
5
He J, Ma W, Zhou Y. Gene association detection via local linear regression method. J Hum Genet 2019; 65:115-123. [PMID: 31602004 DOI: 10.1038/s10038-019-0676-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2018] [Revised: 09/10/2019] [Accepted: 09/18/2019] [Indexed: 11/09/2022]
Abstract
The development of next-generation sequencing technology has greatly facilitated genetic association studies, and many effective analysis methods have been proposed. However, population stratification is still a major issue in current genetic association studies. Many existing methods have been developed to remove the bias due to population stratification in common variant association studies, but such methods may not be effective for rare variants, which leads to power reduction. Therefore, in this paper, we develop a principal component analysis strategy (called PC-LLR) based on the local linear regression method to eliminate the population stratification effect in both rare variant and common variant association studies. Simulation results indicate that the new PC-LLR method eliminates the population stratification effect well. It has correct type I error rates in all cases and higher power in most cases, while most existing methods have inflated type I error rates in at least some cases. We also demonstrate that PC-LLR is more effective at eliminating the population stratification effect by applying it to the whole-exome sequencing data set from Genetic Analysis Workshop 19 (GAW19).
Affiliation(s)
- Jinli He
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University and Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Harbin, 150080, China
- Weijun Ma
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University and Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Harbin, 150080, China
- Ying Zhou
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University and Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Harbin, 150080, China
6
Aslibekyan S, Almasy L, Province MA, Absher DM, Arnett DK. Data for GAW20: genome-wide DNA sequence variation and epigenome-wide DNA methylation before and after fenofibrate treatment in a family study of metabolic phenotypes. BMC Proc 2018; 12:35. [PMID: 30275886 PMCID: PMC6157153 DOI: 10.1186/s12919-018-0114-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/28/2022] Open
Abstract
GAW20 provided participants with an opportunity to comprehensively examine genetic and epigenetic variation among related individuals in the context of drug treatment response. GAW20 used data from 188 families (N = 1105) participating in the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study (clinicaltrials.gov identifier NCT00083369), which included CD4+ T-cell DNA methylation at 463,995 cytosine-phosphate-guanine (CpG) sites measured before and after a 3-week treatment with fenofibrate, single-nucleotide variation at 906,600 loci, metabolic syndrome components ascertained before and after the drug intervention, and relevant covariates. All GOLDN participants were of European descent, with an average age of 48 years. In addition, approximately half were women and approximately 40% met the diagnostic criteria for metabolic syndrome. Unique advantages of the GAW20 data set included longitudinal (3 weeks apart) measurements of DNA methylation, the opportunity to explore the contributions of both genotype and DNA methylation to the interindividual variability in drug treatment response, and the familial relationships between study participants. The principal disadvantage of GAW20/GOLDN data was the spurious correlation between batch effects and fenofibrate effects on methylation, which arose because the pre- and posttreatment methylation data were generated and normalized separately, and any attempts to remove time-dependent technical artifacts would also remove biologically meaningful changes brought on by fenofibrate. Despite this limitation, the GAW20 data set offered informative, multilayered omics data collected in a large population-based study of common disease traits, which resulted in creative approaches to integration and analysis of inherited human variation.
Affiliation(s)
- Stella Aslibekyan
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, 1665 University Blvd, Birmingham, AL 35205 USA
- Laura Almasy
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA 19104 USA
- Michael A. Province
- Division of Statistical Genomics, Washington University in St Louis, 660 South Euclid Ave, St Louis, MO 63110 USA
- Devin M. Absher
- Hudson Alpha Institute for Biotechnology, 601 Genome Way, Huntsville, AL 35806 USA
- Donna K. Arnett
- College of Public Health, University of Kentucky, 111 Washington Ave, Lexington, KY 40536 USA
7
Chien LC, Chiu YF. General retrospective mega-analysis framework for rare variant association tests. Genet Epidemiol 2018; 42:621-635. [PMID: 30188589 DOI: 10.1002/gepi.22147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2017] [Revised: 06/05/2018] [Accepted: 06/05/2018] [Indexed: 11/09/2022]
Abstract
Here, we describe a retrospective mega-analysis framework for gene- or region-based multimarker rare variant association tests. Our proposed mega-analysis association tests allow investigators to combine longitudinal and cross-sectional family- and/or population-based studies. This framework can be applied to a continuous, categorical, or survival trait. In addition to autosomal variants, the tests can be applied to conduct mega-analyses on X-chromosome variants. Tests were built on study-specific region- or gene-level quasiscore statistics and, therefore, do not require estimates of effects of individual rare variants. We used the generalized estimating equation approach to account for complex multiple correlation structures between family members, repeated measurements, and genetic markers. While accounting for multilevel correlations and heterogeneity across studies, the test statistics were computationally efficient and feasible for large-scale sequencing studies. The retrospective aspect of association tests helps alleviate bias due to phenotype-related sampling and type I errors due to misspecification of phenotypic distribution. We evaluated our developed mega-analysis methods through comprehensive simulations with varying sample sizes, covariates, population stratification structures, and study designs across multiple studies. To illustrate application of the proposed framework, we conducted a mega-association analysis combining a longitudinal family study and a cross-sectional case-control study from Genetic Analysis Workshop 19.
Affiliation(s)
- Li-Chu Chien
- Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
- Yen-Feng Chiu
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan, ROC
8
Russo A, Di Gaetano C, Cugliari G, Matullo G. Advances in the Genetics of Hypertension: The Effect of Rare Variants. Int J Mol Sci 2018; 19:E688. [PMID: 29495593 PMCID: PMC5877549 DOI: 10.3390/ijms19030688] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 01/31/2018] [Revised: 02/19/2018] [Accepted: 02/26/2018] [Indexed: 12/22/2022] Open
Abstract
Worldwide, hypertension still represents a serious health burden with nine million people dying as a consequence of hypertension-related complications. Essential hypertension is a complex trait supported by multifactorial genetic inheritance together with environmental factors. The heritability of blood pressure (BP) is estimated to be 30-50%. A great effort was made to find genetic variants affecting BP levels through Genome-Wide Association Studies (GWAS). This approach relies on the "common disease-common variant" hypothesis and led to the identification of multiple genetic variants which explain, in aggregate, only 2-3% of the genetic variance of hypertension. Part of the missing genetic information could be caused by variants too rare to be detected by GWAS. The use of exome chips and Next-Generation Sequencing facilitated the discovery of causative variants. Here, we report the advances in the detection of novel rare variants, genes, and/or pathways through the most promising approaches, and the recent statistical tests that have emerged to handle rare variants. We also discuss the need to further support rare novel variants with replication studies within larger consortia and with deeper functional studies to better understand how new genes might improve patient care and the stratification of the response to antihypertensive treatments.
Affiliation(s)
- Alessia Russo
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
- Cornelia Di Gaetano
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
- Giovanni Cugliari
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
- Giuseppe Matullo
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
9
Konigorski S, Wang Y, Cigsar C, Yilmaz YE. Estimating and testing direct genetic effects in directed acyclic graphs using estimating equations. Genet Epidemiol 2017; 42:174-186. [PMID: 29265408 PMCID: PMC6619348 DOI: 10.1002/gepi.22107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/09/2017] [Revised: 10/26/2017] [Accepted: 11/14/2017] [Indexed: 12/12/2022]
Abstract
In genetic association studies, it is important to distinguish direct and indirect genetic effects in order to build truly functional models. For this purpose, we consider a directed acyclic graph setting with genetic variants, primary and intermediate phenotypes, and confounding factors. In order to make valid statistical inference on direct genetic effects on the primary phenotype, it is necessary to consider all potential effects in the graph, and we propose to use the estimating equations method with robust Huber-White sandwich standard errors. We evaluate the proposed causal inference based on estimating equations (CIEE) method and compare it with traditional multiple regression methods, the structural equation modeling method, and sequential G-estimation methods through a simulation study for the analysis of (completely observed) quantitative traits and time-to-event traits subject to censoring as primary phenotypes. The results show that CIEE provides valid estimators and inference by successfully removing the effect of intermediate phenotypes from the primary phenotype and is robust against measured and unmeasured confounding of the indirect effect through observed factors. All other methods except the sequential G-estimation method for quantitative traits fail in some scenarios where their test statistics yield inflated type I errors. In the analysis of the Genetic Analysis Workshop 19 dataset, we estimate and test genetic effects on blood pressure accounting for intermediate gene expression phenotypes. The results show that CIEE can identify genetic variants that would be missed by traditional regression analyses. CIEE is computationally fast, widely applicable to different fields, and available as an R package.
Affiliation(s)
- Stefan Konigorski
- Molecular Epidemiology Research Group, Max Delbrück Center (MDC) for Molecular Medicine in the Helmholtz Association, Berlin, Germany; Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, Canada
- Yuan Wang
- Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, Canada
- Candemir Cigsar
- Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, Canada
- Yildiz E Yilmaz
- Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, Canada; Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, St. John's, Canada; Discipline of Medicine, Faculty of Medicine, Memorial University of Newfoundland, St. John's, Canada
10
Espin-Garcia O, Craiu RV, Bull SB. Two-phase designs for joint quantitative-trait-dependent and genotype-dependent sampling in post-GWAS regional sequencing. Genet Epidemiol 2017; 42:104-116. [PMID: 29239496 PMCID: PMC5814750 DOI: 10.1002/gepi.22099] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/07/2017] [Revised: 10/23/2017] [Accepted: 10/23/2017] [Indexed: 11/09/2022]
Abstract
We evaluate two‐phase designs to follow‐up findings from genome‐wide association study (GWAS) when the cost of regional sequencing in the entire cohort is prohibitive. We develop novel expectation‐maximization‐based inference under a semiparametric maximum likelihood formulation tailored for post‐GWAS inference. A GWAS‐SNP (where SNP is single nucleotide polymorphism) serves as a surrogate covariate in inferring association between a sequence variant and a normally distributed quantitative trait (QT). We assess test validity and quantify efficiency and power of joint QT‐SNP‐dependent sampling and analysis under alternative sample allocations by simulations. Joint allocation balanced on SNP genotype and extreme‐QT strata yields significant power improvements compared to marginal QT‐ or SNP‐based allocations. We illustrate the proposed method and evaluate the sensitivity of sample allocation to sampling variation using data from a sequencing study of systolic blood pressure.
Affiliation(s)
- Osvaldo Espin-Garcia
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Radu V Craiu
- Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Shelley B Bull
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
11
Konigorski S, Yilmaz YE, Pischon T. Comparison of single-marker and multi-marker tests in rare variant association studies of quantitative traits. PLoS One 2017; 12:e0178504. [PMID: 28562689 PMCID: PMC5451057 DOI: 10.1371/journal.pone.0178504] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/11/2017] [Accepted: 05/15/2017] [Indexed: 11/19/2022] Open
Abstract
In genetic association studies of rare variants, low statistical power and potential violations of established estimator properties are among the main challenges of association tests. Multi-marker tests (MMTs) have been proposed to target these challenges, but any comparison with single-marker tests (SMTs) has to consider that their aim is to identify causal genomic regions instead of variants. Valid power comparisons have been performed for the analysis of binary traits indicating that MMTs have higher power, but there is a lack of conclusive studies for quantitative traits. The aim of our study was therefore to fairly compare SMTs and MMTs in their empirical power to identify the same causal loci associated with a quantitative trait. The results of extensive simulation studies indicate that previous results for binary traits cannot be generalized. First, we show that for the analysis of quantitative traits, conventional estimation methods and test statistics of single-marker approaches have valid properties yielding association tests with valid type I error, even when investigating singletons or doubletons. Furthermore, SMTs lead to more powerful association tests for identifying causal genes than MMTs when the effect sizes of causal variants are large, and less powerful tests when causal variants have small effect sizes. For moderate effect sizes, whether SMTs or MMTs have higher power depends on the sample size and percentage of causal SNVs. For a more complete picture, we also compare the power in studies of quantitative and binary traits, and the power to identify causal genes with the power to identify causal rare variants. In a genetic association analysis of systolic blood pressure in the Genetic Analysis Workshop 19 data, SMTs yielded smaller p-values compared to MMTs for most of the investigated blood pressure genes, and were least influenced by the definition of gene regions.
Affiliation(s)
- Stefan Konigorski
- Molecular Epidemiology Research Group, Max Delbrück Center (MDC) for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Yildiz E. Yilmaz
- Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John’s, Newfoundland and Labrador, Canada
- Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Newfoundland and Labrador, Canada
- Discipline of Medicine, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Newfoundland and Labrador, Canada
- Tobias Pischon
- Molecular Epidemiology Research Group, Max Delbrück Center (MDC) for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Charité Universitätsmedizin Berlin, Berlin, Germany
- DZHK (German Center for Cardiovascular Research), Berlin, Germany
12
Shin JH, Yi R, Bull SB. Identification of low frequency and rare variants for hypertension using sparse-data methods. BMC Proc 2016; 10:389-395. [PMID: 27980667 PMCID: PMC5133522 DOI: 10.1186/s12919-016-0061-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
Availability of genomic sequence data provides opportunities to study the role of low-frequency and rare variants in the etiology of complex disease. In this study, we conduct association analyses of hypertension status in the cohort of 1943 unrelated Mexican Americans provided by Genetic Analysis Workshop 19, focusing on exonic variants in MAP4 on chromosome 3. Our primary interest is to compare the performance of standard and sparse-data approaches for single-variant tests and variant-collapsing tests for sets of rare and low-frequency variants. We analyze both the real and the simulated phenotypes.
Affiliation(s)
- Ji-Hyung Shin
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, University of Toronto, Toronto, ON M5T 3L9 Canada
- Ruiyang Yi
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, University of Toronto, Toronto, ON M5T 3L9 Canada
- Shelley B Bull
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, University of Toronto, Toronto, ON M5T 3L9 Canada
13
Valcarcel A, Grinde K, Cook K, Green A, Tintle N. A multistep approach to single nucleotide polymorphism-set analysis: an evaluation of power and type I error of gene-based tests of association after pathway-based association tests. BMC Proc 2016; 10:349-355. [PMID: 27980661 PMCID: PMC5133510 DOI: 10.1186/s12919-016-0055-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022] Open
Abstract
The aggregation of functionally associated variants given a priori biological information can aid in the discovery of rare variants associated with complex diseases. Many methods exist that aggregate rare variants into a set and compute a single p value summarizing association between the set of rare variants and a phenotype of interest. These methods are often called gene-based, rare variant tests of association because the variants in the set are often all contained within the same gene. A reasonable extension of these approaches involves aggregating variants across an even larger set of variants (eg, all variants contained in genes within a pathway). Testing sets of variants such as pathways for association with a disease phenotype reduces multiple testing penalties, may increase power, and allows for straightforward biological interpretation. However, a significant variant-set association test does not indicate precisely which variants contained within that set are causal. Because pathways often contain many variants, it may be helpful to follow-up significant pathway tests by conducting gene-based tests on each gene in that pathway to narrow in on the region of causal variants. In this paper, we propose such a multistep approach for variant-set analysis that can also account for covariates and complex pedigree structure. We demonstrate this approach on simulated phenotypes from Genetic Analysis Workshop 19. We find generally better power for the multistep approach when compared to a more conventional, single-step approach that simply runs gene-based tests of association on each gene across the genome. Further work is necessary to evaluate the multistep approach on different data sets with different characteristics.
Affiliation(s)
- Alessandra Valcarcel
  - Department of Statistics, University of Connecticut, 2390 Alumni Drive, Storrs, CT 06269 USA
- Kelsey Grinde
  - Department of Biostatistics, University of Washington, NE Pacific St, Seattle, WA 98195 USA
- Kaitlyn Cook
  - Department of Mathematics and Statistics, Carleton College, 1 N College St, Northfield, MN 55057 USA
- Alden Green
  - Department of Statistics, Harvard University, Massachusetts Hall, Cambridge, MA 02138 USA
- Nathan Tintle
  - Department of Mathematics, Statistics and Computer Science, Dordt College, 498 4th Ave. NE, Dordt College, Sioux Center, IA 51250 USA
|
14
|
Konigorski S, Yilmaz YE, Pischon T. Genetic association analysis based on a joint model of gene expression and blood pressure. BMC Proc 2016; 10:289-294. [PMID: 27980651 PMCID: PMC5133480 DOI: 10.1186/s12919-016-0045-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Recent work on genetic association studies suggests that much of the heritable variation in complex traits is unexplained, which indicates a need for more biologically meaningful modeling approaches and appropriate statistical methods. In this study, we propose a biological framework and a corresponding statistical model incorporating multilevel biological measures, and illustrate it in the analysis of the real data provided by the Genetic Analysis Workshop (GAW) 19, which contains whole genome sequence (WGS), gene expression (GE), and blood pressure (BP) data. We investigate the direct effect of single-nucleotide variants (SNVs) on BP and GE, while considering the non-directional dependence between BP and GE, by using copula functions to jointly model BP and GE conditional on SNVs. We implement the method for analysis on a genome-wide scale, and illustrate it within an association analysis of 68,727 SNVs on chromosome 19 that lie in or around genes with available GE measures. Although there is no indication of inflated type I error under the proposed method, our results show that the association tests have smaller p values than tests under univariate models for common and rare variants using single-variant tests and gene-based multimarker tests. Hence, considering multilevel biological measures and modeling the dependence structure between these measures by using a plausible graphical approach may lead to more informative findings than standard univariate tests of common variants and well-recognized gene-based rare variant tests.
Affiliation(s)
- Stefan Konigorski
  - Molecular Epidemiology Research Group, Max Delbrück Center (MDC) for Molecular Medicine, Robert-Rössle-Straße 10, 13125 Berlin, Germany
- Yildiz E Yilmaz
  - Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, NL A1C 5S7 Canada; Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL A1C 5S7 Canada
- Tobias Pischon
  - Molecular Epidemiology Research Group, Max Delbrück Center (MDC) for Molecular Medicine, Robert-Rössle-Straße 10, 13125 Berlin, Germany
|
15
|
Sun J, Bhatnagar SR, Oualkacha K, Ciampi A, Greenwood CMT. Joint analysis of multiple blood pressure phenotypes in GAW19 data by using a multivariate rare-variant association test. BMC Proc 2016; 10:309-313. [PMID: 27980654 PMCID: PMC5133485 DOI: 10.1186/s12919-016-0048-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022]
Abstract
INTRODUCTION: Large-scale sequencing studies often measure many related phenotypes in addition to the genetic variants. Joint analysis of multiple phenotypes in genetic association studies may increase power to detect disease-associated loci. METHODS: We apply a recently developed multivariate rare-variant association test to the Genetic Analysis Workshop 19 data in order to test associations between genetic variants and multiple blood pressure phenotypes simultaneously. We also compare this multivariate test with a widely used univariate test that analyzes phenotypes separately. RESULTS: The multivariate test identified 2 genetic variants that have been previously reported as associated with hypertension or coronary artery disease. In addition, our region-based analyses also show that the multivariate test tends to give smaller p values than the univariate test. CONCLUSIONS: Hence, the multivariate test has potential to improve test power, especially when multiple phenotypes are correlated.
Affiliation(s)
- Jianping Sun
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1A2 Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2 Canada
- Sahir R. Bhatnagar
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1A2 Canada
- Karim Oualkacha
  - Département de Mathématiques, Université du Québec à Montréal, Montréal, QC H2X 3Y7 Canada
- Antonio Ciampi
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1A2 Canada
- Celia M. T. Greenwood
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1A2 Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2 Canada
  - Department of Oncology, McGill University, Montreal, QC H2W 1S6 Canada
  - Department of Human Genetics, McGill University, Montreal, QC H3A 1B1 Canada
|