1. Sun W, Jon K, Zhu W. Multiple phenotype association tests based on sliced inverse regression. BMC Bioinformatics 2024; 25:144. [PMID: 38575890 PMCID: PMC10996256 DOI: 10.1186/s12859-024-05731-8] [Citation(s) in RCA: 0] [Received: 05/30/2023] [Accepted: 03/05/2024]
Abstract
BACKGROUND: Joint analysis of multiple phenotypes in studies of biological systems, such as genome-wide association studies (GWAS), is critical to revealing functional interactions between traits and genetic variants. However, the growing dimensionality of the data has become a major obstacle to the widespread use of joint analysis. To handle the large number of variables, we consider the sliced inverse regression (SIR) method. Specifically, we propose a novel SIR-based association test that is robust and powerful for testing the association between multiple predictors and multiple outcomes. RESULTS: We conduct simulation studies in both low- and high-dimensional settings with various numbers of single-nucleotide polymorphisms (SNPs), taking the correlation structure of the traits into account. Simulation results show that the proposed method outperforms existing methods. We also successfully apply our method to a genetic association study of the ADNI dataset. Both the simulation studies and the real data analysis show that the SIR-based association test is valid and more efficient than its competitors. CONCLUSION: Several scenarios with low- and high-dimensional responses and genotypes are considered in this paper. Our SIR-based method controls the estimated type I error at the pre-specified level α.
Affiliation(s)
- Wenyuan Sun
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin, China
- Department of Mathematics, College of Science, Yanbian University, Yanji, 133002, Jilin, China
- Kyongson Jon
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin, China
- Faculty of Mathematics, Kim Il Sung University, Pyongyang, 999093, Democratic People's Republic of Korea
- Wensheng Zhu
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin, China
- School of Mathematical Sciences, Harbin Normal University, Harbin, 150025, Heilongjiang, China
2. St-Pierre J, Oualkacha K. A copula-based set-variant association test for bivariate continuous, binary or mixed phenotypes. Int J Biostat 2023; 19:369-387. [PMID: 36279152 PMCID: PMC10644254 DOI: 10.1515/ijb-2022-0010] [Citation(s) in RCA: 0] [Received: 01/20/2022] [Revised: 05/26/2022] [Accepted: 08/23/2022]
Abstract
In genome-wide association studies (GWAS), researchers often deal with dichotomous and non-normally distributed traits, or with a mixture of discrete and continuous traits. However, most current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest. Hence, these methods are not applicable to disease or non-normally distributed traits, and there is a need for unified and flexible methods to study the association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval, and they provide suitable models for handling non-normal errors in multivariate association studies. We propose a novel unified and flexible copula-based multivariate association test (CBMAT) for discovering association between a genetic region and a bivariate continuous, binary or mixed phenotype. We also derive a data-driven analytic p-value procedure for the proposed region-based score-type test. Through simulation studies, we demonstrate that CBMAT has well-controlled type I error rates and higher power to detect associations than other existing methods for discrete and non-normally distributed traits. Finally, we apply CBMAT to detect the association between two genes located on chromosome 11 and several lipid levels measured on 1477 subjects from the ALSPAC study.
Affiliation(s)
- Julien St-Pierre
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Karim Oualkacha
- Département de Mathématiques, Université du Québec à Montréal, Montreal, QC, Canada
3. Xie H, Cao X, Zhang S, Sha Q. Joint analysis of multiple phenotypes for extremely unbalanced case-control association studies. Genet Epidemiol 2023; 47:185-197. [PMID: 36691904 DOI: 10.1002/gepi.22513] [Citation(s) in RCA: 0] [Received: 06/14/2022] [Revised: 11/16/2022] [Accepted: 01/11/2023]
Abstract
In genome-wide association studies (GWAS) of thousands of phenotypes in biobanks, most binary phenotypes have substantially fewer cases than controls. Many widely used approaches for the joint analysis of multiple phenotypes produce inflated type I error rates for such extremely unbalanced case-control phenotypes. In this research, we develop a method that jointly analyzes multiple unbalanced case-control phenotypes while avoiding this issue. We first group the phenotypes into clusters using hierarchical clustering, and then merge the phenotypes in each cluster into a single phenotype. In each cluster, we use the saddlepoint approximation to estimate the p value of an association test between the merged phenotype and a single nucleotide polymorphism (SNP), which eliminates the inflated type I error rate for extremely unbalanced case-control phenotypes. Finally, we use the Cauchy combination method to obtain an integrated p value across all clusters to test the association between the multiple phenotypes and a SNP. Extensive simulation studies show that the proposed approach controls the type I error rate well and is more powerful than other available methods. We also apply the proposed approach to phenotypes in category IX (diseases of the circulatory system) in the UK Biobank, where it identifies more significant SNPs than the other viable methods we compared it with.
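The final step above combines per-cluster p values with the Cauchy combination method. Below is a minimal Python sketch of the standard Cauchy combination statistic. It is not the authors' implementation, and it does not cover the saddlepoint step that produces the per-cluster p values. The function name `cauchy_combination` and the default equal weights are illustrative assumptions.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Cauchy combination of (possibly dependent) p-values.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to sum to 1.
    Under the null, T is approximately standard Cauchy, so the combined
    p-value is P(C > T) = 0.5 - arctan(T) / pi.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * k  # hypothetical default: equal weights
    total_weight = sum(weights)
    t = 0.0
    for p, w in zip(p_values, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        t += (w / total_weight) * math.tan((0.5 - p) * math.pi)
    return 0.5 - math.atan(t) / math.pi


# Hypothetical per-cluster p-values for one SNP.
print(cauchy_combination([0.001, 0.5, 0.9]))
```

The Cauchy tail is heavy, so the combined p value is driven mainly by the smallest inputs. Its null calibration is generally regarded as fairly insensitive to correlation among the inputs, especially at small significance levels, which makes it a natural fit for combining dependent per-cluster tests.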
Affiliation(s)
- Hongjing Xie
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
4. Jin C, Lee B, Shen L, Long Q. Integrating multi-omics summary data using a Mendelian randomization framework. Brief Bioinform 2022; 23:bbac376. [PMID: 36094096 PMCID: PMC9677504 DOI: 10.1093/bib/bbac376] [Citation(s) in RCA: 0] [Received: 04/12/2022] [Revised: 07/29/2022] [Accepted: 08/09/2022]
Abstract
Mendelian randomization is a versatile tool for identifying possible causal relationships between an omics biomarker and a disease outcome, using genetic variants as instrumental variables. A key theme is the prioritization of genes whose omics readouts can serve as predictors of the disease outcome, through analysis of GWAS and QTL summary data. However, little work has examined best practices for probing the effects of multiple omics biomarkers annotated to the same gene of interest. To bridge this gap, we propose powerful combination tests that integrate multiple correlated P-values without assuming a dependence structure between the exposures. Extensive simulation experiments demonstrate the superiority of the proposed approach over existing methods adapted to our setting. The top hits from analyses of multi-omics Alzheimer's disease datasets include the genes ABCA7 and ATP1B1.
Affiliation(s)
- Chong Jin
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brian Lee
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Qi Long
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
5. Cinar O, Viechtbauer W. A Comparison of Methods for Gene-Based Testing That Account for Linkage Disequilibrium. Front Genet 2022; 13:867724. [PMID: 35601489 PMCID: PMC9117705 DOI: 10.3389/fgene.2022.867724] [Citation(s) in RCA: 0] [Received: 02/01/2022] [Accepted: 04/07/2022]
Abstract
Controlling the type I error rate while retaining sufficient power is a major concern in genome-wide association studies, which nowadays often examine more than a million single-nucleotide polymorphisms (SNPs) simultaneously. Methods such as the Bonferroni correction can considerably reduce power because of the large number of tests conducted. Shifting the focus to higher functional structures (e.g., genes) can reduce this loss of power. This can be done by combining the p-values of SNPs that belong to the same structural unit to test their joint null hypothesis. However, standard methods for this purpose (e.g., Fisher’s method) do not account for the dependence among the tests due to linkage disequilibrium (LD). In this paper, we review various adjustments to p-value combination methods that explicitly take LD information into account, and we evaluate their performance in a simulation study based on data from the HapMap project. The results illustrate the importance of incorporating LD information to control the type I error rate at the desired level. Furthermore, some methods control the type I error rate more successfully than others. Among them, Brown’s method was the most robust to the characteristics of the genes and outperformed the Bonferroni method in terms of power in many scenarios. Examining the genetic factors of a phenotype of interest at the gene level rather than the SNP level can increase the power of a study. When doing so, one should account for LD among SNPs belonging to the same gene, for which Brown’s method appears to be the most robust technique.
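The difference between Fisher's method and an LD-aware adjustment can be made concrete with a minimal Python sketch. Fisher's statistic X = -2 Σ ln p_i follows a χ² distribution with 2k degrees of freedom only when the k tests are independent. Brown's method instead approximates X by a scaled distribution c·χ²_f, matching the mean E[X] = 2k and the variance Var[X] = 4k + 2 Σ_{i<j} cov(-2 ln p_i, -2 ln p_j), so that c = Var[X] / (2E[X]) and f = 2E[X]² / Var[X].

The sketch rests on several assumptions:
- The SNP-level z-statistics have a known correlation matrix, for example one estimated from LD.
- Each covariance term is approximated with the Kost and McDermott polynomial for one-sided p-values.
- All function names are illustrative.

These choices are not necessarily the variants compared in the review.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square with (possibly non-integer) df,
    computed as the regularized upper incomplete gamma Q(df / 2, x / 2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    log_prefix = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return 1.0 - total * math.exp(log_prefix)
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefix) * h


def fisher_combination(p_values):
    """Fisher's statistic; chi-square with 2k df only if the tests are independent."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf(stat, 2 * len(p_values))


def kost_mcdermott_cov(rho):
    """Assumed approximation of cov(-2 ln p_i, -2 ln p_j) for one-sided
    p-values whose underlying z-statistics have correlation rho."""
    return 3.263 * rho + 0.710 * rho ** 2 + 0.027 * rho ** 3


def brown_combination(p_values, corr):
    """Brown's method: match the mean and variance of X to c * chi2_f."""
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    mean = 2.0 * k
    var = 4.0 * k + 2.0 * sum(
        kost_mcdermott_cov(corr[i][j]) for i in range(k) for j in range(i + 1, k)
    )
    scale = var / (2.0 * mean)  # c
    df = 2.0 * mean ** 2 / var  # f
    return chi2_sf(stat / scale, df)


# Hypothetical p-values for three SNPs in one gene and an assumed LD correlation matrix.
p = [0.01, 0.02, 0.04]
ld = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]]
print(fisher_combination(p)[1], brown_combination(p, ld))
```

With an identity correlation matrix, Brown's method reduces exactly to Fisher's (c = 1, f = 2k). With positive LD, it typically returns a larger, more conservative p-value for the same SNPs.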
6. Fu L, Wang Y, Li T, Yang S, Hu YQ. A Novel Hierarchical Clustering Approach for Joint Analysis of Multiple Phenotypes Uncovers Obesity Variants Based on ARIC. Front Genet 2022; 13:791920. [PMID: 35391794 PMCID: PMC8981031 DOI: 10.3389/fgene.2022.791920] [Citation(s) in RCA: 0] [Received: 10/21/2021] [Accepted: 01/27/2022]
Abstract
Genome-wide association studies (GWASs) have successfully discovered numerous variants underlying various diseases. However, one-phenotype, one-variant association analysis in GWASs is generally inefficient at identifying variants with weak effects, which suggests that many signals remain undetected. Jointly analyzing multiple phenotypes is now recognized as an important approach for increasing the statistical power to identify weak genetic variants for complex diseases, shedding new light on potential biological mechanisms. We therefore develop hierarchical clustering based on different methods for calculating correlation coefficients (HCDC) to analyze multiple phenotypes simultaneously in association studies. HCDC involves two steps. First, a clustering approach based on the similarity matrix between two groups of phenotypes is applied to choose a representative phenotype in each cluster. Second, existing methods are used to estimate the genetic associations with the representative phenotypes rather than with the individual phenotypes in every cluster. A variety of simulations demonstrate the capacity of HCDC to boost power: in most scenarios, existing methods with HCDC embedded are either more powerful than or comparable to the same methods without HCDC. Additionally, applying existing methods with HCDC to obesity-related phenotypes from the Atherosclerosis Risk in Communities (ARIC) study uncovered several associated variants. Among these, UQCC1-rs1570004 is reported as a significant obesity signal for the first time; its differential expression in subcutaneous fat, visceral fat, and muscle tissue merits further functional study.
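The two-step idea (cluster correlated phenotypes, then carry one representative per cluster forward to association testing) can be sketched in plain Python. This is not HCDC itself. The sketch assumes:
- Pearson correlation only.
- Average-linkage agglomerative clustering on the distance 1 - |r|.
- A hypothetical merge threshold.
- A hypothetical representative rule: pick the member with the highest mean |r| to the rest of its cluster.

Function names and the toy data are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_phenotypes(phenos, threshold):
    """Average-linkage clustering on 1 - |r|; merge while the closest pair is within threshold."""
    names = list(phenos)
    dist = {(a, b): 1.0 - abs(pearson(phenos[a], phenos[b])) for a in names for b in names}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i remains valid
    return clusters


def representative(cluster, phenos):
    """Hypothetical rule: the member with the highest mean |r| to the other members."""
    if len(cluster) == 1:
        return cluster[0]

    def mean_abs_r(a):
        others = [b for b in cluster if b != a]
        return sum(abs(pearson(phenos[a], phenos[b])) for b in others) / len(others)

    return max(cluster, key=mean_abs_r)


# Toy phenotype values for five individuals (illustrative only).
phenos = {
    "bmi": [1, 2, 3, 4, 5],
    "waist": [2, 4, 6, 8, 11],
    "height": [5, 1, 4, 2, 3],
}
clusters = cluster_phenotypes(phenos, threshold=0.5)
print([(c, representative(c, phenos)) for c in clusters])
```

Each representative phenotype would then be passed to an existing single- or multi-phenotype association test, in place of every phenotype in its cluster.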
Affiliation(s)
- Liwan Fu
- Center for Non-communicable Disease Management, National Center for Children's Health, Beijing Children's Hospital, Capital Medical University, Beijing, China; State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yuquan Wang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Tingting Li
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Siqian Yang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yue-Qing Hu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China; Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
7. Zhang H, Wu Z. The generalized Fisher's combination and accurate p-value calculation under dependence. Biometrics 2022. [PMID: 35178716 DOI: 10.1111/biom.13634] [Citation(s) in RCA: 4] [Received: 01/17/2021] [Accepted: 02/03/2022]
Abstract
Combining dependent tests of significance has broad applications, but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (e.g., Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially below 0.05. This problem could lead to substantial false discoveries in big data analyses. This paper makes two main contributions. First, it presents a general family of Fisher-type statistics, referred to as GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, and the weighted Z-score combination. GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts the weights and the statistic-defining parameters to the data at hand. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under the multivariate Gaussian distribution, and more robust under the generalized linear model and the multivariate t-distribution. The applications of GFisher and the new p-value calculation methods are demonstrated in a gene-based SNP-set association study. The relevant computations have been implemented in the R package GFisher, available on the Comprehensive R Archive Network (CRAN).
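One of the statistics covered by the GFisher family is the weighted Z-score combination. Suppose the one-sided z-scores are jointly Gaussian with a known correlation matrix R. In that case, the statistic Σ w_i z_i / sqrt(wᵀRw) is exactly standard normal under the null, so this special case needs no p-value approximation.

The minimal sketch below covers only that special case. It is not GFisher and does not implement its moment-ratio matching. The function name and its inputs are assumptions for illustration, and it uses only Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def weighted_z_combination(p_values, weights, corr):
    """One-sided weighted Z-score combination under a known Gaussian correlation.

    z_i = Phi^{-1}(1 - p_i); T = sum(w_i z_i) / sqrt(w^T R w) is N(0, 1) under
    the null when the z_i are jointly Gaussian with correlation matrix R.
    """
    nd = NormalDist()
    z = [-nd.inv_cdf(p) for p in p_values]  # equals inv_cdf(1 - p), more precise for small p
    k = len(p_values)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    variance = sum(weights[i] * weights[j] * corr[i][j] for i in range(k) for j in range(k))
    t = numerator / math.sqrt(variance)
    return nd.cdf(-t)  # upper-tail p-value


# Hypothetical SNP-level p-values with an assumed correlation of 0.4 between two tests.
print(weighted_z_combination([0.01, 0.03], [1.0, 1.0], [[1.0, 0.4], [0.4, 1.0]]))
```

Ignoring the correlation (treating R as the identity) would shrink the denominator and make the combined p-value too small when the tests are positively correlated. This is the kind of miscalibration under dependence that the paper addresses for the broader GFisher family.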
Affiliation(s)
- Hong Zhang
- Biostatistics and Research Decision Sciences, Merck Research Laboratories, Rahway, New Jersey, U.S.A
- Zheyang Wu
- Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, U.S.A
8. Ravi S, Campagna G, Della Lucia MC, Broccanello C, Bertoldo G, Chiodi C, Maretto L, Moro M, Eslami AS, Srinivasan S, Squartini A, Concheri G, Stevanato P. SNP Alleles Associated With Low Bolting Tendency in Sugar Beet. Front Plant Sci 2021; 12:693285. [PMID: 34322145 PMCID: PMC8311237 DOI: 10.3389/fpls.2021.693285] [Citation(s) in RCA: 2] [Received: 04/10/2021] [Accepted: 06/07/2021]
Abstract
The identification of efficient molecular markers related to low bolting tendency is a priority in sugar beet (Beta vulgaris L.) breeding. This study aimed to identify SNP markers associated with low bolting tendency through a genome-wide association study. A 3-year field trial of 13 sugar beet lines identified L14 as the line with the lowest bolting tendency and an increased survival rate after autumnal sowing. For SNP discovery following phenotyping, 24 non-bolting and 15 bolting plants of the L14 line, representing contrasting phenotypes, were sequenced by restriction site-associated DNA sequencing (RAD-seq). An association model was established with a set of 10,924 RAD-based single nucleotide polymorphism (SNP) markers. The most significantly associated SNPs were ranked by their differential allelic status between contrasting phenotypes (p < 0.01). Their allelic status was then confirmed on three validation datasets comprising diverse sugar beet lines and varieties, using a range of SNP detection technologies. This study identified SNP_36780842 and SNP_48607347 as linked to low bolting tendency; these markers can be used for marker-assisted breeding and selection in sugar beet.
Affiliation(s)
- Samathmika Ravi
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Giovanni Campagna
- Cooperativa Produttori Agricoli Società Cooperativa Agricola (COPROB), Minerbio, Italy
- Maria Cristina Della Lucia
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Chiara Broccanello
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Giovanni Bertoldo
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Claudia Chiodi
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Laura Maretto
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Matteo Moro
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Azam Sadat Eslami
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Andrea Squartini
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Giuseppe Concheri
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
- Piergiorgio Stevanato
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Legnaro, Italy
9. Fu L, Wang Y, Li T, Hu YQ. A Novel Approach Integrating Hierarchical Clustering and Weighted Combination for Association Study of Multiple Phenotypes and a Genetic Variant. Front Genet 2021; 12:654804. [PMID: 34220938 PMCID: PMC8249926 DOI: 10.3389/fgene.2021.654804] [Citation(s) in RCA: 7] [Received: 02/16/2021] [Accepted: 04/20/2021]
Abstract
As a pivotal research tool, the genome-wide association study has successfully identified numerous genetic variants underlying distinct diseases. However, the identified variants explain only a small proportion of the phenotypic variation for certain diseases, suggesting that more genetic signals remain to be detected. One reason may be that one-phenotype, one-variant association analysis is inefficient at detecting variants with weak effects. It is increasingly recognized that joint analysis of multiple phenotypes may boost the statistical power to detect pathogenic variants with weak genetic effects on complex diseases, providing more clues to the underlying biological mechanisms. We therefore propose a Weighted Combination of multiple phenotypes following Hierarchical Clustering (WCHC) method for simultaneously analyzing multiple phenotypes in association studies. In a series of simulations, WCHC is either the most powerful method or comparable to the most powerful competitor in most scenarios. Additionally, we evaluate the performance of WCHC by applying it to obesity-related phenotypes from the Atherosclerosis Risk in Communities study, and we report several associated variants.
Affiliation(s)
- Liwan Fu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China; Center for Non-communicable Disease Management, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Yuquan Wang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Tingting Li
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yue-Qing Hu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China; Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
10. Zeng P, Shao Z, Zhou X. Statistical methods for mediation analysis in the era of high-throughput genomics: Current successes and future challenges. Comput Struct Biotechnol J 2021; 19:3209-3224. [PMID: 34141140 PMCID: PMC8187160 DOI: 10.1016/j.csbj.2021.05.042] [Citation(s) in RCA: 38] [Received: 02/10/2021] [Revised: 05/21/2021] [Accepted: 05/21/2021]
Abstract
Mediation analysis investigates the intermediate mechanism through which an exposure influences the outcome of interest. It is becoming increasingly popular in high-throughput genomics studies, where a common goal is to identify molecular-level traits, such as gene expression or methylation, that actively mediate the genetic or environmental effects on the outcome. However, mediation analysis in genomics studies is particularly challenging, owing to the large number of potential mediators measured in these studies and to the composite null nature of the mediation effect hypothesis. Indeed, while standard univariate and multivariate mediation methods are well established for analyzing one or several mediators, they are not well suited to genomics studies with a large number of mediators and often yield conservative p-values and limited power. Consequently, many new high-dimensional mediation methods have been developed over the past few years for analyzing the large number of potential mediators collected in high-throughput genomics studies. In this work, we present a thorough review of these recent methodological advances in high-dimensional mediation analysis. Specifically, we describe more than ten high-dimensional mediation methods in detail, focusing on their motivations, basic modeling ideas, specific modeling assumptions, practical successes, methodological limitations, and future directions. We hope our review will serve as a useful guide for statisticians and computational biologists who develop methods for high-dimensional mediation analysis, as well as for analysts who apply mediation methods to high-throughput genomics studies.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhonghe Shao
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor 48109, MI, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor 48109, MI, USA
11. Fang R, Yang H, Gao Y, Cao H, Goode EL, Cui Y. Gene-based mediation analysis in epigenetic studies. Brief Bioinform 2021; 22:bbaa113. [PMID: 32608480 PMCID: PMC8660163 DOI: 10.1093/bib/bbaa113] [Citation(s) in RCA: 7] [Received: 02/16/2020] [Revised: 04/07/2020] [Accepted: 05/12/2020]
Abstract
Mediation analysis has been a useful tool for investigating the effect of mediators that lie on the path from the independent variable to the outcome. With the increasing dimensionality of mediators, as in (epi)genomics studies, high-dimensional mediation models are needed. In this work, we focus on epigenetic studies, with the goal of identifying important DNA methylation sites that act as mediators between an exposure and a disease outcome. Specifically, we focus on gene-based high-dimensional mediation analysis implemented with kernel principal component analysis to capture potential nonlinear mediation effects. We first review current high-dimensional mediation models and then propose two gene-based analytical approaches: gene-based high-dimensional mediation analysis assuming linearity between mediators and outcome (gHMA-L), and gene-based high-dimensional mediation analysis assuming nonlinearity (gHMA-NL). Because the true mediation relationship is unknown in practice, we further propose an omnibus test of gene-based high-dimensional mediation analysis (gHMA-O) that combines gHMA-L and gHMA-NL. Extensive simulation studies show that gHMA-L performs better under the linearity assumption and gHMA-NL performs better under the nonlinearity assumption, while gHMA-O, by combining the two, is more powerful and robust. We apply the proposed methods to two datasets to investigate genes whose methylation levels act as important mediators in the relationship (1) between alcohol consumption and epithelial ovarian cancer risk, using data from the Mayo Clinic Ovarian Cancer Case-Control Study, and (2) between childhood maltreatment and comorbid post-traumatic stress disorder and depression in adulthood, using data from the Gray Trauma Project.
12. Yurko R, Roeder K, Devlin B, G'Sell M. H-MAGMA, inheriting a shaky statistical foundation, yields excess false positives. Ann Hum Genet 2020; 85:97-100. [PMID: 33372276 DOI: 10.1111/ahg.12412] [Citation(s) in RCA: 13] [Received: 10/01/2020] [Revised: 12/04/2020] [Accepted: 12/07/2020]
Affiliation(s)
- Ronald Yurko
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Kathryn Roeder
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania; Department of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Max G'Sell
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania
13. Deng Y, Wu S, Fan H. Genome-wide pathway-based quantitative multiple phenotypes analysis. PLoS One 2020; 15:e0240910. [PMID: 33175855 PMCID: PMC7657528 DOI: 10.1371/journal.pone.0240910] [Citation(s) in RCA: 0] [Received: 04/02/2020] [Accepted: 10/06/2020]
Abstract
For complex diseases, genome-wide pathway association studies have become increasingly promising. Currently, however, pathway-based association analysis mainly focuses on a single phenotype, which may be insufficient to describe complex diseases and physiological processes. This work proposes a combination model that evaluates the association between a pathway and multiple phenotypes and uses asymptotic results to reduce run time. For a single phenotype, we propose a semi-supervised maximum kernel-based U-statistics (mSKU) method for pathway-based association analysis. For multiple phenotypes, we propose the Fisher combination function with dependent phenotypes (FC), which combines the p-values from testing the pathway against each marginal phenotype individually, to achieve pathway-based multiple-phenotype analysis. With real data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study and the Human Liver Cohort (HLC) study, the FC-mSKU method allows us to specify which pathways are specific to a single phenotype and which contribute to the common genetic structure of multiple phenotypes. Focusing only on single-phenotype tests may miss findings relevant to etiology studies. In extensive simulation studies, the FC-mSKU method demonstrates advantages over its counterparts.
Affiliation(s)
- Yamin Deng
- Statistics Center, First Hospital of Shanxi Medical University, Taiyuan, China; Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Shiman Wu
- Statistics Center, First Hospital of Shanxi Medical University, Taiyuan, China
- Huifang Fan
- Statistics Center, First Hospital of Shanxi Medical University, Taiyuan, China
14
Deng Y, He T, Fang R, Li S, Cao H, Cui Y. Genome-Wide Gene-Based Multi-Trait Analysis. Front Genet 2020; 11:437. [PMID: 32508874 PMCID: PMC7248273 DOI: 10.3389/fgene.2020.00437] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/24/2020] [Accepted: 04/08/2020] [Indexed: 11/29/2022]
Abstract
Genome-wide association studies focusing on a single phenotype have been broadly conducted to identify genetic variants associated with a complex disease. The commonly applied single-variant analysis is limited because it fails to consider the complex interactions between variants, which motivated the development of association analyses focusing on genes or gene sets. Moreover, when multiple correlated phenotypes are available, methods based on a multi-trait analysis can improve the association power. However, most currently available multi-trait analyses are single variant-based analyses and thus have limited power when disease variants function as a group in a gene or a gene set. In this work, we propose a genome-wide gene-based multi-trait analysis method by considering genes as testing units. For a given phenotype, we adopt a rapid and powerful kernel-based testing method which can evaluate the joint effect of multiple variants within a gene. The joint effect, either linear or nonlinear, is captured through kernel functions. Given a series of candidate kernel functions, we propose an omnibus test strategy to integrate the test results based on different candidate kernels. A p-value combination method is then applied to integrate dependent p-values to assess the association between a gene and multiple correlated phenotypes. Simulation studies show reasonable type I error control and excellent power of the proposed method compared to its counterparts. We further show the utility of the method by applying it to two data sets, the Human Liver Cohort and the Alzheimer Disease Neuroimaging Initiative data set, and identify novel genes. Our method has broad applications in other fields in which the interest is to evaluate the joint effect (linear or nonlinear) of a set of variants.
Affiliation(s)
- Yamin Deng
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Tao He
- Department of Mathematics, San Francisco State University, San Francisco, CA, United States
- Ruiling Fang
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Shaoyu Li
- Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC, United States
- Hongyan Cao
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, United States
15
Sha Q, Wang Z, Zhang X, Zhang S. A clustering linear combination approach to jointly analyze multiple phenotypes for GWAS. Bioinformatics 2020; 35:1373-1379. [PMID: 30239574 DOI: 10.1093/bioinformatics/bty810] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/28/2017] [Revised: 08/29/2018] [Accepted: 09/18/2018] [Indexed: 12/16/2022]
Abstract
SUMMARY There is increasing interest in the joint analysis of multiple phenotypes for genome-wide association studies (GWASs) for the following reasons. First, cohorts usually collect multiple phenotypes, and complex diseases are usually measured by multiple correlated intermediate phenotypes. Second, jointly analyzing multiple phenotypes may increase statistical power for detecting genetic variants associated with complex diseases. Third, there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. In this paper, we develop a clustering linear combination (CLC) method to jointly analyze multiple phenotypes for GWASs. In the CLC method, we first cluster individual statistics into positively correlated clusters, then combine the individual statistics linearly within each cluster and combine the between-cluster terms in a quadratic form. CLC is not only robust to different signs of the means of individual statistics, but also reduces the degrees of freedom of the test statistic. We also theoretically prove that if we can cluster the individual statistics correctly, CLC is the most powerful test among all tests with certain quadratic forms. Our simulation results show that CLC is either the most powerful test or has similar power to the most powerful test among the tests we compared, and CLC is much more powerful than other tests when effect sizes align with inferred clusters. We also evaluate the performance of CLC through a real case study. AVAILABILITY AND IMPLEMENTATION R code for implementing our method is available at http://www.math.mtu.edu/∼shuzhang/software.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Xiao Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
16
Li X, Zhang S, Sha Q. Joint analysis of multiple phenotypes using a clustering linear combination method based on hierarchical clustering. Genet Epidemiol 2020; 44:67-78. [PMID: 31541490 PMCID: PMC7480017 DOI: 10.1002/gepi.22263] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/27/2019] [Revised: 07/19/2019] [Accepted: 08/28/2019] [Indexed: 12/24/2022]
Abstract
Emerging evidence suggests that a genetic variant can affect multiple phenotypes, especially in complex human diseases. Therefore, joint analysis of multiple phenotypes may offer new insights into disease etiology. Recently, many statistical methods have been developed for joint analysis of multiple phenotypes, including the clustering linear combination (CLC) method. Because the number of clusters for a given dataset is unknown, a simulation procedure must be used to evaluate the p-value of the final CLC test statistic. This makes the CLC method computationally demanding. In this paper, we use a stopping criterion to determine the number of clusters in the CLC method. We name our method hierarchical clustering CLC (HCLC). The HCLC test statistic has an asymptotic distribution, which makes the method computationally efficient and applicable to genome-wide association studies. Extensive simulations together with the COPDGene data analysis have been used to assess the type I error rates and power of our proposed method. Our simulation results demonstrate that the type I error rates of HCLC are effectively controlled in different realistic settings. HCLC either outperforms all other methods or has statistical power very close to that of the most powerful method with which it has been compared.
Affiliation(s)
- Xueling Li
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
17
Dapas M, Sisk R, Legro RS, Urbanek M, Dunaif A, Hayes MG. Family-Based Quantitative Trait Meta-Analysis Implicates Rare Noncoding Variants in DENND1A in Polycystic Ovary Syndrome. J Clin Endocrinol Metab 2019; 104:3835-3850. [PMID: 31038695 PMCID: PMC6660913 DOI: 10.1210/jc.2018-02496] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 11/19/2018] [Accepted: 04/17/2019] [Indexed: 02/07/2023]
Abstract
CONTEXT Polycystic ovary syndrome (PCOS) is among the most common endocrine disorders of premenopausal women, affecting 5% to 15% of this population depending on the diagnostic criteria applied. It is characterized by hyperandrogenism, ovulatory dysfunction, and polycystic ovarian morphology. PCOS is highly heritable, but only a small proportion of this heritability can be accounted for by the common genetic susceptibility variants identified to date. OBJECTIVE The objective of this study was to test whether rare genetic variants contribute to PCOS pathogenesis. DESIGN, PATIENTS, AND METHODS We performed whole-genome sequencing on DNA from 261 individuals from 62 families with one or more daughters with PCOS. We tested for associations of rare variants with PCOS and its concomitant hormonal traits using a quantitative trait meta-analysis. RESULTS We found rare variants in DENND1A (P = 5.31 × 10⁻⁵, adjusted P = 0.039) that were significantly associated with reproductive and metabolic traits in PCOS families. CONCLUSIONS Common variants in DENND1A have previously been associated with PCOS diagnosis in genome-wide association studies. Subsequent studies indicated that DENND1A is an important regulator of human ovarian androgen biosynthesis. Our findings provide additional evidence that DENND1A plays a central role in PCOS and suggest that rare noncoding variants contribute to disease pathogenesis.
Affiliation(s)
- Matthew Dapas
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Ryan Sisk
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Richard S Legro
- Department of Obstetrics and Gynecology, Penn State College of Medicine, Hershey, Pennsylvania
- Margrit Urbanek
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Reproductive Science, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Andrea Dunaif
- Division of Endocrinology, Diabetes, and Bone Disease, Icahn School of Medicine at Mount Sinai, New York, New York
- M Geoffrey Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Department of Anthropology, Northwestern University, Evanston, Illinois
18
Xia Y, Cai TT, Li H. Joint testing and false discovery rate control in high-dimensional multivariate regression. Biometrika 2019; 105:249-269. [PMID: 30799872 DOI: 10.1093/biomet/asx085] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/02/2017] [Indexed: 01/15/2023]
Abstract
Multivariate regression with high-dimensional covariates has many applications in genomic and genetic research, in which some covariates are expected to be associated with multiple responses. This paper considers joint testing for regression coefficients over multiple responses and develops simultaneous testing methods with false discovery rate control. The test statistic is based on inverse regression and bias-corrected group lasso estimates of the regression coefficients and is shown to have an asymptotic chi-squared null distribution. A row-wise multiple testing procedure is developed to identify the covariates associated with the responses. The procedure is shown to control the false discovery proportion and false discovery rate at a prespecified level asymptotically. Simulations demonstrate the gain in power, relative to entrywise testing, in detecting the covariates associated with the responses. The test is applied to an ovarian cancer dataset to identify the microRNA regulators that regulate protein expression.
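The abstract describes a row-wise multiple testing procedure with false discovery rate control. The paper's own procedure is built on its asymptotic chi-squared statistics and is not reproduced here. As a generic illustration of what "FDR control at a prespecified level" means, the standard Benjamini-Hochberg step-up procedure (a swapped-in, widely used baseline, valid for independent or positively dependent p-values) can be sketched as follows; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (generic sketch, not the paper's method).

    Returns the sorted indices of hypotheses rejected at FDR level q.
    """
    m = len(p_values)
    # Indices ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Step-up: reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])
```

The step-up rule rejects all hypotheses up to the largest qualifying rank, even if some intermediate p-values exceed their own thresholds.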
Affiliation(s)
- Yin Xia
- Department of Statistics, School of Management, Fudan University, Shanghai, China
- T Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A
- Hongzhe Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A
19
Chen Y, Adrianto I, Ianuzzi MC, Garman L, Montgomery CG, Rybicki BA, Levin AM, Li J. Extended methods for gene-environment-wide interaction scans in studies of admixed individuals with varying degrees of relationships. Genet Epidemiol 2019; 43:414-426. [PMID: 30793815 DOI: 10.1002/gepi.22196] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/07/2018] [Revised: 12/26/2018] [Accepted: 01/24/2019] [Indexed: 11/08/2022]
Abstract
The etiology of many complex diseases involves environmental exposures, inherited genetic predisposition, and interactions between them. Gene-environment-wide interaction studies (GEWIS) provide a means to identify the interactions between genetic variation and environmental exposures that underlie disease risk. However, current GEWIS methods cannot adjust for the potentially complex correlations in studies with varying degrees of relationships (both known and unknown) among individuals in admixed populations. We developed novel generalized estimating equation (GEE) based methods, GEE-adaptive and GEE-joint, to account for phenotypic correlations due to kinship while accounting for covariates, including measures of genome-wide ancestry. In simulation studies of admixed individuals, both methods controlled family-wise error rates, an advantage over the case-only approach. They demonstrated higher power than traditional case-control methods across a wide range of underlying alternative hypotheses, especially where both marginal and interaction effects were present. We applied the proposed methods to conduct a GEWIS of a known sarcoidosis risk factor (insecticide exposure) and risk of sarcoidosis in African Americans and identified two novel loci with suggestive evidence of G × E interaction.
Affiliation(s)
- Yalei Chen
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan
- Center for Bioinformatics, Henry Ford Health System, Detroit, Michigan
- Indra Adrianto
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan
- Center for Bioinformatics, Henry Ford Health System, Detroit, Michigan
- Michael C Ianuzzi
- Department of Internal Medicine, Northwell Staten Island University Hospital, Staten Island, New York, New York
- Lori Garman
- Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, Oklahoma
- Courtney G Montgomery
- Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, Oklahoma
- Benjamin A Rybicki
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan
- Albert M Levin
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan
- Center for Bioinformatics, Henry Ford Health System, Detroit, Michigan
- Jia Li
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan
- Center for Bioinformatics, Henry Ford Health System, Detroit, Michigan
20
Joint Analysis of Multiple Phenotypes in Association Studies based on Cross-Validation Prediction Error. Sci Rep 2019; 9:1073. [PMID: 30705317 PMCID: PMC6355816 DOI: 10.1038/s41598-018-37538-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/05/2018] [Accepted: 11/19/2018] [Indexed: 01/28/2023]
Abstract
In genome-wide association studies (GWAS), joint analysis of multiple phenotypes could have increased statistical power over analyzing each phenotype individually to identify genetic variants that are associated with complex diseases. With this motivation, several statistical methods that jointly analyze multiple phenotypes have been developed, such as O’Brien’s method, Trait-based Association Test that uses Extended Simes procedure (TATES), multivariate analysis of variance (MANOVA), and joint model of multiple phenotypes (MultiPhen). However, the performance of these methods under a wide range of scenarios is not consistent: one test may be powerful in some situations, but not in the others. Thus, one challenge in joint analysis of multiple phenotypes is to construct a test that could maintain good performance across different scenarios. In this article, we develop a novel statistical method to test associations between a genetic variant and Multiple Phenotypes based on cross-validation Prediction Error (MultP-PE). Extensive simulations are conducted to evaluate the type I error rates and to compare the power performance of MultP-PE with various existing methods. The simulation studies show that MultP-PE controls type I error rates very well and has consistently higher power than the tests we compared in all simulation scenarios. We conclude with the recommendation for the use of MultP-PE for its good performance in association studies with multiple phenotypes.
21
Liang X, Sha Q, Zhang S. Joint analysis of multiple phenotypes in association studies using allele-based clustering approach for non-normal distributions. Ann Hum Genet 2018; 82:389-395. [PMID: 29932453 PMCID: PMC6188849 DOI: 10.1111/ahg.12260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/22/2017] [Revised: 03/15/2018] [Accepted: 05/11/2018] [Indexed: 11/29/2022]
Abstract
In the study of complex diseases, several correlated phenotypes are usually measured. There is also increasing evidence showing that testing the association between a single-nucleotide polymorphism (SNP) and multiple dependent phenotypes jointly is often more powerful than analyzing only one phenotype at a time. Therefore, developing statistical methods to test for genetic association with multiple phenotypes has become increasingly important. In this paper, we develop an Allele-based Clustering Approach (ACA) for the joint analysis of multiple non-normal phenotypes in association studies. In ACA, we treat the alleles at a SNP of interest as a dependent variable with two classes, and the correlated phenotypes as predictors used to predict the alleles at that SNP. We perform extensive simulation studies to evaluate the performance of ACA and compare its power with that of the Adaptive Fisher's Combination test, the Trait-based Association Test that uses the Extended Simes procedure, Fisher's Combination test, the standard MANOVA, and the joint model of Multiple Phenotypes. Our simulation studies show that the proposed method has correct type I error rates and is much more powerful than other methods for some non-normal distributions.
Affiliation(s)
- Xiaoyu Liang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
22
Yang JJ, Trucco EM, Buu A. A hybrid method of the sequential Monte Carlo and the Edgeworth expansion for computation of very small p-values in permutation tests. Stat Methods Med Res 2018; 28:2937-2951. [PMID: 30073912 DOI: 10.1177/0962280218791918] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Permutation tests are very useful when parametric assumptions are violated or when the distributions of test statistics are mathematically intractable. Their major advantage is that the procedure is general enough to apply to most test statistics. However, their computational cost becomes impractical in high-dimensional settings such as genome-wide association studies. This study provides a comprehensive review of existing methods that can compute very small p-values efficiently. A common limitation of existing methods is that each can only be applied to a specific test statistic. To fill this gap, we propose a hybrid of the sequential Monte Carlo method and the Edgeworth expansion approximation for a studentized statistic, which is applicable to a variety of test statistics. The simulation results show that the proposed method performs better than competing methods. Furthermore, we demonstrate applications of the proposed method through statistical analysis of genome-wide association study data from the Study of Addiction: Genetics and Environment (SAGE).
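For orientation, the plain Monte Carlo permutation test whose cost motivates this work can be sketched as below for a two-group difference in means. This is not the proposed sequential Monte Carlo/Edgeworth hybrid; the statistic, the `n_perm` default, and the function name are illustrative assumptions. The smallest p-value it can report is 1 / (n_perm + 1), which is why resolving very small p-values this way requires a very large number of permutations.

```python
import random


def permutation_p_value(x, y, n_perm=9999, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means (sketch)."""
    if not x or not y:
        raise ValueError("both groups must be non-empty")
    rng = random.Random(seed)  # seeded for reproducibility
    pooled = list(x) + list(y)
    n_x = len(x)

    def stat(values):
        # Absolute difference in group means under a given labeling.
        a, b = values[:n_x], values[n_x:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(pooled)  # statistic under the original labeling
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled sample
        if stat(pooled) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator counts the observed labeling
    # and keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1)
```

For two clearly separated groups such as `[10, 11, 12, 13, 14]` and `[0, 1, 2, 3, 4]`, only 2 of the 252 distinct splits are as extreme as the observed one, so the estimate lands near 0.008.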
Affiliation(s)
- James J Yang
- School of Nursing, University of Michigan, Ann Arbor, MI, USA
- Elisa M Trucco
- Department of Psychology, Florida International University, Miami, FL, USA
- Anne Buu
- Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, MI, USA
23
Rovadoscki GA, Pertile SFN, Alvarenga AB, Cesar ASM, Pértille F, Petrini J, Franzo V, Soares WVB, Morota G, Spangler ML, Pinto LFB, Carvalho GGP, Lanna DPD, Coutinho LL, Mourão GB. Estimates of genomic heritability and genome-wide association study for fatty acids profile in Santa Inês sheep. BMC Genomics 2018; 19:375. [PMID: 29783944 PMCID: PMC5963081 DOI: 10.1186/s12864-018-4777-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 06/06/2017] [Accepted: 05/10/2018] [Indexed: 12/22/2022]
Abstract
BACKGROUND Despite the health concerns and nutritional importance of fatty acids, there is a relative paucity of studies in the literature that report genetic or genomic parameters, especially for sheep populations. To investigate the genetic architecture of fatty acid composition in sheep, we conducted genome-wide association studies (GWAS) and estimated genomic heritabilities for the fatty acid profile in the Longissimus dorsi muscle of 216 male sheep. RESULTS Genomic heritability estimates for fatty acid content ranged from 0.25 to 0.46, indicating that substantial genetic variation exists for the evaluated traits. Therefore, it is possible to alter fatty acid profiles through selection. Twenty-seven genomic regions of 10 adjacent SNPs associated with fatty acid composition were identified on chromosomes 1, 2, 3, 5, 8, 12, 14, 15, 16, 17, and 18, each explaining ≥0.30% of the additive genetic variance. Twenty-three genes supporting the understanding of genetic mechanisms of fat composition in sheep were identified in these regions, such as DGAT2, TRHDE, TPH2, ME1, C6, C7, UBE3D, PARP14, and MRPS30. CONCLUSIONS Estimating genomic heritabilities and elucidating important genomic regions can contribute to a better understanding of the genetic control of fatty acid deposition and improve selection strategies to enhance meat quality and health attributes.
Affiliation(s)
- G A Rovadoscki
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
- S F N Pertile
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
- A B Alvarenga
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
- A S M Cesar
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
- F Pértille
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
- J Petrini
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
- V Franzo
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
- W V B Soares
- Institute of Zootechny (IZ), Nova Odessa, SP, Brazil
- G Morota
- Department of Animal Science, University of Nebraska, Lincoln, NE, USA
- M L Spangler
- Department of Animal Science, University of Nebraska, Lincoln, NE, USA
- L F B Pinto
- Department of Animal Science, Federal University of Bahia (UFBA), Salvador, BA, Brazil
- G G P Carvalho
- Department of Animal Science, Federal University of Bahia (UFBA), Salvador, BA, Brazil
- D P D Lanna
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
- L L Coutinho
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
- G B Mourão
- Department of Animal Science, University of São Paulo (USP) / Luiz de Queiroz College of Agriculture (ESALQ), Av. Pádua Dias, 11, ESALQ/USP, Piracicaba, São Paulo, 13418-900, Brazil
24
Liang X, Sha Q, Rho Y, Zhang S. A hierarchical clustering method for dimension reduction in joint analysis of multiple phenotypes. Genet Epidemiol 2018; 42:344-353. [PMID: 29682782 DOI: 10.1002/gepi.22124] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/03/2017] [Revised: 02/01/2018] [Accepted: 02/19/2018] [Indexed: 12/25/2022]
Abstract
Genome-wide association studies (GWAS) have become a very effective research tool for identifying genetic variants underlying various complex diseases. Despite the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex diseases, the association between a genetic variant and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can potentially be more powerful than univariate analysis and can shed new light on the biological mechanisms underlying complex diseases. In this paper, we develop a novel variable reduction method using a hierarchical clustering method (HCM) for joint analysis of multiple phenotypes in association studies. The proposed method involves two steps. The first step reduces dimension by using a representative phenotype for each cluster of phenotypes. In the second step, existing methods are used to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the power of multivariate analysis of variance (MANOVA), the joint model of multiple phenotypes (MultiPhen), and the trait-based association test that uses the Extended Simes procedure (TATES), each with and without HCM. Our simulation studies show that using HCM is more powerful than not using it in most scenarios. We also illustrate the usefulness of HCM by analyzing whole-genome genotyping data from a lung function study.
Affiliation(s)
- Xiaoyu Liang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Yeonwoo Rho
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
25
Aliev F, Salvatore JE, Agrawal A, Almasy L, Chan G, Edenberg HJ, Hesselbrock V, Kuperman S, Meyers J, Dick DM. A Brief Critique of the TATES Procedure. Behav Genet 2018; 48:155-167. [PMID: 29468442 PMCID: PMC6028780 DOI: 10.1007/s10519-018-9890-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2016] [Accepted: 02/07/2018] [Indexed: 10/18/2022]
Abstract
The Trait-based test that uses the Extended Simes procedure (TATES) was developed as a method for conducting multivariate GWAS for correlated phenotypes whose underlying genetic architecture is complex. In this paper, we provide a brief methodological critique of the TATES method using simulated examples and a mathematical proof. Our simulated examples using correlated phenotypes show that the Type I error rate is higher than expected, and that more TATES p values fall outside of the confidence interval relative to expectation. Thus the method may result in systematic inflation when used with correlated phenotypes. In a mathematical proof we further demonstrate that the distribution of TATES p values deviates from expectation in a manner indicative of inflation. Our findings indicate the need for caution when using TATES for multivariate GWAS of correlated phenotypes.
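TATES builds on the Simes procedure. The basic Simes combined p-value for m tests is the minimum over i of m * p_(i) / i, where p_(i) is the i-th smallest p-value. TATES replaces m and i with effective numbers of independent tests derived from the phenotype correlation matrix, which this sketch does not attempt; it assumes independent tests, and the function name is hypothetical.

```python
def simes_p_value(p_values):
    """Basic Simes combined p-value (sketch; no effective-number correction)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    m = len(p_values)
    ranked = sorted(p_values)  # p_(1) <= p_(2) <= ... <= p_(m)
    # Scale each ordered p-value by m / rank and take the minimum.
    return min(m * p / i for i, p in enumerate(ranked, start=1))
```

For p-values 0.01, 0.03 and 0.04 the scaled values are 0.03, 0.045 and 0.04, so the Simes p-value is 0.03. The critique above concerns what happens when the effective-number substitution is applied to correlated phenotypes.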
Affiliation(s)
- Fazil Aliev
- Department of Psychology, Virginia Commonwealth University, Richmond, VA, USA
- Department of Actuarial and Risk Management, Karabuk University, Karabuk, Turkey
- Jessica E Salvatore
- Department of Psychology, Virginia Commonwealth University, Richmond, VA, USA
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
- Arpana Agrawal
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
- Laura Almasy
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, TX, USA
- Grace Chan
- Department of Psychiatry, University of Connecticut Health Center, Farmington, CT, USA
- Howard J Edenberg
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, USA
- Victor Hesselbrock
- Department of Psychiatry, University of Connecticut Health Center, Farmington, CT, USA
- Samuel Kuperman
- Division of Child Psychiatry, University of Iowa Hospitals, Iowa City, IA, USA
- Jacquelyn Meyers
- Department of Psychiatry, State University of New York Downstate Medical Center, New York, NY, USA
- Danielle M Dick
- Department of Psychology, Virginia Commonwealth University, Richmond, VA, USA
- Department of Human & Molecular Genetics, Virginia Commonwealth University, Richmond, VA, USA
| |
Collapse
|
26
|
Zhu H, Zhang S, Sha Q. A novel method to test associations between a weighted combination of phenotypes and genetic variants. PLoS One 2018; 13:e0190788. [PMID: 29329304 PMCID: PMC5766098 DOI: 10.1371/journal.pone.0190788] [Received: 06/18/2017] [Accepted: 12/20/2017] [Indexed: 11/18/2022]
Abstract
Many complex diseases, such as diabetes, hypertension, and metabolic syndrome, are measured by multiple correlated phenotypes. However, most genome-wide association studies (GWAS) focus on one phenotype of interest or study multiple phenotypes separately to identify genetic variants associated with complex diseases. Analyzing one phenotype, or analyzing related phenotypes separately, may lose power because it ignores information that is available only when phenotypes are combined, such as the correlation between them. To increase statistical power to detect genetic variants associated with complex diseases, we develop a novel method to test a weighted combination of multiple phenotypes (WCmulP). We perform extensive simulation studies as well as a real data (COPDGene) analysis to evaluate the performance of the proposed method. Our simulation results show that WCmulP has correct type I error rates and is either the most powerful test or comparable to the most powerful test among the methods we compared. WCmulP also shows outstanding performance in identifying single-nucleotide polymorphisms (SNPs) associated with COPD-related phenotypes.
Affiliation(s)
- Huanhuan Zhu
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
27
Lin N, Zhu Y, Fan R, Xiong M. A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data. PLoS Comput Biol 2017; 13:e1005788. [PMID: 29040274 PMCID: PMC5659802 DOI: 10.1371/journal.pcbi.1005788] [Received: 10/24/2016] [Revised: 10/27/2017] [Accepted: 09/21/2017] [Indexed: 01/12/2023]
Abstract
Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information for a deeper understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (the number of phenotypes and genetic variants jointly analyzed at the same time) and depth (the hierarchical structure of phenotypes and genotypes). A key issue in high-dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high-dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new statistical method for association analysis, quadratically regularized functional CCA (QRFCCA), which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that QRFCCA has much higher power than ten competing statistics while maintaining appropriate type I error rates. To further evaluate performance, QRFCCA and the ten other statistics are applied to the whole-genome sequencing dataset from the TwinsUK study. Using QRFCCA, we identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits. The results show that QRFCCA substantially outperforms the ten other statistics.
Affiliation(s)
- Nan Lin
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
- Yun Zhu
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, United States of America
- Ruzong Fan
- Biostatistics and Bioinformatics Branch (BBB), Division of Intramural Population Health Research (DIPHR), Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, MD, United States of America
- Momiao Xiong
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
28
Yang JJ, Williams LK, Buu A. Identifying pleiotropic genes in genome-wide association studies from related subjects using the linear mixed model and Fisher combination function. BMC Bioinformatics 2017; 18:376. [PMID: 28836938 PMCID: PMC5571642 DOI: 10.1186/s12859-017-1791-9] [Received: 12/29/2016] [Accepted: 08/15/2017] [Indexed: 11/11/2022]
Abstract
Background A multivariate genome-wide association test is proposed for analyzing data on multivariate quantitative phenotypes collected from related subjects. The proposed method is a two-step approach. The first step models the association between the genotype and marginal phenotype using a linear mixed model. The second step uses the correlation between residuals of the linear mixed model to estimate the null distribution of the Fisher combination test statistic. Results The simulation results show that the proposed method controls the type I error rate and is more powerful than the marginal tests across different population structures (admixed or non-admixed) and relatedness (related or independent). The statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that applying the multivariate association test may facilitate identification of the pleiotropic genes contributing to the risk for alcohol dependence commonly expressed by four correlated phenotypes. Conclusions This study proposes a multivariate method for identifying pleiotropic genes while adjusting for cryptic relatedness and population structure between subjects. The two-step approach is not only powerful but also computationally efficient even when the number of subjects and the number of phenotypes are both very large.
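The second step above calibrates Fisher's combination statistic with a correlation matrix instead of assuming independent tests. A simple Monte Carlo stand-in can illustrate that idea. This sketch assumes that under the null the per-phenotype test statistics are standard normal with a correlation matrix `corr` supplied as an input, standing in for the residual correlation from the first step. It also assumes each p-value is two-sided. It simulates the null distribution directly. It does not reproduce the paper's own null-estimation procedure or its linear mixed model step, and all function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L^T == a, for a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def fisher_stat(pvals):
    """Fisher's combination statistic T = -2 * sum(log p)."""
    return -2.0 * sum(math.log(p) for p in pvals)


def fisher_mc_pvalue(pvals, corr, n_sim=5000, seed=1):
    """Empirical p-value of Fisher's T when null z-scores share correlation `corr`.

    Assumption: null z-scores are multivariate normal N(0, corr) and each
    p-value is two-sided, p = erfc(|z| / sqrt(2)).
    """
    rng = random.Random(seed)
    low = cholesky(corr)
    m = len(pvals)
    t_obs = fisher_stat(pvals)
    exceed = 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Correlated draw: z = L e, so Cov(z) = L L^T = corr.
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
        t_sim = fisher_stat([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
        exceed += t_sim >= t_obs
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)
```

Simulating from the assumed correlation keeps the sketch short. The cost is that its accuracy is limited by `n_sim`, which is exactly the computational burden that analytic or numerical null approximations aim to avoid.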
Affiliation(s)
- James J Yang
- School of Nursing, University of Michigan, Ann Arbor, 48104, Michigan, USA.
- L Keoki Williams
- Department of Internal Medicine, Henry Ford Health System, Detroit, 48202, Michigan, USA.
- The Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, 48202, Michigan, USA
- Anne Buu
- Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, 48104, Michigan, USA
29
Nine differentially expressed genes from a post mortem study and their association with suicidal status in a sample of suicide completers, attempters and controls. J Psychiatr Res 2017; 91:98-104. [PMID: 28327445 DOI: 10.1016/j.jpsychires.2017.03.009] [Received: 11/29/2016] [Revised: 03/01/2017] [Accepted: 03/09/2017] [Indexed: 01/28/2023]
Abstract
Several lines of evidence indicate that suicidal behaviour is partly heritable, with multiple genes implicated in its aetiology. We focused on nine genes (S100A13, EFEMP1, PCDHB5, PDGFRB, CDCA7L, SCN2B, PTPRR, MLC1 and ZFP36) that we previously detected as differentially expressed in the cortex of suicide victims compared with controls. We investigated 84 variants within these genes in 495 suicidal subjects (299 completers and 196 attempters) and 1513 controls (109 post-mortem and 1404 healthy). We evaluated associations with: 1) suicidal phenotype; 2) possible endophenotypes for suicidal behaviour. Overall, the positive results did not survive the multiple-testing correction threshold. However, we found a nominally different distribution of EFEMP1 genotypes, alleles and haplotypes between suicidal subjects and controls, results that were partially replicated when we separately considered the subgroup of suicide completers and post-mortem controls. A weaker association also emerged for PTPRR. Both EFEMP1 and PTPRR genes were also related to possible endophenotypes for suicidal behaviour such as anger, depression-anxiety and fatigue. Because of the large number of analyses performed and the low significance values, further replication is mandatory. Nevertheless, neurotrophic gene variants, in particular EFEMP1 and PTPRR, may have a role in the pathogenesis of suicidal behaviour.
30
Identifying Pleiotropic Genes in Genome-Wide Association Studies for Multivariate Phenotypes with Mixed Measurement Scales. PLoS One 2017; 12:e0169893. [PMID: 28081206 PMCID: PMC5231271 DOI: 10.1371/journal.pone.0169893] [Received: 08/12/2016] [Accepted: 12/22/2016] [Indexed: 11/30/2022]
Abstract
We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of Fisher's combination statistic under the null hypothesis can be estimated efficiently. The simulation study shows that our proposed correlation estimation methods are highly accurate. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains power very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches (dichotomizing all observed phenotypes or treating them as continuous variables) could either reduce power or employ a linear regression model unsuited to the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power to identify markers that might not otherwise be selected by marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies.
31
Abstract
For over a decade, genome-wide association studies (GWAS) have been a major tool for detecting genetic variants underlying complex traits. Recent studies have demonstrated that the same variant or gene can be associated with multiple traits; such associations are termed cross-phenotype (CP) associations. CP association analysis can improve statistical power by searching for variants that contribute to multiple traits, which is often relevant to pleiotropy. In this chapter, we discuss existing statistical methods for analyzing the association between a single marker and multivariate phenotypes, introduce a general approach, CPASSOC, for detecting CP associations, and explain how to conduct the analysis in practice.
Affiliation(s)
- Xiaoyin Li
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA.
- Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
32
An Adaptive Fisher's Combination Method for Joint Analysis of Multiple Phenotypes in Association Studies. Sci Rep 2016; 6:34323. [PMID: 27694844 PMCID: PMC5046106 DOI: 10.1038/srep34323] [Received: 05/26/2016] [Accepted: 09/12/2016] [Indexed: 12/22/2022]
Abstract
Currently, most genome-wide association studies (GWAS) analyze a single phenotype. There is increasing evidence that pleiotropy is a widespread phenomenon in complex diseases. Therefore, analyzing only a single phenotype may lose statistical power to identify the underlying genetic mechanism. There is an increasing need to develop and apply powerful statistical tests to detect association between multiple phenotypes and a genetic variant. In this paper, we develop an Adaptive Fisher's Combination (AFC) method for joint analysis of multiple phenotypes in association studies. The AFC method combines the p-values obtained from standard univariate GWAS, using an optimal number of p-values that is determined from the data. We perform extensive simulations to evaluate the performance of the AFC method and compare its power with that of TATES, Tippett's method, Fisher's combination test, MANOVA, MultiPhen, and SUMSCORE. Our simulation studies show that the proposed method has correct type I error rates and is either the most powerful test or comparable with the most powerful test. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.
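The adaptive step described above combines only the k smallest p-values and lets the data choose k. It can be sketched in Python as a Monte Carlo procedure under the assumption of independent phenotypes, with null p-values simulated as Uniform(0, 1) draws. For each k, S_k = -Σ ln p_(i) over the k smallest p-values, and the statistic is the smallest per-k tail probability. The published method's own null calibration (for example for correlated phenotypes) is not reproduced here, and the function names are hypothetical.

```python
import bisect
import math
import random


def partial_fisher_stats(pvals):
    """Return [S_1, ..., S_m] with S_k = -sum(log p) over the k smallest p-values."""
    stats, total = [], 0.0
    for p in sorted(pvals):
        total -= math.log(p)
        stats.append(total)
    return stats


def adaptive_fisher_pvalue(pvals, n_sim=2000, seed=0):
    """Monte Carlo p-value of the minimum over k of the tail probability of S_k.

    Assumption: under the null the m p-values are independent Uniform(0, 1).
    """
    rng = random.Random(seed)
    m = len(pvals)
    # 1.0 - random() lies in (0, 1], so log() never receives zero.
    null = [partial_fisher_stats([1.0 - rng.random() for _ in range(m)])
            for _ in range(n_sim)]
    # Sorted null values of each S_k, for fast tail counts via bisection.
    columns = [sorted(row[k] for row in null) for k in range(m)]

    def tail_count(k, s):
        # Number of null replicates whose S_k is at least s.
        return n_sim - bisect.bisect_left(columns[k], s)

    # Observed adaptive statistic: the smallest per-k tail probability.
    obs = partial_fisher_stats(pvals)
    obs_min = min((1 + tail_count(k, obs[k])) / (n_sim + 1) for k in range(m))
    # Null distribution of the same statistic (each replicate counts itself).
    null_min = [min(tail_count(k, row[k]) / n_sim for k in range(m)) for row in null]
    # Overall p-value: share of null replicates at least as extreme.
    return (1 + sum(x <= obs_min for x in null_min)) / (n_sim + 1)
```

The same null replicates are reused for both levels (the per-k tail probabilities and the distribution of their minimum). This avoids a nested simulation, at the price of a small dependence between the two levels.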
33
Buu A, Williams LK, Yang JJ. An efficient genome-wide association test for mixed binary and continuous phenotypes with applications to substance abuse research. Stat Methods Med Res 2016; 27:905-919. [PMID: 27215414 DOI: 10.1177/0962280216647422] [Indexed: 11/16/2022]
Abstract
We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the proposed method is much more computationally efficient than the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis on the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power to identify markers that might not otherwise be selected by marginal tests.
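The baseline behind these methods is the textbook case where the k p-values are independent. There, Fisher's combination statistic T = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, and the upper tail has the closed form exp(-T/2) Σ_{i=0}^{k-1} (T/2)^i / i!. The short Python sketch below computes that p-value; the function name is hypothetical. With correlated phenotypes, the case the abstract addresses, this reference distribution no longer holds, which is why an empirical null distribution is estimated instead.

```python
import math


def fisher_independent_pvalue(pvals):
    """Fisher's method for independent p-values.

    T = -2 * sum(log p) ~ chi-square with 2k degrees of freedom under the null,
    and P(chi2_{2k} >= T) = exp(-T/2) * sum_{i=0}^{k-1} (T/2)**i / i!.
    """
    k = len(pvals)
    half_t = -sum(math.log(p) for p in pvals)  # this is T / 2
    term, total = 1.0, 0.0
    for i in range(k):
        if i > 0:
            term *= half_t / i  # (T/2)^i / i!, updated incrementally
        total += term
    return math.exp(-half_t) * total
```

For a single p-value the formula returns that p-value unchanged. For two p-values with product q it reduces to q(1 - ln q).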
Affiliation(s)
- Anne Buu
- Department of Health Behavior and Biological Sciences, University of Michigan, USA
- L Keoki Williams
- Department of Internal Medicine, Henry Ford Health System, USA
- The Center for Health Policy and Health Services Research, Henry Ford Health System, USA
- James J Yang
- School of Nursing, University of Michigan, USA