1
|
Cao X, Zhang S, Sha Q. A novel method for multiple phenotype association studies based on genotype and phenotype network. PLoS Genet 2024; 20:e1011245. [PMID: 38728360 PMCID: PMC11111089 DOI: 10.1371/journal.pgen.1011245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2023] [Revised: 05/22/2024] [Accepted: 03/29/2024] [Indexed: 05/12/2024] Open
Abstract
Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes provides a new insight to analyze multiple phenotypes, which can explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed by a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple phenotype association tests based on network modules detected by GPN are much more powerful than those without considering network modules. The newly proposed GPN provides a new insight to investigate the genetic architecture among different types of phenotypes. Multiple phenotypes association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy.
Collapse
Affiliation(s)
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| |
Collapse
|
2
|
Buchardt AS, Zhou X, Ekstrøm CT. Joint regression analysis of multiple traits based on genetic relationships. BIOINFORMATICS ADVANCES 2024; 4:vbad192. [PMID: 38264461 PMCID: PMC10805347 DOI: 10.1093/bioadv/vbad192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/11/2023] [Revised: 12/11/2023] [Accepted: 12/31/2023] [Indexed: 01/25/2024]
Abstract
Motivation Polygenic scores (PGSs) are widely available and employed in genomic data analyses for predicting and understanding genetic architectures. Existing approaches either require information on SNP level, do not infer clusters of traits sharing genetic characteristic, or do not have any immediate predictive properties. Results Here, we present geneJAM, which is a novel clustering and estimation method using PGSs for inferring a genetic relationship among multiple, simultaneously measured and potentially correlated traits in a multivariate GWAS.Using graphical lasso, we estimate a sparse covariance matrix of the PGSs and obtain clusters of traits sharing genetic characteristics. We use the clusters to specify the structure of the error covariance matrix of a generalized least squares (GLS) model and use the feasible GLS estimator for estimating a linear regression model with a certain unknown degree of correlation between the residuals.The method suits many biology studies well with traits embedded in some genetic functioning groups and facilitates development of the PGS research. We compare the method with fully parametric techniques on simulated data and illustrate the utility of the methods by examining a heterogeneous stock mouse data set from the Wellcome Trust Centre for Human Genetics. We demonstrate that the method successfully identifies clusters of traits and increases precision, power, and computational efficiency. Availability and implementation GeneJAM is implemented in R and available at: https://github.com/abuchardt/geneJAM.
Collapse
Affiliation(s)
- Ann-Sophie Buchardt
- Department of Public Health, University of Copenhagen, 1014 Copenhagen, Denmark
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, United States
| | - Claus Thorn Ekstrøm
- Department of Public Health, University of Copenhagen, 1014 Copenhagen, Denmark
| |
Collapse
|
3
|
Li X, Feng X, Liu X. Heritability estimation for a linear combination of phenotypes via ridge regression. Bioinformatics 2022; 38:4687-4696. [PMID: 36053166 DOI: 10.1093/bioinformatics/btac587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Revised: 07/23/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The joint analysis of multiple phenotypes is important in many biological studies, such as plant and animal breeding. The heritability estimation for a linear combination of phenotypes is designed to account for correlation information. Existing methods for estimating heritability mainly focus on single phenotypes under random-effect models. These methods also require some stringent conditions, which calls for a more flexible and interpretable method for estimating heritability. Fixed-effect models emerge as a useful alternative. RESULTS In this paper, we propose a novel heritability estimator based on multivariate ridge regression for linear combinations of phenotypes, yielding accurate estimates in both sparse and dense cases. Under mild conditions in the high-dimensional setting, the proposed estimator appears to be consistent and asymptotically normally distributed. Simulation studies show that the proposed estimator is promising under different scenarios. Compared with independently combined heritability estimates in the case of multiple phenotypes, the proposed method significantly improves the performance by considering correlations among those phenotypes. We further demonstrate its application in heritability estimation and correlation analysis for the Oryza sativa rice dataset. AVAILABILITY AND IMPLEMENTATION An R package implementing the proposed method is available at https://github.com/xg-SUFE1/MultiRidgeVar, where covariance estimates are also given together with heritability estimates. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Xiaoguang Li
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, 200433, China
| | - Xingdong Feng
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, 200433, China
| | - Xu Liu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, 200433, China
| |
Collapse
|
4
|
Wang M, Zhang S, Sha Q. A computationally efficient clustering linear combination approach to jointly analyze multiple phenotypes for GWAS. PLoS One 2022; 17:e0260911. [PMID: 35482827 PMCID: PMC9049312 DOI: 10.1371/journal.pone.0260911] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2021] [Accepted: 04/13/2022] [Indexed: 11/18/2022] Open
Abstract
There has been an increasing interest in joint analysis of multiple phenotypes in genome-wide association studies (GWAS) because jointly analyzing multiple phenotypes may increase statistical power to detect genetic variants associated with complex diseases or traits. Recently, many statistical methods have been developed for joint analysis of multiple phenotypes in genetic association studies, including the Clustering Linear Combination (CLC) method. The CLC method works particularly well with phenotypes that have natural groupings, but due to the unknown number of clusters for a given data, the final test statistic of CLC method is the minimum p-value among all p-values of the CLC test statistics obtained from each possible number of clusters. Therefore, a simulation procedure needs to be used to evaluate the p-value of the final test statistic. This makes the CLC method computationally demanding. We develop a new method called computationally efficient CLC (ceCLC) to test the association between multiple phenotypes and a genetic variant. Instead of using the minimum p-value as the test statistic in the CLC method, ceCLC uses the Cauchy combination test to combine all p-values of the CLC test statistics obtained from each possible number of clusters. The test statistic of ceCLC approximately follows a standard Cauchy distribution, so the p-value can be obtained from the cumulative density function without the need for the simulation procedure. Through extensive simulation studies and application on the COPDGene data, the results demonstrate that the type I error rates of ceCLC are effectively controlled in different simulation settings and ceCLC either outperforms all other methods or has statistical power that is very close to the most powerful method with which it has been compared.
Collapse
Affiliation(s)
- Meida Wang
- Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
| | - Shuanglin Zhang
- Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
| | - Qiuying Sha
- Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
| |
Collapse
|
5
|
Fu L, Wang Y, Li T, Yang S, Hu YQ. A Novel Hierarchical Clustering Approach for Joint Analysis of Multiple Phenotypes Uncovers Obesity Variants Based on ARIC. Front Genet 2022; 13:791920. [PMID: 35391794 PMCID: PMC8981031 DOI: 10.3389/fgene.2022.791920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2021] [Accepted: 01/27/2022] [Indexed: 12/02/2022] Open
Abstract
Genome-wide association studies (GWASs) have successfully discovered numerous variants underlying various diseases. Generally, one-phenotype one-variant association study in GWASs is not efficient in identifying variants with weak effects, indicating that more signals have not been identified yet. Nowadays, jointly analyzing multiple phenotypes has been recognized as an important approach to elevate the statistical power for identifying weak genetic variants on complex diseases, shedding new light on potential biological mechanisms. Therefore, hierarchical clustering based on different methods for calculating correlation coefficients (HCDC) is developed to synchronously analyze multiple phenotypes in association studies. There are two steps involved in HCDC. First, a clustering approach based on the similarity matrix between two groups of phenotypes is applied to choose a representative phenotype in each cluster. Then, we use existing methods to estimate the genetic associations with the representative phenotypes rather than the individual phenotypes in every cluster. A variety of simulations are conducted to demonstrate the capacity of HCDC for boosting power. As a consequence, existing methods embedding HCDC are either more powerful or comparable with those of without embedding HCDC in most scenarios. Additionally, the application of obesity-related phenotypes from Atherosclerosis Risk in Communities via existing methods with HCDC uncovered several associated variants. Among these, UQCC1-rs1570004 is reported as a significant obesity signal for the first time, whose differential expression in subcutaneous fat, visceral fat, and muscle tissue is worthy of further functional studies.
Collapse
Affiliation(s)
- Liwan Fu
- Center for Non-communicable Disease Management, National Center for Children's Health, Beijing Children's Hospital, Capital Medical University, Beijing, China.,State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
| | - Yuquan Wang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
| | - Tingting Li
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
| | - Siqian Yang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
| | - Yue-Qing Hu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China.,Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
| |
Collapse
|
6
|
Julienne H, Laville V, McCaw ZR, He Z, Guillemot V, Lasry C, Ziyatdinov A, Nerin C, Vaysse A, Lechat P, Ménager H, Le Goff W, Dube MP, Kraft P, Ionita-Laza I, Vilhjálmsson BJ, Aschard H. Multitrait GWAS to connect disease variants and biological mechanisms. PLoS Genet 2021; 17:e1009713. [PMID: 34460823 PMCID: PMC8437297 DOI: 10.1371/journal.pgen.1009713] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2021] [Revised: 09/13/2021] [Accepted: 07/12/2021] [Indexed: 12/30/2022] Open
Abstract
Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.
Collapse
Affiliation(s)
- Hanna Julienne
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Vincent Laville
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Zachary R. McCaw
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California, United States of America
| | - Vincent Guillemot
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Carla Lasry
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Andrey Ziyatdinov
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Cyril Nerin
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Amaury Vaysse
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Pierre Lechat
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Hervé Ménager
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Wilfried Le Goff
- Sorbonne Université, INSERM, Institute of Cardiometabolism and Nutrition (ICAN), UMR_S 1166, Paris, France
| | - Marie-Pierre Dube
- Université de Montréal Beaulieu-Saucier Pharmacogenomics Centre, Montreal Heart Institute, Montreal, Canada
- Université de Montréal, Faculty of Medicine, Department of medicine, Université de Montréal, Montreal, Canada
| | - Peter Kraft
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Iuliana Ionita-Laza
- Department of Biostatistics, Columbia University, New York, New York, United States of America
| | - Bjarni J. Vilhjálmsson
- National Centre for Register-based Research, Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Paris, France
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
| |
Collapse
|
7
|
Fu L, Wang Y, Li T, Hu YQ. A Novel Approach Integrating Hierarchical Clustering and Weighted Combination for Association Study of Multiple Phenotypes and a Genetic Variant. Front Genet 2021; 12:654804. [PMID: 34220938 PMCID: PMC8249926 DOI: 10.3389/fgene.2021.654804] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2021] [Accepted: 04/20/2021] [Indexed: 11/26/2022] Open
Abstract
As a pivotal research tool, genome-wide association study has successfully identified numerous genetic variants underlying distinct diseases. However, these identified genetic variants only explain a small proportion of the phenotypic variation for certain diseases, suggesting that there are still more genetic signals to be detected. One of the reasons may be that one-phenotype one-variant association study is not so efficient in detecting variants of weak effects. Nowadays, it is increasingly worth noting that joint analysis of multiple phenotypes may boost the statistical power to detect pathogenic variants with weak genetic effects on complex diseases, providing more clues for their underlying biology mechanisms. So a Weighted Combination of multiple phenotypes following Hierarchical Clustering method (WCHC) is proposed for simultaneously analyzing multiple phenotypes in association studies. A series of simulations are conducted, and the results show that WCHC is either the most powerful method or comparable with the most powerful competitor in most of the simulation scenarios. Additionally, we evaluated the performance of WCHC in its application to the obesity-related phenotypes from Atherosclerosis Risk in Communities, and several associated variants are reported.
Collapse
Affiliation(s)
- Liwan Fu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China.,Center for Non-communicable Disease Management, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
| | - Yuquan Wang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
| | - Tingting Li
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
| | - Yue-Qing Hu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China.,Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
| |
Collapse
|
8
|
Associating Multivariate Traits with Genetic Variants Using Collapsing and Kernel Methods with Pedigree- or Population-Based Studies. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:8812282. [PMID: 33628328 PMCID: PMC7889379 DOI: 10.1155/2021/8812282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/06/2020] [Revised: 01/02/2021] [Accepted: 01/08/2021] [Indexed: 11/18/2022]
Abstract
In genetic association analysis, several relevant phenotypes or multivariate traits with different types of components are usually collected to study complex or multifactorial diseases. Over the past few years, jointly testing for association between multivariate traits and multiple genetic variants has become more popular because it can increase statistical power to identify causal genes in pedigree- or population-based studies. However, most of the existing methods mainly focus on testing genetic variants associated with multiple continuous phenotypes. In this investigation, we develop a framework for identifying the pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree- or population-structured data. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. The proposed multivariate trait association methods can accommodate continuous phenotypes or binary phenotypes and further can adjust for covariates. Simulation studies show that the performance of our methods is satisfactory with respect to the empirical type I error rates and power rates in comparison with the existing methods.
Collapse
|
9
|
Julienne H, Lechat P, Guillemot V, Lasry C, Yao C, Araud R, Laville V, Vilhjalmsson B, Ménager H, Aschard H. JASS: command line and web interface for the joint analysis of GWAS results. NAR Genom Bioinform 2020; 2:lqaa003. [PMID: 32002517 PMCID: PMC6978790 DOI: 10.1093/nargab/lqaa003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2019] [Revised: 12/03/2019] [Accepted: 01/09/2020] [Indexed: 12/11/2022] Open
Abstract
Genome-wide association study (GWAS) has been the driving force for identifying association between genetic variants and human phenotypes. Thousands of GWAS summary statistics covering a broad range of human traits and diseases are now publicly available. These GWAS have proven their utility for a range of secondary analyses, including in particular the joint analysis of multiple phenotypes to identify new associated genetic variants. However, although several methods have been proposed, there are very few large-scale applications published so far because of challenges in implementing these methods on real data. Here, we present JASS (Joint Analysis of Summary Statistics), a polyvalent Python package that addresses this need. Our package incorporates recently developed joint tests such as the omnibus approach and various weighted sum of Z-score tests while solving all practical and computational barriers for large-scale multivariate analysis of GWAS summary statistics. This includes data cleaning and harmonization tools, an efficient algorithm for fast derivation of joint statistics, an optimized data management process and a web interface for exploration purposes. Both benchmark analyses and real data applications demonstrated the robustness and strong potential of JASS for the detection of new associated genetic variants. Our package is freely available at https://gitlab.pasteur.fr/statistical-genetics/jass.
Collapse
Affiliation(s)
- Hanna Julienne
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Pierre Lechat
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Vincent Guillemot
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Carla Lasry
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Chunzi Yao
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Robinson Araud
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Vincent Laville
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Bjarni Vilhjalmsson
- National Center for Register-Based Research, Aarhus University, DK-8210 Aarhus, Denmark
| | - Hervé Ménager
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Hugues Aschard
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, 02115 Boston, MA, USA
| |
Collapse
|
10
|
Li X, Zhang S, Sha Q. Joint analysis of multiple phenotypes using a clustering linear combination method based on hierarchical clustering. Genet Epidemiol 2020; 44:67-78. [PMID: 31541490 PMCID: PMC7480017 DOI: 10.1002/gepi.22263] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2019] [Revised: 07/19/2019] [Accepted: 08/28/2019] [Indexed: 12/24/2022]
Abstract
Emerging evidence suggests that a genetic variant can affect multiple phenotypes, especially in complex human diseases. Therefore, joint analysis of multiple phenotypes may offer new insights into disease etiology. Recently, many statistical methods have been developed for joint analysis of multiple phenotypes, including the clustering linear combination (CLC) method. Due to the unknown number of clusters for a given data, a simulation procedure must be used to evaluate the p-value of the final test statistic of CLC. This makes the CLC method computationally demanding. In this paper, we use a stopping criterion to determine the number of clusters in the CLC method. We have named our method, hierarchical clustering CLC (HCLC). HCLC has an asymptotic distribution, which is very computationally efficient and makes it applicable for genome-wide association studies. Extensive simulations together with the COPDGene data analysis have been used to assess the type I error rates and power of our proposed method. Our simulation results demonstrate that the type I error rates of HCLC are effectively controlled in different realistic settings. HCLC either outperforms all other methods or has statistical power that is very close to the most powerful method with which it has been compared.
Collapse
Affiliation(s)
- Xueling Li
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| |
Collapse
|
11
|
Sha Q, Wang Z, Zhang X, Zhang S. A clustering linear combination approach to jointly analyze multiple phenotypes for GWAS. Bioinformatics 2019; 35:1373-1379. [PMID: 30239574 PMCID: PMC6477981 DOI: 10.1093/bioinformatics/bty810] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2017] [Revised: 08/29/2018] [Accepted: 09/18/2018] [Indexed: 12/16/2022] Open
Abstract
SUMMARY There is an increasing interest in joint analysis of multiple phenotypes for genome-wide association studies (GWASs) based on the following reasons. First, cohorts usually collect multiple phenotypes and complex diseases are usually measured by multiple correlated intermediate phenotypes. Second, jointly analyzing multiple phenotypes may increase statistical power for detecting genetic variants associated with complex diseases. Third, there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. In this paper, we develop a clustering linear combination (CLC) method to jointly analyze multiple phenotypes for GWASs. In the CLC method, we first cluster individual statistics into positively correlated clusters and then, combine the individual statistics linearly within each cluster and combine the between-cluster terms in a quadratic form. CLC is not only robust to different signs of the means of individual statistics, but also reduce the degrees of freedom of the test statistic. We also theoretically prove that if we can cluster the individual statistics correctly, CLC is the most powerful test among all tests with certain quadratic forms. Our simulation results show that CLC is either the most powerful test or has similar power to the most powerful test among the tests we compared, and CLC is much more powerful than other tests when effect sizes align with inferred clusters. We also evaluate the performance of CLC through a real case study. AVAILABILITY AND IMPLEMENTATION R code for implementing our method is available at http://www.math.mtu.edu/∼shuzhang/software.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
| | - Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
| | - Xiao Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
| |
Collapse
|
12
|
Joint Analysis of Multiple Phenotypes in Association Studies based on Cross-Validation Prediction Error. Sci Rep 2019; 9:1073. [PMID: 30705317 PMCID: PMC6355816 DOI: 10.1038/s41598-018-37538-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Accepted: 11/19/2018] [Indexed: 01/28/2023] Open
Abstract
In genome-wide association studies (GWAS), joint analysis of multiple phenotypes could have increased statistical power over analyzing each phenotype individually to identify genetic variants that are associated with complex diseases. With this motivation, several statistical methods that jointly analyze multiple phenotypes have been developed, such as O’Brien’s method, Trait-based Association Test that uses Extended Simes procedure (TATES), multivariate analysis of variance (MANOVA), and joint model of multiple phenotypes (MultiPhen). However, the performance of these methods under a wide range of scenarios is not consistent: one test may be powerful in some situations, but not in the others. Thus, one challenge in joint analysis of multiple phenotypes is to construct a test that could maintain good performance across different scenarios. In this article, we develop a novel statistical method to test associations between a genetic variant and Multiple Phenotypes based on cross-validation Prediction Error (MultP-PE). Extensive simulations are conducted to evaluate the type I error rates and to compare the power performance of MultP-PE with various existing methods. The simulation studies show that MultP-PE controls type I error rates very well and has consistently higher power than the tests we compared in all simulation scenarios. We conclude with the recommendation for the use of MultP-PE for its good performance in association studies with multiple phenotypes.
Collapse
|
13
|
Wang Z, Sha Q, Fang S, Zhang K, Zhang S. Testing an optimally weighted combination of common and/or rare variants with multiple traits. PLoS One 2018; 13:e0201186. [PMID: 30048520 PMCID: PMC6062080 DOI: 10.1371/journal.pone.0201186] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2018] [Accepted: 07/10/2018] [Indexed: 12/25/2022] Open
Abstract
Recently, joint analysis of multiple traits has become popular because it can increase statistical power to identify genetic variants associated with complex diseases. In addition, there is increasing evidence indicating that pleiotropy is a widespread phenomenon in complex diseases. Currently, most of existing methods test the association between multiple traits and a single genetic variant. However, these methods by analyzing one variant at a time may not be ideal for rare variant association studies because of the allelic heterogeneity as well as the extreme rarity of rare variants. In this article, we developed a statistical method by testing an optimally weighted combination of variants with multiple traits (TOWmuT) to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with the following five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrated the usefulness of TOWmuT by analyzing a whole-genome genotyping data from a lung function study.
Collapse
Affiliation(s)
- Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Shurong Fang
- Department of Mathematics and Computer Science, John Carroll University, University Heights, Ohio, United States of America
| | - Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| |
Collapse
|
14
|
Abstract
OBJECTIVES Glucocorticoids such as dexamethasone have pleiotropic effects, including desired antileukemic, anti-inflammatory, or immunosuppressive effects, and undesired metabolic or toxic effects. The most serious adverse effects of dexamethasone among patients with acute lymphoblastic leukemia are osteonecrosis and thrombosis. To identify inherited genomic variation involved in these severe adverse effects, we carried out genome-wide association studies (GWAS) by analyzing 14 pleiotropic glucocorticoid phenotypes in 391 patients with acute lymphoblastic leukemia. PATIENTS AND METHODS We used the Projection Onto the Most Interesting Statistical Evidence integrative analysis technique to identify genetic variants associated with pleiotropic dexamethasone phenotypes, stratifying for age, sex, race, and treatment, and compared the results with conventional single-phenotype GWAS. The phenotypes were osteonecrosis, central nervous system toxicity, hyperglycemia, hypokalemia, thrombosis, dexamethasone exposure, BMI, growth trajectory, and levels of cortisol, albumin, and asparaginase antibodies, and changes in cholesterol, triglycerides, and low-density lipoproteins after dexamethasone. RESULTS The integrative analysis identified more pleiotropic single nucleotide polymorphism variants (P=1.46×10(-215), and these variants were more likely to be in gene-regulatory regions (P=1.22×10(-6)) than traditional single-phenotype GWAS. The integrative analysis yielded genomic variants (rs2243057 and rs6453253) in F2RL1, a receptor that functions in hemostasis, thrombosis, and inflammation, which were associated with pleiotropic effects, including osteonecrosis and thrombosis, and were in regulatory gene regions. CONCLUSION The integrative pleiotropic analysis identified risk variants for osteonecrosis and thrombosis not identified by single-phenotype analysis that may have importance for patients with underlying sensitivity to multiple dexamethasone adverse effects.
Collapse
|
15
|
Liang X, Sha Q, Rho Y, Zhang S. A hierarchical clustering method for dimension reduction in joint analysis of multiple phenotypes. Genet Epidemiol 2018; 42:344-353. [PMID: 29682782 DOI: 10.1002/gepi.22124] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2017] [Revised: 02/01/2018] [Accepted: 02/19/2018] [Indexed: 12/25/2022]
Abstract
Genome-wide association studies (GWAS) have become a very effective research tool to identify genetic variants of underlying various complex diseases. In spite of the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex disease, in general, the association between genetic variants and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be potentially more powerful than the univariate analysis, and can shed new light on underlying biological mechanisms of complex diseases. In this paper, we develop a novel variable reduction method using hierarchical clustering method (HCM) for joint analysis of multiple phenotypes in association studies. The proposed method involves two steps. The first step applies a dimension reduction technique by using a representative phenotype for each cluster of phenotypes. Then, existing methods are used in the second step to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the powers of multivariate analysis of variance (MANOVA), joint model of multiple phenotypes (MultiPhen), and trait-based association test that uses extended simes procedure (TATES) using HCM with those of without using HCM. Our simulation studies show that using HCM is more powerful than without using HCM in most scenarios. We also illustrate the usefulness of using HCM by analyzing a whole-genome genotyping data from a lung function study.
Collapse
Affiliation(s)
- Xiaoyu Liang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Yeonwoo Rho
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| |
Collapse
|
16
|
Zhu H, Zhang S, Sha Q. A novel method to test associations between a weighted combination of phenotypes and genetic variants. PLoS One 2018; 13:e0190788. [PMID: 29329304 PMCID: PMC5766098 DOI: 10.1371/journal.pone.0190788] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2017] [Accepted: 12/20/2017] [Indexed: 11/18/2022] Open
Abstract
Many complex diseases like diabetes, hypertension, metabolic syndrome, et cetera, are measured by multiple correlated phenotypes. However, most genome-wide association studies (GWAS) focus on one phenotype of interest or study multiple phenotypes separately for identifying genetic variants associated with complex diseases. Analyzing one phenotype or the related phenotypes separately may lose power due to ignoring the information obtained by combining phenotypes, such as the correlation between phenotypes. In order to increase statistical power to detect genetic variants associated with complex diseases, we develop a novel method to test a weighted combination of multiple phenotypes (WCmulP). We perform extensive simulation studies as well as real data (COPDGene) analysis to evaluate the performance of the proposed method. Our simulation results show that WCmulP has correct type I error rates and is either the most powerful test or comparable to the most powerful test among the methods we compared. WCmulP also has an outstanding performance for identifying single-nucleotide polymorphisms (SNPs) associated with COPD-related phenotypes.
Collapse
Affiliation(s)
- Huanhuan Zhu
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- * E-mail:
| |
Collapse
|
17
|
Deng Y, Pan W. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses. Genetics 2017; 207:1285-1299. [PMID: 28971959 PMCID: PMC5714448 DOI: 10.1534/genetics.117.300347] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2017] [Accepted: 09/29/2017] [Indexed: 11/18/2022] Open
Abstract
There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical, and other, reasons, summary statistics of univariate SNP-trait associations are typically only available based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the working independence model for robust inference. We provide numerical examples based on both simulated and real data, including two large lipid GWAS summary association datasets based on ∼100,000 and ∼189,000 samples, respectively, to demonstrate the difference between marginal and conditional analyses, as well as the effectiveness of our new approach.
Collapse
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
| |
Collapse
|
18
|
Deng Y, Pan W. Conditional analysis of multiple quantitative traits based on marginal GWAS summary statistics. Genet Epidemiol 2017; 41:427-436. [PMID: 28464407 PMCID: PMC5536980 DOI: 10.1002/gepi.22046] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2016] [Revised: 01/09/2017] [Accepted: 02/04/2017] [Indexed: 12/22/2022]
Abstract
There has been an increasing interest in joint association testing of multiple traits for possible pleiotropic effects. However, even in the presence of pleiotropy, most of the existing methods cannot distinguish direct and indirect effects of a genetic variant, say single-nucleotide polymorphism (SNP), on multiple traits, and a conditional analysis of a trait adjusting for other traits is perhaps the simplest and most common approach to addressing this question. However, without individual-level genotypic and phenotypic data but with only genome-wide association study (GWAS) summary statistics, as typical with most large-scale GWAS consortium studies, we are not aware of any existing method for such a conditional analysis. We propose such a conditional analysis, offering formulas of necessary calculations to fit a joint linear regression model for multiple quantitative traits. Furthermore, our method can also accommodate conditional analysis on multiple SNPs in addition to on multiple quantitative traits, which is expected to be useful for fine mapping. We provide numerical examples based on both simulated and real GWAS data to demonstrate the effectiveness of our proposed approach, and illustrate possible usefulness of conditional analysis by contrasting its result differences from those of standard marginal analyses.
Collapse
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
| |
Collapse
|
19
|
Kim J, Pan W. Adaptive testing for multiple traits in a proportional odds model with applications to detect SNP-brain network associations. Genet Epidemiol 2017; 41:259-277. [PMID: 28191669 DOI: 10.1002/gepi.22033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2016] [Revised: 10/07/2016] [Accepted: 10/31/2016] [Indexed: 12/15/2022]
Abstract
There has been increasing interest in developing more powerful and flexible statistical tests to detect genetic associations with multiple traits, as arising from neuroimaging genetic studies. Most of existing methods treat a single trait or multiple traits as response while treating an SNP as a predictor coded under an additive inheritance mode. In this paper, we follow an earlier approach in treating an SNP as an ordinal response while treating traits as predictors in a proportional odds model (POM). In this way, it is not only easier to handle mixed types of traits, e.g., some quantitative and some binary, but it is also potentially more robust to the commonly adopted additive inheritance mode. More importantly, we develop an adaptive test in a POM so that it can maintain high power across many possible situations. Compared to the existing methods treating multiple traits as responses, e.g., in a generalized estimating equation (GEE) approach, the proposed method can be applied to a high dimensional setting where the number of phenotypes (p) can be larger than the sample size (n), in addition to a usual small P setting. The promising performance of the proposed method was demonstrated with applications to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, in which either structural MRI driven phenotypes or resting-state functional MRI (rs-fMRI) derived brain functional connectivity measures were used as phenotypes. The applications led to the identification of several top SNPs of biological interest. Furthermore, simulation studies showed competitive performance of the new method, especially for p>n.
Collapse
Affiliation(s)
- Junghi Kim
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | -
- Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http: //adni.loni.usc.edu/wp-content/uploads/how to apply/ADNI Acknowledgement List.pdf
| |
Collapse
|
20
|
Kwak IY, Pan W. Gene- and pathway-based association tests for multiple traits with GWAS summary statistics. Bioinformatics 2017; 33:64-71. [PMID: 27592708 PMCID: PMC5198520 DOI: 10.1093/bioinformatics/btw577] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2016] [Revised: 08/08/2016] [Accepted: 08/29/2016] [Indexed: 11/15/2022] Open
Abstract
To identify novel genetic variants associated with complex traits and to shed new insights on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or P-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods. AVAILABILITY AND IMPLEMENTATION The methods are implemented in R package aSPU, freely and publicly available at: https://cran.r-project.org/web/packages/aSPU/ CONTACT: weip@biostat.umn.eduSupplementary information: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Il-Youp Kwak
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| |
Collapse
|