1
Zhu L, Zhang S, Sha Q. Meta-analysis of set-based multiple phenotype association test based on GWAS summary statistics from different cohorts. Front Genet 2024; 15:1359591. [PMID: 39301532 PMCID: PMC11410627 DOI: 10.3389/fgene.2024.1359591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 08/23/2024] [Indexed: 09/22/2024] Open
Abstract
Genome-wide association studies (GWAS) have emerged as popular tools for identifying genetic variants that are associated with complex diseases. Standard analysis of a GWAS involves assessing the association between each variant and a disease. However, this approach suffers from limited reproducibility and difficulties in detecting multi-variant and pleiotropic effects. Although joint analysis of multiple phenotypes for GWAS can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits, most of the multiple phenotype association tests are designed for a single variant, resulting in much lower power, especially when their effect sizes are small and only their cumulative effect is associated with multiple phenotypes. To overcome these limitations, set-based multiple phenotype association tests have been developed to enhance statistical power and facilitate the identification and interpretation of pleiotropic regions. In this research, we propose a new method, named Meta-TOW-S, which conducts joint association tests between multiple phenotypes and a set of variants (such as variants in a gene) utilizing GWAS summary statistics from different cohorts. Our approach applies the set-based method that Tests for the effect of an Optimal Weighted combination of variants in a gene (TOW) and accounts for sample size differences across GWAS cohorts by employing the Cauchy combination method. Meta-TOW-S combines the advantages of set-based tests and multi-phenotype association tests, exhibiting computational efficiency and enabling analysis across multiple phenotypes while accommodating overlapping samples from different GWAS cohorts. To assess the performance of Meta-TOW-S, we develop a phenotype simulator package that encompasses a comprehensive simulation scheme capable of modeling multiple phenotypes and multiple variants, including noise structures and diverse correlation patterns among phenotypes. 
Simulation studies validate that Meta-TOW-S maintains a desirable Type I error rate. Further simulation under different scenarios shows that Meta-TOW-S can improve power compared with other existing meta-analysis methods. When applied to four psychiatric disorders summary data, Meta-TOW-S detects a greater number of significant genes.
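The Cauchy combination step mentioned in the abstract can be illustrated with a minimal sketch. This is not the Meta-TOW-S implementation: the function name `cauchy_combine` is hypothetical, and weighting each cohort's p-value by its sample size is an assumption made here only for illustration.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p-values (each strictly between 0 and 1) with the Cauchy combination test.

    `weights` are non-negative and are normalised to sum to 1; equal weights
    are used when none are given.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = float(sum(weights))
    weights = [w / total for w in weights]
    # Each p-value is mapped to a standard Cauchy variate; the weighted sum is
    # treated as (approximately) standard Cauchy under the null hypothesis.
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical usage: two cohorts with sample sizes 3000 and 1000 (assumed values).
combined = cauchy_combine([0.001, 0.5], weights=[3000, 1000])
```

A single p-value is returned unchanged, and the combined p-value is dominated by the smallest inputs, which is why the method stays simple to compute even when the underlying tests are correlated.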
Affiliation(s)
- Lirong Zhu
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
2
Guo H, Li T, Shi Y, Wang X. MTML: An Efficient Multitrait Multilocus GWAS Method Based on the Cauchy Combination Test. Biom J 2024; 66:e202300130. [PMID: 39076046 DOI: 10.1002/bimj.202300130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/19/2023] [Accepted: 11/27/2023] [Indexed: 07/31/2024]
Abstract
Genome-wide association studies (GWAS) that measure the joint effect of multiple loci on multiple traits have recently attracted interest, owing to the decreased costs of high-throughput genotyping and phenotyping technologies. Previous studies mainly focused on either multilocus models that identify associations with a single trait or multitrait models that scan a single marker at a time. Because neither type of model fully utilizes the association information, the power of these tests is usually low. To address this problem, we present a multitrait multilocus (MTML) modeling framework that is implemented in three steps: (1) simplify the complex calculation; (2) reduce the model dimension; (3) integrate the joint contribution of single markers to multiple traits by Cauchy combination. The performance of MTML is evaluated and compared with three other published methods by Monte Carlo simulations. Simulation results show that MTML is more powerful for quantitative trait nucleotide detection and is robust to the number of traits. MTML also controls the type I error rate at a reasonable level. Real data analysis of Arabidopsis thaliana shows that MTML identifies more pleiotropic genetic associations. Therefore, we conclude that MTML is an efficient GWAS method for joint analysis of multiple quantitative traits. The R package MTML, which facilitates the implementation of the proposed method, is publicly available on GitHub at https://github.com/Guohongping/MTML.
Affiliation(s)
- Hongping Guo
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, China
- Tong Li
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, China
- Yao Shi
  - School of Mathematics and Statistics, Qingdao University, Qingdao, China
- Xiao Wang
  - School of Mathematics and Statistics, Qingdao University, Qingdao, China
3
Guo H, Li T, Wang Z. Pleiotropic genetic association analysis with multiple phenotypes using multivariate response best-subset selection. BMC Genomics 2023; 24:759. [PMID: 38082214 PMCID: PMC10712198 DOI: 10.1186/s12864-023-09820-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/20/2023] [Indexed: 12/18/2023] Open
Abstract
Genetic pleiotropy refers to the simultaneous association of a gene with multiple phenotypes. It is widely distributed across the whole genome and can help to understand the common genetic mechanisms of diseases or traits. In this study, a pleiotropic association analysis method based on a multivariate response best-subset selection (MRBSS) model is proposed. Unlike the traditional genetic association model, the high-dimensional genotypic data are viewed as response variables and the multiple phenotypic data as predictor variables. Moreover, the response best-subset selection procedure is converted into a 0-1 integer optimization problem by introducing a separation parameter and a tuning parameter. Furthermore, the model parameters are estimated by using the curve search under the modified Bayesian information criterion. Simulation experiments show that the proposed MRBSS method remarkably reduces the computational time, obtains higher statistical power under most of the considered scenarios, and controls the type I error rate at a low level. Application studies on datasets of maize yield traits and pig lipid traits further verify its effectiveness.
Affiliation(s)
- Hongping Guo
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, 435002, People's Republic of China
- Tong Li
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, 435002, People's Republic of China
- Zixuan Wang
  - School of Mathematics and Statistics, South-Central Minzu University, Wuhan, 430074, People's Republic of China
4
Pandey D, Perumal P. O. Improved meta-analysis pipeline ameliorates distinctive gene regulators of diabetic vasculopathy in human endothelial cell (hECs) RNA-Seq data. PLoS One 2023; 18:e0293939. [PMID: 37943808 PMCID: PMC10635490 DOI: 10.1371/journal.pone.0293939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 10/21/2023] [Indexed: 11/12/2023] Open
Abstract
Enormous amounts of gene expression data generated through next-generation sequencing (NGS) technologies are accessible to the scientific community via public repositories. The data in these repositories are foundational for integrative studies, enabling large-scale data analysis whose potential is yet to be fully realized. Prudent integration of individual gene expression data, i.e., RNA-Seq datasets, is challenging because it involves a series of data analysis steps that must be completed before arriving at meaningful biological insights. These insights are latent within the data and are not usually revealed by analysis of individual datasets, owing to the limited number of biological samples in individual studies. Nevertheless, a sensibly designed meta-analysis of selected individual studies would not only maximize the sample size of the analysis but also significantly improve its statistical power, thereby revealing the latent insights. In the present study, a custom-built meta-analysis pipeline is presented for the integration of multiple datasets from different origins. As a case study, we tested the integration of two relevant datasets pertaining to diabetic vasculopathy retrieved from the open-source domain. We report that the meta-analysis revealed distinctive and latent gene regulators of diabetic vasculopathy and uncovered a total of 975 gene signatures, i.e., 930 up-regulated and 45 down-regulated. Further investigation revealed a subset of 14 DEGs, including CTLA4, CALR, G0S2, CALCR, OMA1, and DNAJC3, as latent, i.e., novel, as these signatures have not been reported earlier. Moreover, downstream investigations of the DEGs, including enrichment analysis and protein-protein interaction (PPI) network analysis, revealed durable disease association, signifying their potential as novel transcriptomic biomarkers of diabetic vasculopathy. To our knowledge, this is the first meta-analysis of individual whole-transcriptome datasets for diabetic vasculopathy; nevertheless, the novel meta-analysis pipeline could be extended to study the mechanistic links of DEGs in other disease conditions.
Affiliation(s)
- Diksha Pandey
  - Department of Biotechnology, National Institute of Technology, Warangal, India
- Onkara Perumal P.
  - Department of Biotechnology, National Institute of Technology, Warangal, India
5
Wang J, Jiang Z, Guo H, Li Z. Divided-and-combined omnibus test for genetic association analysis with high-dimensional data. Stat Methods Med Res 2023; 32:626-637. [PMID: 36652550 DOI: 10.1177/09622802231151204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
Advances in biological technology enable researchers to obtain huge amounts of genetic and genomic data, which are often high-dimensional in both phenotypes and variants. Testing the association of variants with multiple phenotypes has been a hot topic in recent years. Traditional single-phenotype multiple-variant analysis has to be adjusted for multiple testing and thus suffers from substantial power loss because it ignores the correlation across phenotypes. The similarity-based method, which uses the trace of the product of two similarity matrices as a test statistic, has emerged as a useful tool to handle this problem. However, it loses power when the correlation among the phenotypes is moderate or strong, because some signals represented by the eigenvalues of the phenotypic similarity matrix are masked by others. We propose a divided-and-combined omnibus test to address this drawback of the similarity-based method. Following the divided-and-combined strategy, we first divide signals into two groups at a series of cut points according to the eigenvalues of the phenotypic similarity matrix and then combine the analysis results via the Cauchy combination method to reach a final statistic. Extensive simulations and an application to a pig dataset demonstrate that the proposed statistic is much more powerful and robust than the original test under most of the considered scenarios, and sometimes the power increase can be more than 0.6. The divided-and-combined omnibus test facilitates genetic association analysis with high-dimensional data and achieves much higher power than the existing similarity-based method. In fact, the divided-and-combined omnibus test can be used whenever an association analysis between two multivariate variables needs to be conducted.
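The similarity-based statistic described above (the trace of the product of two similarity matrices) can be sketched as follows. This is only an illustration under stated assumptions: linear-kernel similarities `G G^T` and `Y Y^T` are assumed, the helper names are hypothetical, and the centring, scaling and divided-and-combined steps of the published method are not reproduced.

```python
def linear_similarity(rows):
    """Sample-by-sample similarity matrix X X^T for a list of per-sample feature rows."""
    n = len(rows)
    return [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]


def trace_of_product(a, b):
    """trace(A B) for two n x n matrices, without forming the full product."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def similarity_statistic(genotypes, phenotypes):
    """Trace statistic for n samples: genotypes is n x m, phenotypes is n x q."""
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same number of samples")
    return trace_of_product(linear_similarity(genotypes), linear_similarity(phenotypes))


# Hypothetical toy data: 3 samples, 2 variants, 2 phenotypes.
stat = similarity_statistic([[0, 1], [1, 1], [2, 0]], [[0.1, 1.2], [0.4, 0.9], [1.5, 0.2]])
```

With linear kernels the statistic equals the squared Frobenius norm of `G^T Y`, so it is large when samples that are similar in genotype are also similar in phenotype.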
Affiliation(s)
- Jinjuan Wang
  - School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
- Zhenzhen Jiang
  - LSC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematical Science, University of Chinese Academy of Sciences, Beijing, China
- Hongping Guo
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, China
- Zhengbang Li
  - School of Mathematics and Statistics, Central China Normal University, Wuhan, China
6
Lin YC, Liang YJ, Yang HC. Evaluating statistical significance in a meta-analysis by using numerical integration. Comput Struct Biotechnol J 2022; 20:3615-3620. [PMID: 35860413 PMCID: PMC9283883 DOI: 10.1016/j.csbj.2022.06.055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 06/25/2022] [Accepted: 06/25/2022] [Indexed: 11/24/2022] Open
7
Hou CD, Yang TS. Distribution of weighted Lancaster’s statistic for combining independent or dependent P-values, with applications to human genetic studies. Commun Stat Theory Methods 2022. [DOI: 10.1080/03610926.2022.2046088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Chia-Ding Hou
  - Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City, Taiwan
- Ti-Sung Yang
  - Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City, Taiwan
8
Zhang H, Wu Z. The generalized Fisher's combination and accurate p-value calculation under dependence. Biometrics 2022. [PMID: 35178716 DOI: 10.1111/biom.13634] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/17/2021] [Accepted: 02/03/2022] [Indexed: 11/28/2022]
Abstract
Combining dependent tests of significance has broad applications, but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (e.g., Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially less than 0.05. This problem could lead to a substantial number of false discoveries in big data analyses. This paper provides two main contributions. First, it presents a general family of Fisher-type statistics, referred to as the GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, and the weighted Z-score combination. The GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts the weights and the statistic-defining parameters to the given data. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under the multivariate Gaussian distribution, and more robust under the generalized linear model and the multivariate t-distribution. The applications of the GFisher and the new p-value calculation methods are demonstrated by a gene-based SNP-set association study. The relevant computation has been implemented in the R package GFisher, available on the Comprehensive R Archive Network.
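For orientation, the classic Fisher combination that the GFisher family generalizes can be sketched for the independent case only. This is not the GFisher package or any of its dependence-aware p-value methods; the function name `fisher_combine` is hypothetical, and independence of the input p-values is assumed.

```python
import math


def fisher_combine(p_values):
    """Fisher's combination for k independent p-values (each in (0, 1]).

    The statistic X = -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, whose upper tail has the closed form
    exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    half = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0   # (X/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical usage with three independent p-values.
combined = fisher_combine([0.04, 0.20, 0.01])
```

When the inputs are correlated, as with SNPs in linkage disequilibrium, this independent-case formula is no longer valid, which is the situation the abstract's new calculation methods are designed for.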
Affiliation(s)
- Hong Zhang
  - Biostatistics and Research Decision Sciences, Merck Research Laboratories, Rahway, New Jersey, U.S.A
- Zheyang Wu
  - Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, U.S.A
9
Liu W, Xu Y, Wang A, Huang T, Liu Z. The eigen higher criticism and eigen Berk–Jones tests for multiple trait association studies based on GWAS summary statistics. Genet Epidemiol 2021; 46:89-104. [DOI: 10.1002/gepi.22439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Revised: 09/10/2021] [Accepted: 10/21/2021] [Indexed: 11/11/2022]
Affiliation(s)
- Wei Liu
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an, China
- Yuyang Xu
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Tao Huang
  - Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
  - Institute for Artificial Intelligence, Center for Intelligent Public Health, Peking University, Beijing, China
  - Key Laboratory of Molecular Cardiovascular Diseases, Peking University, Ministry of Education, Beijing, China
- Zhonghua Liu
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
10
Zhu J, Ma L, Ni M, Li Z. A bootstrap method to calculate the p-value of Fisher’s combination for a large number of weakly dependent p-values. Commun Stat Simul Comput 2021. [DOI: 10.1080/03610918.2021.1955265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Jiayan Zhu
  - College of Information Engineering, Hubei University of Chinese Medicine, Wuhan, China
- Li Ma
  - College of Information Engineering, Hubei University of Chinese Medicine, Wuhan, China
- Mengying Ni
  - School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Zhengbang Li
  - School of Mathematics and Statistics, Central China Normal University, Wuhan, China
11
Banf M, Zhao K, Rhee SY. METACLUSTER-an R package for context-specific expression analysis of metabolic gene clusters. Bioinformatics 2020; 35:3178-3180. [PMID: 30657869 DOI: 10.1093/bioinformatics/btz021] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/19/2018] [Revised: 11/22/2018] [Accepted: 01/14/2019] [Indexed: 11/13/2022] Open
Abstract
SUMMARY Plants and microbes produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have yet to be elucidated. Some biosynthetic pathways are encoded by enzymes collocated in the chromosome. To facilitate a more comprehensive condition and tissue-specific expression analysis of metabolic gene clusters, we developed METACLUSTER, a probabilistic framework for characterizing metabolic gene clusters using context-specific gene expression information. AVAILABILITY AND IMPLEMENTATION METACLUSTER is freely available at https://github.com/mbanf/METACLUSTER. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michael Banf
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
  - EducatedGuess.ai, Siegen, Germany
- Kangmei Zhao
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
- Seung Y Rhee
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
12
Chien LC. A method for combining p-values in meta-analysis by gamma distributions. J Appl Stat 2019. [DOI: 10.1080/02664763.2018.1474857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
Affiliation(s)
- Li-Chu Chien
  - Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan
13
Cai M, Li L. rPCMP: robust p-value combination by multiple partitions with applications to ATAC-seq data. BMC Syst Biol 2018; 12:141. [PMID: 30598086 PMCID: PMC6311921 DOI: 10.1186/s12918-018-0661-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Evaluating the significance of a group of genes or proteins in a pathway or biological process for a disease could help researchers understand the mechanism of the disease. For example, identifying related pathways or gene functions for chromatin states of tumor-specific T cells will help determine whether T cells can be reprogrammed, and further help design cancer treatment strategies. Some existing p-value combination methods can be used in this scenario. However, these methods have different disadvantages, and thus it is still challenging to design a more powerful and robust statistical method. RESULTS The existing group combined p-value (GCP) method first partitions p-values into several groups using a set of truncation points, but the method is often sensitive to these truncation points. Another method, the adaptive rank truncated product (ARTP) method, makes use of multiple truncation integers to adaptively combine the smallest p-values, but it loses statistical power because it ignores the larger p-values. To tackle these problems, we propose a robust p-value combination method (rPCMP) that considers multiple partitions of p-values with different sets of truncation points. The proposed rPCMP statistic has a three-layer hierarchical structure. The inner layer considers a statistic that combines the p-values in a specified interval defined by two threshold points, the intermediate layer uses a GCP statistic that optimizes the inner-layer statistic over a partition set of threshold points, and the outer layer integrates the GCP statistics from multiple partitions of p-values. The empirical distribution of the statistic under the null hypothesis can be estimated by a permutation procedure. CONCLUSIONS Our proposed rPCMP method has been shown to be more robust and to have higher statistical power. A simulation study shows that our method effectively controls the type I error rate and has higher statistical power than the existing methods. Finally, we apply the rPCMP method to an ATAC-seq dataset to discover gene functions related to chromatin states in mouse tumor T cells.
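The rank truncated product idea referred to above (combining only the K smallest p-values) can be sketched as a building block. This is not rPCMP, GCP or ARTP: a single fixed K is used, the function names are hypothetical, and the null distribution is simulated from independent uniform p-values, which is an assumption made here in place of the permutation procedure described in the abstract.

```python
import math
import random


def rtp_statistic(p_values, k):
    """Sum of the logs of the k smallest p-values (log of the truncated product)."""
    smallest = sorted(p_values)[:k]
    return sum(math.log(p) for p in smallest)


def rtp_p_value(p_values, k, n_sim=2000, seed=0):
    """Monte Carlo p-value of the rank truncated product under independent uniform nulls."""
    rng = random.Random(seed)
    observed = rtp_statistic(p_values, k)
    m = len(p_values)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in range(m)]
        if rtp_statistic(null, k) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


# Hypothetical usage: combine the two smallest of four p-values.
p_combined = rtp_p_value([1e-6, 1e-5, 0.5, 0.9], k=2)
```

Because only the K smallest p-values enter the statistic, evidence carried by larger p-values is discarded, which is the power loss the abstract attributes to ARTP-type methods.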
Affiliation(s)
- Menglan Cai
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, China
- Limin Li
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, China
14
Zhang W, Yang L, Tang LL, Liu A, Mills JL, Sun Y, Li Q. GATE: an efficient procedure in study of pleiotropic genetic associations. BMC Genomics 2017; 18:552. [PMID: 28732532 PMCID: PMC5521155 DOI: 10.1186/s12864-017-3928-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/16/2017] [Accepted: 07/06/2017] [Indexed: 11/10/2022] Open
Abstract
Background Association studies of human complex traits are well suited to identifying deleterious genetic markers. Compared to single-trait analyses, multiple-trait analyses can make better use of the information on both traits and markers, and thus markedly improve the statistical power of association tests. Principal component analysis (PCA) is a well-known tool in multivariate analysis and can be applied to this task. Generally, PCA is first performed on all traits, and then a certain number of top principal components (PCs) that explain most of the trait variation are selected to construct the test statistics. However, in some situations, utilizing only these top PCs loses important evidence from the discarded PCs and thus compromises the power of the test. Methods To overcome this drawback while keeping the advantages of using the top PCs, we propose a group accumulated test evidence (GATE) procedure. By dividing the PCs, sorted in descending order of their corresponding eigenvalues, into a few groups, GATE integrates the information of traits at the group level. Results Simulation studies demonstrate the superiority of the proposed approach over several existing methods in terms of statistical power. In some cases, the increase in power can reach 25%. These methods are further illustrated using the Heterogeneous Stock Mice data, which were collected from a quantitative genome-wide association study. Conclusions Overall, GATE provides a powerful test for pleiotropic genetic associations. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3928-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wei Zhang
  - Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, USA
- Liu Yang
  - College of Geoscience and Surveying Engineering, China University of Mining and Technology, Beijing, China
- Larry L Tang
  - Department of Statistics, George Mason University, Fairfax, VA, USA
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, USA
- Aiyi Liu
  - Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- James L Mills
  - Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Yuanchang Sun
  - Department of Mathematics and Statistics, Florida International University, Miami, FL, USA
- Qizhai Li
  - Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
15
Liu Z, Lin X. Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics 2017; 74:165-175. [PMID: 28653391 DOI: 10.1111/biom.12735] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 01/01/2016] [Revised: 05/01/2017] [Accepted: 05/01/2017] [Indexed: 12/13/2022]
Abstract
We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis.
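One common way to estimate a between-phenotype correlation matrix from summary statistics alone is to take the sample correlation of per-phenotype Z-scores across variants. The sketch below illustrates that idea; it is an assumption-laden illustration that may differ from the authors' estimator, the function names are hypothetical, and the variants supplied are assumed to be (mostly) null and approximately independent.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def phenotype_correlation(z_scores):
    """Correlation matrix from Z-scores.

    z_scores[k] holds the Z-scores of phenotype k over the same ordered set of
    variants; entry (i, j) of the result estimates the correlation between
    phenotypes i and j.
    """
    k = len(z_scores)
    return [[1.0 if i == j else pearson(z_scores[i], z_scores[j]) for j in range(k)]
            for i in range(k)]


# Hypothetical toy input: three phenotypes, four variants.
R = phenotype_correlation([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]])
```

The resulting matrix can then be used to account for correlation among phenotype-specific test statistics without access to individual-level data.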
Affiliation(s)
- Zhonghua Liu
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston 02115, U.S.A
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston 02115, U.S.A
16
Gordon D, Londono D, Patel P, Kim W, Finch SJ, Heiman GA. An Analytic Solution to the Computation of Power and Sample Size for Genetic Association Studies under a Pleiotropic Mode of Inheritance. Hum Hered 2017; 81:194-209. [PMID: 28315880 DOI: 10.1159/000457135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/01/2016] [Accepted: 01/20/2017] [Indexed: 01/14/2023] Open
Abstract
Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes.
Affiliation(s)
- Derek Gordon
  - Department of Genetics, The State University of New Jersey, Piscataway, NJ, USA
17
|
Shchetynsky K, Diaz-Gallo LM, Folkersen L, Hensvold AH, Catrina AI, Berg L, Klareskog L, Padyukov L. Discovery of new candidate genes for rheumatoid arthritis through integration of genetic association data with expression pathway analysis. Arthritis Res Ther 2017; 19:19. [PMID: 28148290 PMCID: PMC5288892 DOI: 10.1186/s13075-017-1220-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/11/2016] [Accepted: 01/04/2017] [Indexed: 12/13/2022] Open
Abstract
Background: Here we integrate verified signals from previous genetic association studies with gene expression and pathway analysis for discovery of new candidate genes and signaling networks relevant for rheumatoid arthritis (RA). Method: RNA-sequencing (RNA-seq)-based expression analysis of 377 genes from previously verified RA-associated loci was performed in blood cells from 5 newly diagnosed, non-treated patients with RA, 7 patients with treated RA and 12 healthy controls. Differentially expressed genes sharing a similar expression pattern in treated and untreated RA sub-groups were selected for pathway analysis. A set of "connector" genes derived from pathway analysis was tested for differential expression in the initial discovery cohort and validated in blood cells from 73 patients with RA and in 35 healthy controls. Results: There were 11 qualifying genes selected for pathway analysis and these were grouped into two evidence-based functional networks, containing 29 and 27 additional connector molecules. The expression of genes corresponding to connector molecules was then tested in the initial RNA-seq data. Differences in the expression of ERBB2, TP53 and THOP1 were similar in both treated and non-treated patients with RA, and an additional nine genes were differentially expressed in at least one group of patients compared to healthy controls. The ERBB2, TP53 and THOP1 expression profile was successfully replicated in RNA-seq data from peripheral blood mononuclear cells from healthy controls and non-treated patients with RA, in an independent collection of samples. Conclusion: Integration of RNA-seq data with findings from association studies, and consequent pathway analysis, implicate new candidate genes, ERBB2, TP53 and THOP1, in the pathogenesis of RA. Electronic supplementary material: The online version of this article (doi:10.1186/s13075-017-1220-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Klementy Shchetynsky: Rheumatology Unit, Department of Medicine, Centre of Molecular Medicine, CMM:L8:04, Karolinska Institutet/Karolinska University Hospital Solna, 171 61, Stockholm, Sweden
- Lina-Marcella Diaz-Gallo: Rheumatology Unit, Department of Medicine, Centre of Molecular Medicine, CMM:L8:04, Karolinska Institutet/Karolinska University Hospital Solna, 171 61, Stockholm, Sweden
- Lasse Folkersen: Rheumatology Unit, Department of Medicine, Centre of Molecular Medicine, CMM:L8:04, Karolinska Institutet/Karolinska University Hospital Solna, 171 61, Stockholm, Sweden
- Aase Haj Hensvold: Rheumatology Unit, Department of Medicine, Centre of Molecular Medicine, CMM:L8:04, Karolinska Institutet/Karolinska University Hospital Solna, 171 61, Stockholm, Sweden
- Anca Irinel Catrina: Rheumatology Unit, Department of Medicine, Centre of Molecular Medicine, CMM:L8:04, Karolinska Institutet/Karolinska University Hospital Solna, 171 61, Stockholm, Sweden
- Louise Berg: Rheumatology Unit, Department of Medicine, Centre of Molecular Medicine, CMM:L8:04, Karolinska Institutet/Karolinska University Hospital Solna, 171 61, Stockholm, Sweden
- Lars Klareskog: Rheumatology Unit, Department of Medicine, Centre of Molecular Medicine, CMM:L8:04, Karolinska Institutet/Karolinska University Hospital Solna, 171 61, Stockholm, Sweden
- Leonid Padyukov: Rheumatology Unit, Department of Medicine, Centre of Molecular Medicine, CMM:L8:04, Karolinska Institutet/Karolinska University Hospital Solna, 171 61, Stockholm, Sweden
|
18
|
Schneider-Luftman D. p-Value combiners for graphical modelling of EEG data in the frequency domain. J Neurosci Methods 2016; 271:92-106. [PMID: 27452487 DOI: 10.1016/j.jneumeth.2016.07.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/22/2016] [Revised: 06/24/2016] [Accepted: 07/18/2016] [Indexed: 11/19/2022]
Abstract
BACKGROUND: In the graphical modelling of brain data, we are interested in estimating connectivity between various regions of interest, and evaluating statistical significance in order to derive a network model. This process involves aggregating results across frequency ranges and several patients, in order to obtain an overall result that can serve to construct a graph. NEW METHOD: In this paper, we propose a method based on p-value combiners, which have never been used in applications to EEG data analysis. This new method is split into two aspects: frequency-wide tests and group-wide tests. The first step can be effectively adjusted to control for false detection rate. RESULTS: This two-step protocol is applied to EEG data collected from distinct groups of mental health patients, in order to draw graphical models for each group and highlight structural connectivity differences. Using the proposed method, we show that it is possible to reliably achieve this while effectively controlling for the detection of false connections. COMPARISON WITH EXISTING METHOD(S): Conventionally, Holm's step-down procedure is used for this type of problem, as it is robust to type I errors. However, it is known to be conservative and prone to false negatives. Furthermore, unlike the proposed methods, it does not directly output a decision rule on whether to accept or reject a statement. CONCLUSIONS: The proposed methodology offers significant improvements over the step-down procedure in terms of error rate and false negative rate across the network models, as well as in terms of applicability.
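For orientation, the conventional baseline named in this abstract, Holm's step-down procedure, can be sketched as follows. This is a minimal illustration and not code from the cited paper: the function name `holm_stepdown` and the default `alpha` are assumptions, and the sketch returns one decision per individual hypothesis rather than the aggregated frequency-wide or group-wide result the paper is concerned with.

```python
def holm_stepdown(pvalues, alpha=0.05):
    """Return a list of booleans, True where Holm's procedure rejects.

    Hypothetical helper for illustration only.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Step-down rule: stop at the first p-value that fails its threshold.
            break
    return reject


if __name__ == "__main__":
    print(holm_stepdown([0.01, 0.04, 0.03, 0.005]))
```

Because every threshold after the first is looser than the Bonferroni cutoff `alpha / m`, the procedure is never less powerful than Bonferroni, yet it remains conservative, which is the limitation the abstract refers to.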
|
19
|
Hu X, Zhang W, Zhang S, Ma S, Li Q. Group-combined P-values with applications to genetic association studies. Bioinformatics 2016; 32:2737-43. [PMID: 27259542 DOI: 10.1093/bioinformatics/btw314] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 01/18/2016] [Accepted: 05/13/2016] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION: In large-scale genetic association studies with tens of hundreds of single nucleotide polymorphisms (SNPs) genotyped, the traditional statistical framework of logistic regression using the maximum likelihood estimator (MLE) to infer the odds ratios of SNPs may not work appropriately. This is because a large number of odds ratios need to be estimated, and the MLEs may not be stable when some of the SNPs are in high linkage disequilibrium. In this situation, P-value combination procedures seem to provide good alternatives, as they are constructed on the basis of single-marker analysis. RESULTS: The commonly used P-value combination methods (such as Fisher's combined test, the truncated product method, the truncated tail strength and the adaptive rank truncated product) may lose power when the significance level varies across SNPs. To tackle this problem, a group combined P-value method (GCP) is proposed, where the P-values are divided into multiple groups and then combined at the group level. With this strategy, the significance values are integrated at different levels, and the power is improved. Simulation shows that the GCP can effectively control the type I error rates and has additional power over the existing methods; the power increase can be over 50% in some situations. The proposed GCP method is applied to data from the Genetic Analysis Workshop 16. Among all the methods, only the GCP and ARTP reach the significance needed to identify a genomic region covering gene DSC3 as being associated with rheumatoid arthritis, but the GCP provides a smaller P-value. AVAILABILITY AND IMPLEMENTATION: http://www.statsci.amss.ac.cn/yjscy/yjy/lqz/201510/t20151027_313273.html CONTACT: liqz@amss.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
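As a point of reference for the combination methods listed in this abstract, Fisher's combined test can be sketched in a few lines. This is an illustrative sketch only: it is not the GCP method and not code from the cited paper, the function name `fisher_combined_pvalue` is an assumption, and the calculation assumes independent P-values, which SNPs in linkage disequilibrium would violate.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Hypothetical helper for illustration only.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(log p_i); under the null it follows
    # a chi-square distribution with 2k degrees of freedom. We keep X / 2.
    half_stat = -sum(math.log(p) for p in pvalues)
    # For even degrees of freedom 2k the chi-square tail has a closed form:
    # P(X > x) = exp(-x / 2) * sum_{i=0}^{k-1} (x / 2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    print(fisher_combined_pvalue([0.05, 0.05]))
```

With a single input the function returns that P-value unchanged, which is a convenient sanity check. The closed-form tail avoids a dependency on a statistics library; a production analysis would more likely call an established implementation.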
Affiliation(s)
- Xiaonan Hu: School of Mathematical Sciences, University of Chinese Academy of Sciences; Key Laboratory of Big Data Mining and Knowledge Management
- Wei Zhang: Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Sanguo Zhang: School of Mathematical Sciences, University of Chinese Academy of Sciences; Key Laboratory of Big Data Mining and Knowledge Management
- Shuangge Ma: Department of Biostatistics, Yale University, New Haven, CT, USA
- Qizhai Li: Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
|
20
|
Zang Y, Zhang S, Li Q, Zhang Q. Jackknife empirical likelihood test for high-dimensional regression coefficients. Comput Stat Data Anal 2016. [DOI: 10.1016/j.csda.2015.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
21
|
Johnson SC, Dong X, Vijg J, Suh Y. Genetic evidence for common pathways in human age-related diseases. Aging Cell 2015; 14:809-17. [PMID: 26077337 PMCID: PMC4568968 DOI: 10.1111/acel.12362] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Accepted: 05/11/2015] [Indexed: 12/23/2022] Open
Abstract
Aging is the single largest risk factor for chronic disease. Studies in model organisms have identified conserved pathways that modulate aging rate and the onset and progression of multiple age-related diseases, suggesting that common pathways of aging may influence age-related diseases in humans as well. To determine whether there is genetic evidence supporting the notion of common pathways underlying age-related diseases, we analyzed the genes and pathways found to be associated with five major categories of age-related disease using a total of 410 genomewide association studies (GWAS). While only a small number of genes are shared among all five disease categories, those found in at least three of the five major age-related disease categories are highly enriched for apolipoprotein metabolism genes. We found that a more substantial number of gene ontology (GO) terms are shared among the five age-related disease categories, and the shared GO terms include canonical aging pathways identified in model organisms, such as nutrient-sensing signaling, translation, proteostasis, stress responses, and genome maintenance. Taking advantage of the vast amount of genetic data from the GWAS, our findings provide the first direct evidence that conserved pathways of aging simultaneously influence multiple age-related diseases in humans, as has been demonstrated in model organisms.
Affiliation(s)
- Simon C. Johnson: Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Xiao Dong: Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Jan Vijg: Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA; Department of Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY, USA
- Yousin Suh: Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA; Department of Medicine Endocrinology, Albert Einstein College of Medicine, Bronx, NY, USA
|