1
Cao R, Olawsky E, McFowland E, Marcotte E, Spector L, Yang T. Subset scanning for multi-trait analysis using GWAS summary statistics. Bioinformatics 2024; 40:btad777. [PMID: 38191683 PMCID: PMC11087659 DOI: 10.1093/bioinformatics/btad777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/23/2023] [Accepted: 01/05/2024] [Indexed: 01/10/2024] Open
Abstract
MOTIVATION: Multi-trait analysis has been shown to have greater statistical power than single-trait analysis. Most of the existing multi-trait analysis methods only work with a limited number of traits and usually prioritize high statistical power over identifying relevant traits, which heavily rely on domain knowledge.
RESULTS: To handle diseases and traits with obscure etiology, we developed TraitScan, a powerful and fast algorithm that identifies potential pleiotropic traits from a moderate or large number of traits (e.g. dozens to thousands) and tests the association between one genetic variant and the selected traits. TraitScan can handle either individual-level or summary-level GWAS data. We evaluated TraitScan using extensive simulations and found that it outperformed existing methods in terms of both testing power and trait selection when sparsity was low or modest. We then applied it to search for traits associated with Ewing Sarcoma, a rare bone tumor with peak onset in adolescence, among 754 traits in UK Biobank. Our analysis revealed a few promising traits worthy of further investigation, highlighting the use of TraitScan for more effective multi-trait analysis as biobanks emerge. We also extended TraitScan to search and test association with a polygenic risk score and genetically imputed gene expression.
AVAILABILITY AND IMPLEMENTATION: Our algorithm is implemented in an R package "TraitScan" available at https://github.com/RuiCao34/TraitScan.
Affiliation(s)
- Rui Cao
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
- Evan Olawsky
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
- Edward McFowland
  - Technology and Operations Management, Harvard Business School, Harvard University, Boston, MA 02163, United States
- Erin Marcotte
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
- Logan Spector
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
- Tianzhong Yang
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
2
Meng Z, Jiang Z. Cauchy combination omnibus test for normality. PLoS One 2023; 18:e0289498. [PMID: 37535617 PMCID: PMC10399863 DOI: 10.1371/journal.pone.0289498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 07/20/2023] [Indexed: 08/05/2023] Open
Abstract
Testing whether data are from a normal distribution is a traditional problem and is of great concern for data analyses. The normality is the premise of many statistical methods, such as t-test, Hotelling's T² test and ANOVA. There are numerous tests in the literature and the commonly used ones are Anderson-Darling test, Shapiro-Wilk test and Jarque-Bera test. Each test has its own advantageous points since they are developed for specific patterns and there is no method that consistently performs optimally in all situations. Since the data distribution of practical problems can be complex and diverse, we propose a Cauchy Combination Omnibus Test (CCOT) that is robust and valid in most data cases. We also give some theoretical results to analyze the good properties of CCOT. Two obvious advantages of CCOT are that not only does CCOT have a display expression for calculating statistical significance, but extensive simulation results show its robustness regardless of the shape of distribution the data comes from. Applications to South African Heart Disease and Neonatal Hearing Impairment data further illustrate its practicability.
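For orientation, the generic Cauchy combination rule that tests of this kind build on can be sketched as follows. This is a minimal illustration of the standard rule for merging p-values, not the CCOT procedure itself: the abstract does not say which component tests or weights CCOT uses, so the function name `cauchy_combination` and the default equal weights are assumptions made here.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the standard Cauchy combination rule.

    Hypothetical helper for illustration only. Each p-value is mapped to a
    standard Cauchy variate via tan((0.5 - p) * pi); the weighted sum is then
    converted back to a p-value with the Cauchy survival function.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    total = sum(weights)
    # Weighted sum of Cauchy-transformed p-values (weights normalised to 1).
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Survival function of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # Example: p-values from three hypothetical normality tests.
    print(cauchy_combination([0.04, 0.20, 0.01]))
```

The combined p-value comes from a single explicit formula, so no resampling is needed. Values exactly equal to 0 or 1 are rejected here because the tangent transform is unbounded at those points.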
Affiliation(s)
- Zhen Meng
  - School of Statistics, Capital University of Economics and Business, Beijing, China
- Zhenzhen Jiang
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
3
Wang J, Jiang Z, Guo H, Li Z. Divided-and-combined omnibus test for genetic association analysis with high-dimensional data. Stat Methods Med Res 2023; 32:626-637. [PMID: 36652550 DOI: 10.1177/09622802231151204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
Advances in biologic technology enable researchers to obtain a huge amount of genetic and genomic data, whose dimensions are often quite high on both phenotypes and variants. Testing their association with multiple phenotypes has been a hot topic in recent years. Traditional single phenotype multiple variant analysis has to be adjusted for multiple testing and thus suffers from substantial power loss due to ignorance of correlation across phenotypes. Similarity-based method, which uses the trace of product of two similarity matrices as a test statistic, has emerged as a useful tool to handle this problem. However, it loses power when the correlation strength within multiple phenotypes is middle or strong, for some signals represented by the eigenvalues of phenotypic similarity matrix are masked by others. We propose a divided-and-combined omnibus test to handle this drawback of the similarity-based method. Based on the divided-and-combined strategy, we first divide signals into two groups in a series of cut points according to eigenvalues of the phenotypic similarity matrix and combine analysis results via the Cauchy-combined method to reach a final statistic. Extensive simulations and application to a pig data demonstrate that the proposed statistic is much more powerful and robust than the original test under most of the considered scenarios, and sometimes the power increase can be more than 0.6. Divided-and-combined omnibus test facilitates genetic association analysis with high-dimensional data and achieves much higher power than the existing similarity based method. In fact, divided-and-combined omnibus test can be used whenever the association analysis between two multivariate variables needs to be conducted.
Affiliation(s)
- Jinjuan Wang
  - School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
- Zhenzhen Jiang
  - LSC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematical Science, University of Chinese Academy of Sciences, Beijing, China
- Hongping Guo
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, China
- Zhengbang Li
  - School of Mathematics and Statistics, Central China Normal University, Wuhan, China
4
Long M, Li Z, Zhang W, Li Q. The Cauchy Combination Test under Arbitrary Dependence Structures. AM STAT 2022. [DOI: 10.1080/00031305.2022.2116109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Affiliation(s)
- Mingya Long
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences
- Wei Zhang
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences
- Qizhai Li
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences
5
Meng Z, Yang Q, Li Q, Zhang B. Directional-sum test for nonparametric Behrens-Fisher problem with applications to the dietary intervention trial. Stat Methods Med Res 2021; 30:1640-1653. [PMID: 34134561 DOI: 10.1177/09622802211002864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
For a nonparametric Behrens-Fisher problem, a directional-sum test is proposed based on division-combination strategy. A one-layer wild bootstrap procedure is given to calculate its statistical significance. We conduct simulation studies with data generated from lognormal, t and Laplace distributions to show that the proposed test can control the type I error rates properly and is more powerful than the existing rank-sum and maximum-type tests under most of the considered scenarios. Applications to the dietary intervention trial further show the performance of the proposed test.
Affiliation(s)
- Zhen Meng
  - School of Statistics, Capital University of Economics and Business, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
  - LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Qinglong Yang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Qizhai Li
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
  - LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Baoxue Zhang
  - School of Statistics, Capital University of Economics and Business, Beijing, China
6
Li Z, Qin S, Li Q. A novel test by combining the maximum and minimum values among a large number of dependent Z-scores with application to genome wide association study. Stat Med 2021; 40:2422-2434. [PMID: 33665825 DOI: 10.1002/sim.8912] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/18/2020] [Revised: 01/19/2021] [Accepted: 01/30/2021] [Indexed: 12/22/2022]
Abstract
In this article, we propose a novel test via combining the maximum and minimum values among a large number of dependent Z-scores for testing the hypothesis with sparse signals. The proposed test employs the information about different signs of maximum and minimum Z-scores and thus power is gained. Its asymptotic null distribution is derived under the null hypothesis and some regular conditions. Extensive simulation studies are conducted to show the advantages of the proposed test by comparing with two existing ones. A real application to the lipids genome wide association study further shows its performances.
Affiliation(s)
- Zhengbang Li
  - School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Sanan Qin
  - School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Qizhai Li
  - LSC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China