101
Zhang H, Mehrotra DV, Shen J. AWOT and CWOT for genotype and genotype-by-treatment interaction joint analysis in pharmacogenetics GWAS. Bioinformatics 2023; 39:6994182. [PMID: 36661328 PMCID: PMC9885423 DOI: 10.1093/bioinformatics/btac834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Revised: 12/05/2022] [Indexed: 01/21/2023] Open
Abstract
MOTIVATION Pharmacogenomics (PGx) research holds promise for detecting associations between genetic variants and drug responses in randomized clinical trials, but it is limited by small populations and thus has low power to detect signals. It is critical to increase the power of PGx genome-wide association studies (GWAS) with small sample sizes so that variant-drug-response association discoveries are not limited to common variants with extremely large effects. RESULTS In this article, we first discuss the challenges of PGx GWAS and then propose the adaptively weighted joint test (AWOT) and Cauchy Weighted jOint Test (CWOT), which are two flexible and robust joint tests of the single nucleotide polymorphism main effect and genotype-by-treatment interaction effect for continuous and binary endpoints. Two analytic procedures are proposed to accurately calculate the joint test P-value. We evaluate AWOT and CWOT through extensive simulations under various scenarios. The results show that the proposed AWOT and CWOT control type I error well and outperform existing methods in detecting the most interesting signal patterns in PGx settings (i.e. with strong genotype-by-treatment interaction effects, but weak genotype main effects). We demonstrate the value of AWOT and CWOT by applying them to the PGx GWAS from the Bezlotoxumab Clostridium difficile MODIFY I/II Phase 3 trials. AVAILABILITY AND IMPLEMENTATION The R package cwot is publicly available on CRAN https://cran.r-project.org/web/packages/cwot/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc, North Wales, PA 19454, USA
102
Song H, Ling W, Zhao N, Plantinga AM, Broedlow CA, Klatt NR, Hensley-McBain T, Wu MC. Accommodating multiple potential normalizations in microbiome associations studies. BMC Bioinformatics 2023; 24:22. [PMID: 36658484 PMCID: PMC9850542 DOI: 10.1186/s12859-023-05147-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 01/12/2023] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Microbial communities are known to be closely related to many diseases, such as obesity and HIV, and it is of interest to identify differentially abundant microbial species between two or more environments. Since the abundances or counts of microbial species usually have different scales and suffer from zero-inflation or over-dispersion, normalization is a critical step before conducting differential abundance analysis. Several normalization approaches have been proposed, but it is difficult to optimize the characterization of the true relationship between taxa and interesting outcomes. RESULTS To avoid the challenge of picking an optimal normalization and accommodate the advantages of several normalization strategies, we propose an omnibus approach. Our approach is based on a Cauchy combination test, which is flexible and powerful by aggregating individual p values. We also consider a truncated test statistic to prevent substantial power loss. We experiment with a basic linear regression model as well as recently proposed powerful association tests for microbiome data and compare the performance of the omnibus approach with individual normalization approaches. Experimental results show that, regardless of simulation settings, the new approach exhibits power that is close to the best normalization strategy, while controlling the type I error well. CONCLUSIONS The proposed omnibus test frees researchers from choosing among various normalization methods. It is an aggregated method whose power approaches that of the underlying optimal normalization, which would otherwise have to be found by tedious trial and error. While the power may not exceed the best normalization, it is always much better than using a poor choice of normalization.
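The Cauchy combination test named in this abstract can be sketched in a few lines. The version below is the standard, untruncated form of the statistic, not the authors' omnibus or truncated variant, and the function name `cauchy_combination` and the equal default weights are assumptions made for illustration.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the standard Cauchy combination statistic.

    Illustrative sketch only: the truncated statistic mentioned in the
    abstract is not implemented here.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * k  # assumption: equal weights by default
    if len(weights) != k or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative, one per p-value")
    total = sum(weights)
    # Each p-value is mapped to a standard Cauchy variate; the weighted
    # average of those variates is the test statistic.
    stat = sum((w / total) * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, pvalues))
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi


# Example: one small p-value among several unremarkable ones.
combined = cauchy_combination([0.01, 0.5, 0.9])
```

In an omnibus setting such as the one described, each input p-value would come from the same association test run under a different normalization.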
Affiliation(s)
- Hoseung Song
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA USA
| | - Wodan Ling
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA USA
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
| | - Anna M. Plantinga
- Department of Mathematics and Statistics, Williams College, Williamstown, MA USA
| | - Courtney A. Broedlow
- Division of Surgical Outcomes and Precision Medicine Research, Department of Surgery, University of Minnesota School of Medicine, Minneapolis, MN USA
| | - Nichole R. Klatt
- Division of Surgical Outcomes and Precision Medicine Research, Department of Surgery, University of Minnesota School of Medicine, Minneapolis, MN USA
| | | | - Michael C. Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA USA
103
Li A, Liu S, Bakshi A, Jiang L, Chen W, Zheng Z, Sullivan PF, Visscher PM, Wray NR, Yang J, Zeng J. mBAT-combo: A more powerful test to detect gene-trait associations from GWAS data. Am J Hum Genet 2023; 110:30-43. [PMID: 36608683 PMCID: PMC9892780 DOI: 10.1016/j.ajhg.2022.12.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 08/09/2022] [Accepted: 12/08/2022] [Indexed: 01/07/2023] Open
Abstract
Gene-based association tests aggregate multiple SNP-trait associations into sets defined by gene boundaries and are widely used in post-GWAS analysis. A common approach for gene-based tests is to combine SNP associations by computing the sum of χ² statistics. However, this strategy ignores the directions of SNP effects, which could result in a loss of power for SNPs with masking effects, e.g., when the product of two SNP effects and the linkage disequilibrium (LD) correlation is negative. Here, we introduce "mBAT-combo," a set-based test that is better powered than other methods to detect multi-SNP associations in the context of masking effects. We validate the method through simulations and applications to real data. We find that of 35 blood and urine biomarker traits in the UK Biobank, 34 traits show evidence for masking effects in a total of 4,273 gene-trait pairs, indicating that masking effects are common in complex traits. We further validate the improved power of our method in height, body mass index, and schizophrenia with different GWAS sample sizes and show that on average 95.7% of the genes detected only by mBAT-combo with smaller sample sizes can be identified by the single-SNP approach with a 1.7-fold increase in sample sizes. Eleven genes significant only in mBAT-combo for schizophrenia are confirmed by functionally informed fine-mapping or Mendelian randomization integrating gene expression data. The framework of mBAT-combo can be applied to any set of SNPs to refine trait-association signals hidden in genomic regions with complex LD structures.
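The masking condition quoted in this abstract (a negative product of two SNP effects and their LD correlation) can be illustrated numerically. The sketch below assumes two causal SNPs with standardized genotypes, so that the marginal effect of each SNP is its joint effect plus the LD correlation times the other SNP's joint effect. The function names are hypothetical and this is not the mBAT-combo procedure itself.

```python
def marginal_effects(b1, b2, r):
    """Marginal (single-SNP) effects implied by joint effects b1, b2.

    Assumes standardized genotypes with LD correlation r between the SNPs.
    """
    return b1 + r * b2, b2 + r * b1


def is_masking(b1, b2, r):
    """Masking condition from the abstract: b1 * b2 * r is negative."""
    return b1 * b2 * r < 0


# Opposite joint effects on positively correlated SNPs: each SNP hides
# most of the other's signal in a single-SNP analysis.
m1, m2 = marginal_effects(1.0, -1.0, 0.8)
masked = is_masking(1.0, -1.0, 0.8)
```

Under these assumed values the joint effects have magnitude 1 but the marginal effects shrink to about 0.2 in magnitude, which is the kind of attenuation that a sum of single-SNP χ² statistics would inherit.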
Affiliation(s)
- Ang Li
- Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
| | - Shouye Liu
- Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
| | - Andrew Bakshi
- Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
| | | | - Wenhan Chen
- Epigenetics Research Laboratory, Genomics and Epigenetics Theme, Garvan Institute of Medical Research, Sydney, NSW, Australia
| | - Zhili Zheng
- Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
| | - Patrick F Sullivan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden; Departments of Genetics and Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Peter M Visscher
- Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
| | - Naomi R Wray
- Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia; Queensland Brain Institute, University of Queensland, Brisbane, QLD, Australia
| | - Jian Yang
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China; Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
| | - Jian Zeng
- Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia.
104
Abbas-Aghababazadeh F, Xu W, Haibe-Kains B. The impact of violating the independence assumption in meta-analysis on biomarker discovery. Front Genet 2023; 13:1027345. [PMID: 36726714 PMCID: PMC9885264 DOI: 10.3389/fgene.2022.1027345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 11/25/2022] [Indexed: 01/06/2023] Open
Abstract
With rapid advancements in high-throughput sequencing technologies, massive amounts of "-omics" data are now available in almost every biomedical field. Due to variance in biological models and analytic methods, findings from clinical and biological studies are often not generalizable when tested in independent cohorts. Meta-analysis, a set of statistical tools to integrate independent studies addressing similar research questions, has been proposed to improve the accuracy and robustness of new biological insights. However, it is common practice among biomarker discovery studies using preclinical pharmacogenomic data to borrow molecular profiles of cancer cell lines from one study to another, creating dependence across studies. The impact of violating the independence assumption in meta-analyses is largely unknown. In this study, we review and compare different meta-analyses to estimate variations across studies along with biomarker discoveries using preclinical pharmacogenomics data. We further evaluate, via simulation studies, the performance of conventional meta-analysis in which the dependence of the effects is ignored. Results show that, as the number of non-independent effects increased, the relative mean squared error increased and the coverage probability decreased. We also assess potential bias in the estimation of effects for established meta-analysis approaches when data are duplicated and the assumption of independence is violated. Using pharmacogenomics biomarker discovery, we find that treating dependent studies as independent can substantially increase the bias of meta-analyses. Importantly, we show that violating the independence assumption decreases the generalizability of the biomarker discovery process and increases false positive results, a key challenge in precision oncology.
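A small numeric sketch can show why treating duplicated data as independent is a problem. The code below uses a conventional fixed-effect, inverse-variance pooled estimate, chosen here for illustration rather than taken from the paper, with made-up effect estimates and standard errors. Entering the same study twice shrinks the pooled standard error although no new information was added.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance fixed-effect pooling that assumes independent studies."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical inputs: two genuinely independent studies.
independent = fixed_effect_pool([0.5, 0.3], [0.2, 0.2])

# The first study is entered twice, as if it were a third independent study.
duplicated = fixed_effect_pool([0.5, 0.3, 0.5], [0.2, 0.2, 0.2])
```

Here the duplicated analysis is pulled toward the repeated study and reports a smaller standard error than the honest two-study analysis, so its confidence intervals would be too narrow.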
Affiliation(s)
| | - Wei Xu
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada,Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Benjamin Haibe-Kains
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada,Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada,Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada,Ontario Institute for Cancer Research, Toronto, ON, Canada,Department of Computer Science, University of Toronto, Toronto, ON, Canada,*Correspondence: Benjamin Haibe-Kains,
105
Li X, Sung AD, Xie J. DART: Distance Assisted Recursive Testing. J Mach Learn Res 2023; 24:169. [PMID: 39669222 PMCID: PMC11636646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2024]
Abstract
Multiple testing is a commonly used tool in modern data science. Sometimes, the hypotheses are embedded in a space; the distances between the hypotheses reflect their co-null/co-alternative patterns. Properly incorporating the distance information in testing will boost testing power. Hence, we developed a new multiple testing framework named Distance Assisted Recursive Testing (DART). DART combines artificial intelligence (AI) and statistical modeling. It has two stages. The first stage uses AI models to construct an aggregation tree that reflects the distance information. The second stage uses statistical models to embed the testing on the tree and control the false discovery rate. Theoretical analysis and numerical experiments demonstrated that DART generates valid, robust, and powerful results. We applied DART to a clinical trial in an allogeneic stem cell transplantation study to identify the gut microbiota whose abundance was impacted by post-transplant care.
Affiliation(s)
- Xuechan Li
- Department of Biostatistics, Duke University, Durham, NC 27705, USA
| | - Anthony D Sung
- Department of Medicine, Duke University, Durham, NC 27705, USA
| | - Jichun Xie
- Department of Biostatistics, Duke University, Durham, NC 27705, USA
106
Li N, Chen L, Zhou Y, Wei Q. A fast and efficient approach for gene-based association studies of ordinal phenotypes. Stat Appl Genet Mol Biol 2023; 22:sagmb-2021-0068. [PMID: 36724206 DOI: 10.1515/sagmb-2021-0068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Accepted: 01/16/2023] [Indexed: 02/02/2023]
Abstract
Many human disease conditions need to be measured by ordinal phenotypes, so analysis of ordinal phenotypes is valuable in genome-wide association studies (GWAS). However, existing association methods for dichotomous or quantitative phenotypes are not appropriate for ordinal phenotypes. Therefore, based on an aggregated Cauchy association test, we propose a fast and efficient association method to test the association between genetic variants and an ordinal phenotype. To enrich association signals of rare variants, we first use the burden method to aggregate rare variants. Then we separately test the significance of the aggregated rare variants and of the other common variants. Finally, the combination of transformed variant-level P values is taken as the test statistic, which approximately follows a Cauchy distribution under the null hypothesis. Extensive simulation studies and analysis of GAW19 show that our proposed method is powerful and computationally fast as a gene-based method. In particular, in the presence of an extremely low proportion of causal variants in a gene, our method has better performance.
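The first step described above, collapsing rare variants into a single burden score while leaving common variants to be tested one by one, might look like the following sketch. The unweighted sum, the minor allele frequency cutoff, the function name `split_and_burden` and the toy genotype matrix are all assumptions for illustration. The abstract does not specify the authors' weights or threshold.

```python
def split_and_burden(genotypes, maf_threshold):
    """Collapse rare variants into one burden score per individual.

    genotypes: list of rows (individuals); each row holds minor allele
    counts (0, 1 or 2) for every variant in the gene.
    Returns (burden scores, indices of the common variants).
    """
    if not genotypes or not genotypes[0]:
        raise ValueError("genotype matrix must be non-empty")
    n = len(genotypes)
    m = len(genotypes[0])
    # Minor allele frequency of each variant, assuming the counted allele
    # is the minor allele.
    mafs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    rare = [j for j in range(m) if mafs[j] < maf_threshold]
    common = [j for j in range(m) if mafs[j] >= maf_threshold]
    # Unweighted burden: total number of rare minor alleles carried.
    burden = [sum(row[j] for j in rare) for row in genotypes]
    return burden, common


# Toy data: 4 individuals, 3 variants; variants 0 and 2 are rare under
# the hypothetical 0.2 cutoff.
toy = [
    [0, 1, 0],
    [1, 2, 0],
    [0, 1, 0],
    [0, 0, 1],
]
burden, common = split_and_burden(toy, maf_threshold=0.2)
```

The burden score and each common variant would then be tested against the ordinal phenotype, and the resulting P values combined through the Cauchy transformation.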
Affiliation(s)
- Nanxing Li
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, P. R. China
| | - Lili Chen
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, P. R. China
| | - Yajing Zhou
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, P. R. China
| | - Qianran Wei
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, P. R. China
107
Kim Y, Chi YY, Shen J, Zou F. Robust genetic model-based SNP-set association test using CauchyGM. Bioinformatics 2023; 39:6831090. [PMID: 36383169 DOI: 10.1093/bioinformatics/btac728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 10/26/2022] [Accepted: 11/15/2022] [Indexed: 11/17/2022]
Abstract
MOTIVATION Association testing on genome-wide association studies (GWAS) data is commonly performed under a single (mostly additive) genetic model framework. However, the underlying true genetic mechanisms are often unknown in practice for most complex traits. When the employed inheritance model deviates from the underlying model, statistical power may be reduced. To overcome this challenge, an integrative association test that directly infers the underlying genetic model from GWAS data has previously been proposed for single-SNP analysis. RESULTS In this article, we propose a Cauchy combination Genetic Model-based association test (CauchyGM) under a generalized linear model framework for SNP-set level analysis. CauchyGM does not require prior knowledge of the underlying inheritance pattern of each SNP. It performs a score test that first estimates an individual P-value for each SNP in an SNP-set with both minor allele frequency (MAF) > 1% and three genotypes, and further aggregates the remaining SNPs using SKAT. CauchyGM then combines the correlated P-values across multiple SNPs and different genetic models within the set using the Cauchy Combination Test. To further accommodate both sparse and dense signal patterns, we also propose an omnibus association test (CauchyGM-O) by combining CauchyGM with SKAT and the burden test. Our extensive simulations show that both CauchyGM and CauchyGM-O maintain the type I error well at the genome-wide significance level and provide substantial power improvement compared to existing methods. We apply our methods to pharmacogenomic GWAS data from a large cardiovascular randomized clinical trial. Both CauchyGM and CauchyGM-O identify several novel genome-wide significant genes. AVAILABILITY AND IMPLEMENTATION The R package CauchyGM is publicly available on GitHub: https://github.com/ykim03517/CauchyGM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
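To make "different genetic models" concrete, the sketch below recodes a minor allele count under three textbook inheritance models. The abstract names only the additive model explicitly, so the dominant and recessive codings are assumed here as the usual alternatives, and `encode_genotype` is a hypothetical helper, not part of the CauchyGM package.

```python
def encode_genotype(minor_allele_count, model):
    """Recode a genotype (0, 1 or 2 minor alleles) under a genetic model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    if model == "additive":
        # Each extra minor allele adds the same effect.
        return minor_allele_count
    if model == "dominant":
        # One copy of the minor allele is enough for the full effect.
        return int(minor_allele_count >= 1)
    if model == "recessive":
        # Two copies are required for any effect.
        return int(minor_allele_count == 2)
    raise ValueError("unknown model: " + str(model))


# The same three genotypes under each coding.
codings = {
    model: [encode_genotype(g, model) for g in (0, 1, 2)]
    for model in ("additive", "dominant", "recessive")
}
```

A per-SNP test run under each coding yields one P-value per model, and these correlated P-values are the kind of input a Cauchy combination is designed to aggregate.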
Affiliation(s)
- Yeonil Kim
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Yueh-Yun Chi
- Department of Pediatrics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
| | - Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
108
Song H, Liu H, Wu MC. A fast kernel independence test for cluster-correlated data. Sci Rep 2022; 12:21659. [PMID: 36522522 PMCID: PMC9755291 DOI: 10.1038/s41598-022-26278-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022] Open
Abstract
Cluster-correlated data has received considerable attention in biomedical and longitudinal studies, and it is of interest to assess the generalized dependence between two multivariate variables under the cluster-correlated structure. The Hilbert-Schmidt independence criterion (HSIC) is a powerful kernel-based test statistic that captures various forms of dependence between two random vectors and can be applied to an arbitrary non-Euclidean domain. However, the existing HSIC is not directly applicable to cluster-correlated data. Therefore, we propose an HSIC-based test of independence for cluster-correlated data. The new test statistic combines kernel information so that the dependence structure in each cluster is fully considered and exhibits good performance under high dimensions. Moreover, a rapid p value approximation makes the new test quickly applicable to large datasets. Numerical studies show that the new approach performs well on both synthetic and real-world data.
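For orientation, the existing HSIC that this abstract builds on can be sketched as the commonly used biased estimator trace(KHLH)/n² for independent observations. This is not the cluster-correlated statistic the authors propose. The Gaussian kernel, the bandwidth `sigma=1.0`, the restriction to scalar inputs and the function names are assumptions for illustration.

```python
import math


def gaussian_gram(values, sigma=1.0):
    """Gram matrix of a Gaussian kernel on scalar observations."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix (computes H K H)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate trace(K H L H) / n^2 for i.i.d. pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    n = len(x)
    k_centered = center(gaussian_gram(x, sigma))
    l_gram = gaussian_gram(y, sigma)
    # trace(HKH L) equals the elementwise sum because L is symmetric.
    return sum(k_centered[i][j] * l_gram[i][j]
               for i in range(n) for j in range(n)) / n ** 2


x = [0.0, 1.0, 2.0, 3.0, 4.0]
dependent = hsic(x, x)            # y is a copy of x
unrelated = hsic(x, [1.0] * 5)    # y carries no information about x
```

The estimate is positive when y copies x and vanishes (up to rounding) when y is constant, which is the behaviour a test of independence relies on.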
Affiliation(s)
- Hoseung Song
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
| | - Hongjiao Liu
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
| | - Michael C Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA.
109
Choi W, Kim I. Averaging p-values under exchangeability. Stat Probab Lett 2022. [DOI: 10.1016/j.spl.2022.109748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
110
Saunders GRB, Wang X, Chen F, Jang SK, Liu M, Wang C, Gao S, Jiang Y, Khunsriraksakul C, Otto JM, Addison C, Akiyama M, Albert CM, Aliev F, Alonso A, Arnett DK, Ashley-Koch AE, Ashrani AA, Barnes KC, Barr RG, Bartz TM, Becker DM, Bielak LF, Benjamin EJ, Bis JC, Bjornsdottir G, Blangero J, Bleecker ER, Boardman JD, Boerwinkle E, Boomsma DI, Boorgula MP, Bowden DW, Brody JA, Cade BE, Chasman DI, Chavan S, Chen YDI, Chen Z, Cheng I, Cho MH, Choquet H, Cole JW, Cornelis MC, Cucca F, Curran JE, de Andrade M, Dick DM, Docherty AR, Duggirala R, Eaton CB, Ehringer MA, Esko T, Faul JD, Fernandes Silva L, Fiorillo E, Fornage M, Freedman BI, Gabrielsen ME, Garrett ME, Gharib SA, Gieger C, Gillespie N, Glahn DC, Gordon SD, Gu CC, Gu D, Gudbjartsson DF, Guo X, Haessler J, Hall ME, Haller T, Harris KM, He J, Herd P, Hewitt JK, Hickie I, Hidalgo B, Hokanson JE, Hopfer C, Hottenga J, Hou L, Huang H, Hung YJ, Hunter DJ, Hveem K, Hwang SJ, Hwu CM, Iacono W, Irvin MR, Jee YH, Johnson EO, Joo YY, Jorgenson E, Justice AE, Kamatani Y, Kaplan RC, Kaprio J, Kardia SLR, Keller MC, Kelly TN, Kooperberg C, Korhonen T, Kraft P, Krauter K, Kuusisto J, Laakso M, Lasky-Su J, Lee WJ, Lee JJ, Levy D, Li L, Li K, Li Y, Lin K, Lind PA, Liu C, Lloyd-Jones DM, Lutz SM, Ma J, Mägi R, Manichaikul A, Martin NG, Mathur R, Matoba N, McArdle PF, McGue M, McQueen MB, Medland SE, Metspalu A, Meyers DA, Millwood IY, Mitchell BD, Mohlke KL, Moll M, Montasser ME, Morrison AC, Mulas A, Nielsen JB, North KE, Oelsner EC, Okada Y, Orrù V, Palmer ND, Palviainen T, Pandit A, Park SL, Peters U, Peters A, Peyser PA, Polderman TJC, Rafaels N, Redline S, Reed RM, Reiner AP, Rice JP, Rich SS, Richmond NE, Roan C, Rotter JI, Rueschman MN, Runarsdottir V, Saccone NL, Schwartz DA, Shadyab AH, Shi J, Shringarpure SS, Sicinski K, Skogholt AH, Smith JA, Smith NL, Sotoodehnia N, Stallings MC, Stefansson H, Stefansson K, Stitzel JA, Sun X, Syed M, Tal-Singer R, Taylor AE, Taylor KD, Telen MJ, Thai KK, Tiwari H, Turman C, 
Tyrfingsson T, Wall TL, Walters RG, Weir DR, Weiss ST, White WB, Whitfield JB, Wiggins KL, Willemsen G, Willer CJ, Winsvold BS, Xu H, Yanek LR, Yin J, Young KL, Young KA, Yu B, Zhao W, Zhou W, Zöllner S, Zuccolo L, Batini C, Bergen AW, Bierut LJ, David SP, Gagliano Taliun SA, Hancock DB, Jiang B, Munafò MR, Thorgeirsson TE, Liu DJ, Vrieze S. Genetic diversity fuels gene discovery for tobacco and alcohol use. Nature 2022; 612:720-724. [PMID: 36477530 PMCID: PMC9771818 DOI: 10.1038/s41586-022-05477-4] [Citation(s) in RCA: 178] [Impact Index Per Article: 59.3] [Received: 03/09/2022] [Accepted: 10/25/2022] [Indexed: 12/12/2022]
Abstract
Tobacco and alcohol use are heritable behaviours associated with 15% and 5.3% of worldwide deaths, respectively, due largely to broad increased risk for disease and injury [1-4]. These substances are used across the globe, yet genome-wide association studies have focused largely on individuals of European ancestries [5]. Here we leveraged global genetic diversity across 3.4 million individuals from four major clines of global ancestry (approximately 21% non-European) to power the discovery and fine-mapping of genomic loci associated with tobacco and alcohol use, to inform function of these loci via ancestry-aware transcriptome-wide association studies, and to evaluate the genetic architecture and predictive power of polygenic risk within and across populations. We found that increases in sample size and genetic diversity improved locus identification and fine-mapping resolution, and that a large majority of the 3,823 associated variants (from 2,143 loci) showed consistent effect sizes across ancestry dimensions. However, polygenic risk scores developed in one ancestry performed poorly in others, highlighting the continued need to increase sample sizes of diverse ancestries to realize any potential benefit of polygenic prediction.
Affiliation(s)
| | - Xingyan Wang
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Fang Chen
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Seon-Kyeong Jang
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Mengzhen Liu
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Chen Wang
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Shuang Gao
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Yu Jiang
- Department of Epidemiology & Population Health at Stanford University, Stanford, CA, USA
| | | | - Jacqueline M Otto
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Clifton Addison
- Jackson Heart Study (JHS) Graduate Training and Education Center (GTEC), Department of Epidemiology and Biostatistics, School of Public Health, Jackson State University, Jackson, MS, USA
| | - Masato Akiyama
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Department of Ocular Pathology and Imaging Science, Kyushu University Graduate School of Medical Sciences, Fukuoka, Japan
| | - Christine M Albert
- Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Division of Preventive Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Fazil Aliev
- Department of Psychiatry, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA
| | - Alvaro Alonso
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Donna K Arnett
- Dean's Office and Department of Epidemiology, College of Public Health, University of Kentucky, Lexington, KY, USA
| | - Allison E Ashley-Koch
- Department of Medicine and Duke Comprehensive Sickle Cell Center, Duke University School of Medicine, Durham, NC, USA
- Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, NC, USA
| | - Aneel A Ashrani
- Division of Hematology, Department of Medicine, Mayo Clinic College of Medicine and Science, Rochester, MN, USA
| | - Kathleen C Barnes
- Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Tempus, Chicago, IL, USA
| | - R Graham Barr
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
| | - Traci M Bartz
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Diane M Becker
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Lawrence F Bielak
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Emelia J Benjamin
- Department of Medicine, Boston Medical Center, Boston University School of Medicine, Boston, MA, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
| | - Joshua C Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | | | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | | | - Jason D Boardman
- Institute of Behavioral Science, University of Colorado Boulder, Boulder, CO, USA
| | - Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Dorret I Boomsma
- Netherlands Twin Register, Dept Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Meher Preethi Boorgula
- Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Donald W Bowden
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Jennifer A Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Brian E Cade
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
| | - Daniel I Chasman
- Division of Preventive Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Sameer Chavan
- Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Yii-Der Ida Chen
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Zhengming Chen
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Iona Cheng
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
- UCSF Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
| | - Michael H Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Hélène Choquet
- Kaiser Permanente Northern California (KPNC), Division of Research, Oakland, CA, USA
| | - John W Cole
- Department of Neurology, Baltimore Veterans Affairs Medical Center, Baltimore, MD, USA
- Division of Vascular Neurology, Department of Neurology, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Marilyn C Cornelis
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | | | - Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Mariza de Andrade
- Division of Clinical Trials and Biostatistics, Department of Quantitative Health Sciences, Mayo Clinic College of Medicine and Science, Rochester, MN, USA
| | - Danielle M Dick
- Department of Psychiatry, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA
| | - Anna R Docherty
- Department of Psychiatry, University of Utah School of Medicine, Salt Lake City, UT, USA
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Virginia, USA
- Huntsman Mental Health Institute, Salt Lake City, UT, USA
| | - Ravindranath Duggirala
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Charles B Eaton
- Department of Family Medicine, Brown University, Providence, RI, USA
| | - Marissa A Ehringer
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, USA
| | - Tõnu Esko
- Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Jessica D Faul
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Lilian Fernandes Silva
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
| | - Edoardo Fiorillo
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Monserrato, Italy
| | - Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Barry I Freedman
- Department of Internal Medicine-Section on Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Maiken E Gabrielsen
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
| | - Melanie E Garrett
- Department of Medicine and Duke Comprehensive Sickle Cell Center, Duke University School of Medicine, Durham, NC, USA
- Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, NC, USA
| | - Sina A Gharib
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle, WA, USA
- Center for Lung Biology, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Christian Gieger
- Research Unit Molecular Epidemiology, Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Nathan Gillespie
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Virginia, USA
| | - David C Glahn
- Department of Psychiatry & Behavioral Sciences, Boston Children's Hospital & Harvard Medical School, Boston, MA, USA
| | - Scott D Gordon
- Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Charles C Gu
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO, USA
| | - Dongfeng Gu
- Department of Epidemiology and Key Laboratory of Cardiovascular Epidemiology, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Daniel F Gudbjartsson
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Jeffrey Haessler
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Michael E Hall
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Toomas Haller
- Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Kathleen Mullan Harris
- Department of Sociology and the Carolina Population Center, University of North Carolina, Chapel Hill, NC, USA
| | - Jiang He
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
- Translational Sciences Institute, Tulane University, New Orleans, LA, USA
| | - Pamela Herd
- McCourt School of Public Policy, Georgetown University, Washington, DC, USA
| | - John K Hewitt
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
| | - Ian Hickie
- Youth Mental Health & Technology Team, Brain and Mind Centre, University of Sydney, Sydney, Australia
| | - Bertha Hidalgo
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - John E Hokanson
- Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Christian Hopfer
- Department of Psychiatry, University of Colorado Anschutz Medical Center, Denver, CO, USA
| | - JoukeJan Hottenga
- Netherlands Twin Register, Dept Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Lifang Hou
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Hongyan Huang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Yi-Jen Hung
- Institute of Preventive Medicine, National Defense Medical Center, New Taipei City, Taiwan
| | - David J Hunter
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Kristian Hveem
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- HUNT Research Center, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Department of Research, Innovation and Education, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway
| | - Shih-Jen Hwang
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Chii-Min Hwu
- Section of Endocrinology and Metabolism, Department of Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
| | - William Iacono
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Marguerite R Irvin
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Yon Ho Jee
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Eric O Johnson
- GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA
- Fellow Program, RTI International, Research Triangle Park, NC, USA
| | - Yoonjung Y Joo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Institute of Data Science, Korea University, Seoul, South Korea
| | - Anne E Justice
- Department of Population Health Sciences, Geisinger, Danville, PA, USA
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Yoichiro Kamatani
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Robert C Kaplan
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Jaakko Kaprio
- Institute for Molecular Medicine Finland - FIMM, University of Helsinki, Helsinki, Finland
| | - Sharon L R Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Matthew C Keller
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
| | - Tanika N Kelly
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
- Translational Sciences Institute, Tulane University, New Orleans, LA, USA
| | - Charles Kooperberg
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Tellervo Korhonen
- Institute for Molecular Medicine Finland - FIMM, University of Helsinki, Helsinki, Finland
| | - Peter Kraft
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Kenneth Krauter
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Johanna Kuusisto
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
- Center for Medicine and Clinical Research, Kuopio University Hospital, Kuopio, Finland
| | - Markku Laakso
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
| | - Jessica Lasky-Su
- Brigham and Women's Hospital, Department of Medicine, Channing Division of Network Medicine, Boston, MA, USA
| | - Wen-Jane Lee
- Department of Medical Research, Taichung Veterans General Hospital, Taichung City, Taiwan
| | - James J Lee
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Daniel Levy
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Liming Li
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China
| | - Kevin Li
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Yuqing Li
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA, USA
| | - Kuang Lin
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Penelope A Lind
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- School of Biomedical Sciences, Faculty of Medicine, University of Queensland, Brisbane, Australia
- School of Biomedical Sciences, Queensland University of Technology, Brisbane, Australia
| | - Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Donald M Lloyd-Jones
- Departments of Preventive Medicine, Medicine, and Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Sharon M Lutz
- Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Jiantao Ma
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Division of Nutrition Epidemiology and Data Science, Friedman School of Nutrition Science and Policy, Tufts University, Boston, MA, USA
| | - Reedik Mägi
- Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia School of Medicine, Charlottesville, VA, USA
| | - Nicholas G Martin
- Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Ravi Mathur
- GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA
| | - Nana Matoba
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Department of Genetics, UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Patrick F McArdle
- Division of Endocrinology, Diabetes and Nutrition, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Matt McGue
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Matthew B McQueen
- Department of Integrative Physiology, University of Colorado, Boulder, CO, USA
| | - Sarah E Medland
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Iona Y Millwood
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Braxton D Mitchell
- Division of Endocrinology, Diabetes and Nutrition, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Geriatrics Research and Education Clinical Center, Baltimore Veterans Administration Medical Center, Baltimore, MD, USA
| | - Karen L Mohlke
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Matthew Moll
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - May E Montasser
- Division of Endocrinology, Diabetes and Nutrition, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Antonella Mulas
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Monserrato, Italy
| | - Jonas B Nielsen
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Kari E North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Elizabeth C Oelsner
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
| | - Yukinori Okada
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
| | - Valeria Orrù
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Monserrato, Italy
| | - Nicholette D Palmer
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Teemu Palviainen
- Institute for Molecular Medicine Finland - FIMM, University of Helsinki, Helsinki, Finland
| | - Anita Pandit
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - S Lani Park
- Population Sciences of the Pacific Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| | - Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - Annette Peters
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig Maximilians University Munich, Munich, Germany
- German Centre for Cardiovascular Research, DZHK, Partner Site Munich, Munich, Germany
| | - Patricia A Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Tinca J C Polderman
- Department of Clinical Developmental Psychology, Vrije Universiteit, Amsterdam, The Netherlands
- Department of Child and Adolescent Psychiatry, Amsterdam UMC, Amsterdam, The Netherlands
| | - Nicholas Rafaels
- Division of Biomedical Informatics & Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Robert M Reed
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Alex P Reiner
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - John P Rice
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
| | - Stephen S Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia School of Medicine, Charlottesville, VA, USA
| | - Nicole E Richmond
- Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Carol Roan
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Michael N Rueschman
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
| | - Nancy L Saccone
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - David A Schwartz
- Division of Pulmonary Sciences and Critical Care Medicine; Department of Medicine and Immunology, University of Colorado, Aurora, CO, USA
| | - Aladdin H Shadyab
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, La Jolla, CA, USA
| | - Kamil Sicinski
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI, USA
| | - Anne Heidi Skogholt
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
| | - Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Nicholas L Smith
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA, USA
- Seattle Epidemiologic Research and Information Center, Department of Veterans Affairs Office of Research and Development, Seattle, WA, USA
| | - Nona Sotoodehnia
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Division of Cardiology, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Michael C Stallings
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
| | - Kari Stefansson
- deCODE Genetics/Amgen, Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
| | - Jerry A Stitzel
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
| | - Xiao Sun
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
| | - Moin Syed
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Amy E Taylor
- MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK
- National Institute for Health Research Biomedical Research Centre at the University Hospitals Bristol NHS Foundation Trust and the University of Bristol, Bristol, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
| | - Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Marilyn J Telen
- Department of Medicine and Duke Comprehensive Sickle Cell Center, Duke University School of Medicine, Durham, NC, USA
| | - Khanh K Thai
- Kaiser Permanente Northern California (KPNC), Division of Research, Oakland, CA, USA
| | - Hemant Tiwari
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Constance Turman
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Tamara L Wall
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - Robin G Walters
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - David R Weir
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Scott T Weiss
- Brigham and Women's Hospital, Department of Medicine, Channing Division of Network Medicine, Boston, MA, USA
| | - Wendy B White
- Jackson Heart Study Undergraduate Training and Education Center, Tougaloo College, Tougaloo, MS, USA
| | - John B Whitfield
- Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Kerri L Wiggins
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - Gonneke Willemsen
- Netherlands Twin Register, Dept Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Cristen J Willer
- Department of Internal Medicine, Division of Cardiology, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Bendik S Winsvold
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Research and Innovation, Division of Clinical Neuroscience, Oslo University Hospital, Oslo, Norway
- Department of Neurology, Oslo University Hospital, Oslo, Norway
| | - Huichun Xu
- Division of Endocrinology, Diabetes and Nutrition, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Lisa R Yanek
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Jie Yin
- Kaiser Permanente Northern California (KPNC), Division of Research, Oakland, CA, USA
| | - Kristin L Young
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Kendra A Young
- Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Bing Yu
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Wei Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
| | - Luisa Zuccolo
- MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Health Data Science Centre, Fondazione Human Technopole, Milan, Italy
| | - Chiara Batini
- Department of Population Health Sciences, University of Leicester, Leicester, UK
| | - Andrew W Bergen
- Oregon Research Institute, Springfield, OR, USA
- BioRealm, LLC, Walnut, CA, USA
| | - Laura J Bierut
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
| | - Sean P David
- Outcomes Research Network & Department of Family Medicine, NorthShore University HealthSystem, Evanston, IL, USA
- Department of Family Medicine, University of Chicago, Chicago, IL, USA
| | - Sarah A Gagliano Taliun
- Department of Medicine, Université de Montréal, Montréal, Québec, Canada
- Department of Neurosciences, Université de Montréal, Montréal, Québec, Canada
- Research Centre, Montréal Heart Institute, Montréal, Québec, Canada
| | - Dana B Hancock
- GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA
| | - Bibo Jiang
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Marcus R Munafò
- MRC Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK
- National Institute for Health Research Biomedical Research Centre at the University Hospitals Bristol NHS Foundation Trust and the University of Bristol, Bristol, UK
- School of Psychological Science, University of Bristol, Bristol, UK
| | - Dajiang J Liu
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Scott Vrieze
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| |
|
111
|
Jin C, Lee B, Shen L, Long Q. Integrating multi-omics summary data using a Mendelian randomization framework. Brief Bioinform 2022; 23:bbac376. [PMID: 36094096 PMCID: PMC9677504 DOI: 10.1093/bib/bbac376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 07/29/2022] [Accepted: 08/09/2022] [Indexed: 12/14/2022] Open
Abstract
Mendelian randomization is a versatile tool to identify the possible causal relationship between an omics biomarker and disease outcome using genetic variants as instrumental variables. A key theme is the prioritization of genes whose omics readouts can be used as predictors of the disease outcome through analyzing GWAS and QTL summary data. However, few studies have examined best practice for probing the effects of multiple omics biomarkers annotated to the same gene of interest. To bridge this gap, we propose powerful combination tests that integrate multiple correlated P-values without assuming the dependence structure between the exposures. Our extensive simulation experiments demonstrate the superiority of our proposed approach compared with existing methods adapted to this setting. The top hits of the analyses of multi-omics Alzheimer's disease datasets include the genes ABCA7 and ATP1B1.
Affiliation(s)
- Chong Jin
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Brian Lee
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Li Shen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Qi Long
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
112
|
Chen X, Zhang H, Liu M, Deng HW, Wu Z. Simultaneous detection of novel genes and SNPs by adaptive p-value combination. Front Genet 2022; 13:1009428. [DOI: 10.3389/fgene.2022.1009428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 11/03/2022] [Indexed: 11/18/2022] Open
Abstract
Combining SNP p-values from GWAS summary data is a promising strategy for detecting novel genetic factors. Existing statistical methods for the p-value-based SNP-set testing confront two challenges. First, the statistical power of different methods depends on unknown patterns of genetic effects that could drastically vary over different SNP sets. Second, they do not identify which SNPs primarily contribute to the global association of the whole set. We propose a new signal-adaptive analysis pipeline to address these challenges using the omnibus thresholding Fisher’s method (oTFisher). The oTFisher remains robustly powerful over various patterns of genetic effects. Its adaptive thresholding can be applied to estimate important SNPs contributing to the overall significance of the given SNP set. We develop efficient calculation algorithms to control the type I error rate, which accounts for the linkage disequilibrium among SNPs. Extensive simulations show that the oTFisher has robustly high power and provides a higher balanced accuracy in screening SNPs than the traditional Bonferroni and FDR procedures. We applied the oTFisher to study the genetic association of genes and haplotype blocks of the bone density-related traits using the summary data of the Genetic Factors for Osteoporosis Consortium. The oTFisher identified more novel and literature-reported genetic factors than existing p-value combination methods. Relevant computation has been implemented into the R package TFisher to support similar data analysis.
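As a rough illustration of the thresholding idea described in this abstract, the sketch below computes a soft-thresholded Fisher-type statistic and a Monte Carlo p-value for it. It is a sketch under stated assumptions, not the oTFisher procedure itself: the statistic form (the sum of -2 log(p/τ) over p-values at or below τ), the function names, and the independence-based null are assumptions made here for illustration. The abstract does not give the exact statistic, the omnibus choice over several thresholds, or the linkage-disequilibrium-aware calculation, and none of those are reproduced.

```python
import math
import random


def soft_threshold_fisher_stat(pvals, tau):
    """Sum of -2*log(p/tau) over the p-values with p <= tau (assumed form).

    With tau = 1 this reduces to Fisher's statistic, -2 * sum(log p).
    """
    return sum(-2.0 * math.log(p / tau) for p in pvals if p <= tau)


def contributing_indices(pvals, tau):
    """Indices of the p-values that pass the threshold and enter the statistic."""
    return [i for i, p in enumerate(pvals) if p <= tau]


def monte_carlo_pvalue(pvals, tau, n_sim=20000, seed=0):
    """Monte Carlo p-value assuming INDEPENDENT uniform null p-values.

    This ignores correlation (for example LD) among tests, so it is only
    a toy calibration and not the calculation described in the abstract.
    """
    rng = random.Random(seed)
    observed = soft_threshold_fisher_stat(pvals, tau)
    n = len(pvals)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null_pvals = [1.0 - rng.random() for _ in range(n)]
        if soft_threshold_fisher_stat(null_pvals, tau) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    example = [0.001, 0.02, 0.3, 0.6, 0.9]  # made-up p-values
    print(soft_threshold_fisher_stat(example, tau=0.05))
    print(contributing_indices(example, tau=0.05))
    print(monte_carlo_pvalue(example, tau=0.05, n_sim=5000))
```

The indices returned by `contributing_indices` correspond loosely to the "important SNPs" idea in the abstract: only tests that pass the threshold contribute to the set-level statistic.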
|
113
|
Ham H, Park T. Combining p-values from various statistical methods for microbiome data. Front Microbiol 2022; 13:990870. [PMID: 36439799 PMCID: PMC9686280 DOI: 10.3389/fmicb.2022.990870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2022] [Accepted: 10/11/2022] [Indexed: 08/30/2023] Open
Abstract
MOTIVATION In the field of microbiome analysis, various statistical methods have been developed for identifying differentially expressed features while accounting for the overdispersion and high sparsity of microbiome data. However, because of differences in statistical models or test formulations, significance results are often inconsistent across methods, which makes it difficult to determine the importance of microbiome taxa. It is therefore practically important to integrate the results from all statistical methods. A standard meta-analysis is a powerful tool for integrative analysis, and it provides a summary measure by combining p-values from various statistical methods. While many meta-analysis methods are available, it is not easy to choose the one most suitable for microbiome data. RESULTS In this study, we investigated which meta-analysis method most adequately represents the importance of microbiome taxa. We considered Fisher's method, the minimum p-value method, the Simes method, Stouffer's method, the Kost method, and the Cauchy combination test. Through simulation studies, we showed that the Cauchy combination test provides the best combined p-value, in the sense that it performed best among the examined methods while controlling the type I error rate. Furthermore, it produced high rank similarity with the true ranks. In a real-data application to colorectal cancer microbiome data, we demonstrated that the taxa ranked highest by the Cauchy combination test have been reported to be associated with colorectal cancer.
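For readers unfamiliar with the combination rules named in this abstract, here is a minimal sketch of five of them using only the Python standard library. The function names and the equal default weights are assumptions made for illustration, and the code is not taken from the paper. Fisher's and Stouffer's formulas as written assume independent p-values, the minimum p-value rule is shown with a Bonferroni adjustment (one common variant), and the Kost method, which adjusts Fisher's test for correlation, is omitted because it needs the covariance between tests.

```python
import math
from statistics import NormalDist

# All functions expect p-values strictly between 0 and 1.


def fisher_combined(pvals):
    """Fisher's method: -2*sum(log p) follows chi-square with 2k df under H0."""
    k = len(pvals)
    half_stat = -sum(math.log(p) for p in pvals)  # statistic divided by 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


def stouffer_combined(pvals):
    """Stouffer's method: average the z-scores, then convert back to a p-value."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)


def min_p_bonferroni(pvals):
    """Minimum p-value with a Bonferroni adjustment."""
    return min(1.0, len(pvals) * min(pvals))


def simes_combined(pvals):
    """Simes method: min over i of n * p_(i) / i for the sorted p-values."""
    n = len(pvals)
    return min(n * p / (i + 1) for i, p in enumerate(sorted(pvals)))


def cauchy_combined(pvals, weights=None):
    """Cauchy combination test with non-negative weights that sum to one."""
    n = len(pvals)
    w = weights or [1.0 / n] * n
    t = sum(wi * math.tan((0.5 - p) * math.pi) for wi, p in zip(w, pvals))
    # Upper tail of the standard Cauchy distribution evaluated at t.
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    example = [0.01, 0.04, 0.20, 0.65]  # made-up p-values for one taxon
    for fn in (fisher_combined, stouffer_combined, min_p_bonferroni,
               simes_combined, cauchy_combined):
        print(fn.__name__, fn(example))
```

A quick sanity check on the design: with a single input p-value, Fisher's, Stouffer's, and the Cauchy rule each return that p-value unchanged.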
Affiliation(s)
- Hyeonjung Ham
- Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, South Korea
| | - Taesung Park
- Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, South Korea
- Department of Statistics, Seoul National University, Seoul, South Korea
| |
|
114
|
Zhang Z, Bae YE, Bradley JR, Wu L, Wu C. SUMMIT: An integrative approach for better transcriptomic data imputation improves causal gene identification. Nat Commun 2022; 13:6336. [PMID: 36284135 PMCID: PMC9593997 DOI: 10.1038/s41467-022-34016-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/19/2021] [Accepted: 10/11/2022] [Indexed: 12/25/2022] Open
Abstract
Genes with moderate to low expression heritability may explain a large proportion of complex trait etiology, but such genes cannot be sufficiently captured in conventional transcriptome-wide association studies (TWASs), partly due to the relatively small available reference datasets for developing expression genetic prediction models to capture the moderate to low genetically regulated components of gene expression. Here, we introduce a method, the Summary-level Unified Method for Modeling Integrated Transcriptome (SUMMIT), to improve the expression prediction model accuracy and the power of TWAS by using a large expression quantitative trait loci (eQTL) summary-level dataset. We apply SUMMIT to the eQTL summary-level data provided by the eQTLGen consortium. Through simulation studies and analyses of genome-wide association study summary statistics for 24 complex traits, we show that SUMMIT improves the accuracy of expression prediction in blood, successfully builds expression prediction models for genes with low expression heritability, and achieves higher statistical power than several benchmark methods. Finally, we conduct a case study of COVID-19 severity with SUMMIT and identify 11 likely causal genes associated with COVID-19 severity.
Affiliation(s)
- Zichen Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Ye Eun Bae
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Jonathan R Bradley
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
| |
|
115
|
Zhou W, Bi W, Zhao Z, Dey KK, Jagadeesh KA, Karczewski KJ, Daly MJ, Neale BM, Lee S. SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests. Nat Genet 2022; 54:1466-1469. [PMID: 36138231 PMCID: PMC9534766 DOI: 10.1038/s41588-022-01178-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/11/2021] [Accepted: 07/29/2022] [Indexed: 01/07/2023]
Abstract
Several biobanks, including UK Biobank (UKBB), are generating large-scale sequencing data. An existing method, SAIGE-GENE, performs well when testing variants with minor allele frequency (MAF) ≤ 1%, but inflation is observed in variance component set-based tests when restricting to variants with MAF ≤ 0.1% or 0.01%. Here, we propose SAIGE-GENE+ with greatly improved type I error control and computational efficiency to facilitate rare variant tests in large-scale data. We further show that incorporating multiple MAF cutoffs and functional annotations can improve power and thus uncover new gene-phenotype associations. In the analysis of UKBB whole exome sequencing data for 30 quantitative and 141 binary traits, SAIGE-GENE+ identified 551 gene-phenotype associations.
Affiliation(s)
- Wei Zhou
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
| | - Wenjian Bi
- Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
| | - Zhangchen Zhao
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Kushal K Dey
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Karthik A Jagadeesh
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Konrad J Karczewski
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
| | - Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Seunggeun Lee
- Graduate School of Data Science, Seoul National University, Seoul, Korea.
| |
|
116
|
Yang ZY, Liu W, Yuan YX, Kong YF, Zhao PZ, Fung WK, Zhou JY. Robust association tests for quantitative traits on the X chromosome. Heredity (Edinb) 2022; 129:244-256. [PMID: 36085362 PMCID: PMC9519943 DOI: 10.1038/s41437-022-00560-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Revised: 08/24/2022] [Accepted: 08/24/2022] [Indexed: 11/09/2022] Open
Abstract
The genome-wide association study is an elementary tool to assess the genetic contribution to complex human traits. However, such association tests are mainly proposed for autosomes, and less attention has been given to methods for identifying loci on the X chromosome due to their distinct biological features. In addition, the existing association tests for quantitative traits on the X chromosome either fail to incorporate the information of males or only detect variance heterogeneity. Therefore, we propose four novel methods, which are denoted as QXcat, QZmax, QMVXcat and QMVZmax. When using these methods, it is assumed that the risk alleles for females and males are the same and that the locus being studied satisfies the generalized genetic model for females. The first two methods are based on comparing the means of the trait value across different genotypes, while the latter two methods test for the difference of both means and variances. All four methods effectively incorporate the information of X chromosome inactivation. Simulation studies demonstrate that the proposed methods control the type I error rates well. Under the simulated scenarios, the proposed methods are generally more powerful than the existing methods. We also apply our proposed methods to data from the Minnesota Center for Twin and Family Research and find 10 single nucleotide polymorphisms that are statistically significantly associated with at least two traits at the significance level of 1 × 10⁻³.
Affiliation(s)
- Zi-Ying Yang
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, China
| | - Wei Liu
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, China
| | - Yu-Xin Yuan
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, China
| | - Yi-Fan Kong
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
| | - Pei-Zhen Zhao
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China
| | - Wing Kam Fung
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China.
| | - Ji-Yuan Zhou
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou, China.
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, China.
| |
|
117
|
Belloy ME, Le Guen Y, Eger SJ, Napolioni V, Greicius MD, He Z. A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data. Neurol Genet 2022; 8:e200012. [PMID: 35966919 PMCID: PMC9372872 DOI: 10.1212/nxg.0000000000200012] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/02/2022] [Accepted: 05/31/2022] [Indexed: 02/02/2023]
Abstract
Background and Objectives Exome sequencing (ES) and genome sequencing (GS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States, the Alzheimer Disease Sequencing Project (ADSP) has taken a leading role in sequencing AD-related samples at scale, with the resultant data being made publicly available to researchers to generate new insights into the genetic etiology of AD. To achieve sufficient power, the ADSP has adopted a study design where subsets of larger AD cohorts are collected and sequenced across multiple centers, using a variety of sequencing platforms. This approach may lead to variable variant quality across sequencing centers and/or platforms. In this study, we sought to implement and evaluate filters that can be applied quickly to robustly remove variant-level artifacts in the ADSP data. Methods We implemented a robust quality control procedure to handle ADSP data. We evaluated this procedure while performing exome-wide and genome-wide association analyses on AD risk using the latest ADSP whole ES (WES) and whole GS (WGS) data releases (NG00067.v5). Results We observed that many variants displayed large variation in allele frequencies across sequencing centers/platforms and contributed to spurious association signals with AD risk. We also observed that sequencing platform/center adjustment in association models could not fully account for these spurious signals. To address this issue, we designed and implemented variant filters that could capture and remove these center-specific/platform-specific artifactual variants. Discussion We derived a fast and robust approach to filter variants that represent sequencing center-related or platform-related artifacts underlying spurious associations with AD risk in ADSP WES and WGS data. This approach will be important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs.
Affiliation(s)
- Michael E. Belloy
- From the Department of Neurology and Neurological Sciences (M.E.B., Y.L.G., S.J.E., M.D.G., Z.H.), Stanford University, CA; Institut du Cerveau—Paris Brain Institute—ICM (Y.L.G.), France; School of Biosciences and Veterinary Medicine (V.N.), University of Camerino, Italy; and Quantitative Sciences Unit (Z.H.), Department of Medicine, Stanford University, CA
| | - Yann Le Guen
- From the Department of Neurology and Neurological Sciences (M.E.B., Y.L.G., S.J.E., M.D.G., Z.H.), Stanford University, CA; Institut du Cerveau—Paris Brain Institute—ICM (Y.L.G.), France; School of Biosciences and Veterinary Medicine (V.N.), University of Camerino, Italy; and Quantitative Sciences Unit (Z.H.), Department of Medicine, Stanford University, CA
| | - Sarah J. Eger
- From the Department of Neurology and Neurological Sciences (M.E.B., Y.L.G., S.J.E., M.D.G., Z.H.), Stanford University, CA; Institut du Cerveau—Paris Brain Institute—ICM (Y.L.G.), France; School of Biosciences and Veterinary Medicine (V.N.), University of Camerino, Italy; and Quantitative Sciences Unit (Z.H.), Department of Medicine, Stanford University, CA
| | - Valerio Napolioni
- From the Department of Neurology and Neurological Sciences (M.E.B., Y.L.G., S.J.E., M.D.G., Z.H.), Stanford University, CA; Institut du Cerveau—Paris Brain Institute—ICM (Y.L.G.), France; School of Biosciences and Veterinary Medicine (V.N.), University of Camerino, Italy; and Quantitative Sciences Unit (Z.H.), Department of Medicine, Stanford University, CA
| | - Michael D. Greicius
- From the Department of Neurology and Neurological Sciences (M.E.B., Y.L.G., S.J.E., M.D.G., Z.H.), Stanford University, CA; Institut du Cerveau—Paris Brain Institute—ICM (Y.L.G.), France; School of Biosciences and Veterinary Medicine (V.N.), University of Camerino, Italy; and Quantitative Sciences Unit (Z.H.), Department of Medicine, Stanford University, CA
| | - Zihuai He
- From the Department of Neurology and Neurological Sciences (M.E.B., Y.L.G., S.J.E., M.D.G., Z.H.), Stanford University, CA; Institut du Cerveau—Paris Brain Institute—ICM (Y.L.G.), France; School of Biosciences and Veterinary Medicine (V.N.), University of Camerino, Italy; and Quantitative Sciences Unit (Z.H.), Department of Medicine, Stanford University, CA
| |
|
118
|
Greco LA, Reay WR, Dayas CV, Cairns MJ. Pairwise genetic meta-analyses between schizophrenia and substance dependence phenotypes reveals novel association signals with pharmacological significance. Transl Psychiatry 2022; 12:403. [PMID: 36151087 PMCID: PMC9508072 DOI: 10.1038/s41398-022-02186-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2022] [Revised: 08/25/2022] [Accepted: 09/13/2022] [Indexed: 12/04/2022] Open
Abstract
Almost half of individuals diagnosed with schizophrenia also present with a substance use disorder; however, little is known about potential molecular mechanisms underlying this comorbidity. We used genetic analyses to enhance our understanding of the molecular overlap between these conditions. Our analyses revealed a positive genetic correlation between schizophrenia and the following dependence phenotypes: alcohol (rg = 0.368, SE = 0.076, P = 1.61 × 10⁻⁶), cannabis use disorder (rg = 0.309, SE = 0.033, P = 1.97 × 10⁻²⁰) and nicotine (rg = 0.117, SE = 0.043, P = 7.0 × 10⁻³), as well as drinks per week (rg = 0.087, SE = 0.021, P = 6.36 × 10⁻⁵), cigarettes per day (rg = 0.11, SE = 0.024, P = 4.93 × 10⁻⁶) and life-time cannabis use (rg = 0.234, SE = 0.029, P = 3.74 × 10⁻¹⁵). We further constructed latent causal variable (LCV) models to test for partial genetic causality and found evidence for a potential causal relationship between alcohol dependence and schizophrenia (GCP = 0.6, SE = 0.22, P = 1.6 × 10⁻³). This putative causal effect with schizophrenia was not seen using a continuous phenotype of drinks consumed per week, suggesting that distinct molecular mechanisms underlying dependence are involved in the relationship between alcohol and schizophrenia. To localise the specific genetic overlap between schizophrenia and substance use disorders (SUDs), we conducted a gene-based and gene-set pairwise meta-analysis between schizophrenia and each of the four individual substance dependence phenotypes in up to 790,806 individuals. These bivariate meta-analyses identified 44 associations not observed in the individual GWAS, including five shared genes that play a key role in early central nervous system development. The results from this study further support the existence of underlying shared biology that drives the overlap in substance dependence in schizophrenia, including specific biological systems related to metabolism and neuronal function.
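To make the relation between a reported estimate, its standard error, and a P value concrete, here is a small Python sketch of a two-sided Wald z-test applied to the alcohol dependence correlation quoted above (rg = 0.368, SE = 0.076). The abstract does not say how its P values were computed, so the Wald test and the function name `wald_p` are assumptions; because the reported estimate and SE are rounded, the result should only be expected to agree with the published P value in order of magnitude.

```python
import math


def wald_p(estimate, se):
    """Return the z-score and two-sided normal P value for estimate / se."""
    z = estimate / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate in the far tail.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Values taken from the abstract (alcohol dependence vs schizophrenia).
    z, p = wald_p(0.368, 0.076)
    print(f"z = {z:.2f}, two-sided P = {p:.2e}")
```

Using `math.erfc` rather than `1 - cdf(z)` avoids the loss of precision that occurs when subtracting a number very close to 1, which matters for the smaller P values in this abstract.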
Affiliation(s)
- Laura A Greco
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton, NSW, Australia
| | - William R Reay
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton, NSW, Australia
| | - Christopher V Dayas
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, Australia
| | - Murray J Cairns
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, Australia.
- Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton, NSW, Australia.
| |
|
119
|
Qiao J, Shao Z, Wu Y, Zeng P, Wang T. Detecting associated genes for complex traits shared across East Asian and European populations under the framework of composite null hypothesis testing. J Transl Med 2022; 20:424. [PMID: 36138484 PMCID: PMC9503281 DOI: 10.1186/s12967-022-03637-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/08/2022] [Accepted: 09/12/2022] [Indexed: 11/21/2022]
Abstract
Background Detecting trans-ethnic common associated genetic loci can offer important insights into shared genetic components underlying complex diseases/traits across diverse continental populations. However, effective statistical methods for such a goal are currently lacking. Methods By leveraging summary statistics available from global-scale genome-wide association studies, we herein proposed a novel genetic overlap detection method called CONTO (COmposite Null hypothesis test for Trans-ethnic genetic Overlap) from the perspective of high-dimensional composite null hypothesis testing. Unlike previous studies, which generally analyzed individual genetic variants, CONTO is a gene-centric method that considers the set of genetic variants located within a gene simultaneously and assesses their joint significance with the trait of interest. Borrowing the principle of the joint significance test (JST), CONTO takes the maximum P value of multiple associations as the significance measurement. Results Compared to JST, which is often overly conservative, CONTO is improved in two respects: the construction of a three-component mixture null distribution and the adjustment for trans-ethnic genetic correlation. Consequently, CONTO corrects the conservativeness of JST with well-calibrated P values and is much more powerful, as validated by extensive simulation studies. We applied CONTO to discover common associated genes for 31 complex diseases/traits between the East Asian and European populations, and identified many shared trait-associated genes that had otherwise been missed by JST. We further revealed that population-common genes were generally more evolutionarily conserved than population-specific or null ones. Conclusion Overall, CONTO represents a powerful method for detecting common associated genes across diverse ancestral groups; our results have important implications for the transferability of GWAS discoveries in one population to others.
Supplementary Information The online version contains supplementary material available at 10.1186/s12967-022-03637-8.
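The conservativeness of the joint significance test mentioned in this abstract can be seen with a short simulation. The Python sketch below takes the maximum of two P values, as the abstract describes for JST, and estimates how often it falls below 0.05 when a gene is associated with the trait in neither population. The independence of the two P values, the function names, and the simulation settings are assumptions for illustration; the three-component mixture null and the genetic correlation adjustment used by CONTO are not implemented here.

```python
import random


def jst_p(p_first, p_second):
    """Joint significance test: the larger of the two association P values."""
    return max(p_first, p_second)


def null_rejection_rate(n_sim=100_000, alpha=0.05, seed=0):
    """Fraction of simulated null genes with max-P <= alpha.

    Both P values are drawn as independent Uniform(0, 1) variables, i.e. the
    gene is associated with the trait in neither population (an assumption).
    """
    rng = random.Random(seed)
    hits = sum(jst_p(rng.random(), rng.random()) <= alpha for _ in range(n_sim))
    return hits / n_sim


if __name__ == "__main__":
    rate = null_rejection_rate()
    print(f"empirical rejection rate at alpha = 0.05: {rate:.4f}")
```

Under these assumptions the probability that both P values fall below alpha is alpha squared, so the empirical rate should sit near 0.0025 rather than the nominal 0.05, which is the sense in which the maximum-P rule is overly conservative for this part of the composite null.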
Affiliation(s)
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Yuxuan Wu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| |
|
120
|
Yu X, Li D, Xue L. Fisher’s combined probability test for high-dimensional covariance matrices. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2126781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Xiufan Yu
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame
| | - Danning Li
- KLAS and School of Mathematics & Statistics, Northeast Normal University
| | - Lingzhou Xue
- Department of Statistics, Pennsylvania State University
| |
|
121
|
Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes. Nat Commun 2022; 13:5332. [PMID: 36088354 PMCID: PMC9464252 DOI: 10.1038/s41467-022-32864-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Accepted: 08/22/2022] [Indexed: 12/05/2022] Open
Abstract
Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~ 4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood-ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain of function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain of function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
|
122
|
Wang T, Ionita-Laza I, Wei Y. Integrated Quantile RAnk Test (iQRAT) for gene-level associations. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1548] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Affiliation(s)
- Tianying Wang
- Center for Statistical Science & Department of Industrial Engineering, Tsinghua University
| | | | - Ying Wei
- Department of Biostatistics, Columbia University
| |
|
123
|
Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics 2022; 23:359. [PMID: 36042399 PMCID: PMC9429742 DOI: 10.1186/s12859-022-04897-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been performed to evaluate the effectiveness of these methods. RESULTS We herein sought to fill this knowledge gap by conducting a comprehensive empirical comparison of 22 commonly used summary-statistics-based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power performance under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and that score-based variance component tests (e.g., sequence kernel association test) were much more powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (the harmonic mean P value method and the aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and real-data applications to common and rare variant association analyses as well as in expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all these methods can scale to biobank-scale data, although some might be relatively slow. CONCLUSION In conclusion, we hope that our findings can offer important guidance on how to choose appropriate multilocus association analysis methods in the post-GWAS era. All the SNP-set methods are implemented in the R package called MCA, which is freely available at https://github.com/biostatpzeng/ .
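For orientation, the harmonic mean P value mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the MCA package's implementation: the function name is hypothetical, equal weights are assumed by default, and only the raw weighted harmonic mean is computed. The published harmonic mean P value procedure applies a further calibration step to turn this statistic into a valid P value, and that step is omitted here.

```python
def harmonic_mean_p(pvals, weights=None):
    """Raw weighted harmonic mean of SNP-level P values within one gene.

    pvals must be strictly positive. Weights default to equal values that
    sum to 1 (an assumption made for this sketch).
    """
    k = len(pvals)
    w = weights or [1.0 / k] * k
    return sum(w) / sum(wi / p for wi, p in zip(w, pvals))


if __name__ == "__main__":
    # Hypothetical SNP-level P values for one gene with a single strong signal.
    gene_pvals = [0.0004, 0.31, 0.58, 0.77, 0.92]
    print(f"raw harmonic mean P = {harmonic_mean_p(gene_pvals):.4g}")
```

Because the harmonic mean is dominated by the smallest P values, a single strong SNP can drive the gene-level statistic, which is consistent with the abstract's observation that this family of methods behaves well under a sparse genetic architecture. No linkage disequilibrium matrix enters the calculation, which is what "linkage-disequilibrium-free" refers to.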
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Yuchen Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| |
|
124
|
Cai Z, Lei J, Roeder K. Model-free prediction test with application to genomics data. Proc Natl Acad Sci U S A 2022; 119:e2205518119. [PMID: 35969737 PMCID: PMC9407618 DOI: 10.1073/pnas.2205518119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 07/20/2022] [Indexed: 11/18/2022] Open
Abstract
Testing the significance of predictors in a regression model is one of the most important topics in statistics. This problem is especially difficult without any parametric assumptions on the data. This paper aims to test the null hypothesis that given confounding variables Z, X does not significantly contribute to the prediction of Y under the model-free setting, where X and Z are possibly high dimensional. We propose a general framework that first fits nonparametric machine learning regression algorithms on [Formula: see text] and [Formula: see text], then compares the prediction power of the two models. The proposed method allows us to leverage the strength of the most powerful regression algorithms developed in the modern machine learning community. The P value for the test can be easily obtained by permutation. In simulations, we find that the proposed method is more powerful compared to existing methods. The proposed method allows us to draw biologically meaningful conclusions from two gene expression data analyses without strong distributional assumptions: 1) testing the prediction power of sequencing RNA for the proteins in cellular indexing of transcriptomes and epitopes by sequencing data and 2) identification of spatially variable genes in spatially resolved transcriptomics data.
Affiliation(s)
- Zhanrui Cai
- Department of Statistics, Iowa State University, Ames, IA 50011
- Jing Lei
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
- Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
125
Long M, Li Z, Zhang W, Li Q. The Cauchy Combination Test under Arbitrary Dependence Structures. Am Stat 2022. [DOI: 10.1080/00031305.2022.2116109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Affiliation(s)
- Mingya Long
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences
- Wei Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences
- Qizhai Li
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences
126
Shi C, Zhu J, Shen Y, Luo S, Zhu H, Song R. Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2110876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Affiliation(s)
- Ye Shen
- North Carolina State University
- Hongtu Zhu
- University of North Carolina at Chapel Hill
127
Xiong P, Hu T. On Samuel’s p-value model and the Simes test under dependence. Stat Probab Lett 2022. [DOI: 10.1016/j.spl.2022.109509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
128
Li Y, Zhou X, Cao H. Statistical analysis of spatially resolved transcriptomic data by incorporating multiomics auxiliary information. Genetics 2022; 221:iyac095. [PMID: 35731210 PMCID: PMC9339334 DOI: 10.1093/genetics/iyac095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 06/14/2022] [Indexed: 11/13/2022] Open
Abstract
Effective control of false discovery rate is key for multiplicity problems. Here, we consider incorporating informative covariates from external datasets in the multiple testing procedure to boost statistical power while maintaining false discovery rate control. In particular, we focus on the statistical analysis of innovative high-dimensional spatial transcriptomic data while incorporating external multiomics data that provide distinct but complementary information to the detection of spatial expression patterns. We extend OrderShapeEM, an efficient covariate-assisted multiple testing procedure that incorporates one auxiliary study, to make it permissible to incorporate multiple external omics studies, to boost statistical power of spatial expression pattern detection. Specifically, we first use a recently proposed computationally efficient statistical analysis method, spatial pattern recognition via kernels, to produce the primary test statistics for spatial transcriptomic data. Afterwards, we construct the auxiliary covariate by combining information from multiple external omics studies, such as bulk and single-cell RNA-seq data using the Cauchy combination rule. Finally, we extend and implement the integrative analysis method OrderShapeEM on the primary P-values along with auxiliary data incorporating multiomics information for efficient covariate-assisted spatial expression analysis. We conduct a series of realistic simulations to evaluate the performance of our method with known ground truth. Four case studies in mouse olfactory bulb, mouse cerebellum, human breast cancer, and human heart tissues further demonstrate the substantial power gain of our method in detecting genes with spatial expression patterns compared to existing classic approaches that do not utilize any external information.
Affiliation(s)
- Yan Li
- School of Mathematics, Jilin University, Changchun, Jilin 130012, China
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Hongyuan Cao
- School of Mathematics, Jilin University, Changchun, Jilin 130012, China
- Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
129
Song S, Sun H, Liu JS, Hou L. Multi-Cell-Type Openness-Weighted Association Studies for Trait-Associated Genomic Segments Prioritization. Genes (Basel) 2022; 13:1220. [PMID: 35886003 PMCID: PMC9323627 DOI: 10.3390/genes13071220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Revised: 06/30/2022] [Accepted: 07/03/2022] [Indexed: 02/01/2023] Open
Abstract
Openness-weighted association study (OWAS) is a method that leverages the in silico prediction of chromatin accessibility to prioritize genome-wide association studies (GWAS) signals, and can provide novel insights into the roles of non-coding variants in complex diseases. A prerequisite to apply OWAS is to choose a trait-related cell type beforehand. However, for most complex traits, the trait-relevant cell types remain elusive. In addition, many complex traits involve multiple related cell types. To address these issues, we develop OWAS-joint, an efficient framework that aggregates predicted chromatin accessibility across multiple cell types, to prioritize disease-associated genomic segments. In simulation studies, we demonstrate that OWAS-joint achieves a greater statistical power compared to OWAS. Moreover, the heritability explained by OWAS-joint segments is higher than or comparable to OWAS segments. OWAS-joint segments also have high replication rates in independent replication cohorts. Applying the method to six complex human traits, we demonstrate the advantages of OWAS-joint over a single-cell-type OWAS approach. We highlight that OWAS-joint enhances the biological interpretation of disease mechanisms, especially for non-coding regions.
Affiliation(s)
- Shuang Song
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Hongyi Sun
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Jun S. Liu
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA
- Lin Hou
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
130
Reay WR, Geaghan MP, Cairns MJ. The genetic architecture of pneumonia susceptibility implicates mucin biology and a relationship with psychiatric illness. Nat Commun 2022; 13:3756. [PMID: 35768473 PMCID: PMC9243103 DOI: 10.1038/s41467-022-31473-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/09/2021] [Accepted: 06/17/2022] [Indexed: 01/25/2023] Open
Abstract
Pneumonia remains one of the leading causes of death worldwide. In this study, we use genome-wide meta-analysis of lifetime pneumonia diagnosis (N = 391,044) to identify four association signals outside of the previously implicated major histocompatibility complex region. Integrative analyses and finemapping of these signals support clinically tractable targets, including the mucin MUC5AC and tumour necrosis factor receptor superfamily member TNFRSF1A. Moreover, we demonstrate widespread evidence of genetic overlap with pneumonia susceptibility across the human phenome, including particularly significant correlations with psychiatric phenotypes that remain significant after testing differing phenotype definitions for pneumonia or genetically conditioning on smoking behaviour. Finally, we show how polygenic risk could be utilised for precision treatment formulation or drug repurposing through pneumonia risk scores constructed using variants mapped to pathways with known drug targets. In summary, we provide insights into the genetic architecture of pneumonia susceptibility and genetics informed targets for drug development or repositioning.
Affiliation(s)
- William R Reay
- School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Program, Hunter Medical Research Institute, Newcastle, NSW, 2305, Australia
- Michael P Geaghan
- School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Program, Hunter Medical Research Institute, Newcastle, NSW, 2305, Australia
- Murray J Cairns
- School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Program, Hunter Medical Research Institute, Newcastle, NSW, 2305, Australia
131
Huang C, Callahan BJ, Wu MC, Holloway ST, Brochu H, Lu W, Peng X, Tzeng JY. Phylogeny-guided microbiome OTU-specific association test (POST). Microbiome 2022; 10:86. [PMID: 35668471 PMCID: PMC9171974 DOI: 10.1186/s40168-022-01266-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Accepted: 04/01/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND The relationship between host conditions and microbiome profiles, typically characterized by operational taxonomic units (OTUs), contains important information about the microbial role in human health. Traditional association testing frameworks are challenged by the high dimensionality and sparsity of typical microbiome profiles. Phylogenetic information is often incorporated to address these challenges with the assumption that evolutionarily similar taxa tend to behave similarly. However, this assumption may not always be valid due to the complex effects of microbes, and phylogenetic information should be incorporated in a data-supervised fashion. RESULTS In this work, we propose a local collapsing test called phylogeny-guided microbiome OTU-specific association test (POST). In POST, whether or not to borrow information and how much information to borrow from the neighboring OTUs in the phylogenetic tree are supervised by phylogenetic distance and the outcome-OTU association. POST is constructed under the kernel machine framework to accommodate complex OTU effects and extends kernel machine microbiome tests from community level to OTU level. Using simulation studies, we show that when the phylogenetic tree is informative, POST has better performance than existing OTU-level association tests. When the phylogenetic tree is not informative, POST achieves similar performance as existing methods. Finally, in real data applications on bacterial vaginosis and on preterm birth, we find that POST can identify similar or more outcome-associated OTUs that are of biological relevance compared to existing methods. CONCLUSIONS Using POST, we show that adaptively leveraging the phylogenetic information can enhance the selection performance of associated microbiome features by improving the overall true-positive and false-positive detection. We developed a user friendly R package POSTm which is freely available on CRAN ( https://CRAN.R-project.org/package=POSTm ). 
Affiliation(s)
- Caizhi Huang
- Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
- Benjamin J Callahan
- Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
- Department of Population Health and Pathobiology, North Carolina State University, Raleigh, 27607, USA
- Michael C Wu
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, 98109, USA
- Shannon T Holloway
- Department of Statistics, North Carolina State University, Raleigh, 27606, USA
- Hayden Brochu
- Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
- Department of Molecular Biomedical Sciences, North Carolina State University, Raleigh, 27607, USA
- Wenbin Lu
- Department of Statistics, North Carolina State University, Raleigh, 27606, USA
- Xinxia Peng
- Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
- Department of Molecular Biomedical Sciences, North Carolina State University, Raleigh, 27607, USA
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
- Department of Statistics, North Carolina State University, Raleigh, 27606, USA
132
Khunsriraksakul C, McGuire D, Sauteraud R, Chen F, Yang L, Wang L, Hughey J, Eckert S, Weissenkampen JD, Shenoy G, Marx O, Carrel L, Jiang B, Liu DJ. Integrating 3D genomic and epigenomic data to enhance target gene discovery and drug repurposing in transcriptome-wide association studies. Nat Commun 2022; 13:3258. [PMID: 35672318 PMCID: PMC9171100 DOI: 10.1038/s41467-022-30956-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/20/2021] [Accepted: 05/25/2022] [Indexed: 02/08/2023] Open
Abstract
Transcriptome-wide association studies (TWAS) are popular approaches to test for association between imputed gene expression levels and traits of interest. Here, we propose an integrative method PUMICE (Prediction Using Models Informed by Chromatin conformations and Epigenomics) to integrate 3D genomic and epigenomic data with expression quantitative trait loci (eQTL) to more accurately predict gene expressions. PUMICE helps define and prioritize regions that harbor cis-regulatory variants, which outperforms competing methods. We further describe an extension to our method PUMICE+, which jointly combines TWAS results from single- and multi-tissue models. Across 79 traits, PUMICE+ identifies 22% more independent novel genes and increases median chi-square statistics values at known loci by 35% compared to the second-best method, as well as achieves the narrowest credible interval size. Lastly, we perform computational drug repurposing and confirm that PUMICE+ outperforms other TWAS methods.
Affiliation(s)
- Chachrit Khunsriraksakul
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Daniel McGuire
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Renan Sauteraud
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Fang Chen
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Lina Yang
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Lida Wang
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Jordan Hughey
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Scott Eckert
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- J. Dylan Weissenkampen
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Ganesh Shenoy
- Department of Neurosurgery, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Olivia Marx
- Biomedical Science Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Laura Carrel
- Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Bibo Jiang
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Dajiang J. Liu
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
133
Loper JH, Lei L, Fithian W, Tansey W. Smoothed Nested Testing on Directed Acyclic Graphs. Biometrika 2022; 109:457-471. [PMID: 38694183 PMCID: PMC11061840 DOI: 10.1093/biomet/asab041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2024] Open
Abstract
We consider the problem of multiple hypothesis testing when there is a logical nested structure to the hypotheses. When one hypothesis is nested inside another, the outer hypothesis must be false if the inner hypothesis is false. We model the nested structure as a directed acyclic graph, including chain and tree graphs as special cases. Each node in the graph is a hypothesis and rejecting a node requires also rejecting all of its ancestors. We propose a general framework for adjusting node-level test statistics using the known logical constraints. Within this framework, we study a smoothing procedure that combines each node with all of its descendants to form a more powerful statistic. We prove a broad class of smoothing strategies can be used with existing selection procedures to control the familywise error rate, false discovery exceedance rate, or false discovery rate, so long as the original test statistics are independent under the null. When the null statistics are not independent but are derived from positively-correlated normal observations, we prove control for all three error rates when the smoothing method is arithmetic averaging of the observations. Simulations and an application to a real biology dataset demonstrate that smoothing leads to substantial power gains.
Affiliation(s)
- J. H. Loper
- Department of Neuroscience, Columbia University, 716 Jerome L. Greene Building, New York, New York 10025, U.S.A
- L. Lei
- Department of Statistics, Stanford University, Sequoia Hall, Palo Alto, California 94305, U.S.A
- W. Fithian
- Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, California 94720, U.S.A
- W. Tansey
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 321 E 61st St., New York, New York 10065, U.S.A
134
Zhang W, Liu A, Zhang Z, Chen G, Li Q. An adaptive direction-assisted test for microbiome compositional data. Bioinformatics 2022; 38:3493-3500. [PMID: 35640978 PMCID: PMC9890306 DOI: 10.1093/bioinformatics/btac361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2021] [Revised: 04/11/2022] [Accepted: 05/28/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Microbial communities have been shown to be associated with many complex diseases, such as cancers and cardiovascular diseases. The identification of differentially abundant taxa is clinically important. It can help understand the pathology of complex diseases, and potentially provide preventive and therapeutic strategies. Appropriate differential analyses for microbiome data are challenging due to its unique data characteristics including compositional constraint, excessive zeros and high dimensionality. Most existing approaches either ignore these data characteristics or only account for the compositional constraint by using log-ratio transformations with zero observations replaced by a pseudocount. However, there is no consensus on how to choose a pseudocount. More importantly, ignoring the characteristic of excessive zeros may result in poorly powered analyses and therefore yield misleading findings. RESULTS We develop a novel microbiome-based direction-assisted test for the detection of overall difference in microbial relative abundances between two health conditions, which simultaneously incorporates the characteristics of relative abundance data. The proposed test (i) divides the taxa into two clusters by the directions of mean differences of relative abundances and then combines them at cluster level, in light of the compositional characteristic; and (ii) contains a burden type test, which collapses multiple taxa into a single one to account for excessive zeros. Moreover, the proposed test is an adaptive procedure, which can accommodate high-dimensional settings and yield high power against various alternative hypotheses. We perform extensive simulation studies across a wide range of scenarios to evaluate the proposed test and show its substantial power gain over some existing tests. The superiority of the proposed approach is further demonstrated with real datasets from two microbiome studies. 
AVAILABILITY AND IMPLEMENTATION An R package for MiDAT is available at https://github.com/zhangwei0125/MiDAT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wei Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Aiyi Liu
- To whom correspondence should be addressed.
- Zhiwei Zhang
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Guanjie Chen
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Qizhai Li
- To whom correspondence should be addressed.
135
Bolt MA, MaWhinney S, Pattee JW, Erlandson KM, Badesch DB, Peterson RA. Inference following multiple imputation for generalized additive models: an investigation of the median p-value rule with applications to the Pulmonary Hypertension Association Registry and Colorado COVID-19 hospitalization data. BMC Med Res Methodol 2022; 22:148. [PMID: 35597908 PMCID: PMC9123297 DOI: 10.1186/s12874-022-01613-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 04/21/2022] [Indexed: 11/10/2022] Open
Abstract
Background Missing data prove troublesome in data analysis; at best they reduce a study’s statistical power and at worst they induce bias in parameter estimates. Multiple imputation via chained equations is a popular technique for dealing with missing data. However, techniques for combining and pooling results from fitted generalized additive models (GAMs) after multiple imputation have not been well explored. Methods We simulated missing data under MCAR, MAR, and MNAR frameworks and utilized random forest and predictive mean matching imputation to investigate a variety of rules for combining GAMs after multiple imputation with binary and normally distributed outcomes. We compared multiple pooling procedures including the “D2” method, the Cauchy combination test, and the median p-value (MPV) rule. The MPV rule involves simply computing and reporting the median p-value across all imputations. Other ad hoc methods such as a mean p-value rule and a single imputation method are investigated. The viability of these methods in pooling results from B-splines is also examined for normal outcomes. An application of these various pooling techniques is then performed on two case studies, one which examines the effect of elevation on a six-minute walk distance (a normal outcome) for patients with pulmonary arterial hypertension, and the other which examines risk factors for intubation in hospitalized COVID-19 patients (a dichotomous outcome). Results In comparison to the results from generalized additive models fit on full datasets, the median p-value rule performs as well as if not better than the other methods examined. In situations where the alternative hypothesis is true, the Cauchy combination test appears overpowered and alternative methods appear underpowered, while the median p-value rule yields results similar to those from analyses of complete data. 
Conclusions For pooling results after fitting GAMs to multiply imputed datasets, the median p-value is a simple yet useful approach which balances both power to detect important associations and control of Type I errors. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01613-w.
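A minimal sketch of the median p-value (MPV) rule described above, alongside the ad hoc mean p-value rule it is compared with. The function names and the example p-values are hypothetical, chosen only for illustration; they are not taken from the study.

```python
import statistics


def median_p_value(pvalues):
    """MPV rule: report the median p-value across imputed datasets."""
    return statistics.median(pvalues)


def mean_p_value(pvalues):
    """Ad hoc alternative: report the mean p-value across imputed datasets."""
    return statistics.mean(pvalues)


# Hypothetical p-values for one model term, one per imputed dataset.
imputation_pvalues = [0.04, 0.01, 0.20, 0.03, 0.06]
pooled_median = median_p_value(imputation_pvalues)
pooled_mean = mean_p_value(imputation_pvalues)
```

In this made-up example the single large p-value (0.20) pulls the mean well above the median, which illustrates why the two rules can lead to different conclusions at a fixed significance level.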
Affiliation(s)
- Matthew A Bolt
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado-Denver Anschutz Medical Campus, 13001 E. 17th Pl, Aurora, CO, USA
- Samantha MaWhinney
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado-Denver Anschutz Medical Campus, 13001 E. 17th Pl, Aurora, CO, USA
- Jack W Pattee
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado-Denver Anschutz Medical Campus, 13001 E. 17th Pl, Aurora, CO, USA
- Kristine M Erlandson
- School of Medicine, University of Colorado-Denver Anschutz Medical Campus, Aurora, CO, USA
- David B Badesch
- School of Medicine, University of Colorado-Denver Anschutz Medical Campus, Aurora, CO, USA
- Ryan A Peterson
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado-Denver Anschutz Medical Campus, 13001 E. 17th Pl, Aurora, CO, USA
136
Zhu Z, Satten GA, Hu YJ. Integrative analysis of relative abundance data and presence-absence data of the microbiome using the LDM. Bioinformatics 2022; 38:2915-2917. [PMID: 35561163 PMCID: PMC9113255 DOI: 10.1093/bioinformatics/btac181] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/16/2022] [Revised: 03/11/2022] [Accepted: 03/22/2022] [Indexed: 02/03/2023] Open
Abstract
SUMMARY We previously developed the LDM for testing hypotheses about the microbiome that performs the test at both the community level and the individual taxon level. The LDM can be applied to relative abundance data and presence-absence data separately, which work well when associated taxa are abundant and rare, respectively. Here, we propose LDM-omni3 that combines LDM analyses at the relative abundance and presence-absence data scales, thereby offering optimal power across scenarios with different association mechanisms. The new LDM-omni3 test is available for the wide range of data types and analyses that are supported by the LDM. AVAILABILITY AND IMPLEMENTATION The LDM-omni3 test has been added to the R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhengyi Zhu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Glen A Satten
- Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Yi-Juan Hu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
137
Cinar O, Viechtbauer W. A Comparison of Methods for Gene-Based Testing That Account for Linkage Disequilibrium. Front Genet 2022; 13:867724. [PMID: 35601489 PMCID: PMC9117705 DOI: 10.3389/fgene.2022.867724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 04/07/2022] [Indexed: 11/16/2022] Open
Abstract
Controlling the type I error rate while retaining sufficient power is a major concern in genome-wide association studies, which nowadays often examine more than a million single-nucleotide polymorphisms (SNPs) simultaneously. Methods such as the Bonferroni correction can lead to a considerable decrease in power due to the large number of tests conducted. Shifting the focus to higher functional structures (e.g., genes) can reduce the loss of power. This can be accomplished via the combination of p-values of SNPs that belong to the same structural unit to test their joint null hypothesis. However, standard methods for this purpose (e.g., Fisher’s method) do not account for the dependence among the tests due to linkage disequilibrium (LD). In this paper, we review various adjustments to methods for combining p-values that take LD information explicitly into consideration and evaluate their performance in a simulation study based on data from the HapMap project. The results illustrate the importance of incorporating LD information into the methods for controlling the type I error rate at the desired level. Furthermore, some methods are more successful in controlling the type I error rate than others. Among them, Brown’s method was the most robust technique with respect to the characteristics of the genes and outperformed the Bonferroni method in terms of power in many scenarios. Examining the genetic factors of a phenotype of interest at the gene level rather than the SNP level can provide researchers benefits in terms of the power of the study. While doing so, one should be careful to account for LD in SNPs belonging to the same gene, for which Brown’s method seems the most robust technique.
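For orientation, the two unadjusted baselines named in the abstract can be sketched as follows. This is a minimal illustration assuming independent tests, which is exactly the assumption that LD violates; it does not implement Brown's method or any other LD adjustment. The function names are hypothetical. Fisher's statistic is X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under independence; for even degrees of freedom the upper tail has the closed form exp(-X/2) Σ_{j<k} (X/2)^j / j!.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method for k independent p-values, each in (0, 1]."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    # Closed-form chi-square upper tail for 2k degrees of freedom.
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


def bonferroni_combined_p(pvalues):
    """Bonferroni-style gene-level p-value: k times the smallest p, capped at 1."""
    return min(1.0, len(pvalues) * min(pvalues))
```

For two p-values the Fisher result reduces to p1·p2·(1 − ln(p1·p2)), which is a convenient check on the closed form.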
|
138
|
Wang M, Zhang S, Sha Q. A computationally efficient clustering linear combination approach to jointly analyze multiple phenotypes for GWAS. PLoS One 2022; 17:e0260911. [PMID: 35482827 PMCID: PMC9049312 DOI: 10.1371/journal.pone.0260911] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/17/2021] [Accepted: 04/13/2022] [Indexed: 11/18/2022] Open
Abstract
There has been increasing interest in joint analysis of multiple phenotypes in genome-wide association studies (GWAS) because jointly analyzing multiple phenotypes may increase statistical power to detect genetic variants associated with complex diseases or traits. Recently, many statistical methods have been developed for joint analysis of multiple phenotypes in genetic association studies, including the Clustering Linear Combination (CLC) method. The CLC method works particularly well with phenotypes that have natural groupings, but because the number of clusters for a given data set is unknown, the final test statistic of the CLC method is the minimum p-value among all p-values of the CLC test statistics obtained from each possible number of clusters. Therefore, a simulation procedure needs to be used to evaluate the p-value of the final test statistic. This makes the CLC method computationally demanding. We develop a new method called computationally efficient CLC (ceCLC) to test the association between multiple phenotypes and a genetic variant. Instead of using the minimum p-value as the test statistic in the CLC method, ceCLC uses the Cauchy combination test to combine all p-values of the CLC test statistics obtained from each possible number of clusters. The test statistic of ceCLC approximately follows a standard Cauchy distribution, so the p-value can be obtained from the cumulative distribution function without the need for the simulation procedure. Through extensive simulation studies and an application to the COPDGene data, the results demonstrate that the type I error rates of ceCLC are effectively controlled in different simulation settings and that ceCLC either outperforms all other methods or has statistical power that is very close to the most powerful method with which it has been compared.
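The Cauchy combination step described in this abstract can be sketched in a few lines of Python. The sketch takes the per-cluster-count p-values as given (how ceCLC produces them is not shown) and assumes equal weights unless others are supplied; the function name and the weighting default are assumptions for illustration, not the authors' implementation.

```python
import math


def cauchy_combination_pvalue(pvalues, weights=None):
    """Cauchy combination test: T = sum_i w_i * tan((0.5 - p_i) * pi).

    Under the null, T is approximately standard Cauchy, so the
    combined p-value is read off the Cauchy distribution function
    with no simulation step.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0 / k] * k  # assumption: equal weights
    if len(weights) != k or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per p-value")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, pvalues))
    # Upper tail of the standard Cauchy distribution at `stat`.
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values, one per candidate number of clusters.
    print(cauchy_combination_pvalue([0.01, 0.5]))
```

Because the combined p-value comes from a closed-form tail probability, the cost is linear in the number of p-values, which is the computational advantage the abstract attributes to ceCLC over the minimum-p-value approach.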
Affiliation(s)
- Meida Wang
- Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
- Shuanglin Zhang
- Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
- Qiuying Sha
- Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
|
139
|
Yu X, Li D, Xue L, Li R. Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2061354] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
Affiliation(s)
- Runze Li
- The Pennsylvania State University
|
140
|
Dietlein F, Wang AB, Fagre C, Tang A, Besselink NJM, Cuppen E, Li C, Sunyaev SR, Neal JT, Van Allen EM. Genome-wide analysis of somatic noncoding mutation patterns in cancer. Science 2022; 376:eabg5601. [PMID: 35389777 PMCID: PMC9092060 DOI: 10.1126/science.abg5601] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 01/12/2023]
Abstract
We established a genome-wide compendium of somatic mutation events in 3949 whole cancer genomes representing 19 tumor types. Protein-coding events captured well-established drivers. Noncoding events near tissue-specific genes, such as ALB in the liver or KLK3 in the prostate, characterized localized passenger mutation patterns and may reflect tumor-cell-of-origin imprinting. Noncoding events in regulatory promoter and enhancer regions frequently involved cancer-relevant genes such as BCL6, FGFR2, RAD51B, SMC6, TERT, and XBP1 and represent possible drivers. Unlike most noncoding regulatory events, XBP1 mutations primarily accumulated outside the gene's promoter, and we validated their effect on gene expression using CRISPR-interference screening and luciferase reporter assays. Broadly, our study provides a blueprint for capturing mutation events across the entire genome to guide advances in biological discovery, therapies, and diagnostics.
Affiliation(s)
- Felix Dietlein
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA; Cancer Program, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA. Corresponding author (E.M.V.A.; F.D.)
- Alex B. Wang
- Cancer Program, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Christian Fagre
- Cancer Program, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Anran Tang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA; Cancer Program, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Nicolle J. M. Besselink
- Center for Molecular Medicine and Oncode Institute, University Medical Center Utrecht, 3584 CX Utrecht, Netherlands
- Edwin Cuppen
- Center for Molecular Medicine and Oncode Institute, University Medical Center Utrecht, 3584 CX Utrecht, Netherlands; Hartwig Medical Foundation, 1098 XH Amsterdam, Netherlands
- Chunliang Li
- Department of Tumor Cell Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Shamil R. Sunyaev
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- James T. Neal
- Cancer Program, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Eliezer M. Van Allen
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA; Cancer Program, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA. Corresponding author (E.M.V.A.; F.D.)
|
141
|
Hébert F, Causeur D, Emily M. Omnibus testing approach for gene-based gene-gene interaction. Stat Med 2022; 41:2854-2878. [PMID: 35338506 DOI: 10.1002/sim.9389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2020] [Revised: 03/03/2022] [Accepted: 03/04/2022] [Indexed: 11/07/2022]
Abstract
Genetic interaction is considered one of the main heritable components of complex traits. With the emergence of genome-wide association studies (GWAS), a collection of statistical methods dedicated to the identification of interaction at the SNP level has been proposed. More recently, gene-based gene-gene interaction testing has emerged as an attractive alternative, as it confers advantages in both statistical power and biological interpretation. Most gene-based interaction methods rely on multidimensional modeling of the interaction and therefore lack robustness against the huge space of interaction patterns. In this paper, we study a global testing approach to address the issue of gene-based gene-gene interaction. Based on a logistic regression modeling framework, all SNP-SNP interaction tests are combined to produce a gene-level test for interaction. We propose an omnibus test that takes advantage of (1) the heterogeneity between existing global tests and (2) the complementarity between allele-based and genotype-based coding of SNPs. Through an extensive simulation study, it is demonstrated that the proposed omnibus test detects with high power the most common genetic interaction models with one causal pair as well as more complex genetic models in which more than one causal pair is involved. In addition, the proposed approach is shown to be robust and to improve power compared to single global tests in replication studies. Furthermore, the application of our procedure to real datasets confirms the adaptability of our approach in replicating various gene-gene interactions.
Affiliation(s)
- Florian Hébert
- Department of Statistics and Computer Science, Institut Agro, CNRS, IRMAR, Univ Rennes, F-35000, Rennes, France
- David Causeur
- Department of Statistics and Computer Science, Institut Agro, CNRS, IRMAR, Univ Rennes, F-35000, Rennes, France
- Mathieu Emily
- Department of Statistics and Computer Science, Institut Agro, CNRS, IRMAR, Univ Rennes, F-35000, Rennes, France
|
142
|
Statistical Methods with Applications in Data Mining: A Review of the Most Recent Works. Mathematics 2022. [DOI: 10.3390/math10060993] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/04/2022]
Abstract
The importance of statistical methods in finding patterns and trends in otherwise unstructured and complex large sets of data has grown over the past decade, as the amount of data produced keeps growing exponentially and knowledge obtained from understanding data allows quick and informed decisions that save time and provide a competitive advantage. For this reason, we have seen considerable advances over the past few years in statistical methods in data mining. This paper is a comprehensive and systematic review of these recent developments in the area of data mining.
|
143
|
Cao X, Wang X, Zhang S, Sha Q. Gene-based association tests using GWAS summary statistics and incorporating eQTL. Sci Rep 2022; 12:3553. [PMID: 35241742 PMCID: PMC8894384 DOI: 10.1038/s41598-022-07465-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/10/2021] [Accepted: 02/11/2022] [Indexed: 01/29/2023] Open
Abstract
Although genome-wide association studies (GWAS) have been successfully applied to a variety of complex diseases and have identified many genetic variants underlying complex diseases via single-marker tests, a considerable share of the heritability of complex diseases still cannot be explained by GWAS. One alternative approach to overcome the missing heritability caused by genetic heterogeneity is gene-based analysis, which considers the aggregate effects of multiple genetic variants in a single test. Another alternative approach is the transcriptome-wide association study (TWAS). TWAS aggregates genomic information into functionally relevant units that map to genes and their expression. TWAS is not only powerful, but can also increase the interpretability of the biological mechanisms of identified trait-associated genes. In this study, we propose a powerful and computationally efficient gene-based association test, called Overall. Using the extended Simes procedure, Overall aggregates information from three types of traditional gene-based association tests and also incorporates expression quantitative trait locus (eQTL) information into a gene-based association test using GWAS summary statistics. We show that after a small number of replications to estimate the correlation among the integrated gene-based tests, the p values of Overall can be calculated analytically. Simulation studies show that Overall can control type I error rates very well and has higher power than the tests with which we compared it. We also apply Overall to two schizophrenia GWAS summary datasets and two lipids GWAS summary datasets. The results show that this newly developed method can identify more significant genes than the other methods with which we compared it.
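For orientation, the classical Simes combination that the extended procedure builds on can be sketched as follows. This is the textbook form only: the extended Simes procedure used by Overall additionally adjusts for correlation among the tests (for example through an effective number of tests), and that adjustment is not reproduced here. The function name is hypothetical.

```python
def simes_pvalue(pvalues):
    """Classical Simes combined p-value: min over ranks i of
    n * p_(i) / i, where p_(1) <= ... <= p_(n) are sorted p-values."""
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 <= p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    ordered = sorted(pvalues)
    combined = min(n * p / rank
                   for rank, p in enumerate(ordered, start=1))
    return min(1.0, combined)


if __name__ == "__main__":
    # Hypothetical p-values from three gene-based tests of one gene.
    print(simes_pvalue([0.01, 0.04, 0.03]))
```

Unlike Bonferroni, which looks only at the smallest p-value, the Simes rule also rewards several moderately small p-values, which is one reason it is a common choice for aggregating related gene-based tests.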
Affiliation(s)
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
- Xuexia Wang
- Department of Mathematics, University of North Texas, Denton, TX, USA
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA.
|
144
|
Ge Y, Chen G, Waltz JA, Hong LE, Kochunov P, Chen S. An integrated cluster-wise significance measure for fMRI analysis. Hum Brain Mapp 2022; 43:2444-2459. [PMID: 35233859 PMCID: PMC9057103 DOI: 10.1002/hbm.25795] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/25/2021] [Revised: 12/31/2021] [Accepted: 01/17/2022] [Indexed: 11/07/2022] Open
Abstract
Cluster-wise inference is widely used in fMRI analysis. The cluster-level statistic is often obtained by counting the number of intra-cluster voxels which surpass a voxel-level statistical significance threshold. This measure can be sub-optimal regarding the power and false-positive error rate because the suprathreshold voxel count neglects the voxel-wise significance levels and ignores the dependence between voxels. This article aims to provide a new Integrated Cluster-wise significance Measure (ICM) for cluster-level significance determination in cluster-wise fMRI analysis by integrating cluster extent, voxel-level significance (e.g., p values), and activation dependence between within-cluster voxels. We develop a computationally efficient strategy for ICM based on probabilistic approximation theories. Consequently, the computational load for ICM-based cluster-wise inference (e.g., permutation tests) is affordable. We validate the proposed method via extensive simulations and then apply it to two fMRI data sets. The results demonstrate that ICM can improve the power with well-controlled family-wise error (FWE).
Affiliation(s)
- Yunjiang Ge
- Department of Mathematics, University of Maryland-College Park, College Park, Maryland, USA
- Gang Chen
- Scientific and Statistical Computing Core, National Institute of Mental Health, National Institute of Health, Bethesda, Maryland, USA
- James A Waltz
- Maryland Psychiatric Research Center, Department of Psychiatry, School of Medicine, University of Maryland, Catonsville, Maryland, USA
- Liyi Elliot Hong
- Maryland Psychiatric Research Center, Department of Psychiatry, School of Medicine, University of Maryland, Catonsville, Maryland, USA
- Peter Kochunov
- Maryland Psychiatric Research Center, Department of Psychiatry, School of Medicine, University of Maryland, Catonsville, Maryland, USA
- Shuo Chen
- Maryland Psychiatric Research Center, Department of Psychiatry, School of Medicine, University of Maryland, Catonsville, Maryland, USA; Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore, Maryland, USA
|
145
|
Chen Z. Robust tests for combining p-values under arbitrary dependency structures. Sci Rep 2022; 12:3158. [PMID: 35210502 PMCID: PMC8873210 DOI: 10.1038/s41598-022-07094-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/07/2021] [Accepted: 02/11/2022] [Indexed: 12/16/2022] Open
Abstract
Recently, Liu and Xie proposed a p-value combination test based on the Cauchy distribution (CCT). They showed that when the significance levels are small, CCT can control the type I error rate and the resulting p-value can be simply approximated using a Cauchy distribution. One very special and attractive property of CCT is that it is applicable to situations where the p-values to be combined are dependent. However, in this paper, we show that under some conditions the commonly used MinP test is much more powerful than CCT. In addition, in some other situations, CCT has no power at all. Therefore, CCT should be used with caution. We also propose new robust p-value combination tests that use a second MinP or CCT to combine the dependent p-values obtained from CCT and MinP applied to the original p-values. We call the new tests MinP-CCT-MinP (MCM) and CCT-MinP-CCT (CMC). We study the performance of the new tests by comparing them with CCT and MinP in a comprehensive simulation study. Our study shows that the proposed tests, MCM and CMC, are robust and powerful under many conditions, and can be considered as alternatives to CCT or MinP.
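The two-stage construction can be illustrated with a small Python sketch of the CMC idea: apply CCT and MinP to the original p-values, then combine those two results with a second CCT. Two assumptions are made loudly here. First, the MinP p-value is replaced by its Bonferroni bound, because the abstract does not say how the paper calibrates MinP under dependence. Second, the example input (one very small p-value next to one very close to 1) is a hypothetical case chosen because a p-value near 1 contributes a large negative Cauchy term; whether this is the condition studied in the paper is not stated in the abstract. All function names are illustrative.

```python
import math


def cct(pvalues):
    """Equal-weight Cauchy combination p-value."""
    stat = sum(math.tan((0.5 - p) * math.pi) for p in pvalues)
    stat /= len(pvalues)
    return 0.5 - math.atan(stat) / math.pi


def minp_bonferroni(pvalues):
    """Bonferroni bound on the MinP p-value (an assumption standing
    in for whatever calibration the paper actually uses)."""
    return min(1.0, len(pvalues) * min(pvalues))


def cmc_sketch(pvalues):
    """CCT-MinP-CCT: a second CCT applied to the CCT and MinP results."""
    return cct([cct(pvalues), minp_bonferroni(pvalues)])


if __name__ == "__main__":
    pvals = [1e-4, 0.9999]  # hypothetical: one strong signal, one p near 1
    print("CCT :", cct(pvals))              # the two Cauchy terms cancel
    print("MinP:", minp_bonferroni(pvals))  # still driven by the signal
    print("CMC :", cmc_sketch(pvals))       # recovers most of MinP's evidence
```

In this hypothetical input the large positive and large negative Cauchy terms cancel, so CCT alone returns a p-value near 0.5, while the second-stage combination stays small because the MinP component carries the signal.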
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, 1025 E. 7th Street, Bloomington, IN, 47405, USA.
|
146
|
Zhang H, Wu Z. The generalized Fisher's combination and accurate p-value calculation under dependence. Biometrics 2022. [PMID: 35178716 DOI: 10.1111/biom.13634] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/17/2021] [Accepted: 02/03/2022] [Indexed: 11/28/2022]
Abstract
Combining dependent tests of significance has broad applications, but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (e.g., Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially less than 0.05. The problem could lead to significant false discoveries in big data analyses. This paper provides two main contributions. First, it presents a general family of Fisher-type statistics, referred to as the GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, and the weighted Z-score combination. The GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts the weights and the statistic-defining parameters to the given data. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under the multivariate Gaussian distribution, and more robust under the generalized linear model and the multivariate t-distribution. The applications of the GFisher and the new p-value calculation methods are demonstrated by a gene-based SNP-set association study. Relevant computation has been implemented in an R package, GFisher, available on the Comprehensive R Archive Network.
Affiliation(s)
- Hong Zhang
- Biostatistics and Research Decision Sciences, Merck Research Laboratories, Rahway, New Jersey, U.S.A
- Zheyang Wu
- Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, U.S.A
|
147
|
Vovk V, Wang B, Wang R. Admissible ways of merging p-values under arbitrary dependence. Ann Stat 2022. [DOI: 10.1214/21-aos2109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Vladimir Vovk
- Department of Computer Science, Royal Holloway, University of London
- Bin Wang
- RCSDS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
- Ruodu Wang
- Department of Statistics and Actuarial Science, University of Waterloo
|
148
|
Fan Q, Sun S, Li YJ. Precisely modeling zero-inflated count phenotype for rare variants. Genet Epidemiol 2022; 46:73-86. [PMID: 34779034 PMCID: PMC9615426 DOI: 10.1002/gepi.22438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 08/12/2021] [Accepted: 10/11/2021] [Indexed: 02/03/2023]
Abstract
Count data with excessive zeros are increasingly ubiquitous in genetic association studies, such as neuritic plaques in brain pathology for Alzheimer's disease. Here, we developed gene-based association tests to model such data by a mixture of two distributions, one for the structural zeros contributed by the Binomial distribution, and the other for the counts from the Poisson distribution. We derived the score statistics of the corresponding parameter of the rare variants in the zero-inflated Poisson regression model, and then constructed burden (ZIP-b) and kernel (ZIP-k) tests for the association tests. We evaluated omnibus tests that combined both ZIP-b and ZIP-k tests. Through simulated sequence data, we illustrated the potential power gain of our proposed method over a two-stage method that analyzes binary and non-zero continuous data separately for both burden and kernel tests. The ZIP burden test outperformed the kernel test as expected in all scenarios except for the scenario of variants with a mixture of directions in the genetic effects. We further demonstrated its applications to analyses of the neuritic plaque data in the ROSMAP cohort. We expect our proposed test to be useful in practice as more powerful than or complementary to the two-stage method.
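The two-component mixture described in this abstract corresponds to the standard zero-inflated Poisson (ZIP) probability mass function, sketched below in Python. The sketch shows only the distribution itself; the score statistics and the ZIP-b and ZIP-k tests derived in the paper are not implemented. The parameter names `pi_zero` (probability of a structural zero) and `lam` (Poisson mean) are illustrative choices.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson pmf.

    With probability pi_zero the count is a structural zero;
    otherwise it is drawn from Poisson(lam), which can also yield 0.
    """
    if not (0.0 <= pi_zero <= 1.0):
        raise ValueError("pi_zero must lie in [0, 1]")
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


if __name__ == "__main__":
    # Hypothetical parameters: 30% structural zeros, Poisson mean 2.
    for count in range(4):
        print(count, zip_pmf(count, 0.3, 2.0))
```

The zero count receives mass from both components, which is why modeling such data with a plain Poisson regression tends to underestimate the frequency of zeros.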
Affiliation(s)
- Qiao Fan
- Duke-NUS Medical School, Centre for Quantitative Medicine, National University of Singapore, Singapore, Singapore
- Shuming Sun
- Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, North Carolina, USA
- Yi-Ju Li
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina, USA
|
149
|
An adaptive combination method for Cauchy variable based on optimal threshold. J Genet 2022. [DOI: 10.1007/s12041-021-01351-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
150
|
Wang T, Ling W, Plantinga AM, Wu MC, Zhan X. Testing microbiome association using integrated quantile regression models. Bioinformatics 2022; 38:419-425. [PMID: 34554223 PMCID: PMC10060731 DOI: 10.1093/bioinformatics/btab668] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/29/2021] [Revised: 08/24/2021] [Accepted: 09/18/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Most existing microbiome association analyses focus on the association between the microbiome and the conditional mean of health or disease-related outcomes, and within this vein, a vast number of computational tools and methods have been devised for standard binary or continuous outcomes. However, these methods tend to be limited either when the underlying microbiome-outcome association occurs somewhere other than the mean level, or when the distribution of the outcome variable is irregular (e.g. zero-inflated or mixtures) such that the conditional outcome mean is less meaningful. We address this gap by investigating association analysis between microbiome compositions and conditional outcome quantiles. RESULTS We introduce a new association analysis tool named MiRKAT-IQ within the Microbiome Regression-based Kernel Association Test framework using Integrated Quantile regression models to examine the association between the microbiome and the distribution of the outcome. For an individual quantile, we utilize the existing kernel machine regression framework to examine the association between that conditional outcome quantile and a group of microbial features (e.g. microbiome community compositions). Then, the goal of examining microbiome association with the whole outcome distribution is achieved by integrating all outcome conditional quantiles over a process, and thus our new MiRKAT-IQ test is robust to both the location of association signals (e.g. mean, variance, median) and the heterogeneous distribution of the outcome. Extensive numerical simulation studies have been conducted to show the validity of the new MiRKAT-IQ test. We demonstrate the potential usefulness of MiRKAT-IQ with applications to actual biological data collected from a previous microbiome study. AVAILABILITY AND IMPLEMENTATION R code implementing the proposed methodology is provided in the MiRKAT package, which is available on CRAN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tianying Wang
- Center for Statistical Science, Tsinghua University, Beijing 100084, China
- Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Wodan Ling
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Anna M Plantinga
- Department of Mathematics and Statistics, Williams College, Williamstown, MA 01267, USA
- Michael C Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Xiang Zhan
- Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China
|