1. Sun R, Shi A, Lin X. Differences in set-based tests for sparse alternatives when testing sets of outcomes compared to sets of explanatory factors in genetic association studies. Biostatistics 2023; 25:171-187. [PMID: 36000269 PMCID: PMC10724113 DOI: 10.1093/biostatistics/kxac036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Revised: 07/15/2022] [Accepted: 08/07/2022] [Indexed: 01/11/2023]
Abstract
Set-based association tests are widely popular in genetic association settings for their ability to aggregate weak signals and reduce multiple testing burdens. In particular, a class of set-based tests including the Higher Criticism, Berk-Jones, and other statistics have recently been popularized for reaching a so-called detection boundary when signals are rare and weak. Such tests have been applied in two subtly different settings: (a) associating a genetic variant set with a single phenotype and (b) associating a single genetic variant with a phenotype set. A significant issue in practice is the choice of test, especially when deciding between innovated and generalized type methods for detection boundary tests. Conflicting guidance is present in the literature. This work describes how correlation structures generate marked differences in relative operating characteristics for settings (a) and (b). The implications for study design are significant. We also develop novel power bounds that facilitate the aforementioned calculations and allow for analysis of individual testing settings. In more concrete terms, our investigation is motivated by translational expression quantitative trait loci (eQTL) studies in lung cancer. These studies involve both testing for groups of variants associated with a single gene expression (multiple explanatory factors) and testing whether a single variant is associated with a group of gene expressions (multiple outcomes). Results are supported by a collection of simulation studies and illustrated through lung cancer eQTL examples.
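For readers unfamiliar with the Higher Criticism statistic named in this abstract, the following is a minimal sketch of its basic form for independent p-values. It is an illustration only: the function name `higher_criticism` is hypothetical, the maximum is taken over all ordered p-values, and the innovated and generalized variants for correlated tests that the paper compares are not reproduced here.

```python
import math


def higher_criticism(pvalues):
    """Basic Higher Criticism statistic, assuming independent p-values.

    Illustrative sketch only; not the innovated or generalized variants
    discussed in the abstract, which account for correlation.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")

    ordered = sorted(pvalues)
    best = -math.inf
    for i, p in enumerate(ordered, start=1):
        # Standardized gap between the empirical fraction i/n of p-values
        # at or below p and the fraction p expected under the null.
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


if __name__ == "__main__":
    # One very small p-value among otherwise unremarkable ones.
    print(higher_criticism([0.001, 0.4, 0.6, 0.9]))
    # Evenly spread p-values, as expected when no signal is present.
    print(higher_criticism([0.2, 0.4, 0.6, 0.8]))
```

A set containing a single very small p-value yields a much larger statistic than an evenly spread set, which is the sparse-signal behaviour the abstract refers to. Converting the statistic into a p-value requires its null distribution, which this sketch does not compute.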
Affiliation(s)
- Ryan Sun
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030, USA
- Andy Shi
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02215, USA
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02215, USA
2. Van Buren E, Hu M, Cheng L, Wrobel J, Wilhelmsen K, Su L, Li Y, Wu D. TWO-SIGMA-G: a new competitive gene set testing framework for scRNA-seq data accounting for inter-gene and cell-cell correlation. Brief Bioinform 2022; 23:bbac084. [PMID: 35325048 PMCID: PMC9271221 DOI: 10.1093/bib/bbac084] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 02/09/2022] [Accepted: 02/17/2022] [Indexed: 11/14/2022]
Abstract
We propose TWO-SIGMA-G, a competitive gene set test for scRNA-seq data. TWO-SIGMA-G uses a mixed-effects regression model based on our previously published TWO-SIGMA to test for differential expression at the gene-level. This regression-based model provides flexibility and rigor at the gene-level in (1) handling complex experimental designs, (2) accounting for the correlation between biological replicates and (3) accommodating the distribution of scRNA-seq data to improve statistical inference. Moreover, TWO-SIGMA-G uses a novel approach to adjust for inter-gene-correlation (IGC) at the set-level to control the set-level false positive rate. Simulations demonstrate that TWO-SIGMA-G preserves type-I error and increases power in the presence of IGC compared with other methods. Application to two datasets identified HIV-associated interferon pathways in xenograft mice and pathways associated with Alzheimer's disease progression in humans.
Affiliation(s)
- Eric Van Buren
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Ming Hu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation
- Liang Cheng
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill
  - Frontier Science Center for Immunology and Metabolism, Medical Research Institute, Wuhan University
- John Wrobel
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill
- Kirk Wilhelmsen
  - Departments of Genetics and Neurology, Renaissance Computing Institute, University of North Carolina at Chapel Hill
- Lishan Su
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill
  - Departments of Pharmacology, Microbiology & Immunology, University of Maryland School of Medicine
- Yun Li
  - Department of Biostatistics, The University of North Carolina at Chapel Hill
  - Department of Genetics, The University of North Carolina at Chapel Hill
  - Department of Computer Science, The University of North Carolina at Chapel Hill
- Di Wu
  - Department of Biostatistics, The University of North Carolina at Chapel Hill
  - Department of Computer Science, The University of North Carolina at Chapel Hill
3. Helms L, Marchiano S, Stanaway IB, Hsiang TY, Juliar BA, Saini S, Zhao YT, Khanna A, Menon R, Alakwaa F, Mikacenic C, Morrell ED, Wurfel MM, Kretzler M, Harder JL, Murry CE, Himmelfarb J, Ruohola-Baker H, Bhatraju PK, Gale M, Freedman BS. Cross-validation of SARS-CoV-2 responses in kidney organoids and clinical populations. JCI Insight 2021; 6:e154882. [PMID: 34767537 PMCID: PMC8783682 DOI: 10.1172/jci.insight.154882] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 09/09/2021] [Accepted: 11/10/2021] [Indexed: 11/17/2022]
Abstract
Kidneys are critical target organs of COVID-19, but susceptibility and responses to infection remain poorly understood. Here, we combine SARS-CoV-2 variants with genome-edited kidney organoids and clinical data to investigate tropism, mechanism, and therapeutics. SARS-CoV-2 specifically infects organoid proximal tubules among diverse cell types. Infections produce replicating virus, apoptosis, and disrupted cell morphology, features of which are revealed in the context of polycystic kidney disease. Cross-validation of gene expression patterns in organoids reflects proteomic signatures of COVID-19 in the urine of critically ill patients indicating interferon pathway upregulation. SARS-CoV-2 viral variants alpha, beta, gamma, kappa, and delta exhibit comparable levels of infection in organoids. Infection is ameliorated in ACE2-/- organoids and blocked via treatment with de novo-designed spike binder peptides. Collectively, these studies clarify the impact of kidney infection in COVID-19 as reflected in organoids and clinical populations, enabling assessment of viral fitness and emerging therapies.
Affiliation(s)
- Louisa Helms
  - Department of Medicine
  - Division of Nephrology
  - Kidney Research Institute
  - Institute for Stem Cell and Regenerative Medicine
  - Department of Laboratory Medicine and Pathology
- Silvia Marchiano
  - Department of Medicine
  - Institute for Stem Cell and Regenerative Medicine
  - Department of Laboratory Medicine and Pathology
  - Division of Cardiology
  - Center for Cardiovascular Biology
- Ian B. Stanaway
  - Department of Medicine
  - Division of Nephrology
  - Kidney Research Institute
- Tien-Ying Hsiang
  - Center for Innate Immunity and Immune Disease, Department of Immunology
- Benjamin A. Juliar
  - Department of Medicine
  - Division of Nephrology
  - Kidney Research Institute
  - Institute for Stem Cell and Regenerative Medicine
- Shally Saini
  - Institute for Stem Cell and Regenerative Medicine
  - Department of Biochemistry
- Yan Ting Zhao
  - Institute for Stem Cell and Regenerative Medicine
  - Department of Biochemistry
  - Department of Oral Health Sciences, School of Dentistry, University of Washington School of Medicine, Seattle, Washington, USA
- Akshita Khanna
  - Institute for Stem Cell and Regenerative Medicine
  - Department of Laboratory Medicine and Pathology
  - Center for Cardiovascular Biology
- Rajasree Menon
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Fadhl Alakwaa
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Carmen Mikacenic
  - Department of Medicine
  - Translational Research, Benaroya Research Institute, Seattle, Washington, USA
  - Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington School of Medicine, Seattle, Washington, USA
- Eric D. Morrell
  - Department of Medicine
  - Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington School of Medicine, Seattle, Washington, USA
- Mark M. Wurfel
  - Department of Medicine
  - Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington School of Medicine, Seattle, Washington, USA
- Matthias Kretzler
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Jennifer L. Harder
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Charles E. Murry
  - Department of Medicine
  - Institute for Stem Cell and Regenerative Medicine
  - Department of Laboratory Medicine and Pathology
  - Division of Cardiology
  - Center for Cardiovascular Biology
  - Sana Biotechnology, Seattle, Washington, USA
- Hannele Ruohola-Baker
  - Institute for Stem Cell and Regenerative Medicine
  - Department of Biochemistry
  - Department of Oral Health Sciences, School of Dentistry, University of Washington School of Medicine, Seattle, Washington, USA
  - Department of Bioengineering, University of Washington, Seattle, Washington, USA
- Pavan K. Bhatraju
  - Department of Medicine
  - Kidney Research Institute
  - Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington School of Medicine, Seattle, Washington, USA
- Michael Gale
  - Center for Innate Immunity and Immune Disease, Department of Immunology
- Benjamin S. Freedman
  - Department of Medicine
  - Division of Nephrology
  - Kidney Research Institute
  - Institute for Stem Cell and Regenerative Medicine
  - Department of Laboratory Medicine and Pathology
  - Department of Bioengineering, University of Washington, Seattle, Washington, USA
4. Zeng P, Dai J, Jin S, Zhou X. Aggregating multiple expression prediction models improves the power of transcriptome-wide association studies. Hum Mol Genet 2021; 30:939-951. [PMID: 33615361 DOI: 10.1093/hmg/ddab056] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/23/2020] [Revised: 02/10/2021] [Accepted: 02/15/2021] [Indexed: 12/11/2022]
Abstract
Transcriptome-wide association study (TWAS) is an important integrative method for identifying genes that are causally associated with phenotypes. A key step of TWAS involves the construction of expression prediction models for every gene in turn using its cis-SNPs as predictors. Different TWAS methods rely on different models for gene expression prediction, and each such model makes a distinct modeling assumption that is often suitable for a particular genetic architecture underlying expression. However, the genetic architectures underlying gene expression vary across genes throughout the transcriptome. Consequently, different TWAS methods may be beneficial in detecting genes with distinct genetic architectures. Here, we develop a new method, HMAT, which aggregates TWAS association evidence obtained across multiple gene expression prediction models by leveraging the harmonic mean P-value combination strategy. Because each expression prediction model is suited to capture a particular genetic architecture, aggregating TWAS associations across prediction models as in HMAT improves accurate expression prediction and enables subsequent powerful TWAS analysis across the transcriptome. A key feature of HMAT is its ability to accommodate the correlations among different TWAS test statistics and produce calibrated P-values after aggregation. Through numerical simulations, we illustrated the advantage of HMAT over commonly used TWAS methods as well as ad hoc P-value combination rules such as Fisher's method. We also applied HMAT to analyze summary statistics of nine common diseases. In the real data applications, HMAT was on average 30.6% more powerful compared to the next best method, detecting many new disease-associated genes that were otherwise not identified by existing TWAS approaches. In conclusion, HMAT represents a flexible and powerful TWAS method that enjoys robust performance across a range of genetic architectures underlying gene expression.
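The harmonic mean P-value combination strategy mentioned in this abstract can be sketched as follows. This is a generic illustration, not HMAT itself: the function name `harmonic_mean_pvalue` and the equal default weights are assumptions, and the raw harmonic mean returned here is not a calibrated P-value. The calibration step and the handling of correlated test statistics described in the abstract are not reproduced.

```python
def harmonic_mean_pvalue(pvalues, weights=None):
    """Weighted harmonic mean of p-values (raw, uncalibrated).

    Illustrative sketch only; equal weights are assumed by default and
    no calibration to a valid p-value is applied.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues) or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative, one per p-value")

    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")

    # Small p-values dominate the sum of reciprocals, so one strong
    # signal pulls the combined value down sharply.
    return total_weight / sum(w / p for w, p in zip(weights, pvalues))


if __name__ == "__main__":
    # Hypothetical p-values for one gene under three prediction models.
    print(harmonic_mean_pvalue([0.01, 0.5, 0.8]))
```

Because the reciprocal of the smallest p-value dominates the denominator, the combined value tracks the best-performing model, which matches the abstract's motivation of letting whichever prediction model suits a gene's architecture drive the result.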
Affiliation(s)
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Jing Dai
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Siyi Jin
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
5. Xiao L, Yuan Z, Jin S, Wang T, Huang S, Zeng P. Multiple-Tissue Integrative Transcriptome-Wide Association Studies Discovered New Genes Associated With Amyotrophic Lateral Sclerosis. Front Genet 2020; 11:587243. [PMID: 33329728 PMCID: PMC7714931 DOI: 10.3389/fgene.2020.587243] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/25/2020] [Accepted: 10/26/2020] [Indexed: 12/12/2022]
Abstract
Genome-wide association studies (GWAS) have identified multiple causal genes associated with amyotrophic lateral sclerosis (ALS); however, the genetic architecture of ALS remains largely unknown and a large number of causal genes have yet to be discovered. To fill this gap in part, we implemented an integrative transcriptome-wide association study (TWAS) analysis for ALS to prioritize causal genes, using summary statistics from 80,610 European individuals and 13 GTEx brain tissues as reference transcriptome panels. A summary-level TWAS analysis was first undertaken in each single brain tissue, and then a flexible p-value combination strategy, called summary data-based Cauchy Aggregation TWAS (SCAT), was proposed to pool association signals from the single-tissue TWAS analyses while protecting against highly positive correlation among tests. Extensive simulations demonstrated that SCAT produces well-calibrated p-values for the control of type I error and was often much more powerful in identifying association signals across various scenarios than single-tissue TWAS analysis. Using SCAT, we replicated three ALS-associated genes (i.e., ATXN3, SCFD1, and C9orf72) identified in previous GWASs and discovered five additional genes (i.e., SLC9A8, FAM66D, TRIP11, JUP, and RP11-529H20.6) that were not reported before. Furthermore, we found that the five associations were largely driven by the genes themselves, so they might be new genes related to the risk of ALS. However, further investigations are warranted to verify these results and untangle the pathophysiological function of these genes in the development of ALS.
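The Cauchy aggregation idea behind this abstract can be sketched with the generic Cauchy combination rule: each p-value is mapped to a Cauchy-distributed score, the scores are averaged, and the average is mapped back to a p-value. This is an illustration under stated assumptions rather than the SCAT implementation: the function name `cauchy_combine`, the equal default weights, and the example p-values are all hypothetical.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the generic Cauchy combination rule.

    Illustrative sketch only; equal weights are assumed by default and
    this is not the SCAT software described in the abstract.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues) or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative, one per p-value")

    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")

    # tan((0.5 - p) * pi) follows a standard Cauchy distribution when p
    # is uniform on (0, 1); small p-values map to large positive scores.
    statistic = sum(
        (w / total_weight) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, pvalues)
    )
    # Upper tail probability of a standard Cauchy at the statistic.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # Hypothetical single-tissue TWAS p-values for one gene.
    print(cauchy_combine([0.01, 0.5, 0.8]))
```

The heavy tail of the Cauchy distribution is what makes the tail approximation comparatively insensitive to correlation among the individual tests, which is the property the abstract relies on when pooling single-tissue results.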
Affiliation(s)
- Lishun Xiao
  - Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Siyi Jin
  - Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
- Ting Wang
  - Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
- Shuiping Huang
  - Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ping Zeng
  - Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China