2
Zhang L, Kim I. Finite mixtures of semiparametric Bayesian survival kernel machine regressions: Application to breast cancer gene pathway subgroup analysis. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Lin Zhang
- Department of Statistics, Virginia Tech, Blacksburg, VA, USA
- Inyoung Kim
- Department of Statistics, Virginia Tech, Blacksburg, VA, USA
3
Xu Y, Kim I, Carroll RJ. A hybrid omnibus test for generalized semiparametric single-index models with high-dimensional covariate sets. Biometrics 2019; 75:757-767. [PMID: 30859553 DOI: 10.1111/biom.13054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/14/2017] [Accepted: 02/26/2019] [Indexed: 11/27/2022]
Abstract
Numerous statistical methods have been developed for analyzing high-dimensional data. These methods often focus on variable selection approaches but are limited for the purpose of testing with high-dimensional data. They are often required to have explicit likelihood functions. In this article, we propose a "hybrid omnibus test" for high-dimensional data testing purposes with much weaker requirements. Our hybrid omnibus test is developed under a semiparametric framework where a likelihood function is no longer necessary. Our test is a version of a frequentist-Bayesian hybrid score-type test for a generalized partially linear single-index model, which has a link function being a function of a set of variables through a generalized partially linear single index. We propose an efficient score based on estimating equations, define local tests, and then construct our hybrid omnibus test using local tests. We compare our approach with an empirical-likelihood ratio test and Bayesian inference based on Bayes factors, using simulation studies. Our simulation results suggest that our approach outperforms the others, in terms of type I error, power, and computational cost in both the low- and high-dimensional cases. The advantage of our approach is demonstrated by applying it to genetic pathway data for type II diabetes mellitus.
Affiliation(s)
- Yangyi Xu
- Department of Statistics, Virginia Tech, Blacksburg, Virginia
- Inyoung Kim
- Department of Statistics, Virginia Tech, Blacksburg, Virginia
- Raymond J Carroll
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, Texas; School of Mathematical and Physical Sciences, University of Technology, Sydney, Sydney, Broadway, NSW, Australia
4
Zhang L, Kim I. Semiparametric Bayesian kernel survival model for evaluating pathway effects. Stat Methods Med Res 2018; 28:3301-3317. [PMID: 30289021 DOI: 10.1177/0962280218797360] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/07/2023]
Abstract
Massive amounts of high-dimensional data have been accumulated over the past two decades, which has cultivated increasing interest in identifying gene pathways related to certain biological processes. In particular, since pathway-based analysis has the ability to detect subtle changes of differentially expressed genes that could be missed when using gene-based analysis, detecting the gene pathways that regulate certain diseases can provide new strategies for medical procedures and new targets for drug discovery. Limited work has been carried out, primarily in regression settings, to study the effects of pathways on survival outcomes. Motivated by a breast cancer gene-pathway data set, which exhibits the "small n, large p" characteristics, we propose a semiparametric Bayesian kernel survival model (s-BKSurv) to study the effects of both clinical covariates and gene expression levels within a pathway on survival time. We model the unknown high-dimensional functions of pathways via a Gaussian kernel machine to consider the possibility that genes within the same pathway interact with each other. To address the multiple comparisons problem under a full Bayesian setting, we propose a similarity-dependent procedure based on the Bayes factor to control the family-wise error rate. We demonstrate the superior performance of our approach under various simulation settings and pathways data.
Affiliation(s)
- Lin Zhang
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Inyoung Kim
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
5
Sibley AB, Li Z, Jiang Y, Li YJ, Chan C, Allen A, Owzar K. Facilitating the Calculation of the Efficient Score Using Symbolic Computing. Am Stat 2018; 72:199-205. [DOI: 10.1080/00031305.2017.1392361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Zhiguo Li
- Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC
- Yu Jiang
- Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC
- Yi-Ju Li
- Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC
- Cliburn Chan
- Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC
- Andrew Allen
- Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC
- Kouros Owzar
- Duke Cancer Institute, Duke University Medical Center, Durham, NC
- Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC
6
Marceau R, Lu W, Holloway S, Sale MM, Worrall BB, Williams SR, Hsu FC, Tzeng JY. A Fast Multiple-Kernel Method With Applications to Detect Gene-Environment Interaction. Genet Epidemiol 2015; 39:456-68. [PMID: 26139508 DOI: 10.1002/gepi.21909] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 01/30/2015] [Revised: 05/10/2015] [Accepted: 05/20/2015] [Indexed: 01/27/2023]
Abstract
Kernel machine (KM) models are a powerful tool for exploring associations between sets of genetic variants and complex traits. Although most KM methods use a single kernel function to assess the marginal effect of a variable set, KM analyses involving multiple kernels have become increasingly popular. Multikernel analysis allows researchers to study more complex problems, such as assessing gene-gene or gene-environment interactions, incorporating variance-component based methods for population substructure into rare-variant association testing, and assessing the conditional effects of a variable set adjusting for other variable sets. The KM framework is robust, powerful, and provides efficient dimension reduction for multifactor analyses, but requires the estimation of high dimensional nuisance parameters. Traditional estimation techniques, including regularization and the "expectation-maximization (EM)" algorithm, have a large computational cost and are not scalable to large sample sizes needed for rare variant analysis. Therefore, under the context of gene-environment interaction, we propose a computationally efficient and statistically rigorous "fastKM" algorithm for multikernel analysis that is based on a low-rank approximation to the nuisance effect kernel matrices. Our algorithm is applicable to various trait types (e.g., continuous, binary, and survival traits) and can be implemented using any existing single-kernel analysis software. Through extensive simulation studies, we show that our algorithm has similar performance to an EM-based KM approach for quantitative traits while running much faster. We also apply our method to the Vitamin Intervention for Stroke Prevention (VISP) clinical trial, examining gene-by-vitamin effects on recurrent stroke risk and gene-by-age effects on change in homocysteine level.
Affiliation(s)
- Rachel Marceau
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Wenbin Lu
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Shannon Holloway
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Michèle M Sale
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Department of Medicine, University of Virginia, Charlottesville, Virginia, United States of America; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, United States of America
- Bradford B Worrall
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Department of Neurology, University of Virginia, Charlottesville, Virginia, United States of America
- Stephen R Williams
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Cardiovascular Research Center, University of Virginia, Charlottesville, Virginia, United States of America
- Fang-Chi Hsu
- Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Jung-Ying Tzeng
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America; Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
7
Dellinger AE, Nixon AB, Pang H. Integrative Pathway Analysis Using Graph-Based Learning with Applications to TCGA Colon and Ovarian Data. Cancer Inform 2014; 13:1-9. [PMID: 25125969 PMCID: PMC4125381 DOI: 10.4137/cin.s13634] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/12/2013] [Revised: 03/17/2014] [Accepted: 03/18/2014] [Indexed: 12/15/2022]
Abstract
Recent method development has included multi-dimensional genomic data algorithms because such methods have more accurately predicted clinical phenotypes related to disease. This study is the first to conduct an integrative genomic pathway-based analysis with a graph-based learning algorithm. The methodology of this analysis, graph-based semi-supervised learning, detects pathways that improve prediction of a dichotomous variable, which in this study is cancer stage. This analysis integrates genome-level gene expression, methylation, and single nucleotide polymorphism (SNP) data in serous cystadenocarcinoma (OV) and colon adenocarcinoma (COAD). The top 10 ranked predictive pathways in COAD and OV were biologically relevant to their respective cancer stages and significantly enhanced prediction accuracy and area under the ROC curve (AUC) when compared to single data-type analyses. This method is an effective way to simultaneously predict binary clinical phenotypes and discover their biological mechanisms.
Affiliation(s)
- Andrew E Dellinger
- Department of Mathematics and Statistics, Elon University, Elon, NC, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Andrew B Nixon
- Department of Medicine, Division of Medical Oncology, Duke University School of Medicine, Durham, NC, USA
- Herbert Pang
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China