1. Ghosal R, Matabuena M, Zhang J. Functional proportional hazards mixture cure model with applications in cancer mortality in NHANES and post ICU recovery. Stat Methods Med Res 2023; 32:2254-2269. PMID: 37855203; DOI: 10.1177/09622802231206472.
Abstract
We develop a functional proportional hazards mixture cure model with scalar and functional covariates measured at baseline. The mixture cure model, useful for studying populations with a cure fraction for a particular event of interest, is extended to functional data. We employ the expectation-maximization algorithm and develop a semiparametric penalized spline-based approach to estimate the dynamic functional coefficients of the incidence and latency parts. The proposed method is computationally efficient and simultaneously enforces smoothness in the estimated functional coefficients via a roughness penalty. Simulation studies illustrate the satisfactory performance of the proposed method in accurately estimating the model parameters and the baseline survival function. Finally, the clinical potential of the model is demonstrated in two real data examples that incorporate rich high-dimensional biomedical signals as functional covariates measured at baseline and constitute novel domains for applying cure survival models in contemporary medical settings. In particular, we analyze (i) minute-by-minute physical activity data from the National Health and Nutrition Examination Survey 2003-2006 to study the association between diurnal patterns of physical activity at baseline and all-cancer mortality through 2019 while adjusting for other biological factors; and (ii) the impact of daily functional measures of disease severity collected in the intensive care unit on post-ICU recovery and mortality. Our findings provide novel epidemiological insights into the association between daily patterns of physical activity and cancer mortality. A software implementation and illustration of the proposed estimation method are provided in R.
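The population survival function implied by a mixture cure model factorizes into an incidence part (the probability of being susceptible) and a latency part (survival among the susceptible). A minimal numerical sketch of that decomposition, with scalar covariates only and hypothetical parameter names — the functional covariates of this paper would add integral terms of the form ∫X(s)β(s)ds to both linear predictors:

```python
import numpy as np

def incidence(z, gamma):
    """Logistic incidence model: P(subject is uncured | covariates z)."""
    return 1.0 / (1.0 + np.exp(-(z @ gamma)))

def population_survival(t, z, gamma, latency_surv):
    """Mixture cure survival: S(t|z) = (1 - pi(z)) + pi(z) * S_u(t|z).
    Cured subjects (probability 1 - pi) never experience the event."""
    pi = incidence(z, gamma)
    return (1.0 - pi) + pi * latency_surv(t)

# With a zero incidence coefficient (pi = 0.5) and unit-exponential latency,
# the survival curve starts at 1 and plateaus at the cure fraction 0.5.
z = np.zeros(2)
gamma = np.zeros(2)
s0 = population_survival(0.0, z, gamma, lambda t: np.exp(-t))
s_inf = population_survival(50.0, z, gamma, lambda t: np.exp(-t))
```

In the paper the latency part follows a proportional hazards model; here `latency_surv` is passed in abstractly to keep the decomposition visible.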
Affiliation(s)
- Rahul Ghosal — Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA
- Marcos Matabuena — Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Jiajia Zhang — Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA
2. Huang TJ, Luedtke A, McKeague IW. Efficient estimation of the maximal association between multiple predictors and a survival outcome. Ann Stat 2023; 51:1965-1988. PMID: 38405375; PMCID: PMC10888526; DOI: 10.1214/23-aos2313.
Abstract
This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally scalable in high dimensions. Machine learning tools are commonly used to predict survival outcomes, but the estimated effect of a selected predictor suffers from confirmation bias unless the selection is taken into account. The new approach constructs semiparametrically efficient estimators of the linear association between the predictors and the survival outcome, which are used to build a test statistic for detecting an association between any of the predictors and the outcome. Further, a stabilization technique reminiscent of bagging allows normal calibration of the resulting test statistic, which enables the construction of confidence intervals for the maximal association between predictors and the outcome and also greatly reduces computational cost. Theoretical results show that this testing procedure is valid even when the number of predictors grows superpolynomially with sample size, and our simulations support this asymptotic guarantee at moderate sample sizes. The new approach is applied to the problem of identifying patterns in viral gene expression associated with the potency of an antiviral drug.
Affiliation(s)
- Tzu-Jung Huang — Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
- Alex Luedtke — Department of Statistics, University of Washington
3. Lu F, Huang X, Lu X, Tian G, Yang J. Model detection for semiparametric accelerated failure additive model with right-censored data. Stat Methods Med Res 2023; 32:1527-1542. PMID: 37338958; DOI: 10.1177/09622802231181224.
Abstract
Censored data frequently appear in applications across a variety of areas, such as epidemiology and medical research. Traditionally, statistical inference for such data has relied on pre-specified models, which carries the risk of model misspecification. This article proposes a two-fold shrinkage procedure for simultaneous structure identification and variable selection in the semiparametric accelerated failure additive model with right-censored data, in which the nonparametric functions are handled by spline approximation. Under some regularity conditions, the consistency of model structure identification is established theoretically, in the sense that the proposed method can automatically separate the linear and zero components from the nonlinear ones with probability approaching one. Details of the computation and tuning parameter selection are also discussed. Finally, we illustrate the proposed method through simulation studies and two real data applications to primary biliary cirrhosis data and skin cutaneous melanoma data.
Affiliation(s)
- Fang Lu — MOE-LCSM, School of Mathematics and Statistics, Hunan Normal University, Changsha, China
- Xiaoyan Huang — MOE-LCSM, School of Mathematics and Statistics, Hunan Normal University, Changsha, China
- Xuewen Lu — Department of Mathematics and Statistics, University of Calgary, Canada
- Guoliang Tian — Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, China
- Jing Yang — MOE-LCSM, School of Mathematics and Statistics, Hunan Normal University, Changsha, China

Fang Lu and Xiaoyan Huang are joint first authors.
4. Sun L, Li S, Wang L, Song X, Sui X. Simultaneous variable selection in regression analysis of multivariate interval-censored data. Biometrics 2022; 78:1402-1413. PMID: 34407218; DOI: 10.1111/biom.13548.
Abstract
Multivariate interval-censored data arise when each subject under study can potentially experience multiple events and the onset time of each event is not observed exactly but is known to lie in a time interval formed by adjacent examination times at which the event status changes. This type of incomplete and complex data structure poses a substantial challenge in practical data analysis. In addition, many potential risk factors exist in numerous studies, so simultaneously conducting variable selection for event-specific covariates is useful for identifying important variables and assessing their effects on the events of interest. In this paper, we develop a variable selection technique for multivariate interval-censored data under a general class of semiparametric transformation frailty models. The minimum information criterion (MIC) method is embedded in the optimization step of the proposed expectation-maximization (EM) algorithm to obtain the parameter estimator. The proposed EM algorithm greatly reduces the computational burden of maximizing the observed likelihood function, and the MIC naturally avoids selecting an optimal tuning parameter as required by many popular penalties, making the proposed algorithm promising and reliable. The proposed method is evaluated through extensive simulation studies and illustrated by an analysis of patient data from the Aerobics Center Longitudinal Study.
Affiliation(s)
- Liuquan Sun — School of Economics and Statistics, Guangzhou University, Guangzhou, China; Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Shuwei Li — School of Economics and Statistics, Guangzhou University, Guangzhou, China
- Lianming Wang — Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Xinyuan Song — Department of Statistics, Chinese University of Hong Kong, Hong Kong
- Xuemei Sui — Department of Exercise Science, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA
5. He X, Pan X, Tan KM, Zhou WX. Scalable estimation and inference for censored quantile regression process. Ann Stat 2022. DOI: 10.1214/22-aos2214.
Affiliation(s)
- Xuming He — Department of Statistics, University of Michigan
- Xiaoou Pan — Department of Mathematical Sciences, University of California, San Diego
- Wen-Xin Zhou — Department of Mathematical Sciences, University of California, San Diego
6. Yin W, Zhao SD, Liang F. Bayesian penalized Buckley-James method for high dimensional bivariate censored regression models. Lifetime Data Anal 2022; 28:282-318. PMID: 35239126; DOI: 10.1007/s10985-022-09549-5.
Abstract
For high dimensional gene expression data, one important goal is to identify a small number of genes that are associated with progression of the disease or survival of the patients. In this paper, we consider the problem of variable selection for multivariate survival data. We propose an estimation procedure for high dimensional accelerated failure time (AFT) models with bivariate censored data. The method extends the Buckley-James method by minimizing a penalized [Formula: see text] loss function with a penalty function induced from a bivariate spike-and-slab prior specification. In the proposed algorithm, censored observations are imputed using the Kaplan-Meier estimator, which avoids a parametric assumption on the error terms. Our empirical studies demonstrate that the proposed method provides better performance compared to the alternative procedures designed for univariate survival data regardless of whether the true events are correlated or not, and conceptualizes a formal way of handling bivariate survival data for AFT models. Findings from the analysis of a myeloma clinical trial using the proposed method are also presented.
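The Buckley-James step mentioned above replaces each censored response by its conditional expectation under the Kaplan-Meier estimate, which is what frees the method from a parametric error assumption. A self-contained sketch of that single imputation step (function names are ours; ties and a censored largest observation are ignored for simplicity):

```python
import numpy as np

def km_survival(times, events):
    """Kaplan-Meier survival estimate evaluated at each ordered observation."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    at_risk = len(t) - np.arange(len(t))
    surv = np.cumprod(1.0 - d / at_risk)
    return t, surv

def buckley_james_impute(y, delta):
    """Replace each censored y_i by E[T | T > y_i] under the KM estimate.
    If no KM mass lies beyond y_i, the censored value is left unchanged."""
    t, surv = km_survival(y, delta)
    surv_prev = np.concatenate(([1.0], surv[:-1]))
    jumps = surv_prev - surv          # KM mass at each ordered time (0 at censored points)
    out = y.astype(float)
    for i in np.where(delta == 0)[0]:
        mask = t > y[i]
        tail = jumps[mask].sum()
        if tail > 0:
            out[i] = (jumps[mask] * t[mask]).sum() / tail  # conditional mean beyond y_i
    return out
```

In the paper this imputation is iterated inside a penalized estimation loop with a bivariate spike-and-slab-induced penalty; the sketch shows only the censoring-handling step.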
Affiliation(s)
- Wenjing Yin — Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Sihai Dave Zhao — Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Feng Liang — Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA
7. Suder PM, Molstad AJ. Scalable algorithms for semiparametric accelerated failure time models in high dimensions. Stat Med 2022; 41:933-949. PMID: 35014701; DOI: 10.1002/sim.9264.
Abstract
Semiparametric accelerated failure time (AFT) models are a useful alternative to Cox proportional hazards models, especially when the assumption of constant hazard ratios is untenable. However, rank-based criteria for fitting AFT models are often nondifferentiable, which poses a computational challenge in high-dimensional settings. In this article, we propose a new alternating direction method of multipliers algorithm for fitting semiparametric AFT models by minimizing a penalized rank-based loss function. Our algorithm scales well in both the number of subjects and the number of predictors, and can easily accommodate a wide range of popular penalties. To improve the selection of tuning parameters, we propose a new criterion which avoids some common problems in cross-validation with censored responses. Through extensive simulation studies, we show that our algorithm and software are much faster than existing methods (which can only be applied to special cases), and that estimators which minimize a penalized rank-based criterion often outperform alternative estimators which minimize penalized weighted least squares criteria. Application to nine cancer datasets further demonstrates that rank-based estimators of semiparametric AFT models are competitive with estimators assuming proportional hazards in high-dimensional settings, whereas weighted least squares estimators often are not. A software package implementing the algorithm, along with a set of auxiliary functions, is available for download at github.com/ajmolstad/penAFT.
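The nondifferentiable rank-based criterion at the heart of this line of work is the Gehan loss; the ADMM machinery exists precisely because this function has kinks. As a sketch, a naive O(n²) evaluation of the loss itself — not the paper's scalable algorithm, which the penAFT package implements:

```python
import numpy as np

def gehan_loss(beta, x, logt, delta):
    """Gehan rank loss for the semiparametric AFT model:
    (1/n^2) * sum_{i,j} delta_i * max(0, e_j(beta) - e_i(beta)),
    with residuals e_i = log t_i - x_i @ beta. Convex but nondifferentiable,
    so subgradient or ADMM-style methods are needed for optimization."""
    e = logt - x @ beta
    diff = e[None, :] - e[:, None]            # diff[i, j] = e_j - e_i
    return float((delta[:, None] * np.maximum(diff, 0.0)).sum()) / len(e) ** 2
```

Only uncensored subjects (delta_i = 1) contribute comparison terms, which is how the criterion accommodates right censoring without modeling it.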
Affiliation(s)
- Piotr M Suder — Department of Statistics, University of Florida, Gainesville, Florida, USA
- Aaron J Molstad — Department of Statistics, University of Florida, Gainesville, Florida, USA; Genetics Institute, University of Florida, Gainesville, Florida, USA
8. Cheng C, Feng X, Huang J, Jiao Y, Zhang S. ℓ0-regularized high-dimensional accelerated failure time model. Comput Stat Data Anal 2022. DOI: 10.1016/j.csda.2022.107430.
9. Xiong J, He W. Identification of survival relevant genes with measurement error in gene expression incorporated. Commun Stat Theory Methods 2021. DOI: 10.1080/03610926.2021.2004424.
Affiliation(s)
- Juan Xiong — Health Science Center, Shenzhen University, Shenzhen, Guangdong, P. R. China
- Wenqing He — University of Western Ontario, London, Ontario, Canada
10. Choi T, Choi S. A fast algorithm for the accelerated failure time model with high-dimensional time-to-event data. J Stat Comput Simul 2021. DOI: 10.1080/00949655.2021.1927034.
Affiliation(s)
- Taehwa Choi — Department of Statistics, Korea University, Seoul, South Korea
- Sangbum Choi — Department of Statistics, Korea University, Seoul, South Korea
11.
Affiliation(s)
- Mingyue Du — Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
- Jianguo Sun — Department of Statistics, University of Missouri, Columbia, MO 65211, USA
12. Spirko-Burns L, Devarajan K. Supervised dimension reduction for large-scale "omics" data with censored survival outcomes under possible non-proportional hazards. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2032-2044. PMID: 31940547; DOI: 10.1109/tcbb.2020.2965934.
Abstract
The past two decades have witnessed significant advances in high-throughput "omics" technologies such as genomics, proteomics, metabolomics, transcriptomics and radiomics. These technologies have enabled simultaneous measurement of the expression levels of tens of thousands of features from individual patient samples and have generated enormous amounts of data that require analysis and interpretation. One specific area of interest has been studying the relationship between these features and patient outcomes, such as overall and recurrence-free survival, with the goal of developing a predictive "omics" profile. Large-scale studies often suffer from the presence of a large fraction of censored observations and potential time-varying effects of features, and methods for handling both issues simultaneously have been lacking. In this paper, we propose supervised methods for feature selection and survival prediction that deal with both at once. Our approach utilizes continuum power regression (CPR), a framework that includes a variety of regression methods, in conjunction with the parametric or semi-parametric accelerated failure time (AFT) model. Both CPR and AFT fall within the linear models framework and, unlike black-box models, the proposed prognostic index has a simple yet useful interpretation. We demonstrate the utility of our methods using simulated and publicly available cancer genomics data.
13.
Affiliation(s)
- Rahim Alhamzawi — Department of Statistics, University of Al-Qadisiyah, Al Diwaniyah, Iraq
14. Sun Z, Liu Y, Chen K, Li G. Broken adaptive ridge regression for right-censored survival data. Ann Inst Stat Math 2021. DOI: 10.1007/s10463-021-00794-3.
15. Liu Y, Chen X, Li G. A new joint screening method for right-censored time-to-event data with ultra-high dimensional covariates. Stat Methods Med Res 2020; 29:1499-1513. PMID: 31359834; PMCID: PMC8285086; DOI: 10.1177/0962280219864710.
Abstract
In an ultra-high dimensional setting with a huge number of covariates, variable screening is useful for dimension reduction before applying more refined methods for model selection and statistical analysis. This paper proposes a new sure joint screening procedure for right-censored time-to-event data based on a sparsity-restricted semiparametric accelerated failure time model. Our method, referred to as Buckley-James assisted sure screening (BJASS), consists of an initial screening step using a sparsity-restricted least-squares estimate based on a synthetic time variable and a refinement screening step using a sparsity-restricted least-squares estimate with the Buckley-James imputed event times. The refinement step may be repeated several times to obtain more stable results. We show that with any fixed number of refinement steps, the BJASS procedure retains all important variables with probability tending to 1. Simulation results are presented to illustrate its performance in comparison with some marginal screening methods. Real data examples are provided using diffuse large B-cell lymphoma (DLBCL) data and breast cancer data. We have implemented the BJASS method in Matlab and made it available on GitHub: https://github.com/yiucla/BJASS.
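The "sparsity-restricted least-squares estimate" in each BJASS step can be computed by iterative hard thresholding, which keeps only the k largest coefficients after every gradient step. A generic sketch of that inner solver (our own minimal version, not the authors' Matlab code; the Buckley-James imputation that feeds it is omitted):

```python
import numpy as np

def hard_threshold(b, k):
    """Zero out all but the k largest-magnitude entries of b."""
    out = np.zeros_like(b)
    keep = np.argsort(np.abs(b))[-k:]
    out[keep] = b[keep]
    return out

def iht_least_squares(x, y, k, iters=200):
    """Sparsity-restricted least squares via iterative hard thresholding:
    a gradient step on ||y - x b||^2 / 2, then projection onto k-sparse vectors."""
    step = 1.0 / np.linalg.norm(x, 2) ** 2    # 1 / largest squared singular value
    b = np.zeros(x.shape[1])
    for _ in range(iters):
        b = hard_threshold(b + step * x.T @ (y - x @ b), k)
    return b
```

With an orthonormal design the first iteration already lands on the exact k-sparse solution, which makes the projection step easy to verify.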
Affiliation(s)
- Yi Liu — School of Mathematical Sciences, Ocean University of China, Qingdao, China
- Xiaolin Chen — School of Statistics, Qufu Normal University, Qufu, China
- Gang Li — Department of Biostatistics, University of California at Los Angeles, Los Angeles, CA, USA
16. Zhao H, Wu Q, Gilbert PB, Chen YQ, Sun J. A regularized estimation approach for case-cohort periodic follow-up studies with an application to HIV vaccine trials. Biom J 2020; 62:1176-1191. PMID: 32080888; DOI: 10.1002/bimj.201900180.
Abstract
This paper discusses regression analysis of failure time data arising from case-cohort periodic follow-up studies. One feature of such data that makes their analysis much more difficult is that they are usually interval-censored rather than right-censored. Although some methods have been developed for general failure time data, no established procedure appears to exist for the situation considered here. To address this, we present a semiparametric regularized procedure and develop a simple algorithm for its implementation. Unlike some existing procedures for similar situations, the proposed procedure is shown to have the oracle property, and an extensive simulation suggests that the approach works well in practical situations. The method is applied to the HIV vaccine trial that motivated this study.
Affiliation(s)
- Hui Zhao — School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, P. R. China
- Qiwei Wu — Department of Statistics, University of Missouri, Columbia, MO, USA
- Peter B Gilbert — Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, and Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ying Q Chen — Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, and Department of Biostatistics, University of Washington, Seattle, WA, USA
- Jianguo Sun — Department of Statistics, University of Missouri, Columbia, MO, USA
17. Li S, Wu Q, Sun J. Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer's disease. Stat Methods Med Res 2019; 29:2151-2166. PMID: 31718478; DOI: 10.1177/0962280219884720.
Abstract
Variable selection or feature extraction is fundamental to identify important risk factors from a large number of covariates and has applications in many fields. In particular, its applications in failure time data analysis have been recognized and many methods have been proposed for right-censored data. However, developing relevant methods for variable selection becomes more challenging when one confronts interval censoring that often occurs in practice. In this article, motivated by an Alzheimer's disease study, we develop a variable selection method for interval-censored data with a general class of semiparametric transformation models. Specifically, a novel penalized expectation-maximization algorithm is developed to maximize the complex penalized likelihood function, which is shown to perform well in the finite-sample situation through a simulation study. The proposed methodology is then applied to the interval-censored data arising from the Alzheimer's disease study mentioned above.
Affiliation(s)
- Shuwei Li — School of Economics and Statistics, Guangzhou University, Guangzhou, China
- Qiwei Wu — Department of Statistics, University of Missouri, Columbia, MO, USA
- Jianguo Sun — Department of Statistics, University of Missouri, Columbia, MO, USA
18. Huang TJ, McKeague IW, Qian M. Marginal screening for high-dimensional predictors of survival outcomes. Stat Sin 2019; 29:2105-2139. PMID: 31938013; PMCID: PMC6959482; DOI: 10.5705/ss.202017.0298.
Abstract
This study develops a marginal screening test to detect the presence of significant predictors for a right-censored time-to-event outcome under a high-dimensional accelerated failure time (AFT) model. Establishing a rigorous screening test in this setting is challenging, because of the right censoring and the post-selection inference. In the latter case, an implicit variable selection step needs to be included to avoid inflating the Type-I error. A prior study solved this problem by constructing an adaptive resampling test under an ordinary linear regression. To accommodate right censoring, we develop a new approach based on a maximally selected Koul-Susarla-Van Ryzin estimator from a marginal AFT working model. A regularized bootstrap method is used to calibrate the test. Our test is more powerful and less conservative than both a Bonferroni correction of the marginal tests and other competing methods. The proposed method is evaluated in simulation studies and applied to two real data sets.
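The Koul-Susarla-Van Ryzin estimator used in the working model above rests on a synthetic-response transformation: inflate each uncensored response by the inverse of the estimated censoring survival, set censored responses to zero, and run ordinary least squares. A sketch of the transform itself (helper names are ours; positivity of the estimated censoring survival at the observed times is assumed, and ties are ignored):

```python
import numpy as np

def censoring_km(y, delta):
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t):
    ordinary KM with 1 - delta (a censoring) playing the role of the event."""
    order = np.argsort(y)
    t, c = y[order], 1 - delta[order]
    at_risk = len(t) - np.arange(len(t))
    return t, np.cumprod(1.0 - c / at_risk)

def ksv_synthetic(y, delta):
    """KSV transform Y*_i = delta_i * y_i / G_hat(y_i-). Under independent
    censoring E[Y* | x] = E[Y | x], so least squares on Y* targets the same
    regression function despite the censoring."""
    t, surv = censoring_km(y, delta)
    g_with_origin = np.concatenate(([1.0], surv))
    below = np.searchsorted(t, y, side="left")   # count of ordered times < y_i
    return delta * y / g_with_origin[below]
```

Because the transform is unstable when the censoring survival estimate is small, the paper pairs the working estimator with a regularized bootstrap for calibration.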
Affiliation(s)
- Min Qian — Department of Biostatistics, Columbia University
19. Maity AK, Bhattacharya A, Mallick BK, Baladandayuthapani V. Bayesian data integration and variable selection for pan-cancer survival prediction using protein expression data. Biometrics 2019; 76:316-325. PMID: 31393003; DOI: 10.1111/biom.13132.
Abstract
Accurate prognostic prediction using molecular information is a challenging area of research that is essential to the development of precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, such as patient survival time. There are considerable statistical and computational challenges due to the large dimension of the problems. Furthermore, data are available for different tumor types, so data integration across tumors is desirable. Censored survival outcomes add one more level of complexity to the inferential procedure. We develop Bayesian hierarchical survival models that accommodate all of these challenges. We use the hierarchical Bayesian accelerated failure time model for survival regression and assume a sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated "The Cancer Proteome Atlas" (TCPA), which contains reverse-phase protein array-based high-quality protein expression data as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors through the correlated prior structures.
Affiliation(s)
- Arnab Kumar Maity — Early Clinical Development Oncology Statistics, Pfizer Inc., San Diego, California
- Bani K Mallick — Department of Statistics, Texas A&M University, College Station, Texas
20. Wang H, Li G. Extreme learning machine Cox model for high-dimensional survival analysis. Stat Med 2019; 38:2139-2156. PMID: 30632193; PMCID: PMC6498851; DOI: 10.1002/sim.8090.
Abstract
Some interesting recent studies have shown that neural network models are useful alternatives for modeling survival data when the assumptions of a classical parametric or semiparametric survival model, such as the Cox (1972) model, are seriously violated. However, to the best of our knowledge, the plausibility of adapting the emerging extreme learning machine (ELM) algorithm for single-hidden-layer feedforward neural networks to survival analysis has not been explored. In this paper, we present a kernel ELM Cox model regularized by an L0-based broken adaptive ridge (BAR) penalization method. We demonstrate that the resulting method, referred to as ELMCoxBAR, can outperform other state-of-the-art survival prediction methods, such as L1- or L2-regularized Cox regression, random survival forest with various splitting rules, and boosted Cox models, in terms of predictive performance on both simulated and real-world datasets. In addition to its good predictive performance, the proposed method has a key computational advantage over the competing methods in terms of computation time, which we illustrate using a real-world ultra-high-dimensional survival dataset.
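The defining trait of an extreme learning machine is that the hidden layer is random and fixed; only the output layer is estimated. A toy sketch of that idea, with random tanh features and a ridge output layer standing in for the paper's BAR-penalized Cox fit (which we do not reproduce):

```python
import numpy as np

def elm_features(x, n_hidden=50, seed=0):
    """Single hidden layer with random, untrained weights (the ELM trick)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    return np.tanh(x @ w + b)

def elm_fit(x, y, n_hidden=50, lam=1e-2, seed=0):
    """Only the output weights are estimated, here by ridge regression
    on the fixed random features; returns in-sample fitted values."""
    h = elm_features(x, n_hidden, seed)
    beta = np.linalg.solve(h.T @ h + lam * np.eye(n_hidden), h.T @ y)
    return h @ beta

# Random features are expressive enough to track a smooth nonlinear signal.
x = np.linspace(-3.0, 3.0, 100).reshape(-1, 1)
y = np.sin(x).ravel()
fit = elm_fit(x, y)
```

Skipping backpropagation for the hidden layer is what gives ELM-type models their speed advantage; the paper couples this feature map with a penalized Cox partial likelihood rather than the squared-error loss used here.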
Affiliation(s)
- Hong Wang — School of Mathematics and Statistics, Central South University, Changsha, China
- Gang Li — Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, California
21. Park E, Ha ID. Penalized variable selection for accelerated failure time models with random effects. Stat Med 2019; 38:878-892. PMID: 30411376; DOI: 10.1002/sim.8023.
Abstract
Accelerated failure time (AFT) models allowing for random effects are linear mixed models under a log-transformation of survival time with censoring, and describe dependence in correlated survival data. It is well known that AFT models are useful alternatives to frailty models. To the best of our knowledge, however, there is no literature on variable selection methods for such AFT models. In this paper, we propose a simple but unified variable selection procedure for fixed effects in AFT random-effect models using penalized h-likelihood (HL). We consider four penalty functions: the least absolute shrinkage and selection operator (LASSO), adaptive LASSO, smoothly clipped absolute deviation (SCAD), and HL. We show that the proposed method can be easily implemented via a slight modification to existing h-likelihood estimation procedures, and that it extends readily to AFT models with multilevel (or nested) structures. Simulation studies show that the procedure performs well with the adaptive LASSO, SCAD, or HL penalty; in particular, the HL penalty gives a higher probability of choosing the true model than the other three. The usefulness of the new method is illustrated using two actual datasets from multicenter clinical trials.
Affiliation(s)
- Eunyoung Park
- Department of Statistics, Pukyong National University, Busan, South Korea
- Il Do Ha
- Department of Statistics, Pukyong National University, Busan, South Korea

22
Chai H, Zhang Q, Huang J, Ma S. Inference for low-dimensional covariates in a high-dimensional accelerated failure time model. Stat Sin 2019; 29:877-894. PMID: 31073263. DOI: 10.5705/ss.202016.0449.
Abstract
Data with high-dimensional covariates are now commonly encountered. Compared to other types of responses, research on high-dimensional data with censored survival responses is still relatively limited, and most existing studies have focused on estimation and variable selection. In this study, we consider data with a censored survival response, a set of low-dimensional covariates of main interest, and a set of high-dimensional covariates that may also affect survival. The accelerated failure time model is adopted to describe survival. The goal is to conduct inference for the effects of the low-dimensional covariates while properly accounting for the high-dimensional ones. A penalization-based procedure is developed, and its validity is established under mild and widely adopted conditions. Simulations suggest satisfactory performance of the proposed procedure, and the analysis of two cancer genetic datasets demonstrates its practical applicability.
23
Soret P, Avalos M, Wittkop L, Commenges D, Thiébaut R. Lasso regularization for left-censored Gaussian outcome and high-dimensional predictors. BMC Med Res Methodol 2018; 18:159. PMID: 30514234. PMCID: PMC6280495. DOI: 10.1186/s12874-018-0609-4.
Abstract
Background: Biological assays for the quantification of markers may suffer from a lack of sensitivity and thus from an analytical detection limit. This is the case for human immunodeficiency virus (HIV) viral load: below this threshold the exact value is unknown, and values are consequently left-censored. Statistical methods have been proposed to deal with left-censoring, but few are adapted to high-dimensional data.
Methods: We propose to reverse the Buckley-James least squares algorithm to handle left-censored data, enhanced with a Lasso regularization to accommodate high-dimensional predictors. We present a Lasso-regularized Buckley-James least squares method with both non-parametric imputation using Kaplan-Meier and parametric imputation based on the Gaussian distribution, which is typically assumed for HIV viral load data after logarithmic transformation. Cross-validation for parameter tuning is based on an appropriate loss function that takes into account the different contributions of censored and uncensored observations. We specify how these techniques can be easily implemented using available R packages. The Lasso-regularized Buckley-James least squares method was compared to simple imputation strategies for predicting the response to antiretroviral therapy, measured by HIV viral load, from HIV genotypic mutations. We used a dataset composed of several clinical trials and cohorts from the Forum for Collaborative HIV Research (HIV Med. 2008;7:27-40). The proposed methods were also assessed on simulated data mimicking the observed data.
Results: Approaches accounting for left-censoring outperformed simple imputation methods in a high-dimensional setting. The Gaussian Buckley-James method with cross-validation based on the appropriate loss function showed the lowest prediction error on simulated data and, on real data, the most valid results according to the current literature on HIV mutations.
Conclusions: The proposed approach deals with high-dimensional predictors and left-censored outcomes and has shown its value for predicting HIV viral load from HIV mutations.
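The parametric (Gaussian) imputation step described above replaces a value known only to lie below the detection limit L with its conditional expectation under a normal model, E[Y | Y < L] = mu - sigma * phi(a) / Phi(a) with a = (L - mu) / sigma. A minimal Python sketch of that single formula (illustrative only, not the authors' R implementation, which also iterates with the Lasso fit):

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def impute_left_censored(limit, mu, sigma):
    """Conditional mean of a N(mu, sigma^2) variable given it lies below
    `limit` (inverse-Mills-ratio formula for truncation from above)."""
    a = (limit - mu) / sigma
    return mu - sigma * norm_pdf(a) / norm_cdf(a)

# A log viral load below a detection limit of 0, under N(0, 1):
print(round(impute_left_censored(0.0, 0.0, 1.0), 4))  # -0.7979
```

In the Buckley-James iteration this imputed value stands in for the censored response, after which an ordinary (here, Lasso-penalized) least squares fit is refreshed.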
Affiliation(s)
- Perrine Soret
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France; Vaccine Research Institute (VRI), Créteil, F-94000, France
- Marta Avalos
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France
- Linda Wittkop
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France; CHU Bordeaux, Department of Public Health, Bordeaux, F-33000, France
- Daniel Commenges
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France
- Rodolphe Thiébaut
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France; Vaccine Research Institute (VRI), Créteil, F-94000, France; CHU Bordeaux, Department of Public Health, Bordeaux, F-33000, France

24
Penalized variable selection for accelerated failure time models. Communications for Statistical Applications and Methods 2018. DOI: 10.29220/csam.2018.25.6.591.
25
Shen H, Chai H, Li M, Zhou Z, Liang Y, Yang Z, Huang H, Liu X, Zhang B. Robust sparse accelerated failure time model for survival analysis. Technol Health Care 2018; 26:55-63. PMID: 29689755. PMCID: PMC6004954. DOI: 10.3233/thc-174141.
Abstract
To identify biomarker genes related to disease from high-dimension, low-sample-size gene expression data, various regression approaches with different regularization methods have been proposed. Nevertheless, high noise in biological data significantly reduces the performance of these methods. The accelerated failure time (AFT) model was designed for gene selection and survival time estimation in cancer survival analysis. In this article, we propose a novel robust sparse accelerated failure time model (RS-AFT) combining the least absolute deviation (LAD) loss and Lq regularization. An iterative weighted linear programming algorithm without regularization parameter tuning is proposed to solve the RS-AFT model. The experimental results show our method performs better in both gene selection and survival time estimation than some widely used regularization methods such as lasso, elastic net and SCAD. Hence the RS-AFT model may be a competitive regularization method in cancer survival analysis.
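The RS-AFT objective pairs the outlier-robust LAD loss with a concave Lq penalty. Both effects can be seen on a one-parameter toy problem via a crude grid search (illustrative only, not the paper's iterative weighted linear programming algorithm; the data and the penalty weight are invented):

```python
def lad_lq_objective(beta, xs, ys, lam, q=0.5):
    """Least absolute deviation loss plus an Lq penalty on the coefficient."""
    loss = sum(abs(y - beta * x) for x, y in zip(xs, ys))
    return loss + lam * abs(beta) ** q

def argmin_grid(objective, grid):
    """Return the grid point with the smallest objective value."""
    return min(grid, key=objective)

# Toy data: y = 2x with one gross outlier in the last response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 50.0]
grid = [i / 100.0 for i in range(0, 401)]  # beta in [0, 4]

beta_unpen = argmin_grid(lambda b: lad_lq_objective(b, xs, ys, lam=0.0), grid)
beta_pen = argmin_grid(lambda b: lad_lq_objective(b, xs, ys, lam=30.0), grid)
print(beta_unpen)  # 2.0 -- the LAD fit ignores the outlier
print(beta_pen)    # 0.0 -- a strong Lq penalty zeroes the coefficient out
```

The robustness of the LAD loss (the outlier does not drag the fit away from beta = 2) and the sparsity induced by the concave Lq penalty are exactly the two ingredients the abstract highlights.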
Affiliation(s)
- Yong Liang (corresponding author)
- Faculty of Information Technology and State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China. Tel.: +853 63869506; Fax: +853 88972034

26
Abstract
In modeling censored data, survival forest models are a competitive nonparametric alternative to traditional parametric or semiparametric models when the functional forms are possibly misspecified or the underlying assumptions are violated. In this work, we propose a survival forest approach with trees constructed using a novel pseudo-R2 splitting rule. By studying well-known benchmark data sets, we find that the proposed model generally outperforms popular survival models such as random survival forests with different splitting rules, the Cox proportional hazards model, and the generalized boosted model in terms of the C-index metric.
Affiliation(s)
- Hong Wang
- School of Mathematics and Statistics, Central South University, Changsha, China
- Xiaolin Chen
- School of Statistics, Qufu Normal University, Qufu, China
- Gang Li
- Department of Biostatistics, School of Public Health, University of California at Los Angeles, Los Angeles, California

27
Gorfine M, Berndt SI, Chang-Claude J, Hoffmeister M, Le Marchand L, Potter J, Slattery ML, Keret N, Peters U, Hsu L. Heritability estimation using a regularized regression approach (HERRA): applicable to continuous, dichotomous or age-at-onset outcome. PLoS One 2017; 12:e0181269. PMID: 28813438. PMCID: PMC5559077. DOI: 10.1371/journal.pone.0181269.
Abstract
The popular Genome-wide Complex Trait Analysis (GCTA) software uses random-effects models to estimate the narrow-sense heritability from GWAS data of unrelated individuals without knowing or identifying the causal loci. Many methods have since extended this approach to various situations. However, since the proportion of causal loci among the variants is typically very small and GCTA uses all variants to calculate the similarities among individuals, the estimation of heritability may be unstable, resulting in a large variance of the estimates. Moreover, if the causal SNPs are not genotyped, GCTA sometimes greatly underestimates the true heritability. We present a novel narrow-sense heritability estimator, named HERRA, using well-developed ultra-high-dimensional machine-learning methods. Like existing methods, it is applicable to continuous or dichotomous outcomes; additionally, it handles time-to-event or age-at-onset outcomes, which, to our knowledge, no existing method can. Compared to GCTA and LDAK for continuous and binary outcomes, HERRA often has a smaller variance, and when causal SNPs are not genotyped, HERRA has a much smaller empirical bias. We applied GCTA, LDAK and HERRA to a large colorectal cancer dataset with a dichotomous outcome (4,312 cases, 4,356 controls, genotyped using Illumina 300K); the respective heritability estimates of GCTA, LDAK and HERRA are 0.068 (SE = 0.017), 0.072 (SE = 0.021) and 0.110 (SE = 5.19 × 10⁻³). HERRA yields over a 50% increase in heritability estimate compared to GCTA or LDAK.
Affiliation(s)
- Malka Gorfine
- Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel
- Sonja I. Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center, Heidelberg, Germany
- Michael Hoffmeister
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Heidelberg, Germany
- Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- John Potter
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Martha L. Slattery
- Department of Internal Medicine, University of Utah Health Sciences Center, Salt Lake City, Utah, United States of America
- Nir Keret
- Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America

28
Attallah O, Karthikesalingam A, Holt PJE, Thompson MM, Sayers R, Bown MJ, Choke EC, Ma X. Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention. BMC Med Inform Decis Mak 2017; 17:115. PMID: 28774329. PMCID: PMC5543447. DOI: 10.1186/s12911-017-0508-3.
Abstract
Background: The feature selection (FS) process is essential in the medical area as it reduces the effort and time needed for physicians to measure unnecessary features. Choosing useful variables is a difficult task in the presence of censoring, the unique characteristic of survival analysis. Most survival FS methods depend on Cox's proportional hazards model; machine learning techniques (MLT) are preferred but not commonly used due to censoring. Techniques proposed to adapt MLT to FS with survival data cannot be used with a high level of censoring. The researchers' previous publications proposed a technique to deal with highly censored data and used existing FS techniques to reduce dataset dimension. In this paper, however, a new FS technique is proposed and combined with feature transformation and the previously proposed uncensoring approach to select a reduced set of features and produce a stable predictive model.
Methods: An FS technique based on an artificial neural network (ANN) is proposed to deal with highly censored Endovascular Aortic Repair (EVAR) survival data. EVAR datasets were collected from 2004 to 2010 from two vascular centers in order to produce a final stable model; they contain almost 91% censored patients. The proposed approach uses a wrapper FS method with an ANN to select a reduced subset of features that predicts the risk of EVAR re-intervention after 5 years for patients from two different centers located in the United Kingdom, so that it can potentially be applied to cross-center prediction. The proposed model is compared with two popular FS techniques used with Cox's model: the Akaike and Bayesian information criteria (AIC, BIC).
Results: The final model outperforms the other methods in distinguishing the high- and low-risk groups, with a concordance index and estimated AUC better than those of Cox's model based on the AIC, BIC, Lasso, and SCAD approaches. These models have p-values lower than 0.05, meaning that patients in different risk groups can be separated significantly and those who would need re-intervention can be correctly predicted.
Conclusion: The proposed approach will save the time and effort physicians spend collecting unnecessary variables. The final reduced model was able to predict the long-term risk of aortic complications after EVAR. This predictive model can help clinicians decide patients' future observation plans.
Affiliation(s)
- Omneya Attallah
- School of Engineering and Applied Science, Aston University, B4 7ET, Birmingham, UK; Department of Electronics and Communications, College of Engineering and Technology, Arab Academy for Science and Technology, Alexandria, Egypt
- Rob Sayers
- St George's Vascular Institute, St George's University Hospitals NHS Foundation Trust, Blackshaw Road, London, SW17 0QT, UK
- Matthew J Bown
- Vascular Surgery Group, University of Leicester, Leicester, UK
- Eddie C Choke
- Vascular Surgery Group, Robert Kilpatrick Clinical Sciences Building, Leicester Royal Infirmary, University of Leicester, Leicester, LE2 7LX, UK
- Xianghong Ma
- School of Engineering and Applied Science, Aston University, B4 7ET, Birmingham, UK

29

30
Das U, Ebrahimi N. Covariate selection for accelerated failure time data. Commun Stat Theory Methods 2017. DOI: 10.1080/03610926.2015.1078475.
Affiliation(s)
- Ujjwal Das
- Indian Institute of Management, Udaipur, Rajasthan, India
- Nader Ebrahimi
- Division of Statistics, Northern Illinois University, DeKalb, IL, USA

31
Zhao Y, Chung M, Johnson BA, Moreno CS, Long Q. Hierarchical feature selection incorporating known and novel biological information: identifying genomic features related to prostate cancer recurrence. J Am Stat Assoc 2017; 111:1427-1439. PMID: 28435175. DOI: 10.1080/01621459.2016.1164051.
Abstract
Our work is motivated by a prostate cancer study aimed at identifying mRNA and miRNA biomarkers that are predictive of cancer recurrence after prostatectomy. It has been shown in the literature that incorporating known biological information on pathway memberships and interactions among biomarkers improves feature selection of high-dimensional biomarkers in relation to disease risk. Biological information is often represented by graphs or networks, in which biomarkers are represented by nodes and interactions among them are represented by edges; however, biological information is often not fully known. For example, the role of microRNAs (miRNAs) in regulating gene expression is not fully understood and the miRNA regulatory network is not fully established, in which case new strategies are needed for feature selection. To this end, we treat unknown biological information as missing data (i.e., missing edges in graphs), different from commonly encountered missing data problems where variable values are missing. We propose a new concept of imputing unknown biological information based on observed data and define the imputed information as the novel biological information. In addition, we propose a hierarchical group penalty to encourage sparsity and feature selection at both the pathway level and the within-pathway level, which, combined with the imputation step, allows for incorporation of known and novel biological information. While it is applicable to general regression settings, we develop and investigate the proposed approach in the context of semiparametric accelerated failure time models motivated by our data example. Data application and simulation studies show that incorporation of novel biological information improves performance in risk prediction and feature selection and the proposed penalty outperforms the extensions of several existing penalties.
Affiliation(s)
- Yize Zhao
- Postdoctoral Fellow, Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, NC 27709
- Matthias Chung
- Assistant Professor, Department of Mathematics, Virginia Tech, Blacksburg, VA 24061
- Brent A Johnson
- Associate Professor, Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642
- Carlos S Moreno
- Associate Professor, Department of Pathology and Laboratory Medicine
- Qi Long
- Associate Professor, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322

32
Xia X, Jiang B, Li J, Zhang W. Low-dimensional confounder adjustment and high-dimensional penalized estimation for survival analysis. Lifetime Data Anal 2016; 22:547-569. PMID: 26463818. DOI: 10.1007/s10985-015-9350-z.
Abstract
High-throughput profiling is now common in biomedical research. In this paper we consider the layout of an etiology study composed of a failure time response and gene expression measurements. In current practice, a widely adopted approach is to select genes through a preliminary marginal screening and a follow-up penalized regression for model building. Confounders, including for example clinical risk factors and environmental exposures, usually exist and need to be properly accounted for. We propose covariate-adjusted screening and variable selection procedures under the accelerated failure time model. While penalizing the high-dimensional coefficients to achieve parsimonious model forms, our procedure also properly adjusts for the low-dimensional confounder effects to achieve more accurate estimation of the regression coefficients. We establish the asymptotic properties of the proposed methods and carry out simulation studies to assess finite-sample performance. Our methods are illustrated with a real gene expression data analysis, in which proper adjustment for confounders produces more meaningful results.
Affiliation(s)
- Xiaochao Xia
- College of Mathematics and Statistics, Chongqing University, Chongqing, China
- Binyan Jiang
- Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong, China
- Jialiang Li
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
- Wenyang Zhang
- Department of Mathematics, University of York, York, United Kingdom

33
Kim S, Halabi S. High dimensional variable selection with error control. Biomed Res Int 2016; 2016:8209453. PMID: 27597974. PMCID: PMC5002494. DOI: 10.1155/2016/8209453.
Abstract
Background: The iterative sure independence screening (ISIS) is a popular method for selecting important variables while retaining most of the informative variables relevant to the outcome in high-throughput data. However, it not only is computationally intensive but may also lead to a high false discovery rate (FDR). We propose to use the FDR as a screening method to reduce the high dimension to a lower one while controlling the FDR, combined with three popular variable selection methods: LASSO, SCAD, and MCP.
Method: The three methods with the proposed screenings were applied to prostate cancer data with presence of metastasis as the outcome.
Results: Simulations showed that the three variable selection methods with the proposed screenings controlled the predefined FDR and produced high area under the receiver operating characteristic curve (AUROC) scores. In applying these methods to the prostate cancer example, LASSO and MCP selected 12 and 8 genes and produced AUROC scores of 0.746 and 0.764, respectively.
Conclusions: We demonstrated that the variable selection methods with the sequential use of FDR and ISIS not only controlled the predefined FDR in the final models but also achieved relatively high AUROC scores.
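The FDR screening step rests on the Benjamini-Hochberg procedure: sort the m marginal p-values, find the largest k with p_(k) <= (k/m)q, and retain the corresponding variables. A minimal Python sketch of that generic procedure (not the paper's full ISIS pipeline; the p-values below are made up):

```python
def bh_select(pvalues, q):
    """Benjamini-Hochberg: indices of hypotheses rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # step-up rule: remember the largest rank passing its threshold
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

# Three strong signals pass at q = 0.10; the two null-like p-values do not.
pvals = [0.01, 0.50, 0.02, 0.60, 0.03]
print(bh_select(pvals, q=0.10))  # [0, 2, 4]
```

Variables surviving this screen would then be handed to LASSO, SCAD, or MCP for the final model, as in the abstract's sequential design.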
Affiliation(s)
- Sangjin Kim
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Box 2717, Durham, NC 27710, USA
- Susan Halabi
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Box 2717, Durham, NC 27710, USA

34
Bang S, Eo SH, Cho YM, Jhun M, Cho H. Non-crossing weighted kernel quantile regression with right censored data. Lifetime Data Anal 2016; 22:100-121. PMID: 25511333. DOI: 10.1007/s10985-014-9314-8.
Abstract
In regression modeling of survival data, multiple conditional quantiles are useful summary statistics for assessing covariate effects on survival times. In this study, we consider the problem of estimating multiple nonlinear quantile functions with right-censored survival data. To account for censoring in estimating a nonlinear quantile function, weighted kernel quantile regression (WKQR) has been developed using the kernel trick and inverse-censoring-probability weights. However, the individually estimated quantile functions based on the WKQR often cross each other and consequently violate the basic properties of quantiles. To avoid this quantile-crossing problem, we propose non-crossing weighted kernel quantile regression (NWKQR), which estimates multiple nonlinear conditional quantile functions simultaneously by enforcing non-crossing constraints on the kernel coefficients. Numerical results demonstrate the competitive performance of the proposed NWKQR over the WKQR.
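The inverse-censoring-probability weights used in WKQR come from a Kaplan-Meier estimate of the censoring survivor function G: an uncensored subject observed at time t gets weight 1/G(t-), and censored subjects get weight 0. A minimal Python sketch of just this weighting step (illustrative; the kernel quantile machinery is omitted and the toy data are invented):

```python
def km_survival(times, events):
    """Kaplan-Meier estimator; returns the step function as (time, S) pairs."""
    steps = []
    s = 1.0
    for t in sorted(set(ti for ti, d in zip(times, events) if d == 1)):
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        s *= 1.0 - d / at_risk
        steps.append((t, s))
    return steps

def left_limit(steps, t):
    """Value of the step function just before time t."""
    s = 1.0
    for time, val in steps:
        if time < t:
            s = val
    return s

def ipcw_weights(times, deltas):
    """Weights 1/G(t-) for uncensored subjects, 0 for censored ones, where G
    is the Kaplan-Meier curve of the *censoring* distribution."""
    cens_events = [1 - d for d in deltas]  # censorings are 'events' for G
    g_steps = km_survival(times, cens_events)
    return [d / left_limit(g_steps, t) for t, d in zip(times, deltas)]

times = [1.0, 2.0, 3.0, 4.0]
deltas = [1, 0, 1, 1]  # the second subject is censored
ws = ipcw_weights(times, deltas)
print([round(w, 6) for w in ws])  # [1.0, 0.0, 1.5, 1.5]
```

These weights multiply the quantile check-loss of each observation, so subjects observed beyond heavy censoring count for more, compensating for the censored subjects that carry no weight.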
Affiliation(s)
- Sungwan Bang
- Department of Mathematics, Korea Military Academy, P.O. Box 77, Seoul, Republic of Korea
- Soo-Heang Eo
- Department of Statistics, Korea University, Seoul, 136-701, Republic of Korea
- Yong Mee Cho
- Department of Pathology, Asan Medical Center, Seoul, 138-736, Republic of Korea
- Myoungshic Jhun
- Department of Statistics, Korea University, Seoul, 136-701, Republic of Korea
- HyungJun Cho
- Department of Statistics, Korea University, Seoul, 136-701, Republic of Korea

35
Lu CL, Wang S, Ji Z, Wu Y, Xiong L, Jiang X, Ohno-Machado L. WebDISCO: a web service for distributed Cox model learning without patient-level data sharing. J Am Med Inform Assoc 2015; 22:1212-9. PMID: 26159465. PMCID: PMC5009917. DOI: 10.1093/jamia/ocv083.
Abstract
Objective: The Cox proportional hazards model is a widely used method for analyzing survival data. To achieve sufficient statistical power in a survival analysis, a large amount of data is usually required. Data sharing across institutions could be a potential workaround for providing this added power.
Methods and materials: The authors developed a web service for distributed Cox model learning (WebDISCO), which focuses on proof of concept and algorithm development for federated survival analysis. Sensitive patient-level data are processed locally, and only less-sensitive intermediate statistics are exchanged to build a global Cox model. Mathematical derivation shows that the proposed distributed algorithm is identical to the centralized Cox model.
Results: The authors evaluated the proposed framework at the University of California, San Diego (UCSD), Emory, and Duke. The experimental results show that the distributed and centralized models produce near-identical model coefficients, with differences in the range [Formula: see text] to [Formula: see text]. The results confirm the mathematical derivation and show that the distributed implementation can achieve the same results as the centralized one.
Limitations: The proposed method serves as a proof of concept, evaluated on a publicly available dataset. The authors do not suggest that it resolves the policy and engineering issues related to the federated use of institutional data, but it serves as evidence of the technical feasibility of the proposed approach.
Conclusions: WebDISCO (Web-based Distributed Cox Regression Model; https://webdisco.ucsd-dbmi.org:8443/cox/) provides a proof-of-concept web service that implements a distributed algorithm to conduct survival analysis without sharing patient-level data.
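The key observation behind a distributed Cox fit is that the score of the partial likelihood depends on the pooled data only through risk-set sums that each site can compute locally and that add across sites. A minimal Python sketch for one covariate with Breslow handling of ties (illustrative only, not the WebDISCO service itself; the toy site data are invented):

```python
import math

def site_summaries(data, beta, event_times):
    """Per-site intermediate statistics at each global event time.
    `data` is a list of (time, event_indicator, covariate) triples."""
    out = {}
    for t in event_times:
        s0 = sum(math.exp(beta * x) for ti, d, x in data if ti >= t)
        s1 = sum(x * math.exp(beta * x) for ti, d, x in data if ti >= t)
        dx = sum(x for ti, d, x in data if ti == t and d == 1)
        dn = sum(1 for ti, d, x in data if ti == t and d == 1)
        out[t] = (s0, s1, dx, dn)
    return out

def global_score(summaries, beta_event_times):
    """Server-side score of the Cox partial log-likelihood, assembled
    purely from per-site aggregates (no patient-level data needed)."""
    g = 0.0
    for t in beta_event_times:
        s0 = sum(s[t][0] for s in summaries)
        s1 = sum(s[t][1] for s in summaries)
        dx = sum(s[t][2] for s in summaries)
        dn = sum(s[t][3] for s in summaries)
        g += dx - dn * s1 / s0
    return g

site_a = [(1.0, 1, 0.5), (3.0, 0, -1.0), (4.0, 1, 2.0)]
site_b = [(2.0, 1, 1.0), (5.0, 0, 0.3)]
times = sorted(t for t, d, x in site_a + site_b if d == 1)
beta = 0.2

distributed = global_score([site_summaries(site_a, beta, times),
                            site_summaries(site_b, beta, times)], times)
centralized = global_score([site_summaries(site_a + site_b, beta, times)], times)
print(abs(distributed - centralized) < 1e-12)  # True: the two agree
```

Because the aggregates are plain sums, repeating this at each Newton step reproduces the centralized fit exactly, which is the identity the paper derives.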
Affiliation(s)
- Chia-Lun Lu
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Shuang Wang
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Zhanglong Ji
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Yuan Wu
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, 27708, USA
- Li Xiong
- Department of Mathematics & Computer Science, Emory University, Atlanta, GA 30322, USA; Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Xiaoqian Jiang
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Lucila Ohno-Machado
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA

36
Wu C, Ma S. A selective review of robust variable selection with applications in bioinformatics. Brief Bioinform 2015; 16:873-83. PMID: 25479793. PMCID: PMC4570200. DOI: 10.1093/bib/bbu046.
Abstract
Vast amounts of data have been and are being generated in bioinformatics studies. In the analysis of such data, standard modeling approaches can be challenged by heavy-tailed errors and outliers in response variables, contamination in predictors (caused by, for instance, technical problems in microarray gene expression studies), model mis-specification, and other issues. Robust methods are needed to tackle these challenges. When there is a large number of predictors, variable selection can be as important as estimation. As a generic variable selection and regularization tool, penalization has been extensively adopted. In this article, we provide a selective review of robust penalized variable selection approaches specially designed for high-dimensional data from bioinformatics and biomedical studies. We discuss the robust loss functions, penalty functions, and computational algorithms; the theoretical properties and implementation are also briefly examined. Application examples of the robust penalization approaches in representative bioinformatics and biomedical studies are also illustrated.
37

The L1/2 regularization approach for survival analysis in the accelerated failure time model. Comput Biol Med 2015; 64:283-90. DOI: 10.1016/j.compbiomed.2014.09.002.

38
Zhao SD, Li Y. Score test variable screening. Biometrics 2014; 70:862-71. PMID: 25124197. PMCID: PMC4427573. DOI: 10.1111/biom.12209.
Abstract
Variable screening has emerged as a crucial first step in the analysis of high-throughput data, but existing procedures can be computationally cumbersome, difficult to justify theoretically, or inapplicable to certain types of analyses. Motivated by a high-dimensional censored quantile regression problem in multiple myeloma genomics, this article makes three contributions. First, we establish a score test-based screening framework, which is widely applicable, extremely computationally efficient, and relatively simple to justify. Second, we propose a resampling-based procedure for selecting the number of variables to retain after screening according to the principle of reproducibility. Finally, we propose a new iterative score test screening method which is closely related to sparse regression. In simulations we apply our methods to four different regression models and show that they can outperform existing procedures. We also apply score test screening to an analysis of gene expression data from multiple myeloma patients using a censored quantile regression model to identify high-risk genes.
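The basic screening step can be sketched for an uncensored linear model: compute a marginal score-test statistic for each predictor under the intercept-only null and keep the top few. The `score_screen` helper below is an illustrative assumption, not the authors' censored-quantile implementation.

```python
import numpy as np

def score_screen(X, y, d):
    """Rank each predictor by a marginal score-test statistic computed
    under the null (intercept-only) model and keep the top d.  This is
    a linear-model sketch of the screening idea; the censored quantile
    regression version described in the paper is not reproduced here."""
    r = y - y.mean()                            # null-model residuals
    sigma = r.std(ddof=1)                       # null-model scale estimate
    Xc = X - X.mean(axis=0)
    # standardized score statistic for H0: beta_j = 0, one j at a time
    stats = np.abs(Xc.T @ r) / (sigma * np.linalg.norm(Xc, axis=0))
    return np.argsort(stats)[::-1][:d]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))            # n = 200, p = 1000
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.standard_normal(200)
keep = score_screen(X, y, d=10)                 # indices of retained predictors
```

Because each statistic needs only one pass over a column, the whole screen is a single matrix-vector product, which is what makes score-based screening so cheap relative to fitting p separate regressions.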
Affiliation(s)
- Sihai Dave Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois 61820, U.S.A
|
39
|
Huang X, Ning J, Wahed AS. Optimization of individualized dynamic treatment regimes for recurrent diseases. Stat Med 2014; 33:2363-78. [PMID: 24510534 PMCID: PMC4043865 DOI: 10.1002/sim.6104] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2013] [Revised: 01/14/2014] [Accepted: 01/15/2014] [Indexed: 11/10/2022]
Abstract
Patients with cancer or other recurrent diseases may undergo a long process of initial treatment, disease recurrences, and salvage treatments. It is important to optimize the multi-stage treatment sequence in this process to maximally prolong patients' survival. Comparing disease-free survival for each treatment stage over-penalizes disease recurrences but under-penalizes treatment-related mortalities. Moreover, treatment regimes used in practice are dynamic; that is, the choice of the next treatment depends on a patient's responses to previous therapies. In this article, using accelerated failure time models, we develop a method to optimize such dynamic treatment regimes. This method utilizes all the longitudinal data collected during the multi-stage process of disease recurrences and treatments, and identifies the optimal dynamic treatment regime for each individual patient by maximizing his or her expected overall survival. We illustrate the application of this method using data from a study of acute myeloid leukemia, for which the optimal treatment strategies for different patient subgroups are identified.
Affiliation(s)
- Xuelin Huang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77230
- Jing Ning
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77230
- Abdus S. Wahed
- Department of Biostatistics, The University of Pittsburgh, Pittsburgh, PA 15260
|
40
|
On the maximum penalized likelihood approach for proportional hazard models with right censored survival data. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2014.01.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
41
|
Adjusted regularized estimation in the accelerated failure time model with high dimensional covariates. J MULTIVARIATE ANAL 2013. [DOI: 10.1016/j.jmva.2013.07.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
42
|
Chung M, Long Q, Johnson BA. A Tutorial on Rank-based Coefficient Estimation for Censored Data in Small- and Large-Scale Problems. STATISTICS AND COMPUTING 2013; 23:601-614. [PMID: 23956500 PMCID: PMC3742389 DOI: 10.1007/s11222-012-9333-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
The analysis of survival endpoints subject to right-censoring is an important research area in statistics, particularly among econometricians and biostatisticians. The two most popular semiparametric models are the proportional hazards model and the accelerated failure time (AFT) model. Rank-based estimation in the AFT model is computationally challenging due to optimization of a non-smooth loss function. Previous work has shown that rank-based estimators may be written as solutions to linear programming (LP) problems. However, the size of the LP problem is O(n² + p) subject to n² linear constraints, where n denotes sample size and p denotes the dimension of parameters. As n and/or p increases, the feasibility of such a solution in practice becomes questionable. Among data mining and statistical learning enthusiasts, there is interest in extending ordinary regression coefficient estimators for low dimensions into high-dimensional data mining tools through regularization. Applying this recipe to rank-based coefficient estimators leads to formidable optimization problems which may be avoided through smooth approximations to non-smooth functions. We review smooth approximations and quasi-Newton methods for rank-based estimation in AFT models. The computational cost of our method is substantially smaller than the corresponding LP problem and can be applied to small- or large-scale problems similarly. The algorithm described here allows one to couple rank-based estimation for censored data with virtually any regularization and is exemplified through four case studies.
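The smooth-approximation idea can be sketched with the Gehan rank loss, whose kink max(-u, 0) is replaced by a differentiable surrogate so a quasi-Newton routine applies. This is an illustrative sketch under a simple surrogate, not the tutorial's own code; the data and the `smoothed_gehan` helper are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def smoothed_gehan(beta, X, logy, delta, eps=1e-4):
    """Smooth surrogate of the Gehan rank loss for the AFT model: the
    kink max(-u, 0) is replaced by (sqrt(u**2 + eps) - u) / 2, which is
    differentiable everywhere, so quasi-Newton methods apply.
    Illustrative sketch only, not the authors' implementation."""
    e = logy - X @ beta                       # residuals on the log scale
    u = e[:, None] - e[None, :]               # all pairwise differences e_i - e_j
    neg = 0.5 * (np.sqrt(u ** 2 + eps) - u)   # smooth version of max(-u, 0)
    return np.sum(delta[:, None] * neg) / len(e) ** 2

rng = np.random.default_rng(1)
n, beta_true = 150, np.array([1.0, -0.5])
X = rng.standard_normal((n, 2))
logt = X @ beta_true + 0.5 * rng.standard_normal(n)   # true log event times
logc = rng.uniform(0.0, 6.0, n)                       # log censoring times
delta = (logt <= logc).astype(float)                  # 1 = event observed
logy = np.minimum(logt, logc)

fit = minimize(smoothed_gehan, np.zeros(2), args=(X, logy, delta), method="BFGS")
```

The Gehan loss is convex in beta, and the surrogate preserves convexity, so BFGS converges to the global minimizer; any penalty term can simply be added to the objective, which is the coupling with regularization the abstract refers to.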
Affiliation(s)
- Matthias Chung
- Department of Mathematics, Texas State University, San Marcos, TX 78666, U.S.A
- Qi Long
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, U.S.A
- Brent A. Johnson
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, U.S.A
|
43
|
Tong X, Zhu L, Leng C, Leisenring W, Robison LL. A general semiparametric hazards regression model: efficient estimation and structure selection. Stat Med 2013; 32:4980-94. [PMID: 23824784 DOI: 10.1002/sim.5885] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2012] [Accepted: 05/28/2013] [Indexed: 11/06/2022]
Abstract
We consider a general semiparametric hazards regression model that encompasses the Cox proportional hazards model and the accelerated failure time model for survival analysis. To overcome the nonexistence of the maximum likelihood estimator, we derive a kernel-smoothed profile likelihood function and prove that the resulting estimates of the regression parameters are consistent and achieve semiparametric efficiency. In addition, we develop penalized structure selection techniques to determine which covariates constitute the accelerated failure time model and which covariates constitute the proportional hazards model. The proposed method is able to estimate the model structure consistently and model parameters efficiently. Furthermore, variance estimation is straightforward. The proposed estimation performs well in simulation studies and is applied to the analysis of a real data set.
Affiliation(s)
- Xingwei Tong
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
|
44
|
Minnier J, Tian L, Cai T. A Perturbation Method for Inference on Regularized Regression Estimates. J Am Stat Assoc 2012; 106:1371-1382. [PMID: 22844171 DOI: 10.1198/jasa.2011.tm10382] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Analysis of high-dimensional data often seeks to identify a subset of important features and assess their effects on the outcome. Traditional statistical inference procedures based on standard regression methods often fail in the presence of high-dimensional features. In recent years, regularization methods have emerged as promising tools for analyzing high-dimensional data. These methods simultaneously select important features and provide stable estimation of their effects. Adaptive LASSO and SCAD, for instance, give consistent and asymptotically normal estimates with oracle properties. However, in finite samples, it remains difficult to obtain interval estimators for the regression parameters. In this paper, we propose perturbation resampling-based procedures to approximate the distribution of a general class of penalized parameter estimates. Our proposal, justified by asymptotic theory, provides a simple way to estimate the covariance matrix and confidence regions. Through finite sample simulations, we verify the ability of this method to give accurate inference and compare it to other widely used standard deviation and confidence interval estimates. We also illustrate our proposals with a data set used to study the association of HIV drug resistance and a large number of genetic mutations.
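The perturbation idea can be sketched for a plain lasso: re-minimize the objective many times with i.i.d. Exp(1) weights on the observation-level loss terms and read standard errors and intervals off the resampled estimates. The `perturbed_lasso` helper and simulated data below are assumptions for illustration; the paper treats a general class of penalized estimators.

```python
import numpy as np
from sklearn.linear_model import Lasso

def perturbed_lasso(X, y, alpha, B=200, seed=0):
    """Approximate the sampling distribution of a lasso estimate by
    re-minimizing the objective B times with i.i.d. Exp(1) weights on
    the observation-level loss terms, implemented here by rescaling
    each row by sqrt(w_i).  A sketch of the perturbation-resampling
    idea only, not the authors' general procedure."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        s = np.sqrt(rng.exponential(1.0, n))
        draws[b] = Lasso(alpha=alpha).fit(X * s[:, None], y * s).coef_
    se = draws.std(axis=0)                          # perturbation-based SEs
    ci = np.percentile(draws, [2.5, 97.5], axis=0)  # pointwise 95% intervals
    return se, ci

rng = np.random.default_rng(2)
X = rng.standard_normal((120, 8))
beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta + rng.standard_normal(120)
se, ci = perturbed_lasso(X, y, alpha=0.05)
```

Rescaling rows by sqrt(w_i) makes the weighted squared-error loss exactly the unweighted loss on the rescaled data while leaving the penalty untouched, so each perturbed fit is an ordinary lasso call.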
Affiliation(s)
- Jessica Minnier
- Ph.D. candidate, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115
|
45
|
|
46
|
Zhao XG, Dai W, Li Y, Tian L. AUC-based biomarker ensemble with an application on gene scores predicting low bone mineral density. Bioinformatics 2011; 27:3050-5. [PMID: 21908541 DOI: 10.1093/bioinformatics/btr516] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
MOTIVATION The area under the receiver operating characteristic (ROC) curve (AUC), long regarded as a 'golden' measure for the predictiveness of a continuous score, has propelled the need to develop AUC-based predictors. However, AUC-based ensemble methods are rather scant, largely due to the fact that the associated objective function is neither continuous nor concave. Indeed, there is no reliable numerical algorithm identifying the optimal combination of a set of biomarkers to maximize the AUC, especially when the number of biomarkers is large. RESULTS We propose a novel AUC-based statistical ensemble method for combining multiple biomarkers to differentiate a binary response of interest. Specifically, we propose to replace the non-continuous and non-convex AUC objective function by a convex surrogate loss function, whose minimizer can be efficiently identified. Within the established framework, the lasso and other regularization techniques enable feature selection. Extensive simulations have demonstrated the superiority of the new methods to the existing methods. The proposal has been applied to a gene expression dataset to construct gene expression scores to differentiate elderly women with low bone mineral density (BMD) from those with normal BMD. The AUCs of the resulting scores in the independent test dataset have been satisfactory. CONCLUSION Aiming to directly maximize the AUC, the proposed AUC-based ensemble method provides an efficient means of generating a stable combination of multiple biomarkers, which is especially useful under high-dimensional settings. CONTACT lutian@stanford.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
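The surrogate-loss idea can be sketched as follows: the pairwise indicator I(score_case > score_control) inside the empirical AUC is replaced by a logistic function, giving a smooth convex objective a standard optimizer can minimize. The surrogate choice, helper names and simulated data are illustrative assumptions; the paper's exact loss and its lasso-penalized variant are not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

def fit_auc_ensemble(X, y):
    """Combine biomarkers by maximizing a smooth convex surrogate of the
    AUC: each pairwise indicator I(score_case > score_control) becomes a
    logistic term, so the objective is differentiable.  Sketch only."""
    d = (X[y == 1][:, None, :] - X[y == 0][None, :, :]).reshape(-1, X.shape[1])
    loss = lambda b: np.mean(np.logaddexp(0.0, -d @ b))  # smooth pairwise loss
    b0 = np.zeros(X.shape[1])
    b0[0] = 1.0                                          # fix an initial scale
    return minimize(loss, b0, method="BFGS").x

def auc(score, y):
    """Empirical AUC of a continuous score against binary labels."""
    s1, s0 = score[y == 1], score[y == 0]
    return np.mean(s1[:, None] > s0[None, :])

rng = np.random.default_rng(3)
n = 300
y = rng.integers(0, 2, n)
X = rng.standard_normal((n, 3)) + y[:, None] * np.array([1.0, 0.5, 0.0])
beta = fit_auc_ensemble(X, y)
combined = auc(X @ beta, y)
```

Since the AUC is invariant to a positive rescaling of the score, only the direction of the weight vector matters; the surrogate picks a finite minimizer whenever the case and control scores overlap.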
Affiliation(s)
- X G Zhao
- Department of Bone and Joint Surgery, The First Affiliated Hospital of Xi'an Medical University, Xi'an 710077, Shaanxi Province, PR China
|
47
|
Long Q, Chung M, Moreno CS, Johnson BA. Risk Prediction for Prostate Cancer Recurrence Through Regularized Estimation with Simultaneous Adjustment for Nonlinear Clinical Effects. Ann Appl Stat 2011; 5:2003-2023. [PMID: 22081781 PMCID: PMC3212400 DOI: 10.1214/11-aoas458] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
In biomedical studies, it is of substantial interest to develop risk prediction scores using high-dimensional data such as gene expression data for clinical endpoints that are subject to censoring. In the presence of well-established clinical risk factors, investigators often prefer a procedure that also adjusts for these clinical variables. While accelerated failure time (AFT) models are a useful tool for the analysis of censored outcome data, they assume that covariate effects on the logarithm of time-to-event are linear, which is often unrealistic in practice. We propose to build risk prediction scores through regularized rank estimation in partly linear AFT models, where high-dimensional data such as gene expression data are modeled linearly and important clinical variables are modeled nonlinearly using penalized regression splines. We show through simulation studies that our model has better operating characteristics compared to several existing models. In particular, we show that there is a non-negligible effect on prediction as well as feature selection when nonlinear clinical effects are misspecified as linear. This work is motivated by a recent prostate cancer study, where investigators collected gene expression data along with established prognostic clinical variables and the primary endpoint is time to prostate cancer recurrence.
We analyzed the prostate cancer data and evaluated prediction performance of several models based on the extended c statistic for censored data, showing that 1) the relationship between the clinical variable, prostate specific antigen, and the prostate cancer recurrence is likely nonlinear, i.e., the time to recurrence decreases as PSA increases and it starts to level off when PSA becomes greater than 11; 2) correct specification of this nonlinear effect improves performance in prediction and feature selection; and 3) addition of gene expression data does not seem to further improve the performance of the resultant risk prediction scores.
Affiliation(s)
- Qi Long
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Matthias Chung
- Department of Mathematics, Texas State University, San Marcos, TX 78666, USA
- Carlos S. Moreno
- Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA 30322, USA
- Brent A. Johnson
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
|
48
|
|
49
|
Abstract
Dimension reduction, model and variable selection are ubiquitous concepts in modern statistical science, and deriving new methods beyond the scope of current methodology is noteworthy. This article briefly reviews existing regularization methods for penalized least squares and likelihood for survival data and their extension to a certain class of penalized estimating functions. We show that if one's goal is to estimate the entire regularized coefficient path using the observed survival data, then all current strategies fail for the Buckley-James estimating function. We propose a novel two-stage method to estimate and restore the entire Dantzig-regularized coefficient path for censored outcomes in a least-squares framework. We apply our methods to a microarray study of lung adenocarcinoma with sample size n = 200 and p = 1036 gene predictors and find 10 genes that are consistently selected across different criteria and an additional 14 genes that merit further investigation. In simulation studies, we found that the proposed path restoration and variable selection technique has the potential to perform as well as existing methods that begin with a proper convex loss function at the outset.
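The Buckley-James estimating function builds on an imputation step that can be sketched on its own: replace each censored log-time with its conditional expectation, computed from the Kaplan-Meier estimate of the residual distribution, so the completed responses can be fed to least squares, the lasso, or the Dantzig selector. The code below is a sketch of that single step under simplified tie handling, not the authors' two-stage path-restoration algorithm; all names and data are assumptions.

```python
import numpy as np

def buckley_james_impute(X, logy, delta, beta):
    """One Buckley-James step: replace each censored log-time with its
    conditional expectation given that it exceeds the observed censoring
    time, using the Kaplan-Meier estimate of the residual distribution.
    Sketch with simplified tie handling."""
    e = logy - X @ beta                        # residuals at the current beta
    order = np.argsort(e)
    e_s, d_s = e[order], delta[order]
    n = len(e)
    at_risk = n - np.arange(n)                 # risk-set sizes after sorting
    surv = np.cumprod(1.0 - d_s / at_risk)     # Kaplan-Meier S(e) on residuals
    jump = np.concatenate([[1.0], surv[:-1]]) - surv   # KM mass at each point
    ystar = logy.copy()
    for i in np.where(delta == 0)[0]:
        tail = e_s > e[i]
        mass = jump[tail].sum()
        if mass > 0:                           # renormalize remaining tail mass
            ystar[i] = X[i] @ beta + (e_s[tail] * jump[tail]).sum() / mass
    return ystar

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 2))
beta = np.array([1.0, -1.0])
logt = X @ beta + rng.standard_normal(100)
logc = rng.normal(0.5, 1.5, 100)
delta = (logt <= logc).astype(float)           # 1 = event observed
logy = np.minimum(logt, logc)
ystar = buckley_james_impute(X, logy, delta, beta)   # completed responses
```

Each imputed value necessarily exceeds the censored observation it replaces, since it is an average of residuals beyond the censoring point; iterating this step against a regularized fit is where the path-restoration difficulty discussed in the abstract arises.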
Affiliation(s)
- Brent A Johnson
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia 30322, USA.
|
50
|
Zou Y, Zhang J, Qin G. Semiparametric Accelerated Failure Time Partial Linear Model and Its Application to Breast Cancer. Comput Stat Data Anal 2011; 55:1479-1487. [PMID: 21499529 DOI: 10.1016/j.csda.2010.10.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Breast cancer is the most common non-skin cancer in women and the second most common cause of cancer-related death in U.S. women. It is well known that breast cancer survival varies by age at diagnosis. For most cancers, relative survival decreases with age, but breast cancer may follow an unusual age pattern. In order to reveal the stage risk and the pattern of age effects, we propose a semiparametric accelerated failure time partial linear model and develop its estimation method based on the P-spline and the rank estimation approach. The simulation studies demonstrate that the proposed method is comparable to the parametric approach when the data are not contaminated, and more stable than the parametric methods when the data are contaminated. By applying the proposed model and method to the breast cancer data set of Atlantic County, New Jersey from the SEER program, we successfully reveal the significant effects of stage, and show that women diagnosed around age 38 have consistently higher survival rates than either younger or older women.
Affiliation(s)
- Yubo Zou
- Department of Epidemiology and Biostatistics, University of South Carolina Columbia, SC 29208, USA
|