1
Luo B, Gao X, Halabi S. Penalized weighted proportional hazards model for robust variable selection and outlier detection. Stat Med 2022; 41:3398-3420. [PMID: 35581736 PMCID: PMC9283382 DOI: 10.1002/sim.9424] [Received: 11/14/2021] [Revised: 03/20/2022] [Accepted: 04/12/2022]
Abstract
Identifying exceptional responders or nonresponders is an area of increased research interest in precision medicine as these patients may have different biological or molecular features and therefore may respond differently to therapies. Our motivation stems from a real example from a clinical trial where we are interested in characterizing exceptional prostate cancer responders. We investigate the outlier detection and robust regression problem in the sparse proportional hazards model for censored survival outcomes. The main idea is to model the irregularity of each observation by assigning an individual weight to the hazard function. By applying a LASSO-type penalty on both the model parameters and the log transformation of the weight vector, our proposed method is able to perform variable selection and outlier detection simultaneously. The optimization problem can be transformed to a typical penalized maximum partial likelihood problem and thus it is easy to implement. We further extend the proposed method to deal with the potential outlier masking problem caused by censored outcomes. The performance of the proposed estimator is demonstrated with extensive simulation studies and real data analyses in low-dimensional and high-dimensional settings.
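As a rough illustration of the reformulation described above, the sketch below assumes that subject i's hazard is scaled by a weight w_i = exp(γ_i), so that γ_i enters the linear predictor like a coefficient on an extra identity column of the design matrix. The objective is a LASSO-penalized negative Cox log partial likelihood (no tied times assumed). The function names and the separate tuning parameters `lam_beta` and `lam_gamma` are hypothetical, not taken from the paper.

```python
import numpy as np


def neg_log_partial_likelihood(eta, time, event):
    """Negative Cox log partial likelihood for linear predictor eta (no tied times assumed)."""
    order = np.argsort(-time)  # decreasing time, so the risk set of row k is rows 0..k
    eta_s, event_s = eta[order], event[order]
    # Running log-sum-exp gives log(sum of exp(eta_j) over each risk set).
    log_risk = np.logaddexp.accumulate(eta_s)
    return -np.sum(event_s * (eta_s - log_risk))


def penalized_weighted_objective(beta, gamma, X, time, event, lam_beta, lam_gamma):
    """Hypothetical objective: gamma_i = log(w_i) acts as the coefficient of an identity column."""
    n = X.shape[0]
    X_aug = np.hstack([X, np.eye(n)])        # augmented design [X, I_n]
    theta = np.concatenate([beta, gamma])    # stacked coefficients (beta, gamma)
    eta = X_aug @ theta                      # equals X @ beta + gamma
    return (neg_log_partial_likelihood(eta, time, event)
            + lam_beta * np.sum(np.abs(beta))
            + lam_gamma * np.sum(np.abs(gamma)))
```

Because the augmented problem is again a penalized Cox partial likelihood in (β, γ), a generic LASSO-Cox solver could in principle be applied to it; under this sketch, a subject whose estimated γ_i is nonzero would be flagged as a potential outlier.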
Affiliation(s)
- Bin Luo
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Xiaoli Gao
- Department of Mathematics and Statistics, The University of North Carolina at Greensboro, Greensboro, North Carolina, USA
- Susan Halabi
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
2
A sequential feature selection procedure for high-dimensional Cox proportional hazards model. Ann Inst Stat Math 2022. [DOI: 10.1007/s10463-022-00824-8]
3
Pan Y, Cai W, Liu Z. Inference for non-probability samples under high-dimensional covariate-adjusted superpopulation model. Stat Methods Appl 2022. [DOI: 10.1007/s10260-021-00619-w]
4
Variable selection in partially linear additive hazards model with grouped covariates and a diverging number of parameters. Comput Stat 2021. [DOI: 10.1007/s00180-020-01062-3]
5
Huang L, Kopciuk K, Lu X. A group bridge approach for component selection in nonparametric accelerated failure time additive regression model. Commun Stat Theory Methods 2021. [DOI: 10.1080/03610926.2019.1651861]
Affiliation(s)
- Longlong Huang
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Karen Kopciuk
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Department of Cancer Epidemiology and Prevention Research, Alberta Health Services, Calgary, Alberta, Canada
- Xuewen Lu
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
6
Zhang H, Huang J, Sun L. Projection-based and cross-validated estimation in high-dimensional Cox model. Scand Stat Theory Appl 2021. [DOI: 10.1111/sjos.12515]
Affiliation(s)
- Haixiang Zhang
- Center for Applied Mathematics, Tianjin University, Tianjin, China
- Jian Huang
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, USA
- Liuquan Sun
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
7
Tang N, Yan X, Zhao X. Penalized generalized empirical likelihood with a diverging number of general estimating equations for censored data. Ann Stat 2020. [DOI: 10.1214/19-aos1870]
8
Niu Y, Wang X, Cao H, Peng Y. Variable selection via penalized generalized estimating equations for a marginal survival model. Stat Methods Med Res 2020; 29:2493-2506. [PMID: 31994449 DOI: 10.1177/0962280220901728]
Abstract
Clustered and multivariate survival times, such as times to recurrent events, commonly arise in biomedical and health research, and marginal survival models are often used to model such data. When a large number of predictors are available, variable selection is always an important issue when modeling such data with a survival model. We consider a Cox's proportional hazards model for a marginal survival model. Under the sparsity assumption, we propose a penalized generalized estimating equation approach to select important variables and to estimate regression coefficients simultaneously in the marginal model. The proposed method explicitly models the correlation structure within clusters or correlated variables by using a prespecified working correlation matrix. The asymptotic properties of the estimators from the penalized generalized estimating equations are established and the number of candidate covariates is allowed to increase in the same order as the number of clusters does. We evaluate the performance of the proposed method through a simulation study and analyze two real datasets for the application.
Affiliation(s)
- Yi Niu
- School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Xiaoguang Wang
- School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Hui Cao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Yingwei Peng
- Department of Public Health Sciences, Queen's University, Kingston, Canada
- Department of Mathematics and Statistics, Queen's University, Kingston, Canada
9
Xue Y, Wang H, Yan J, Schifano ED. An online updating approach for testing the proportional hazards assumption with streams of survival data. Biometrics 2019; 76:171-182. [PMID: 31424095 DOI: 10.1111/biom.13137] [Received: 07/03/2018] [Accepted: 08/07/2019]
Abstract
The Cox model, which remains the first choice for analyzing time-to-event data even for large data sets, relies on the proportional hazards (PH) assumption. When survival data arrive sequentially in chunks, a fast and minimally storage-intensive approach to test the PH assumption is desirable. We propose an online updating approach that updates the standard test statistic as each new block of data becomes available and greatly lightens the computational burden. Under the null hypothesis of PH, the proposed statistic is shown to have the same asymptotic distribution as the standard version computed on an entire data stream with the data blocks pooled into one data set. In simulation studies, the test and its variant based on the most recent data blocks maintain their sizes when the PH assumption holds and have substantial power to detect different violations of the PH assumption. We also show in simulation that our approach can be used successfully with "big data" that exceed a single computer's computational resources. The approach is illustrated with the survival analysis of patients with lymphoma from the Surveillance, Epidemiology, and End Results Program. The proposed test promptly identified deviation from the PH assumption, which was not captured by the test based on the entire data.
Affiliation(s)
- Yishu Xue
- Department of Statistics, University of Connecticut, Storrs, Connecticut
- HaiYing Wang
- Department of Statistics, University of Connecticut, Storrs, Connecticut
- Jun Yan
- Department of Statistics, University of Connecticut, Storrs, Connecticut
10
Hong HG, Zheng Q, Li Y. Forward regression for Cox models with high-dimensional covariates. J Multivar Anal 2019; 173:268-290. [PMID: 31007300 PMCID: PMC6469712 DOI: 10.1016/j.jmva.2019.02.011]
Abstract
Forward regression, a classical variable screening method, has been widely used for model building when the number of covariates is relatively low. However, forward regression is seldom used in high-dimensional settings because of the cumbersome computation and unknown theoretical properties. Some recent works have shown that forward regression, coupled with an extended Bayesian information criterion (EBIC)-based stopping rule, can consistently identify all relevant predictors in high-dimensional linear regression settings. However, the results are based on the sum of residual squares from linear models and it is unclear whether forward regression can be applied to more general regression settings, such as Cox proportional hazards models. We introduce a forward variable selection procedure for Cox models. It selects important variables sequentially according to the increment of partial likelihood, with an EBIC stopping rule. To our knowledge, this is the first study that investigates the partial likelihood-based forward regression in high-dimensional survival settings and establishes selection consistency results. We show that, if the dimension of the true model is finite, forward regression can discover all relevant predictors within a finite number of steps and their order of entry is determined by the size of the increment in partial likelihood. As partial likelihood is not a regular density-based likelihood, we develop some new theoretical results on partial likelihood and use these results to establish the desired sure screening properties. The practical utility of the proposed method is examined via extensive simulations and analysis of a subset of the Boston Lung Cancer Survival Cohort study, a hospital-based study for identifying biomarkers related to lung cancer patients' survival.
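The forward step and EBIC stopping rule described above can be sketched generically. Here the maximized log partial likelihood of a candidate subset is supplied by a user-provided callable `fit_loglik` (hypothetical; in practice it would fit a Cox model on those columns), and the EBIC is taken in its standard form -2ℓ(S) + |S| log n + 2γ log C(p, |S|). This illustrates the idea only and is not the authors' implementation.

```python
import math
from typing import Callable, FrozenSet, List


def ebic(loglik: float, k: int, n: int, p: int, gamma: float = 1.0) -> float:
    """Extended BIC: -2*loglik + k*log(n) + 2*gamma*log(C(p, k))."""
    return -2.0 * loglik + k * math.log(n) + 2.0 * gamma * math.log(math.comb(p, k))


def forward_select(fit_loglik: Callable[[FrozenSet[int]], float],
                   p: int, n: int, gamma: float = 1.0) -> List[int]:
    """Greedy forward selection on log partial likelihood with an EBIC stopping rule."""
    selected: List[int] = []
    current_ebic = ebic(fit_loglik(frozenset()), 0, n, p, gamma)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        # Score each candidate by the maximized log partial likelihood after adding it.
        scores = {j: fit_loglik(frozenset(selected + [j])) for j in candidates}
        best_j = max(scores, key=scores.get)
        new_ebic = ebic(scores[best_j], len(selected) + 1, n, p, gamma)
        if new_ebic >= current_ebic:  # stop once EBIC no longer decreases
            break
        selected.append(best_j)
        current_ebic = new_ebic
    return selected
```

The variable with the largest likelihood increment enters first, which mirrors the order-of-entry statement in the abstract.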
Affiliation(s)
- Hyokyoung G. Hong
- Department of Statistics and Probability, Michigan State University, 19 Red Cedar Road, East Lansing, MI 48823, USA
- Qi Zheng
- Department of Bioinformatics and Biostatistics, University of Louisville, 485 East Gray Street, Louisville, KY 40202, USA
- Yi Li
- Department of Biostatistics, University of Michigan, 1415 Washington Heights Ann Arbor, MI 48109-2029, USA
11
Lv S, Jiang J, Zhou F, Huang J, Lin H. Estimating high-dimensional additive Cox model with time-dependent covariate processes. Scand Stat Theory Appl 2018. [DOI: 10.1111/sjos.12327]
Affiliation(s)
- Shaogao Lv
- Center of Statistical Research, School of Statistics, Southwestern University of Finance and Economics, Chengdu
- Jiakun Jiang
- Center of Statistical Research, School of Statistics, Southwestern University of Finance and Economics, Chengdu
- Fanyin Zhou
- Center of Statistical Research, School of Statistics, Southwestern University of Finance and Economics, Chengdu
- Jian Huang
- Department of Statistics and Actuarial Science, University of Iowa
- Huazhen Lin
- Center of Statistical Research, School of Statistics, Southwestern University of Finance and Economics, Chengdu
12
Hong HG, Li Y. Feature selection of ultrahigh-dimensional covariates with survival outcomes: a selective review. Applied Mathematics: A Journal of Chinese Universities 2017; 32:379-396. [PMID: 29683128 PMCID: PMC5906071 DOI: 10.1007/s11766-017-3547-8]
Abstract
Many modern biomedical studies have yielded survival data with high-throughput predictors. The goals of scientific research often lie in identifying predictive biomarkers, understanding biological mechanisms and making accurate and precise predictions. Variable screening is a crucial first step in achieving these goals. This work conducts a selective review of feature screening procedures for survival data with ultrahigh dimensional covariates. We present the main methodologies, along with the key conditions that ensure sure screening properties. The practical utility of these methods is examined via extensive simulations. We conclude the review with some future opportunities in this field.
Affiliation(s)
- Hyokyoung Grace Hong
- Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, U.S.A
- Yi Li
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, U.S.A
13
Chang YM, Shen PS, Chen CS. Adaptive-Cox model averaging for right-censored data. Commun Stat Theory Methods 2017. [DOI: 10.1080/03610926.2016.1208237]
Affiliation(s)
- Yu-Mei Chang
- Department of Statistics, Tunghai University, Taichung, Taiwan
- Pao-Sheng Shen
- Department of Statistics, Tunghai University, Taichung, Taiwan
- Chun-Shu Chen
- Institute of Statistics and Information Science, National Changhua University of Education, Changhua, Taiwan
14
Fang J, Liu W, Lu X. Penalised empirical likelihood for the additive hazards model with high-dimensional data. J Nonparametr Stat 2017. [DOI: 10.1080/10485252.2017.1303062]
Affiliation(s)
- Jianglin Fang
- College of Mathematics and Computer Science, Hunan Normal University, Changsha, People's Republic of China
- College of Science, Hunan Institute of Engineering, Xiangtan, Hunan, People's Republic of China
- Wanrong Liu
- College of Mathematics and Computer Science, Hunan Normal University, Changsha, People's Republic of China
- Xuewen Lu
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
15
Network regularised Cox regression and multiplex network models to predict disease comorbidities and survival of cancer. Comput Biol Chem 2015; 59 Pt B:15-31. [DOI: 10.1016/j.compbiolchem.2015.08.010] [Received: 06/13/2015] [Revised: 08/21/2015] [Accepted: 08/25/2015]
16
Extended Bayesian information criterion in the Cox model with a high-dimensional feature space. Ann Inst Stat Math 2014. [DOI: 10.1007/s10463-014-0448-y]
17
Hossain S, Ahmed SE. Penalized and shrinkage estimation in the Cox proportional hazards model. Commun Stat Theory Methods 2014. [DOI: 10.1080/03610926.2013.826368]
18
19
Xu J. Resampling-based efficient shrinkage method for non-smooth minimands. J Nonparametr Stat 2013. [DOI: 10.1080/10485252.2013.797977]
20
Yang JY, Yoshihara K, Tanaka K, Hatae M, Masuzaki H, Itamochi H, Takano M, Ushijima K, Tanyi JL, Coukos G, Lu Y, Mills GB, Verhaak RGW. Predicting time to ovarian carcinoma recurrence using protein markers. J Clin Invest 2013; 123:3740-50. [PMID: 23945238 DOI: 10.1172/jci68509] [Received: 12/26/2012] [Accepted: 06/06/2013]
Abstract
Patients with ovarian cancer are at high risk of tumor recurrence. Prediction of therapy outcome may provide therapeutic avenues to improve patient outcomes. Using reverse-phase protein arrays, we generated ovarian carcinoma protein expression profiles on 412 cases from TCGA and constructed a PRotein-driven index of OVARian cancer (PROVAR). PROVAR significantly discriminated an independent cohort of 226 high-grade serous ovarian carcinomas into groups of high risk and low risk of tumor recurrence as well as short-term and long-term survivors. Comparison with gene expression-based outcome classification models showed a significantly improved capacity of the protein-based PROVAR to predict tumor progression. Identification of protein markers linked to disease recurrence may yield insights into tumor biology. When combined with features known to be associated with outcome, such as BRCA mutation, PROVAR may provide clinically useful predictions of time to tumor recurrence.
Affiliation(s)
- Ji-Yeon Yang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77230-1402, USA
21
22
Chen B, Yu Y, Zou H, Liang H. Profiled adaptive Elastic-Net procedure for partially linear models with high-dimensional covariates. J Stat Plan Inference 2012. [DOI: 10.1016/j.jspi.2012.02.035]
23
High-dimensional Cox regression analysis in genetic studies with censored survival outcomes. J Probab Stat 2012. [DOI: 10.1155/2012/478680]
Abstract
With the advancement of high-throughput technologies, high-dimensional genomic and proteomic data are now easy to obtain and have become increasingly important in unveiling the complex etiology of many diseases. Various techniques have been proposed in the literature for relating a large number of factors to a survival outcome through the Cox relative risk model. We review some recently developed methods for such analysis. For high-dimensional variable selection in the Cox model with parametric relative risk, we consider the univariate shrinkage method (US) using the lasso penalty and the penalized partial likelihood method using folded penalties (PPL). The penalization methods are not restricted to the finite-dimensional case. For the high-dimensional (p→∞, p≪n) or ultrahigh-dimensional (n→∞, n≪p) case, both the sure independence screening (SIS) method and the extended Bayesian information criterion (EBIC) can be further incorporated into the penalization methods for variable selection. We also consider the penalization method for the Cox model with semiparametric relative risk, and the modified partial least squares method for the Cox model. Different methods are compared, and numerical examples are provided for illustration. Finally, areas of further research are presented.
24
25
Abstract
This paper reviews the literature on sparse high dimensional models and discusses some applications in economics and finance. Recent developments of theory, methods, and implementations in penalized least squares and penalized likelihood methods are highlighted. These variable selection methods are proved to be effective in high dimensional sparse modeling. The limits of dimensionality that regularization methods can handle, the role of penalty functions, and their statistical properties are detailed. Some recent advances in ultra-high dimensional sparse modeling are also briefly discussed.
Affiliation(s)
- Jianqing Fan
- Bendheim Center for Finance, Princeton University, Princeton, New Jersey 08544
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544
- Jinchi Lv
- Information and Operations Management Department, Marshall School of Business, University of Southern California, Los Angeles, California 90089
- Lei Qi
- Bendheim Center for Finance, Princeton University, Princeton, New Jersey 08544
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544
26
Leng C, Liang H, Martinson N. Parametric variable selection in generalized partially linear models with an application to assess condom use by HIV-infected patients. Stat Med 2011; 30:2015-27. [PMID: 21465515 DOI: 10.1002/sim.4233] [Received: 05/27/2010] [Accepted: 02/17/2011]
Abstract
To study significant predictors of condom use in HIV-infected adults, we propose the use of generalized partially linear models and develop a variable selection procedure incorporating a least squares approximation. Local polynomial regression and spline smoothing techniques are used to estimate the baseline nonparametric function. The asymptotic normality of the resulting estimate is established. We further demonstrate that, with the proper choice of the penalty functions and the regularization parameter, the resulting estimate performs as well as an oracle procedure. Finite sample performance of the proposed inference procedure is assessed by Monte Carlo simulation studies. An application to assess condom use by HIV-infected patients yields some interesting results that cannot be obtained when an ordinary logistic model is used.
Affiliation(s)
- Chenlei Leng
- Department of Statistics and Applied Probability, National University of Singapore, Singapore.
27
Abstract
Efron, Hastie, Johnstone and Tibshirani (2004) proposed Least Angle Regression (LAR), a solution path algorithm for least squares regression. They pointed out that a slight modification of LAR gives the LASSO (Tibshirani, 1996) solution path. However, it is largely unknown how to extend this solution path algorithm to models beyond least squares regression. In this work, we propose an extension of LAR for generalized linear models and the quasi-likelihood model by showing that the corresponding solution path is piecewise given by solutions of ordinary differential equation systems. Our contribution is twofold. First, we provide a theoretical understanding of how the corresponding solution path propagates. Second, we propose an ordinary differential equation based algorithm to obtain the whole solution path.
Affiliation(s)
- Yichao Wu
- Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, NC 27695
28
Antoniadis A, Fryzlewicz P, Letué F. The Dantzig selector in Cox's proportional hazards model. Scand Stat Theory Appl 2010. [DOI: 10.1111/j.1467-9469.2009.00685.x]
29
Zhang HH, Lu W, Wang H. On sparse estimation for semiparametric linear transformation models. J Multivar Anal 2010; 101:1594-1606. [PMID: 20473356 PMCID: PMC2869045 DOI: 10.1016/j.jmva.2010.01.015]
Abstract
Semiparametric linear transformation models have received much attention due to their high flexibility in modeling survival data. A useful estimating equation procedure was recently proposed by Chen et al. (2002) for linear transformation models to jointly estimate parametric and nonparametric terms. They showed that this procedure can yield a consistent and robust estimator. However, the problem of variable selection for linear transformation models is less studied, partially because a convenient loss function is not readily available in this context. In this paper, we propose a simple yet powerful approach to achieve both sparse and consistent estimation for linear transformation models. The main idea is to derive a profiled score from the estimating equation of Chen et al. (2002), construct a loss function based on the profiled score and its variance, and then minimize the loss subject to some shrinkage penalty. Under regularity conditions, we show that the resulting estimator is consistent for both model estimation and variable selection. Furthermore, the estimated parametric terms are asymptotically normal and can achieve higher efficiency than those yielded by the estimating equations. For computation, we suggest a one-step approximation algorithm which can take advantage of LARS and build the entire solution path efficiently. Performance of the new procedure is illustrated through numerous simulations and real examples, including one microarray dataset.
30
Du P, Ma S, Liang H. Penalized variable selection procedure for Cox models with semiparametric relative risk. Ann Stat 2010; 38:2092-2117. [PMID: 20802853 DOI: 10.1214/09-aos780]
Abstract
We study the Cox models with semiparametric relative risk, which can be partially linear with one nonparametric component, or multiple additive or nonadditive nonparametric components. A penalized partial likelihood procedure is proposed to simultaneously estimate the parameters and select variables for both the parametric and the nonparametric parts. Two penalties are applied sequentially. The first penalty, governing the smoothness of the multivariate nonlinear covariate effect function, provides a smoothing spline ANOVA framework that is exploited to derive an empirical model selection tool for the nonparametric part. The second penalty, either the smoothly-clipped-absolute-deviation (SCAD) penalty or the adaptive LASSO penalty, achieves variable selection in the parametric part. We show that the resulting estimator of the parametric part possesses the oracle property, and that the estimator of the nonparametric part achieves the optimal rate of convergence. The proposed procedures are shown to work well in simulation experiments, and then applied to a real data example on sexually transmitted diseases.
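For reference, the two parametric-part penalties named above can be written down directly. The sketch uses the standard SCAD form with the conventional default a = 3.7, and the adaptive LASSO form λ Σ_j |β_j| / |β̃_j| with a user-supplied initial estimate β̃; the function names are illustrative, and the smoothing-spline ANOVA penalty for the nonparametric part is not shown.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """Elementwise SCAD penalty: linear near zero, quadratic transition, then constant."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = (a + 1) * lam**2 / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


def adaptive_lasso_penalty(beta, beta_init, lam):
    """Adaptive LASSO: coefficients with small initial estimates are penalized more heavily."""
    beta, beta_init = np.asarray(beta, dtype=float), np.asarray(beta_init, dtype=float)
    return lam * np.sum(np.abs(beta) / np.abs(beta_init))
```

Unlike the plain LASSO, SCAD is flat beyond aλ, so large coefficients are not shrunk further; this near-unbiasedness is one ingredient behind oracle-type results such as the one stated in the abstract.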
Affiliation(s)
- Pang Du
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA
31
Shows JH, Lu W, Zhang HH. Sparse estimation and inference for censored median regression. J Stat Plan Inference 2010; 140:1903-1917. [PMID: 20607110 DOI: 10.1016/j.jspi.2010.01.043]
Abstract
Censored median regression has proved useful for analyzing survival data in complicated situations, for example, when the variance is heteroscedastic or the data contain outliers. In this paper, we study sparse estimation for censored median regression models, which is an important problem in high-dimensional survival data analysis. In particular, a new procedure is proposed that minimizes an inverse-censoring-probability weighted least absolute deviation loss subject to the adaptive LASSO penalty, resulting in a sparse and robust median estimator. We show that, with a proper choice of the tuning parameter, the procedure can identify the underlying sparse model consistently and has desired large-sample properties, including root-n consistency and asymptotic normality. The procedure also enjoys great advantages in computation, since its entire solution path can be obtained efficiently. Furthermore, we propose a resampling method to estimate the variance of the estimator. The performance of the procedure is illustrated by extensive simulations and two real data applications, including one microarray gene expression survival dataset.
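The weighted loss described above can be sketched as follows, assuming weights δ_i / Ĝ(T_i-), where Ĝ is the Kaplan-Meier estimate of the censoring survival function evaluated just before T_i, a loss on the log-time scale, and no tied times. The function names are hypothetical, and minimizing the objective (or tracing its solution path) is not shown.

```python
import numpy as np


def censoring_km_left(time, event):
    """Kaplan-Meier estimate of P(C >= T_i) for each subject (no tied times assumed)."""
    n = len(time)
    order = np.argsort(time)
    censored = 1 - event[order]              # a censoring "event" is delta == 0
    at_risk = n - np.arange(n)               # risk-set sizes in increasing-time order
    factors = 1.0 - censored / at_risk
    # Left limit: product over strictly earlier times only, so G stays positive.
    surv_left = np.concatenate([[1.0], np.cumprod(factors)[:-1]])
    G = np.empty(n)
    G[order] = surv_left
    return G


def ipcw_lad_adaptive_lasso(beta, X, time, event, beta_init, lam):
    """Inverse-censoring-weighted LAD loss on log(time) plus an adaptive LASSO penalty."""
    G = censoring_km_left(time, event)
    weights = np.where(event == 1, 1.0 / G, 0.0)  # censored subjects get zero weight
    resid = np.log(time) - X @ beta
    penalty = lam * np.sum(np.abs(beta) / np.abs(beta_init))
    return np.sum(weights * np.abs(resid)) + penalty
```

Uncensored subjects observed late, when few censoring-free subjects remain, receive larger weights, which compensates for the censored subjects that drop out of the loss.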
Affiliation(s)
- Justin Hall Shows
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
32
Xu J, Leng C, Ying Z. Rank-based variable selection with censored data. Stat Comput 2010; 20:165-176. [PMID: 24013588 PMCID: PMC3762511 DOI: 10.1007/s11222-009-9126-y]
Abstract
A rank-based variable selection procedure is developed for the semiparametric accelerated failure time model with censored observations where the penalized likelihood (partial likelihood) method is not directly applicable. The new method penalizes the rank-based Gehan-type loss function with the ℓ1 penalty. To correctly choose the tuning parameters, a novel likelihood-based χ2-type criterion is proposed. Desirable properties of the estimator such as the oracle properties are established through the local quadratic expansion of the Gehan loss function. In particular, our method can be easily implemented by the standard linear programming packages and hence numerically convenient. Extensions to marginal models for multivariate failure time are also considered. The performance of the new procedure is assessed through extensive simulation studies and illustrated with two real examples.
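The ℓ1-penalized Gehan-type objective can be written in a few lines. This sketch assumes residuals e_i = log T_i - x_iᵀβ and the loss n⁻¹ Σ_i Σ_j δ_i max(e_j - e_i, 0), whose gradient (where it exists) is n⁻¹ Σ_i Σ_j δ_i (x_i - x_j) I(e_j > e_i), the Gehan estimating function; the 1/n scaling is a choice made here and only rescales λ. It evaluates the objective only; the linear-programming solver and the χ²-type tuning criterion from the paper are not reproduced.

```python
import numpy as np


def gehan_l1_objective(beta, X, time, event, lam):
    """Gehan-type rank loss on log-time residuals plus an l1 penalty (illustrative sketch)."""
    e = np.log(time) - X @ beta              # residuals e_i on the log-time scale
    diff = e[None, :] - e[:, None]           # diff[i, j] = e_j - e_i
    # Only uncensored subjects i (event == 1) contribute their pairwise comparisons.
    gehan = np.sum(event[:, None] * np.maximum(diff, 0.0)) / len(time)
    return gehan + lam * np.sum(np.abs(beta))
```

Since the objective is a sum of piecewise-linear terms plus an ℓ1 norm, it can be rewritten as a linear program by introducing slack variables, which is why standard LP software applies.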
Affiliation(s)
- Jinfeng Xu
- Department of Statistics and Applied Probability, Risk Management Institute, National University of Singapore, 117546 Singapore, Singapore
- Chenlei Leng
- Department of Statistics and Applied Probability, Risk Management Institute, National University of Singapore, 117546 Singapore, Singapore
- Zhiliang Ying
- Department of Statistics, Columbia University, New York, NY 10027, USA
33
Wang S, Nan B, Zhu N, Zhu J. Hierarchically penalized Cox regression with grouped variables. Biometrika 2009. [DOI: 10.1093/biomet/asp016] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Indexed: 11/12/2022]
34
Abstract
MOTIVATION: There has been increasing interest in expressing a survival phenotype (e.g., time to cancer recurrence or death), or its distribution, in terms of the expression data of a subset of genes. Because of the high dimensionality of gene expression data, however, collinearity is a serious problem when fitting a prediction model such as Cox's proportional hazards model. To avoid the collinearity problem, several methods based on penalized Cox proportional hazards models have been proposed. However, those methods suffer from severe computational problems, such as slow or even failed convergence, because of the high-dimensional matrix inversions required for model fitting. We propose implementing penalized Cox regression with a lasso penalty via the gradient lasso algorithm, which converges to the global optimum faster than other algorithms. Moreover, the gradient lasso algorithm is guaranteed to converge to the optimum under mild regularity conditions. Hence, our gradient lasso algorithm can be a useful tool for developing prediction models based on high-dimensional covariates, including gene expression data. RESULTS: Results from simulation studies showed that the prediction model fitted by gradient lasso recovers the prognostic genes. Also, results from the diffuse large B-cell lymphoma datasets and the Norway/Stanford breast cancer dataset indicate that our method is very competitive with the popular existing methods of Park and Hastie and of Goeman in computational time, prediction and selectivity. AVAILABILITY: R package glcoxph is available at http://datamining.dongguk.ac.kr/R/glcoxph.
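The gradient lasso idea can be sketched for the constrained form of the problem: minimize the negative log partial likelihood subject to ‖β‖₁ ≤ s. The simplified version below performs only an "addition"-style step. It picks the coordinate with the largest absolute gradient, moves toward the corresponding vertex ±s·e_j of the ℓ1 ball, and chooses the step length by golden-section line search, which amounts to a Frank-Wolfe (conditional gradient) iteration. The published algorithm also has a deletion step, which is omitted here. The function names, the bound `s`, Breslow's convention for ties, and the iteration count are hypothetical choices, not the glcoxph implementation.

```python
import math


def _linear_predictor(beta, X):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def cox_neg_log_pl(beta, X, times, events):
    """Negative log partial likelihood (Breslow convention for ties)."""
    eta = _linear_predictor(beta, X)
    total = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if di:
            risk = [eta[k] for k, tk in enumerate(times) if tk >= ti]
            m = max(risk)                                   # log-sum-exp shift
            log_denom = m + math.log(sum(math.exp(r - m) for r in risk))
            total -= eta[i] - log_denom
    return total


def cox_gradient(beta, X, times, events):
    """Gradient of cox_neg_log_pl with respect to beta."""
    p = len(beta)
    eta = _linear_predictor(beta, X)
    grad = [0.0] * p
    for i, (ti, di) in enumerate(zip(times, events)):
        if not di:
            continue
        idx = [k for k, tk in enumerate(times) if tk >= ti]  # risk set
        m = max(eta[k] for k in idx)
        w = [math.exp(eta[k] - m) for k in idx]
        s = sum(w)
        for j in range(p):
            risk_mean = sum(wk * X[k][j] for wk, k in zip(w, idx)) / s
            grad[j] -= X[i][j] - risk_mean
    return grad


def golden_section(f, lo=0.0, hi=1.0, iters=60):
    """Minimize a unimodal function on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc <= fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def gradient_lasso_cox(X, times, events, s, n_iter=50):
    """Simplified gradient-lasso (Frank-Wolfe) iterations on ||beta||_1 <= s."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        g = cox_gradient(beta, X, times, events)
        j = max(range(p), key=lambda k: abs(g[k]))      # steepest coordinate
        vertex = [0.0] * p
        vertex[j] = -s if g[j] > 0 else s               # descent vertex of the ball

        def move(alpha):
            return [(1 - alpha) * b + alpha * v for b, v in zip(beta, vertex)]

        alpha = golden_section(lambda a: cox_neg_log_pl(move(a), X, times, events))
        beta = move(alpha)                              # stays inside the l1 ball
    return beta
```

Because every iterate is a convex combination of points in the ℓ1 ball, no matrix inversion or projection is needed, which is the computational point the abstract emphasizes.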
Affiliation(s)
- Insuk Sohn
- Department of Biostatistics & Bioinformatics, Duke University, NC 27705, USA
35
Schmid M, Hothorn T. Flexible boosting of accelerated failure time models. BMC Bioinformatics 2008; 9:269. [PMID: 18538026 PMCID: PMC2453145 DOI: 10.1186/1471-2105-9-269] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 01/31/2008] [Accepted: 06/06/2008] [Indexed: 12/02/2022]
Abstract
Background: When boosting algorithms are used to build survival models from high-dimensional data, it is common to fit a Cox proportional hazards model or to use least squares techniques to fit semiparametric accelerated failure time models. There are cases, however, where fitting a fully parametric accelerated failure time model is a good alternative to these methods, especially when the proportional hazards assumption is not justified. Boosting algorithms for estimating parametric accelerated failure time models have not been developed so far, since these models require the estimation of a model-specific scale parameter that traditional boosting algorithms cannot handle. Results: We introduce a new boosting algorithm for censored time-to-event data that is suitable for fitting parametric accelerated failure time models. The predictor function is estimated simultaneously with the scale parameter, so that the negative log-likelihood of the survival distribution can be used as the loss function of the boosting algorithm. Estimating the scale parameter does not affect the favorable variable selection properties of boosting. Conclusion: The analysis of a high-dimensional microarray dataset demonstrates that the new algorithm can outperform boosting with the Cox partial likelihood when the proportional hazards assumption is questionable. In low-dimensional settings, i.e., when classical likelihood estimation of a parametric accelerated failure time model is possible, simulations show that the new boosting algorithm closely approximates the maximum likelihood estimates.
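A compact sketch of the idea, under several assumptions: a Weibull AFT model log T = f(x) + σW with W standard minimum-extreme-value, componentwise linear least-squares base learners plus a constant learner, and a golden-section search over log σ after each boosting step. For this model, with z_i = (log y_i − f_i)/σ, the loss is −Σ[δ_i(z_i − log σ) − exp(z_i)] and its negative gradient with respect to f_i is (exp(z_i) − δ_i)/σ. The distribution, learner set, step length `nu`, number of steps, and the search bracket for log σ are illustrative choices, not the settings of the paper or of any R package.

```python
import math


def weibull_aft_nll(f, sigma, log_y, delta):
    """Negative log-likelihood of log T = f + sigma * W, W standard
    minimum-extreme-value (the constant sum(delta * log_y) is dropped)."""
    total = 0.0
    for fi, ly, d in zip(f, log_y, delta):
        z = (ly - fi) / sigma
        total -= d * (z - math.log(sigma)) - math.exp(z)
    return total


def golden_min(fun, lo, hi, iters=50):
    """Golden-section search for the minimizer of a unimodal function."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = fun(c), fun(d)
    for _ in range(iters):
        if fc <= fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = fun(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = fun(d)
    return (a + b) / 2.0


def boost_weibull_aft(X, log_y, delta, n_steps=200, nu=0.1):
    """Componentwise boosting of f with a scale update after every step."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centered covariates
    intercept, coef, sigma = sum(log_y) / n, [0.0] * p, 1.0

    def predictor():
        return [intercept + sum(c * x for c, x in zip(coef, row)) for row in Xc]

    for _ in range(n_steps):
        f = predictor()
        # negative gradient of the loss with respect to each f_i
        u = [(math.exp((ly - fi) / sigma) - d) / sigma
             for fi, ly, d in zip(f, log_y, delta)]
        u_bar = sum(u) / n
        best_rss = sum((ui - u_bar) ** 2 for ui in u)
        best_j, best_b = None, u_bar              # None = constant learner
        for j in range(p):
            sxx = sum(row[j] ** 2 for row in Xc)
            if sxx == 0.0:
                continue
            b = sum(row[j] * ui for row, ui in zip(Xc, u)) / sxx
            rss = sum((ui - b * row[j]) ** 2 for row, ui in zip(Xc, u))
            if rss < best_rss:
                best_rss, best_j, best_b = rss, j, b
        if best_j is None:
            intercept += nu * best_b
        else:
            coef[best_j] += nu * best_b
        # update the scale parameter with the predictor held fixed
        f = predictor()
        log_sigma = golden_min(
            lambda ls: weibull_aft_nll(f, math.exp(ls), log_y, delta), -3.0, 3.0)
        sigma = math.exp(log_sigma)
    return intercept, coef, sigma
```

The line search over log σ is valid because, with τ = 1/σ, the log-likelihood is concave in τ, so the loss is unimodal in log σ. The returned intercept refers to the centered covariates.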
Affiliation(s)
- Matthias Schmid
- Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 6, D-91054 Erlangen, Germany