1
Sit T, Xing Y. Distributed Censored Quantile Regression. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2023.2182310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Affiliation(s)
- Tony Sit
- Department of Statistics, The Chinese University of Hong Kong
- Yue Xing
- Department of Statistics, Purdue University

2
Salerno S, Li Y. High-Dimensional Survival Analysis: Methods and Applications. Annual Review of Statistics and Its Application 2023; 10:25-49. [PMID: 36968638 PMCID: PMC10038209 DOI: 10.1146/annurev-statistics-032921-022127] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/18/2023]
Abstract
In the era of precision medicine, time-to-event outcomes such as time to death or progression are routinely collected, along with high-throughput covariates. These high-dimensional data defy classical survival regression models, which are either infeasible to fit or likely to incur low predictability due to over-fitting. To overcome this, recent emphasis has been placed on developing novel approaches for feature selection and survival prognostication. We will review various cutting-edge methods that handle survival outcome data with high-dimensional predictors, highlighting recent innovations in machine learning approaches for survival prediction. We will cover the statistical intuitions and principles behind these methods and conclude with extensions to more complex settings, where competing events are observed. We exemplify these methods with applications to the Boston Lung Cancer Survival Cohort study, one of the largest cancer epidemiology cohorts investigating the complex mechanisms of lung cancer.
Affiliation(s)
- Stephen Salerno
- Department of Biostatistics, University of Michigan, Ann Arbor, United States, 48109
- Yi Li
- Department of Biostatistics, University of Michigan, Ann Arbor, United States, 48109

3
Li Y, Liang M, Mao L, Wang S. Robust estimation and variable selection for the accelerated failure time model. Stat Med 2021; 40:4473-4491. [PMID: 34031919 PMCID: PMC8364878 DOI: 10.1002/sim.9042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/06/2020] [Revised: 04/25/2021] [Accepted: 04/26/2021] [Indexed: 11/10/2022]
Abstract
This article concerns robust modeling of the survival time for cancer patients. Accurate prediction of patient survival time is crucial to the development of effective therapeutic strategies. To this end, we propose a unified Expectation-Maximization approach combined with the L1-norm penalty to perform variable selection and parameter estimation simultaneously in the accelerated failure time model with right-censored survival data of moderate size. Our approach accommodates general loss functions and reduces to the well-known Buckley-James method when the squared-error loss is used without regularization. To mitigate the effects of outliers and heavy-tailed noise in real applications, we recommend the use of robust loss functions under the general framework. Furthermore, our approach can be extended to incorporate group structure among covariates. We conduct extensive simulation studies to assess the performance of the proposed methods with different loss functions and apply them to an ovarian carcinoma study as an illustration.
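The abstract recommends robust loss functions without naming them. As one common example (an assumption here, not necessarily a loss the authors use), the Huber loss is quadratic for small residuals and linear for large ones, so a single outlier cannot dominate the fit the way it does under the squared-error loss behind Buckley-James. A minimal sketch; `delta` is an illustrative tuning constant and the residuals are hypothetical:

```python
def squared_loss(r: float) -> float:
    """Half squared-error loss, 0.5 * r**2 (matches Huber's quadratic region)."""
    return 0.5 * r * r


def huber_loss(r: float, delta: float = 1.0) -> float:
    """Huber loss: 0.5 * r**2 if |r| <= delta, else delta * (|r| - 0.5 * delta)."""
    a = abs(r)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def total_loss(residuals, loss) -> float:
    """Sum a per-residual loss function over a list of residuals."""
    return sum(loss(r) for r in residuals)


# Hypothetical residuals from some fitted model; the last one is an outlier.
residuals = [0.2, -0.4, 0.1, 8.0]
print(total_loss(residuals, squared_loss))  # about 32.1, almost all from the outlier
print(total_loss(residuals, huber_loss))    # about 7.6, outlier enters linearly
```

Here the outlier contributes 32 of the roughly 32.1 units of squared loss but only 7.5 units of Huber loss.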
Affiliation(s)
- Yi Li
- Department of Statistics, University of Wisconsin-Madison, Wisconsin, USA
- Muxuan Liang
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Washington, USA
- Lu Mao
- Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin-Madison, Wisconsin, USA
- Sijian Wang
- Department of Statistics, Rutgers University, New Jersey, USA

4
Affiliation(s)
- Rahim Alhamzawi
- Department of Statistics, University of Al-Qadisiyah, Al Diwaniyah, Iraq

5
Jiang Y, Wang Y, Zhang J, Xie B, Liao J, Liao W. Outlier detection and robust variable selection via the penalized weighted LAD-LASSO method. J Appl Stat 2020; 48:234-246. [PMID: 35707691 PMCID: PMC9041793 DOI: 10.1080/02664763.2020.1722079] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/11/2019] [Accepted: 01/21/2020] [Indexed: 10/25/2022]
Abstract
This paper studies the outlier detection and robust variable selection problem in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to achieve outlier detection and robust variable selection simultaneously. An iterative algorithm is proposed to solve the resulting optimization problem. Monte Carlo studies are used to evaluate the finite-sample performance of the proposed methods. The results indicate that the proposed methods perform better in finite samples than existing methods when there are leverage points or outliers in the response variable or explanatory variables. Finally, we apply the proposed methodology to analyze two real datasets.
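For intuition about a weighted LAD-LASSO criterion, consider a single predictor with no intercept and treat the observation weights w_i as fixed inputs (an assumption: PWLAD estimates its weights within its own iterative algorithm, which is not reproduced here). Minimizing Σ w_i |y_i − b x_i| + λ|b| is then a weighted-median problem: each term with x_i ≠ 0 equals w_i|x_i| · |y_i/x_i − b|, and the penalty acts like one extra pseudo-observation at 0 with weight λ. This augmented-data view is a standard device used for illustration, not the paper's algorithm; the function names are hypothetical, and with one predictor an adaptive-LASSO weight simply rescales `lam`.

```python
def weighted_median(values, weights):
    """Return a minimizer of sum_k weights[k] * |values[k] - b| (lower weighted median)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    return pairs[-1][0]  # only reached through floating-point round-off


def lad_lasso_1d(x, y, w, lam):
    """Minimize sum_i w[i] * |y[i] - b * x[i]| + lam * |b| over a scalar slope b.

    Rows with x[i] == 0 do not depend on b and are skipped.
    The L1 penalty is encoded as a pseudo-observation at 0 with weight lam.
    """
    values, weights = [0.0], [lam]
    for xi, yi, wi in zip(x, y, w):
        if xi != 0:
            values.append(yi / xi)
            weights.append(wi * abs(xi))
    return weighted_median(values, weights)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 100.0]  # last response is an outlier
w = [1.0] * 5
print(lad_lasso_1d(x, y, w, lam=0.1))    # 2.0: the outlier does not move the slope
print(lad_lasso_1d(x, y, w, lam=100.0))  # 0.0: a large penalty shrinks it to zero
```

A least-squares slope on the same data would be pulled far above 2 by the last point, which is the motivation for an absolute-deviation loss.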
Affiliation(s)
- Yunlu Jiang
- Department of Statistics, College of Economics, Jinan University, Guangzhou, People's Republic of China
- Yan Wang
- Department of Statistics, College of Economics, Jinan University, Guangzhou, People's Republic of China
- Jiantao Zhang
- Department of Statistics, College of Economics, Jinan University, Guangzhou, People's Republic of China
- Baojian Xie
- College of Economics, Jinan University, Guangzhou, People's Republic of China
- Jibiao Liao
- Office of Educational Administration, Dongguan Open University, Dongguan, People's Republic of China
- Wenhui Liao
- School of Financial Mathematics and Statistics, Guangdong University of Finance, Guangzhou, People's Republic of China

6
Soret P, Avalos M, Wittkop L, Commenges D, Thiébaut R. Lasso regularization for left-censored Gaussian outcome and high-dimensional predictors. BMC Med Res Methodol 2018; 18:159. [PMID: 30514234 PMCID: PMC6280495 DOI: 10.1186/s12874-018-0609-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/04/2017] [Accepted: 11/02/2018] [Indexed: 12/14/2022]
Abstract
Background: Biological assays for the quantification of markers may suffer from a lack of sensitivity and thus from an analytical detection limit. This is the case for human immunodeficiency virus (HIV) viral load. Below this threshold the exact value is unknown, and values are consequently left-censored. Statistical methods have been proposed to deal with left-censoring, but few are adapted to the context of high-dimensional data.
Methods: We propose to reverse the Buckley-James least squares algorithm to handle left-censored data, enhanced with a Lasso regularization to accommodate high-dimensional predictors. We present a Lasso-regularized Buckley-James least squares method with both non-parametric imputation using Kaplan-Meier and parametric imputation based on the Gaussian distribution, which is typically assumed for HIV viral load data after logarithmic transformation. Cross-validation for parameter tuning is based on an appropriate loss function that takes into account the different contributions of censored and uncensored observations. We specify how these techniques can be easily implemented using available R packages. The Lasso-regularized Buckley-James least squares method was compared to simple imputation strategies to predict the response to antiretroviral therapy, measured by HIV viral load, according to HIV genotypic mutations. We used a dataset composed of several clinical trials and cohorts from the Forum for Collaborative HIV Research (HIV Med. 2008;7:27-40). The proposed methods were also assessed on simulated data mimicking the observed data.
Results: Approaches accounting for left-censoring outperformed simple imputation methods in a high-dimensional setting. The Gaussian Buckley-James method with cross-validation based on the appropriate loss function showed the lowest prediction error on simulated data and, using real data, the most valid results according to the current literature on HIV mutations.
Conclusions: The proposed approach deals with high-dimensional predictors and left-censored outcomes and has proven useful for predicting HIV viral load according to HIV mutations.
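The key idea above is reversing Buckley-James for left-censoring. One standard way to see the reversal (sketched here under stated assumptions, not taken from the authors' R code): if Y is left-censored at a detection limit, then −Y is right-censored at minus that limit, so the familiar Kaplan-Meier conditional-mean imputation can be applied to the negated values and mapped back. The sketch performs only this nonparametric imputation step on raw values; in the full method it would act on residuals and alternate with a Lasso fit. Treating the largest reflected observations as events, so the Kaplan-Meier masses sum to one, is a common convention adopted here as an assumption, and the function names and data are hypothetical.

```python
def kaplan_meier_masses(times, events):
    """Kaplan-Meier probability mass at each distinct event time (right-censored data).

    Observations tied at the largest time are treated as events so the masses sum to 1.
    Returns the (time, mass) pairs and the adjusted event indicators.
    """
    t_max = max(times)
    events = [1 if t == t_max else int(e) for t, e in zip(times, events)]
    surv, masses = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        new_surv = surv * (1.0 - deaths / at_risk)
        masses.append((t, surv - new_surv))
        surv = new_surv
    return masses, events


def impute_left_censored(values, detected):
    """Replace left-censored values by a Kaplan-Meier estimate of E[Y | Y <= limit].

    values[i] is the measurement, or the detection limit when detected[i] is 0.
    Left-censoring of Y is handled as right-censoring of -Y.
    """
    flipped = [-v for v in values]
    masses, events = kaplan_meier_masses(flipped, detected)
    imputed = []
    for z, e in zip(flipped, events):
        if e == 1:
            imputed.append(-z)  # detected (or forced-event) values are kept as is
            continue
        tail = [(t, m) for t, m in masses if t > z]  # reflected values beyond the limit
        cond_mean = sum(t * m for t, m in tail) / sum(m for _, m in tail)
        imputed.append(-cond_mean)  # map back to the original scale
    return imputed


# Hypothetical log viral loads; the second value is censored at a limit of 1.0.
values = [0.5, 1.0, 2.0, 3.0]
detected = [1, 0, 1, 1]
print(impute_left_censored(values, detected))  # [0.5, 0.5, 2.0, 3.0]
```

In this example the only detected value below the limit of 1.0 is 0.5, so the censored observation is imputed as 0.5.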
Affiliation(s)
- Perrine Soret
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France; Vaccine Research Institute (VRI), Créteil, F-94000, France
- Marta Avalos
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France
- Linda Wittkop
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France; CHU Bordeaux, Department of Public Health, Bordeaux, F-33000, France
- Daniel Commenges
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France
- Rodolphe Thiébaut
- Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France; Inria SISTM Team, Talence, F-33405, France; Vaccine Research Institute (VRI), Créteil, F-94000, France; CHU Bordeaux, Department of Public Health, Bordeaux, F-33000, France

7
Ahn KW, Banerjee A, Sahr N, Kim S. Group and within-group variable selection for competing risks data. Lifetime Data Analysis 2018; 24:407-424. [PMID: 28779228 PMCID: PMC5797529 DOI: 10.1007/s10985-017-9400-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/14/2016] [Accepted: 07/23/2017] [Indexed: 06/07/2023]
Abstract
Variable selection in the presence of grouped variables is troublesome for competing risks data: while some recent methods deal with group selection only, simultaneous selection of both groups and within-group variables remains largely unexplored. In this context, we propose an adaptive group bridge method, enabling simultaneous selection both within and between groups, for competing risks data. The adaptive group bridge is applicable to independent and clustered data. It also allows the number of variables to diverge as the sample size increases. We show that our new method possesses excellent asymptotic properties, including variable selection consistency at group and within-group levels. We also show superior performance in simulated and real data sets over several competing approaches, including group bridge, adaptive group lasso, and AIC/BIC-based methods.
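To make the group bridge idea concrete: a group bridge penalty raises each group's (weighted) L1 norm to a power 0 < γ < 1, and because t ↦ t^γ is concave, the penalty favors concentrating nonzero coefficients in few groups. The sketch evaluates one common form, λ Σ_g c_g (Σ_{j∈g} w_j |β_j|)^γ. The abstract does not give the authors' weights or tuning, so the group weights c_g and within-group weights w_j below are hypothetical stand-ins for where adaptive weighting could enter.

```python
from typing import Optional, Sequence


def group_bridge_penalty(
    beta: Sequence[float],
    groups: Sequence[Sequence[int]],
    lam: float,
    gamma: float = 0.5,
    group_weights: Optional[Sequence[float]] = None,
    coef_weights: Optional[Sequence[float]] = None,
) -> float:
    """Evaluate lam * sum_g c_g * (sum_{j in g} w_j * |beta_j|) ** gamma.

    group_weights (c_g) and coef_weights (w_j) default to 1. They are
    hypothetical stand-ins for adaptive weights, not values from the paper.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("bridge exponent gamma must lie strictly between 0 and 1")
    c = group_weights if group_weights is not None else [1.0] * len(groups)
    w = coef_weights if coef_weights is not None else [1.0] * len(beta)
    total = 0.0
    for c_g, members in zip(c, groups):
        group_l1 = sum(w[j] * abs(beta[j]) for j in members)
        total += c_g * group_l1 ** gamma  # concave in the group's L1 norm
    return lam * total


groups = [[0, 1], [2, 3]]
# Same total L1 mass (2.0), concentrated in one group versus spread over two.
print(group_bridge_penalty([2.0, 0.0, 0.0, 0.0], groups, lam=1.0))  # sqrt(2), about 1.414
print(group_bridge_penalty([1.0, 0.0, 1.0, 0.0], groups, lam=1.0))  # 1 + 1 = 2.0
```

Concentrating the same coefficient mass in one group costs about 1.414 versus 2.0 when it is spread over two, which is the mechanism that drives entire groups to zero.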
Affiliation(s)
- Kwang Woo Ahn
- Division of Biostatistics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Anjishnu Banerjee
- Division of Biostatistics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Natasha Sahr
- Division of Biostatistics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Soyoung Kim
- Division of Biostatistics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA

8
Ahn KW, Kim S. Variable selection with group structure in competing risks quantile regression. Stat Med 2018; 37:1577-1586. [PMID: 29468710 DOI: 10.1002/sim.7619] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/20/2016] [Revised: 12/05/2017] [Accepted: 01/03/2018] [Indexed: 11/11/2022]
Abstract
We study the group bridge and the adaptive group bridge penalties for competing risks quantile regression with group variables. While the group bridge consistently identifies nonzero group variables, the adaptive group bridge consistently selects variables not only at group level but also at within-group level. We allow the number of covariates to diverge as the sample size increases. The oracle property for both methods is also studied. The performance of the group bridge and the adaptive group bridge is compared in simulation and in a real data analysis. The simulation study shows that the adaptive group bridge selects nonzero within-group variables more consistently than the group bridge. A bone marrow transplant study is provided as an example.
Affiliation(s)
- Kwang Woo Ahn
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Soyoung Kim
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA

9
Abstract
Censored quantile regression (CQR) has emerged as a useful regression tool for survival analysis. Some commonly used CQR methods can be characterized by stochastic integral-based estimating equations that proceed sequentially across quantile levels. In this paper, we analyze CQR in a high-dimensional setting where the regression functions over a continuum of quantile levels are of interest. We propose a two-step penalization procedure that accommodates stochastic integral-based estimating equations and addresses the challenges due to the recursive nature of the procedure. We establish uniform convergence rates for the proposed estimators and investigate their weak convergence and variable selection properties. We conduct numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposals.
Affiliation(s)
- Qi Zheng
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40242, USA
- Limin Peng
- Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Rd, NE, Atlanta, GA 30322, USA
- Xuming He
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA

10
Wu C, Ma S. A selective review of robust variable selection with applications in bioinformatics. Brief Bioinform 2015; 16:873-883. [PMID: 25479793 PMCID: PMC4570200 DOI: 10.1093/bib/bbu046] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 09/01/2014] [Revised: 10/20/2014] [Indexed: 11/13/2022]
Abstract
A vast amount of data has been and is being generated in bioinformatics studies. In the analysis of such data, standard modeling approaches can be challenged by heavy-tailed errors and outliers in response variables, contamination in predictors (which may be caused, for instance, by technical problems in microarray gene expression studies), model misspecification, and other problems. Robust methods are needed to tackle these challenges. When there are a large number of predictors, variable selection can be as important as estimation. As a generic variable selection and regularization tool, penalization has been extensively adopted. In this article, we provide a selective review of robust penalized variable selection approaches especially designed for high-dimensional data from bioinformatics and biomedical studies. We discuss robust loss functions, penalty functions and computational algorithms. Their theoretical properties and implementation are also briefly examined. Application examples of the robust penalization approaches in representative bioinformatics and biomedical studies are also presented.
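The review covers penalty functions alongside robust losses. To illustrate why the choice of penalty matters, consider the LASSO against the minimax concave penalty (MCP), one widely used nonconvex alternative (picked here for illustration; the review's own catalogue is not reproduced). MCP has the same slope λ as the LASSO at zero but flattens out at γλ²/2 once |t| ≥ γλ, so large coefficients are not shrunk further. A minimal sketch with γ = 3 as an illustrative default:

```python
def lasso_penalty(t: float, lam: float) -> float:
    """LASSO penalty lam * |t|: grows linearly, so it keeps shrinking large coefficients."""
    return lam * abs(t)


def mcp_penalty(t: float, lam: float, gamma: float = 3.0) -> float:
    """Minimax concave penalty (MCP) with concavity parameter gamma > 1.

    Equals lam*|t| - t**2 / (2*gamma) for |t| <= gamma*lam, and the constant
    gamma * lam**2 / 2 beyond that point.
    """
    if gamma <= 1.0:
        raise ValueError("MCP requires gamma > 1")
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return gamma * lam * lam / 2.0


for t in [0.5, 1.0, 3.0, 10.0]:
    print(t, lasso_penalty(t, lam=1.0), mcp_penalty(t, lam=1.0))
```

Beyond |t| = 3 the MCP value stays at 1.5 while the LASSO value keeps growing, which is why nonconvex penalties reduce the bias on large coefficients.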
11
Jiang L, Bondell HD, Wang HJ. Interquantile Shrinkage and Variable Selection in Quantile Regression. Comput Stat Data Anal 2014; 69:208-219. [PMID: 24653545 DOI: 10.1016/j.csda.2013.08.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/24/2022]
Abstract
Examination of multiple conditional quantile functions provides a comprehensive view of the relationship between the response and covariates. In situations where quantile slope coefficients share some common features, estimation efficiency and model interpretability can be improved by utilizing such commonality across quantiles. Furthermore, elimination of irrelevant predictors will also aid in estimation and interpretation. These motivations lead to the development of two penalization methods, which can identify the interquantile commonality and nonzero quantile coefficients simultaneously. The developed methods are based on a fused penalty that encourages sparsity of both quantile coefficients and interquantile slope differences. The oracle properties of the proposed penalization methods are established. Through numerical investigations, it is demonstrated that the proposed methods lead to simpler model structure and higher estimation efficiency than the traditional quantile regression estimation.
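The criterion described above pairs quantile check losses at several levels τ_k with a fused penalty. The sketch evaluates a simple unweighted version of such an objective, Σ_k Σ_i ρ_{τ_k}(y_i − x_iᵀβ_k) + λ₁ Σ_k Σ_j |β_{k,j}| + λ₂ Σ_{k≥2} Σ_j |β_{k,j} − β_{k−1,j}|, with ρ_τ(r) = r(τ − 1{r < 0}). The abstract does not give the adaptive weights or exact form of the authors' two penalties, so this form, the omitted intercept, and the function names are assumptions. It only evaluates the objective; minimizing it would need a solver such as linear programming.

```python
from typing import Sequence


def check_loss(r: float, tau: float) -> float:
    """Quantile check loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def fused_quantile_objective(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    taus: Sequence[float],
    betas: Sequence[Sequence[float]],
    lam_sparse: float,
    lam_fuse: float,
) -> float:
    """Check losses at every tau_k plus L1 and fused (adjacent-quantile) penalties.

    betas[k][j] is the slope of predictor j at quantile level taus[k]; taus must be
    sorted so that "adjacent" means neighbouring quantile levels. No intercept.
    """
    loss = 0.0
    for tau, beta in zip(taus, betas):
        for xi, yi in zip(X, y):
            fitted = sum(b * v for b, v in zip(beta, xi))
            loss += check_loss(yi - fitted, tau)
    sparse = sum(abs(b) for beta in betas for b in beta)
    fuse = sum(
        abs(curr - prev)
        for k in range(1, len(betas))
        for prev, curr in zip(betas[k - 1], betas[k])
    )
    return loss + lam_sparse * sparse + lam_fuse * fuse


X = [[1.0], [2.0]]
y = [1.0, 3.0]
taus = [0.25, 0.75]
# Equal slopes across the two quantiles incur no fusion cost.
print(fused_quantile_objective(X, y, taus, [[1.0], [1.0]], lam_sparse=0.1, lam_fuse=0.5))
# A different slope at tau = 0.75 pays the fusion penalty 0.5 * |2 - 1|.
print(fused_quantile_objective(X, y, taus, [[1.0], [2.0]], lam_sparse=0.1, lam_fuse=0.5))
```

When the data support a common slope, the fusion term pulls the quantile-specific slopes together, which is the interquantile commonality the abstract describes.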
Affiliation(s)
- Liewen Jiang
- Department of Statistics, North Carolina State University, Raleigh, NC 27606, USA
- Howard D Bondell
- Department of Statistics, North Carolina State University, Raleigh, NC 27606, USA
- Huixia Judy Wang
- Department of Statistics, North Carolina State University, Raleigh, NC 27606, USA

12
Wagener J, Volgushev S, Dette H. The quantile process under random censoring. Mathematical Methods of Statistics 2012. [DOI: 10.3103/s1066530712020044] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]