1. Liu Y, Li G. Sure Joint Screening for High Dimensional Cox's Proportional Hazards Model Under the Case-Cohort Design. J Comput Biol 2023; 30:663-677. [PMID: 37140454 PMCID: PMC10282795 DOI: 10.1089/cmb.2022.0416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
This study develops a sure joint feature screening method for the case-cohort design with ultrahigh-dimensional covariates. Our method is based on a sparsity-restricted Cox proportional hazards model. An iterative reweighted hard thresholding algorithm is proposed to approximate the sparsity-restricted, pseudo-partial likelihood estimator for joint screening. We rigorously show that our method possesses the sure screening property, with the probability of retaining all relevant covariates tending to 1 as the sample size goes to infinity. Our simulation results demonstrate that the proposed procedure has substantially improved screening performance over some existing feature screening methods for the case-cohort design, especially when some covariates are jointly correlated, but marginally uncorrelated, with the event time outcome. A real data illustration is provided using breast cancer data with high-dimensional genomic covariates. We have implemented the proposed method using MATLAB and made it available to readers through GitHub.
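The abstract does not spell out the algorithm, so the following is only a generic sketch of the hard thresholding idea it names: take a gradient step, then keep the `k` largest-magnitude coefficients and zero the rest. The function names, the step size, and the use of a plain (unweighted) gradient are assumptions for illustration; the authors' reweighting scheme and the pseudo-partial likelihood gradient are not reproduced here.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    if k <= 0:
        return [0.0] * len(beta)
    # Indices ordered by decreasing absolute value; ties broken by position.
    order = sorted(range(len(beta)), key=lambda j: (-abs(beta[j]), j))
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def iht_step(beta, grad, step, k):
    """One gradient-ascent step followed by hard thresholding (hypothetical helper).

    grad is assumed to be the gradient of the objective at beta; computing it
    for a case-cohort pseudo-partial likelihood is outside this sketch.
    """
    moved = [b + step * g for b, g in zip(beta, grad)]
    return hard_threshold(moved, k)


def screened_set(beta):
    """Indices of covariates retained after thresholding."""
    return [j for j, b in enumerate(beta) if b != 0.0]
```

In a joint screening procedure of this kind, the retained covariates would be the nonzero positions of the final iterate, for example `screened_set(iht_step(beta, grad, step, k))`.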
Affiliation(s)
- Yi Liu
  - Department of Mathematics, School of Mathematical Sciences, Ocean University of China, Qingdao, China
- Gang Li
  - Department of Biostatistics, University of California at Los Angeles, Los Angeles, California, USA
2. Soave D, Lawless JF. Regularized regression for two phase failure time studies. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2023.107703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
3. Du M, Zhao X, Sun J. Variable selection for case-cohort studies with informatively interval-censored outcomes. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
4.
Affiliation(s)
- Mingyue Du
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
- Jianguo Sun
  - Department of Statistics, University of Missouri, Columbia, MO 65211, USA
5. Zhang J, Zhou H, Liu Y, Cai J. Conditional screening for ultrahigh-dimensional survival data in case-cohort studies. Lifetime Data Anal 2021; 27:632-661. [PMID: 34417679 PMCID: PMC8561435 DOI: 10.1007/s10985-021-09531-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2020] [Accepted: 08/05/2021] [Indexed: 06/13/2023]
Abstract
The case-cohort design has been widely used to reduce the cost of covariate measurements in large cohort studies. In many such studies, the number of covariates is very large, and the goal of the research is to identify active covariates that have a strong influence on the response. Since the introduction of sure independence screening, screening procedures have achieved great success in effectively reducing the dimensionality and identifying active covariates. However, because commonly used screening methods are based on marginal correlation or its variants, they may fail to identify hidden active variables that are jointly important but only weakly correlated with the response. Moreover, these screening methods are mainly proposed for data under simple random sampling and cannot be directly applied to case-cohort data. In this paper, we consider ultrahigh-dimensional survival data under the case-cohort design and propose a conditional screening method that incorporates important prior information about active variables. This method can effectively detect hidden active variables. Furthermore, it possesses the sure screening property under some mild regularity conditions and does not require any complicated numerical optimization. We evaluate the finite sample performance of the proposed method via extensive simulation studies and further illustrate the new approach through a real data set from patients with breast cancer.
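To make the notion of a hidden active variable concrete, here is a small worked example in a plain linear model rather than the survival setting of the paper. It assumes two standardized covariates with correlation `rho` and a response `Y = b1*X1 + b2*X2 + noise`, with noise independent of the covariates; the function name and the numbers are illustrative only.

```python
def marginal_cov_with_response(b1, b2, rho):
    """Population covariances of X1 and X2 with Y = b1*X1 + b2*X2 + noise.

    Assumes Var(X1) = Var(X2) = 1, Cov(X1, X2) = rho, and noise independent
    of both covariates.
    """
    cov_x1_y = b1 + b2 * rho
    cov_x2_y = b1 * rho + b2
    return cov_x1_y, cov_x2_y


# X2 has a nonzero coefficient, yet its marginal covariance with Y cancels out.
cov1, cov2 = marginal_cov_with_response(b1=1.0, b2=-0.5, rho=0.5)
print(cov1, cov2)
```

With these values the covariance of `X2` with `Y` is `1.0 * 0.5 - 0.5`, so a purely marginal ranking would place `X2` last even though it belongs in the model. Conditioning on `X1` is one way such a variable can be recovered.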
Affiliation(s)
- Jing Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7420, USA
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7420, USA
6. Zhang J, Zhou H, Liu Y, Cai J. Feature screening for case-cohort studies with failure time outcome. Scand Stat Theory Appl 2020; 48:349-370. [DOI: 10.1111/sjos.12503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Jing Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, China
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
7. Wang JH, Pan CH, Chang IS, Hsiung CA. Penalized full likelihood approach to variable selection for Cox's regression model under nested case-control sampling. Lifetime Data Anal 2020; 26:292-314. [PMID: 31065967 DOI: 10.1007/s10985-019-09475-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2017] [Accepted: 04/26/2019] [Indexed: 06/09/2023]
Abstract
Assuming Cox's regression model, we consider a penalized full likelihood approach to variable selection under nested case-control (NCC) sampling. Penalized non-parametric maximum likelihood estimates (PNPMLEs) are characterized by self-consistency equations derived from score functions. A cross-validation method based on profile likelihood is used to choose the tuning parameter within a family of penalty functions. Simulation studies indicate that, under NCC sampling, the (P)NPMLE performs better numerically than weighted partial likelihood in estimating the log-relative risk and in identifying the covariates and the model. LASSO performs best when cohort size is small; SCAD performs best when cohort size is large and may eventually perform as well as the oracle estimator. Using the SCAD penalty, we establish the consistency, asymptotic normality, and oracle properties of the PNPMLE, as well as the sparsity property of the penalty. We also propose a consistent estimate of the asymptotic variance using observed profile likelihood. We illustrate our method by analyzing the diagnosis of liver cancer among patients in a type 2 diabetes mellitus dataset who were treated with thiazolidinediones in Taiwan.
Affiliation(s)
- Jie-Huei Wang
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Science, National Health Research Institutes, 35, Keyan Rd., Zhunan Town, Miaoli County, 35053, Taiwan
  - Institute of Statistical Science, Academia Sinica, 128, Academia Rd., Section 2, Nankang, Taipei, 11529, Taiwan
- Chun-Hao Pan
  - Institute of Statistical Science, Academia Sinica, 128, Academia Rd., Section 2, Nankang, Taipei, 11529, Taiwan
- I-Shou Chang
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Science, National Health Research Institutes, 35, Keyan Rd., Zhunan Town, Miaoli County, 35053, Taiwan
  - National Institute of Cancer Research, National Health Research Institutes, 35, Keyan Rd., Zhunan Town, Miaoli County, 35053, Taiwan
- Chao Agnes Hsiung
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Science, National Health Research Institutes, 35, Keyan Rd., Zhunan Town, Miaoli County, 35053, Taiwan
8. Zhao H, Wu Q, Gilbert PB, Chen YQ, Sun J. A regularized estimation approach for case-cohort periodic follow-up studies with an application to HIV vaccine trials. Biom J 2020; 62:1176-1191. [PMID: 32080888 DOI: 10.1002/bimj.201900180] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/19/2019] [Revised: 11/21/2019] [Accepted: 11/27/2019] [Indexed: 11/05/2022]
Abstract
This paper discusses regression analysis of failure time data arising from case-cohort periodic follow-up studies. One feature of such data that makes their analysis much more difficult is that they are usually interval-censored rather than right-censored. Although some methods have been developed for general failure time data, there does not seem to exist an established procedure for the situation considered here. To address the problem, we present a semiparametric regularized procedure and develop a simple algorithm for implementing it. In addition, unlike some existing procedures for similar situations, the proposed procedure is shown to have the oracle property. An extensive simulation study suggests that the approach works well in practical situations. The method is applied to an HIV vaccine trial that motivated this study.
Affiliation(s)
- Hui Zhao
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, P. R. China
- Qiwei Wu
  - Department of Statistics, University of Missouri, Columbia, MO, USA
- Peter B Gilbert
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center & Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ying Q Chen
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center & Department of Biostatistics, University of Washington, Seattle, WA, USA
- Jianguo Sun
  - Department of Statistics, University of Missouri, Columbia, MO, USA
9. Newcombe PJ, Connolly S, Seaman S, Richardson S, Sharp SJ. A two-step method for variable selection in the analysis of a case-cohort study. Int J Epidemiol 2019; 47:597-604. [PMID: 29136145 PMCID: PMC5913627 DOI: 10.1093/ije/dyx224] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Accepted: 10/06/2017] [Indexed: 11/29/2022]
Abstract
Background: Accurate detection and estimation of true exposure-outcome associations is important in aetiological analysis; when there are multiple potential exposure variables of interest, methods for detecting the subset of variables most likely to have true associations with the outcome of interest are required. Case-cohort studies often collect data on a large number of variables which have not been measured in the entire cohort (e.g. panels of biomarkers). There is a lack of guidance on methods for variable selection in case-cohort studies.
Methods: We describe and explore the application of three variable selection methods to data from a case-cohort study. These are: (i) selecting variables based on their level of significance in univariable (i.e. one-at-a-time) Prentice-weighted Cox regression models; (ii) stepwise selection applied to Prentice-weighted Cox regression; and (iii) a two-step method which applies a Bayesian variable selection algorithm to obtain posterior probabilities of selection for each variable using multivariable logistic regression followed by effect estimation using Prentice-weighted Cox regression.
Results: Across nine different simulation scenarios, the two-step method demonstrated higher sensitivity and lower false discovery rate than the one-at-a-time and stepwise methods. In an application of the methods to data from the EPIC-InterAct case-cohort study, the two-step method identified an additional two fatty acids as being associated with incident type 2 diabetes, compared with the one-at-a-time and stepwise methods.
Conclusions: The two-step method enables more powerful and accurate detection of exposure-outcome associations in case-cohort studies. An R package is available to enable researchers to apply this method.
Affiliation(s)
- S Seaman
  - MRC Biostatistics Unit, Cambridge, UK
- S J Sharp
  - MRC Epidemiology Unit, Cambridge, UK
10. Fu GH, Yi LZ, Pan J. Tuning model parameters in class-imbalanced learning with precision-recall curve. Biom J 2018; 61:652-664. [PMID: 30548291 DOI: 10.1002/bimj.201800148] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/25/2018] [Revised: 10/18/2018] [Accepted: 10/23/2018] [Indexed: 11/08/2022]
Abstract
An issue in class-imbalanced learning is which assessment metric should be employed. So far, the precision-recall curve (PRC) has rarely been used in practice compared with its alternative, the receiver operating characteristic (ROC). This study investigates the performance of the PRC as an evaluation criterion for class-imbalanced data and focuses on comparing the PRC with the ROC. The advantages of the PRC over the ROC in assessing class-imbalanced data are also investigated and tested on our proposed algorithm by tuning all model parameters in simulation studies and real data examples. The results show that the PRC is competitive with the ROC as a performance measure for tuning model parameters on class-imbalanced data. The PRC can be considered an effective alternative assessment for preprocessing skewed data (such as variable selection) and for building a classifier in class-imbalanced learning.
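A small worked example may help show why the two curves can disagree on imbalanced data. The sketch below compares one operating point of a hypothetical classifier at two class ratios; the confusion counts are made up for illustration and do not come from the paper.

```python
def rates(tp, fp, fn, tn):
    """Return (recall, false positive rate, precision) from confusion counts."""
    recall = tp / (tp + fn)      # y-axis of ROC, x-axis of the PR curve
    fpr = fp / (fp + tn)         # x-axis of ROC
    precision = tp / (tp + fp)   # y-axis of the PR curve
    return recall, fpr, precision


# Same recall (0.8) and false positive rate (0.1) at two class ratios.
balanced = rates(tp=80, fp=10, fn=20, tn=90)        # 100 positives, 100 negatives
imbalanced = rates(tp=80, fp=1000, fn=20, tn=9000)  # 100 positives, 10000 negatives

print("balanced:  ", balanced)
print("imbalanced:", imbalanced)
```

Both settings give the same point on the ROC curve, while precision falls sharply in the imbalanced setting because false positives outnumber true positives. This is the sense in which a PR-based criterion reacts to class imbalance that an ROC-based one does not register.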
Affiliation(s)
- Guang-Hui Fu
  - School of Science, Kunming University of Science and Technology, Kunming, P. R. China
- Lun-Zhao Yi
  - Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, P. R. China
- Jianxin Pan
  - School of Mathematics, The University of Manchester, Manchester, UK
11. Kim S, Woo Ahn K. Bi-level variable selection for case-cohort studies with group variables. Stat Methods Med Res 2018; 28:3404-3414. [PMID: 30306838 DOI: 10.1177/0962280218803654] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
The case-cohort design is an economical approach to estimating the effect of risk factors on a survival outcome when collecting exposure information or covariates on all patients in a large cohort study is expensive. Variables often have group structure, such as categorical variables and highly correlated continuous variables. The existing literature for case-cohort data is limited to identifying non-zero variables at the individual level only. In this article, we propose a bi-level variable selection method to select non-zero groups and within-group variables for case-cohort data when variables have group structure. The proposed method allows the number of variables to diverge as the sample size increases. The asymptotic properties of the estimator, including bi-level variable selection consistency and asymptotic normality, are shown. We also conduct simulations to compare our proposed method with some existing methods and apply them to the Busselton Health data.
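For readers unfamiliar with the design, the sampling step can be sketched as follows: covariates are measured on a random subcohort drawn from the full cohort, plus every subject who experiences the event. This is a minimal illustration of the unstratified design only; the function name, the seed, and the toy cohort are assumptions, and no analysis weights are computed.

```python
import random


def case_cohort_sample(event, subcohort_size, seed=0):
    """Return (measured, subcohort) index lists for a case-cohort sample.

    event: list of 0/1 flags for the full cohort (1 = failure observed).
    The subcohort is a simple random sample of the full cohort; all cases
    outside the subcohort are then added to the measured set.
    """
    rng = random.Random(seed)
    n = len(event)
    subcohort = set(rng.sample(range(n), subcohort_size))
    cases = {i for i, d in enumerate(event) if d == 1}
    return sorted(subcohort | cases), sorted(subcohort)


# Toy cohort of 200 subjects with 20 cases (every tenth subject).
event = [1 if i % 10 == 0 else 0 for i in range(200)]
measured, subcohort = case_cohort_sample(event, subcohort_size=30)
print(len(subcohort), len(measured))
```

Covariates would then be collected only for the indices in `measured`, which is where the cost saving comes from when cases are rare.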
Affiliation(s)
- Soyoung Kim
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, USA
- Kwang Woo Ahn
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, USA
12. Ni A, Cai J. A regularized variable selection procedure in additive hazards model with stratified case-cohort design. Lifetime Data Anal 2018; 24:443-463. [PMID: 28755021 PMCID: PMC5787409 DOI: 10.1007/s10985-017-9402-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/20/2016] [Accepted: 07/23/2017] [Indexed: 06/07/2023]
Abstract
Case-cohort designs are commonly used in large epidemiological studies to reduce the cost associated with covariate measurement. In many such studies the number of covariates is very large. An efficient variable selection method is needed for case-cohort studies where the covariates are only observed in a subset of the sample. Current literature on this topic has focused on the proportional hazards model. However, in many studies the additive hazards model is preferred over the proportional hazards model, either because the proportional hazards assumption is violated or because the additive hazards model provides more relevant information for the research question. Motivated by one such study, the Atherosclerosis Risk in Communities (ARIC) study, we investigate the properties of a regularized variable selection procedure in a stratified case-cohort design under an additive hazards model with a diverging number of parameters. We establish the consistency and asymptotic normality of the penalized estimator and prove its oracle property. Simulation studies are conducted to assess the finite sample performance of the proposed method with a modified cross-validation tuning parameter selection method. We apply the variable selection procedure to the ARIC study to demonstrate its practical use.
Affiliation(s)
- Ai Ni
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 485 Lexington Ave., New York, NY, 10017, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC, 27599, USA
13. Wu M, Zheng M, Yu W, Wu R. Estimation and variable selection for semiparametric transformation models under a more efficient cohort sampling design. TEST-SPAIN 2017. [DOI: 10.1007/s11749-017-0562-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]