1. Wogu AF, Li H, Zhao S, Nichols HB, Cai J. Additive subdistribution hazards regression for competing risks data in case-cohort studies. Biometrics 2023; 79:3010-3022. [PMID: 36606409 PMCID: PMC10676749 DOI: 10.1111/biom.13821] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/04/2022] [Accepted: 12/20/2022] [Indexed: 01/07/2023]
Abstract
In survival data analysis, a competing risk is an event whose occurrence precludes or alters the chance of the occurrence of the primary event of interest. In large cohort studies with long-term follow-up, there are often competing risks. Further, if the event of interest is rare in such large studies, the case-cohort study design is widely used to reduce the cost and achieve the same efficiency as a cohort study. The conventional additive hazards modeling for competing risks data in case-cohort studies involves the cause-specific hazard function, under which direct assessment of covariate effects on the cumulative incidence function, or the subdistribution, is not possible. In this paper, we consider an additive hazard model for the subdistribution of a competing risk in case-cohort studies. We propose estimating equations based on inverse probability weighting methods for the estimation of the model parameters. Consistency and asymptotic normality of the proposed estimators are established. The performance of the proposed methods in finite samples is examined through simulation studies and the proposed approach is applied to a case-cohort dataset from the Sister Study.
Affiliation(s)
- Adane F. Wogu
  - Department of Biostatistics & Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Haolin Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Shanshan Zhao
  - Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Hazel B. Nichols
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
2. Lou Y, Wang P, Sun J. A semi-parametric weighted likelihood approach for regression analysis of bivariate interval-censored outcomes from case-cohort studies. Lifetime Data Analysis 2023; 29:628-653. [PMID: 36862277 DOI: 10.1007/s10985-023-09593-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 02/03/2023] [Indexed: 06/13/2023]
Abstract
The case-cohort design was developed to reduce costs when disease incidence is low and covariates are difficult to obtain. However, most of the existing methods are for right-censored data and there exists only limited research on interval-censored data, especially on regression analysis of bivariate interval-censored data. Interval-censored failure time data frequently occur in many areas and a large literature on their analyses has been established. In this paper, we discuss the situation of bivariate interval-censored data arising from case-cohort studies. For the problem, a class of semiparametric transformation frailty models is presented and for inference, a sieve weighted likelihood approach is developed. The large sample properties, including the consistency of the proposed estimators and the asymptotic normality of the regression parameter estimators, are established. Moreover, a simulation is conducted to assess the finite sample performance of the proposed method and suggests that it performs well in practice.
Affiliation(s)
- Yichen Lou
  - School of Mathematics, Jilin University, Changchun, 130012, China
- Peijie Wang
  - School of Mathematics, Jilin University, Changchun, 130012, China
- Jianguo Sun
  - Department of Statistics, University of Missouri, Columbia, Missouri, 65211, USA
3. Generalized accelerated failure time model with censored data from case-cohort studies. J Stat Plan Inference 2023. [DOI: 10.1016/j.jspi.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
4. Son D, Choi S, Kang S. Quantile regression for competing risks data from stratified case-cohort studies: an induced-smoothing approach. J Stat Comput Sim 2022. [DOI: 10.1080/00949655.2022.2134376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Dongjae Son
  - Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea
- Sangbum Choi
  - Department of Statistics, Korea University, Seoul, Republic of Korea
- Sangwook Kang
  - Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea
  - Department of Statistics and Data Science, Yonsei University, Seoul, Republic of Korea
5. Pan Y, Deng L. Generalized case-cohort and inference for Cox’s model with parameter constraints. Commun Stat-Simul C 2022. [DOI: 10.1080/03610918.2020.1714661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Yingli Pan
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
- Lifeng Deng
  - College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, Shandong, China
6. O'Brien KM, Lawrence KG, Keil AP. The Case for Case-Cohort: An Applied Epidemiologist's Guide to Reframing Case-Cohort Studies to Improve Usability and Flexibility. Epidemiology 2022; 33:354-361. [PMID: 35383643 PMCID: PMC9172927 DOI: 10.1097/ede.0000000000001469] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/12/2023]
Abstract
When research questions require the use of precious samples, expensive assays or equipment, or labor-intensive data collection or analysis, nested case-control or case-cohort sampling of observational cohort study participants can often reduce costs. These study designs have similar statistical precision for addressing a singular research question, but case-cohort studies have broader efficiency and superior flexibility. Despite this, case-cohort designs are comparatively underutilized in the epidemiologic literature. Recent advances in statistical methods and software have made analyses of case-cohort data easier to implement, and advances from causal inference, such as inverse probability of sampling weights, have allowed the case-cohort design to be used with a variety of target parameters and populations. To provide an accessible link to this technical literature, we give a conceptual overview of case-cohort study analysis with inverse probability of sampling weights. We show how this general analytic approach can be leveraged to more efficiently study subgroups of interest or disease subtypes or to examine associations independent of case status. A brief discussion of how this framework could be extended to incorporate other related methodologic applications further demonstrates the broad cost-effectiveness and adaptability of case-cohort methods for a variety of modern epidemiologic applications in resource-limited settings.
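As a rough illustration of the inverse probability of sampling weights mentioned in this abstract, the sketch below assumes the simplest case-cohort setup, which the abstract itself does not specify: every case is sampled with probability 1, and non-cases enter only through a subcohort drawn by simple random sampling with a known fraction. The function name `sampling_weights` and its arguments are hypothetical and are not taken from the article.

```python
def sampling_weights(is_case, in_subcohort, subcohort_fraction):
    """Return one inverse probability of sampling weight per cohort member.

    Illustrative assumptions (not from the article):
      - all cases are sampled, so each case has sampling probability 1;
      - non-cases are sampled only via the subcohort, each with
        probability `subcohort_fraction`;
      - cohort members that were not sampled receive weight 0.
    """
    if not 0 < subcohort_fraction <= 1:
        raise ValueError("subcohort_fraction must be in (0, 1]")
    weights = []
    for case, sub in zip(is_case, in_subcohort):
        if case:
            weights.append(1.0)                       # sampled with certainty
        elif sub:
            weights.append(1.0 / subcohort_fraction)  # stands in for unsampled non-cases
        else:
            weights.append(0.0)                       # covariates never measured
    return weights


# One case outside the subcohort, one sampled non-case, one unsampled non-case.
print(sampling_weights([True, False, False], [False, True, False], 0.25))
```

Under these assumptions each sampled non-case represents `1 / subcohort_fraction` non-cases of the full cohort. Stratified sampling or sampling only a fraction of the cases would need different probabilities.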
Affiliation(s)
- Katie M O'Brien
  - Epidemiology Branch, National Institute of Environmental Health Sciences, NC
- Kaitlyn G Lawrence
  - Epidemiology Branch, National Institute of Environmental Health Sciences, NC
- Alexander P Keil
  - Epidemiology Branch, National Institute of Environmental Health Sciences, NC
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC
7. Zhang H, Ding J. Hypothesis testing in outcome-dependent sampling design under generalized linear models. Commun Stat-Simul C 2022. [DOI: 10.1080/03610918.2019.1682155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Haodong Zhang
  - School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
- Jieli Ding
  - School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
8. Xu Y, Kim S, Zhang MJ, Couper D, Ahn KW. Competing risks regression models with covariates-adjusted censoring weight under the generalized case-cohort design. Lifetime Data Analysis 2022; 28:241-262. [PMID: 35034255 PMCID: PMC8977245 DOI: 10.1007/s10985-022-09546-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/03/2021] [Accepted: 12/31/2021] [Indexed: 06/14/2023]
Abstract
A generalized case-cohort design has been used when measuring exposures is expensive and events are not rare in the full cohort. This design collects expensive exposure information from a (stratified) randomly selected subset from the full cohort, called the subcohort, and a fraction of cases outside the subcohort. For the full cohort study with competing risks, He et al. (Scand J Stat 43:103-122, 2016) studied the non-stratified proportional subdistribution hazards model with covariate-dependent censoring to directly evaluate covariate effects on the cumulative incidence function. In this paper, we propose a stratified proportional subdistribution hazards model with covariate-adjusted censoring weights for competing risks data under the generalized case-cohort design. We consider a general class of weight functions to account for the generalized case-cohort design. Then, we derive the optimal weight function which minimizes the asymptotic variance of parameter estimates within the general class of weight functions. The proposed estimator is shown to be consistent and asymptotically normally distributed. The simulation studies show (i) the proposed estimator with covariate-adjusted weight is unbiased when the censoring distribution depends on covariates; and (ii) the proposed estimator with the optimal weight function gains parameter estimation efficiency. We apply the proposed method to stem cell transplantation and diabetes data sets.
Affiliation(s)
- Yayun Xu
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
- Soyoung Kim
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
- Mei-Jie Zhang
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
- David Couper
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kwang Woo Ahn
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
9. Case-cohort design in hematopoietic cell transplant studies. Bone Marrow Transplant 2022; 57:1-5. [PMID: 34400795 PMCID: PMC8738130 DOI: 10.1038/s41409-021-01433-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Revised: 07/14/2021] [Accepted: 08/03/2021] [Indexed: 02/08/2023]
Abstract
SERIES EDITORS' NOTE: Imagine you and your colleagues have done 1000 transplants in persons with acute myeloid leukaemia (AML) in 1st remission. 5 percent of the 20 percent of recipients relapsing posttransplant have an isolated central nervous system relapse. You are curious and want to know whether there is anything special about this 5 percent, specifically whether this risk correlates with any pretransplant clinical and laboratory covariates. You have extensive clinical data and some typical laboratory data on all 1000 but you suspect the culprit is mutation topography. What to do? Fortunately you have bio-banked DNA from the 1000. If resources and monies are not limiting you can do targeted or next generation sequencing on all 1000 DNA samples and off you go. However, most of us lack unlimited resources and monies. How can you sensibly and efficiently tackle this research problem? The answer is a case-cohort design study. In the typescript which follows Profs. Cai and Kim explain how to accomplish this. If you follow their advice you may need only to analyze samples from < 300 recipients rather than 1000 to test your hypothesis. They explain how to design such a study and provide references to estimate sample size. Sadly, their typescript will not tell you how to get funding for the study, which poor devil will have to write the protocol or, worse, who will shepherd it through endless committees for approval and the like. Help on these issues is outside the scope of our statistics series. In this context we suggest advice from Woody Allen's article in the New Yorker: The Kugelmass Episode (April 24, 1977). When Prof. Kugelmass (English, City College) tells his analyst Dr. Mandel he has fallen in love with Emma Bovary, who died of arsenic poisoning near Rouen, France 120 years earlier, the analyst says: "After all, I'm an analyst, not a magician." Kugelmass' reply: "Then perhaps what I need is a magician," and he is off to Coney Island to find one.
Good luck, the magician may still be there! (Note: This typescript is R-rated. It contains an equation.) Robert Peter Gale, Imperial College London, and Mei-Jie Zhang, Medical College of Wisconsin and CIBMTR.
10. Pan Y, Liu Z, Song G, Wei S. Case-cohort and inference for the proportional hazards model with covariate adjustment. Commun Stat-Theor M 2021. [DOI: 10.1080/03610926.2021.1996607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Yingli Pan
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
- Zhan Liu
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
- Guangyu Song
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
- Sha Wei
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
11. Zhang J, Zhou H, Liu Y, Cai J. Conditional screening for ultrahigh-dimensional survival data in case-cohort studies. Lifetime Data Analysis 2021; 27:632-661. [PMID: 34417679 PMCID: PMC8561435 DOI: 10.1007/s10985-021-09531-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2020] [Accepted: 08/05/2021] [Indexed: 06/13/2023]
Abstract
The case-cohort design has been widely used to reduce the cost of covariate measurements in large cohort studies. In many such studies, the number of covariates is very large, and the goal of the research is to identify active covariates which have great influence on response. Since the introduction of sure independence screening, screening procedures have achieved great success in terms of effectively reducing the dimensionality and identifying active covariates. However, commonly used screening methods are based on marginal correlation or its variants, so they may fail to identify hidden active variables which are jointly important but are weakly correlated with the response. Moreover, these screening methods are mainly proposed for data under simple random sampling and cannot be directly applied to case-cohort data. In this paper, we consider ultrahigh-dimensional survival data under the case-cohort design, and propose a conditional screening method that incorporates important prior information on known active variables. This method can effectively detect hidden active variables. Furthermore, it possesses the sure screening property under some mild regularity conditions and does not require any complicated numerical optimization. We evaluate the finite sample performance of the proposed method via extensive simulation studies and further illustrate the new approach through a real data set from patients with breast cancer.
Affiliation(s)
- Jing Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7420, USA
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7420, USA
12. Pan Y, Song G, Liu Z. Statistical inference for case-cohort design under the additive hazards model with covariate adjustment. Commun Stat-Simul C 2021. [DOI: 10.1080/03610918.2021.1975133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Yingli Pan
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
- Guangyu Song
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
- Zhan Liu
  - Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan, China
13. Zhou Q, Cai J, Zhou H. Semiparametric regression analysis of case-cohort studies with multiple interval-censored disease outcomes. Stat Med 2021; 40:3106-3123. [PMID: 33783001 DOI: 10.1002/sim.8962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2020] [Revised: 03/01/2021] [Accepted: 03/10/2021] [Indexed: 11/05/2022]
Abstract
Interval-censored failure time data commonly arise in epidemiological and biomedical studies where the occurrence of an event or a disease is determined via periodic examinations. Subject to interval-censoring, available information on the failure time can be quite limited. Cost-effective sampling designs are desirable to enhance the study power, especially when the disease rate is low and the covariates are expensive to obtain. In this work, we formulate the case-cohort design with multiple interval-censored disease outcomes and also generalize it to nonrare diseases where only a portion of diseased subjects are sampled. We develop a marginal sieve weighted likelihood approach, which assumes that the failure times marginally follow the proportional hazards model. We consider two types of weights to account for the sampling bias, and adopt a sieve method with Bernstein polynomials to handle the unknown baseline functions. We employ a weighted bootstrap procedure to obtain a variance estimate that is robust to the dependence structure between failure times. The proposed method is examined via simulation studies and illustrated with a dataset on incident diabetes and hypertension from the Atherosclerosis Risk in Communities study.
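The Bernstein-polynomial sieve mentioned in this abstract can be pictured with a small sketch. It shows only the generic building block: a Bernstein basis on a rescaled time interval, combined with nondecreasing coefficients so that the fitted curve is nondecreasing, as a cumulative baseline function must be. The function names, the rescaling, and the coefficient values are assumptions for illustration; the abstract does not give the authors' parameterization.

```python
from math import comb


def bernstein_basis(u, degree):
    """Bernstein basis values B_{k,degree}(u) for k = 0..degree, with u in [0, 1]."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    return [comb(degree, k) * u**k * (1.0 - u) ** (degree - k)
            for k in range(degree + 1)]


def baseline_curve(t, coefs, t_min, t_max):
    """Evaluate sum_k coefs[k] * B_{k,m}((t - t_min) / (t_max - t_min)).

    If `coefs` is nondecreasing, the curve is nondecreasing in t,
    which is one common way to keep a sieve estimate monotone.
    """
    u = (t - t_min) / (t_max - t_min)
    basis = bernstein_basis(u, len(coefs) - 1)
    return sum(c * b for c, b in zip(coefs, basis))


# Hypothetical coefficients on a follow-up window of [0, 10].
coefs = [0.0, 1.0, 3.0]
print([round(baseline_curve(t, coefs, 0.0, 10.0), 3) for t in (0.0, 2.0, 5.0, 8.0, 10.0)])
```

In a sieve approach the coefficients would be estimated together with the regression parameters. Here they are fixed numbers chosen only to show the shape constraint.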
Affiliation(s)
- Qingning Zhou
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
14. Yu J, Zhou H, Cai J. Accelerated failure time model for data from outcome-dependent sampling. Lifetime Data Analysis 2021; 27:15-37. [PMID: 33044612 PMCID: PMC7856009 DOI: 10.1007/s10985-020-09508-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/26/2019] [Accepted: 09/29/2020] [Indexed: 05/26/2023]
Abstract
Outcome-dependent sampling designs such as the case-control or case-cohort design are widely used in epidemiological studies for their outstanding cost-effectiveness. In this article, we propose and develop a smoothed weighted Gehan estimating equation approach for inference in an accelerated failure time model under a general failure time outcome-dependent sampling scheme. The proposed estimating equation is continuously differentiable and can be solved by standard numerical methods. In addition to developing asymptotic properties of the proposed estimator, we also propose and investigate a new optimal power-based subsample allocation criterion for the proposed design, obtained by maximizing the power function of a significance test. Simulation results show that the proposed estimator is more efficient than other existing competing estimators and that the optimal power-based subsample allocation provides an ODS design that yields improved power for the test of exposure effect. We illustrate the proposed method with a data set from the Norwegian Mother and Child Cohort Study to evaluate the relationship between exposure to perfluoroalkyl substances and women's subfecundity.
Affiliation(s)
- Jichang Yu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, Hubei, China
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
15. Cao Y, Yu J. A class of goodness-of-fit test for the additive hazards model with case-cohort data. Pharm Stat 2020; 20:451-461. [PMID: 33305424 DOI: 10.1002/pst.2087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2020] [Revised: 10/13/2020] [Accepted: 11/16/2020] [Indexed: 11/12/2022]
Abstract
The case-cohort design is commonly used in epidemiological studies due to its cost-effectiveness. The additive hazards model is widely used in survival analysis when the hazards difference is constant. In this article, we propose a class of goodness-of-fit test statistics for the assumption of the additive hazards model with case-cohort data through a class of asymptotically mean-zero multiparameter stochastic processes. We also establish the asymptotic theory of the proposed test statistics and a resampling scheme is adopted to approximate its asymptotic distribution. The performance of the proposed test statistics is evaluated through simulation studies and a real dataset is analyzed to illustrate the proposed method.
Affiliation(s)
- Yongxiu Cao
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Jichang Yu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
16. Zhang J, Zhou H, Liu Y, Cai J. Feature screening for case-cohort studies with failure time outcome. Scand Stat Theory Appl 2020; 48:349-370. [DOI: 10.1111/sjos.12503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Jing Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, China
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
17. Parner ET, Andersen PK, Overgaard M. Cumulative risk regression in case-cohort studies using pseudo-observations. Lifetime Data Analysis 2020; 26:639-658. [PMID: 31933047 DOI: 10.1007/s10985-020-09492-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/09/2018] [Accepted: 01/03/2020] [Indexed: 06/10/2023]
Abstract
Case-cohort studies are useful when information on certain risk factors is difficult or costly to ascertain. Particularly, a case-cohort study may be well suited in situations where several case series are of interest, e.g. in studies with competing risks, because the same sub-cohort may serve as a comparison group for all case series. Previous analyses of this kind of sampled cohort data most often involved estimation of rate ratios based on a Cox regression model. However, with competing risks this method will not provide parameters that directly describe the association between covariates and cumulative risks. In this paper, we study regression analysis of cause-specific cumulative risks in case-cohort studies using pseudo-observations. We focus mainly on the situation with competing risks. However, as a by-product, we also develop a method by which absolute mortality risks may be analyzed directly from case-cohort survival data. We adjust for the case-cohort sampling by inverse sampling probabilities applied to a generalized estimation equation. The large-sample properties of the proposed estimator are developed and small-sample properties are evaluated in a simulation study. We apply the methodology to study the effect of a specific diet component and a specific gene on the absolute risk of atrial fibrillation.
Affiliation(s)
- Erik T Parner
  - Section for Biostatistics, Aarhus University, Bartholins Allé 2, 8000, Aarhus C, Denmark
- Per K Andersen
  - Section of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, 1014, Copenhagen K, Denmark
- Morten Overgaard
  - Section for Biostatistics, Aarhus University, Bartholins Allé 2, 8000, Aarhus C, Denmark
18. Kim S, Xu Y, Zhang M, Ahn K. Stratified proportional subdistribution hazards model with covariate-adjusted censoring weight for case-cohort studies. Scand Stat Theory Appl 2020. [DOI: 10.1111/sjos.12461] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Affiliation(s)
- Soyoung Kim
  - Division of Biostatistics, Medical College of Wisconsin, USA
- Yayun Xu
  - Division of Biostatistics, Medical College of Wisconsin, USA
- Mei-Jie Zhang
  - Division of Biostatistics, Medical College of Wisconsin, USA
- Kwang-Woo Ahn
  - Division of Biostatistics, Medical College of Wisconsin, USA
19. Du M, Zhou Q, Zhao S, Sun J. Regression Analysis of Case-cohort Studies in the Presence of Dependent Interval Censoring. J Appl Stat 2020; 48:846-865. [PMID: 33767519 DOI: 10.1080/02664763.2020.1752633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Abstract
The case-cohort design is widely used as a means of reducing the cost in large cohort studies, especially when the disease rate is low and covariate measurements may be expensive, and has been discussed by many authors. In this paper, we discuss regression analysis of case-cohort studies that produce interval-censored failure time data with dependent censoring, a situation for which there does not seem to exist an established approach. For inference, a sieve inverse probability weighting estimation procedure is developed with the use of Bernstein polynomials to approximate the unknown baseline cumulative hazard functions. The proposed estimators are shown to be consistent and the asymptotic normality of the resulting regression parameter estimators is established. A simulation study is conducted to assess the finite sample properties of the proposed approach and indicates that it works well in practical situations. The proposed method is applied to an HIV/AIDS case-cohort study that motivated this investigation.
Affiliation(s)
- Mingyue Du
  - Center for Applied Statistical Research and College of Mathematics, Jilin University, Changchun, China
- Qingning Zhou
  - Department of Mathematics and Statistics, The University of North Carolina at Charlotte, Charlotte, NC, USA
- Shishun Zhao
  - Center for Applied Statistical Research and College of Mathematics, Jilin University, Changchun, China
- Jianguo Sun
  - Department of Statistics, University of Missouri, Columbia, MO, USA
20. Pan Y. Generalized case-cohort analysis for constrained estimation in the Cox’s model. Commun Stat-Simul C 2020. [DOI: 10.1080/03610918.2018.1475008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Yingli Pan
  - School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan, China
21. Zhou Q, Cai J, Zhou H. Semiparametric inference for a two-stage outcome-dependent sampling design with interval-censored failure time data. Lifetime Data Analysis 2020; 26:85-108. [PMID: 30617753 PMCID: PMC6612481 DOI: 10.1007/s10985-019-09461-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/27/2017] [Accepted: 01/02/2019] [Indexed: 06/09/2023]
Abstract
We propose a two-stage outcome-dependent sampling design and inference procedure for studies that concern interval-censored failure time outcomes. This design enhances the study efficiency by allowing the selection probabilities of the second-stage sample, for which the expensive exposure variable is ascertained, to depend on the first-stage observed interval-censored failure time outcomes. In particular, the second-stage sample is enriched by selectively including subjects who are known or observed to experience the failure at an early or late time. We develop a sieve semiparametric maximum pseudo likelihood procedure that makes use of all available data from the proposed two-stage design. The resulting regression parameter estimator is shown to be consistent and asymptotically normal, and a consistent estimator for its asymptotic variance is derived. Simulation results demonstrate that the proposed design and inference procedure performs well in practical situations and is more efficient than the existing designs and methods. An application to a phase 3 HIV vaccine trial is provided.
Affiliation(s)
- Qingning Zhou
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, Fretwell 335L, 9201 University City Blvd., Charlotte, NC, 28223, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 3101D McGavran-Greenberg Hall, Chapel Hill, NC, 27599, USA
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 3104C McGavran-Greenberg Hall, Chapel Hill, NC, 27599, USA
22
|
Maitra P, Amorim LDAF, Cai J. Multiplicative rates model for recurrent events in case-cohort studies. LIFETIME DATA ANALYSIS 2020; 26:134-157. [PMID: 30734884 PMCID: PMC6687570 DOI: 10.1007/s10985-019-09466-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2018] [Accepted: 01/29/2019] [Indexed: 06/09/2023]
Abstract
In large prospective cohort studies, accumulation of covariate information and follow-up data makes up the majority of the cost involved in the study. This might lead to the study being infeasible when there are some expensive variables and/or the event is rare. Prentice (Biometrika 73(1):1-11, 1986) proposed the case-cohort study for time-to-event data to tackle this problem. There has been extensive research on the analysis of univariate and clustered failure time data, where the clusters are formed among different individuals under a case-cohort sampling scheme. However, recurrent event data are quite common in biomedical and public health research. In this paper, we propose case-cohort sampling schemes for recurrent events. We consider a multiplicative rates model for the recurrent events and propose a weighted estimating equations approach for parameter estimation. We show that the estimators are consistent and asymptotically normally distributed. The proposed estimator performed well in finite samples in our simulation studies. For illustration purposes, we examined the association between prior occurrence of measles and acute lower respiratory tract infections (ALRI) among young children in Brazil.
Affiliation(s)
- Poulami Maitra
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Leila DAF Amorim
- Department of Statistics, Institute of Mathematics, Federal University of Bahia, Brazil
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill
|
23
|
Cao Y, Shi Y, Yu J. Statistical inference for the accelerated failure time model under two-stage generalized case–cohort design. COMMUN STAT-THEOR M 2019. [DOI: 10.1080/03610926.2018.1528363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Yongxiu Cao
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Yueyong Shi
- School of Economics and Management, China University of Geosciences, Wuhan, China
- Center for Resources and Environmental Economic Research, China University of Geosciences, Wuhan, China
- Jichang Yu
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
|
24
|
Pan Y, Ding J, Liu Y. Statistical inference for generalized case-cohort design under the proportional hazards model with parameter constraints. COMMUN STAT-SIMUL C 2019. [DOI: 10.1080/03610918.2018.1458128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Yingli Pan
- School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Jieli Ding
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
- Yanyan Liu
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
|
25
|
Kim S, Ahn KW. Bi-level variable selection for case-cohort studies with group variables. Stat Methods Med Res 2018; 28:3404-3414. [PMID: 30306838 DOI: 10.1177/0962280218803654] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
The case-cohort design is an economical approach to estimate the effect of risk factors on the survival outcome when collecting exposure information or covariates on all patients is expensive in a large cohort study. Variables often have group structure such as categorical variables and highly correlated continuous variables. The existing literature for case-cohort data is limited to identifying non-zero variables at the individual level only. In this article, we propose a bi-level variable selection method to select non-zero group and within-group variables for case-cohort data when variables have group structure. The proposed method allows the number of variables to diverge as the sample size increases. The asymptotic properties of the estimator including bi-level variable selection consistency and the asymptotic normality are shown. We also conduct simulations to compare our proposed method with some existing methods and apply them to the Busselton Health data.
Affiliation(s)
- Soyoung Kim
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, USA
- Kwang Woo Ahn
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, USA
|
26
|
Kim S, Zeng D, Cai J. Analysis of multiple survival events in generalized case-cohort designs. Biometrics 2018; 74:1250-1260. [PMID: 29992545 DOI: 10.1111/biom.12923] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/01/2017] [Revised: 05/01/2018] [Accepted: 05/01/2018] [Indexed: 01/04/2023]
Abstract
The generalized case-cohort design has been proposed to assess the effects of exposures on survival outcomes when measuring exposures is expensive and events are not rare in the cohort. In such a design, expensive exposure information is collected from both a (stratified) randomly selected subcohort and a subset of individuals with events. In this article, we consider an extension of this design to study multiple types of survival events by selecting a proportion of cases for each type of event. We propose a general weighting scheme to analyze data. Furthermore, we examine the optimal choice of weights and show that this optimal weighting yields a much improved efficiency gain both asymptotically and in simulation studies. Finally, we apply our proposed methods to data from the Atherosclerosis Risk in Communities study.
Affiliation(s)
- Soyoung Kim
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, U.S.A
- Donglin Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
|
27
|
Lee U, Sun Y, Scheike TH, Gilbert PB. Analysis of Generalized Semiparametric Regression Models for Cumulative Incidence Functions with Missing Covariates. Comput Stat Data Anal 2018; 122:59-79. [PMID: 29892140 DOI: 10.1016/j.csda.2018.01.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
Abstract
The cumulative incidence function quantifies the probability of failure over time due to a specific cause for competing risks data. The generalized semiparametric regression models for the cumulative incidence functions with missing covariates are investigated. The effects of some covariates are modeled as non-parametric functions of time while others are modeled as parametric functions of time. Different link functions can be selected to add flexibility in modeling the cumulative incidence functions. The estimation procedures based on the direct binomial regression and the inverse probability weighting of complete cases are developed. This approach modifies the full data weighted least squares equations by weighting the contributions of observed members through the inverses of estimated sampling probabilities which depend on the censoring status and the event types among other subject characteristics. The asymptotic properties of the proposed estimators are established. The finite-sample performances of the proposed estimators and their relative efficiencies under different two-phase sampling designs are examined in simulations. The methods are applied to analyze data from the RV144 vaccine efficacy trial to investigate the associations of immune response biomarkers with the cumulative incidence of HIV-1 infection.
Affiliation(s)
- Unkyung Lee
- Department of Statistics, Texas A&M University, College Station, TX 77843, U.S.A
- Yanqing Sun
- Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Thomas H Scheike
- Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, DK-1014, Denmark
- Peter B Gilbert
- Department of Biostatistics, University of Washington, Seattle, WA 98195, U.S.A.; Vaccine and Infectious Disease and Public Health Sciences Divisions, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, U.S.A
|
28
|
Deng L, Ding J, Liu Y, Wei C. Regression analysis for the proportional hazards model with parameter constraints under case-cohort design. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2017.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
|
29
|
Pan Y, Cai J, Kim S, Zhou H. Regression analysis for secondary response variable in a case-cohort study. Biometrics 2017; 74:1014-1022. [PMID: 29286533 DOI: 10.1111/biom.12838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/01/2017] [Revised: 11/01/2017] [Accepted: 11/01/2017] [Indexed: 12/01/2022]
Abstract
The case-cohort study design has been widely used for its cost-effectiveness. In any real study, there are always other important outcomes of interest besides the failure time that the original case-cohort study is based on. How to utilize the available case-cohort data to study the relationship of a secondary outcome with the primary exposure obtained through the case-cohort study is not well studied. In this article, we propose a non-parametric estimated likelihood approach for analyzing a secondary outcome in a case-cohort study. The estimation is based on maximizing a semiparametric likelihood function that is built jointly on both the time-to-failure outcome and the secondary outcome. The proposed estimator is shown to be consistent, efficient, and asymptotically normal. Finite sample performance is evaluated via simulation studies. Data from the Sister Study are analyzed to illustrate our method.
Affiliation(s)
- Yinghao Pan
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Sangmi Kim
- Medical College of Georgia, GRU Cancer Center, Augusta University, Augusta, Georgia 30912, U.S.A
- Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
30
|
Yan Y, Zhou H, Cai J. Improving efficiency of parameter estimation in case-cohort studies with multivariate failure time data. Biometrics 2017; 73:1042-1052. [PMID: 28112795 DOI: 10.1111/biom.12657] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/01/2016] [Revised: 12/01/2016] [Accepted: 12/01/2016] [Indexed: 11/30/2022]
Abstract
The case-cohort study design is an effective way to reduce cost of assembling and measuring expensive covariates in large cohort studies. Recently, several weighted estimators were proposed for the case-cohort design when multiple diseases are of interest. However, these existing weighted estimators do not make effective use of the covariate information available in the whole cohort. Furthermore, the auxiliary information for the expensive covariates, which may be available in the studies, cannot be incorporated directly. In this article, we propose a class of updated-estimators. We show that, by making effective use of the whole cohort information, the proposed updated-estimators are guaranteed to be more efficient than the existing weighted estimators asymptotically. Furthermore, they are flexible to incorporate the auxiliary information whenever available. The advantages of the proposed updated-estimators are demonstrated in simulation studies and a real data analysis.
Affiliation(s)
- Ying Yan
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada, T2N 1N4
- Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
31
|
Zhou Q, Cai J, Zhou H. Outcome-dependent sampling with interval-censored failure time data. Biometrics 2017; 74:58-67. [PMID: 28771664 DOI: 10.1111/biom.12744] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/01/2016] [Revised: 06/01/2017] [Accepted: 06/01/2017] [Indexed: 11/30/2022]
Abstract
Epidemiologic studies and disease prevention trials often seek to relate an exposure variable to a failure time that suffers from interval-censoring. When the failure rate is low and the time intervals are wide, a large cohort is often required so as to yield reliable precision on the exposure-failure-time relationship. However, large cohort studies with simple random sampling could be prohibitive for investigators with a limited budget, especially when the exposure variables are expensive to obtain. Alternative cost-effective sampling designs and inference procedures are therefore desirable. We propose an outcome-dependent sampling (ODS) design with interval-censored failure time data, where we enrich the observed sample by selectively including certain more informative failure subjects. We develop a novel sieve semiparametric maximum empirical likelihood approach for fitting the proportional hazards model to data from the proposed interval-censoring ODS design. This approach employs the empirical likelihood and sieve methods to deal with the infinite-dimensional nuisance parameters, which greatly reduces the dimensionality of the estimation problem and eases the computation difficulty. The consistency and asymptotic normality of the resulting regression parameter estimator are established. The results from our extensive simulation study show that the proposed design and method works well for practical situations and is more efficient than the alternative designs and competing approaches. An example from the Atherosclerosis Risk in Communities (ARIC) study is provided for illustration.
Affiliation(s)
- Qingning Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
32
|
Cao Y, Yang Q, Yu J. Optimal generalized case–cohort analysis with accelerated failure time model. J Korean Stat Soc 2017. [DOI: 10.1016/j.jkss.2016.10.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
|
33
|
Cao Y, Yu J. Optimal generalized case–cohort sampling design under the additive hazard model. COMMUN STAT-THEOR M 2017. [DOI: 10.1080/03610926.2015.1085563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
34
|
Kang S. Fitting semiparametric accelerated failure time models for nested case–control data. J STAT COMPUT SIM 2017. [DOI: 10.1080/00949655.2016.1222611] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/21/2022]
|
35
|
Abstract
The case-cohort design has been widely used as a means of cost reduction in assembling or measuring expensive covariates in large cohort studies. The existing literature on the case-cohort design is mainly focused on right-censored data. In practice, however, the failure time is often subject to interval-censoring; it is known only to fall within some random time interval. In this paper, we consider the case-cohort study design for interval-censored failure time and develop a sieve semiparametric likelihood approach for analyzing data from this design under the proportional hazards model. We construct the likelihood function using inverse probability weighting and build the sieves with Bernstein polynomials. The consistency and asymptotic normality of the resulting regression parameter estimator are established and a weighted bootstrap procedure is considered for variance estimation. Simulations show that the proposed method works well for practical situations, and an application to real data is provided.
Affiliation(s)
- Q Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- H Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- J Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
36
|
Yu J, Liu Y, Cai J, Sandler DP, Zhou H. Outcome-Dependent Sampling Design and Inference for Cox's Proportional Hazards Model. J Stat Plan Inference 2017; 178:24-36. [PMID: 28090134 DOI: 10.1016/j.jspi.2016.05.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
We propose a cost-effective outcome-dependent sampling design for failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criterion that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure are shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with an existing real data set from the Cancer Incidence and Mortality of Uranium Miners Study.
Affiliation(s)
- Jichang Yu
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei 430073, China; School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China
- Yanyan Liu
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Dale P Sandler
- Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
37
|
Ding J, Lu TS, Cai J, Zhou H. Recent progresses in outcome-dependent sampling with failure time data. LIFETIME DATA ANALYSIS 2017; 23:57-82. [PMID: 26759313 PMCID: PMC4942414 DOI: 10.1007/s10985-015-9355-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/30/2015] [Accepted: 12/22/2015] [Indexed: 06/05/2023]
Abstract
An outcome-dependent sampling (ODS) design is a retrospective sampling scheme where one observes the primary exposure variables with a probability that depends on the observed value of the outcome variable. When the outcome of interest is failure time, the observed data are often censored. By allowing the selection of the supplemental samples to depend on whether the event of interest happens or not and by oversampling subjects from the most informative regions, an ODS design for time-to-event data can reduce the cost of the study and improve the efficiency. We review recent progress and advances in research on ODS designs with failure time data. This includes research on ODS-related designs such as the case-cohort design, generalized case-cohort design, stratified case-cohort design, general failure-time ODS design, length-biased sampling design and interval sampling design.
Affiliation(s)
- Jieli Ding
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, 430072, China
- Tsui-Shan Lu
- Department of Mathematics, National Taiwan Normal University, Taipei, 116, Taiwan
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
|
38
|
Affiliation(s)
- Huijuan Ma
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Yong Zhou
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
|
39
|
Abstract
Case-cohort designs are widely used in large cohort studies to reduce the cost associated with covariate measurement. In many such studies the number of covariates is very large, so an efficient variable selection method is necessary. In this paper, we study the properties of a variable selection procedure using the smoothly clipped absolute deviation penalty in a case-cohort design with a diverging number of parameters. We establish the consistency and asymptotic normality of the maximum penalized pseudo-partial-likelihood estimator, and show that the proposed variable selection method is consistent and has an asymptotic oracle property. Simulation studies compare the finite-sample performance of the procedure with tuning parameter selection methods based on the Akaike information criterion and the Bayesian information criterion. We make recommendations for use of the proposed procedures in case-cohort studies, and apply them to the Busselton Health Study.
Affiliation(s)
- A I Ni
- 3101 McGavran-Greenberg Hall, CB 7420, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Jianwen Cai
- 3101 McGavran-Greenberg Hall, CB 7420, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Donglin Zeng
- 3101 McGavran-Greenberg Hall, CB 7420, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
40
|
Payne R, Yang M, Zheng Y, Jensen MK, Cai T. Robust risk prediction with biomarkers under two-phase stratified cohort design. Biometrics 2016; 72:1037-1045. [PMID: 27037494 DOI: 10.1111/biom.12515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/01/2014] [Revised: 12/01/2015] [Accepted: 02/01/2016] [Indexed: 11/27/2022]
Abstract
Identification of novel biomarkers for risk prediction is important for disease prevention and optimal treatment selection. However, studies aiming to discover which biomarkers are useful for risk prediction often require the use of stored biological samples from large assembled cohorts, and thus the depletion of a finite and precious resource. To make efficient use of such stored samples, two-phase sampling designs are often adopted as resource-efficient sampling strategies, especially when the outcome of interest is rare. Existing methods for analyzing data from two-phase studies focus primarily on single marker analysis or fitting the Cox regression model to combine information from multiple markers. However, the Cox model may not fit the data well. Under model misspecification, the composite score derived from the Cox model may not perform well in predicting the outcome. Under a general two-phase stratified cohort sampling design, we present a novel approach to combining multiple markers to optimize prediction by fitting a flexible nonparametric transformation model. Using inverse probability weighting to account for the outcome-dependent sampling, we propose to estimate the model parameters by maximizing an objective function which can be interpreted as a weighted C-statistic for survival outcomes. Regardless of model adequacy, the proposed procedure yields a sensible composite risk score for prediction. A major obstacle for making inference under two phase studies is due to the correlation induced by the finite population sampling, which prevents standard inference procedures such as the bootstrap from being used for variance estimation. We propose a resampling procedure to derive valid confidence intervals for the model parameters and the C-statistic accuracy measure. 
We illustrate the new methods with simulation studies and an analysis of a two-phase study of high-density lipoprotein cholesterol (HDL-C) subtypes for predicting the risk of coronary heart disease.
Affiliation(s)
- Rebecca Payne
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, U.S.A
- Ming Yang
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, U.S.A
- Yingye Zheng
- Fred Hutchinson Cancer Research Center, Seattle, Washington, U.S.A
- Majken K Jensen
- Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts, U.S.A
- Tianxi Cai
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, U.S.A
|
41
|
Crude incidence in two-phase designs in the presence of competing risks. BMC Med Res Methodol 2016; 16:5. [PMID: 26754746 PMCID: PMC4710022 DOI: 10.1186/s12874-015-0103-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/17/2015] [Accepted: 12/17/2015] [Indexed: 11/12/2022] Open
Abstract
Background: In many studies, some information might not be available for the whole cohort; some covariates, or even the outcome, might be ascertained only in selected subsamples. These studies are part of a broad category termed two-phase studies. Common examples include the nested case-control and the case-cohort designs. For two-phase studies, appropriate weighted survival estimates have been derived; however, no estimator of cumulative incidence accounting for competing events has been proposed. This is relevant in the presence of multiple types of events, where estimation of event-type-specific quantities is needed for evaluating outcome.
Methods: We develop a nonparametric estimator of the cumulative incidence function of events accounting for possible competing events. It handles a general sampling design by weights derived from the sampling probabilities. The variance is derived from the influence function of the subdistribution hazard.
Results: The proposed method shows good performance in simulations. It is applied to estimate the crude incidence of relapse in childhood acute lymphoblastic leukemia in groups defined by a genotype not available for everyone in a cohort of nearly 2000 patients, where death due to toxicity acted as a competing event. In a second example the aim was to estimate engagement in care of a cohort of HIV patients in a resource-limited setting, where for some patients the outcome itself was missing due to loss to follow-up. A sampling-based approach was used to identify the outcome in a subsample of lost patients and to obtain a valid estimate of connection to care.
Conclusions: A valid estimator for cumulative incidence of events accounting for competing risks under a general sampling design from an infinite target population is derived.
|
42
|
Yu J, Liu Y, Sandler DP, Zhou H. Statistical inference for the additive hazards model under outcome-dependent sampling. CAN J STAT 2015; 43:436-453. [PMID: 26379363 DOI: 10.1002/cjs.11257] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
Cost-effective study design and proper inference procedures for data from such designs are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the effect of radon exposure on cancer risk.
Affiliation(s)
- Jichang Yu
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei 430073, China
- Yanyan Liu
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China
- Dale P Sandler
- Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, U.S.A
- Haibo Zhou
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
43
|
Ding J, Zhou H, Liu Y, Cai J, Longnecker MP. Estimating effect of environmental contaminants on women's subfecundity for the MoBa study data with an outcome-dependent sampling scheme. Biostatistics 2014; 15:636-50. [PMID: 24812419 DOI: 10.1093/biostatistics/kxu016] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022] Open
Abstract
Motivated by a need arising in our ongoing environmental study within the Norwegian Mother and Child Cohort (MoBa) study, we consider an outcome-dependent sampling (ODS) scheme for failure-time data with censoring. Like the case-cohort design, the ODS design enriches the observed sample by selectively including certain failure subjects. We present an estimated maximum semiparametric empirical likelihood estimation (EMSELE) under the proportional hazards model framework. The asymptotic properties of the proposed estimator are derived. Simulation studies were conducted to evaluate the small-sample performance of the proposed method. Our analyses show that the proposed estimator and design are more efficient than the current default approach and other competing approaches. Applying the proposed approach to the data set from the MoBa study, we found a significant effect of an environmental contaminant on fecundability.
Affiliation(s)
- Jieli Ding: School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China
- Haibo Zhou: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yanyan Liu: School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China
- Jianwen Cai: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Matthew P Longnecker: National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
44
Zeng D, Lin DY. Efficient Estimation of Semiparametric Transformation Models for Two-Phase Cohort Studies. J Am Stat Assoc 2014; 109:371-383. [PMID: 24659837 DOI: 10.1080/01621459.2013.842172] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 10/26/2022]
Abstract
Under two-phase cohort designs, such as case-cohort and nested case-control sampling, information on observed event times, event indicators, and inexpensive covariates is collected in the first phase, and the first-phase information is used to select subjects for measurements of expensive covariates in the second phase; inexpensive covariates are also used in the data analysis to control for confounding and to evaluate interactions. This paper provides efficient estimation of semiparametric transformation models for such designs, accommodating both discrete and continuous covariates and allowing inexpensive and expensive covariates to be correlated. The estimation is based on the maximization of a modified nonparametric likelihood function through a generalization of the expectation-maximization algorithm. The resulting estimators are shown to be consistent, asymptotically normal and asymptotically efficient with easily estimated variances. Simulation studies demonstrate that the asymptotic approximations are accurate in practical situations. Empirical data from Wilms' tumor studies and the Atherosclerosis Risk in Communities (ARIC) study are presented.
Affiliation(s)
- Donglin Zeng: Department of Biostatistics, CB#7420, University of North Carolina, Chapel Hill, NC 27599-7420
- D Y Lin: Department of Biostatistics, CB#7420, University of North Carolina, Chapel Hill, NC 27599-7420
45
Abstract
The case-cohort study design, used to reduce costs in large cohort studies, is a random sample of the entire cohort, named the subcohort, augmented with subjects having the disease of interest but not in the subcohort sample. When several diseases are of interest, several case-cohort studies may be conducted using the same subcohort, with each disease analyzed separately, ignoring the additional exposure measurements collected on subjects with the other diseases. This is not an efficient use of the data, and in this paper, we propose more efficient estimators. We consider both joint and separate analyses for the multiple diseases. We propose an estimating equation approach with a new weight function, and we establish the consistency and asymptotic normality of the resulting estimator. Simulation studies show that the proposed methods using all available information gain efficiency. We apply our proposed method to the data from the Busselton Health Study.
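The sampling scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the function name `case_cohort_sample`, the toy cohort, and the weights are assumptions made for illustration. The weights shown are a simple inverse-probability scheme (cases weighted 1, subcohort non-cases weighted by the inverse sampling fraction), not the new weight function the paper proposes.

```python
import random


def case_cohort_sample(cohort, subcohort_size, seed=0):
    """Select a case-cohort sample from a list of (subject_id, is_case) pairs.

    Returns a dict mapping each selected subject_id to an illustrative
    inverse-probability weight (an assumption, not the paper's weights).
    """
    rng = random.Random(seed)
    ids = [sid for sid, _ in cohort]
    # Step 1: draw the subcohort as a simple random sample of the whole
    # cohort, without looking at case status.
    subcohort = set(rng.sample(ids, subcohort_size))
    sampling_fraction = subcohort_size / len(cohort)

    weights = {}
    for sid, is_case in cohort:
        if is_case:
            # Step 2: every case is included, inside the subcohort or not.
            weights[sid] = 1.0
        elif sid in subcohort:
            # Non-cases enter the sample only through the subcohort.
            weights[sid] = 1.0 / sampling_fraction
    return weights


# Hypothetical cohort: 1000 subjects, every 50th one a case (20 cases).
cohort = [(i, i % 50 == 0) for i in range(1000)]
weights = case_cohort_sample(cohort, subcohort_size=100)
```

Expensive covariates would then be measured only on the subjects in `weights`, which is where the design saves cost relative to measuring the full cohort.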
Affiliation(s)
- S Kim: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- J Cai: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- W Lu: Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, U.S.A
46
Kang S, Cai J. Asymptotic results for fitting marginal hazards models from stratified case-cohort studies with multiple disease outcomes. J Korean Stat Soc 2010; 39:371-385. [PMID: 22442642 DOI: 10.1016/j.jkss.2010.03.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
Abstract
In stratified case-cohort designs, the case-cohort sample is drawn by stratified random sampling based on covariate information available for all cohort members. In this paper, we extend the work of Kang & Cai (2009) to a generalized stratified case-cohort study design for failure time data with multiple disease outcomes. Under this study design, we develop weighted estimating procedures for the model parameters in marginal multiplicative intensity models and for the cumulative baseline hazard function. The asymptotic properties of the estimators are studied using martingales, modern empirical process theory, and results for finite population sampling.
Affiliation(s)
- Sangwook Kang: Department of Epidemiology and Biostatistics, University of Georgia, Athens, Georgia 30602, United States