1. Etievant L, Gail MH. Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data. LIFETIME DATA ANALYSIS 2024:10.1007/s10985-024-09621-2. [PMID: 38565754 DOI: 10.1007/s10985-024-09621-2] [Received: 06/20/2023] [Accepted: 01/30/2024] [Indexed: 04/04/2024]
Abstract
The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase efficiency of estimates of Cox model log-relative hazards, and there has been some work estimating pure risk. Yet there are few examples of these options in the medical literature, and we could not find programs currently online to analyze these various options. We therefore present a unified approach and R software to facilitate such analyses. We used influence functions adapted to the various design and analysis options together with variance calculations that take the two-phase sampling into account. This work clarifies when the widely used "robust" variance estimate of Barlow (Biometrics 50:1064-1072, 1994) is appropriate. The corresponding R software, CaseCohortCoxSurvival, facilitates analysis with and without stratification and/or weight calibration, for subcohort sampling with or without replacement. We also allow for phase-two data to be missing at random for stratified designs. We provide inference not only for log-relative hazards in the Cox model, but also for cumulative baseline hazards and covariate-specific pure risks. We hope these calculations and software will promote wider use of more efficient and principled design and analysis options for case-cohort studies.
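The point estimates behind a covariate-specific pure risk can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CaseCohortCoxSurvival API: the helper names (`design_weights`, `weighted_breslow`, `pure_risk`) are hypothetical, the log-relative hazard β is taken as already estimated (for example, from a weighted Cox fit), event times are assumed untied, and weight calibration and the influence-function variance emphasized in the abstract are omitted. Cases get weight 1, and non-case subcohort members in stratum s get n_s/m_s, the inverse of that stratum's sampling fraction without replacement. Under the Cox model, the pure risk over (τ1, τ2] for covariates x is 1 - exp{-[Λ0(τ2) - Λ0(τ1)] exp(xβ)}.

```python
import numpy as np


def design_weights(is_case, in_subcohort, stratum, stratum_sizes):
    """Hypothetical helper: stratified case-cohort design weights (no calibration).

    Cases get weight 1. Non-case subcohort members in stratum s get
    n_s / m_s, where n_s is the full-cohort size of stratum s and m_s the
    number sampled into the subcohort from it (without replacement).
    Non-cases outside the subcohort have no phase-two data (weight 0).
    """
    is_case = np.asarray(is_case, dtype=bool)
    in_subcohort = np.asarray(in_subcohort, dtype=bool)
    stratum = np.asarray(stratum)
    w = np.where(is_case, 1.0, 0.0)
    for s, n_s in stratum_sizes.items():
        in_s = stratum == s
        m_s = np.sum(in_subcohort & in_s)
        w[in_subcohort & ~is_case & in_s] = n_s / m_s
    return w


def weighted_breslow(times, events, x, beta, w, t):
    """Weighted Breslow estimate of the cumulative baseline hazard at t.

    Each event contributes 1 / sum_j w_j exp(x_j beta) over its risk set.
    Assumes untied event times and that every case carries weight 1.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk = np.asarray(w, dtype=float) * np.exp(np.asarray(x, dtype=float) @ beta)
    cum_hazard = 0.0
    for i in np.flatnonzero(events & (times <= t)):
        cum_hazard += 1.0 / np.sum(risk[times >= times[i]])
    return cum_hazard


def pure_risk(cum_hazard_t1, cum_hazard_t2, x_new, beta):
    """Covariate-specific pure risk over (t1, t2] under the Cox model."""
    lin_pred = np.asarray(x_new, dtype=float) @ beta
    return 1.0 - np.exp(-(cum_hazard_t2 - cum_hazard_t1) * np.exp(lin_pred))
```

Subjects with weight 0 drop out of every risk-set sum, so only phase-two data enter the estimate, which is the point of the case-cohort weighting.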
Affiliation(s)
- Lola Etievant
- Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850-9780, USA.
- Mitchell H Gail
- Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850-9780, USA.
2. Fang X, Ahn KW, Cai J, Kim S. Efficient estimation for left-truncated competing risks regression for case-cohort studies. Biometrics 2024; 80:ujad008. [PMID: 38281769 PMCID: PMC10826882 DOI: 10.1093/biomtc/ujad008] [Received: 03/30/2023] [Revised: 09/15/2023] [Accepted: 11/06/2023] [Indexed: 01/30/2024]
Abstract
The case-cohort study design provides a cost-effective study design for a large cohort study with competing risk outcomes. The proportional subdistribution hazards model is widely used to estimate direct covariate effects on the cumulative incidence function for competing risk data. In biomedical studies, left truncation often occurs and brings extra challenges to the analysis. Existing inverse probability weighting methods for case-cohort studies with competing risk data not only fail to address left truncation but are also inefficient in estimating regression parameters for fully observed covariates. We propose an augmented inverse probability-weighted estimating equation for left-truncated competing risk data to address these limitations of the current literature. We further propose a more efficient estimator when extra information from the other causes is available. The proposed estimators are consistent and asymptotically normally distributed. Simulation studies show that the proposed estimator is unbiased and leads to estimation efficiency gain in the regression parameter estimation. We analyze the Atherosclerosis Risk in Communities study data using the proposed methods.
Affiliation(s)
- Xi Fang
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, United States
- Kwang Woo Ahn
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, United States
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, NC 27599, United States
- Soyoung Kim
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, United States
3. O'Brien KM, Lawrence KG, Keil AP. The Case for Case-Cohort: An Applied Epidemiologist's Guide to Reframing Case-Cohort Studies to Improve Usability and Flexibility. Epidemiology 2022; 33:354-361. [PMID: 35383643 PMCID: PMC9172927 DOI: 10.1097/ede.0000000000001469] [Indexed: 01/12/2023]
Abstract
When research questions require the use of precious samples, expensive assays or equipment, or labor-intensive data collection or analysis, nested case-control or case-cohort sampling of observational cohort study participants can often reduce costs. These study designs have similar statistical precision for addressing a single research question, but case-cohort studies have broader efficiency and superior flexibility. Despite this, case-cohort designs are comparatively underutilized in the epidemiologic literature. Recent advances in statistical methods and software have made analyses of case-cohort data easier to implement, and advances from causal inference, such as inverse probability of sampling weights, have allowed the case-cohort design to be used with a variety of target parameters and populations. To provide an accessible link to this technical literature, we give a conceptual overview of case-cohort study analysis with inverse probability of sampling weights. We show how this general analytic approach can be leveraged to more efficiently study subgroups of interest or disease subtypes or to examine associations independent of case status. A brief discussion of how this framework could be extended to incorporate other related methodologic applications further demonstrates the broad cost-effectiveness and adaptability of case-cohort methods for a variety of modern epidemiologic applications in resource-limited settings.
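As a minimal sketch of the inverse probability of sampling weights idea, the function below estimates a full-cohort mean (for example, exposure prevalence, a quantity independent of case status) from phase-two data by weighting each sampled person by 1/π_i and normalizing by the sum of the weights (the Hájek form). The function name and the toy numbers are hypothetical, and the inclusion probabilities π_i are assumed known from the design: in a standard case-cohort study every case has π_i = 1 and each non-case has the subcohort sampling fraction.

```python
import numpy as np


def ipw_mean(values, sampled, inclusion_prob):
    """Hajek-type inverse-probability-weighted mean of a phase-two variable.

    values:          measured only where `sampled` is True; other entries
                     (e.g. NaN) are ignored
    sampled:         True for people selected into phase two
    inclusion_prob:  each person's probability of being selected
    """
    values = np.asarray(values, dtype=float)
    sampled = np.asarray(sampled, dtype=bool)
    w = 1.0 / np.asarray(inclusion_prob, dtype=float)[sampled]
    return np.sum(w * values[sampled]) / np.sum(w)


# Toy data: two cases (always sampled, pi = 1), two non-cases drawn into a
# 10% subcohort, and one non-case outside the subcohort (exposure unmeasured).
exposed = [1, 0, 1, 0, np.nan]
sampled = [True, True, True, True, False]
pi = [1.0, 1.0, 0.1, 0.1, 0.1]
prevalence = ipw_mean(exposed, sampled, pi)  # (1 + 10) / (1 + 1 + 10 + 10) = 0.5
```

Each sampled non-case stands in for ten cohort members, which is why the unweighted sample mean would overstate how common a case-associated exposure is.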
Affiliation(s)
- Katie M O'Brien
- Epidemiology Branch, National Institute of Environmental Health Sciences, NC
- Kaitlyn G Lawrence
- Epidemiology Branch, National Institute of Environmental Health Sciences, NC
- Alexander P Keil
- Epidemiology Branch, National Institute of Environmental Health Sciences, NC
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC
4. Kim S, Kim JK, Ahn KW. A calibrated Bayesian method for the stratified proportional hazards model with missing covariates. LIFETIME DATA ANALYSIS 2022; 28:169-193. [PMID: 35034213 PMCID: PMC8977246 DOI: 10.1007/s10985-021-09542-4] [Received: 05/27/2020] [Accepted: 12/21/2021] [Indexed: 06/14/2023]
Abstract
Missing covariates are commonly encountered when evaluating covariate effects on survival outcomes. Excluding missing data from the analysis may lead to biased parameter estimation and a misleading conclusion. The inverse probability weighting method is widely used to handle missing covariates. However, obtaining the asymptotic variance in frequentist inference is complicated because it involves estimating parameters for propensity scores. In this paper, we propose a new approach based on an approximate Bayesian method, without using Taylor expansion, to handle missing covariates for survival data. We consider a stratified proportional hazards model so that the method can also accommodate a non-proportional hazards structure. Two missing-data settings are studied: a single missing pattern and multiple missing patterns. The proposed estimators are shown to be consistent and asymptotically normal, matching the frequentist asymptotic properties. Simulation studies show that our proposed estimators are asymptotically unbiased and that the credible region obtained from the posterior distribution is close to the frequentist confidence interval. The algorithm is straightforward and computationally efficient. We apply the proposed method to a stem cell transplantation data set.
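For contrast with the proposed Bayesian approach, the frequentist inverse probability weighting objective mentioned above can be sketched as a weighted log partial likelihood for a stratified Cox model. This is an illustrative sketch, not the authors' method: the function name is hypothetical, only complete cases are passed in, each weighted by 1/π_i where π_i is an (assumed known or previously estimated) probability of being a complete case, and event times are assumed untied. Risk sets are formed within strata, because each stratum has its own baseline hazard.

```python
import numpy as np


def weighted_stratified_cox_loglik(beta, times, events, x, strata, w):
    """IPW log partial likelihood of a stratified Cox model.

    Pass complete cases only, with w[i] = 1 / pi_i. Risk sets are formed
    within each stratum (separate baseline hazards); untied event times
    are assumed.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    strata = np.asarray(strata)
    w = np.asarray(w, dtype=float)
    eta = np.asarray(x, dtype=float) @ np.asarray(beta, dtype=float)
    loglik = 0.0
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        for i in idx[events[idx]]:
            # Risk set: members of the same stratum still under observation.
            at_risk = idx[times[idx] >= times[i]]
            denom = np.sum(w[at_risk] * np.exp(eta[at_risk]))
            loglik += w[i] * (eta[i] - np.log(denom))
    return loglik
```

The negated function could be handed to a general-purpose optimizer such as `scipy.optimize.minimize` to obtain a point estimate; a valid frequentist variance would still have to account for estimating π_i, which is the complication the abstract highlights.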
Affiliation(s)
- Soyoung Kim
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA.
- Jae-Kwang Kim
- Department of Statistics, Iowa State University, 2438 Osborn Dr, Ames, IA, 50011-1090, USA
- Kwang Woo Ahn
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
5. Xu Y, Kim S, Zhang MJ, Couper D, Ahn KW. Competing risks regression models with covariates-adjusted censoring weight under the generalized case-cohort design. LIFETIME DATA ANALYSIS 2022; 28:241-262. [PMID: 35034255 PMCID: PMC8977245 DOI: 10.1007/s10985-022-09546-8] [Received: 01/03/2021] [Accepted: 12/31/2021] [Indexed: 06/14/2023]
Abstract
A generalized case-cohort design has been used when measuring exposures is expensive and events are not rare in the full cohort. This design collects expensive exposure information from a (stratified) randomly selected subset from the full cohort, called the subcohort, and a fraction of cases outside the subcohort. For the full cohort study with competing risks, He et al. (Scand J Stat 43:103-122, 2016) studied the non-stratified proportional subdistribution hazards model with covariate-dependent censoring to directly evaluate covariate effects on the cumulative incidence function. In this paper, we propose a stratified proportional subdistribution hazards model with covariate-adjusted censoring weights for competing risks data under the generalized case-cohort design. We consider a general class of weight functions to account for the generalized case-cohort design. Then, we derive the optimal weight function which minimizes the asymptotic variance of parameter estimates within the general class of weight functions. The proposed estimator is shown to be consistent and asymptotically normally distributed. The simulation studies show (i) the proposed estimator with covariate-adjusted weight is unbiased when the censoring distribution depends on covariates; and (ii) the proposed estimator with the optimal weight function gains parameter estimation efficiency. We apply the proposed method to stem cell transplantation and diabetes data sets.
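Under the generalized case-cohort design described above, a person's chance of entering phase two depends on case status. A minimal sketch, assuming independent Bernoulli sampling (subcohort probability α_s in stratum s, then probability q for each case outside the subcohort): non-cases are included with probability α_s and cases with α_s + (1 - α_s)q, and simple inverse probability weights are the reciprocals. The function name and the Bernoulli scheme are illustrative assumptions; the paper considers a general class of weight functions and derives an optimal one, which this basic weight does not reproduce.

```python
import numpy as np


def gcc_inclusion_prob(is_case, stratum, alpha, q):
    """Phase-two inclusion probability under a generalized case-cohort design.

    Assumes Bernoulli sampling: everyone enters the subcohort with probability
    alpha[stratum]; each case outside the subcohort is then sampled
    independently with probability q.
    """
    is_case = np.asarray(is_case, dtype=bool)
    a = np.array([alpha[s] for s in stratum], dtype=float)
    return np.where(is_case, a + (1.0 - a) * q, a)


pi = gcc_inclusion_prob([True, False, True], ["x", "x", "y"], {"x": 0.1, "y": 0.2}, q=0.5)
weights = 1.0 / pi  # inverse probability weights for the sampled people
```

Setting q = 1 recovers the classical case-cohort design, in which every case is included with probability 1.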
Affiliation(s)
- Yayun Xu
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
- Soyoung Kim
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA.
- Mei-Jie Zhang
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
- David Couper
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kwang Woo Ahn
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
6. Case-cohort design in hematopoietic cell transplant studies. Bone Marrow Transplant 2022; 57:1-5. [PMID: 34400795 PMCID: PMC8738130 DOI: 10.1038/s41409-021-01433-4] [Received: 07/12/2021] [Revised: 07/14/2021] [Accepted: 08/03/2021] [Indexed: 02/08/2023]
Abstract
SERIES EDITORS' NOTE: Imagine you and your colleagues have done 1000 transplants in persons with acute myeloid leukaemia (AML) in first remission. Five percent of the 20 percent of recipients who relapse posttransplant have an isolated central nervous system relapse. You are curious and want to know whether there is anything special about this 5 percent, specifically whether this risk correlates with any pretransplant clinical and laboratory covariates. You have extensive clinical data and some typical laboratory data on all 1000, but you suspect the culprit is mutation topography. What to do? Fortunately, you have bio-banked DNA from the 1000. If resources and monies are not limiting, you can do targeted or next-generation sequencing on all 1000 DNA samples and off you go. However, most of us lack unlimited resources and monies. How can you sensibly and efficiently tackle this research problem? The answer is a case-cohort design study. In the typescript which follows, Profs. Cai and Kim explain how to accomplish this. If you follow their advice, you may need to analyze samples from fewer than 300 recipients rather than 1000 to test your hypothesis. They explain how to design such a study and provide references for estimating sample size. Sadly, their typescript will not tell you how to get funding for the study, which poor devil will have to write the protocol or, worse, who will shepherd it through endless committees for approval and the like. Help on these issues is outside the scope of our statistics series. In this context we suggest advice from Woody Allen's story in the New Yorker, "The Kugelmass Episode" (April 24, 1977). When Prof. Kugelmass (English, City College) tells his analyst, Dr. Mandel, that he has fallen in love with Emma Bovary, who died of arsenic poisoning near Rouen, France, 120 years earlier, the analyst says: "After all, I'm an analyst, not a magician." Kugelmass replies, "Then perhaps what I need is a magician," and is off to Coney Island to find one.
Good luck; the magician may still be there! (Note: This typescript is R-rated. It contains an equation.) Robert Peter Gale, Imperial College London, and Mei-Jie Zhang, Medical College of Wisconsin and CIBMTR.
7. Wang W, Lu SE, Cheng JQ, Xie M, Kostis JB. Multivariate survival analysis in big data: A divide-and-combine approach. Biometrics 2021; 78:852-866. [PMID: 33847371 DOI: 10.1111/biom.13469] [Received: 06/19/2020] [Revised: 03/02/2021] [Accepted: 03/25/2021] [Indexed: 11/29/2022]
Abstract
Multivariate failure time data are frequently analyzed using the marginal proportional hazards models and the frailty models. When the sample size is extraordinarily large, using either approach could face computational challenges. In this paper, we focus on the marginal model approach and propose a divide-and-combine method to analyze large-scale multivariate failure time data. Our method is motivated by the Myocardial Infarction Data Acquisition System (MIDAS), a New Jersey statewide database that includes 73,725,160 admissions to nonfederal hospitals and emergency rooms (ERs) from 1995 to 2017. We propose to randomly divide the full data into multiple subsets and propose a weighted method to combine these estimators obtained from individual subsets using three weights. Under mild conditions, we show that the combined estimator is asymptotically equivalent to the estimator obtained from the full data as if the data were analyzed all at once. In addition, to screen out risk factors with weak signals, we propose to perform the regularized estimation on the combined estimator using its combined confidence distribution. Theoretical properties, such as consistency, oracle properties, and asymptotic equivalence between the divide-and-combine approach and the full data approach are studied. Performance of the proposed method is investigated using simulation studies. Our method is applied to the MIDAS data to identify risk factors related to multivariate cardiovascular-related health outcomes.
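The combining step can be illustrated with one common choice of weight. The abstract does not say which three weights the authors use, so this sketch assumes inverse-variance (precision) weighting of the subset estimates, a standard way to pool asymptotically normal estimators; the function name is hypothetical, and the regularization and confidence-distribution steps are not shown.

```python
import numpy as np


def combine_estimates(estimates, covariances):
    """Pool subset estimates by inverse-variance (precision) weighting.

    estimates:   list of length-p arrays, one per data subset
    covariances: list of matching p x p covariance matrices
    Returns the combined estimate and its covariance.
    """
    precisions = [np.linalg.inv(np.asarray(v, dtype=float)) for v in covariances]
    total_precision = np.sum(precisions, axis=0)
    weighted_sum = np.sum(
        [p @ np.asarray(b, dtype=float) for p, b in zip(precisions, estimates)],
        axis=0,
    )
    combined_cov = np.linalg.inv(total_precision)
    return combined_cov @ weighted_sum, combined_cov


# Two subsets estimating one coefficient, with variances 1 and 3.
est, cov = combine_estimates([[1.0], [3.0]], [[[1.0]], [[3.0]]])
# est is about [1.5] and cov about [[0.75]]
```

With equal subset covariances this reduces to a simple average, which is why precision weighting is a natural baseline when subsets are random splits of the same data.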
Affiliation(s)
- Wei Wang
- Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey, USA
- Shou-En Lu
- Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey, USA
- Jerry Q Cheng
- Department of Computer Science, New York Institute of Technology, New York, New York, USA
- Minge Xie
- Department of Statistics, Rutgers University, Piscataway, New Jersey, USA
- John B Kostis
- Cardiovascular Institute, Rutgers Robert Wood Johnson Medical School, New Brunswick, New Jersey, USA
8. Zhou Q, Cai J, Zhou H. Semiparametric regression analysis of case-cohort studies with multiple interval-censored disease outcomes. Stat Med 2021; 40:3106-3123. [PMID: 33783001 DOI: 10.1002/sim.8962] [Received: 04/01/2020] [Revised: 03/01/2021] [Accepted: 03/10/2021] [Indexed: 11/05/2022]
Abstract
Interval-censored failure time data commonly arise in epidemiological and biomedical studies where the occurrence of an event or a disease is determined via periodic examinations. Subject to interval-censoring, available information on the failure time can be quite limited. Cost-effective sampling designs are desirable to enhance the study power, especially when the disease rate is low and the covariates are expensive to obtain. In this work, we formulate the case-cohort design with multiple interval-censored disease outcomes and also generalize it to nonrare diseases where only a portion of diseased subjects are sampled. We develop a marginal sieve weighted likelihood approach, which assumes that the failure times marginally follow the proportional hazards model. We consider two types of weights to account for the sampling bias, and adopt a sieve method with Bernstein polynomials to handle the unknown baseline functions. We employ a weighted bootstrap procedure to obtain a variance estimate that is robust to the dependence structure between failure times. The proposed method is examined via simulation studies and illustrated with a dataset on incident diabetes and hypertension from the Atherosclerosis Risk in Communities study.
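A sieve built from Bernstein polynomials approximates an unknown cumulative baseline hazard Λ(t) on an interval [l, u] by the sum over k of φ_k B_k(t), where B_k are the Bernstein basis polynomials of degree m. One common way to keep Λ nondecreasing, assumed here rather than taken from the paper, is to set φ_k = exp(γ_0) + ... + exp(γ_k) with unconstrained γ, because a Bernstein polynomial with nondecreasing coefficients is itself nondecreasing. The function names are hypothetical, and the weighted likelihood and weighted bootstrap are not shown.

```python
import math

import numpy as np


def bernstein_basis(t, degree, lower, upper):
    """Bernstein basis polynomials of the given degree on [lower, upper]."""
    u = (np.asarray(t, dtype=float) - lower) / (upper - lower)
    return np.stack(
        [math.comb(degree, k) * u**k * (1 - u) ** (degree - k) for k in range(degree + 1)],
        axis=-1,
    )


def cumulative_hazard(t, gamma, lower, upper):
    """Sieve approximation Lambda(t) = sum_k phi_k B_k(t).

    phi is the cumulative sum of exp(gamma), so the coefficients are positive
    and nondecreasing; this keeps Lambda nondecreasing on [lower, upper] for
    any unconstrained gamma.
    """
    gamma = np.asarray(gamma, dtype=float)
    phi = np.cumsum(np.exp(gamma))
    return bernstein_basis(t, len(gamma) - 1, lower, upper) @ phi


grid = np.linspace(0.0, 10.0, 50)
lam = cumulative_hazard(grid, [-1.0, 0.5, -2.0, 1.0], 0.0, 10.0)
```

Because monotonicity is built into the parametrization, the sieve parameters γ can be estimated with an unconstrained optimizer.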
Affiliation(s)
- Qingning Zhou
- Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
9. Kim S, Xu Y, Zhang M, Ahn K. Stratified proportional subdistribution hazards model with covariate-adjusted censoring weight for case-cohort studies. Scand Stat Theory Appl 2020. [DOI: 10.1111/sjos.12461] [Indexed: 11/30/2022]
Affiliation(s)
- Soyoung Kim
- Division of Biostatistics, Medical College of Wisconsin, USA
- Yayun Xu
- Division of Biostatistics, Medical College of Wisconsin, USA
- Mei-Jie Zhang
- Division of Biostatistics, Medical College of Wisconsin, USA
- Kwang-Woo Ahn
- Division of Biostatistics, Medical College of Wisconsin, USA