1
Etievant L, Gail MH. Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data. LIFETIME DATA ANALYSIS 2024:10.1007/s10985-024-09621-2. [PMID: 38565754 DOI: 10.1007/s10985-024-09621-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 01/30/2024] [Indexed: 04/04/2024]
Abstract
The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase efficiency of estimates of Cox model log-relative hazards, and there has been some work estimating pure risk. Yet there are few examples of these options in the medical literature, and we could not find programs currently online to analyze these various options. We therefore present a unified approach and R software to facilitate such analyses. We used influence functions adapted to the various design and analysis options together with variance calculations that take the two-phase sampling into account. This work clarifies when the widely used "robust" variance estimate of Barlow (Biometrics 50:1064-1072, 1994) is appropriate. The corresponding R software, CaseCohortCoxSurvival, facilitates analysis with and without stratification and/or weight calibration, for subcohort sampling with or without replacement. We also allow for phase-two data to be missing at random for stratified designs. We provide inference not only for log-relative hazards in the Cox model, but also for cumulative baseline hazards and covariate-specific pure risks. We hope these calculations and software will promote wider use of more efficient and principled design and analysis options for case-cohort studies.
Affiliation(s)
- Lola Etievant
- Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850-9780, USA.
- Mitchell H Gail
- Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850-9780, USA.
2
Mao F, Cook RJ. Two-phase designs with current status data. Stat Med 2023; 42:1207-1232. [PMID: 36690474 DOI: 10.1002/sim.9666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Revised: 11/01/2022] [Accepted: 01/05/2023] [Indexed: 01/25/2023]
Abstract
We consider the design and analysis of two-phase studies aiming to assess the relation between a fixed (eg, genetic) marker and an event time under current status observation. We consider a common setting in which a phase I sample is comprised of a large cohort of individuals with outcome (ie, current status) data and a vector of inexpensive covariates. Stored biospecimens for individuals in the phase I sample can be assayed to record the marker of interest for individuals selected in a phase II sub-sample. The design challenge is then to select the phase II sub-sample in order to maximize the precision of the marker effect on the time of interest under a proportional hazards model. This problem has not been examined before for current status data and the role of the assessment time is highlighted. Inference based on likelihood and inverse probability weighted estimating functions are considered, with designs centered on score-based residuals, extreme current status observations, or stratified sampling schemes. Data from a registry of patients with psoriatic arthritis is used in an illustration where we study the risk of diabetes as a comorbidity.
Affiliation(s)
- Fangya Mao
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
- Richard J Cook
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
3
Zhang H, Ding J. Hypothesis testing in outcome-dependent sampling design under generalized linear models. COMMUN STAT-SIMUL C 2022. [DOI: 10.1080/03610918.2019.1682155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Haodong Zhang
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
- Jieli Ding
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
4
Zhou Q, Cai J, Zhou H. Semiparametric regression analysis of case-cohort studies with multiple interval-censored disease outcomes. Stat Med 2021; 40:3106-3123. [PMID: 33783001 DOI: 10.1002/sim.8962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2020] [Revised: 03/01/2021] [Accepted: 03/10/2021] [Indexed: 11/05/2022]
Abstract
Interval-censored failure time data commonly arise in epidemiological and biomedical studies where the occurrence of an event or a disease is determined via periodic examinations. Subject to interval-censoring, available information on the failure time can be quite limited. Cost-effective sampling designs are desirable to enhance the study power, especially when the disease rate is low and the covariates are expensive to obtain. In this work, we formulate the case-cohort design with multiple interval-censored disease outcomes and also generalize it to nonrare diseases where only a portion of diseased subjects are sampled. We develop a marginal sieve weighted likelihood approach, which assumes that the failure times marginally follow the proportional hazards model. We consider two types of weights to account for the sampling bias, and adopt a sieve method with Bernstein polynomials to handle the unknown baseline functions. We employ a weighted bootstrap procedure to obtain a variance estimate that is robust to the dependence structure between failure times. The proposed method is examined via simulation studies and illustrated with a dataset on incident diabetes and hypertension from the Atherosclerosis Risk in Communities study.
Affiliation(s)
- Qingning Zhou
- Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
5
Cao Y, Yu J. A class of goodness-of-fit test for the additive hazards model with case-cohort data. Pharm Stat 2020; 20:451-461. [PMID: 33305424 DOI: 10.1002/pst.2087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2020] [Revised: 10/13/2020] [Accepted: 11/16/2020] [Indexed: 11/12/2022]
Abstract
The case-cohort design is commonly used in epidemiological studies due to its cost-effectiveness. The additive hazards model is widely used in survival analysis when the hazards difference is constant. In this article, we propose a class of goodness-of-fit test statistics for the assumption of the additive hazards model with case-cohort data through a class of asymptotically mean-zero multiparameter stochastic processes. We also establish the asymptotic theory of the proposed test statistics and a resampling scheme is adopted to approximate its asymptotic distribution. The performance of the proposed test statistics is evaluated through simulation studies and a real dataset is analyzed to illustrate the proposed method.
Affiliation(s)
- Yongxiu Cao
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Jichang Yu
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
6
Zhou Q, Cai J, Zhou H. Semiparametric inference for a two-stage outcome-dependent sampling design with interval-censored failure time data. LIFETIME DATA ANALYSIS 2020; 26:85-108. [PMID: 30617753 PMCID: PMC6612481 DOI: 10.1007/s10985-019-09461-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/27/2017] [Accepted: 01/02/2019] [Indexed: 06/09/2023]
Abstract
We propose a two-stage outcome-dependent sampling design and inference procedure for studies that concern interval-censored failure time outcomes. This design enhances the study efficiency by allowing the selection probabilities of the second-stage sample, for which the expensive exposure variable is ascertained, to depend on the first-stage observed interval-censored failure time outcomes. In particular, the second-stage sample is enriched by selectively including subjects who are known or observed to experience the failure at an early or late time. We develop a sieve semiparametric maximum pseudo likelihood procedure that makes use of all available data from the proposed two-stage design. The resulting regression parameter estimator is shown to be consistent and asymptotically normal, and a consistent estimator for its asymptotic variance is derived. Simulation results demonstrate that the proposed design and inference procedure performs well in practical situations and is more efficient than the existing designs and methods. An application to a phase 3 HIV vaccine trial is provided.
Affiliation(s)
- Qingning Zhou
- Department of Mathematics and Statistics, University of North Carolina at Charlotte, Fretwell 335L, 9201 University City Blvd., Charlotte, NC, 28223, USA.
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, 3101D McGavran-Greenberg Hall, Chapel Hill, NC, 27599, USA
- Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, 3104C McGavran-Greenberg Hall, Chapel Hill, NC, 27599, USA
7
Abstract
The two-phase design is a cost-effective sampling strategy to evaluate the effects of covariates on an outcome when certain covariates are too expensive to be measured on all study subjects. Under such a design, the outcome and inexpensive covariates are measured on all subjects in the first phase and the first-phase information is used to select subjects for measurements of expensive covariates in the second phase. Previous research on two-phase studies has focused largely on the inference procedures rather than the design aspects. We investigate the design efficiency of the two-phase study, as measured by the semiparametric efficiency bound for estimating the regression coefficients of expensive covariates. We consider general two-phase studies, where the outcome variable can be continuous, discrete, or censored, and the second-phase sampling can depend on the first-phase data in any manner. We develop optimal or approximately optimal two-phase designs, which can be substantially more efficient than the existing designs. We demonstrate the improvements of the new designs over the existing ones through extensive simulation studies and two large medical studies.
Affiliation(s)
- Ran Tao
- Department of Biostatistics and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232; Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
- Donglin Zeng
- Department of Biostatistics and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232; Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
- Dan-Yu Lin
- Department of Biostatistics and Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232; Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
8
Shen W, Liu S, Chen Y, Ning J. Regression analysis of longitudinal data with outcome-dependent sampling and informative censoring. Scand Stat Theory Appl 2019; 46:831-847. [PMID: 32066989 PMCID: PMC7025472 DOI: 10.1111/sjos.12373] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/12/2017] [Accepted: 11/03/2018] [Indexed: 11/28/2022]
Abstract
We consider regression analysis of longitudinal data in the presence of outcome-dependent observation times and informative censoring. Existing approaches commonly require correct specification of the joint distribution of the longitudinal measurements, observation time process and informative censoring time under the joint modeling framework, and can be computationally cumbersome due to the complex form of the likelihood function. In view of these issues, we propose a semi-parametric joint regression model and construct a composite likelihood function based on a conditional order statistics argument. As a major feature of our proposed methods, the aforementioned joint distribution is not required to be specified and the random effect in the proposed joint model is treated as a nuisance parameter. Consequently, the derived composite likelihood bypasses the need to integrate over the random effect and offers the advantage of easy computation. We show that the resulting estimators are consistent and asymptotically normal. We use simulation studies to evaluate the finite-sample performance of the proposed method, and apply it to a study of weight loss data that motivated our investigation.
Affiliation(s)
- Weining Shen
- Department of Statistics, University of California, Irvine
- Suyu Liu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
- Yong Chen
- Department of Biostatistics and Epidemiology, The University of Pennsylvania
- Jing Ning
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
9
Lawless JF. Two-phase outcome-dependent studies for failure times and testing for effects of expensive covariates. LIFETIME DATA ANALYSIS 2018; 24:28-44. [PMID: 27900633 DOI: 10.1007/s10985-016-9386-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/20/2016] [Accepted: 11/23/2016] [Indexed: 06/06/2023]
Abstract
Two- or multi-phase study designs are often used in settings involving failure times. In most studies, whether or not certain covariates are measured on an individual depends on their failure time and status. For example, when failures are rare, case-cohort or case-control designs are used to increase the number of failures relative to a random sample of the same size. Another scenario is where certain covariates are expensive to measure, so they are obtained only for selected individuals in a cohort. This paper considers such situations and focuses on cases where we wish to test hypotheses of no association between failure time and expensive covariates. Efficient score tests based on maximum likelihood are developed and shown to have a simple form for a wide class of models and sampling designs. Some numerical comparisons of study designs are presented.
Affiliation(s)
- J F Lawless
- Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada.
10
Zhou Q, Cai J, Zhou H. Outcome-dependent sampling with interval-censored failure time data. Biometrics 2017; 74:58-67. [PMID: 28771664 DOI: 10.1111/biom.12744] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/01/2016] [Revised: 06/01/2017] [Accepted: 06/01/2017] [Indexed: 11/30/2022]
Abstract
Epidemiologic studies and disease prevention trials often seek to relate an exposure variable to a failure time that suffers from interval-censoring. When the failure rate is low and the time intervals are wide, a large cohort is often required so as to yield reliable precision on the exposure-failure-time relationship. However, large cohort studies with simple random sampling could be prohibitive for investigators with a limited budget, especially when the exposure variables are expensive to obtain. Alternative cost-effective sampling designs and inference procedures are therefore desirable. We propose an outcome-dependent sampling (ODS) design with interval-censored failure time data, where we enrich the observed sample by selectively including certain more informative failure subjects. We develop a novel sieve semiparametric maximum empirical likelihood approach for fitting the proportional hazards model to data from the proposed interval-censoring ODS design. This approach employs the empirical likelihood and sieve methods to deal with the infinite-dimensional nuisance parameters, which greatly reduces the dimensionality of the estimation problem and eases the computation difficulty. The consistency and asymptotic normality of the resulting regression parameter estimator are established. The results from our extensive simulation study show that the proposed design and method works well for practical situations and is more efficient than the alternative designs and competing approaches. An example from the Atherosclerosis Risk in Communities (ARIC) study is provided for illustration.
Affiliation(s)
- Qingning Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Haibo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A