1
Etievant L, Gail MH. Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data. Lifetime Data Analysis 2024; 30:572-599. [PMID: 38565754 DOI: 10.1007/s10985-024-09621-2] [Received: 06/20/2023] [Accepted: 01/30/2024] [Indexed: 04/04/2024]
Abstract
The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase efficiency of estimates of Cox model log-relative hazards, and there has been some work estimating pure risk. Yet there are few examples of these options in the medical literature, and we could not find programs currently online to analyze these various options. We therefore present a unified approach and R software to facilitate such analyses. We used influence functions adapted to the various design and analysis options together with variance calculations that take the two-phase sampling into account. This work clarifies when the widely used "robust" variance estimate of Barlow (Biometrics 50:1064-1072, 1994) is appropriate. The corresponding R software, CaseCohortCoxSurvival, facilitates analysis with and without stratification and/or weight calibration, for subcohort sampling with or without replacement. We also allow for phase-two data to be missing at random for stratified designs. We provide inference not only for log-relative hazards in the Cox model, but also for cumulative baseline hazards and covariate-specific pure risks. We hope these calculations and software will promote wider use of more efficient and principled design and analysis options for case-cohort studies.
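As a rough illustration of the design described in this abstract, the sketch below draws a subcohort without replacement from an unstratified cohort and assigns inverse-probability design weights: cases get weight 1 and subcohort non-cases get the inverse of the sampling fraction. This is one common weighting scheme, shown under simplifying assumptions; the function name `case_cohort_weights` and its arguments are hypothetical and are not part of the CaseCohortCoxSurvival package.

```python
import random


def case_cohort_weights(is_case, subcohort_size, seed=0):
    """Return {index: design weight} for the phase-two members of an
    unstratified case-cohort sample (hypothetical helper)."""
    n = len(is_case)
    rng = random.Random(seed)
    # Subcohort: simple random sample of the whole cohort, without replacement.
    subcohort = set(rng.sample(range(n), subcohort_size))
    weights = {}
    for i in range(n):
        if is_case[i]:
            # Every case has complete covariate data, so its weight is 1.
            weights[i] = 1.0
        elif i in subcohort:
            # Non-cases are observed only via the subcohort: weight n / m.
            weights[i] = n / subcohort_size
    return weights


# Toy cohort: 100 subjects, the first 10 are cases, subcohort of 20.
cohort_is_case = [i < 10 for i in range(100)]
design_weights = case_cohort_weights(cohort_is_case, subcohort_size=20)
```

A stratified design would apply the same idea within each stratum, using stratum-specific cohort and subcohort sizes.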
Affiliation(s)
- Lola Etievant
  Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850-9780, USA.
- Mitchell H Gail
  Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850-9780, USA.
2
Lee M, Gail MH. Absolute risk from double nested case-control designs: cause-specific proportional hazards models with and without augmented estimating equations. Biometrics 2024; 80:ujae062. [PMID: 38994640 DOI: 10.1093/biomtc/ujae062] [Received: 07/12/2023] [Revised: 06/01/2024] [Accepted: 06/21/2024] [Indexed: 07/13/2024]
Abstract
We estimate relative hazards and absolute risks (or cumulative incidence or crude risk) under cause-specific proportional hazards models for competing risks from double nested case-control (DNCC) data. In the DNCC design, controls are time-matched not only to cases from the cause of primary interest, but also to cases from competing risks (the phase-two sample). Complete covariate data are available in the phase-two sample, but other cohort members only have information on survival outcomes and some covariates. Design-weighted estimators use inverse sampling probabilities computed from Samuelsen-type calculations for DNCC. To take advantage of additional information available on all cohort members, we augment the estimating equations with a term that is unbiased for zero but improves the efficiency of estimates from the cause-specific proportional hazards model. We establish the asymptotic properties of the proposed estimators, including the estimator of absolute risk, and derive consistent variance estimators. We show that augmented design-weighted estimators are more efficient than design-weighted estimators. Through simulations, we show that the proposed asymptotic methods yield nominal operating characteristics in practical sample sizes. We illustrate the methods using prostate cancer mortality data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial Study of the National Cancer Institute.
Affiliation(s)
- Minjung Lee
  Department of Statistics, Kangwon National University, Chuncheon, Gangwon 24341, South Korea
- Mitchell H Gail
  Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, United States
3
Zheng J, Zheng Y, Hsu L. Re-calibrating pure risk integrating individual data from two-phase studies with external summary statistics. Biometrics 2022; 78:1515-1529. [PMID: 34390251 PMCID: PMC8895713 DOI: 10.1111/biom.13543] [Received: 05/16/2021] [Revised: 07/18/2021] [Accepted: 08/03/2021] [Indexed: 12/30/2022]
Abstract
Accurate risk assessment is critical in clinical decision-making. It entails the projected risk based on a risk prediction model agreeing with the observed risk in the target cohort. However, the model often over- or under-estimates the risk. Building a new model for the target cohort would be ideal but costly. It is therefore of great interest to recalibrate an existing model for the target cohort. Existing methods have been proposed to recalibrate the model by leveraging the disease incidence rates from the target cohort. However, they assume the same covariate distribution across cohorts and when the assumption is violated, the recalibrated model can be substantially biased. Further, recalibration is also complicated by the two-phase sampling design that is commonly used for developing risk prediction models. In this paper, we develop a weighted estimating-equation approach accounting for the two-phase design and combine it with a weighted empirical likelihood that leverages the summary information on both disease incidence rates and covariates from the target cohort. We provide a resampling-based inference procedure. Our extensive simulation results show that using the summary information from the target population, the proposed recalibration method yields nearly unbiased risk estimates under a wide range of scenarios. An application to a colorectal cancer study also illustrates that the proposed method yields a well-calibrated model in the target cohort.
Affiliation(s)
- Jiayin Zheng
  Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Yingye Zheng
  Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Li Hsu
  Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
4
Eriksson M, Destounis S, Czene K, Zeiberg A, Day R, Conant EF, Schilling K, Hall P. A risk model for digital breast tomosynthesis to predict breast cancer and guide clinical care. Sci Transl Med 2022; 14:eabn3971. [PMID: 35544593 DOI: 10.1126/scitranslmed.abn3971] [Indexed: 11/02/2022]
Abstract
Screening with digital breast tomosynthesis (DBT) improves breast cancer detection and reduces false positives. However, currently, no breast cancer risk model takes advantage of the additional information generated by DBT imaging for breast cancer risk prediction. We developed and internally validated a DBT-based short-term risk model for predicting future late-stage and interval breast cancers after negative screening exams. We included the available 805 incident breast cancers and a random sample of 5173 healthy women matched on year of study entry in a nested case-control study from 154,200 multiethnic women, aged 35 to 74, attending DBT screening in the United States between 2014 and 2019. A relative risk model was trained using elastic net logistic regression and nested cross-validation to estimate risks using imaging features and age. An absolute risk model was developed using derived risks and U.S. incidence and competing mortality rates. Absolute risks, discrimination performance, and risk stratification were estimated in the left-out validation set. The discrimination performance of 1-year risk was 0.82 (95% CI, 0.79 to 0.85) with good calibration (P = 0.7). Using the U.S. Preventive Services Task Force guidelines, 14% of the women were at high risk, 19.6 times higher compared to general risk. In this high-risk group, 76% of stage II and III cancers and 59% of stage 0 cancers were observed (P < 0.01). Using mammographic features generated from DBT screens, our image-based risk prediction model could guide radiologists in selecting women for clinical care, potentially leading to earlier detection and improved prognoses.
Affiliation(s)
- Mikael Eriksson
  Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, SE-171 77 Stockholm, Sweden
- Kamila Czene
  Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, SE-171 77 Stockholm, Sweden
- Andrew Zeiberg
  Radiology Associates of Burlington County, Hainesport, NJ 08036, USA
- Robert Day
  Zwanger-Pesiri Radiology, Lindenhurst, NY 11757, USA
- Emily F Conant
  Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Per Hall
  Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, SE-171 77 Stockholm, Sweden; Department of Oncology, Södersjukhuset University Hospital, Stockholm SE-118 61, Sweden
5
Shin YE, Gail MH, Pfeiffer RM. Assessing risk model calibration with missing covariates. Biostatistics 2021; 23:875-890. [PMID: 33616159 DOI: 10.1093/biostatistics/kxaa060] [Received: 03/20/2020] [Revised: 12/07/2020] [Accepted: 12/11/2020] [Indexed: 11/12/2022]
Abstract
When validating a risk model in an independent cohort, some predictors may be missing for some subjects. Missingness can be unplanned or by design, as in case-cohort or nested case-control studies, in which some covariates are measured only in subsampled subjects. Weighting methods and imputation are used to handle missing data. We propose methods to increase the efficiency of weighting to assess calibration of a risk model (i.e. bias in model predictions), which is quantified by the ratio of the number of observed events, $\mathcal{O}$, to expected events, $\mathcal{E}$, computed from the model. We adjust known inverse probability weights by incorporating auxiliary information available for all cohort members. We use survey calibration that requires the weighted sum of the auxiliary statistics in the complete data subset to equal their sum in the full cohort. We show that a pseudo-risk estimate that approximates the actual risk value but uses only variables available for the entire cohort is an excellent auxiliary statistic to estimate $\mathcal{E}$. We derive analytic variance formulas for $\mathcal{O}/\mathcal{E}$ with adjusted weights. In simulations, weight adjustment with pseudo-risk was much more efficient than inverse probability weighting and yielded consistent estimates even when the pseudo-risk was a poor approximation. Multiple imputation was often efficient but yielded biased estimates when the imputation model was misspecified. Using these methods, we assessed calibration of an absolute risk model for second primary thyroid cancer in an independent cohort.
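The calibration constraint described in this abstract (the weighted sum of auxiliary statistics in the complete-data subset must equal their sum in the full cohort) can be sketched as follows. This is a minimal illustration assuming linear calibration with the auxiliary vector $(1, a_i)$, where $a_i$ plays the role of a pseudo-risk; the abstract does not state which calibration distance the authors use, and the names `calibrate_weights` and `observed_over_expected` are hypothetical.

```python
def calibrate_weights(design_w, aux, cohort_size, cohort_aux_total):
    """Linearly calibrate design weights so that, over the complete-data subset,
    sum(w) == cohort_size and sum(w * aux) == cohort_aux_total."""
    # Design-weighted moments of the auxiliary vector (1, a_i) in the subset.
    s0 = sum(design_w)
    s1 = sum(d * a for d, a in zip(design_w, aux))
    s2 = sum(d * a * a for d, a in zip(design_w, aux))
    # Gaps between the full-cohort totals and the design-weighted totals.
    g0 = cohort_size - s0
    g1 = cohort_aux_total - s1
    # Solve the 2x2 system [[s0, s1], [s1, s2]] @ (lam0, lam1) = (g0, g1).
    det = s0 * s2 - s1 * s1
    lam0 = (s2 * g0 - s1 * g1) / det
    lam1 = (s0 * g1 - s1 * g0) / det
    return [d * (1.0 + lam0 + lam1 * a) for d, a in zip(design_w, aux)]


def observed_over_expected(observed_events, model_risks, weights):
    """O/E ratio, with E estimated as the weighted sum of model risks."""
    expected = sum(w * r for w, r in zip(weights, model_risks))
    return observed_events / expected


# Toy numbers: 4 subjects with complete data represent a cohort of 10.
pseudo_risk = [0.1, 0.2, 0.3, 0.4]
calibrated = calibrate_weights([2.0, 2.0, 2.0, 2.0], pseudo_risk,
                               cohort_size=10, cohort_aux_total=2.2)
ratio = observed_over_expected(2, [0.12, 0.18, 0.33, 0.41], calibrated)
```

Linear calibration can produce negative weights in unfavorable samples; raking-type (exponential) calibration avoids this at the cost of an iterative solve.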
Affiliation(s)
- Yei Eun Shin
  Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Mitchell H Gail
  Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Ruth M Pfeiffer
  Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
6
Shin YE, Pfeiffer RM, Graubard BI, Gail MH. Weight calibration to improve efficiency for estimating pure risks from the additive hazards model with the nested case-control design. Biometrics 2020; 78:179-191. [PMID: 33270907 DOI: 10.1111/biom.13413] [Received: 02/14/2020] [Revised: 10/02/2020] [Accepted: 11/13/2020] [Indexed: 11/28/2022]
Abstract
We study the efficiency of covariate-specific estimates of pure risk (one minus the survival function) when some covariates are only available for case-control samples nested in a cohort. We focus on the semiparametric additive hazards model in which the hazard function equals a baseline hazard plus a linear combination of covariates with either time-varying or time-invariant coefficients. A published approach uses the design-based inclusion probabilities to reweight the nested case-control data. We obtain more efficient estimates of pure risks by calibrating the design weights to data available in the entire cohort, for both time-varying and time-invariant covariate coefficients. We develop explicit variance formulas for the weight-calibrated estimates based on influence functions. Simulations show the improvement in precision by using weight calibration and confirm the consistency of variance estimators and the validity of inference based on asymptotic normality. Examples are provided using data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial Study (PLCO).
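To make the definition of pure risk in this abstract concrete, the sketch below evaluates one minus the survival function under an additive hazards model with time-invariant coefficients, where the hazard is a baseline hazard plus a linear combination of covariates. It assumes the cumulative baseline hazard at time `t` and the coefficients have already been estimated; the function name and all inputs are hypothetical, and no estimation or weighting step is shown.

```python
import math


def pure_risk_additive(cum_baseline_hazard, beta, z, t):
    """Pure risk at time t for covariates z under the additive hazards model
    hazard(s | z) = baseline(s) + sum(beta_k * z_k), coefficients constant in s."""
    linear = sum(b * x for b, x in zip(beta, z))
    # Integrating the hazard over [0, t] gives Lambda0(t) + t * beta'z.
    cum_hazard = cum_baseline_hazard + t * linear
    # Pure risk is one minus the survival function exp(-cumulative hazard).
    return 1.0 - math.exp(-cum_hazard)


# Toy values: cumulative baseline hazard 0.1 at t = 5, one binary covariate.
risk_exposed = pure_risk_additive(0.1, [0.02], [1.0], 5.0)
risk_unexposed = pure_risk_additive(0.1, [0.02], [0.0], 5.0)
```

With time-varying coefficients, the term `t * linear` would be replaced by the covariates multiplied by cumulative coefficient functions evaluated at `t`.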
Affiliation(s)
- Yei Eun Shin
  Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Ruth M Pfeiffer
  Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Barry I Graubard
  Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Mitchell H Gail
  Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland