1
|
Cao Y, Haneuse S, Zheng Y, Chen J. Two-phase stratified sampling and analysis for predicting binary outcomes. Biostatistics 2021:6470040. [PMID: 34923588 DOI: 10.1093/biostatistics/kxab044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2021] [Revised: 11/03/2021] [Accepted: 11/22/2021] [Indexed: 11/13/2022] Open
Abstract
The two-phase study design is a cost-efficient sampling strategy when certain data elements are expensive and, thus, can only be collected on a sub-sample of subjects. To date, guidance on how best to allocate resources within the design has assumed that primary interest lies in estimating association parameters. When primary interest lies in the development and evaluation of a risk prediction tool, however, such guidance may, in fact, be detrimental. To resolve this, we propose a novel strategy for resource allocation based on oversampling cases and subjects who have more extreme risk estimates according to a preliminary model developed using fully observed predictors. Key to the proposed strategy is that it focuses on enhancing efficiency regarding estimation of measures of predictive accuracy, rather than on efficiency regarding association parameters, which is the standard paradigm. Towards valid estimation and inference for accuracy measures using the resultant data, we extend an existing semiparametric maximum likelihood method for estimating odds ratio association parameters to accommodate the biased sampling scheme and data incompleteness. Motivated by our sampling design, we additionally propose a general post-stratification scheme for analyzing general two-phase data for estimating predictive accuracy measures. Through theoretical calculations and simulation studies, we show that the proposed sampling strategy and post-stratification scheme achieve the promised efficiency improvement. Finally, we apply the proposed methods to develop and evaluate a preliminary model for predicting the risk of hospital readmission after cardiac surgery using data from the Pennsylvania Health Care Cost Containment Council.
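As a rough illustration of the design idea in this abstract, the sketch below forms phase-one strata from case status and from whether a preliminary risk estimate is extreme, then estimates the true- and false-positive rates of a full model from the phase-two subsample by weighting each subject by the inverse of its stratum sampling fraction. This simple weighting is a stand-in chosen for brevity, not the semiparametric maximum likelihood method the authors develop; the function names, cut-offs, and toy numbers are all hypothetical.

```python
from collections import Counter


def stratum(is_case, prelim_risk, low=0.2, high=0.8):
    """Phase-one stratum: case status crossed with an 'extreme risk' flag.

    low and high are hypothetical cut-offs on the preliminary risk estimate.
    """
    extreme = prelim_risk < low or prelim_risk > high
    return (bool(is_case), extreme)


def weighted_tpr_fpr(phase2, phase1_counts, threshold):
    """Inverse-sampling-fraction estimates of TPR and FPR.

    phase2: list of (is_case, stratum, full_model_risk) for sampled subjects.
    phase1_counts: dict mapping each stratum to its phase-one size.
    """
    sampled = Counter(s for _, s, _ in phase2)
    tp = pos = fp = neg = 0.0
    for is_case, s, risk in phase2:
        # Each sampled subject stands in for this many phase-one subjects.
        weight = phase1_counts[s] / sampled[s]
        flagged = risk >= threshold
        if is_case:
            pos += weight
            tp += weight if flagged else 0.0
        else:
            neg += weight
            fp += weight if flagged else 0.0
    return tp / pos, fp / neg


# Toy phase-one stratum sizes and a tiny phase-two sample (made-up numbers).
phase1_counts = {(True, True): 10, (True, False): 30,
                 (False, True): 20, (False, False): 100}
phase2 = [
    (True, stratum(True, 0.90), 0.9),
    (True, stratum(True, 0.50), 0.3),
    (False, stratum(False, 0.85), 0.8),
    (False, stratum(False, 0.40), 0.1),
    (False, stratum(False, 0.30), 0.2),
]
tpr, fpr = weighted_tpr_fpr(phase2, phase1_counts, threshold=0.5)
```

Oversampled strata receive small weights and undersampled strata large ones, so the weighted rates refer to the phase-one cohort rather than to the biased subsample.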
Affiliation(s)
- Yaqi Cao
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA and Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China
- Sebastien Haneuse
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Yingye Zheng
  - Department of Biostatistics, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Seattle, WA 98109, USA
- Jinbo Chen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
|
2
|
Yu J, Zhou H, Cai J. Accelerated failure time model for data from outcome-dependent sampling. Lifetime Data Analysis 2021; 27:15-37. [PMID: 33044612 PMCID: PMC7856009 DOI: 10.1007/s10985-020-09508-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/26/2019] [Accepted: 09/29/2020] [Indexed: 05/26/2023]
Abstract
Outcome-dependent sampling designs such as the case-control or case-cohort design are widely used in epidemiological studies for their outstanding cost-effectiveness. In this article, we propose and develop a smoothed weighted Gehan estimating equation approach for inference in an accelerated failure time model under a general failure time outcome-dependent sampling scheme. The proposed estimating equation is continuously differentiable and can be solved by the standard numerical methods. In addition to developing asymptotic properties of the proposed estimator, we also propose and investigate a new optimal power-based subsamples allocation criteria in the proposed design by maximizing the power function of a significant test. Simulation results show that the proposed estimator is more efficient than other existing competing estimators and the optimal power-based subsamples allocation will provide an ODS design that yield improved power for the test of exposure effect. We illustrate the proposed method with a data set from the Norwegian Mother and Child Cohort Study to evaluate the relationship between exposure to perfluoroalkyl substances and women's subfecundity.
Affiliation(s)
- Jichang Yu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, Hubei, China
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
|
3
|
Cao Y, Yu J. A class of goodness-of-fit test for the additive hazards model with case-cohort data. Pharm Stat 2020; 20:451-461. [PMID: 33305424 DOI: 10.1002/pst.2087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2020] [Revised: 10/13/2020] [Accepted: 11/16/2020] [Indexed: 11/12/2022]
Abstract
The case-cohort design is commonly used in epidemiological studies due to its cost-effectiveness. The additive hazards model is widely used in survival analysis when the hazards difference is constant. In this article, we propose a class of goodness-of-fit test statistics for the assumption of the additive hazards model with case-cohort data through a class of asymptotically mean-zero multiparameter stochastic processes. We also establish the asymptotic theory of the proposed test statistics and a resampling scheme is adopted to approximate its asymptotic distribution. The performance of the proposed test statistics is evaluated through simulation studies and a real dataset is analyzed to illustrate the proposed method.
Affiliation(s)
- Yongxiu Cao
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Jichang Yu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
|
4
|
Wang T, Wang X, Zhou H, Cai J, George SL. Auxiliary variable-enriched biomarker-stratified design. Stat Med 2018; 37:4610-4635. [PMID: 30221368 DOI: 10.1002/sim.7938] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/08/2018] [Revised: 06/04/2018] [Accepted: 07/15/2018] [Indexed: 12/18/2022]
Abstract
Clinical trials in the era of precision medicine require assessment of biomarkers to identify appropriate subgroups of patients for targeted therapy. In a biomarker-stratified design (BSD), biomarkers are measured on all patients and used as stratification variables. However, such a trial can be both inefficient and costly, especially when the prevalence of the subgroup of primary interest is low and the cost of assessing the biomarkers is high. Efficiency can be improved and costs reduced by using enriched biomarker-stratified designs, in which patients of primary interest, typically the biomarker-positive patients, are oversampled. We consider a special type of enrichment design, an auxiliary variable-enriched design (AEBSD), in which enrichment is based on some inexpensive auxiliary variable that is positively correlated with the true biomarker. The proposed AEBSD reduces the total cost of the trial compared with a standard BSD when the prevalence rate of true biomarker positivity is small and the positive predictive value (PPV) of the auxiliary biomarker is larger than the prevalence rate. In addition, for an AEBSD, we can immediately randomize the patients selected in the screening process without waiting for the result of the true biomarker test, reducing the treatment waiting time. We propose an adaptive Bayesian method to adjust the assumed PPV while the trial is ongoing. Numerical studies and an example illustrate the approach. An R package is available.
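The cost comparison in this abstract can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes a deliberately simplified setting that is not the paper's cost model: the trial needs a fixed number of true biomarker-positive patients, every true-biomarker assay has the same cost, and the auxiliary variable has a small per-patient cost. All function names, parameters, and numbers are hypothetical.

```python
def expected_assays(n_positive, hit_rate):
    """Expected number of true-biomarker assays needed to find n_positive
    biomarker-positive patients when each assayed patient is positive
    with probability hit_rate (mean of a negative binomial count)."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    return n_positive / hit_rate


def bsd_cost(n_positive, prevalence, assay_cost):
    """Standard design: every screened patient gets the true assay,
    so the hit rate equals the prevalence."""
    return expected_assays(n_positive, prevalence) * assay_cost


def aux_enriched_cost(n_positive, ppv, aux_positive_rate,
                      assay_cost, aux_cost):
    """Auxiliary-enriched screening: only auxiliary-positive patients get
    the true assay, so the hit rate equals the PPV; every screened
    patient also incurs the (cheap) auxiliary measurement."""
    assays = expected_assays(n_positive, ppv)
    screened = assays / aux_positive_rate
    return assays * assay_cost + screened * aux_cost


# Made-up inputs: 10% prevalence, PPV of 50%, 15% auxiliary-positive.
standard = bsd_cost(100, prevalence=0.10, assay_cost=1000.0)
enriched = aux_enriched_cost(100, ppv=0.50, aux_positive_rate=0.15,
                             assay_cost=1000.0, aux_cost=50.0)
```

Under these assumptions the number of expensive assays drops whenever the PPV exceeds the prevalence, which matches the condition stated in the abstract; whether the total cost drops also depends on how cheap the auxiliary measurement is.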
Affiliation(s)
- Ting Wang
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Xiaofei Wang
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Stephen L George
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
|
5
|
Pan Y, Cai J, Longnecker MP, Zhou H. Secondary outcome analysis for data from an outcome-dependent sampling design. Stat Med 2018; 37:2321-2337. [PMID: 29682775 DOI: 10.1002/sim.7672] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/07/2016] [Revised: 01/19/2018] [Accepted: 03/08/2018] [Indexed: 11/11/2022]
Abstract
Outcome-dependent sampling (ODS) scheme is a cost-effective way to conduct a study. For a study with continuous primary outcome, an ODS scheme can be implemented where the expensive exposure is only measured on a simple random sample and supplemental samples selected from 2 tails of the primary outcome variable. With the tremendous cost invested in collecting the primary exposure information, investigators often would like to use the available data to study the relationship between a secondary outcome and the obtained exposure variable. This is referred as secondary analysis. Secondary analysis in ODS designs can be tricky, as the ODS sample is not a random sample from the general population. In this article, we use the inverse probability weighted and augmented inverse probability weighted estimating equations to analyze the secondary outcome for data obtained from the ODS design. We do not make any parametric assumptions on the primary and secondary outcome and only specify the form of the regression mean models, thus allow an arbitrary error distribution. Our approach is robust to second- and higher-order moment misspecification. It also leads to more precise estimates of the parameters by effectively using all the available participants. Through simulation studies, we show that the proposed estimator is consistent and asymptotically normal. Data from the Collaborative Perinatal Project are analyzed to illustrate our method.
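The inverse probability weighting idea in this abstract can be sketched in a few lines. The example assumes a simplified Bernoulli version of the ODS design (each subject enters the simple random sample with probability `p_srs`, and tail subjects get an extra chance `p_tail`) and a linear mean model for the secondary outcome, for which the weighted estimating equation reduces to weighted least squares. The function names, cut-offs, and data are hypothetical, and the augmented estimator is not shown.

```python
def inclusion_prob(y, lower, upper, p_srs, p_tail):
    """Selection probability given the primary outcome y.

    Everyone can be drawn into the simple random sample (p_srs); subjects
    in either tail (y < lower or y > upper) who were not drawn get a
    second chance with probability p_tail.
    """
    if y < lower or y > upper:
        return p_srs + (1.0 - p_srs) * p_tail
    return p_srs


def ipw_linear_fit(x, z, probs):
    """Solve sum_i w_i * (1, x_i) * (z_i - a - b * x_i) = 0 with
    w_i = 1 / probs[i], i.e. weighted least squares of z on x.

    Returns (intercept a, slope b).
    """
    w = [1.0 / p for p in probs]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - x_bar) * (zi - z_bar)
              for wi, xi, zi in zip(w, x, z))
    slope = sxz / sxx
    return z_bar - slope * x_bar, slope


# Made-up sampled data: primary outcome y, exposure x, secondary outcome z.
y = [-2.0, 0.1, 0.4, 2.5]
x = [0.0, 1.0, 2.0, 3.0]
z = [1.0, 3.0, 5.0, 7.0]
probs = [inclusion_prob(yi, -1.0, 1.0, p_srs=0.1, p_tail=0.5) for yi in y]
intercept, slope = ipw_linear_fit(x, z, probs)
```

Tail subjects are over-represented in the sample, so they receive smaller weights; the weighting restores the balance of the underlying population without assuming a distribution for the errors.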
Affiliation(s)
- Yinghao Pan
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Matthew P Longnecker
  - Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
6
|
Pan Y, Cai J, Kim S, Zhou H. Regression analysis for secondary response variable in a case-cohort study. Biometrics 2017; 74:1014-1022. [PMID: 29286533 DOI: 10.1111/biom.12838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/01/2017] [Revised: 11/01/2017] [Accepted: 11/01/2017] [Indexed: 12/01/2022]
Abstract
Case-cohort study design has been widely used for its cost-effectiveness. In any real study, there are always other important outcomes of interest beside the failure time that the original case-cohort study is based on. How to utilize the available case-cohort data to study the relationship of a secondary outcome with the primary exposure obtained through the case-cohort study is not well studied. In this article, we propose a non-parametric estimated likelihood approach for analyzing a secondary outcome in a case-cohort study. The estimation is based on maximizing a semiparametric likelihood function that is built jointly on both time-to-failure outcome and the secondary outcome. The proposed estimator is shown to be consistent, efficient, and asymptotically normal. Finite sample performance is evaluated via simulation studies. Data from the Sister Study is analyzed to illustrate our method.
Affiliation(s)
- Yinghao Pan
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Sangmi Kim
  - Medical College of Georgia, GRU Cancer Center, Augusta University, Augusta, Georgia 30912, U.S.A
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
7
|
Wang X, Zhou J, Wang T, George SL. On Enrichment Strategies for Biomarker Stratified Clinical Trials. J Biopharm Stat 2017; 28:292-308. [PMID: 28933670 DOI: 10.1080/10543406.2017.1379532] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 10/18/2022]
Abstract
In the era of precision medicine, drugs are increasingly developed to target subgroups of patients with certain biomarkers. In large all-comer trials using a biomarker stratified design, the cost of treating and following patients for clinical outcomes may be prohibitive. With a fixed number of randomized patients, the efficiency of testing certain treatments parameters, including the treatment effect among biomarker-positive patients and the interaction between treatment and biomarker, can be improved by increasing the proportion of biomarker positives on study, especially when the prevalence rate of biomarker positives is low in the underlying patient population. When the cost of assessing the true biomarker is prohibitive, one can further improve the study efficiency by oversampling biomarker positives with a cheaper auxiliary variable or a surrogate biomarker that correlates with the true biomarker. To improve efficiency and reduce cost, we can adopt an enrichment strategy for both scenarios by concentrating on testing and treating patient subgroups that contain more information about specific treatment parameters of primary interest to the investigators. In the first scenario, an enriched biomarker stratified design enriches the cohort of randomized patients by directly oversampling the relevant patients with the true biomarker, while in the second scenario, an auxiliary-variable-enriched biomarker stratified design enriches the randomized cohort based on an inexpensive auxiliary variable, thereby avoiding testing the true biomarker on all screened patients and reducing treatment waiting time. For both designs, we discuss how to choose the optimal enrichment proportion when testing a single hypothesis or two hypotheses simultaneously. At a requisite power, we compare the two new designs with the BSD design in terms of the number of randomized patients and the cost of trial under scenarios mimicking real biomarker stratified trials. 
The new designs are illustrated with hypothetical examples for designing biomarker-driven cancer trials.
Affiliation(s)
- Xiaofei Wang
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, U.S.A
- Jingzhu Zhou
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, U.S.A
- Ting Wang
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A
- Stephen L George
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, U.S.A
|
8
|
Yu J, Liu Y, Cai J, Sandler DP, Zhou H. Outcome-Dependent Sampling Design and Inference for Cox's Proportional Hazards Model. J Stat Plan Inference 2017; 178:24-36. [PMID: 28090134 DOI: 10.1016/j.jspi.2016.05.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
We propose a cost-effective outcome-dependent sampling design for the failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criteria that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure is shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with an existing real data from the Cancer Incidence and Mortality of Uranium Miners Study.
Affiliation(s)
- Jichang Yu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei 430073, China; School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Dale P Sandler
  - Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
9
|
Ding J, Lu TS, Cai J, Zhou H. Recent progresses in outcome-dependent sampling with failure time data. Lifetime Data Analysis 2017; 23:57-82. [PMID: 26759313 PMCID: PMC4942414 DOI: 10.1007/s10985-015-9355-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/30/2015] [Accepted: 12/22/2015] [Indexed: 06/05/2023]
Abstract
An outcome-dependent sampling (ODS) design is a retrospective sampling scheme where one observes the primary exposure variables with a probability that depends on the observed value of the outcome variable. When the outcome of interest is failure time, the observed data are often censored. By allowing the selection of the supplemental samples to depend on whether the event of interest happens and by oversampling subjects from the most informative regions, an ODS design for time-to-event data can reduce the cost of the study and improve efficiency. We review recent progress and advances in research on ODS designs with failure time data. This includes research on ODS-related designs such as the case-cohort design, generalized case-cohort design, stratified case-cohort design, general failure-time ODS design, length-biased sampling design, and interval sampling design.
Affiliation(s)
- Jieli Ding
  - School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, 430072, China
- Tsui-Shan Lu
  - Department of Mathematics, National Taiwan Normal University, Taipei, 116, Taiwan
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
|
10
|
Zhu Z, Wang X, Saha-Chaudhuri P, Kosinski AS, George SL. Time-dependent classification accuracy curve under marker-dependent sampling. Biom J 2016; 58:974-92. [PMID: 27119599 DOI: 10.1002/bimj.201500171] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/19/2015] [Revised: 01/25/2016] [Accepted: 02/06/2016] [Indexed: 11/10/2022]
Abstract
Evaluating the classification accuracy of a candidate biomarker signaling the onset of disease or disease status is essential for medical decision making. A good biomarker would accurately identify the patients who are likely to progress or die at a particular time in the future or who are in urgent need for active treatments. To assess the performance of a candidate biomarker, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are commonly used. In many cases, the standard simple random sampling (SRS) design used for biomarker validation studies is costly and inefficient. In order to improve the efficiency and reduce the cost of biomarker validation, marker-dependent sampling (MDS) may be used. In a MDS design, the selection of patients to assess true survival time is dependent on the result of a biomarker assay. In this article, we introduce a nonparametric estimator for time-dependent AUC under a MDS design. The consistency and the asymptotic normality of the proposed estimator is established. Simulation shows the unbiasedness of the proposed estimator and a significant efficiency gain of the MDS design over the SRS design.
Affiliation(s)
- Zhaoyin Zhu
  - Division of Biostatistics, New York University School of Medicine, New York, NY 10016, USA
- Xiaofei Wang
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA
- Paramita Saha-Chaudhuri
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1A2, Canada
- Andrzej S Kosinski
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA
- Stephen L George
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA
|
11
|
Yu J, Liu Y, Sandler DP, Zhou H. Statistical inference for the additive hazards model under outcome-dependent sampling. Can J Stat 2015; 43:436-453. [PMID: 26379363 DOI: 10.1002/cjs.11257] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer.
Affiliation(s)
- Jichang Yu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei 430073, China
- Yanyan Liu
  - School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China
- Dale P Sandler
  - Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, U.S.A
- Haibo Zhou
  - Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
12
|
Zhou H, Xu W, Zeng D, Cai J. Semiparametric Inference for Data with a Continuous Outcome from a Two-Phase Probability Dependent Sampling Scheme. J R Stat Soc Series B Stat Methodol 2013; 76:197-215. [PMID: 24737947 DOI: 10.1111/rssb.12029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Multi-phased designs and biased sampling designs are two of the well recognized approaches to enhance study efficiency. In this paper, we propose a new and cost-effective sampling design, the two-phase probability dependent sampling design (PDS), for studies with a continuous outcome. This design will enable investigators to make efficient use of resources by targeting more informative subjects for sampling. We develop a new semiparametric empirical likelihood inference method to take advantage of data obtained through a PDS design. Simulation study results indicate that the proposed sampling scheme, coupled with the proposed estimator, is more efficient and more powerful than the existing outcome dependent sampling design and the simple random sampling design with the same sample size. We illustrate the proposed method with a real data set from an environmental epidemiologic study.
Affiliation(s)
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Wangli Xu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A; Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, 100872, China
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
13
|
Abstract
Treatment-selection markers are biological molecules or patient characteristics associated with one's response to treatment. They can be used to predict treatment effects for individual subjects and subsequently help deliver treatment to those most likely to benefit from it. Statistical tools are needed to evaluate a marker's capacity to help with treatment selection. The commonly adopted criterion for a good treatment-selection marker has been the interaction between marker and treatment. While a strong interaction is important, it is, however, not sufficient for good marker performance. In this article, we develop novel measures for assessing a continuous treatment-selection marker, based on a potential outcomes framework. Under a set of assumptions, we derive the optimal decision rule based on the marker to classify individuals according to treatment benefit, and characterize the marker's performance using the corresponding classification accuracy as well as the overall distribution of the classifier. We develop a constrained maximum-likelihood method for estimation and testing in a randomized trial setting. Simulation studies are conducted to demonstrate the performance of our methods. Finally, we illustrate the methods using an HIV vaccine trial where we explore the value of the level of preexisting immunity to adenovirus serotype 5 for predicting a vaccine-induced increase in the risk of HIV acquisition.
Affiliation(s)
- Ying Huang
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, 98109, USA
|
14
|
Xu W, Zhou H. Mixed effect regression analysis for a cluster-based two-stage outcome-auxiliary-dependent sampling design with a continuous outcome. Biostatistics 2012; 13:650-64. [PMID: 22723503 PMCID: PMC3440236 DOI: 10.1093/biostatistics/kxs013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/26/2011] [Revised: 04/20/2012] [Accepted: 04/23/2012] [Indexed: 11/13/2022] Open
Abstract
Two-stage design is a well-known cost-effective way for conducting biomedical studies when the exposure variable is expensive or difficult to measure. Recent research development further allowed one or both stages of the two-stage design to be outcome dependent on a continuous outcome variable. This outcome-dependent sampling feature enables further efficiency gain in parameter estimation and overall cost reduction of the study (e.g. Wang, X. and Zhou, H., 2010. Design and inference for cancer biomarker study with an outcome and auxiliary-dependent subsampling. Biometrics 66, 502-511; Zhou, H., Song, R., Wu, Y. and Qin, J., 2011. Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67, 194-202). In this paper, we develop a semiparametric mixed effect regression model for data from a two-stage design where the second-stage data are sampled with an outcome-auxiliary-dependent sample (OADS) scheme. Our method allows the cluster- or center-effects of the study subjects to be accounted for. We propose an estimated likelihood function to estimate the regression parameters. Simulation study indicates that greater study efficiency gains can be achieved under the proposed two-stage OADS design with center-effects when compared with other alternative sampling schemes. We illustrate the proposed method by analyzing a dataset from the Collaborative Perinatal Project.
Affiliation(s)
- Wangli Xu
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing 100872, China and Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
15
|
Ding J, Liu Y, Peden DB, Kleeberger SR, Zhou H. Regression analysis for a summed missing data problem under an outcome-dependent sampling scheme. Can J Stat 2012. [DOI: 10.1002/cjs.11131] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
|
16
|
Zhou H, Song R, Wu Y, Qin J. Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 2011; 67:194-202. [PMID: 20560938 DOI: 10.1111/j.1541-0420.2010.01446.x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/27/2022]
Abstract
The two-stage case-control design has been widely used in epidemiology studies for its cost-effectiveness and improvement of the study efficiency (White, 1982, American Journal of Epidemiology 115, 119-128; Breslow and Cain, 1988, Biometrika 75, 11-20). The evolution of modern biomedical studies has called for cost-effective designs with a continuous outcome and exposure variables. In this article, we propose a new two-stage outcome-dependent sampling (ODS) scheme with a continuous outcome variable, where both the first-stage data and the second-stage data are from ODS schemes. We develop a semiparametric empirical likelihood estimation for inference about the regression parameters in the proposed design. Simulation studies were conducted to investigate the small-sample behavior of the proposed estimator. We demonstrate that, for a given statistical power, the proposed design will require a substantially smaller sample size than the alternative designs. The proposed method is illustrated with an environmental health study conducted at National Institutes of Health.
Affiliation(s)
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7420, USA
|
17
|
Zhou H, Wu Y, Liu Y, Cai J. Semiparametric inference for a 2-stage outcome-auxiliary-dependent sampling design with continuous outcome. Biostatistics 2011; 12:521-34. [PMID: 21252082 DOI: 10.1093/biostatistics/kxq080] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022] Open
Abstract
Two-stage design has long been recognized to be a cost-effective way for conducting biomedical studies. In many trials, auxiliary covariate information may also be available, and it is of interest to exploit these auxiliary data to improve the efficiency of inferences. In this paper, we propose a 2-stage design with continuous outcome where the second-stage data is sampled with an "outcome-auxiliary-dependent sampling" (OADS) scheme. We propose an estimator which is the maximizer for an estimated likelihood function. We show that the proposed estimator is consistent and asymptotically normally distributed. The simulation study indicates that greater study efficiency gains can be achieved under the proposed 2-stage OADS design by utilizing the auxiliary covariate information when compared with other alternative sampling schemes. We illustrate the proposed method by analyzing a data set from an environmental epidemiologic study.
Affiliation(s)
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7420, USA
|
18
|
Shao Y. Translational Medicine: Strategies and Statistical Methods edited by COSMATOS, D. and CHOW, S.-C. Biometrics 2010. [DOI: 10.1111/j.1541-0420.2010.01430.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
19
|
Wang X, Wu Y, Zhou H. Outcome- and auxiliary-dependent subsampling and its statistical inference. J Biopharm Stat 2010; 19:1132-50. [PMID: 20183468 DOI: 10.1080/10543400903243025] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/20/2022]
Abstract
The performance of a biomarker predicting clinical outcome is often evaluated in a large prospective study. Due to high costs associated with bioassay, investigators need to select a subset from all available patients for biomarker assessment. We consider an outcome- and auxiliary-dependent subsampling (OADS) scheme, in which the probability of selecting a patient into the subset depends on the patient's clinical outcome and an auxiliary variable. We proposed a semiparametric empirical likelihood method to estimate the association between biomarker and clinical outcome. Asymptotic properties of the estimator are given. Simulation study shows that the proposed method outperforms alternative methods.
Affiliation(s)
- Xiaofei Wang
  - Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, USA
|