1
Wang C, Du M. Martingale-residual-based greedy model averaging for high-dimensional current status data. Stat Med 2024; 43:1726-1742. [PMID: 38381059 DOI: 10.1002/sim.10037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 12/08/2023] [Accepted: 02/01/2024] [Indexed: 02/22/2024]
Abstract
Current status data are a type of failure time data that arise when the failure time of a study subject cannot be determined precisely but is known only to occur before or after a random monitoring time. Variable selection methods for failure time data have been discussed extensively in the literature. However, statistical inference for a model chosen by a variable selection method ignores the uncertainty caused by model selection. To enhance the prediction accuracy for risk quantities such as survival probability, we propose two optimal model averaging methods under semiparametric additive hazards models. Specifically, based on martingale residual processes, a delete-one cross-validation (CV) process is defined, and two new CV functional criteria are derived for choosing model weights. Furthermore, we present a greedy algorithm for the implementation of the techniques, and the asymptotic optimality of the proposed model averaging approaches is established, along with the convergence of the greedy averaging algorithms. A series of simulation experiments demonstrate the effectiveness and superiority of the proposed methods. Finally, a real-data example is provided as an illustration.
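For orientation, the observation scheme described in this abstract can be written out as a short worked formula. The notation below ($T_i$, $C_i$, $\delta_i$, $Z_i$, $\Lambda_0$, $\beta$) is our own and is not taken from the article. It is a sketch that assumes time-independent covariates and a monitoring time that is independent of the failure time given the covariates. Each subject contributes only $(C_i, \delta_i, Z_i)$ with $\delta_i = 1(T_i \le C_i)$. Under an additive hazards model $\lambda(t \mid Z) = \lambda_0(t) + \beta^\top Z$, the survival function and the likelihood take the form

$$
S(t \mid Z) = \exp\{-\Lambda_0(t) - \beta^\top Z\, t\}, \qquad
L(\beta, \Lambda_0) = \prod_{i=1}^{n} \bigl[1 - S(C_i \mid Z_i)\bigr]^{\delta_i}\, S(C_i \mid Z_i)^{1-\delta_i},
$$

where $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. The exact failure time $T_i$ never enters the likelihood; only its position relative to $C_i$ does.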
Affiliation(s)
- Chang Wang
  - School of Mathematics, Jilin University, Changchun, China
- Mingyue Du
  - School of Mathematics, Jilin University, Changchun, China
2
Wu Q, Tong X, Zhao X. Deep partially linear cox model for current status data. Biometrics 2024; 80:ujae024. [PMID: 38563532 DOI: 10.1093/biomtc/ujae024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Revised: 03/05/2024] [Accepted: 03/13/2024] [Indexed: 04/04/2024]
Abstract
Deep learning has continuously attained huge success in diverse fields, while its application to survival data analysis remains limited and deserves further exploration. For the analysis of current status data, a deep partially linear Cox model is proposed to circumvent the curse of dimensionality. Modeling flexibility is attained by using deep neural networks (DNNs) to accommodate nonlinear covariate effects and monotone splines to approximate the baseline cumulative hazard function. We establish the convergence rate of the proposed maximum likelihood estimators. Moreover, we derive that the finite-dimensional estimator for treatment covariate effects is $\sqrt{n}$-consistent, asymptotically normal, and attains semiparametric efficiency. Finally, we demonstrate the performance of our procedures through extensive simulation studies and application to real-world data on news popularity.
Affiliation(s)
- Qiang Wu
  - School of Statistics, Beijing Normal University, Beijing 100875, China
- Xingwei Tong
  - School of Statistics, Beijing Normal University, Beijing 100875, China
- Xingqiu Zhao
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China
3
Guan Z. Maximum approximate likelihood estimation in accelerated failure time model for interval-censored data. Stat Med 2023; 42:4886-4896. [PMID: 37652042 DOI: 10.1002/sim.9893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 06/22/2023] [Accepted: 08/21/2023] [Indexed: 09/02/2023]
Abstract
The approximate Bernstein polynomial model, a mixture of beta distributions, is applied to obtain maximum likelihood estimates of the regression coefficients, the baseline density and the survival functions in an accelerated failure time model based on interval censored data including current status data. The estimators of the regression coefficients and the underlying baseline density function are shown to be consistent with almost parametric rates of convergence under some conditions for uncensored and/or interval censored data. Simulation shows that the proposed method is better than its competitors. The proposed method is illustrated by fitting the Breast Cosmetic and the HIV infection time data using the accelerated failure time model.
Affiliation(s)
- Zhong Guan
  - Department of Mathematical Sciences, Indiana University South Bend, South Bend, Indiana
4
Fang L, Li S, Sun L, Song X. Semiparametric probit regression model with misclassified current status data. Stat Med 2023; 42:4440-4457. [PMID: 37574218 DOI: 10.1002/sim.9869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 06/30/2023] [Accepted: 07/27/2023] [Indexed: 08/15/2023]
Abstract
Current status data arise when each subject under study is examined only once at an observation time, and one only knows the failure status of the event of interest at the observation time rather than the exact failure time. Moreover, the obtained failure status is frequently subject to misclassification due to imperfect tests, yielding misclassified current status data. This article conducts regression analysis of such data with the semiparametric probit model, which serves as an important alternative to existing semiparametric models and has recently received considerable attention in failure time data analysis. We consider the nonparametric maximum likelihood estimation and develop an expectation-maximization (EM) algorithm by incorporating the generalized pool-adjacent-violators (PAV) algorithm to maximize the intractable likelihood function. The resulting estimators of regression parameters are shown to be consistent, asymptotically normal, and semiparametrically efficient. Furthermore, the numerical results in simulation studies indicate that the proposed method performs satisfactorily in finite samples and outperforms the naive method that ignores misclassification. We then apply the proposed method to a real dataset on chlamydia infection.
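As a rough illustration of the pool-adjacent-violators idea mentioned in this abstract, the sketch below computes the classical nonparametric maximum likelihood estimate of the failure time distribution from current status data without covariates and without misclassification. It is a much simpler setting than the article's EM algorithm with a generalized PAV step, and it is not the authors' method. All function names and the simulated setup are hypothetical, and the code assumes distinct monitoring times.

```python
import random


def pava(values, weights=None):
    """Pool-adjacent-violators: weighted least-squares nondecreasing fit."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block stores [block mean, total weight, number of pooled points].
    blocks = []
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def current_status_npmle(monitor_times, status):
    """Estimate F at the sorted monitoring times.

    status[i] is 1 if the event had occurred by monitor_times[i], else 0.
    The estimate is the isotonic regression of the status indicators
    ordered by monitoring time (assumes no tied monitoring times).
    """
    order = sorted(range(len(monitor_times)), key=lambda i: monitor_times[i])
    times = [monitor_times[i] for i in order]
    deltas = [status[i] for i in order]
    return times, pava(deltas)


# Hypothetical simulation: exponential failure times, uniform monitoring times.
rng = random.Random(0)
n = 200
failure = [rng.expovariate(1.0) for _ in range(n)]
monitor = [rng.uniform(0.0, 3.0) for _ in range(n)]
delta = [1 if t <= c else 0 for t, c in zip(failure, monitor)]
times, F_hat = current_status_npmle(monitor, delta)
```

The returned `F_hat` is a nondecreasing step function evaluated at the ordered monitoring times. Handling misclassified statuses, as the article does, would require replacing the raw indicators with model-based quantities inside an iterative scheme, which this sketch does not attempt.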
Affiliation(s)
- Lijun Fang
  - School of Economics and Statistics, Guangzhou University, Guangzhou, China
- Shuwei Li
  - School of Economics and Statistics, Guangzhou University, Guangzhou, China
- Liuquan Sun
  - Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Xinyuan Song
  - Department of Statistics, Chinese University of Hong Kong, Hong Kong, Hong Kong
5
Mao F, Cook RJ. Two-phase designs with current status data. Stat Med 2023; 42:1207-1232. [PMID: 36690474 DOI: 10.1002/sim.9666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Revised: 11/01/2022] [Accepted: 01/05/2023] [Indexed: 01/25/2023]
Abstract
We consider the design and analysis of two-phase studies aiming to assess the relation between a fixed (eg, genetic) marker and an event time under current status observation. We consider a common setting in which a phase I sample is comprised of a large cohort of individuals with outcome (ie, current status) data and a vector of inexpensive covariates. Stored biospecimens for individuals in the phase I sample can be assayed to record the marker of interest for individuals selected in a phase II sub-sample. The design challenge is then to select the phase II sub-sample in order to maximize the precision of the marker effect on the time of interest under a proportional hazards model. This problem has not been examined before for current status data and the role of the assessment time is highlighted. Inference based on likelihood and inverse probability weighted estimating functions is considered, with designs centered on score-based residuals, extreme current status observations, or stratified sampling schemes. Data from a registry of patients with psoriatic arthritis are used in an illustration where we study the risk of diabetes as a comorbidity.
Affiliation(s)
- Fangya Mao
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
- Richard J Cook
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
6
Hou J, Chan SF, Wang X, Cai T. Risk prediction with imperfect survival outcome information from electronic health records. Biometrics 2023; 79:190-202. [PMID: 34747010 PMCID: PMC9741856 DOI: 10.1111/biom.13599] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/04/2020] [Revised: 10/28/2021] [Accepted: 10/29/2021] [Indexed: 12/14/2022]
Abstract
Readily available proxies for the time of disease onset, such as the time of the first diagnostic code, can lead to substantial risk prediction error when analyses are based on poor proxies. Due to the lack of detailed documentation and the labor intensiveness of manual annotation, it is often only feasible to ascertain for a small subset the current status of the disease by a follow-up time rather than the exact time. In this paper, we aim to develop risk prediction models for the onset time efficiently leveraging both a small number of labels on the current status and a large number of unlabeled observations on imperfect proxies. Under a semiparametric transformation model for onset and a highly flexible measurement error model for proxy onset time, we propose the semisupervised risk prediction method by combining information from proxies and limited labels efficiently. Starting from an initial estimator based solely on the labeled subset, we perform a one-step correction with the full data augmenting against a mean zero rank correlation score derived from the proxies. We establish the consistency and asymptotic normality of the proposed semisupervised estimator and provide a resampling procedure for interval estimation. Simulation studies demonstrate that the proposed estimator performs well in a finite sample. We illustrate the proposed estimator by developing a genetic risk prediction model for obesity using data from Mass General Brigham Healthcare Biobank.
Affiliation(s)
- Jue Hou
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Stephanie F. Chan
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Xuan Wang
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Tianxi Cai
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
7
Luo L, Yu J, Zhao H. The sparse estimation of the semiparametric linear transformation model with dependent current status data. J Appl Stat 2022; 51:759-779. [PMID: 38414802 PMCID: PMC10896163 DOI: 10.1080/02664763.2022.2161488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Accepted: 12/18/2022] [Indexed: 12/31/2022]
Abstract
In this paper, we study the sparse estimation under the semiparametric linear transformation models for the current status data, also called type I interval-censored data. For the problem, the failure time of interest may be dependent on the censoring time and the association parameter between them is left unspecified. To address this, we employ the copula model to describe the dependence between them and a two-stage estimation procedure to estimate both the association parameter and the regression parameter. In addition, we propose a penalized maximum likelihood estimation procedure based on the broken adaptive ridge regression, and Bernstein polynomials are used to approximate the nonparametric functions involved. The oracle property of the proposed method is established and the numerical studies suggest that the method works well for practical situations. Finally, the method is applied to an Alzheimer's disease study that motivated this investigation.
Affiliation(s)
- Lin Luo
  - College of Science, Zhongyuan University of Technology, Zhengzhou, People's Republic of China
- Jinzhao Yu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, People's Republic of China
- Hui Zhao
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, People's Republic of China
8
Yefenof J, Goldberg Y, Wiler J, Mandelbaum A, Ritov Y. Self-reporting and screening: Data with right-censored, left-censored, and complete observations. Stat Med 2022; 41:3561-3578. [PMID: 35608143 PMCID: PMC9546051 DOI: 10.1002/sim.9434] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/11/2021] [Revised: 02/25/2022] [Accepted: 04/25/2022] [Indexed: 11/06/2022]
Abstract
We consider survival data that combine three types of observations: uncensored, right-censored, and left-censored. Such data arise from screening for a medical condition, in situations where self-detection arises naturally. Our goal is to estimate the failure-time distribution, based on these three observation types. We propose a novel methodology for distribution estimation using both semiparametric and nonparametric techniques. We then evaluate the performance of these estimators via simulated data. Finally, as a case study, we estimate the patience of patients who arrive at an emergency department and wait for treatment. Three categories of patients are observed: those who leave the system and announce it, and thus their patience time is observed; those who get service and thus their patience time is right-censored by the waiting time; and those who leave the system without announcing it. For this third category, the patients' absence is revealed only when they are called to service, which is after they have already left; formally, their patience time is left-censored. Other applications of our proposed methodology are discussed.
Affiliation(s)
- Jonathan Yefenof
  - Statistics and Data Science, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Present address: Department of Statistics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Yair Goldberg
  - The Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa, Israel
- Jennifer Wiler
  - School of Medicine, University of Colorado, Boulder, Colorado, USA
- Avishai Mandelbaum
  - The Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa, Israel
- Ya'acov Ritov
  - Statistics and Data Science, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
9
Chan S, Wang X, Jazić I, Peskoe S, Zheng Y, Cai T. Developing and evaluating risk prediction models with panel current status data. Biometrics 2021; 77:599-609. [PMID: 32562264 PMCID: PMC8168594 DOI: 10.1111/biom.13317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2018] [Revised: 04/22/2020] [Accepted: 05/27/2020] [Indexed: 12/24/2022]
Abstract
Panel current status data arise frequently in biomedical studies when the occurrence of a particular clinical condition is only examined at several prescheduled visit times. Existing methods for analyzing current status data have largely focused on regression modeling based on commonly used survival models such as the proportional hazards model and the accelerated failure time model. However, these procedures have the limitations of being difficult to implement and performing sub-optimally in relatively small sample sizes. The performance of these procedures is also unclear under model misspecification. In addition, no methods currently exist to evaluate the prediction performance of estimated risk models with panel current status data. In this paper, we propose a simple estimator under a general class of nonparametric transformation (NPT) models by fitting a logistic regression working model and demonstrate that our proposed estimator is consistent for the NPT model parameter up to a scale multiplier. Furthermore, we propose nonparametric estimators for evaluating the prediction performance of the risk score derived from model fitting, which is valid regardless of the adequacy of the fitted model. Extensive simulation results suggest that our proposed estimators perform well in finite samples and the regression parameter estimators outperform existing estimators under various scenarios. We illustrate the proposed procedures using data from the Framingham Offspring Study.
Affiliation(s)
- Stephanie Chan
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Xuan Wang
  - Department of Statistics, School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Ina Jazić
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Sarah Peskoe
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Yingye Zheng
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
- Tianxi Cai
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
10
Lam KF, Lee CY, Wong KY, Bandyopadhyay D. Marginal analysis of current status data with informative cluster size using a class of semiparametric transformation cure models. Stat Med 2021; 40:2400-2412. [PMID: 33586218 DOI: 10.1002/sim.8910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Revised: 01/18/2021] [Accepted: 01/27/2021] [Indexed: 11/12/2022]
Abstract
This research is motivated by a periodontal disease dataset that possesses certain special features. The dataset consists of clustered current status time-to-event observations with large and varying cluster sizes, where the cluster size is associated with the disease outcome. Also, heavy censoring is present in the data even with long follow-up time, suggesting the presence of a cured subpopulation. In this paper, we propose a computationally efficient marginal approach, namely the cluster-weighted generalized estimating equation approach, to analyze the data based on a class of semiparametric transformation cure models. The parametric and nonparametric components of the model are estimated using a Bernstein-polynomial based sieve maximum pseudo-likelihood approach. The asymptotic properties of the proposed estimators are studied. Simulation studies are conducted to evaluate the performance of the proposed estimators in scenarios with different degrees of informative clustering and within-cluster dependence. The proposed method is applied to the motivating periodontal disease data for illustration.
Affiliation(s)
- Kwok Fai Lam
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Pok Fu Lam, Hong Kong
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Chun Yin Lee
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Kin Yau Wong
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
11
Wang MC, Yang Y. Complexity and bias in cross-sectional data with binary disease outcome in observational studies. Stat Med 2020; 40:950-962. [PMID: 33169416 DOI: 10.1002/sim.8812] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/25/2019] [Revised: 10/28/2020] [Accepted: 10/29/2020] [Indexed: 11/09/2022]
Abstract
A cross-sectional population is defined as a population of living individuals at the sampling or observational time. Cross-sectionally sampled data with binary disease outcome are commonly analyzed in observational studies for identifying how covariates correlate with disease occurrence. It is generally understood that cross-sectional binary outcome is not as informative as longitudinally collected time-to-event data, but there is insufficient understanding as to whether bias can possibly exist in cross-sectional data and how the bias is related to the population risk of interest. As the progression of a disease typically involves both time and disease status, we consider how the binary disease outcome from the cross-sectional population is connected to the birth-illness-death process in the target population. We argue that the distribution of cross-sectional binary outcome is different from the risk distribution from the target population and that bias would typically arise when using cross-sectional data to draw inference for population risk. In general, the cross-sectional risk probability is determined jointly by the population risk probability and the ratio of duration of diseased state to the duration of disease-free state. Through explicit formulas we conclude that bias can almost never be avoided from cross-sectional data. We present age-specific risk probability (ARP) and argue that models based on ARP offer a compromised but still biased approach to understand the population risk. An analysis based on Alzheimer's disease data is presented to illustrate the ARP model and possible critiques for the analysis results.
Affiliation(s)
- Mei-Cheng Wang
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
- Yuchen Yang
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
12
Jonker MA, Vart P, Rodriguez Girondo M. Estimating the age at onset distribution of the asymptomatic stage of a genetic disease based on pedigree data. Stat Methods Med Res 2019; 29:2344-2359. [PMID: 31880204 PMCID: PMC7391479 DOI: 10.1177/0962280219893400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Information on the age at onset distribution of the asymptomatic stage of a disease can be of paramount importance in early detection and timely management of that disease. However, accurately estimating this distribution is challenging, because the asymptomatic stage is difficult to recognize for the patient and is often detected as an incidental finding or in case of recommended screening; the age at onset is often interval-censored. In this paper, we propose a method for the estimation of the age at onset distribution of the asymptomatic stage of a genetic disease based on ascertained pedigree data that take into account the way the data are ascertained to overcome selection bias. Simulation studies show that the estimates seem to be asymptotically unbiased. Our work is motivated by the analysis of data on facioscapulohumeral muscular dystrophy, a genetic muscle disorder. In our application, carriers of the genetic causal variant are identified through genetic screening of the relatives of symptomatic carriers and their disease status is determined by a medical examination. The estimates reveal an early age at onset of the asymptomatic stage of facioscapulohumeral muscular dystrophy.
Affiliation(s)
- Marianne A Jonker
  - Department for Health Evidence, Section Biostatistics, Radboud University Medical Center, Nijmegen, the Netherlands
- Priya Vart
  - Department for Health Evidence, Section Biostatistics, Radboud University Medical Center, Nijmegen, the Netherlands
- Mar Rodriguez Girondo
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
13
Wang C, Li Q, Song X, Dong X. Bayesian adaptive lasso for additive hazard regression with current status data. Stat Med 2019; 38:3703-3718. [PMID: 31197854 DOI: 10.1002/sim.8137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/19/2018] [Revised: 11/27/2018] [Accepted: 02/01/2019] [Indexed: 12/18/2022]
Abstract
Variable selection is a crucial issue in model building and it has received considerable attention in the literature of survival analysis. However, available approaches in this direction have mainly focused on time-to-event data with right censoring. Moreover, a majority of existing variable selection procedures for survival models are developed in a frequentist framework. In this article, we consider the additive hazards model in the presence of current status data. We propose a Bayesian adaptive least absolute shrinkage and selection operator procedure to conduct simultaneous variable selection and parameter estimation. Efficient Markov chain Monte Carlo methods are developed to implement posterior sampling and inference. The empirical performance of the proposed method is demonstrated by simulation studies. An application to a study on the risk factors of heart failure disease for type 2 diabetes patients is presented.
Affiliation(s)
- Chunjie Wang
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun, China
- Qun Li
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun, China
- Xinyuan Song
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong
- Xiaogang Dong
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun, China
14
Mao L. Proportional hazards regression of survival-sacrifice data with cause-of-death information in animal carcinogenicity studies. Stat Med 2019; 38:3628-3641. [PMID: 31074119 DOI: 10.1002/sim.8201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/27/2018] [Revised: 04/13/2019] [Accepted: 04/18/2019] [Indexed: 11/08/2022]
Abstract
Rodent survival-sacrifice experiments are routinely conducted to assess the tumor-inducing potential of a certain exposure or drug. Because most tumors under study are impalpable, animals are examined at death for evidence of tumor formation. In some studies, the cause of death is ascertained by a pathologist to account for possible correlation between tumor development and death. Existing methods for survival-sacrifice data with cause-of-death information have been restricted to multi-group testing or one-sample estimation of tumor onset distribution and thus do not provide a natural way to quantify treatment effect or dose-response relationship. In this paper, we propose semiparametric regression methods under the popular proportional hazards model for both tumor onset and tumor-caused death. For inference, we develop a maximum pseudo-likelihood estimation procedure using a modified iterative convex minorant algorithm, which is guaranteed to converge to the unique maximizer of the objective function. Simulation studies under different tumor rates show that the new methods provide valid inference on the covariate-outcome relationship and outperform alternative approaches. A real study investigating the effects of benzidine dihydrochloride on liver tumor in mice is analyzed as an illustration.
Affiliation(s)
- Lu Mao
  - Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin
15
Abstract
Frailty models have been developed to quantify both heterogeneity as well as association in multivariate time-to-event data. In recent years, numerous shared and correlated frailty models have been proposed in the survival literature allowing for different association structures and frailty distributions. A bivariate correlated gamma frailty model with an additive decomposition of the frailty variables into a sum of independent gamma components was introduced before. Although this model has a very convenient closed-form representation for the bivariate survival function, the correlation among event- or subject-specific frailties is bounded above which becomes a severe limitation when the values of the two frailty variances differ substantially. In this article, we review existing correlated gamma frailty models and propose novel ones based on bivariate gamma frailty distributions. Such models are found to be useful for the analysis of bivariate survival time data regardless of the censoring type involved. The frailty methodology was applied to right-censored and left-truncated Danish twins mortality data and serological survey current status data on varicella zoster virus and parvovirus B19 infections in Belgium. From our analyses, it has been shown that fitting more flexible correlated gamma frailty models in terms of the imposed association and correlation structure outperforms existing frailty models including the one with an additive decomposition.
Collapse
Affiliation(s)
- Adelino Martins
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium.,Department of Mathematics and Informatics, Eduardo Mondlane University, Maputo, Mozambique
| | - Marc Aerts
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
| | - Niel Hens
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium.,Centre for Health Economics Research and Modelling Infectious Diseases, Centre for the Evaluation of Vaccination, Vaccine & Infectious Disease Institute (WHO Collaborating Centre), University of Antwerp, Wilrijk, Belgium
| | - Andreas Wienke
- Institute of Medical Epidemiology, Biostatistics and Informatics, Martin Luther University of Halle-Wittenberg, Halle, Germany
| | - Steven Abrams
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
| |
|
16
|
Lu M, Li CS. Penalized estimation for proportional hazards models with current status data. Stat Med 2017; 36:4893-4907. [PMID: 28872695 DOI: 10.1002/sim.7489] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/05/2016] [Revised: 07/12/2017] [Accepted: 08/16/2017] [Indexed: 11/07/2022]
Abstract
We provide a simple and practical, yet flexible, penalized estimation method for a Cox proportional hazards model with current status data. We approximate the baseline cumulative hazard function by monotone B-splines and use a hybrid approach based on the Fisher-scoring algorithm and isotonic regression to compute the penalized estimates. We show that the penalized estimator of the nonparametric component achieves the optimal rate of convergence under some smoothness conditions and that the estimators of the regression parameters are asymptotically normal and efficient. Moreover, a simple variance estimation method is considered for inference on the regression parameters. We perform 2 extensive Monte Carlo studies to evaluate the finite-sample performance of the penalized approach and compare it with 3 competing R packages: C1.coxph, intcox, and ICsurv. A goodness-of-fit test and model diagnostics are also discussed. The methodology is illustrated with 2 real applications.
Affiliation(s)
- Minggen Lu
- School of Community Health Sciences, University of Nevada, Reno, NV, U.S.A
| | - Chin-Shang Li
- Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, CA, U.S.A
| |
|
17
|
Wang C, Sun J, Sun L, Zhou J, Wang D. Nonparametric estimation of current status data with dependent censoring. Lifetime Data Anal 2012; 18:434-445. [PMID: 22735973 PMCID: PMC4538943 DOI: 10.1007/s10985-012-9223-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/08/2011] [Accepted: 06/08/2012] [Indexed: 06/01/2023]
Abstract
This paper discusses nonparametric estimation of a survival function when one observes only current status data (McKeown and Jewell, Lifetime Data Anal 16:215-230, 2010; Sun, The statistical analysis of interval-censored failure time data, 2006; Sun and Sun, Can J Stat 33:85-96, 2005). In this case, each subject is observed only once, and the failure time of interest is known only to be either smaller or larger than the observation or censoring time. If the failure time and the observation time can be assumed to be independent, several methods have been developed for the problem. Here we focus on the situation where the independence assumption does not hold and propose two simple estimation procedures under the copula model framework. The proposed estimates allow one to perform sensitivity analysis or identify the shape of a survival function, among other uses. A simulation study indicates that the two methods work well, and the methods are applied to a motivating example from a tumorigenicity study.
Affiliation(s)
- Chunjie Wang
- Mathematics School and Institute of Jilin University, Changchun, 130012, People's Republic of China
| | | | | | | | | |
|
18
|
Maathuis MH, Hudgens MG. Nonparametric inference for competing risks current status data with continuous, discrete or grouped observation times. Biometrika 2011; 98:325-340. [PMID: 22822257 PMCID: PMC3372275 DOI: 10.1093/biomet/asq083] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022] Open
Abstract
New methods and theory have recently been developed to nonparametrically estimate cumulative incidence functions for competing risks survival data subject to current status censoring. In particular, the limiting distribution of the nonparametric maximum likelihood estimator and a simplified naive estimator have been established under certain smoothness conditions. In this paper, we establish the large-sample behaviour of these estimators in two additional models, namely when the observation time distribution has discrete support and when the observation times are grouped. These asymptotic results are applied to the construction of confidence intervals in the three different models. The methods are illustrated on two datasets regarding the cumulative incidence of different types of menopause from a cross-sectional sample of women in the United States and subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand.
Affiliation(s)
- M H Maathuis
- Seminar für Statistik, ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland
| | | |
|
19
|
Sal Y Rosas VG, Hughes JP. Nonparametric and Semiparametric Analysis of Current Status Data Subject to Outcome Misclassification. Stat Commun Infect Dis 2010; 2010:364. [PMID: 22408713 PMCID: PMC3298195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
In this article, we present nonparametric and semiparametric methods to analyze current status data subject to outcome misclassification. Our methods use nonparametric maximum likelihood estimation (NPMLE) to estimate the distribution function of the failure time when sensitivity and specificity are known and may vary among subgroups. A nonparametric test is proposed for two-sample hypothesis testing. In regression analysis, we apply the Cox proportional hazards model and propose likelihood-ratio-based confidence intervals for the regression coefficients. Our methods are motivated and demonstrated by data collected from an infectious disease study in Seattle, WA.
|
20
|
Zhang S, Zhang Y, Chaloner K, Stapleton JT. A copula model for bivariate hybrid censored survival data with application to the MACS study. Lifetime Data Anal 2010; 16:231-249. [PMID: 19921432 PMCID: PMC3567926 DOI: 10.1007/s10985-009-9139-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/07/2008] [Accepted: 10/31/2009] [Indexed: 05/28/2023]
Abstract
A copula model for bivariate survival data with hybrid censoring is proposed to study the association between survival time of individuals infected with HIV and persistence time of infection with an additional virus. Survival with HIV is right censored and the persistence time of the additional virus is subject to interval censoring case 1. A pseudo-likelihood method is developed to study the association between the two event times under such hybrid censoring. Asymptotic consistency and normality of the pseudo-likelihood estimator are established based on empirical process theory. Simulation studies indicate good performance of the estimator with moderate sample size. The method is applied to a motivating HIV study which investigates the effect of GB virus type C (GBV-C) co-infection on survival time of HIV infected individuals.
Affiliation(s)
- Suhong Zhang
- Division of Biostatistics, Edwards Lifesciences, One Edwards Way, Irvine, CA 92612, USA
| | - Ying Zhang
- Department of Biostatistics, University of Iowa, C22 GH, 200 Hawkins Drive, Iowa City, IA 52242, USA
| | - Kathryn Chaloner
- Department of Biostatistics, University of Iowa, C22 GH, 200 Hawkins Drive, Iowa City, IA 52242, USA
| | - Jack T. Stapleton
- Department of Internal Medicine, University of Iowa and Iowa City VA Medical Center, SW54-15 GH, 200 Hawkins Drive, Iowa City, IA 52242, USA
| |
|
21
|
Abstract
We describe a simple method for nonparametric estimation of a distribution function based on current status data where observations of current status information are subject to misclassification. Nonparametric maximum likelihood techniques lead to use of a straightforward set of adjustments to the familiar pool-adjacent-violators estimator used when misclassification is assumed absent. The methods consider alternative misclassification models and are extended to regression models for the underlying survival time. The ideas are motivated by and applied to an example on human papilloma virus (HPV) infection status of a sample of women examined in San Francisco.
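As an illustration of the kind of adjustment this abstract describes, the following is a minimal sketch and not the authors' implementation. It computes the pool-adjacent-violators (PAVA) estimate of the probability of a positive test at each monitoring time, then maps it back to the failure-time distribution under an assumed constant-misclassification model, P(positive at c) = sensitivity * F(c) + (1 - specificity) * (1 - F(c)). The function names `pava` and `current_status_npmle`, the parameter names, and the choice of this particular misclassification model are assumptions made for the example.

```python
from itertools import groupby


def pava(y, w):
    """Weighted pool-adjacent-violators: nondecreasing least-squares fit to y."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fit = []
    for m, _, n in blocks:
        fit.extend([m] * n)
    return fit


def current_status_npmle(times, deltas, sensitivity=1.0, specificity=1.0):
    """Estimate F at the distinct monitoring times from current status data.

    deltas[i] is 1 if subject i tested positive at times[i], else 0.
    The sensitivity/specificity correction is an assumed constant
    misclassification model; with both equal to 1 this reduces to plain PAVA.
    """
    scale = sensitivity + specificity - 1.0
    if scale <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    pairs = sorted(zip(times, deltas), key=lambda p: p[0])
    ts, means, counts = [], [], []
    # Pool tied monitoring times before running PAVA.
    for t, grp in groupby(pairs, key=lambda p: p[0]):
        ds = [float(d) for _, d in grp]
        ts.append(t)
        means.append(sum(ds) / len(ds))
        counts.append(float(len(ds)))
    g = pava(means, counts)  # monotone estimate of P(positive test at t)
    lo = 1.0 - specificity
    # Invert the assumed misclassification model and clip to [0, 1].
    f = [min(1.0, max(0.0, (gi - lo) / scale)) for gi in g]
    return ts, f
```

With perfect sensitivity and specificity the correction is the identity, so the function returns the familiar isotonic estimate; with imperfect tests, fitted values outside the attainable range are clipped to 0 or 1.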
Affiliation(s)
- Karen McKeown
- Division of Biostatistics, School of Public Health, University of California, 101 Haviland Hall MC 7358, Berkeley, CA, 94720, USA.
| | | |
|
22
|
Abstract
We study nonparametric estimation for current status data with competing risks. Our main interest is in the nonparametric maximum likelihood estimator (MLE), and for comparison we also consider a simpler 'naive estimator'. Groeneboom, Maathuis and Wellner [8] proved that both types of estimators converge globally and locally at rate n^{1/3}. We use these results to derive the local limiting distributions of the estimators. The limiting distribution of the naive estimator is given by the slopes of the convex minorants of correlated Brownian motion processes with parabolic drifts. The limiting distribution of the MLE involves a new self-induced limiting process. Finally, we present a simulation study showing that the MLE is superior to the naive estimator in terms of mean squared error, both for small sample sizes and asymptotically.
Affiliation(s)
- Piet Groeneboom
- Department of Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
| | - Marloes H. Maathuis
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195, USA
| | - Jon A. Wellner
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195, USA
| |
|