1. Lotspeich SC, Ashner MC, Vazquez JE, Richardson BD, Grosser KF, Bodek BE, Garcia TP. Making Sense of Censored Covariates: Statistical Methods for Studies of Huntington's Disease. Annual Review of Statistics and Its Application 2024; 11:255-277. [PMID: 38962579 PMCID: PMC11220439 DOI: 10.1146/annurev-statistics-040522-095944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2024]
Abstract
The landscape of survival analysis is constantly being revolutionized to answer biomedical challenges, most recently the statistical challenge of censored covariates rather than outcomes. There are many promising strategies to tackle censored covariates, including weighting, imputation, maximum likelihood, and Bayesian methods. Still, this is a relatively fresh area of research, different from the areas of censored outcomes (i.e., survival analysis) or missing covariates. In this review, we discuss the unique statistical challenges encountered when handling censored covariates and provide an in-depth review of existing methods designed to address those challenges. We emphasize each method's relative strengths and weaknesses, providing recommendations to help investigators pinpoint the best approach to handling censored covariates in their data.
Affiliation(s)
- Sarah C Lotspeich
  - Department of Statistical Sciences, Wake Forest University, Winston-Salem, North Carolina, USA
- Marissa C Ashner
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Jesus E Vazquez
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Brian D Richardson
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kyle F Grosser
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Benjamin E Bodek
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Tanya P Garcia
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

2. Ye P, Bai S, Tang W, Feng H, Qiao X, Tu S, He H. Joint modeling approaches for censored predictors due to detection limits with applications to metabolites data. Stat Med 2024; 43:674-688. [PMID: 38043523 DOI: 10.1002/sim.9978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/05/2023] [Accepted: 11/21/2023] [Indexed: 12/05/2023]
Abstract
Measures of substance concentration in urine, serum, or other biological matrices often have an assay limit of detection. When concentration levels fall below the limit, exact measures cannot be obtained and are thus left-censored. The problem becomes more challenging when the censored data come from heterogeneous populations consisting of exposed and non-exposed subjects. If the censored data come from non-exposed subjects, their measures are always zero and hence censored, forming a latent class governed by a distinct censoring mechanism compared with the exposed subjects. The exposed group's censored measurements are always greater than zero but less than the detection limit. Exposed and non-exposed subjects very often have different disease traits or different relationships with outcomes of interest, so we need to disentangle the two populations for valid inference. In this article, we aim to fill the methodological gaps in the literature by developing a novel joint modeling approach that not only addresses the censoring issue in predictors but also untangles the different relationships of exposed and non-exposed subjects with the outcome. Simulation studies are performed to assess the numerical performance of our proposed approach when the sample size is small to moderate. The joint modeling approach is also applied to examine associations between plasma metabolites and blood pressure in the Bogalusa Heart Study and to identify new metabolites that are highly associated with blood pressure.
Affiliation(s)
- Peng Ye
  - School of Statistics, University of International Business and Economics, Beijing, China
- Shuo Bai
  - Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA
- Wan Tang
  - Department of Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA
- Han Feng
  - Tulane Research and Innovation for Arrhythmia Discovery - TRIAD Center, School of Medicine, Tulane University, New Orleans, Louisiana, USA
- Xinhua Qiao
  - School of Statistics, University of International Business and Economics, Beijing, China
- Shengjia Tu
  - Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, La Jolla, California, USA
- Hua He
  - Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA
  - Department of Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA

3. Yu A, Zhong Y, Feng X, Wei Y. Quantile regression for nonignorable missing data with its application of analyzing electronic medical records. Biometrics 2023; 79:2036-2049. [PMID: 35861675 DOI: 10.1111/biom.13723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 07/15/2022] [Indexed: 11/27/2022]
Abstract
Over the past decade, there has been growing enthusiasm for using electronic medical records (EMRs) for biomedical research. Quantile regression estimates distributional associations, providing unique insights into the intricacies and heterogeneity of EMR data. However, the widespread nonignorable missing observations in EMRs often obscure the true associations and limit the potential of these data for robust biomedical discoveries. We propose a novel method to estimate the covariate effects in the presence of nonignorable missing responses under quantile regression. The method imposes no parametric specification on the response distribution; instead, it uses the implicit distributions induced by the corresponding quantile regression models. We show that the proposed estimator is consistent and asymptotically normal. We also provide an efficient algorithm to obtain the proposed estimate and a randomly weighted bootstrap approach for statistical inference. Numerical studies, including an empirical analysis of real-world EMR data, are used to assess the proposed method's finite-sample performance relative to methods in the existing literature.
Affiliation(s)
- Aiai Yu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Yujie Zhong
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Xingdong Feng
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Ying Wei
  - Department of Biostatistics, Columbia University, New York, New York, USA

4. Jiang H, Huang L, Xia Y. Nonparametric regression with right-censored covariate via conditional density function. Stat Med 2022; 41:2025-2051. [DOI: 10.1002/sim.9343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2021] [Revised: 12/19/2021] [Accepted: 01/17/2022] [Indexed: 11/11/2022]
Affiliation(s)
- Hui Jiang
  - School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan, China
- Lei Huang
  - School of Mathematics, Southwest Jiaotong University, Chengdu, China
- Yingcun Xia
  - Department of Statistics and Data Science, National University of Singapore, Singapore
  - School of Mathematics, University of Electronic Science and Technology of China, Chengdu, China

5. Kaciroti NA, Little RJA. Bayesian sensitivity analyses for longitudinal data with dropouts that are potentially missing not at random: A high dimensional pattern-mixture model. Stat Med 2021; 40:4609-4628. [PMID: 34405912 DOI: 10.1002/sim.9083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/18/2020] [Revised: 04/05/2021] [Accepted: 05/10/2021] [Indexed: 11/05/2022]
Abstract
Randomized clinical trials with outcome measured longitudinally are frequently analyzed using either random effect models or generalized estimating equations. Both approaches assume that the dropout mechanism is missing at random (MAR) or missing completely at random (MCAR). We propose a Bayesian pattern-mixture model to incorporate missingness mechanisms that might be missing not at random (MNAR), where the distribution of the outcome measure at the follow-up time $t_k$, conditional on the prior history, differs across the patterns of missing data. We then perform sensitivity analysis on estimates of the parameters of interest. The sensitivity parameters relate the distribution of the outcome of interest between subjects from a missing-data pattern at time $t_k$ with that of the observed subjects at time $t_k$. The large number of the sensitivity parameters is reduced by treating them as random with a prior distribution having some pre-specified mean and variance, which are varied to explore the sensitivity of inferences. The missing at random (MAR) mechanism is a special case of the proposed model, allowing a sensitivity analysis of deviations from MAR. The proposed approach is applied to data from the Trial of Preventing Hypertension.
Affiliation(s)
- Niko A Kaciroti
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Pediatrics, Medical School, University of Michigan, Ann Arbor, Michigan, USA
- Roderick J A Little
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA

6. Estimating Outcome-Exposure Associations when Exposure Biomarker Detection Limits vary Across Batches. Epidemiology 2020; 30:746-755. [PMID: 31299670 DOI: 10.1097/ede.0000000000001052] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 01/03/2023]
Abstract
Limit of detection (LOD) issues are ubiquitous in exposure assessment. Although there is an extensive literature on modeling exposure data under such imperfect measurement processes, including likelihood-based methods and multiple imputation, the standard practice continues to be naïve single imputation by a constant (e.g., (Equation is included in full-text article.)). In this article, we consider the situation where, due to the practical logistics of data accrual, sampling, and resource constraints, exposure data are analyzed in multiple batches where the LOD and the proportion of censored observations differ across batches. Compounding this problem is the potential for nonrandom assignment of samples to each batch, often driven by enrollment patterns and biosample storage. This issue is particularly important for binary outcome data where batches may have different levels of outcome enrichment. We first consider variants of existing methods to address varying LODs across multiple batches. We then propose a likelihood-based multiple imputation strategy to impute observations that are below the LOD while simultaneously accounting for differential batch assignment. Our simulation study shows that our proposed method has superior estimation properties (i.e., bias, coverage, statistical efficiency) compared to standard alternatives, provided that distributional assumptions are satisfied. Additionally, in most batch assignment configurations, complete-case analysis can be made unbiased by including batch indicator terms in the analysis model, although this strategy is less efficient relative to the proposed method. We illustrate our method by analyzing data from a cohort study in Puerto Rico that is investigating the relation between endocrine disruptor exposures and preterm birth.

7. Yu T, Xiang L, Wang HJ. Quantile regression for survival data with covariates subject to detection limits. Biometrics 2020; 77:610-621. [PMID: 32453884 DOI: 10.1111/biom.13309] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2019] [Revised: 04/16/2020] [Accepted: 05/08/2020] [Indexed: 11/28/2022]
Abstract
With advances in biomedical research, biomarkers are becoming increasingly important prognostic factors for predicting overall survival, while the measurement of biomarkers is often censored due to instruments' lower limits of detection. This leads to two types of censoring: random censoring in overall survival outcomes and fixed censoring in biomarker covariates, posing new challenges in statistical modeling and inference. Existing methods for analyzing such data focus primarily on linear regression ignoring censored responses or semiparametric accelerated failure time models with covariates under detection limits (DL). In this paper, we propose a quantile regression for survival data with covariates subject to DL. Compared to existing methods, the proposed approach provides a more versatile tool for modeling the distribution of survival outcomes by allowing covariate effects to vary across conditional quantiles of the survival time and requiring no parametric distribution assumptions for outcome data. To estimate the quantile process of regression coefficients, we develop a novel multiple imputation approach based on another quantile regression for covariates under DL, avoiding stringent parametric restrictions on censored covariates as often assumed in the literature. Under regularity conditions, we show that the estimation procedure yields uniformly consistent and asymptotically normal estimators. Simulation results demonstrate the satisfactory finite-sample performance of the method. We also apply our method to the motivating data from a study of genetic and inflammatory markers of sepsis.
Affiliation(s)
- Tonghui Yu
  - School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
- Liming Xiang
  - School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
- Huixia Judy Wang
  - Department of Statistics, George Washington University, Washington, District of Columbia

8. Lee M, Rahbar MH, Gensler LS, Brown M, Weisman M, Reveille JD. A latent class based imputation method under Bayesian quantile regression framework using asymmetric Laplace distribution for longitudinal medication usage data with intermittent missing values. J Biopharm Stat 2019; 30:160-177. [PMID: 31730441 DOI: 10.1080/10543406.2019.1684306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/22/2023]
Abstract
Evaluating the association between diseases and the longitudinal pattern of pharmacological therapy has become increasingly important. However, in many longitudinal studies, self-reported medication usage data collected at patients' follow-up visits could be missing for various reasons. These pieces of missing or inaccurate/untenable information complicate determining the trajectory of medication use and its complete effects for patients. Although longitudinal models can deal with specific types of missing data, inappropriate handling of this issue can lead to biased estimation of regression parameters, especially when missing data mechanisms are complex and depend upon multiple sources of variation. We propose a latent class-based multiple imputation (MI) approach using a Bayesian quantile regression (BQR) that incorporates clusters of unobserved heterogeneity for medication usage data with intermittent missing values. Findings from our simulation study indicate that the proposed method performs better than traditional MI methods under certain scenarios of data distribution. We also demonstrate applications of the proposed method to data from the Prospective Study of Outcomes in Ankylosing Spondylitis (AS) cohort when assessing an association between longitudinal nonsteroidal anti-inflammatory drug (NSAID) usage and radiographic damage in AS, while the longitudinal NSAID index data are intermittently missing.
Affiliation(s)
- Minjae Lee
  - Division of Clinical and Translational Sciences, Department of Internal Medicine, University of Texas McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Mohammad H Rahbar
  - Division of Clinical and Translational Sciences, Department of Internal Medicine, University of Texas McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Department of Human Genetics & Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Lianne S Gensler
  - Department of Medicine/Rheumatology, University of California, San Francisco, California, USA
- Matthew Brown
  - Translational Genomics Group, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Australia
- Michael Weisman
  - Division of Rheumatology, School of Medicine, Cedars-Sinai Medical Center in Los Angeles, Los Angeles, California, USA
- John D Reveille
  - Division of Rheumatology and Clinical Immunogenetics, Department of Internal Medicine, University of Texas McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA

9. Chiou SH, Betensky RA, Balasubramanian R. The missing indicator approach for censored covariates subject to limit of detection in logistic regression models. Ann Epidemiol 2019; 38:57-64. [PMID: 31604610 PMCID: PMC6812630 DOI: 10.1016/j.annepidem.2019.07.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/20/2018] [Revised: 07/12/2019] [Accepted: 07/24/2019] [Indexed: 12/14/2022]
Abstract
PURPOSE: In several biomedical studies, one or more exposures of interest may be subject to nonrandom missingness because of the failure of the measurement assay at levels below its limit of detection. This issue is commonly encountered in studies of the metabolome using tandem mass spectrometry-based technologies. Owing to the large number of metabolites measured in these studies, preserving statistical power is of utmost interest. In this article, we evaluate the small-sample properties of the missing indicator approach in logistic and conditional logistic regression models.
METHODS: For nested case-control or matched case-control study designs, we evaluate the bias, power, and type I error associated with the missing indicator method using simulation. We compare the missing indicator approach to complete case analysis and several imputation approaches.
RESULTS: We show that under a variety of settings, the missing indicator approach outperforms complete case analysis and other imputation approaches with regard to bias, mean squared error, and power.
CONCLUSIONS: For nested case-control and matched study designs of modest sample sizes, the missing indicator model minimizes loss of information and thus provides an attractive alternative to the oft-used complete case analysis and other imputation approaches.
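The abstract above names the missing indicator approach but does not spell out its parametrization. As a rough illustration only, the sketch below shows one common way such a design is built: the covariate column is filled with a constant where the reading fell below the limit of detection, and a second 0/1 column flags those rows. The function name `missing_indicator_columns`, the default fill value of 0, and the sample readings are assumptions for illustration, not details taken from the article.

```python
from typing import List, Optional, Tuple


def missing_indicator_columns(
    values: List[Optional[float]], fill: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Split a covariate with below-LOD readings into two design columns.

    `values` holds the measured concentration, or None when the assay
    reported a reading below its limit of detection (LOD).
    Returns (x_filled, below_lod): x_filled uses `fill` where the reading
    is missing, and below_lod is a 0/1 indicator marking those rows.
    """
    x_filled = [fill if v is None else float(v) for v in values]
    below_lod = [1 if v is None else 0 for v in values]
    return x_filled, below_lod


# Hypothetical metabolite readings; None marks a value below the LOD.
readings = [2.3, None, 0.9, None, 4.1]
x_filled, below_lod = missing_indicator_columns(readings)

# Both columns would then enter the regression together, for example
# logit P(Y = 1) = b0 + b1 * x_filled + b2 * below_lod  (assumed form).
print(x_filled)
print(below_lod)
```

Because the indicator absorbs the average effect of being below the LOD, every subject stays in the analysis, which is the power-preserving property the abstract emphasizes.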
Affiliation(s)
- Sy Han Chiou
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA
- Rebecca A Betensky
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA
- Raji Balasubramanian
  - Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, MA

10. Wang J, Ning J, Shete S. Mediation analysis in a case-control study when the mediator is a censored variable. Stat Med 2019; 38:1213-1229. [PMID: 30421436 DOI: 10.1002/sim.8028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/08/2017] [Revised: 09/11/2018] [Accepted: 10/15/2018] [Indexed: 11/10/2022]
Abstract
Mediation analysis is an approach for assessing the direct and indirect effects of an initial variable on an outcome through a mediator. In practice, mediation models can involve a censored mediator (eg, a woman's age at menopause). The current research for mediation analysis with a censored mediator focuses on scenarios where outcomes are continuous. However, the outcomes can be binary (eg, type 2 diabetes). Another challenge when analyzing such a mediation model is to use data from a case-control study, which results in biased estimations for the initial variable-mediator association if a standard approach is directly applied. In this study, we propose an approach (denoted as MAC-CC) to analyze the mediation model with a censored mediator given data from a case-control study, based on the semiparametric accelerated failure time model along with a pseudo-likelihood function. We adapted the measures for assessing the indirect and direct effects using counterfactual definitions. We conducted simulation studies to investigate the performance of MAC-CC and compared it to those of the naïve approach and the complete-case approach. MAC-CC accurately estimates the coefficients of different paths, the indirect effects, and the proportions of the total effects mediated. We applied the proposed and existing approaches to the mediation study of genetic variants, a woman's age at menopause, and type 2 diabetes based on a case-control study of type 2 diabetes. Our results indicate that there is no mediating effect from the age at menopause on the association between the genetic variants and type 2 diabetes.
Affiliation(s)
- Jian Wang
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Jing Ning
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Sanjay Shete
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas
  - Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas

11. Atem FD, Matsouaka RA, Zimmern VE. Cox regression model with randomly censored covariates. Biom J 2019; 61:1020-1032. [PMID: 30908720 DOI: 10.1002/bimj.201800275] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/03/2018] [Revised: 02/07/2019] [Accepted: 02/07/2019] [Indexed: 11/11/2022]
Abstract
This paper deals with a Cox proportional hazards regression model, where some covariates of interest are randomly right-censored. While methods for censored outcomes have become ubiquitous in the literature, methods for censored covariates have thus far received little attention and, for the most part, dealt with the issue of limit-of-detection. For randomly censored covariates, an often-used method is the inefficient complete-case analysis (CCA), which consists of deleting censored observations in the data analysis. When censoring is not completely independent, the CCA leads to biased and spurious results. Methods for missing covariate data, including type I and type II covariate censoring as well as limit-of-detection, do not readily apply due to the fundamentally different nature of randomly censored covariates. We develop a novel method for censored covariates using a conditional mean imputation based on either Kaplan-Meier estimates or a Cox proportional hazards model to estimate the effects of these covariates on a time-to-event outcome. We evaluate the performance of the proposed method through simulation studies and show that it provides good bias reduction and statistical efficiency. Finally, we illustrate the method using data from the Framingham Heart Study to assess the relationship between offspring and parental age of onset of cardiovascular events.
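To make the idea of Kaplan-Meier-based conditional mean imputation concrete, here is a minimal sketch under simplifying assumptions: no adjustment for other covariates, and the integral of the survival curve is truncated at the largest observed event time. It uses the standard identity E[X | X > c] = c + (integral of S from c onward) / S(c). The function names and toy data are hypothetical, and the sketch does not reproduce the paper's estimator or its Cox-model variant.

```python
from typing import List, Tuple

Curve = List[Tuple[float, float]]


def kaplan_meier(values: List[float], observed: List[bool]) -> Curve:
    """Return [(t, S(t))] at each distinct uncensored value, in increasing order."""
    event_times = sorted({v for v, d in zip(values, observed) if d})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for v in values if v >= t)
        events = sum(1 for v, d in zip(values, observed) if d and v == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve: Curve, t: float) -> float:
    """Right-continuous step function: value at the last event time <= t."""
    s = 1.0
    for u, su in curve:
        if u > t:
            break
        s = su
    return s


def conditional_mean(curve: Curve, c: float) -> float:
    """Estimate E[X | X > c] as c + area under S beyond c, divided by S(c)."""
    s_c = survival_at(curve, c)
    if s_c <= 0.0:
        return c  # no estimated mass beyond c; fall back to the censoring value
    area, left, s_left = 0.0, c, s_c
    for u, su in curve:
        if u > c:
            area += s_left * (u - left)  # rectangle under the current step
            left, s_left = u, su
    return c + area / s_c


# Toy covariate with every value observed, then one subject censored at 2.5.
curve = kaplan_meier([1.0, 2.0, 3.0, 4.0], [True, True, True, True])
imputed = conditional_mean(curve, 2.5)
print(imputed)
```

In this toy case the imputed value is the average of the two values above 2.5, which matches the intuition that a censored covariate is replaced by its expected value given that it exceeds the censoring point.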
Affiliation(s)
- Folefac D Atem
  - Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, TX, USA
- Roland A Matsouaka
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
  - Program for Comparative Effectiveness Methodology, Duke Clinical Research Institute, Duke University, Durham, NC, USA
- Vincent E Zimmern
  - Department of Pediatrics, University of Texas Southwestern Medical School, Dallas, TX, USA
  - Department of Pediatrics, Children Hospital Dallas, Dallas, TX, USA

12. Tang ML, Tang N, Zhao P, Zhu H. Efficient Robust Estimation for Linear Models with Missing Response at Random. Scand Stat Theory Appl 2018; 45:366-381. [PMID: 30078929 DOI: 10.1111/sjos.12296] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Coefficient estimation in linear regression models with missing data is routinely done in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approaches is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weight the check functions that define a regression parameter. Both estimation procedures are resistant to heavy-tailed errors or outliers in the response and can achieve good robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an $\mathrm{IC}_Q$-type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.
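For readers unfamiliar with the "check function" mentioned above, the following sketch shows the standard quantile loss rho_tau(u) = u * (tau - 1{u < 0}) and one plausible way propensity scores could weight it. The inverse-probability form of the weighting, the function names, and the toy numbers are assumptions made here for illustration; the abstract does not give the exact estimating equations.

```python
from typing import List, Optional


def check_loss(u: float, tau: float) -> float:
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_check_objective(
    y: List[Optional[float]],
    fitted: List[float],
    observed: List[bool],
    propensity: List[float],
    tau: float,
) -> float:
    """Sum of check losses over rows with an observed response,
    each divided by its estimated probability of being observed
    (inverse-probability weighting, assumed here)."""
    total = 0.0
    for yi, fi, di, pi in zip(y, fitted, observed, propensity):
        if di:  # rows with a missing response contribute nothing
            total += check_loss(yi - fi, tau) / pi
    return total


# Toy data: the second response is missing (None) and is skipped.
objective = weighted_check_objective(
    y=[3.0, None, 1.0],
    fitted=[1.0, 2.0, 3.0],
    observed=[True, False, True],
    propensity=[0.5, 0.5, 0.8],
    tau=0.25,
)
print(objective)
```

Minimizing such an objective over the regression coefficients (which determine `fitted`) would give a weighted quantile regression fit at level `tau`; a composite version would sum the objective over several quantile levels.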
Affiliation(s)
- Man-Lai Tang
  - Department of Mathematics and Statistics, Hang Seng Management College, Hong Kong
- Niansheng Tang
  - Department of Statistics, Yunnan University, P. R. of China
- Puying Zhao
  - Department of Statistics, Yunnan University, P. R. of China
- Hongtu Zhu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, USA

13. Qian J, Chiou SH, Maye JE, Atem F, Johnson KA, Betensky RA. Threshold regression to accommodate a censored covariate. Biometrics 2018; 74:1261-1270. [PMID: 29933515 DOI: 10.1111/biom.12922] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/01/2017] [Revised: 05/01/2018] [Accepted: 05/01/2018] [Indexed: 12/01/2022]
Abstract
In several common study designs, regression modeling is complicated by the presence of censored covariates. Examples of such covariates include maternal age of onset of dementia that may be right censored in an Alzheimer's amyloid imaging study of healthy subjects, metabolite measurements that are subject to limit of detection censoring in a case-control study of cardiovascular disease, and progressive biomarkers whose baseline values are of interest, but are measured post-baseline in longitudinal neuropsychological studies of Alzheimer's disease. We propose threshold regression approaches for linear regression models with a covariate that is subject to random censoring. Threshold regression methods allow for immediate testing of the significance of the effect of a censored covariate. In addition, they provide for unbiased estimation of the regression coefficient of the censored covariate. We derive the asymptotic properties of the resulting estimators under mild regularity conditions. Simulations demonstrate that the proposed estimators have good finite-sample performance, and often offer improved efficiency over existing methods. We also derive a principled method for selection of the threshold. We illustrate the approach in application to an Alzheimer's disease study that investigated brain amyloid levels in older individuals, as measured through positron emission tomography scans, as a function of maternal age of dementia onset, with adjustment for other covariates. We have developed an R package, censCov, for implementation of our method, available at CRAN.
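The abstract does not describe how a threshold is applied to a right-censored covariate, so the sketch below is only one plausible reading: dichotomize the covariate at a threshold and note that the binary status is known whenever the recorded value exceeds the threshold (censored or not) or the covariate was observed at or below it. The function name `threshold_status`, the threshold of 75, and the sample pairs are hypothetical, and the sketch is not a description of the censCov package's API.

```python
from typing import List, Optional, Tuple


def threshold_status(w: float, delta: bool, threshold: float) -> Optional[int]:
    """Classify a right-censored covariate relative to a threshold.

    w is the recorded value min(X, C); delta is True when X itself was
    observed. Returns 1 if X > threshold is known, 0 if X <= threshold
    is known, and None if the status cannot be determined.
    """
    if w > threshold:
        return 1  # X >= w > threshold, whether or not X was censored
    if delta:
        return 0  # X observed and at or below the threshold
    return None  # censored at or below the threshold: status unknown


# Hypothetical (w, delta) pairs, e.g. age of onset with right censoring.
sample: List[Tuple[float, bool]] = [
    (70.0, True),
    (82.0, False),
    (60.0, False),
    (91.0, True),
]
labels = [threshold_status(w, d, threshold=75.0) for w, d in sample]
print(labels)
```

Rows labeled None are the ones a deletion-style analysis would have to drop or handle separately; raising the threshold changes how many such rows there are, which is one reason a principled choice of threshold matters.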
Affiliation(s)
- Jing Qian
  - Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, Massachusetts, U.S.A
- Sy Han Chiou
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, U.S.A
- Jacqueline E Maye
  - Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, U.S.A
  - Department of Clinical and Health Psychology, University of Florida, Gainesville, Florida, U.S.A
- Folefac Atem
  - Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, U.S.A
- Keith A Johnson
  - Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, U.S.A
- Rebecca A Betensky
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, U.S.A

14. Ding Y, Kong S, Kang S, Chen W. A semiparametric imputation approach for regression with censored covariate with application to an AMD progression study. Stat Med 2018; 37:3293-3308. [PMID: 29845616 DOI: 10.1002/sim.7816] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/19/2017] [Revised: 04/11/2018] [Accepted: 04/23/2018] [Indexed: 11/09/2022]
Abstract
This research is motivated by studying the progression of age-related macular degeneration, where both a covariate and the response variable are subject to censoring. We develop a general framework to handle regression with a censored covariate where the response can be of different types and the censoring can be random or subject to (constant) detection limits. Multiple imputation is a popular technique to handle missing data that requires compatibility between the imputation model and the substantive model to obtain valid estimates. For a censored covariate, we propose a novel multiple imputation-based approach, namely, the semiparametric two-step importance sampling imputation (STISI) method, to impute the censored covariate. Specifically, STISI imputes the missing covariate from a semiparametric accelerated failure time model conditional on fully observed covariates (Step 1) with the acceptance probability derived from the substantive model (Step 2). The two-step procedure automatically ensures compatibility and takes full advantage of the relaxed semiparametric assumption in the imputation. Extensive simulations demonstrate that the STISI method yields valid estimates in all scenarios and outperforms some existing methods that are commonly used in practice. We apply STISI to data from the Age-related Eye Disease Study to investigate the association between the progression time of the less severe eye and that of the more severe eye. We also illustrate the method by analyzing the urine arsenic data for patients from the National Health and Nutrition Examination Survey (2003-2004), where the response is binary and one covariate is subject to a detection limit.
Affiliation(s)
- Ying Ding
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Shengchun Kong
- Biometrics Department, Gilead Science Inc., Foster City, CA, USA
- Shan Kang
- Ad Technologies, A9.com Inc., Palo Alto, CA, USA
- Wei Chen
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
|
15
|
Ahn S, Lim J, Paik MC, Sacco RL, Elkind MS. Cox model with interval-censored covariate in cohort studies. Biom J 2018; 60:797-814. [PMID: 29775990 DOI: 10.1002/bimj.201700090] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/30/2017] [Revised: 12/19/2017] [Accepted: 02/27/2018] [Indexed: 11/07/2022]
Abstract
In cohort studies the outcome is often time to a particular event, and subjects are followed at regular intervals. Periodic visits may also monitor a secondary irreversible event influencing the event of primary interest, and a significant proportion of subjects develop the secondary event over the period of follow-up. The status of the secondary event serves as a time-varying covariate, but is recorded only at the times of the scheduled visits, generating incomplete time-varying covariates. While information on a typical time-varying covariate is missing for the entire follow-up period except the visiting times, the status of the secondary event is unavailable only between visits where the status has changed, and is thus interval-censored. One may view the interval-censored covariate of the secondary event status as a missing time-varying covariate, yet missingness is partial since partial information is provided throughout the follow-up period. Current practice of using the latest observed status produces biased estimators, and the existing missing covariate techniques cannot accommodate the special feature of missingness due to interval censoring. To handle interval-censored covariates in the Cox proportional hazards model, we propose an available-data estimator, a doubly robust-type estimator, and the maximum likelihood estimator via the EM algorithm, and present their asymptotic properties. We also present practical approaches that are valid. We demonstrate the proposed methods using our motivating example from the Northern Manhattan Study.
Affiliation(s)
- Soohyun Ahn
- Department of Mathematics, Ajou University, Suwon, Korea
- Johan Lim
- Department of Statistics, Seoul National University, Seoul, Korea
- Ralph L Sacco
- Department of Neurology, Miller School of Medicine, University of Miami, Miami, FL, USA
|
16
|
A multiple imputation method based on weighted quantile regression models for longitudinal censored biomarker data with missing values at early visits. BMC Med Res Methodol 2018; 18:8. [PMID: 29325529 PMCID: PMC5765696 DOI: 10.1186/s12874-017-0463-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/13/2017] [Accepted: 12/18/2017] [Indexed: 11/17/2022]
Abstract
Background: In patient-based studies, biomarker data are often subject to left censoring due to the detection limits, or to incomplete sample or data collection. In the context of longitudinal regression analysis, inappropriate handling of these issues could lead to biased parameter estimates. We developed a specific multiple imputation (MI) strategy based on weighted censored quantile regression (CQR) that not only accounts for censoring, but also missing data at early visits when longitudinal biomarker data are modeled as a covariate.
Methods: We assessed through simulation studies the performance of the developed imputation approach by considering various scenarios of covariance structures of longitudinal data and levels of censoring. We also illustrated the application of the proposed method to the Prospective Study of Outcomes in Ankylosing Spondylitis (PSOAS) data to address the issues of censored or missing C-reactive protein (CRP) level at early visits for a group of patients.
Results: Our findings from simulation studies indicated that the proposed method performs better than other MI methods by having a higher relative efficiency. We also found that our approach is not sensitive to the choice of covariance structure as compared to other methods that assume normality of biomarker data. The analysis results of PSOAS data from the imputed CRP levels based on our method suggested that higher CRP is significantly associated with radiographic damage, while those from other methods did not result in a significant association.
Conclusion: The MI based on weighted CQR offers a more valid statistical approach to evaluate a biomarker of disease in the presence of both issues with censoring and missing data in early visits.
|
17
|
Choi BY, Fine JP, Brookhart MA. On two-stage estimation of structural instrumental variable models. Biometrika 2017; 104:881-899. [PMID: 29430042 DOI: 10.1093/biomet/asx056] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/06/2014] [Indexed: 11/14/2022]
Abstract
Two-stage least squares estimation is popular for structural equation models with unmeasured confounders. In such models, both the outcome and the exposure are assumed to follow linear models conditional on the measured confounders and instrumental variable, which is related to the outcome only via its relation with the exposure. We consider data where both the outcome and the exposure may be incompletely observed, with particular attention to the case where both are censored event times. A general class of two-stage minimum distance estimators is proposed that separately fits linear models for the outcome and exposure and then uses a minimum distance criterion based on the reduced-form model for the outcome to estimate the regression parameters of interest. An optimal minimum distance estimator is identified which may be superior to the usual two-stage least squares estimator with fully observed data. Simulation studies demonstrate that the proposed methods perform well with realistic sample sizes. Their practical utility is illustrated in a study of the comparative effectiveness of colon cancer treatments, where the effect of chemotherapy on censored survival times may be confounded with patient status.
Affiliation(s)
- Byeong Yeob Choi
- Department of Epidemiology and Biostatistics, University of Texas Health Science Center, 7703 Floyd Curl Drive, San Antonio, Texas 78229
- Jason P Fine
- Department of Biostatistics, University of North Carolina, 3103B McGavran-Greenberg Hall, Chapel Hill, North Carolina 27599
- M Alan Brookhart
- Department of Epidemiology, University of North Carolina, 2105F McGavran-Greenberg Hall, Chapel Hill, North Carolina 27599
|
18
|
Atem FD, Sampene E, Greene TJ. Improved conditional imputation for linear regression with a randomly censored predictor. Stat Methods Med Res 2017; 28:432-444. [PMID: 28830304 DOI: 10.1177/0962280217727033] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
This article describes a nonparametric conditional imputation analytic method for randomly censored covariates in linear regression. While some existing methods make assumptions about the distribution of covariates or underestimate standard error due to lack of imputation error, the proposed approach is distribution-free and utilizes resampling to correct for variance underestimation. The performance of the novel method is assessed using simulations, and results are contrasted with methods currently used for a limit of detection censored design, including the complete case approach and other nonparametric approaches. Theoretical justifications for the proposed method are provided, and its application is demonstrated through a study of association between lipoprotein cholesterol in offspring and parental history of cardiovascular disease.
|
19
|
Wang J, Shete S. Estimation of indirect effect when the mediator is a censored variable. Stat Methods Med Res 2017; 27:3010-3025. [PMID: 28132585 DOI: 10.1177/0962280217690414] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022]
Abstract
A mediation model explores the direct and indirect effects of an initial variable (X) on an outcome variable (Y) by including a mediator (M). In many realistic scenarios, investigators observe censored data instead of the complete data. Current research in mediation analysis for censored data focuses mainly on censored outcomes, but not censored mediators. In this study, we proposed a strategy based on the accelerated failure time model and a multiple imputation approach. We adapted a measure of the indirect effect for the mediation model with a censored mediator, which can assess the indirect effect at both the group and individual levels. Based on simulation, we established the bias in the estimations of different paths (i.e. the effects of X on M [a], of M on Y [b] and of X on Y given mediator M [c']) and indirect effects when analyzing the data using the existing approaches, including a naïve approach implemented in software such as Mplus, complete-case analysis, and the Tobit mediation model. We conducted simulation studies to investigate the performance of the proposed strategy compared to that of the existing approaches. The proposed strategy accurately estimates the coefficients of different paths, indirect effects and percentages of the total effects mediated. We applied these mediation approaches to the study of SNPs, age at menopause and fasting glucose levels. Our results indicate that there is no indirect effect of association between SNPs and fasting glucose level that is mediated through the age at menopause.
Affiliation(s)
- Jian Wang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, USA
- Sanjay Shete
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, USA; Department of Epidemiology, The University of Texas MD Anderson Cancer Center, USA
|
20
|
Wu Y, Yin G. Multiple imputation for cure rate quantile regression with censored data. Biometrics 2016; 73:94-103. [PMID: 27479513 DOI: 10.1111/biom.12574] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/01/2014] [Revised: 06/01/2016] [Accepted: 06/01/2016] [Indexed: 11/30/2022]
Abstract
The main challenge in the context of cure rate analysis is that one never knows whether censored subjects are cured or uncured, or whether they are susceptible or insusceptible to the event of interest. Considering the susceptible indicator as missing data, we propose a multiple imputation approach to cure rate quantile regression for censored data with a survival fraction. We develop an iterative algorithm to estimate the conditionally uncured probability for each subject. By utilizing this estimated probability and Bernoulli sample imputation, we can classify each subject as cured or uncured, and then employ the locally weighted method to estimate the quantile regression coefficients with only the uncured subjects. Repeating the imputation procedure multiple times and taking an average over the resultant estimators, we obtain consistent estimators for the quantile regression coefficients. Our approach relaxes the usual global linearity assumption, so that we can apply quantile regression to any particular quantile of interest. We establish asymptotic properties for the proposed estimators, including both consistency and asymptotic normality. We conduct simulation studies to assess the finite-sample performance of the proposed multiple imputation method and apply it to a lung cancer study as an illustration.
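The "repeat the imputation and average" step described in this abstract can be sketched generically. The snippet below is a minimal illustration, not the authors' procedure: the point estimate is the plain average across imputations, as the abstract states, while the variance uses Rubin's standard combining rule, which is swapped in here as an assumption (the paper itself derives asymptotic properties). The function name `pool_imputations` and the input numbers are made up for the example.

```python
import numpy as np

def pool_imputations(estimates, variances):
    """Pool M per-imputation estimates of one coefficient.

    estimates: the coefficient estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.size

    pooled = estimates.mean()          # average over imputations
    within = variances.mean()          # average within-imputation variance
    between = estimates.var(ddof=1)    # variance between imputations
    # Rubin's rule (assumed here, not taken from the paper)
    total_var = within + (1.0 + 1.0 / m) * between
    return pooled, total_var

# Made-up inputs: three imputations of a single regression coefficient
pooled, total_var = pool_imputations([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(pooled, total_var)
```

The between-imputation term is what keeps the pooled standard error from ignoring the uncertainty introduced by the imputation itself.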
Affiliation(s)
- Yuanshan Wu
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei 430072, China
- Guosheng Yin
- Department of Statistics and Actuarial Science, University of Hong Kong, Pokfulam Road, Hong Kong
|
21
|
Atem FD, Qian J, Maye JE, Johnson KA, Betensky RA. Linear Regression with a Randomly Censored Covariate: Application to an Alzheimer's Study. J R Stat Soc Ser C Appl Stat 2016; 66:313-328. [PMID: 28239197 DOI: 10.1111/rssc.12164] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/27/2022]
Abstract
The association between maternal age of onset of dementia and amyloid deposition (measured by in vivo positron emission tomography (PET) imaging) in cognitively normal older offspring is of interest. In a regression model for amyloid, special methods are required due to the random right censoring of the covariate of maternal age of onset of dementia. Prior literature has proposed methods to address the problem of censoring due to assay limit of detection, but not random censoring. We propose imputation methods and a survival regression method that do not require parametric assumptions about the distribution of the censored covariate. Existing imputation methods address missing covariates, but not right censored covariates. In simulation studies, we compare these methods to the simple, but inefficient complete case analysis, and to thresholding approaches. We apply the methods to the Alzheimer's study.
Affiliation(s)
- Jing Qian
- University of Massachusetts, Amherst, USA
|
22
|
Yue YR, Wang XF. Bayesian inference for generalized linear mixed models with predictors subject to detection limits: an approach that leverages information from auxiliary variables. Stat Med 2015; 35:1689-705. [PMID: 26643287 DOI: 10.1002/sim.6830] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/24/2014] [Accepted: 11/08/2015] [Indexed: 11/05/2022]
Abstract
This paper is motivated from a retrospective study of the impact of vitamin D deficiency on the clinical outcomes for critically ill patients in multi-center critical care units. The primary predictors of interest, vitamin D2 and D3 levels, are censored at a known detection limit. Within the context of generalized linear mixed models, we investigate statistical methods to handle multiple censored predictors in the presence of auxiliary variables. A Bayesian joint modeling approach is proposed to fit the complex heterogeneous multi-center data, in which the data information is fully used to estimate parameters of interest. Efficient Monte Carlo Markov chain algorithms are specifically developed depending on the nature of the response. Simulation studies demonstrate the outperformance of the proposed Bayesian approach over other existing methods. An application to the data set from the vitamin D deficiency study is presented. Possible extensions of the method regarding the absence of auxiliary variables, semiparametric models, as well as the type of censoring are also discussed.
Affiliation(s)
- Yu Ryan Yue
- Department of Statistics and CIS, Zicklin School of Business, Baruch College, The City University of New York, New York, NY, U.S.A
- Xiao-Feng Wang
- Department of Quantitative Health Sciences / Biostatistics Section, Cleveland Clinic Lerner Research Institute, Cleveland, OH, U.S.A
|
23
|
Sattar A, Sinha SK, Wang XF, Li Y. Frailty models for pneumonia to death with a left-censored covariate. Stat Med 2015; 34:2266-80. [PMID: 25728821 DOI: 10.1002/sim.6466] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/02/2014] [Revised: 02/02/2015] [Accepted: 02/11/2015] [Indexed: 11/08/2022]
Abstract
Frailty models are multiplicative hazard models for studying association between survival time and important clinical covariates. When some values of a clinical covariate are unobserved but known to be below a threshold called the limit of detection (LOD), naive approaches ignoring this problem, such as replacing the undetected value by the LOD or half of the LOD, often produce biased parameter estimates with larger mean squared error. To address the LOD problem in a frailty model, we propose a flexible smooth nonparametric density estimator along with Simpson's numerical integration technique. This is an extension of an existing method in the likelihood framework for the estimation and inference of the model parameters. The proposed new method shows the estimators are asymptotically unbiased and gives smaller mean squared error of the estimates. Compared with the existing method, the proposed new method does not require distributional assumptions for the underlying covariates. Simulation studies were conducted to evaluate the performance of the new method in realistic scenarios. We illustrate the use of the proposed method with a data set from the Genetic and Inflammatory Markers of Sepsis study in which interleukin-10 was subject to LOD.
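The bias from LOD substitution that this abstract describes can be seen in a much simpler setting than a frailty model. The sketch below is an illustrative assumption, not the authors' method: it simulates a linear regression with true slope 2 and a uniform covariate, left-censors roughly half of the covariate values at a hypothetical LOD of 1, and compares the slope from the full data, from substituting the LOD, and from a complete-case fit. All names and parameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_slope = 2.0
lod = 1.0  # hypothetical limit of detection

# Simulated data: covariate on [0, 2], linear outcome with unit-variance noise
x = rng.uniform(0.0, 2.0, size=n)
y = 1.0 + true_slope * x + rng.normal(0.0, 1.0, size=n)
below = x < lod  # values that would be reported only as "< LOD"

def ols_slope(x_values, y_values):
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    return np.polyfit(x_values, y_values, 1)[0]

slope_full = ols_slope(x, y)                # oracle fit, no censoring
x_sub = np.where(below, lod, x)             # naive: replace censored values by the LOD
slope_lod = ols_slope(x_sub, y)
slope_cc = ols_slope(x[~below], y[~below])  # complete-case fit

print(f"full data: {slope_full:.2f}")
print(f"LOD substitution: {slope_lod:.2f}")
print(f"complete case: {slope_cc:.2f}")
```

In this toy setup the LOD-substituted slope is pulled well above the true value, while the complete-case slope stays close to it at the cost of discarding about half the sample. How a given substitution rule behaves depends on the covariate distribution and the outcome model, so this is not a general ranking of the approaches.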
Affiliation(s)
- Abdus Sattar
- Department of Epidemiology & Biostatistics, Case Western Reserve University, Cleveland, OH, U.S.A
- Sanjoy K Sinha
- School of Mathematics and Statistics, Carleton University, Ottawa, ON, Canada
- Xiao-Feng Wang
- Department of Quantitative Health Sciences, Cleveland Clinic Foundation, Cleveland, OH, U.S.A
- Yehua Li
- Department of Statistics, Iowa State University, Ames, IA, U.S.A
|
24
|
Bernhardt PW, Wang HJ, Zhang D. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits. Statistics in Biosciences 2013; 7:68-89. [PMID: 26257836 DOI: 10.1007/s12561-013-9099-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
Affiliation(s)
- Paul W Bernhardt
- Department of Mathematics and Statistics, Villanova University, Villanova, USA
- Huixia J Wang
- Department of Statistics, North Carolina State University, Raleigh, USA
- Daowen Zhang
- Department of Statistics, North Carolina State University, Raleigh, USA
|