26. Olivari RC, Garay AM, Lachos VH, Matos LA. Mixed-effects models for censored data with autoregressive errors. J Biopharm Stat 2020; 31:273-294. PMID: 33315523. DOI: 10.1080/10543406.2020.1852246.
Abstract
Mixed-effects models, with modifications to accommodate censored observations (LMEC/NLMEC), are routinely used to analyze measurements that are collected irregularly over time and are often subject to upper and lower detection limits. This paper presents a likelihood-based approach for fitting LMEC/NLMEC models with autoregressive dependence of order p in the error term. An EM-type algorithm is developed for computing the maximum likelihood estimates, yielding as byproducts the standard errors of the fixed effects and the likelihood value. Moreover, the constraints on the parameter space that arise from the stationarity conditions on the autoregressive parameters are handled in the EM algorithm by the reparameterization scheme of Lin and Lee (2007). To examine the performance of the proposed method, we present simulation studies and analyze a real AIDS case study. The proposed algorithm and methods are implemented in the new R package ARpLMEC.
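The stationarity constraint mentioned above can be enforced by reparameterizing the AR(p) coefficients through partial autocorrelations, which the Durbin-Levinson recursion maps back to a guaranteed-stationary coefficient vector. A minimal sketch of that idea (an illustration only, not the ARpLMEC implementation; function names are ours):

```python
import math

def pacf_to_ar(pacf):
    """Durbin-Levinson recursion: map partial autocorrelations (each in
    (-1, 1)) to the coefficients of a stationary AR(p) process."""
    phi = []
    for k, r in enumerate(pacf, start=1):
        phi = [phi[i] - r * phi[k - 2 - i] for i in range(k - 1)] + [r]
    return phi

def unconstrained_to_ar(theta):
    """Squash unconstrained real parameters into (-1, 1) with tanh,
    then map them to stationary AR coefficients."""
    return pacf_to_ar([math.tanh(t) for t in theta])
```

For p = 1 the partial autocorrelation is the AR coefficient itself, so the map is the identity on (-1, 1); for higher orders the recursion mixes the lower-order coefficients.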
27. Huynh TB, Groth CP, Ramachandran G, Banerjee S, Stenzel M, Quick H, Blair A, Engel LS, Kwok RK, Sandler DP, Stewart PA. Estimates of Occupational Inhalation Exposures to Six Oil-Related Compounds on the Four Rig Vessels Responding to the Deepwater Horizon Oil Spill. Ann Work Expo Health 2020; 66:i89-i110. PMID: 33009797. PMCID: PMC8989034. DOI: 10.1093/annweh/wxaa072.
Abstract
BACKGROUND The 2010 Deepwater Horizon (DWH) oil spill involved thousands of workers and volunteers in mitigating the oil release and cleaning up after the spill. Health concerns for these participants led to the initiation of a prospective epidemiological study (GuLF STUDY) to investigate potential adverse health outcomes associated with the oil spill response and clean-up (OSRC). Characterizing the chemical exposures of the OSRC workers was an essential component of the study. Workers on the four oil rig vessels mitigating the spill and located within a 1852 m (1 nautical mile) radius of the damaged wellhead [the Discoverer Enterprise (Enterprise), the Development Driller II (DDII), the Development Driller III (DDIII), and the HelixQ4000] had some of the greatest potential for chemical exposures. OBJECTIVES The aim of this paper is to characterize potential personal chemical exposures via the inhalation route for workers on those four rig vessels. Specifically, we present our methodology and descriptive statistics of exposure estimates for total hydrocarbons (THC), benzene, toluene, ethylbenzene, xylene, and n-hexane (BTEX-H) for various job groups, in order to develop exposure groups for the GuLF STUDY cohort. METHODS Using descriptive information associated with the measurements taken on various jobs on these rig vessels, together with job titles from participant responses to the study questionnaire, job groups [unique job/rig/time period (TP) combinations] were developed to describe groups of workers with the same or closely related job titles. A total of 500 job groups were considered for estimation using the available 8139 personal measurements. We used a univariate Bayesian model to analyze the THC measurements and a bivariate Bayesian regression framework to jointly model the measurements of THC and each of the BTEX-H chemicals separately, with both models taking into account the many measurements that fell below the analytic limit of detection.
RESULTS The highest THC exposures occurred in TP1a and TP1b, the periods before the well was mechanically capped. The posterior medians of the arithmetic mean (AM) ranged from 0.11 ppm ('Inside/Other', TP1b, DDII; and 'Driller', TP3, DDII) to 14.67 ppm ('Methanol Operations', TP1b, Enterprise). There were statistically significant differences between the THC AMs by broad job group, rig, and time period. The AMs for BTEX-H were generally two to three orders of magnitude lower than the THC AMs, with the benzene and ethylbenzene measurements being highly censored. CONCLUSIONS Our results add new insights to the limited literature on exposures associated with oil spill responses and support the ongoing epidemiologic investigation of potential adverse health effects of the oil spill.
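As an illustration of the censored-likelihood idea used in such exposure models, below is a minimal sketch of how below-LOD measurements can enter a lognormal likelihood: detected values contribute the density, and each non-detect contributes the distribution function at the detection limit. This is a simplification of the Bayesian models described above, with hypothetical function names:

```python
import math

def norm_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def norm_logcdf(x, mu, sigma):
    """Log of P(X < x) for X ~ N(mu, sigma^2)."""
    return math.log(0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2)))))

def censored_loglik(detects, n_nondetect, lod, mu, sigma):
    """Log-likelihood for log-transformed exposure data: detected (log)
    values contribute the normal density, and each non-detect contributes
    P(X < log LOD) rather than an imputed value."""
    ll = sum(norm_logpdf(x, mu, sigma) for x in detects)
    ll += n_nondetect * norm_logcdf(math.log(lod), mu, sigma)
    return ll
```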
28. Simoneau G, Moodie EEM, Nijjar JS, Platt RW. Finite sample variance estimation for optimal dynamic treatment regimes of survival outcomes. Stat Med 2020; 39:4466-4479. PMID: 32929753. DOI: 10.1002/sim.8735.
Abstract
Deriving valid confidence intervals for complex estimators is a challenging task in practice. Estimators of dynamic weighted survival modeling (DWSurv), a method to estimate an optimal dynamic treatment regime with censored outcomes, are asymptotically normal and consistent for their target parameters when at least a subset of the nuisance models is correctly specified. However, their behavior in finite samples and the impact of model misspecification on inferences remain unclear. In addition, the estimators' nonregularity may negatively affect inferences under some specific data-generating mechanisms. Our objective was to compare five methods for constructing confidence intervals for the DWSurv parameters in finite samples: two asymptotic variance formulas (with and without adjustment for the estimation of nuisance parameters) and three bootstrap approaches. Via simulations, we considered practical scenarios, for example, when some nuisance models are misspecified or when nonregularity is problematic. We also compared the five methods in an application to the treatment of rheumatoid arthritis. We found that the bootstrap approaches performed consistently well, at the cost of longer computation times. The asymptotic variance with adjustments generally yielded conservative confidence intervals. The asymptotic variance without adjustments yielded nominal coverage for large sample sizes. We recommend using the asymptotic variance with adjustments in small samples and the bootstrap if computationally feasible. Caution should be taken when nonregularity may be an issue.
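For reference, a sketch of the nonparametric bootstrap percentile interval, the simplest of the bootstrap approaches compared in studies like this one (illustrative only; names and defaults are ours):

```python
import random

def percentile_ci(data, stat, level=0.95, n_boot=2000, seed=1):
    """Nonparametric bootstrap percentile confidence interval for stat(data):
    resample with replacement, recompute the statistic, take quantiles."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]

def mean(xs):
    return sum(xs) / len(xs)

lo, hi = percentile_ci(list(range(1, 21)), mean)
```

The computational cost noted in the abstract comes from refitting the whole estimator `n_boot` times, which for DWSurv means refitting every nuisance model per resample.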
29. Loumponias K, Tsaklidis G. Kalman filtering with censored measurements. J Appl Stat 2020; 49:317-335. PMID: 35707209. PMCID: PMC9196092. DOI: 10.1080/02664763.2020.1810645.
Abstract
This paper concerns Kalman filtering when the measurements of the process are censored. The censored measurements are described by the Type I Tobit model and are one-dimensional with two censoring limits, while the (hidden) state vectors are multidimensional. For this model, Bayesian estimates of the state vectors are provided through a recursive algorithm of Kalman filtering type. Experiments are presented to illustrate the effectiveness and applicability of the algorithm; they show that the proposed method outperforms other filtering methodologies in both computational cost and overall root mean square error (RMSE) on synthetic and real data sets.
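Filters of this type replace a censored measurement by a moment of the corresponding truncated normal distribution. A minimal sketch of that building block (not the authors' full recursion): the conditional mean of a normal variable given that it fell below the censoring limit.

```python
import math

def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def _Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_below(mu, sigma, c):
    """E[Y | Y < c] for Y ~ N(mu, sigma^2): the value a censored-measurement
    filter can substitute for an observation reported only as '< c'."""
    z = (c - mu) / sigma
    return mu - sigma * _phi(z) / _Phi(z)
```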
30. Arfè A, Alexander B, Trippa L. Optimality of testing procedures for survival data in the nonproportional hazards setting. Biometrics 2020; 77:587-598. PMID: 32535892. DOI: 10.1111/biom.13315.
Abstract
Most statistical tests for treatment effects used in randomized clinical trials with survival outcomes are based on the proportional hazards assumption, which often fails in practice. Data from early exploratory studies may provide evidence of nonproportional hazards, which can guide the choice of alternative tests in the design of practice-changing confirmatory trials. We developed a test to detect treatment effects in a late-stage trial, which accounts for the deviations from proportional hazards suggested by early-stage data. Conditional on early-stage data, among all tests that control the frequentist Type I error rate at a fixed α level, our testing procedure maximizes the Bayesian predictive probability that the study will demonstrate the efficacy of the experimental treatment. Hence, the proposed test provides a useful benchmark for other tests commonly used in the presence of nonproportional hazards, for example, weighted log-rank tests. We illustrate this approach in simulations based on data from a published cancer immunotherapy phase III trial.
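For reference, a bare-bones weighted log-rank statistic of the kind the proposed test is benchmarked against. The weight is evaluated at the left-continuous pooled Kaplan-Meier estimate, so for example `weight=lambda s: 1 - s` up-weights late differences, a common choice under delayed treatment effects (illustrative sketch; assumes `group` is coded 0/1):

```python
def weighted_logrank(times, events, group, weight=lambda s: 1.0):
    """Weighted log-rank Z statistic for two groups coded 0/1 in `group`.
    events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events, group))
    n0 = sum(1 for g in group if g == 0)
    n1 = len(group) - n0
    s = 1.0          # pooled KM survival just before the current time
    U = V = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        r0, r1 = n0, n1          # numbers at risk just before t
        d = d1 = 0               # deaths at t: total, and in group 1
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            d += e
            d1 += e * g
            if g:
                n1 -= 1
            else:
                n0 -= 1
            i += 1
        n = r0 + r1
        if d:
            w = weight(s)
            U += w * (d1 - d * r1 / n)               # observed - expected
            if n > 1:
                V += w * w * d * (r1 / n) * (r0 / n) * (n - d) / (n - 1)
            s *= 1 - d / n
    return U / V ** 0.5 if V > 0 else 0.0
```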
31. Simoneau G, Moodie EEM, Azoulay L, Platt RW. Adaptive Treatment Strategies With Survival Outcomes: An Application to the Treatment of Type 2 Diabetes Using a Large Observational Database. Am J Epidemiol 2020; 189:461-469. PMID: 31903490. DOI: 10.1093/aje/kwz272.
Abstract
Sequences of treatments that adapt to a patient's changing condition over time are often needed for the management of chronic diseases. An adaptive treatment strategy (ATS) consists of personalized treatment rules to be applied through the course of a disease that input the patient's characteristics at the time of decision-making and output a recommended treatment. An optimal ATS is the sequence of tailored treatments that yields the best clinical outcome for patients sharing similar characteristics. Methods for estimating optimal adaptive treatment strategies, which must disentangle short- and long-term treatment effects, can be theoretically involved and hard to explain to clinicians, especially when the outcome to be optimized is a survival time subject to right-censoring. In this paper, we describe dynamic weighted survival modeling, a method for estimating an optimal ATS with survival outcomes. Using data from the Clinical Practice Research Datalink, a large primary-care database, we illustrate how it can answer an important clinical question about the treatment of type 2 diabetes. We identify an ATS pertaining to which drug add-ons to recommend when metformin in monotherapy does not achieve the therapeutic goals.
32. Zhao YQ, Zhu R, Chen G, Zheng Y. Constructing dynamic treatment regimes with shared parameters for censored data. Stat Med 2020; 39:1250-1263. PMID: 31951041. PMCID: PMC7305816. DOI: 10.1002/sim.8473.
Abstract
Dynamic treatment regimes are sequential decision rules that adapt throughout disease progression according to a patient's evolving characteristics. In many clinical applications, it is desirable that the format of the decision rules remains consistent over time. Unlike the estimation of dynamic treatment regimes in regular settings, where decision rules are formed without shared parameters, the derivation of the shared decision rules requires estimating shared parameters indexing the decision rules across different decision points. Estimation of such rules becomes more complicated when the clinical outcome of interest is a survival time subject to censoring. To address these challenges, we propose two novel methods: censored shared-Q-learning and censored shared-O-learning. Both methods incorporate clinical preferences into a qualitative rule, where the parameters indexing the decision rules are shared across different decision points and estimated simultaneously. We use simulation studies to demonstrate the superior performance of the proposed methods. The methods are further applied to the Framingham Heart Study to derive treatment rules for cardiovascular disease.
33. Baharith LA, AL-Beladi KM, Klakattawi HS. The Odds Exponential-Pareto IV Distribution: Regression Model and Application. Entropy 2020; 22:497. PMID: 33286270. PMCID: PMC7516982. DOI: 10.3390/e22050497.
Abstract
This article introduces the odds exponential-Pareto IV distribution, which belongs to the odds family of distributions, and studies its statistical properties. The odds exponential-Pareto IV distribution accommodates decreasing, increasing, and upside-down hazard functions. We employed the maximum likelihood method to estimate the distribution parameters, and the estimators' performance was assessed in simulation studies. A new log location-scale regression model based on the odds exponential-Pareto IV distribution is also introduced. Parameter estimates of the proposed model were obtained using both maximum likelihood and jackknife methods for right-censored data. Real data sets are analyzed under the odds exponential-Pareto IV distribution and the log odds exponential-Pareto IV regression model to show their flexibility and potential.
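One common "odds family" construction applies an exponential distribution to the odds G/(1-G) of a baseline CDF G. The sketch below uses that assumed form with a Pareto IV baseline (location fixed at zero); the paper's exact parameterization may differ, so treat this purely as an illustration of the mechanism:

```python
import math

def pareto4_cdf(x, sigma, gamma, alpha):
    """Pareto IV baseline CDF with location 0, scale sigma,
    inequality gamma, shape alpha."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + (x / sigma) ** (1.0 / gamma)) ** (-alpha)

def odds_exp_pareto4_cdf(x, lam, sigma, gamma, alpha):
    """Assumed odds-family construction: apply an exponential(lam) CDF
    to the odds G/(1 - G) of the Pareto IV baseline."""
    g = pareto4_cdf(x, sigma, gamma, alpha)
    if g >= 1.0:
        return 1.0
    return 1.0 - math.exp(-lam * g / (1.0 - g))
```

Because the odds ratio G/(1-G) maps [0, 1) onto [0, ∞) monotonically, the result is a valid CDF whenever the baseline is.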
34. Wang X, Zhong Y, Mukhopadhyay P, Schaubel DE. Computationally efficient inference for center effects based on restricted mean survival time. Stat Med 2019; 38:5133-5145. PMID: 31502288. DOI: 10.1002/sim.8356.
Abstract
Restricted mean survival time (RMST) has gained increased attention in biostatistical and clinical studies. Directly modeling RMST (as opposed to modeling then transforming the hazard function) is appealing computationally and in terms of interpreting covariate effects. We propose computationally convenient methods for evaluating center effects based on RMST. A multiplicative model for the RMST is assumed. Estimation proceeds through an algorithm analogous to stratification, which permits the evaluation of thousands of centers. We derive the asymptotic properties of the proposed estimators and evaluate finite sample performance through simulation. We demonstrate that considerable decreases in computational burden are achievable through the proposed methods, in terms of both storage requirements and run time. The methods are applied to evaluate more than 5000 US dialysis facilities using data from a national end-stage renal disease registry.
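The RMST itself is simply the area under the survival curve up to a truncation time tau; nonparametrically, that is the area under the Kaplan-Meier step function. A minimal sketch (not the authors' multiplicative-model estimator):

```python
def rmst(times, events, tau):
    """Restricted mean survival time: the area under the Kaplan-Meier
    step function from 0 to tau (events: 1 = death, 0 = censored)."""
    data = sorted(zip(times, events))
    n_risk = len(data)
    s, last_t, area = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        d = cnt = 0
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            cnt += 1
            i += 1
        area += s * (t - last_t)      # rectangle up to this event time
        if d:
            s *= 1 - d / n_risk       # KM multiplicative update
        n_risk -= cnt
        last_t = t
    area += s * (tau - last_t)        # tail rectangle up to tau
    return area
```

With no censoring and tau beyond the last event, this reduces to the sample mean of the event times.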
35. Wang H, Li G. Extreme learning machine Cox model for high-dimensional survival analysis. Stat Med 2019; 38:2139-2156. PMID: 30632193. PMCID: PMC6498851. DOI: 10.1002/sim.8090.
Abstract
Some interesting recent studies have shown that neural network models are useful alternatives for modeling survival data when the assumptions of a classical parametric or semiparametric survival model, such as the Cox (1972) model, are seriously violated. However, to the best of our knowledge, the plausibility of adapting the emerging extreme learning machine (ELM) algorithm for single-hidden-layer feedforward neural networks to survival analysis has not been explored. In this paper, we present a kernel ELM Cox model regularized by an L0-based broken adaptive ridge (BAR) penalization method. We demonstrate that the resulting method, referred to as ELMCoxBAR, can outperform other state-of-the-art survival prediction methods, such as L1- or L2-regularized Cox regression, random survival forest with various splitting rules, and the boosted Cox model, in terms of predictive performance on both simulated and real-world datasets. In addition to its good predictive performance, we illustrate that the proposed method has a key computational advantage over the above competing methods in terms of computation time, using a real-world ultra-high-dimensional survival dataset.
36. Arboretti R, Bathke AC, Carrozzo E, Pesarin F, Salmaso L. Multivariate permutation tests for two sample testing in presence of nondetects with application to microarray data. Stat Methods Med Res 2019; 29:258-271. PMID: 30799774. DOI: 10.1177/0962280219832225.
Abstract
Very often, data collected in medical research are characterized by censored observations and/or data with mass at the value zero. This happens, for example, when some measurements fall below the detection limits of the specific instrument used; such left-censored observations are called "nondetects". A situation with an excessive number of zeros in a data set is also referred to as zero-inflated data. In the present work, we compare different multivariate permutation procedures for two-sample testing with data containing nondetects. The effect of censoring is investigated with regard to the different values that may be attributed to nondetected observations, both under the null hypothesis and under the alternative. We motivate the problem using data from allergy research.
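A minimal two-sample permutation test in which nondetects (coded `None`) are assigned a common substitute value before permuting illustrates the kind of attribution studied above; varying `sub` probes the sensitivity the authors investigate (illustrative sketch; names are ours, and the statistic is a simple difference of means rather than the paper's multivariate procedures):

```python
import random

def perm_test(x, y, n_perm=5000, sub=0.0, seed=7):
    """Two-sided two-sample permutation test on the difference of means.
    Nondetects are coded as None and replaced by a common substitute value
    before permuting, so the statistic treats them symmetrically."""
    xv = [sub if v is None else v for v in x]
    yv = [sub if v is None else v for v in y]
    pooled = xv + yv
    n = len(xv)
    obs = abs(sum(xv) / n - sum(yv) / len(yv))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:n], pooled[n:]
        if abs(sum(a) / n - sum(b) / len(b)) >= obs - 1e-12:
            hits += 1
    return hits / n_perm
```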
37. Lachos VH, Matos LA, Castro LM, Chen MH. Flexible longitudinal linear mixed models for multiple censored responses data. Stat Med 2018; 38:1074-1102. PMID: 30421470. DOI: 10.1002/sim.8017.
Abstract
In biomedical studies and clinical trials, repeated measures are often subject to some upper and/or lower limits of detection. Hence, the responses are either left or right censored. A complication arises when more than one series of responses is repeatedly collected on each subject at irregular intervals over a period of time and the data exhibit tails heavier than the normal distribution. The multivariate censored linear mixed effect (MLMEC) model is a frequently used tool for a joint analysis of more than one series of longitudinal data. In this context, we develop a robust generalization of the MLMEC based on the scale mixtures of normal distributions. To take into account the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is considered. For this complex longitudinal structure, we propose an exact estimation procedure to obtain the maximum-likelihood estimates of the fixed effects and variance components using a stochastic approximation of the EM algorithm. This approach allows us to estimate the parameters of interest easily and quickly as well as to obtain the standard errors of the fixed effects, the predictions of unobservable values of the responses, and the log-likelihood function as a byproduct. The proposed method is applied to analyze a set of AIDS data and is examined via a simulation study.
38. Chik AHS, Schmidt PJ, Emelko MB. Learning Something From Nothing: The Critical Importance of Rethinking Microbial Non-detects. Front Microbiol 2018; 9:2304. PMID: 30344512. PMCID: PMC6182096. DOI: 10.3389/fmicb.2018.02304.
Abstract
Accurate estimation of microbial concentrations is necessary to inform many important environmental science and public health decisions and regulations. Critically, widespread misconceptions about laboratory-reported microbial non-detects have led to their erroneous description and handling as "censored" values. This ultimately compromises their interpretation and undermines efforts to describe and model microbial concentrations accurately. Herein, these misconceptions are dispelled by (1) discussing the critical differences between discrete microbial observations and continuous data acquired using analytical chemistry methodologies and (2) demonstrating the bias introduced by statistical approaches tailored for chemistry data and misapplied to discrete microbial data. Notably, these approaches especially preclude the accurate representation of low concentrations and those estimated using microbial methods with low or variable analytical recovery, which can be expected to result in non-detects. Techniques that account for the probabilistic relationship between observed data and underlying microbial concentrations have been widely demonstrated, and their necessity for handling non-detects (in a way which is consistent with the handling of positive observations) is underscored herein. Habitual reporting of raw microbial observations and sample sizes is proposed to facilitate accurate estimation and analysis of microbial concentrations.
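The probabilistic relationship the authors emphasize is easy to state for presence/absence data: under a Poisson model for discrete organism counts, P(non-detect) = exp(-c·V) for concentration c and analyzed volume V, which yields a closed-form estimate of c from the fraction of negative samples. A sketch (ours, not the authors' code; it ignores analytical recovery for simplicity):

```python
import math

def poisson_concentration(n_samples, n_nondetect, volume):
    """MLE of a microbial concentration from presence/absence results under
    a Poisson count model: P(non-detect) = exp(-c * volume), so
    c = -ln(k/m) / volume for k non-detects out of m samples. Treating such
    non-detects as values 'censored at a detection limit' discards this
    discrete probabilistic structure."""
    if n_nondetect == 0:
        raise ValueError("all samples positive: the MLE is unbounded")
    p0 = n_nondetect / n_samples
    return -math.log(p0) / volume
```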
39. Szarka AZ, Hayworth CG, Ramanarayanan TS, Joseph RSI. Statistical Techniques to Analyze Pesticide Data Program Food Residue Observations. J Agric Food Chem 2018; 66:7165-7171. PMID: 29902006. DOI: 10.1021/acs.jafc.8b00863.
Abstract
The U.S. EPA conducts dietary-risk assessments to ensure that levels of pesticides on food in the U.S. food supply are safe. Often these assessments utilize conservative residue estimates: maximum residue levels (MRLs) and high-end estimates derived from registrant-generated field-trial data sets. A more realistic estimate of consumers' pesticide exposure from food may be obtained by utilizing residues from food-monitoring programs, such as the Pesticide Data Program (PDP) of the U.S. Department of Agriculture. A substantial portion of food-residue concentrations in PDP monitoring programs are below the limits of detection (left-censored), which makes the comparison of regulatory-field-trial and PDP residue levels difficult. In this paper, we present a novel adaptation of established statistical techniques, the Kaplan-Meier estimator (K-M), robust regression on order statistics (ROS), and the maximum-likelihood estimator (MLE), to quantify pesticide-residue concentrations in heavily censored data sets. The examined approaches include the most commonly used parametric and nonparametric methods for handling left-censored data in the medical and environmental sciences. This work presents a case study in which data on thiamethoxam residue on bell pepper generated from registrant field trials were compared with PDP-monitoring residue values. The results from the statistical techniques were evaluated and compared with commonly used simple substitution methods for the determination of summary statistics. The MLE was found to be the most appropriate statistical method for this residue data set. Using the MLE technique, the analyses showed that the median and mean PDP bell pepper residue levels were approximately 19 and 7 times lower, respectively, than the corresponding statistics of the field-trial residues.
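A simplified sketch of robust ROS for a single detection limit, one of the techniques named above: fit a lognormal line to the detected values on normal-quantile paper, then impute each nondetect from the fitted line at its plotting position below the limit. The plotting positions here are simplified relative to the standard (Helsel-style) formulas, so this is an illustration of the mechanism only:

```python
import math
from statistics import NormalDist

def ros_impute(detects, n_nondetect):
    """Simplified robust ROS with one detection limit: regress log(detects)
    on normal quantiles of their plotting positions, then read imputed
    values for the nondetects off the fitted line."""
    nd = NormalDist()
    m, k = len(detects), n_nondetect
    n = m + k
    p_below = k / n                       # estimated P(value below the limit)
    xs = sorted(math.log(v) for v in detects)
    z_det = [nd.inv_cdf(p_below + (1 - p_below) * (i + 1) / (m + 1))
             for i in range(m)]
    # least-squares line log(value) = a + b * z through the detects
    zbar = sum(z_det) / m
    xbar = sum(xs) / m
    b = (sum((z - zbar) * (x - xbar) for z, x in zip(z_det, xs))
         / sum((z - zbar) ** 2 for z in z_det))
    a = xbar - b * zbar
    z_nd = [nd.inv_cdf(p_below * (j + 1) / (k + 1)) for j in range(k)]
    return [math.exp(a + b * z) for z in z_nd]
```

Summary statistics are then computed from the detects plus the imputed nondetects, which is why ROS is called "robust": only the censored tail comes from the fitted distribution.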
40. Orbe J, Virto J. Penalized spline smoothing using Kaplan-Meier weights with censored data. Biom J 2018; 60:947-961. PMID: 29943440. DOI: 10.1002/bimj.201700213.
Abstract
In this paper, we consider the problem of nonparametric curve fitting in the specific context of censored data. We propose an extension of the penalized splines approach using Kaplan-Meier weights to take into account the effect of censorship and generalized cross-validation techniques to choose the smoothing parameter adapted to the case of censored samples. Using various simulation studies, we analyze the effectiveness of the censored penalized splines method proposed and show that the performance is quite satisfactory. We have extended this proposal to a generalized additive models (GAM) framework introducing a correction of the censorship effect, thus enabling more complex models to be estimated immediately. A real dataset from Stanford Heart Transplant data is also used to illustrate the methodology proposed, which is shown to be a good alternative when the probability distribution for the response variable and the functional form are not known in censored regression models.
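The Kaplan-Meier weights in question are the jumps of the KM estimator at the uncensored observations; censored observations receive weight zero, and the weights can then be plugged into a weighted least-squares or penalized-spline fit. A sketch (ours; it assumes events precede censorings at tied times):

```python
def km_weights(times, events):
    """Kaplan-Meier (Stute) weights: the jump of the KM estimator at each
    observation. Censored points (events[i] == 0) get weight zero."""
    n = len(times)
    # sort by time, events before censorings at ties
    order = sorted(range(n), key=lambda i: (times[i], 1 - events[i]))
    s = 1.0                      # current KM survival estimate
    w = [0.0] * n
    for rank, i in enumerate(order):
        at_risk = n - rank
        if events[i]:
            w[i] = s / at_risk   # jump of the KM curve at this event
            s *= 1 - 1 / at_risk
    return w
```

With no censoring every observation gets weight 1/n, recovering ordinary least squares; censoring shifts mass to the later uncensored points.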
41. Lin TI, Lachos VH, Wang WL. Multivariate longitudinal data analysis with censored and intermittent missing responses. Stat Med 2018; 37:2822-2835. PMID: 29740829. DOI: 10.1002/sim.7692.
Abstract
The multivariate linear mixed model (MLMM) has emerged as an important analytical tool for longitudinal data with multiple outcomes. However, the analysis of multivariate longitudinal data could be complicated by the presence of censored measurements because of a detection limit of the assay in combination with unavoidable missing values arising when subjects miss some of their scheduled visits intermittently. This paper presents a generalization of the MLMM approach, called the MLMM-CM, for a joint analysis of the multivariate longitudinal data with censored and intermittent missing responses. A computationally feasible expectation maximization-based procedure is developed to carry out maximum likelihood estimation within the MLMM-CM framework. Moreover, the asymptotic standard errors of fixed effects are explicitly obtained via the information-based method. We illustrate our methodology by using simulated data and a case study from an AIDS clinical trial. Experimental results reveal that the proposed method is able to provide more satisfactory performance as compared with the traditional MLMM approach.
42.
Abstract
In modeling censored data, survival forest models are a competitive nonparametric alternative to traditional parametric or semiparametric models when the functional forms are possibly misspecified or the underlying assumptions are violated. In this work, we propose a survival forest approach with trees constructed using a novel pseudo-R2 splitting rule. Studying well-known benchmark data sets, we find that the proposed model generally outperforms popular survival models, such as random survival forest with different splitting rules, the Cox proportional hazards model, and the generalized boosted model, in terms of the C-index metric.
43. Khan MHR. On the performance of adaptive preprocessing technique in analyzing high-dimensional censored data. Biom J 2018; 60:687-702. PMID: 29603360. DOI: 10.1002/bimj.201600256.
Abstract
Preprocessing high-dimensional censored datasets, such as microarray data, is generally considered an important technique for gaining stability by reducing potential noise. When variable selection including inference is carried out with high-dimensional censored data, the objective is to obtain a smaller subset of variables and then perform the inferential analysis using model estimates based on that subset. This two-stage inferential analysis is prone to circularity bias because of the noise that might still remain in the dataset. In this work, I propose an adaptive preprocessing technique that uses the sure independence screening (SIS) idea to accomplish variable selection and reduce the circularity bias, in combination with several well-known refined high-dimensional methods, namely the elastic net, adaptive elastic net, weighted elastic net, elastic net-AFT, and two greedy variable selection methods known as TCS and PC-simple, all implemented with accelerated lifetime models. The proposed technique addresses several features, including collinearity between important and some unimportant covariates, which is often the case in high-dimensional settings under a variable selection framework, and different levels of censoring. Simulation studies, along with an empirical analysis of a real microarray dataset (mantle cell lymphoma), are carried out to demonstrate the performance of the adaptive preprocessing technique.
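The SIS idea referred to above ranks covariates by their marginal association with the response and keeps only the top d before any refined method is applied. A minimal uncensored sketch of that screening step (ours; the paper applies the idea under censoring, which requires additional machinery):

```python
def sis_screen(X, y, d):
    """Sure independence screening: rank covariates by absolute marginal
    Pearson correlation with the response and return the indices of the
    top d. X is a list of rows; y is the response vector."""
    n = len(y)
    my = sum(y) / n
    syy = sum((b - my) ** 2 for b in y)

    def abs_corr(j):
        col = [row[j] for row in X]
        mx = sum(col) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(col, y))
        sxx = sum((a - mx) ** 2 for a in col)
        return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 else 0.0

    p = len(X[0])
    return sorted(range(p), key=abs_corr, reverse=True)[:d]
```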
44. Li X, Xie S, Zeng D, Wang Y. Efficient ℓ0-norm feature selection based on augmented and penalized minimization. Stat Med 2018; 37:473-486. PMID: 29082539. PMCID: PMC5768461. DOI: 10.1002/sim.7526.
Abstract
Advances in high-throughput technologies in genomics and imaging yield unprecedentedly large numbers of prognostic biomarkers. To accommodate the scale of biomarkers and study their association with disease outcomes, penalized regression is often used to identify important biomarkers. The ideal variable selection procedure would search for the best subset of predictors, which is equivalent to imposing an ℓ0-penalty on the regression coefficients. Since this optimization is a nondeterministic polynomial-time hard (NP-hard) problem that does not scale with the number of biomarkers, alternative methods mostly place smooth penalties on the regression parameters, which lead to computationally feasible optimization problems. However, empirical studies and theoretical analyses show that convex approximations of the ℓ0-norm (e.g., ℓ1) do not outperform their ℓ0 counterpart. Progress on ℓ0-norm feature selection has been relatively slower, where the main methods are greedy algorithms such as stepwise regression or orthogonal matching pursuit; penalized regression based on regularizing the ℓ0-norm remains much less explored in the literature. In this work, inspired by the recently popular augmenting and data-splitting algorithms, including the alternating direction method of multipliers, we propose a two-stage procedure for ℓ0-penalty variable selection, referred to as augmented penalized minimization-L0 (APM-L0). APM-L0 targets the ℓ0-norm as closely as possible while keeping computation tractable, efficient, and simple, which is achieved by iterating between a convex regularized regression and a simple hard-thresholding estimation. The procedure can be viewed as arising from regularized optimization with a truncated ℓ1 norm. Thus, we propose to treat the regularization parameter and the thresholding parameter as tuning parameters and to select them by cross-validation. A one-step coordinate descent algorithm is used in the first stage to significantly improve computational efficiency.
Through extensive simulation studies and real data application, we demonstrate superior performance of the proposed method in terms of selection accuracy and computational speed as compared to existing methods. The proposed APM-L0 procedure is implemented in the R-package APML0.
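The core iterate-then-threshold idea described above can be sketched in a few lines of Python. This is an illustrative toy, not the algorithm implemented in the APML0 package: it uses a plain ridge solve for the convex stage, and the function name, penalty, and tuning values are hypothetical.

```python
import numpy as np

def apm_l0_sketch(X, y, lam=0.1, threshold=0.5, n_iter=50):
    """Toy two-stage sketch of the iterate-then-threshold idea:
    alternate a ridge-regularized least-squares fit with
    hard-thresholding of small coefficients. Illustration only,
    not the APML0 package's actual algorithm."""
    n, p = X.shape
    beta = np.zeros(p)
    active = np.ones(p, dtype=bool)
    for _ in range(n_iter):
        # Stage 1: convex regularized regression on the active set (ridge here)
        Xa = X[:, active]
        G = Xa.T @ Xa + lam * np.eye(Xa.shape[1])
        beta = np.zeros(p)
        beta[active] = np.linalg.solve(G, Xa.T @ y)
        # Stage 2: hard-threshold, approximating the l0 penalty
        new_active = np.abs(beta) > threshold
        if new_active.sum() == 0 or np.array_equal(new_active, active):
            active = new_active
            break
        active = new_active
    beta[~active] = 0.0
    return beta
```

In practice both `lam` and `threshold` would be chosen by cross-validation, mirroring the tuning-parameter treatment described in the abstract.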
|
45
|
Jaspers S, Komárek A, Aerts M. Bayesian estimation of multivariate normal mixtures with covariate-dependent mixing weights, with an application in antimicrobial resistance monitoring. Biom J 2018; 60:7-19. [PMID: 28898442 DOI: 10.1002/bimj.201600253] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2016] [Revised: 07/04/2017] [Accepted: 07/07/2017] [Indexed: 11/05/2022]
Abstract
Bacteria with reduced susceptibility to antimicrobials pose a major threat to public health. Therefore, large programs have been set up to collect minimum inhibitory concentration (MIC) values, which can be used to monitor the distribution of nonsusceptible isolates in the general population. Data are collected in several countries and over a number of years. In addition, the sampled bacterial isolates are tested for susceptibility not against a single antimicrobial, but against an entire range of substances. Interest is therefore in the analysis of the joint distribution of MIC data on two or more antimicrobials, while accounting for a possible effect of covariates. In this regard, we present a Bayesian semiparametric density estimation routine based on multivariate Gaussian mixtures. The mixing weights are allowed to depend on covariates, thereby allowing the user to detect changes over, for example, time. The new approach was applied to data collected in Europe in 2010, 2012, and 2013. We investigated the susceptibility of Escherichia coli isolates to ampicillin and trimethoprim and found a significant increase in the proportion of nonsusceptible isolates. In addition, a simulation study was carried out, showing the promising behavior of the proposed method in the field of antimicrobial resistance.
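The model form described above, a multivariate normal mixture whose mixing weights depend on covariates, can be sketched directly. Here the weights follow a multinomial-logit (softmax) link, which is one natural choice and an assumption of this sketch rather than the paper's exact specification; the function and parameter names are hypothetical.

```python
import numpy as np

def mixture_density(y, x, means, covs, gamma):
    """Density of a multivariate normal mixture whose mixing weights
    depend on a covariate vector x through a softmax link. 'gamma'
    (K x q) holds hypothetical weight-regression coefficients; this
    sketches the model form, not the paper's Bayesian sampler."""
    K = len(means)
    # covariate-dependent weights: softmax of linear predictors
    eta = gamma @ x
    w = np.exp(eta - eta.max())
    w /= w.sum()
    dens = 0.0
    for k in range(K):
        d = y - means[k]
        inv = np.linalg.inv(covs[k])
        norm = np.sqrt((2 * np.pi) ** len(y) * np.linalg.det(covs[k]))
        dens += w[k] * np.exp(-0.5 * d @ inv @ d) / norm
    return dens
```

Because the weights move with `x`, evaluating the density over, say, successive years shows how the mass near a "nonsusceptible" component can grow over time, which is the monitoring use case in the abstract.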
|
46
|
Fang EX, Ning Y, Liu H. Testing and Confidence Intervals for High Dimensional Proportional Hazards Model. J R Stat Soc Series B Stat Methodol 2017; 79:1415-1437. [PMID: 37854943 PMCID: PMC10584375 DOI: 10.1111/rssb.12224] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
This paper proposes a decorrelation-based approach to testing hypotheses and constructing confidence intervals for the low dimensional component of high dimensional proportional hazards models. Motivated by the geometric projection principle, we propose new decorrelated score, Wald, and partial likelihood ratio statistics. Without assuming model selection consistency, we prove the asymptotic normality of these test statistics and establish their semiparametric optimality. We also develop new procedures for constructing pointwise confidence intervals for the baseline hazard function and the baseline survival function. Thorough numerical results are provided to back up our theory.
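The geometric projection idea is easiest to see in a toy linear-model analogue (the paper works with the Cox partial likelihood): residualize the covariate of interest against the nuisance covariates, so that first-order estimation error in the nuisance part drops out of the score. The sketch below is that analogue, with hypothetical names, not the paper's procedure.

```python
import numpy as np

def decorrelated_score(X, y, beta_hat, j=0):
    """Decorrelated score statistic for coordinate j in a linear model,
    as a toy analogue of the projection idea: regress column j on the
    remaining columns and form the score along the residualized
    direction. Returns an approximately N(0,1) statistic under the
    null beta_j = 0, given a null-constrained fit beta_hat."""
    n, p = X.shape
    others = [k for k in range(p) if k != j]
    Xo = X[:, others]
    # least-squares projection of X_j onto the nuisance columns
    w, *_ = np.linalg.lstsq(Xo, X[:, j], rcond=None)
    z = X[:, j] - Xo @ w                 # decorrelated direction
    resid = y - X @ beta_hat
    score = z @ resid / n
    var = (z * z) @ (resid * resid) / n ** 2   # plug-in variance of the score
    return score / np.sqrt(var)
```

The point of the residualization is that an imperfect estimate of the nuisance coefficients perturbs the score only at second order, which is what makes inference valid without model selection consistency.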
|
47
|
Yousefi A, Dougherty DD, Eskandar EN, Widge AS, Eden UT. Estimating Dynamic Signals From Trial Data With Censored Values. COMPUTATIONAL PSYCHIATRY (CAMBRIDGE, MASS.) 2017; 1:58-81. [PMID: 29601047 PMCID: PMC5774187 DOI: 10.1162/cpsy_a_00003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/03/2016] [Accepted: 04/05/2017] [Indexed: 11/12/2022]
Abstract
Censored data occur commonly in trial-structured behavioral experiments and many other forms of longitudinal data, and they can lead to severe bias and reduced statistical power in subsequent analyses. Principled approaches for dealing with censored data, such as data imputation and methods based on the complete data's likelihood, work well for estimating fixed features of statistical models but have not been extended to dynamic measures, such as serial estimates of an underlying latent variable over time. Here we propose an approach to the censored-data problem for dynamic behavioral signals. We developed a state-space modeling framework with a censored observation process at the trial timescale. We then developed a filter algorithm to compute the posterior distribution of the state process using the available data. We showed that special cases of this framework can incorporate the three most common approaches to censored observations: ignoring trials with censored data, imputing the censored data values, or using the full information available in the data likelihood. Finally, we derived a computationally efficient approximate Gaussian filter that is similar in structure to a Kalman filter but efficiently accounts for censored data. We compared the performance of these methods in a simulation study and provide recommendations on which approach to use, based on the expected amount of censored data in an experiment. These new techniques can be applied broadly in many research domains in which censored data interfere with estimation, including survival analysis and other clinical trial applications.
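The censored-observation update can be illustrated with a scalar sketch: when a trial's value is right-censored at a known limit, the Gaussian measurement update is replaced by the mean and variance of the predictive distribution truncated to values at or above the limit. This is a rough, hypothetical rendering of the idea, not the authors' filter, and the model constants are made up.

```python
import math

def censored_kalman_step(m, P, y, limit, A=1.0, Q=0.01, R=0.04):
    """One predict/update step of a scalar state-space filter in which
    observations at or above 'limit' are right-censored. Censored
    trials use truncated-normal moments of the predictive density
    in place of the raw observation. Illustrative sketch only."""
    # predict
    m_pred = A * m
    P_pred = A * P * A + Q
    S = P_pred + R                 # predictive variance of the observation
    K = P_pred / S
    if y < limit:                  # fully observed: standard Kalman update
        m_new = m_pred + K * (y - m_pred)
        P_new = (1 - K) * P_pred
    else:                          # censored: we only know y >= limit
        s = math.sqrt(S)
        a = (limit - m_pred) / s
        phi = math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)
        Phi_bar = 0.5 * math.erfc(a / math.sqrt(2.0))   # P(obs >= limit)
        lam = phi / Phi_bar                             # inverse Mills ratio
        y_bar = m_pred + s * lam                        # E[obs | obs >= limit]
        v_bar = S * (1 + a * lam - lam * lam)           # Var[obs | obs >= limit]
        m_new = m_pred + K * (y_bar - m_pred)
        P_new = (1 - K) * P_pred + K * K * v_bar        # extra spread from censoring
    return m_new, P_new
```

A censored trial still moves the state estimate (toward the limit) but leaves more posterior uncertainty than a fully observed one, which is the qualitative behavior the abstract describes.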
|
48
|
Lee AJ, Marder K, Alcalay RN, Mejia-Santana H, Orr-Urtreger A, Giladi N, Bressman S, Wang Y. Estimation of genetic risk function with covariates in the presence of missing genotypes. Stat Med 2017; 36:3533-3546. [PMID: 28656686 PMCID: PMC5583003 DOI: 10.1002/sim.7376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2016] [Revised: 02/28/2017] [Accepted: 05/30/2017] [Indexed: 12/13/2022]
Abstract
In genetic epidemiological studies, family history data are collected on relatives of study participants and used to estimate the age-specific risk of disease for individuals who carry a causal mutation. However, a family member's genotype data may not be collected, because of the high cost of an in-person interview to obtain a blood sample or because of the death of the relative. Previously, efficient nonparametric genotype-specific risk estimation in censored mixture data was proposed without considering covariates. With multiple predictive risk factors available, risk estimation requires a multivariate model that accounts for additional covariates that may simultaneously affect disease risk. It is therefore important to consider the role of covariates in estimating genotype-specific distributions from family history data. We propose an estimation method that permits more precise risk prediction by controlling for individual characteristics and incorporating interaction effects with missing genotypes in relatives; gene-gene and gene-environment interactions can thus be handled within the framework of a single model. We examine the performance of the proposed methods by simulation and apply them to estimate the age-specific cumulative risk of Parkinson's disease (PD) in carriers of the LRRK2 G2019S mutation using first-degree relatives who are at genetic risk for PD. The utility of the estimated carrier risk is demonstrated by designing a future clinical trial under various assumptions. Such sample size estimation is seen in the Huntington's disease literature, using the length of the abnormal expansion of a CAG repeat in the HTT gene, but is less common in the PD literature. Copyright © 2017 John Wiley & Sons, Ltd.
|
49
|
Han X, Zhang Y, Shao Y. On comparing 2 correlated C indices with censored survival data. Stat Med 2017; 36:4041-4049. [PMID: 28758216 DOI: 10.1002/sim.7414] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2016] [Revised: 06/01/2017] [Accepted: 06/21/2017] [Indexed: 12/21/2022]
Abstract
As new biomarkers and risk prediction procedures are in rapid development, it is of great interest to develop valid methods for comparing the predictive power of 2 biomarkers or risk score systems. The Harrell C statistic has been routinely used as a global adequacy assessment of a risk score system, and the difference of 2 Harrell C statistics has been suggested in recent literature as a test statistic for comparing the predictive power of 2 biomarkers for a censored outcome. In this study, we showed that such a test can have severely inflated type I error, because the difference between the 2 Harrell C statistics does not converge to zero under the null hypothesis of equal predictive power measured by concordance probabilities, as illustrated by 2 counterexamples and corresponding numerical simulations. We further investigate a necessary and sufficient condition under which the difference of 2 Harrell C statistics converges to zero under the null hypothesis.
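For concreteness, Harrell's C itself can be computed with a direct double loop over usable pairs, that is, pairs whose earlier time is an observed event; the sketch below is the standard estimator written plainly, with ties in score counted as one half.

```python
def harrell_c(time, event, score):
    """Harrell's concordance index for right-censored data: among
    usable pairs (the earlier time is an observed event), the fraction
    in which the higher risk score belongs to the subject who failed
    earlier. Ties in score count as 1/2."""
    num, den = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # pair is usable if subject i has an observed event before time j
            if event[i] and time[i] < time[j]:
                den += 1.0
                if score[i] > score[j]:
                    num += 1.0
                elif score[i] == score[j]:
                    num += 0.5
    return num / den
```

The abstract's warning is precisely about comparing two such indices: each C is a valid summary on its own, but the difference of two correlated C statistics need not vanish under equal concordance probabilities, so it is not automatically a valid test statistic.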
|
50
|
Procházka B, Kynčl J. Estimating the Baseline Incidence of a Seasonal Disease Independently of Epidemic Outbreaks. Cent Eur J Public Health 2017; 24:199-205. [PMID: 27760285 DOI: 10.21101/cejph.a4800] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2015] [Accepted: 09/23/2016] [Indexed: 11/15/2022]
Abstract
In epidemiology, it is very important to estimate the baseline incidence of infectious diseases, but the available data are often subject to outliers due to epidemic outbreaks. If not adjusted for outliers, the baseline incidence estimate is biased, and so is the predicted epidemic threshold, a crucial reference indicator used to suspect and detect an epidemic outbreak; an overestimated threshold makes the detection of an outbreak more difficult. Another problem is that the "usual" incidence varies with the season, i.e. it may not be constant throughout the year, is often periodic, and may also show a trend between years. To take account of these factors, more complicated models adjusted for outliers are used. The classical Serfling model is based on a sine function with a phase shift and amplitude. Multiple approaches have been applied to model the long-term and seasonal trends; nevertheless, none of them controls for the effect of epidemic outbreaks. The present article deals with the adjustment of data biased by epidemic outbreaks and presents models adjusted for outliers, i.e. for the effect of epidemic outbreaks. One possible option is to remove the epidemic weeks from the analysis, but then, in some calendar weeks, data will be available only for a small number of years. Furthermore, the detection of an epidemic outbreak by experts (epidemiologists and microbiologists) is compared with detection by the various models.
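A minimal Serfling-type fit can be sketched as a linear trend plus one annual sine/cosine harmonic (an equivalent parameterization of a sine with phase shift and amplitude), followed by a crude outlier-removal refit as a stand-in for excluding epidemic weeks. The function name and the 2-sigma cutoff are assumptions of this sketch, not the article's method.

```python
import math
import numpy as np

def serfling_baseline(weeks, incidence, n_drop_sigma=2.0):
    """Fit a Serfling-type baseline: linear trend plus one annual
    sine/cosine harmonic, then refit after excluding weeks whose
    residual exceeds n_drop_sigma standard deviations (a crude
    stand-in for removing epidemic weeks). Illustrative sketch."""
    t = np.asarray(weeks, dtype=float)
    y = np.asarray(incidence, dtype=float)
    ang = 2 * math.pi * t / 52.0          # one cycle per ~52-week year
    X = np.column_stack([np.ones_like(t), t, np.sin(ang), np.cos(ang)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    keep = np.abs(resid) <= n_drop_sigma * resid.std()
    coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return X @ coef                        # baseline prediction for every week
```

The `a*sin(ang) + b*cos(ang)` pair recovers the phase-shifted sine because `A*sin(ang + φ) = A*cos(φ)*sin(ang) + A*sin(φ)*cos(ang)`, so the fit stays linear in its coefficients.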
|