1
Wang X, Lee H, Haaland B, Kerrigan K, Puri S, Akerley W, Shen J. A matching-based machine learning approach to estimating optimal dynamic treatment regimes with time-to-event outcomes. Stat Methods Med Res 2024; 33:794-806. PMID: 38502008. DOI: 10.1177/09622802241236954.
Abstract
Observational data (e.g. electronic health records) have become increasingly important in evidence-based research on dynamic treatment regimes, which tailor treatments over time to patients based on their characteristics and evolving clinical history. It is of great interest for clinicians and statisticians to identify an optimal dynamic treatment regime that produces the best expected clinical outcome for each individual and thus maximizes the treatment benefit over the population. Observational data impose various challenges for using statistical tools to estimate optimal dynamic treatment regimes. Notably, the task becomes more sophisticated when the clinical outcome of primary interest is time-to-event. Here, we propose a matching-based machine learning method to identify the optimal dynamic treatment regime with time-to-event outcomes subject to right-censoring using electronic health record data. In contrast to established inverse probability weighting-based dynamic treatment regime methods, our proposed approach provides better protection against model misspecification and extreme weights in the context of treatment sequences, effectively addressing a prevalent challenge in the longitudinal analysis of electronic health record data. In simulations, the proposed method demonstrates robust performance across a range of scenarios. In addition, we illustrate the method with an application estimating optimal dynamic treatment regimes for patients with advanced non-small cell lung cancer using a real-world, nationwide electronic health record database from Flatiron Health.
Affiliation(s)
- Xuechen Wang
- Department of Population Health Sciences, Division of Biostatistics, University of Utah, Salt Lake City, UT, USA
- Hyejung Lee
- Department of Population Health Sciences, Division of Biostatistics, University of Utah, Salt Lake City, UT, USA
- Benjamin Haaland
- Department of Population Health Sciences, Division of Biostatistics, University of Utah, Salt Lake City, UT, USA
- Kathleen Kerrigan
- Department of Internal Medicine, Division of Oncology, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
- Sonam Puri
- Department of Internal Medicine, Division of Oncology, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
- Wallace Akerley
- Department of Internal Medicine, Division of Oncology, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
- Jincheng Shen
- Department of Population Health Sciences, Division of Biostatistics, University of Utah, Salt Lake City, UT, USA
2
Wang WL, Castro LM, Li HJ, Lin TI. Mixtures of t factor analysers with censored responses and external covariates: An application to educational data from Peru. Br J Math Stat Psychol 2024; 77:316-336. PMID: 38095333. DOI: 10.1111/bmsp.12329.
Abstract
Analysing data from educational tests allows governments to make decisions for improving the quality of life of individuals in a society. One of the key responsibilities of statisticians is to develop models that provide decision-makers with pertinent information about the latent process that educational tests seek to represent. Mixtures of t factor analysers (MtFA) have emerged as a powerful device for model-based clustering and classification of high-dimensional data containing one or several groups of observations with fatter tails or anomalous outliers. This paper considers an extension of MtFA for robust clustering of censored data, referred to as the MtFAC model, by incorporating external covariates. The enhanced flexibility of including covariates in MtFAC enables cluster-specific multivariate regression analysis of dependent variables with censored responses arising from upper and/or lower detection limits of experimental equipment. An alternating expectation conditional maximization (AECM) algorithm is developed for maximum likelihood estimation of the proposed model. Two simulation experiments are conducted to examine the effectiveness of the techniques presented. Furthermore, the proposed methodology is applied to Peruvian data from the 2007 Early Grade Reading Assessment, and the results obtained from the analysis provide new insights regarding the reading skills of Peruvian students.
Affiliation(s)
- Wan-Lun Wang
- Department of Statistics and Institute of Data Science, National Cheng Kung University, Tainan, Taiwan
- Luis M Castro
- Department of Statistics, Pontificia Universidad Católica de Chile, Santiago, Chile
- Center for the Discovery of Structures in Complex Data, Santiago, Chile
- Huei-Jyun Li
- Institute of Statistics, National Chung Hsing University, Taichung, Taiwan
- Tsung-I Lin
- Institute of Statistics, National Chung Hsing University, Taichung, Taiwan
- Department of Public Health, China Medical University, Taichung, Taiwan
3
Henderson NC, Nam K, Feng D. Nonparametric analysis of delayed treatment effects using single-crossing constraints. Biom J 2024; 66:e2200165. PMID: 38403463. DOI: 10.1002/bimj.202200165.
Abstract
Clinical trials involving novel immuno-oncology therapies frequently exhibit survival profiles that violate the proportional hazards assumption due to a delay in treatment effect; in such settings, the survival curves of the two treatment arms may cross before they eventually separate. To flexibly model such scenarios, we describe a nonparametric approach for estimating the treatment arm-specific survival functions that constrains the two survival functions to cross at most once, without making any additional assumptions about how the survival curves are related. A main advantage of our approach is that it provides an estimate of the crossing time if such a crossing exists; moreover, our method generates interpretable measures of treatment benefit, including crossing-conditional survival probabilities and crossing-conditional estimates of restricted residual mean life. These estimates may be used together with efficacy measures from a primary analysis to provide further insight into differences in survival across treatment arms. We demonstrate the use and effectiveness of our approach with a large simulation study and an analysis of reconstructed outcomes from a recent combination therapy trial.
Affiliation(s)
- Kijoeng Nam
- BARDS, Merck & Co., Inc., North Wales, Pennsylvania, USA
- Dai Feng
- Data and Statistical Sciences, AbbVie Inc., North Chicago, Illinois, USA
4
Ye P, Bai S, Tang W, Feng H, Qiao X, Tu S, He H. Joint modeling approaches for censored predictors due to detection limits with applications to metabolites data. Stat Med 2024; 43:674-688. PMID: 38043523. DOI: 10.1002/sim.9978.
Abstract
Measures of substance concentration in urine, serum, or other biological matrices often have an assay limit of detection. When concentration levels fall below the limit, exact measures cannot be obtained and are thus left-censored. The problem becomes more challenging when the censored data come from heterogeneous populations consisting of exposed and non-exposed subjects. For non-exposed subjects, the measures are always zero and hence censored, forming a latent class governed by a censoring mechanism distinct from that of the exposed subjects, whose censored measurements are greater than zero but less than the detection limit. Exposed and non-exposed subjects often have different disease traits or different relationships with outcomes of interest, so the two populations must be disentangled for valid inference. In this article, we aim to fill the methodological gaps in the literature by developing a novel joint modeling approach that not only addresses the censoring issue in predictors, but also untangles the different relationships of exposed and non-exposed subjects with the outcome. Simulation studies assess the numerical performance of the proposed approach when the sample size is small to moderate. The joint modeling approach is also applied to examine associations between plasma metabolites and blood pressure in the Bogalusa Heart Study and to identify new metabolites that are highly associated with blood pressure.
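The latent-class censoring structure described in this abstract can be illustrated with a small simulation (a hypothetical sketch with an assumed exposure rate, concentration distribution, and detection limit, not the authors' joint model): non-exposed subjects have a true concentration of zero and are always censored, while exposed subjects are censored only when their concentration falls below the detection limit.

```python
import random

random.seed(1)
LOD = 0.5  # assumed limit of detection (arbitrary units)

def simulate_subject():
    """Return (observed_value, is_censored, is_exposed) for one subject."""
    exposed = random.random() < 0.7                 # assumed 70% exposure rate
    true_conc = random.lognormvariate(0.0, 1.0) if exposed else 0.0
    if true_conc < LOD:
        return (LOD, True, exposed)                 # recorded as censored at the LOD
    return (true_conc, False, exposed)

subjects = [simulate_subject() for _ in range(1000)]
cens_nonexposed = sum(c for _, c, e in subjects if not e)
cens_exposed = sum(c for _, c, e in subjects if e)
print(cens_nonexposed, cens_exposed)
```

In the authors' approach, the censored values from the two latent classes are modeled jointly under distinct censoring mechanisms rather than being treated as a single group truncated at the detection limit.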
Affiliation(s)
- Peng Ye
- School of Statistics, University of International Business and Economics, Beijing, China
- Shuo Bai
- Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA
- Wan Tang
- Department of Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA
- Han Feng
- Tulane Research and Innovation for Arrhythmia Discovery (TRIAD) Center, School of Medicine, Tulane University, New Orleans, Louisiana, USA
- Xinhua Qiao
- School of Statistics, University of International Business and Economics, Beijing, China
- Shengjia Tu
- Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, La Jolla, California, USA
- Hua He
- Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA
- Department of Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA
5
You N, He X, Dai H, Wang X. Ball divergence for the equality test of crossing survival curves. Stat Med 2023; 42:5353-5368. PMID: 37752757. DOI: 10.1002/sim.9914.
Abstract
Testing survival equality with right-censored time-to-event data is a very common problem in clinical research. Although the log-rank test is widely used, it may become insensitive when the proportional hazards assumption is violated. Accordingly, a variety of statistical methods have been proposed to identify discrepancies between crossing survival curves or hazard functions. Omnibus tests against general alternatives are usually preferred due to their wide applicability to complicated scenarios in real applications. In this paper, we propose two novel statistics that estimate the ball divergence from right-censored survival data, and we apply them to test the equality of survival time between two independent groups. Simulation analysis demonstrates their efficiency in identifying survival discrepancies. Compared to existing methods, the proposed methods show higher power in situations with complex distributions, especially when there is a scale shift between groups. Real examples illustrate their advantages in practical applications.
Affiliation(s)
- Na You
- School of Mathematics, Sun Yat-sen University, Guangdong, China
- Department of Mathematical Sciences, University of Essex, Colchester, UK
- Xueyi He
- School of Mathematics, Sun Yat-sen University, Guangdong, China
- Hongsheng Dai
- Department of Mathematical Sciences, University of Essex, Colchester, UK
- School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK
- Xueqin Wang
- School of Management, University of Science and Technology of China, Anhui, China
6
Grazian C. Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations. Stat Methods Med Res 2023; 32:2423-2439. PMID: 37920984. PMCID: PMC10710010. DOI: 10.1177/09622802231211010.
Abstract
Antimicrobial resistance is becoming a major threat to public health throughout the world. Researchers are attempting to counter it by developing both new antibiotics and patient-specific treatments. In the second case, whole-genome sequencing has had a huge impact in two ways: first, it is becoming cheaper and faster to perform, which makes it competitive with standard phenotypic tests; second, it makes it possible to statistically associate phenotypic patterns of resistance with specific mutations in the genome. Therefore, it is now possible to develop catalogues of genomic variants associated with resistance to specific antibiotics, in order to improve prediction of resistance and suggest treatments. It is essential to have robust methods for identifying mutations associated with resistance and for continuously updating the available catalogues. This work proposes a general method to study minimal inhibitory concentration distributions and to identify clusters of strains showing different levels of resistance to antimicrobials. Once the clusters are identified and strains are allocated to them, regression methods can be applied to identify, with high statistical power, the mutations associated with resistance. The method is applied to a new 96-well microtiter plate used for testing Mycobacterium tuberculosis.
Affiliation(s)
- Clara Grazian
- School of Mathematics and Statistics, University of Sydney, NSW, Australia
- ARC Training Centre in Data Analytics for Resources and Environments (DARE), Australia
7
Franchini F, Fedyashov V, IJzerman MJ, Degeling K. Implementing competing risks in discrete event simulation: the event-specific probabilities and distributions approach. Front Pharmacol 2023; 14:1255021. PMID: 37964874. PMCID: PMC10642769. DOI: 10.3389/fphar.2023.1255021.
Abstract
Background: Although several strategies for modelling competing events in discrete event simulation (DES) exist, a methodological gap remains for the event-specific probabilities and distributions (ESPD) approach when dealing with censored data. This study defines and illustrates the ESPD strategy for censored data. Methods: The ESPD approach assumes that events are generated through a two-step process. First, the type of event is selected according to some (unknown) mixture proportions. Next, the times of occurrence of the events are sampled from a corresponding survival distribution. Both steps can be modelled based on covariates. Performance was evaluated through a simulation study, considering sample size and level of censoring. Additionally, an oncology-related case study was conducted to assess the ability to produce realistic results and to demonstrate the implementation in both frequentist and Bayesian frameworks in R. Results: The simulation study showed good performance of the ESPD approach, with accuracy decreasing as sample sizes decreased and censoring levels increased. The average relative absolute error of the event probability (95% confidence interval) ranged from 0.04 (0.00; 0.10) to 0.23 (0.01; 0.66) for 60% censoring and a sample size of 50. The approach yielded realistic results in the case study. Discussion: The ESPD approach can be used to model competing events in DES based on censored data. Further research is warranted to compare the approach to other modelling approaches for DES and to evaluate its usefulness in estimating cumulative event incidences in a broader context.
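The two-step event-generation process described in the Methods can be sketched in a few lines. This is an illustrative Python sketch with assumed event probabilities and exponential time-to-event distributions; the paper itself implements the approach in R under frequentist and Bayesian frameworks.

```python
import random

random.seed(42)

# Assumed event-specific probabilities and time distributions (illustrative only)
EVENTS = {
    "progression": {"p": 0.6, "rate": 1 / 12.0},  # mean time-to-event: 12 months
    "death":       {"p": 0.3, "rate": 1 / 30.0},
    "dropout":     {"p": 0.1, "rate": 1 / 24.0},
}

def sample_next_event():
    """Step 1: draw the event type from the mixture; step 2: draw its time."""
    names = list(EVENTS)
    event = random.choices(names, weights=[EVENTS[n]["p"] for n in names])[0]
    time = random.expovariate(EVENTS[event]["rate"])
    return event, time

draws = [sample_next_event() for _ in range(10_000)]
share_progression = sum(e == "progression" for e, _ in draws) / len(draws)
print(round(share_progression, 2))  # close to the assumed 0.6
```

In the ESPD approach both the mixture proportions and the survival distributions are estimated from (possibly censored) data, and each step can depend on covariates.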
Affiliation(s)
- Fanny Franchini
- Cancer Health Services Research, Centre for Health Policy, Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Cancer Health Services Research, Centre for Cancer Research, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Victor Fedyashov
- ARC Training Centre in Cognitive Computing for Medical Technologies, The University of Melbourne, Parkville, VIC, Australia
- Maarten J. IJzerman
- Cancer Health Services Research, Centre for Health Policy, Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Cancer Health Services Research, Centre for Cancer Research, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Department of Cancer Research, Peter MacCallum Cancer Centre, Melbourne, Australia
- Erasmus School of Health Policy & Management, Erasmus University, Rotterdam, Netherlands
- Koen Degeling
- Cancer Health Services Research, Centre for Health Policy, Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Cancer Health Services Research, Centre for Cancer Research, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
8
Hafner J, Fenner K, Scheidegger A. Systematic Handling of Environmental Fate Data for Model Development-Illustrated for the Case of Biodegradation Half-Life Data. Environ Sci Technol Lett 2023; 10:859-864. PMID: 37840818. PMCID: PMC10569042. DOI: 10.1021/acs.estlett.3c00526.
Abstract
The assessment of environmental hazard indicators such as persistence, mobility, toxicity, or bioaccumulation of chemicals often results in highly variable experimental outcomes. Persistence is particularly affected due to a multitude of influencing environmental factors, with biodegradation experiments resulting in half-lives spanning several orders of magnitude. Also, half-lives may lie beyond the limits of reliable half-life quantification, and the number of available data points per substance may vary considerably, requiring a statistically robust approach for the characterization of data. Here, we apply Bayesian inference to address these challenges and characterize the distributions of reported soil half-lives. Our model estimates the mean, standard deviation, and corresponding uncertainties from a set of reported half-lives experimentally obtained for a single substance. We apply our inference model to 893 pesticides and pesticide transformation products with experimental soil half-lives of varying data quantity and quality, and we infer the half-life distribution for each compound. By estimating average half-lives, their experimental variability, and the uncertainty of the estimations, we provide a reliable data source for building predictive models, which are urgently needed by regulatory authorities to manage existing chemicals and by industry to design benign, nonpersistent chemicals. Our approach can be readily adapted for other environmental hazard indicators.
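As a deliberately simplified stand-in for the inference described above (the paper fits a full Bayesian model that also quantifies the uncertainty of the estimates), the reported half-lives for a single compound can be summarized on the log scale, where the geometric mean and log-scale spread are natural because half-lives span orders of magnitude. The values below are hypothetical:

```python
import math
import statistics

# Hypothetical reported soil half-lives (days) for one substance
half_lives = [3.2, 11.0, 8.5, 45.0, 6.7, 120.0]

logs = [math.log10(t) for t in half_lives]
mu = statistics.mean(logs)        # log10 of the geometric-mean half-life
sigma = statistics.stdev(logs)    # between-study spread in log10 units
geo_mean = 10 ** mu

print(round(geo_mean, 1), round(sigma, 2))
```

The Bayesian model in the paper goes further: it handles half-lives censored at the limits of reliable quantification and propagates the uncertainty arising from compounds with very few reported values.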
Affiliation(s)
- Jasmin Hafner
- Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Zürich, Switzerland
- University of Zürich, 8057 Zürich, Switzerland
- Kathrin Fenner
- Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Zürich, Switzerland
- University of Zürich, 8057 Zürich, Switzerland
- Andreas Scheidegger
- Swiss Federal Institute of Aquatic Science and Technology (Eawag), 8600 Dübendorf, Zürich, Switzerland
9
Hussein M, Rodrigues GM, Ortega EMM, Vila R, Elsayed H. A New Truncated Lindley-Generated Family of Distributions: Properties, Regression Analysis, and Applications. Entropy (Basel) 2023; 25:1359. PMID: 37761658. PMCID: PMC10528314. DOI: 10.3390/e25091359.
Abstract
We present the truncated Lindley-G (TLG) model, a novel class of probability distributions with an additional shape parameter, obtained by composing a unit distribution called the truncated Lindley distribution with a parent distribution function G(x). The proposed model's characteristics, including critical points, moments, generating function, quantile function, mean deviations, and entropy, are discussed. We also introduce a regression model based on the truncated Lindley-Weibull distribution considering two systematic components. The model parameters are estimated using the maximum likelihood method. To investigate the behavior of the estimators, simulations are run for various parameter settings, censoring percentages, and sample sizes. Four real datasets are used to demonstrate the new model's potential.
Affiliation(s)
- Mohamed Hussein
- Department of Mathematics and Computer Science, Alexandria University, Alexandria 21544, Egypt
- Department of Business Administration, College of Business, King Khalid University, Abha 61421, Saudi Arabia
- Gabriela M. Rodrigues
- Department of Exact Sciences, University of São Paulo, Piracicaba 13418-900, Brazil
- Edwin M. M. Ortega
- Department of Exact Sciences, University of São Paulo, Piracicaba 13418-900, Brazil
- Roberto Vila
- Department of Statistics, University of Brasilia, Brasilia 70910-900, Brazil
- Howaida Elsayed
- Department of Business Administration, College of Business, King Khalid University, Abha 61421, Saudi Arabia
10
Brožová K, Michalec J, Brabec M, Bořilová P, Kohout P, Brož J. Dynamics of glucose concentration during the initiation of ketogenic diet treatment in children with refractory epilepsy: Results of continuous glucose monitoring. Epilepsia Open 2023; 8:1021-1027. PMID: 37345572. PMCID: PMC10472364. DOI: 10.1002/epi4.12778.
Abstract
OBJECTIVE The ketogenic diet (KD) is a diet low in carbohydrates and rich in fats that has long been used to treat refractory epilepsy. The metabolic changes related to the KD may increase the risk of hypoglycemia, especially during the first days. This study focused on the impact of KD initiation on glycemia in non-diabetic patients with refractory epilepsy. METHODS The subjects were 10 pediatric patients (6 boys, mean age 6.1 ± 2.4 years) treated for intractable epilepsy. A blinded Dexcom G4 continuous glucose monitoring (CGM) system was used. Patients started on their regular diet in the first 36 hours of monitoring, followed by an increase in lipid intake and a gradual reduction of carbohydrates (fat:nonfat ratios 1:1, 2:1, 3:1, and 3.5:1). We analyzed changes in glycemia across these ratio changes using a generalized linear model. RESULTS The mean monitored time per person was 6 days, 10 hours and 44 minutes. The mean ± SD glycemia was 4.84 ± 0.20 mmol/L on the regular diet, 4.03 ± 0.16 at a ratio of 1:1, 3.57 ± 0.10 at 2:1, 3.39 ± 0.13 at 3:1, and 2.79 ± 0.06 mmol/L at the final ratio of 3.5:1 (P < 0.001). The proportions of time spent at glycemia ≤3.5 mmol/L (≤2.5 mmol/L, respectively) were 0.88% (0.31%) of the monitored period on the normal diet, 1.92% (0.95%) at the 1:1 KD ratio, 3.18% (1.02%) at 2:1, and 13.64% (2.36%) at the 3:1 and 3.5:1 ratios (P < 0.05). SIGNIFICANCE Continuous glucose monitoring shows the dynamics of glucose concentration during initiation of ketogenic diet treatment. It may be a useful tool for monitoring the effects of this diet on glucose metabolism, especially for detecting hypoglycemia.
Affiliation(s)
- Klára Brožová
- Department of Pediatric Neurology, Thomayer University Hospital, Prague, Czech Republic
- Third Medical Faculty, Charles University, Prague, Czech Republic
- Juraj Michalec
- Department of Internal Medicine, Second Faculty of Medicine, Charles University, Prague, Czech Republic
- Marek Brabec
- Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Petra Bořilová
- Department of Pediatric Neurology, Thomayer University Hospital, Prague, Czech Republic
- Pavel Kohout
- Department of Internal Medicine, Third Faculty of Medicine, Charles University and Thomayer University Hospital, Prague, Czech Republic
- Center of Nutrition, Thomayer University Hospital, Prague, Czech Republic
- Jan Brož
- Department of Internal Medicine, Second Faculty of Medicine, Charles University, Prague, Czech Republic
11
Bebu I, Diao G, Hamasaki T. Generalized fiducial inference for the restricted mean survival time. Stat Methods Med Res 2023:9622802231163333. PMID: 36974594. DOI: 10.1177/09622802231163333.
Abstract
The standard modeling approach for time-to-event outcomes subject to censoring is based on the hazard function, with hazard ratios capturing the effect of exposures on the risk of outcome. The restricted mean survival time, defined as the expected time to event up to a pre-specified time horizon, provides an alternative useful summary of time-to-event outcomes. Restricted mean survival time can be estimated nonparametrically and can be used to compare groups or interventions when the proportional hazards (PHs) assumption does not hold. Moreover, even when the proportional hazards assumption holds, the restricted mean survival time, an additive measure of risk, provides additional information to the hazard ratio, which is a measure of relative risk that can be difficult to interpret in absence of an estimate of the reference risk. Herein, a generalized fiducial approach is proposed for restricted mean survival time, and its asymptotic properties are investigated. Numerical simulations show the proposed approach provides one- and two-sided confidence intervals with coverage probabilities close to nominal values and controls the type-I error for two-group comparisons even for small sample sizes with a low number of events. Data from a type 1 diabetes study is used for illustration.
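The nonparametric restricted mean survival time mentioned above is simply the area under the Kaplan-Meier curve up to the horizon τ. A minimal sketch on toy data (in practice one would use an established package such as survRM2 in R or lifelines in Python, which also provide confidence intervals):

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] steps for right-censored data (event=1, censored=0)."""
    at_risk = len(times)
    surv, steps = 1.0, []
    for t, d in sorted(zip(times, events)):
        if d == 1:
            surv *= 1 - 1 / at_risk      # KM product-limit update at an event time
            steps.append((t, surv))
        at_risk -= 1                     # subject leaves the risk set either way
    return steps

def rmst(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t > tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

times = [2, 3, 5, 7, 8, 10, 12]          # toy follow-up times
events = [1, 0, 1, 1, 0, 1, 0]           # 1 = event observed, 0 = censored
print(round(rmst(times, events, 10), 2))  # 7.49
```

Unlike a hazard ratio, this quantity reads directly as "expected event-free time over the first τ units", which is what makes it attractive when proportional hazards fails.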
Affiliation(s)
- Ionut Bebu
- The Biostatistics Center, Department of Biostatistics and Bioinformatics, The George Washington University, Washington, MD, USA
- Guoqing Diao
- The Biostatistics Center, Department of Biostatistics and Bioinformatics, The George Washington University, Washington, MD, USA
- Toshimitsu Hamasaki
- The Biostatistics Center, Department of Biostatistics and Bioinformatics, The George Washington University, Washington, MD, USA
12
Zakkour A, Perret C, Slaoui Y. Stochastic Expectation Maximization Algorithm for Linear Mixed-Effects Model with Interactions in the Presence of Incomplete Data. Entropy (Basel) 2023; 25:473. PMID: 36981361. PMCID: PMC10047691. DOI: 10.3390/e25030473.
Abstract
The purpose of this paper is to propose a new algorithm based on stochastic expectation maximization (SEM) to deal with the problem of unobserved values when multiple interactions are present in a linear mixed-effects model (LMEM). We compare the effectiveness of the proposed algorithm with the stochastic approximation expectation maximization (SAEM) and Markov chain Monte Carlo (MCMC) algorithms. This comparison is implemented to highlight the importance of including the maximum effects that can affect the model. Applications are made to both simulated psychological data and real data. The findings demonstrate that our proposed SEM algorithm is highly preferable to the competitor algorithms.
Affiliation(s)
- Alandra Zakkour
- Laboratoire de Mathématiques et Applications, Université de Poitiers, 11 Boulevard Marie et Pierre Curie, 86962 Futuroscope Chasseneuil, CEDEX 9, 86073 Poitiers, France
- CeRCA-CNRS UMR 7295, Université de Poitiers, 5 rue T. Lefebvre, MSHS, CEDEX 9, 86073 Poitiers, France
- Cyril Perret
- Laboratoire de Mathématiques et Applications, Université de Poitiers, 11 Boulevard Marie et Pierre Curie, 86962 Futuroscope Chasseneuil, CEDEX 9, 86073 Poitiers, France
- CeRCA-CNRS UMR 7295, Université de Poitiers, 5 rue T. Lefebvre, MSHS, CEDEX 9, 86073 Poitiers, France
- Yousri Slaoui
- Laboratoire de Mathématiques et Applications, Université de Poitiers, 11 Boulevard Marie et Pierre Curie, 86962 Futuroscope Chasseneuil, CEDEX 9, 86073 Poitiers, France
13
Li W, Ma H, Faraggi D, Dinse GE. Generalized mean residual life models for survival data with missing censoring indicators. Stat Med 2023; 42:264-280. PMID: 36437483. DOI: 10.1002/sim.9615.
Abstract
The mean residual life (MRL) function is an important and attractive alternative to the hazard function for characterizing the distribution of a time-to-event variable. In this article, we study the modeling and inference of a family of generalized MRL models for right-censored survival data with censoring indicators missing at random. To estimate the model parameters, augmented inverse probability weighted estimating equation approaches are developed, in which the non-missingness probability and the conditional probability of an uncensored observation are estimated by parametric methods or nonparametric kernel smoothing techniques. Asymptotic properties of the proposed estimators are established and finite sample performance is evaluated by extensive simulation studies. An application to brain cancer data is presented to illustrate the proposed methods.
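Since the MRL function is less familiar than the hazard, a minimal nonparametric sketch may help: under right-censoring, MRL(t0) = ∫ S(u) du / S(t0) over u ≥ t0 can be estimated by plugging in the Kaplan-Meier curve, truncated at the largest event time. This illustrates only the estimand, not the paper's generalized MRL models or missing-censoring-indicator machinery; function names are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: unique event times and S(t) just after each."""
    uniq = np.unique(times[events == 1])
    surv, s = [], 1.0
    for u in uniq:
        at_risk = np.sum(times >= u)
        deaths = np.sum((times == u) & (events == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return uniq, np.array(surv)

def mean_residual_life(times, events, t0=0.0):
    """MRL(t0) = integral of S(u) over [t0, tau] divided by S(t0), with S the
    Kaplan-Meier step function and tau the largest event time (a common
    truncation under censoring)."""
    grid, surv = kaplan_meier(times, events)
    knots = np.concatenate(([0.0], grid))    # S = values[i] on [knots[i], knots[i+1])
    values = np.concatenate(([1.0], surv))
    tau = knots[-1]
    integral = 0.0
    for a, b, s in zip(knots[:-1], knots[1:], values[:-1]):
        integral += s * max(0.0, min(b, tau) - max(a, t0))
    s_t0 = values[np.searchsorted(knots, t0, side="right") - 1]
    return integral / s_t0
```

With fully observed data the Kaplan-Meier curve is the empirical survival function, so MRL(0) reduces to the sample mean, which is a quick sanity check.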
Affiliation(s)
- Wenwen Li
- KLATASDS-MOE, School of Statistics and Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, China
- Huijuan Ma
- KLATASDS-MOE, School of Statistics and Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, China
- David Faraggi
- KLATASDS-MOE, School of Statistics and Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, China; Department of Statistics, University of Haifa, Haifa, Israel
- Gregg E Dinse
- Public Health & Scientific Research, Social and Scientific Systems, Durham, North Carolina, USA

14
Wang H, Li Q, Liu Y. Regularized Buckley-James method for right-censored outcomes with block-missing multimodal covariates. Stat (Int Stat Inst) 2022; 11:e515. PMID: 37854542. PMCID: PMC10583730. DOI: 10.1002/sta4.515.
Abstract
High-dimensional data with censored outcomes of interest are prevalent in medical research. To analyze such data, the regularized Buckley-James estimator has been successfully applied to build accurate predictive models and conduct variable selection. In this paper, we consider the problem of parameter estimation and variable selection in the semiparametric accelerated failure time model for high-dimensional block-missing multimodal neuroimaging data with censored outcomes. We propose a penalized Buckley-James method that simultaneously handles block-wise missing covariates and censored outcomes while performing variable selection. The proposed method is evaluated by simulations and applied to a multimodal neuroimaging dataset, yielding meaningful results.
Affiliation(s)
- Haodong Wang
- Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, Chapel Hill, 27599, North Carolina, USA
- Quefeng Li
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, 27516, North Carolina, USA
- Yufeng Liu
- Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, Chapel Hill, 27599, North Carolina, USA
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, 27516, North Carolina, USA
- Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, 27599-7264, North Carolina, USA
- Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, 27514, North Carolina, USA
- Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, 27514, North Carolina, USA

15
Qi X, Zhou S, Wang Y, Peterson C. Bayesian sparse modeling to identify high-risk subgroups in meta-analysis of safety data. Res Synth Methods 2022; 13:807-820. PMID: 36054779. PMCID: PMC9649868. DOI: 10.1002/jrsm.1597.
Abstract
Meta-analysis allows researchers to combine evidence from multiple studies, making it a powerful tool for synthesizing information on the safety profiles of new medical interventions. There is a critical need to identify subgroups at high risk of experiencing treatment-related toxicities. However, this remains quite challenging from a statistical perspective as there are a variety of clinical risk factors that may be relevant for different types of adverse events, and adverse events of interest may be rare or incompletely reported. We frame this challenge as a variable selection problem and propose a Bayesian hierarchical model which incorporates a horseshoe prior on the interaction terms to identify high-risk groups. Our proposed model is motivated by a meta-analysis of adverse events in cancer immunotherapy, and our results uncover key factors driving the risk of specific types of treatment-related adverse events.
Affiliation(s)
- Xinyue Qi
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
- Shouhao Zhou
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA
- Yucai Wang
- Division of Hematology, Mayo Clinic, Rochester, Minnesota
- Christine Peterson
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX

16
Chen S, Hoch JS. Net-benefit regression with censored cost-effectiveness data from randomized or observational studies. Stat Med 2022; 41:3958-3974. PMID: 35665527. PMCID: PMC9427707. DOI: 10.1002/sim.9486.
Abstract
Cost-effectiveness analysis is an essential part of the evaluation of new medical interventions. While in many studies both costs and effectiveness (eg, survival time) are censored, standard survival analysis techniques are often invalid due to the induced dependent censoring problem. We propose methods for censored cost-effectiveness data using the net-benefit regression framework, which allow covariate-adjustment and subgroup identification when comparing two intervention groups. The methods provide a straightforward way to construct cost-effectiveness acceptability curves with censored data. We also propose a more efficient doubly robust estimator of average causal incremental net benefit, which increases the likelihood that the results will represent a valid inference in observational studies. Lastly, we conduct extensive numerical studies to examine the finite-sample performance of the proposed methods, and illustrate the proposed methods with a real data example using both survival time and quality-adjusted survival time as the measures of effectiveness.
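For readers unfamiliar with net-benefit regression, a minimal uncensored sketch of the framework mentioned above: at willingness-to-pay λ, each subject's net benefit is NB_i = λ·E_i − C_i, and regressing NB on a treatment indicator estimates the incremental net benefit (INB). This omits the paper's actual contributions (handling induced dependent censoring, covariate adjustment, double robustness); the function name is illustrative.

```python
import numpy as np

def incremental_net_benefit(cost, effect, treat, lam):
    """Net-benefit regression at willingness-to-pay lam: regress
    NB_i = lam * effect_i - cost_i on a treatment indicator; the treatment
    coefficient is the estimated incremental net benefit (INB)."""
    nb = lam * np.asarray(effect, float) - np.asarray(cost, float)
    X = np.column_stack([np.ones(len(nb)), np.asarray(treat, float)])
    coef, *_ = np.linalg.lstsq(X, nb, rcond=None)
    return coef[1]
```

Sweeping lam over a grid and recording evidence that INB > 0 at each value is how a cost-effectiveness acceptability curve is built; with censored costs and effectiveness, the weighting described in the abstract replaces the plain least-squares step shown here.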
Affiliation(s)
- Shuai Chen
- Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, Davis, California, USA
- Jeffrey S. Hoch
- Division of Health Policy and Management, Department of Public Health Sciences, University of California, Davis, Sacramento, California, USA
- Center for Healthcare Policy and Research, University of California, Davis, Sacramento, California, USA

17
Zhang Z, Yi D, Fan Y. Doubly robust estimation of optimal dynamic treatment regimes with multicategory treatments and survival outcomes. Stat Med 2022; 41:4903-4923. PMID: 35948279. DOI: 10.1002/sim.9543.
Abstract
Patients with chronic diseases, such as cancer or epilepsy, are often followed through multiple stages of clinical interventions. Dynamic treatment regimes (DTRs) are sequences of decision rules that assign treatments at each stage based on measured covariates for each patient. A DTR is said to be optimal if the expectation of the desirable clinical benefit reaches a maximum when applied to a population. When there are three or more options for treatments at each decision point and the clinical outcome of interest is a time-to-event variable, estimating an optimal DTR can be complicated. We propose a doubly robust method to estimate optimal DTRs with multicategory treatments and survival outcomes. A novel blip function is defined to measure the difference in expected outcomes among treatments, and a doubly robust weighted least squares algorithm is designed for parameter estimation. Simulations using various weight functions and scenarios support the advantages of the proposed method in estimating optimal DTRs over existing approaches. We further illustrate the practical value of our method by applying it to data from the Standard and New Antiepileptic Drugs study. In this analysis, the proposed method supports the use of the new drug lamotrigine over the standard option carbamazepine. When the actual treatments match the estimated optimal treatments, survival outcomes tend to be better. The newly developed method provides a practical approach for clinicians that is not limited to cases of binary treatment options.
Affiliation(s)
- Zhang Zhang
- Center for Applied Statistics, Renmin University of China, Beijing, China; School of Statistics, Renmin University of China, Beijing, China
- Danhui Yi
- Center for Applied Statistics, Renmin University of China, Beijing, China; School of Statistics, Renmin University of China, Beijing, China
- Yiwei Fan
- School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China

18
Mattos TB, Lachos VH, Castro LM, Matos LA. Extending multivariate Student's-t semiparametric mixed models for longitudinal data with censored responses and heavy tails. Stat Med 2022; 41:3696-3719. PMID: 35596519. DOI: 10.1002/sim.9443.
Abstract
This article extends the semiparametric mixed model for longitudinal censored data with Gaussian errors by considering the Student's t-distribution. This model allows us to consider a flexible, functional dependence of an outcome variable over the covariates using nonparametric regression. Moreover, the proposed model takes into account the correlation between observations by using random effects. Penalized likelihood equations are applied to derive the maximum likelihood estimates that appear to be robust against outlying observations with respect to the Mahalanobis distance. We estimate nonparametric functions using smoothing splines under an EM-type algorithm framework. Finally, the proposed approach's performance is evaluated through extensive simulation studies and an application to two datasets from acquired immunodeficiency syndrome clinical trials.
Affiliation(s)
- Thalita B Mattos
- Departamento de Estatística, Universidade Estadual de Campinas, São Paulo, Brazil
- Victor H Lachos
- Department of Statistics, University of Connecticut, Storrs, Connecticut, USA
- Luis M Castro
- Department of Statistics, Pontificia Universidad Católica de Chile, Santiago, Chile; Millennium Nucleus Center for the Discovery of Structures in Complex Data, and Centro de Riesgos y Seguros UC, Pontificia Universidad Católica de Chile, Santiago, Chile
- Larissa A Matos
- Departamento de Estatística, Universidade Estadual de Campinas, São Paulo, Brazil

19
Wang X, Cai T, Tian L, Bourgeois F, Parast L. Quantifying the feasibility of shortening clinical trial duration using surrogate markers. Stat Med 2021; 40:6321-6343. PMID: 34474500. DOI: 10.1002/sim.9185.
Abstract
The potential benefit of using a surrogate marker in place of a long-term primary outcome is very attractive in terms of the impact on study length and cost. Many available methods for quantifying the effectiveness of a surrogate endpoint either rely on strict parametric modeling assumptions or require that the primary outcome and surrogate marker are fully observed, that is, not subject to censoring. Moreover, available methods for quantifying surrogacy typically provide a proportion of treatment effect explained (PTE) measure and do not directly address the important questions of whether and how the trial can be ended earlier using the surrogate marker. In this article, we specifically address these important questions by proposing a PTE measure to quantify the feasibility of ending trials early based on endpoint information collected at an earlier landmark point t0 in a time-to-event outcome setting. We provide a framework for deriving an optimally predicted outcome for individual patients at t0 based on a combination of surrogate marker and event time information in the presence of censoring. We propose a non-parametric estimator for the PTE measure and derive the asymptotic properties of our estimators. Finite sample performance of our estimators is illustrated via extensive simulation studies and a real data application examining the potential of hemoglobin A1c and fasting plasma glucose to predict treatment effects on long-term diabetes risk based on the Diabetes Prevention Program study.
Affiliation(s)
- Xuan Wang
- Department of Biostatistics, Harvard University, Boston, Massachusetts, USA
- Tianxi Cai
- Department of Biostatistics, Harvard University, Boston, Massachusetts, USA; Department of Biomedical Informatics, Harvard University, Boston, Massachusetts, USA
- Lu Tian
- Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Layla Parast
- Statistics Group, RAND Corporation, Santa Monica, California, USA

20
Gierz K, Park K. Detection of multiple change points in a Weibull accelerated failure time model using sequential testing. Biom J 2021; 64:617-634. PMID: 34873728. DOI: 10.1002/bimj.202000262.
Abstract
With improvements to cancer diagnoses and treatments, incidences and mortality rates have changed. However, the most commonly used analysis methods do not account for such distributional changes. In survival analysis, change point problems can concern a shift in a distribution for a set of time-ordered observations, potentially under censoring or truncation. We propose a sequential testing approach for detecting multiple change points in the Weibull accelerated failure time model, since this is sufficiently flexible to accommodate increasing, decreasing, or constant hazard rates and is also the only continuous distribution for which the accelerated failure time model can be reparameterized as a proportional hazards model. Our sequential testing procedure does not require the number of change points to be known; this information is instead inferred from the data. We conduct a simulation study to show that the method accurately detects change points and estimates the model. The numerical results along with real data applications demonstrate that our proposed method can detect change points in the hazard rate.
Affiliation(s)
- Kayoung Park
- Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA, USA

21
Bertrand F, Maumy-Bertrand M. Fitting and Cross-Validating Cox Models to Censored Big Data With Missing Values Using Extensions of Partial Least Squares Regression Models. Front Big Data 2021; 4:684794. PMID: 34790895. PMCID: PMC8591675. DOI: 10.3389/fdata.2021.684794.
Abstract
Fitting Cox models in a big data context (on a massive scale in terms of volume, intensity, and complexity, exceeding the capacity of usual analytic tools) is often challenging. If some data are missing, it is even more difficult. We proposed algorithms that were able to fit Cox models in high-dimensional settings using extensions of partial least squares regression to the Cox model. Some of them were able to cope with missing data. We were recently able to extend our most recent algorithms to big data, thus allowing Cox models to be fitted to big data with missing values. When cross-validating standard or extended Cox models, the commonly used criterion is the cross-validated partial loglikelihood, using a naive or a van Houwelingen scheme (to make efficient use of the death times of the left-out data in relation to the death times of all the data). Quite astonishingly, we will show, using a simulation study involving three different data simulation algorithms, that these two cross-validation methods fail with the extensions, either straightforward or more involved ones, of partial least squares regression to the Cox model. This is quite an interesting result for at least two reasons. First, several nice features of PLS-based models (regularization, interpretability of the components, missing-data support, data visualization thanks to biplots of individuals and variables, and even parsimony or group parsimony for sparse partial least squares (SPLS) or sparse group SPLS based models) account for a common use of these extensions by statisticians, who usually select their hyperparameters using cross-validation. Second, these extensions are almost always featured in benchmarking studies to assess the performance of a new estimation technique used in a high-dimensional or big data context, and often show poor statistical properties there. We carried out a vast simulation study to evaluate more than a dozen potential cross-validation criteria, either AUC-based or prediction-error-based. Several of them lead to the selection of a reasonable number of components. Using these newly found cross-validation criteria to fit extensions of partial least squares regression to the Cox model, we performed a benchmark reanalysis that showed enhanced performances of these techniques. In addition, we proposed sparse group extensions of our algorithms and defined a new robust measure based on the Schmid score and the R coefficient of determination for least absolute deviation: the integrated R Schmid Score weighted. The R package used in this article is available on the CRAN, http://cran.r-project.org/web/packages/plsRcox/index.html. The R package bigPLS will soon be available on the CRAN and, until then, is available on Github https://github.com/fbertran/bigPLS.
Affiliation(s)
- Frédéric Bertrand
- LIST3N, Université de Technologie de Troyes, Troyes, France
- IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France
- Myriam Maumy-Bertrand
- LIST3N, Université de Technologie de Troyes, Troyes, France
- IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France

22
Li Y, Liang M, Mao L, Wang S. Robust estimation and variable selection for the accelerated failure time model. Stat Med 2021; 40:4473-4491. PMID: 34031919. PMCID: PMC8364878. DOI: 10.1002/sim.9042.
Abstract
This article concerns robust modeling of the survival time for cancer patients. Accurate prediction of patient survival time is crucial to the development of effective therapeutic strategies. To this goal, we propose a unified Expectation-Maximization approach combined with the L1-norm penalty to perform variable selection and parameter estimation simultaneously in the accelerated failure time model with right-censored survival data of moderate sizes. Our approach accommodates general loss functions, and reduces to the well-known Buckley-James method when the squared-error loss is used without regularization. To mitigate the effects of outliers and heavy-tailed noise in real applications, we recommend the use of robust loss functions under the general framework. Furthermore, our approach can be extended to incorporate group structure among covariates. We conduct extensive simulation studies to assess the performance of the proposed methods with different loss functions and apply them to an ovarian carcinoma study as an illustration.
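The unpenalized special case named in the abstract above, the Buckley-James estimator, can be sketched as follows: alternate between imputing each censored log-time by its conditional expectation under the Kaplan-Meier distribution of the current residuals, and refitting least squares on the completed responses. This plain sketch (with hypothetical names) omits the paper's L1 penalty and robust losses; adding a penalized fit in the refit step would move toward the regularized version.

```python
import numpy as np

def buckley_james(X, y, d, n_iter=50):
    """Minimal Buckley-James iteration for the AFT model y = a + X b + e with
    right-censored responses (d=1 event, d=0 censored): censored y's are
    replaced by fit + E[e | e > e_i] under the Kaplan-Meier estimate of the
    residual distribution, then the least-squares fit is refreshed."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]
    for _ in range(n_iter):
        fit = Xd @ b
        e = y - fit
        dd = d.copy()
        dd[np.argmax(e)] = 1                 # treat largest residual as an event
        order = np.argsort(e, kind="stable")
        es, ds = e[order], dd[order]
        n = len(es)
        surv = np.cumprod(1.0 - ds / (n - np.arange(n)))      # S just after each point
        mass = np.concatenate(([1.0], surv[:-1])) - surv      # KM jump at events
        # tail[k] = sum of mass*value strictly above position k
        tail = np.concatenate((np.cumsum((mass * es)[::-1])[::-1][1:], [0.0]))
        y_star = y.copy()
        cens = np.where(ds == 0)[0]
        y_star[order[cens]] = fit[order[cens]] + tail[cens] / surv[cens]
        b_new = np.linalg.lstsq(Xd, y_star, rcond=None)[0]
        if np.max(np.abs(b_new - b)) < 1e-8:
            return b_new
        b = b_new
    return b
```

The iteration can oscillate rather than converge exactly, which is one reason the EM formulation in the paper is attractive; capping the number of iterations, as here, is the usual pragmatic fix.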
Affiliation(s)
- Yi Li
- Department of Statistics, University of Wisconsin-Madison, Wisconsin, USA
- Muxuan Liang
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Washington, USA
- Lu Mao
- Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin-Madison, Wisconsin, USA
- Sijian Wang
- Department of Statistics, Rutgers University, New Jersey, USA

23
Stenzel MR, Groth CP, Banerjee S, Ramachandran G, Kwok RK, Engel LS, Sandler DP, Stewart PA. Exposure Assessment Techniques Applied to the Highly Censored Deepwater Horizon Gulf Oil Spill Personal Measurements. Ann Work Expo Health 2021; 66:i56-i70. PMID: 34417597. DOI: 10.1093/annweh/wxab060.
Abstract
The GuLF Long-term Follow-up Study (GuLF STUDY) is investigating potential adverse health effects of workers involved in the Deepwater Horizon (DWH) oil spill response and cleanup (OSRC). Over 93% of the 160 000 personal air measurements taken on OSRC workers were below the limit of detection (LOD), as reported by the analytic labs. At this high level of censoring, our ability to develop exposure estimates was limited. The primary objective here was to reduce the number of measurements below the labs' reported LODs to reflect the analytic methods' true LODs, thereby facilitating the use of a relatively unbiased and precise Bayesian method to develop exposure estimates for study exposure groups (EGs). The estimates informed a job-exposure matrix to characterize exposure of study participants. A second objective was to develop descriptive statistics for relevant EGs that did not meet the Bayesian criteria of sample size ≥5 and censoring ≤80% to achieve the aforementioned level of bias and precision. One of the analytic labs recalculated the measurements using the analytic method's LOD; the second lab provided raw analytical data, allowing us to recalculate the data values that fell between the originally reported LOD and the analytical method's LOD. We developed rules for developing Bayesian estimates for EGs with >80% censoring. The remaining EGs were 100% censored. An order-based statistical method (OBSM) was developed to estimate exposures that considered the number of measurements, geometric standard deviation, and average LOD of the censored samples for N ≥ 20. For N < 20, substitution of ½ of the LOD was assigned. Recalculation of the measurements lowered overall censoring from 93.2 to 60.5%, and censoring of the total hydrocarbon (THC) measurements from 83.1 to 11.2%. A total of 71% of the EGs met the ≤15% relative bias and <65% imprecision goal. Another 15% had censoring >80% but enough non-censored measurements to apply Bayesian methods. We used the OBSM for 3% of the estimates and the simple substitution method for 11%. The methods presented here substantially reduced the degree of censoring in the dataset and increased the number of EGs meeting our Bayesian method's desired performance goal. The OBSM allowed for a systematic and consistent approach impacting only the lowest of the exposure estimates. This approach should be considered when dealing with highly censored datasets.
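At its core, the censoring problem described above is left-censored lognormal estimation: a measurement below the LOD contributes P(X < LOD) to the likelihood rather than a density term. The sketch below contrasts this maximum-likelihood treatment with the simple LOD/2 substitution mentioned in the abstract; it is a one-group frequentist illustration, not the study's Bayesian exposure model, and the names are hypothetical.

```python
import numpy as np
from scipy import stats, optimize

def lognormal_mle_below_lod(x, below_lod):
    """MLE of the log-scale mean and SD for exposure data where entries
    flagged below_lod carry the LOD itself in x (left-censored lognormal):
    censored points contribute Phi((log LOD - mu) / sigma) to the likelihood."""
    logx = np.log(x)
    def negloglik(par):
        mu, log_sig = par
        sig = np.exp(log_sig)
        ll = stats.norm.logpdf(logx[~below_lod], mu, sig).sum()
        ll += stats.norm.logcdf((logx[below_lod] - mu) / sig).sum()
        return -ll
    res = optimize.minimize(negloglik, x0=[logx.mean(), np.log(logx.std() + 1e-3)])
    return res.x[0], np.exp(res.x[1])
```

For comparison, the LOD/2 substitution estimate of the log-scale mean is `np.log(np.where(below_lod, x / 2, x)).mean()`; it is simple but increasingly biased as the censored fraction grows, which motivates the likelihood-based (or, in the paper, Bayesian) alternatives.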
Affiliation(s)
- Mark R Stenzel
- Exposure Assessment Applications, LLC, Arlington, VA, USA
- Caroline P Groth
- Department of Epidemiology and Biostatistics, WVU School of Public Health, West Virginia University, Morgantown, WV, USA
- Sudipto Banerjee
- Department of Biostatistics, UCLA Fielding School of Public Health, University of California-Los Angeles, Los Angeles, CA, USA
- Gurumurthy Ramachandran
- Department of Environmental Health and Engineering, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Richard K Kwok
- Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA; Office of the Director, National Institute of Environmental Health Sciences, Bethesda, MD, USA
- Lawrence S Engel
- Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA; Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Dale P Sandler
- Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA

24
Yi GY, He W, Carroll RJ. Feature screening with large-scale and high-dimensional survival data. Biometrics 2021; 78:894-907. PMID: 33881782. DOI: 10.1111/biom.13479.
Abstract
Data of huge size present great challenges in modeling, inference, and computation. In handling big data, where p represents the number of variables and n stands for the sample size, much attention has been directed to settings with "large p, small n", and relatively less work has been done to address problems where p and n are both large, though data with such a feature have now become more accessible than before. The big volume of data does not automatically ensure good quality of inferences, because a large number of unimportant variables may be collected in the process of gathering informative variables. To carry out valid statistical analysis, it is imperative to screen out noisy variables that have no predictive value for explaining the outcome variable. In this paper, we develop a screening method for handling large-sized survival data, where the sample size n is large and the dimension p of covariates is of non-polynomial order of the sample size n, the so-called NP-dimension. We rigorously establish theoretical results for the proposed method and conduct numerical studies to assess its performance. Our research offers multiple extensions of existing work and enlarges the scope of high-dimensional data analysis. The proposed method capitalizes on the connections among useful regression settings and offers a computationally efficient screening procedure. Our method can be applied to different situations with large-scale data, including genomic data.
Affiliation(s)
- Grace Y Yi
- Department of Statistical and Actuarial Sciences, Department of Computer Science, University of Western Ontario, London, Ontario, Canada
- Wenqing He
- Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario, Canada
- Raymond J Carroll
- Department of Statistics, Texas A&M University, College Station, Texas, USA; School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, Australia

25
Karttunen A, Valkama M, Talvitie J. Influence of Noise-Limited Censored Path Loss on Model Fitting and Path Loss-Based Positioning. Sensors (Basel) 2021; 21:987. PMID: 33540651. DOI: 10.3390/s21030987.
Abstract
Positioning is considered one of the key features in various novel industry verticals in future radio systems. Since path loss (PL) or received signal strength-based measurements are widely available in the majority of wireless standards, PL-based positioning has an important role among positioning technologies. Conventionally, PL-based positioning has two phases: fitting a PL model to training data, and positioning based on the link distance estimates. However, in both phases, the maximum measurable PL is limited by measurement noise. Such immeasurable samples are called censored PL data, and such noisy data are commonly neglected in both the model fitting and the positioning phase. In the case of censored PL, the loss is known to be above a known threshold level, and that information can be used in model fitting and in the positioning phase. In this paper, we examine and propose how to use censored PL data in PL model-based positioning. Additionally, we demonstrate with several simulations the potential of the proposed approach for considerable improvements in positioning accuracy (23–57%) and improved robustness against PL model fitting errors.
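The first phase, model fitting under censoring, can be sketched as a Tobit-style maximum-likelihood fit of the usual log-distance model: links whose loss reached the measurable maximum contribute P(PL ≥ PL_max) to the likelihood instead of a density term. This is an illustrative sketch under an assumed Gaussian shadowing model, not the authors' exact formulation; the function name and starting values are hypothetical.

```python
import numpy as np
from scipy import stats, optimize

def fit_path_loss_censored(d_m, pl_db, pl_max):
    """ML fit of PL = a + 10*n*log10(d) + N(0, s^2) where losses at or above
    pl_max were not measurable: censored samples contribute the survival
    probability P(PL >= pl_max) rather than a density term (Tobit-style)."""
    cens = pl_db >= pl_max
    lx = np.log10(d_m)
    def negloglik(par):
        a, n, log_s = par
        s = np.exp(log_s)
        mu = a + 10.0 * n * lx
        ll = stats.norm.logpdf(pl_db[~cens], mu[~cens], s).sum()
        ll += stats.norm.logsf((pl_max - mu[cens]) / s).sum()
        return -ll
    res = optimize.minimize(negloglik, x0=[np.min(pl_db), 2.0, np.log(5.0)],
                            method="Nelder-Mead",
                            options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-9})
    a, n, log_s = res.x
    return a, n, np.exp(log_s)
```

Discarding the censored links instead (the common practice criticized in the abstract) biases the fitted exponent low, because the largest losses at long distances are exactly the ones that go unmeasured.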
26
Olivari RC, Garay AM, Lachos VH, Matos LA. Mixed-effects models for censored data with autoregressive errors. J Biopharm Stat 2020; 31:273-294. PMID: 33315523. DOI: 10.1080/10543406.2020.1852246.
Abstract
Mixed-effects models, with modifications to accommodate censored observations (LMEC/NLMEC), are routinely used to analyze measurements, collected irregularly over time, which are often subject to upper and lower detection limits. This paper presents a likelihood-based approach for fitting LMEC/NLMEC models with autoregressive dependence of order p in the error term. An EM-type algorithm is developed for computing the maximum likelihood estimates, obtaining as a byproduct the standard errors of the fixed effects and the likelihood value. Moreover, the constraints on the parameter space that arise from the stationarity conditions for the autoregressive parameters in the EM algorithm are handled by a reparameterization scheme, as discussed in Lin and Lee (2007). To examine the performance of the proposed method, we present some simulation studies and analyze a real AIDS case study. The proposed algorithm and methods are implemented in the new R package ARpLMEC.
Affiliation(s)
- Rommy C Olivari
- Department of Statistics, Federal University of Pernambuco, Recife, Brazil
- Aldo M Garay
- Department of Statistics, Federal University of Pernambuco, Recife, Brazil
- Victor H Lachos
- Department of Statistics, University of Connecticut, Storrs, Connecticut, USA
- Larissa A Matos
- Department of Statistics, State University of Campinas, Campinas, São Paulo, Brazil
|
27
|
Huynh TB, Groth CP, Ramachandran G, Banerjee S, Stenzel M, Quick H, Blair A, Engel LS, Kwok RK, Sandler DP, Stewart PA. Estimates of Occupational Inhalation Exposures to Six Oil-Related Compounds on the Four Rig Vessels Responding to the Deepwater Horizon Oil Spill. Ann Work Expo Health 2020; 66:i89-i110. [PMID: 33009797 PMCID: PMC8989034 DOI: 10.1093/annweh/wxaa072] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2019] [Revised: 05/27/2020] [Accepted: 06/22/2020] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND The 2010 Deepwater Horizon (DWH) oil spill involved thousands of workers and volunteers in mitigating the oil release and cleaning up after the spill. Health concerns for these participants led to the initiation of a prospective epidemiological study (GuLF STUDY) to investigate potential adverse health outcomes associated with the oil spill response and clean-up (OSRC). Characterizing the chemical exposures of the OSRC workers was an essential component of the study. Workers on the four oil rig vessels mitigating the spill and located within a 1852 m (1 nautical mile) radius of the damaged wellhead [the Discoverer Enterprise (Enterprise), the Development Driller II (DDII), the Development Driller III (DDIII), and the HelixQ4000] had some of the greatest potential for chemical exposures. OBJECTIVES The aim of this paper is to characterize potential personal chemical exposures via the inhalation route for workers on those four rig vessels. Specifically, we present our methodology and descriptive statistics of exposure estimates for total hydrocarbons (THCs), benzene, toluene, ethylbenzene, xylene, and n-hexane (BTEX-H) for various job groups to develop exposure groups for the GuLF STUDY cohort. METHODS Using descriptive information associated with the measurements taken on various jobs on these rig vessels and with job titles from study participant responses to the study questionnaire, job groups [unique job/rig/time period (TP) combinations] were developed to describe groups of workers with the same or closely related job titles. A total of 500 job groups were considered for estimation using the available 8139 personal measurements. We used a univariate Bayesian model to analyze the THC measurements and a bivariate Bayesian regression framework to jointly model the measurements of THC and each of the BTEX-H chemicals separately, both models taking into account the many measurements that were below the analytic limit of detection.
RESULTS Highest THC exposures occurred in TP1a and TP1b, which were before the well was mechanically capped. The posterior medians of the arithmetic mean (AM) ranged from 0.11 ppm ('Inside/Other', TP1b, DDII; and 'Driller', TP3, DDII) to 14.67 ppm ('Methanol Operations', TP1b, Enterprise). There were statistical differences between the THC AMs by broad job groups, rigs, and time periods. The AMs for BTEX-H were generally about two to three orders of magnitude lower than the THC AMs, with benzene and ethylbenzene measurements being highly censored. CONCLUSIONS Our results add new insights to the limited literature on exposures associated with oil spill responses and support the current epidemiologic investigation of potential adverse health effects of the oil spill.
Affiliation(s)
- Tran B Huynh
- Department of Environmental and Occupational Health, Dornsife School of Public Health, Drexel University, Philadelphia, PA 19104, USA
- Caroline P Groth
- Department of Biostatistics, School of Public Health, West Virginia University, Morgantown, WV 26506-9190, USA
- Gurumurthy Ramachandran
- Department of Environmental Health and Engineering, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA. Author to whom correspondence should be addressed. Tel: +1-410-502-0182; e-mail:
- Sudipto Banerjee
- Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Mark Stenzel
- Exposure Assessment Applications, LLC, Arlington, VA 22207, USA
- Harrison Quick
- Department of Epidemiology and Biostatistics, Dornsife School of Public Health, Drexel University, Philadelphia, PA 19104, USA
- Aaron Blair
- National Cancer Institute, Occupational and Environmental Epidemiology Branch, Gaithersburg, MD 20892, USA
- Lawrence S Engel
- National Institute of Environmental Health Sciences, Epidemiology Branch, Research Triangle Park, NC 27709, USA; University of North Carolina at Chapel Hill, Department of Epidemiology, Chapel Hill, NC 27599, USA
- Richard K Kwok
- National Institute of Environmental Health Sciences, Epidemiology Branch, Research Triangle Park, NC 27709, USA
- Dale P Sandler
- National Institute of Environmental Health Sciences, Epidemiology Branch, Research Triangle Park, NC 27709, USA
|
28
|
Simoneau G, Moodie EEM, Nijjar JS, Platt RW. Finite sample variance estimation for optimal dynamic treatment regimes of survival outcomes. Stat Med 2020; 39:4466-4479. [PMID: 32929753 DOI: 10.1002/sim.8735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2019] [Revised: 07/18/2020] [Accepted: 07/27/2020] [Indexed: 11/06/2022]
Abstract
Deriving valid confidence intervals for complex estimators is a challenging task in practice. Estimators of dynamic weighted survival modeling (DWSurv), a method to estimate an optimal dynamic treatment regime of censored outcomes, are asymptotically normal and consistent for their target parameters when at least a subset of the nuisance models is correctly specified. However, their behavior in finite samples and the impact of model misspecification on inferences remain unclear. In addition, the estimators' nonregularity may negatively affect the inferences under some specific data generating mechanisms. Our objective was to compare five methods of constructing confidence intervals for the DWSurv parameters in finite samples: two asymptotic variance formulas (with and without adjustment for the estimation of nuisance parameters) and three bootstrap approaches. Via simulations, we considered practical scenarios, for example, when some nuisance models are misspecified or when nonregularity is problematic. We also compared the five methods in an application to the treatment of rheumatoid arthritis. We found that the bootstrap approaches performed consistently well at the cost of longer computational times. The asymptotic variance with adjustments generally yielded conservative confidence intervals. The asymptotic variance without adjustments yielded nominal coverages for large sample sizes. We recommend using the asymptotic variance with adjustments in small samples and the bootstrap if computationally feasible. Caution should be taken when nonregularity may be an issue.
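As background for the bootstrap arm of this comparison, a resample-cases percentile bootstrap looks as follows. This is a generic sketch with a plain least-squares slope standing in for the DWSurv estimator; all names are invented.

```python
import numpy as np

def percentile_bootstrap_ci(x, y, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric percentile bootstrap: resample subjects with replacement,
    re-estimate, and take empirical quantiles of the bootstrap distribution."""
    rng = np.random.default_rng(seed)
    n = y.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample cases with replacement
        stats[b] = estimator(x[idx], y[idx])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

# Toy use: CI for a slope estimated by least squares (true slope 2.0).
rng = np.random.default_rng(42)
x = rng.normal(size=300)
y = 2.0 * x + rng.normal(size=300)
slope = lambda a, b: np.polyfit(a, b, 1)[0]
lo, hi = percentile_bootstrap_ci(x, y, slope)
```

The longer run times the authors note come directly from the `n_boot` refits; the asymptotic formulas avoid that loop entirely.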
Affiliation(s)
- Gabrielle Simoneau
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
- Erica E M Moodie
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
- Jagtar S Nijjar
- Department of Medicine, University of Cambridge, Cambridge, UK
- Robert W Platt
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada; Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital, Montreal, Quebec, Canada
|
29
|
Abstract
This paper concerns Kalman filtering when the measurements of the process are censored. The censored measurements are addressed by the Tobit model of Type I and are one-dimensional with two censoring limits, while the (hidden) state vectors are multidimensional. For this model, Bayesian estimates for the state vectors are provided through a recursive algorithm of Kalman filtering type. Experiments are presented to illustrate the effectiveness and applicability of the algorithm. The experiments show that the proposed method outperforms other filtering methodologies in minimizing the computational cost as well as the overall Root Mean Square Error (RMSE) for synthetic and real data sets.
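A minimal scalar version of the Tobit-filtering idea can be written with truncated-normal moment matching: when a reading saturates, condition the Gaussian predictive distribution on the latent measurement being beyond the limit. This is my own simplified sketch (one state, one upper limit, invented parameter names), not the recursive algorithm of the paper.

```python
import numpy as np
from scipy.stats import norm

def tobit_kalman_1d(ys, limit, a=1.0, q=0.1, r=0.5, x0=0.0, p0=1.0):
    """Scalar Kalman filter with an upper saturation limit on the sensor
    (Tobit type I). Censored steps (y >= limit) condition on y* >= limit via
    truncated-normal moment matching -- a simplified sketch in the spirit of
    Tobit Kalman filtering."""
    x, p, est = x0, p0, []
    for y in ys:
        x, p = a * x, a * a * p + q                  # predict
        s = p + r                                    # innovation variance (H = 1)
        k = p / s                                    # Kalman gain
        if y < limit:                                # ordinary update
            x, p = x + k * (y - x), (1.0 - k) * p
        else:                                        # censored: only y* >= limit known
            alpha = (limit - x) / np.sqrt(s)
            lam = norm.pdf(alpha) / norm.sf(alpha)   # inverse Mills ratio
            mean_y = x + np.sqrt(s) * lam            # E[y* | y* >= limit]
            var_y = s * (1.0 - lam * (lam - alpha))  # Var[y* | y* >= limit]
            x = x + k * (mean_y - x)
            p = p - k * k * (s - var_y)
        est.append(x)
    return np.array(est)

# All 50 readings saturated at 1.0: the filter should drift the state estimate
# up toward the censored region even though no exact measurement is ever seen.
est = tobit_kalman_1d(np.full(50, 1.0), limit=1.0)
```

A naive filter that treated the saturated readings as exact would instead lock onto the limit itself and understate the state.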
Affiliation(s)
- Kostas Loumponias
- Department of Mathematics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- George Tsaklidis
- Department of Mathematics, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
30
|
Arfè A, Alexander B, Trippa L. Optimality of testing procedures for survival data in the nonproportional hazards setting. Biometrics 2020; 77:587-598. [PMID: 32535892 DOI: 10.1111/biom.13315] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2019] [Revised: 05/25/2020] [Accepted: 05/27/2020] [Indexed: 02/06/2023]
Abstract
Most statistical tests for treatment effects used in randomized clinical trials with survival outcomes are based on the proportional hazards assumption, which often fails in practice. Data from early exploratory studies may provide evidence of nonproportional hazards, which can guide the choice of alternative tests in the design of practice-changing confirmatory trials. We developed a test to detect treatment effects in a late-stage trial, which accounts for the deviations from proportional hazards suggested by early-stage data. Conditional on early-stage data, among all tests that control the frequentist Type I error rate at a fixed α level, our testing procedure maximizes the Bayesian predictive probability that the study will demonstrate the efficacy of the experimental treatment. Hence, the proposed test provides a useful benchmark for other tests commonly used in the presence of nonproportional hazards, for example, weighted log-rank tests. We illustrate this approach in simulations based on data from a published cancer immunotherapy phase III trial.
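The weighted log-rank tests used as benchmarks aggregate, over event times, a weighted difference between observed and expected events in one group. A self-contained sketch (simplified tie handling, invented names, not the authors' code):

```python
import numpy as np

def weighted_logrank_z(time, event, group, weight=None):
    """Two-sample weighted log-rank Z statistic. `weight`, if given, is a function
    of the pooled Kaplan-Meier estimate just before each event time
    (Fleming-Harrington style); None gives the ordinary log-rank test."""
    U, V, S = 0.0, 0.0, 1.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n_t = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d_t = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        w = weight(S) if weight is not None else 1.0
        U += w * (d1 - d_t * n1 / n_t)               # observed minus expected
        V += w * w * d_t * (n1 / n_t) * (1 - n1 / n_t) * (n_t - d_t) / max(n_t - 1, 1)
        S *= 1 - d_t / n_t                           # pooled KM, for the next weight
    return U / np.sqrt(V)

# Toy data: group 1 survives markedly longer, so |Z| should be large.
rng = np.random.default_rng(7)
time = np.concatenate([rng.exponential(1.0, 200), rng.exponential(3.0, 200)])
event = np.ones(400, dtype=int)
group = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])
z_plain = weighted_logrank_z(time, event, group)
z_fh = weighted_logrank_z(time, event, group, weight=lambda s: s)
```

Choosing `weight` is exactly the lever the paper's optimality criterion tunes: early-emphasis weights gain power when hazards separate early and lose it when they cross late.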
Affiliation(s)
- Andrea Arfè
- Harvard-MIT Center for Regulatory Science, Harvard Medical School, Boston, Massachusetts
- Brian Alexander
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts
- Lorenzo Trippa
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts
|
31
|
Simoneau G, Moodie EEM, Azoulay L, Platt RW. Adaptive Treatment Strategies With Survival Outcomes: An Application to the Treatment of Type 2 Diabetes Using a Large Observational Database. Am J Epidemiol 2020; 189:461-469. [PMID: 31903490 DOI: 10.1093/aje/kwz272] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2019] [Revised: 12/10/2019] [Accepted: 12/10/2019] [Indexed: 01/16/2023] Open
Abstract
Sequences of treatments that adapt to a patient's changing condition over time are often needed for the management of chronic diseases. An adaptive treatment strategy (ATS) consists of personalized treatment rules to be applied through the course of a disease that input the patient's characteristics at the time of decision-making and output a recommended treatment. An optimal ATS is the sequence of tailored treatments that yields the best clinical outcome for patients sharing similar characteristics. Methods for estimating optimal adaptive treatment strategies, which must disentangle short- and long-term treatment effects, can be theoretically involved and hard to explain to clinicians, especially when the outcome to be optimized is a survival time subject to right-censoring. In this paper, we describe dynamic weighted survival modeling, a method for estimating an optimal ATS with survival outcomes. Using data from the Clinical Practice Research Datalink, a large primary-care database, we illustrate how it can answer an important clinical question about the treatment of type 2 diabetes. We identify an ATS pertaining to which drug add-ons to recommend when metformin in monotherapy does not achieve the therapeutic goals.
|
32
|
Zhao YQ, Zhu R, Chen G, Zheng Y. Constructing dynamic treatment regimes with shared parameters for censored data. Stat Med 2020; 39:1250-1263. [PMID: 31951041 PMCID: PMC7305816 DOI: 10.1002/sim.8473] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2019] [Revised: 10/31/2019] [Accepted: 12/16/2019] [Indexed: 01/28/2023]
Abstract
Dynamic treatment regimes are sequential decision rules that adapt throughout disease progression according to a patient's evolving characteristics. In many clinical applications, it is desirable that the format of the decision rules remains consistent over time. Unlike the estimation of dynamic treatment regimes in regular settings, where decision rules are formed without shared parameters, the derivation of the shared decision rules requires estimating shared parameters indexing the decision rules across different decision points. Estimation of such rules becomes more complicated when the clinical outcome of interest is a survival time subject to censoring. To address these challenges, we propose two novel methods: censored shared-Q-learning and censored shared-O-learning. Both methods incorporate clinical preferences into a qualitative rule, where the parameters indexing the decision rules are shared across different decision points and estimated simultaneously. We use simulation studies to demonstrate the superior performance of the proposed methods. The methods are further applied to the Framingham Heart Study to derive treatment rules for cardiovascular disease.
Affiliation(s)
- Ying-Qi Zhao
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, U.S.A
- Ruoqing Zhu
- Department of Statistics, University of Illinois Urbana-Champaign, Champaign, Illinois, 61820, U.S.A
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, 53792, U.S.A
- Yingye Zheng
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, U.S.A
|
33
|
Baharith LA, AL-Beladi KM, Klakattawi HS. The Odds Exponential-Pareto IV Distribution: Regression Model and Application. Entropy (Basel) 2020; 22:e22050497. [PMID: 33286270 PMCID: PMC7516982 DOI: 10.3390/e22050497] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/19/2020] [Revised: 04/15/2020] [Accepted: 04/23/2020] [Indexed: 11/16/2022]
Abstract
This article introduces the odds exponential-Pareto IV distribution, which belongs to the odds family of distributions. We studied the statistical properties of this new distribution. The odds exponential-Pareto IV distribution provided decreasing, increasing, and upside-down hazard functions. We employed the maximum likelihood method to estimate the distribution parameters. The estimators' performance was assessed by conducting simulation studies. A new log location-scale regression model based on the odds exponential-Pareto IV distribution was also introduced. Parameter estimates of the proposed model were obtained using both maximum likelihood and jackknife methods for right-censored data. Real data sets were analyzed under the odds exponential-Pareto IV distribution and log odds exponential-Pareto IV regression model to show their flexibility and potential.
Affiliation(s)
- Lamya A. Baharith
- Department of Statistics, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia; (K.M.A.-B.); (H.S.K.)
- Correspondence:
- Kholod M. AL-Beladi
- Department of Statistics, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia; (K.M.A.-B.); (H.S.K.)
- Department of Statistics, Faculty of Science, University of Jeddah, Jeddah 21959, Saudi Arabia
- Hadeel S. Klakattawi
- Department of Statistics, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia; (K.M.A.-B.); (H.S.K.)
|
34
|
Wang X, Zhong Y, Mukhopadhyay P, Schaubel DE. Computationally efficient inference for center effects based on restricted mean survival time. Stat Med 2019; 38:5133-5145. [PMID: 31502288 DOI: 10.1002/sim.8356] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2018] [Revised: 06/04/2019] [Accepted: 07/26/2019] [Indexed: 11/06/2022]
Abstract
Restricted mean survival time (RMST) has gained increased attention in biostatistical and clinical studies. Directly modeling RMST (as opposed to modeling then transforming the hazard function) is appealing computationally and in terms of interpreting covariate effects. We propose computationally convenient methods for evaluating center effects based on RMST. A multiplicative model for the RMST is assumed. Estimation proceeds through an algorithm analogous to stratification, which permits the evaluation of thousands of centers. We derive the asymptotic properties of the proposed estimators and evaluate finite sample performance through simulation. We demonstrate that considerable decreases in computational burden are achievable through the proposed methods, in terms of both storage requirements and run time. The methods are applied to evaluate more than 5000 US dialysis facilities using data from a national end-stage renal disease registry.
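The quantity being modeled, the RMST, is simply the area under the survival curve up to a truncation time tau. A minimal Kaplan-Meier-based sketch of the estimand (illustrative only; the paper's contribution is the multiplicative center-effects model, not this computation):

```python
import numpy as np

def km_curve(time, event):
    """Kaplan-Meier estimate: step-function knots, starting at (0, 1)."""
    ts, Ss = [0.0], [1.0]
    S = 1.0
    for u in np.unique(time[event == 1]):
        n_t = (time >= u).sum()                  # at risk just before u
        d_t = ((time == u) & (event == 1)).sum() # events at u
        S *= 1 - d_t / n_t
        ts.append(float(u))
        Ss.append(S)
    return np.array(ts), np.array(Ss)

def rmst(time, event, tau):
    """Restricted mean survival time: integral of the KM curve over [0, tau]."""
    ts, Ss = km_curve(time, event)
    area = 0.0
    for i, start in enumerate(ts):
        if start >= tau:
            break
        end = ts[i + 1] if i + 1 < ts.size else tau
        area += Ss[i] * (min(end, tau) - start)
    return area
```

With no censoring and tau beyond the last event, this reduces to the ordinary sample mean, which is a handy sanity check.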
Affiliation(s)
- Xin Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan; Vertex Pharmaceuticals, Boston, Massachusetts
- Yingchao Zhong
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Douglas E Schaubel
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan; Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
|
35
|
Wang H, Li G. Extreme learning machine Cox model for high-dimensional survival analysis. Stat Med 2019; 38:2139-2156. [PMID: 30632193 PMCID: PMC6498851 DOI: 10.1002/sim.8090] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2018] [Revised: 10/11/2018] [Accepted: 12/12/2018] [Indexed: 11/07/2022]
Abstract
Some interesting recent studies have shown that neural network models are useful alternatives in modeling survival data when the assumptions of a classical parametric or semiparametric survival model such as the Cox (1972) model are seriously violated. However, to the best of our knowledge, the plausibility of adapting the emerging extreme learning machine (ELM) algorithm for single-hidden-layer feedforward neural networks to survival analysis has not been explored. In this paper, we present a kernel ELM Cox model regularized by an L0-based broken adaptive ridge (BAR) penalization method. Then, we demonstrate that the resulting method, referred to as ELMCoxBAR, can outperform some other state-of-the-art survival prediction methods such as L1- or L2-regularized Cox regression, random survival forest with various splitting rules, and boosted Cox model, in terms of its predictive performance using both simulated and real-world datasets. In addition to its good predictive performance, we illustrate that the proposed method has a key computational advantage over the above competing methods in terms of computation time, using a real-world ultra-high-dimensional survival dataset.
Affiliation(s)
- Hong Wang
- School of Mathematics and Statistics, Central South University, Changsha, China
- Gang Li
- Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, California
|
36
|
Arboretti R, Bathke AC, Carrozzo E, Pesarin F, Salmaso L. Multivariate permutation tests for two sample testing in presence of nondetects with application to microarray data. Stat Methods Med Res 2019; 29:258-271. [PMID: 30799774 DOI: 10.1177/0962280219832225] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Very often, data collected in medical research are characterized by censored observations and/or data with mass on the value zero. This happens for example when some measurements fall below the detection limits of the specific instrument used. This type of left-censored observation is called a "nondetect". Such a situation of an excessive number of zeros in a data set is also referred to as zero-inflated data. In the present work, we aim at comparing different multivariate permutation procedures in two-sample testing for data with nondetects. The effect of censoring is investigated with regard to the different values that may be attributed to nondetected values, both under the null hypothesis and under the alternative. We motivate the problem using data from allergy research.
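The two-sample permutation machinery underlying this comparison can be sketched in a few lines; handling nondetects then reduces to choosing the value substituted for them before permuting (LOD/2 below is one common, debatable choice). All names and data are invented for illustration.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=4000, seed=0):
    """Two-sided permutation test for a difference in means: permute the pooled
    sample and count statistics at least as extreme as the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    obs = abs(x.mean() - y.mean())
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(perm[: x.size].mean() - perm[x.size:].mean()) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Nondetects (NaN) substituted by LOD/2 before testing.
lod = 1.0
x_raw = np.array([np.nan, np.nan, 2.1, 3.5, 4.0, 2.8, 3.1, 2.2])
y_raw = np.array([np.nan, 0.9, 1.1, 0.9, 1.3, 1.0, 0.8, 1.2])
x = np.where(np.isnan(x_raw), lod / 2, x_raw)
y = np.where(np.isnan(y_raw), lod / 2, y_raw)
p = permutation_pvalue(x, y)
```

Rerunning with a different substitution (0, or the LOD itself) is exactly the sensitivity analysis the abstract describes.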
Affiliation(s)
- Rosa Arboretti
- Department of Civil Environmental and Architectural Engineering, University of Padova, Padua, Italy
- Arne C Bathke
- Department of Mathematics, University of Salzburg, Salzburg, Austria; Department of Statistics, University of Kentucky, Lexington, KY, USA
- Eleonora Carrozzo
- Department of Management Engineering, University of Padova, Padua, Italy
- Luigi Salmaso
- Department of Management Engineering, University of Padova, Padua, Italy
|
37
|
Lachos VH, A Matos L, Castro LM, Chen MH. Flexible longitudinal linear mixed models for multiple censored responses data. Stat Med 2018; 38:1074-1102. [PMID: 30421470 DOI: 10.1002/sim.8017] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2018] [Revised: 09/27/2018] [Accepted: 10/01/2018] [Indexed: 11/06/2022]
Abstract
In biomedical studies and clinical trials, repeated measures are often subject to some upper and/or lower limits of detection. Hence, the responses are either left or right censored. A complication arises when more than one series of responses is repeatedly collected on each subject at irregular intervals over a period of time and the data exhibit tails heavier than the normal distribution. The multivariate censored linear mixed effect (MLMEC) model is a frequently used tool for a joint analysis of more than one series of longitudinal data. In this context, we develop a robust generalization of the MLMEC based on the scale mixtures of normal distributions. To take into account the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is considered. For this complex longitudinal structure, we propose an exact estimation procedure to obtain the maximum-likelihood estimates of the fixed effects and variance components using a stochastic approximation of the EM algorithm. This approach allows us to estimate the parameters of interest easily and quickly as well as to obtain the standard errors of the fixed effects, the predictions of unobservable values of the responses, and the log-likelihood function as a byproduct. The proposed method is applied to analyze a set of AIDS data and is examined via a simulation study.
Affiliation(s)
- Victor H Lachos
- Department of Statistics, University of Connecticut, Storrs, Connecticut
- Larissa A Matos
- Departamento de Estatística, Universidade Estadual de Campinas, Campinas, Brazil
- Luis M Castro
- Departamento de Estadística, Pontificia Universidad Católica de Chile, Santiago, Chile
- Ming-Hui Chen
- Department of Statistics, University of Connecticut, Storrs, Connecticut
|
38
|
Chik AHS, Schmidt PJ, Emelko MB. Learning Something From Nothing: The Critical Importance of Rethinking Microbial Non-detects. Front Microbiol 2018; 9:2304. [PMID: 30344512 PMCID: PMC6182096 DOI: 10.3389/fmicb.2018.02304] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2018] [Accepted: 09/10/2018] [Indexed: 11/18/2022] Open
Abstract
Accurate estimation of microbial concentrations is necessary to inform many important environmental science and public health decisions and regulations. Critically, widespread misconceptions about laboratory-reported microbial non-detects have led to their erroneous description and handling as "censored" values. This ultimately compromises their interpretation and undermines efforts to describe and model microbial concentrations accurately. Herein, these misconceptions are dispelled by (1) discussing the critical differences between discrete microbial observations and continuous data acquired using analytical chemistry methodologies and (2) demonstrating the bias introduced by statistical approaches tailored for chemistry data and misapplied to discrete microbial data. Notably, these approaches especially preclude the accurate representation of low concentrations and those estimated using microbial methods with low or variable analytical recovery, which can be expected to result in non-detects. Techniques that account for the probabilistic relationship between observed data and underlying microbial concentrations have been widely demonstrated, and their necessity for handling non-detects (in a way which is consistent with the handling of positive observations) is underscored herein. Habitual reporting of raw microbial observations and sample sizes is proposed to facilitate accurate estimation and analysis of microbial concentrations.
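The "probabilistic relationship between observed data and underlying microbial concentrations" can be made concrete with the simplest discrete observation model: counts are Poisson in the concentration times effective volume, so zeros are ordinary data points rather than censored values. A minimal sketch under that assumed model (invented names):

```python
import numpy as np

def poisson_concentration_mle(counts, volumes, recovery=1.0):
    """counts[i] ~ Poisson(conc * volumes[i] * recovery); zeros (non-detects)
    enter the likelihood like any other count. The ML estimate is total count
    over total effective volume."""
    counts = np.asarray(counts, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    return counts.sum() / (recovery * volumes.sum())

def prob_nondetect(conc, volume, recovery=1.0):
    """Probability a sample yields zero organisms under the same model."""
    return float(np.exp(-conc * volume * recovery))
```

Note how a non-detect is informative here: it lowers the estimate through the volume it adds to the denominator, with no ad hoc substitution needed.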
Affiliation(s)
- Alex Ho Shing Chik
- Department of Civil and Environmental Engineering, Faculty of Engineering, University of Waterloo, Waterloo, ON, Canada
- Institute of Hydraulic Engineering and Water Resources Management, Vienna University of Technology, Vienna, Austria
- Department of Earth Sciences, Faculty of Geosciences, Utrecht University, Utrecht, Netherlands
- Philip J. Schmidt
- Department of Civil and Environmental Engineering, Faculty of Engineering, University of Waterloo, Waterloo, ON, Canada
- Monica B. Emelko
- Department of Civil and Environmental Engineering, Faculty of Engineering, University of Waterloo, Waterloo, ON, Canada
|
39
|
Szarka AZ, Hayworth CG, Ramanarayanan TS, Joseph RSI. Statistical Techniques to Analyze Pesticide Data Program Food Residue Observations. J Agric Food Chem 2018; 66:7165-7171. [PMID: 29902006 DOI: 10.1021/acs.jafc.8b00863] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The U.S. EPA conducts dietary-risk assessments to ensure that levels of pesticides on food in the U.S. food supply are safe. Often these assessments utilize conservative residue estimates, maximum residue levels (MRLs), and a high-end estimate derived from registrant-generated field-trial data sets. A more realistic estimate of consumers' pesticide exposure from food may be obtained by utilizing residues from food-monitoring programs, such as the Pesticide Data Program (PDP) of the U.S. Department of Agriculture. A substantial portion of food-residue concentrations in PDP monitoring programs are below the limits of detection (left-censored), which makes the comparison of regulatory-field-trial and PDP residue levels difficult. In this paper, we present a novel adaptation of established statistical techniques, the Kaplan-Meier estimator (K-M), robust regression on order statistics (ROS), and the maximum-likelihood estimator (MLE), to quantify the pesticide-residue concentrations in the presence of heavily censored data sets. The examined statistical approaches include the most commonly used parametric and nonparametric methods for handling left-censored data that have been used in the fields of medical and environmental sciences. This work presents a case study in which data of thiamethoxam residue on bell pepper generated from registrant field trials were compared with PDP-monitoring residue values. The results from the statistical techniques were evaluated and compared with commonly used simple substitution methods for the determination of summary statistics. It was found that the MLE is the most appropriate statistical method to analyze this residue data set. Using the MLE technique, the data analyses showed that the median and mean PDP bell pepper residue levels were approximately 19 and 7 times lower, respectively, than the corresponding statistics of the field-trial residues.
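Of the three techniques compared, the MLE for left-censored concentration data is the easiest to state: detects contribute the lognormal density and nondetects the probability mass below the LOD. A hedged sketch under an assumed lognormal model (invented names, not the authors' code):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def censored_lognormal_mle(values, detected, lod):
    """MLE for lognormal data left-censored at the limit of detection (LOD):
    detects contribute the density on the log scale, nondetects the
    probability of falling below the LOD."""
    logs = np.log(values[detected])
    n_cens = np.count_nonzero(~detected)
    log_lod = np.log(lod)
    def nll(theta):
        mu, log_sigma = theta
        s = np.exp(log_sigma)
        ll = norm.logpdf(logs, mu, s).sum()
        ll += n_cens * norm.logcdf(log_lod, mu, s)
        return -ll
    res = minimize(nll, x0=np.array([logs.mean(), np.log(logs.std() + 1e-3)]),
                   method="Nelder-Mead")
    mu, log_sigma = res.x
    return mu, np.exp(log_sigma)

# Simulated residues: true log-mean 0, log-sd 1; about 31% nondetects.
rng = np.random.default_rng(3)
values = rng.lognormal(0.0, 1.0, 500)
lod = float(np.exp(-0.5))
detected = values > lod
mu_hat, sigma_hat = censored_lognormal_mle(values, detected, lod)
```

Substituting LOD/2 for the nondetects and averaging would, by contrast, bias both the mean and the spread, which is the comparison the paper runs.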
Affiliation(s)
- Arpad Z Szarka
- Operator and Consumer Safety, Syngenta Crop Protection, LLC, Greensboro, North Carolina 27419, United States
- Carol G Hayworth
- Operator and Consumer Safety, Syngenta Crop Protection, LLC, Greensboro, North Carolina 27419, United States
- Tharacad S Ramanarayanan
- Operator and Consumer Safety, Syngenta Crop Protection, LLC, Greensboro, North Carolina 27419, United States
- Robert S I Joseph
- Operator and Consumer Safety, Syngenta Crop Protection, LLC, Greensboro, North Carolina 27419, United States
|
40
|
Orbe J, Virto J. Penalized spline smoothing using Kaplan-Meier weights with censored data. Biom J 2018; 60:947-961. [PMID: 29943440 DOI: 10.1002/bimj.201700213] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2017] [Revised: 05/29/2018] [Accepted: 05/30/2018] [Indexed: 11/11/2022]
Abstract
In this paper, we consider the problem of nonparametric curve fitting in the specific context of censored data. We propose an extension of the penalized splines approach using Kaplan-Meier weights to take into account the effect of censorship and generalized cross-validation techniques to choose the smoothing parameter adapted to the case of censored samples. Using various simulation studies, we analyze the effectiveness of the censored penalized splines method proposed and show that the performance is quite satisfactory. We have extended this proposal to a generalized additive models (GAM) framework introducing a correction of the censorship effect, thus enabling more complex models to be estimated immediately. A real dataset from Stanford Heart Transplant data is also used to illustrate the methodology proposed, which is shown to be a good alternative when the probability distribution for the response variable and the functional form are not known in censored regression models.
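The Kaplan-Meier weights referred to here are the jumps of the KM estimator at the ordered observations; censored points receive zero weight and their mass shifts to later event times. A compact sketch assuming distinct observation times (invented names, not the authors' code):

```python
import numpy as np

def km_weights(time, event):
    """Stute-type Kaplan-Meier weights: the jump of the KM estimator at each
    observation. Assumes distinct observation times; censored observations
    (event == 0) get weight zero."""
    n = time.size
    order = np.argsort(time, kind="stable")
    d = event[order].astype(float)
    w_sorted = np.empty(n)
    surv = 1.0
    for i in range(n):
        at_risk = n - i
        w_sorted[i] = surv * d[i] / at_risk   # KM jump if this point is an event
        surv *= 1.0 - d[i] / at_risk
    w = np.empty(n)
    w[order] = w_sorted                        # restore the input ordering
    return w
```

Plugging these weights into a penalized least-squares spline criterion is what lets the fit ignore censored responses without discarding their risk-set information.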
Collapse
Affiliation(s)
- Jesus Orbe
- Department of Econometrics and Statistics, University of the Basque Country UPV/EHU, Bilbao, Spain
| | - Jorge Virto
- Department of Econometrics and Statistics, University of the Basque Country UPV/EHU, Bilbao, Spain
| |
Collapse
|
41
|
Lin TI, Lachos VH, Wang WL. Multivariate longitudinal data analysis with censored and intermittent missing responses. Stat Med 2018; 37:2822-2835. [PMID: 29740829 DOI: 10.1002/sim.7692] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2017] [Revised: 03/31/2018] [Accepted: 04/02/2018] [Indexed: 11/08/2022]
Abstract
The multivariate linear mixed model (MLMM) has emerged as an important analytical tool for longitudinal data with multiple outcomes. However, the analysis of multivariate longitudinal data can be complicated by the presence of censored measurements, due to a detection limit of the assay, in combination with unavoidable missing values arising when subjects intermittently miss some of their scheduled visits. This paper presents a generalization of the MLMM approach, called the MLMM-CM, for a joint analysis of multivariate longitudinal data with censored and intermittently missing responses. A computationally feasible expectation-maximization (EM)-based procedure is developed to carry out maximum likelihood estimation within the MLMM-CM framework. Moreover, the asymptotic standard errors of the fixed effects are explicitly obtained via the information-based method. We illustrate the methodology using simulated data and a case study from an AIDS clinical trial. Experimental results reveal that the proposed method provides more satisfactory performance than the traditional MLMM approach.
Collapse
Affiliation(s)
- Tsung-I Lin
- Institute of Statistics, National Chung Hsing University, Taichung 402, Taiwan
- Department of Public Health, China Medical University, Taichung 404, Taiwan
| | - Victor H Lachos
- Department of Statistics, University of Connecticut, Storrs, CT 06269, USA
| | - Wan-Lun Wang
- Department of Statistics, Graduate Institute of Statistics and Actuarial Science, Feng Chia University, Taichung 40724, Taiwan
| |
Collapse
|
42
|
Abstract
In modeling censored data, survival forest models are a competitive nonparametric alternative to traditional parametric or semiparametric models when the functional forms are possibly misspecified or the underlying assumptions are violated. In this work, we propose a survival forest approach with trees constructed using novel pseudo-R2 splitting rules. By studying well-known benchmark data sets, we find that the proposed model generally outperforms popular survival models such as random survival forests with different splitting rules, the Cox proportional hazards model, and generalized boosted models in terms of the C-index metric.
Collapse
Affiliation(s)
- Hong Wang
- School of Mathematics and Statistics, Central South University, Changsha, China
| | - Xiaolin Chen
- School of Statistics, Qufu Normal University, Qufu, China
| | - Gang Li
- Department of Biostatistics, School of Public Health, University of California at Los Angeles, Los Angeles, California
| |
Collapse
|
43
|
Khan MHR. On the performance of adaptive preprocessing technique in analyzing high-dimensional censored data. Biom J 2018; 60:687-702. [PMID: 29603360 DOI: 10.1002/bimj.201600256] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2016] [Revised: 09/05/2017] [Accepted: 10/20/2017] [Indexed: 11/09/2022]
Abstract
Preprocessing for high-dimensional censored datasets, such as microarray data, is generally considered an important technique for gaining further stability by reducing potential noise in the data. When variable selection including inference is carried out with high-dimensional censored data, the objective is to obtain a smaller subset of variables and then perform the inferential analysis using model estimates based on the selected subset. This two-stage inferential analysis is prone to circularity bias because of the noise that may still remain in the dataset. In this work, I propose an adaptive preprocessing technique that uses the sure independence screening (SIS) idea to accomplish variable selection and reduces the circularity bias through several well-known refined high-dimensional methods, such as the elastic net, adaptive elastic net, weighted elastic net, elastic net-AFT, and two greedy variable selection methods known as TCS and PC-simple, all implemented with accelerated lifetime models. The proposed technique addresses several features, including collinearity between important and some unimportant covariates, which is often the case in high-dimensional settings under a variable selection framework, and different levels of censoring. Simulation studies along with an empirical analysis of a real microarray dataset, mantle cell lymphoma, are carried out to demonstrate the performance of the adaptive preprocessing technique.
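The SIS idea underlying this preprocessing step reduces dimension by ranking covariates on marginal association with the response and keeping only the top few. A toy sketch for an uncensored response (the function name and the choice of plain correlation are illustrative assumptions; the paper works with censored accelerated-lifetime models, which this does not reproduce):

```python
import numpy as np

def sis_screen(X, y, d):
    """Sure independence screening: rank covariates by absolute marginal
    correlation with the response and return the indices of the top d.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = (X - X.mean(0)) / X.std(0)          # standardize columns
    yc = (y - y.mean()) / y.std()
    score = np.abs(Xc.T @ yc) / len(y)        # |marginal correlation|
    return np.sort(np.argsort(score)[::-1][:d])
```

A refined penalized method (e.g. an elastic net) would then be fit on the screened subset only.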
Collapse
Affiliation(s)
- Md Hasinur Rahaman Khan
- Applied Statistics, Institute of Statistical Research and Training, University of Dhaka, Dhaka, 1000, Bangladesh
| |
Collapse
|
44
|
Li X, Xie S, Zeng D, Wang Y. Efficient ℓ 0 -norm feature selection based on augmented and penalized minimization. Stat Med 2018; 37:473-486. [PMID: 29082539 PMCID: PMC5768461 DOI: 10.1002/sim.7526] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2015] [Revised: 07/04/2017] [Accepted: 09/13/2017] [Indexed: 11/06/2022]
Abstract
Advances in high-throughput technologies in genomics and imaging yield unprecedentedly large numbers of prognostic biomarkers. To accommodate the scale of biomarkers and study their association with disease outcomes, penalized regression is often used to identify important biomarkers. The ideal variable selection procedure would search for the best subset of predictors, which is equivalent to imposing an ℓ0-penalty on the regression coefficients. Since this optimization is a nondeterministic polynomial-time hard (NP-hard) problem that does not scale with the number of biomarkers, alternative methods mostly place smooth penalties on the regression parameters, which lead to computationally feasible optimization problems. However, empirical studies and theoretical analyses show that convex approximations of the ℓ0-norm (eg, ℓ1) do not outperform their ℓ0 counterparts. Progress on ℓ0-norm feature selection has been relatively slow, with the main methods being greedy algorithms such as stepwise regression or orthogonal matching pursuit; penalized regression that directly regularizes the ℓ0-norm remains much less explored in the literature. In this work, inspired by the recently popular augmenting and data-splitting algorithms, including the alternating direction method of multipliers, we propose a 2-stage procedure for ℓ0-penalty variable selection, referred to as augmented penalized minimization-L0 (APM-L0). APM-L0 targets the ℓ0-norm as closely as possible while keeping computation tractable, efficient, and simple, which is achieved by iterating between a convex regularized regression and a simple hard-thresholding estimation. The procedure can be viewed as arising from regularized optimization with a truncated ℓ1-norm. We therefore treat the regularization parameter and the thresholding parameter as tuning parameters selected by cross-validation. A 1-step coordinate descent algorithm is used in the first stage to significantly improve computational efficiency. Through extensive simulation studies and a real data application, we demonstrate the superior performance of the proposed method in terms of selection accuracy and computational speed compared to existing methods. The proposed APM-L0 procedure is implemented in the R package APML0.
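The alternation between a gradient-based fitting step and hard thresholding can be illustrated with iterative hard thresholding, a simple relative of the APM-L0 scheme (a stand-in sketch under that assumption, not the APML0 package algorithm):

```python
import numpy as np

def hard_threshold(beta, k):
    """Keep only the k largest-magnitude coefficients; zero the rest."""
    out = np.zeros_like(beta)
    idx = np.argsort(np.abs(beta))[-k:]
    out[idx] = beta[idx]
    return out

def iht(X, y, k, n_iter=300):
    """Iterative hard thresholding for ℓ0-constrained least squares:
    alternate a gradient step on the squared error with the
    hard-thresholding projection onto k-sparse vectors.
    """
    L = np.linalg.norm(X, 2) ** 2          # step size 1/L, L = ||X||_2^2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = hard_threshold(beta + X.T @ (y - X @ beta) / L, k)
    return beta
```

In APM-L0 the gradient step is replaced by a full convex regularized regression, and both the regularization and the thresholding levels are tuned by cross-validation.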
Collapse
Affiliation(s)
- Xiang Li
- Statistics and Decision Sciences, Janssen Research & Development, LLC, Raritan, NJ, USA
| | - Shanghong Xie
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, USA
| | - Donglin Zeng
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, USA
| | - Yuanjia Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, USA
| |
Collapse
|
45
|
Jaspers S, Komárek A, Aerts M. Bayesian estimation of multivariate normal mixtures with covariate-dependent mixing weights, with an application in antimicrobial resistance monitoring. Biom J 2018; 60:7-19. [PMID: 28898442 DOI: 10.1002/bimj.201600253] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2016] [Revised: 07/04/2017] [Accepted: 07/07/2017] [Indexed: 11/05/2022]
Abstract
Bacteria with reduced susceptibility to antimicrobials pose a major threat to public health. Therefore, large programs have been set up to collect minimum inhibitory concentration (MIC) values. These values can be used to monitor the distribution of nonsusceptible isolates in the general population. Data are collected within several countries and over a number of years. In addition, the sampled bacterial isolates were not tested for susceptibility against a single antimicrobial, but rather against an entire range of substances. Interest is therefore in the analysis of the joint distribution of MIC data on two or more antimicrobials, while accounting for a possible effect of covariates. In this regard, we present a Bayesian semiparametric density estimation routine based on multivariate Gaussian mixtures. The mixing weights are allowed to depend on certain covariates, thereby allowing the user to detect changes over, for example, time. The new approach was applied to data collected in Europe in 2010, 2012, and 2013. We investigated the susceptibility of Escherichia coli isolates to ampicillin and trimethoprim and found that there seems to be a significant increase in the proportion of nonsusceptible isolates. In addition, a simulation study was carried out, showing the promising behavior of the proposed method in the field of antimicrobial resistance.
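The core object here, a Gaussian mixture whose weights follow a multinomial-logit function of a covariate, is easy to evaluate directly. The sketch below (hypothetical parameterization, plain numpy) shows only the density computation, not the Bayesian estimation the paper performs:

```python
import numpy as np

def mvn_pdf(y, mean, cov):
    """Multivariate normal density in plain numpy."""
    y, mean, cov = np.atleast_1d(y), np.atleast_1d(mean), np.atleast_2d(cov)
    d = len(mean)
    diff = y - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * quad) /
                 np.sqrt((2 * np.pi) ** d * np.linalg.det(cov)))

def mixture_density(y, x, means, covs, gamma):
    """Mixture density with covariate-dependent softmax mixing weights:
        pi_k(x) ∝ exp(gamma[k, 0] + gamma[k, 1] * x)
    for a scalar covariate x (e.g. sampling year).
    """
    logits = gamma[:, 0] + gamma[:, 1] * x
    w = np.exp(logits - logits.max())     # stable softmax
    w /= w.sum()
    return sum(wk * mvn_pdf(y, m, c)
               for wk, m, c in zip(w, means, covs))
```

Changes over time then show up as shifts in pi_k(x) rather than in the component densities themselves.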
Collapse
Affiliation(s)
- Stijn Jaspers
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, BE3590, Diepenbeek, Belgium
| | - Arnošt Komárek
- Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University, CZ-186 75 Praha 8 - Karlín, Czech Republic
| | - Marc Aerts
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, BE3590, Diepenbeek, Belgium
| |
Collapse
|
46
|
Abstract
This paper proposes a decorrelation-based approach to testing hypotheses and constructing confidence intervals for the low-dimensional component of high-dimensional proportional hazards models. Motivated by the geometric projection principle, we propose new decorrelated score, Wald, and partial likelihood ratio statistics. Without assuming model selection consistency, we prove the asymptotic normality of these test statistics and establish their semiparametric optimality. We also develop new procedures for constructing pointwise confidence intervals for the baseline hazard function and the baseline survival function. Thorough numerical results are provided to back up our theory.
Collapse
Affiliation(s)
- Ethan X. Fang
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA
| | - Yang Ning
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA
| | - Han Liu
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA
| |
Collapse
|
47
|
Yousefi A, Dougherty DD, Eskandar EN, Widge AS, Eden UT. Estimating Dynamic Signals From Trial Data With Censored Values. Comput Psychiatr 2017; 1:58-81. [PMID: 29601047 PMCID: PMC5774187 DOI: 10.1162/cpsy_a_00003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/03/2016] [Accepted: 04/05/2017] [Indexed: 11/12/2022]
Abstract
Censored data occur commonly in trial-structured behavioral experiments and many other forms of longitudinal data, and they can lead to severe bias and reduced statistical power in subsequent analyses. Principled approaches for dealing with censored data, such as data imputation and methods based on the complete-data likelihood, work well for estimating fixed features of statistical models but have not been extended to dynamic measures, such as serial estimates of an underlying latent variable over time. Here we propose an approach to the censored-data problem for dynamic behavioral signals. We developed a state-space modeling framework with a censored observation process at the trial timescale. We then developed a filter algorithm to compute the posterior distribution of the state process using the available data. We showed that special cases of this framework can incorporate the three most common approaches to censored observations: ignoring trials with censored data, imputing the censored data values, or using the full information available in the data likelihood. Finally, we derived a computationally efficient approximate Gaussian filter that is similar in structure to a Kalman filter but efficiently accounts for censored data. We compared the performance of these methods in a simulation study and provide recommendations on which approach to use, based on the expected amount of censored data in an experiment. These new techniques can be applied broadly in many research domains in which censored data interfere with estimation, including survival analysis and other clinical trial applications.
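For a right-censored Gaussian observation, a full-information likelihood effectively substitutes the truncated-normal conditional mean for the censoring bound, via the standard inverse-Mills-ratio formula. A minimal stdlib sketch of that one ingredient (not the paper's filter, which embeds it in a state-space model):

```python
import math

def truncated_normal_mean(mu, sigma, c):
    """E[Y | Y > c] for Y ~ N(mu, sigma^2).

    This is what a likelihood-based treatment effectively uses for a
    right-censored trial, instead of the bound c itself (as naive
    imputation would) or dropping the trial entirely.
    """
    a = (c - mu) / sigma
    phi = math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)   # N(0,1) pdf
    Phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))        # N(0,1) cdf
    return mu + sigma * phi / (1.0 - Phi)
```

Note that the result always exceeds the censoring bound, so it corrects the downward bias that bound-imputation introduces.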
Collapse
Affiliation(s)
- Ali Yousefi
- Department of Neurological Surgery, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Department of Mathematics and Statistics, Boston University, Boston, MA
| | - Darin D. Dougherty
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA
| | - Emad N. Eskandar
- Department of Neurological Surgery, Massachusetts General Hospital and Harvard Medical School, Boston, MA
| | - Alik S. Widge
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Picower Institute for Learning & Memory, Massachusetts Institute of Technology, Cambridge, MA
| | - Uri T. Eden
- Department of Mathematics and Statistics, Boston University, Boston, MA
| |
Collapse
|
48
|
Han X, Zhang Y, Shao Y. On comparing 2 correlated C indices with censored survival data. Stat Med 2017; 36:4041-4049. [PMID: 28758216 DOI: 10.1002/sim.7414] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2016] [Revised: 06/01/2017] [Accepted: 06/21/2017] [Indexed: 12/21/2022]
Abstract
As new biomarkers and risk prediction procedures are in rapid development, it is of great interest to develop valid methods for comparing the predictive power of 2 biomarkers or risk score systems. The Harrell C statistic has been routinely used as a global adequacy assessment of a risk score system, and the difference of 2 Harrell C statistics has been suggested in recent literature as a test statistic for comparing the predictive power of 2 biomarkers for a censored outcome. In this study, we show that such a test can have severely inflated type I errors because the difference between the 2 Harrell C statistics does not converge to zero under the null hypothesis of equal predictive power measured by concordance probabilities, as illustrated by 2 counterexamples and corresponding numerical simulations. We further investigate a necessary and sufficient condition under which the difference of 2 Harrell C statistics converges to zero under the null hypothesis.
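Harrell's C is a pairwise concordance count, so the difference of 2 C statistics is straightforward to compute; the subtlety the paper identifies lies in its null distribution, not in the computation. A plain O(n²) sketch with simplified tie handling (illustrative only):

```python
def harrell_c(risk, time, event):
    """Harrell's C for censored data: among usable pairs (the earlier
    time is an observed event), the fraction where the higher risk score
    has the shorter survival. Ties in risk count 1/2.
    """
    conc = usable = 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:   # pair is usable
                usable += 1
                if risk[i] > risk[j]:
                    conc += 1
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / usable
```

The test the paper critiques is then `harrell_c(risk1, time, event) - harrell_c(risk2, time, event)` computed on the same subjects.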
Collapse
Affiliation(s)
- Xiaoxia Han
- Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
| | - Yilong Zhang
- Merck Research Laboratories, Rahway, NJ 07065, USA
| | - Yongzhao Shao
- Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
| |
Collapse
|
49
|
Lee AJ, Marder K, Alcalay RN, Mejia-Santana H, Orr-Urtreger A, Giladi N, Bressman S, Wang Y. Estimation of genetic risk function with covariates in the presence of missing genotypes. Stat Med 2017; 36:3533-3546. [PMID: 28656686 DOI: 10.1002/sim.7376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2016] [Revised: 02/28/2017] [Accepted: 05/30/2017] [Indexed: 12/13/2022]
Abstract
In genetic epidemiological studies, family history data are collected on relatives of study participants and used to estimate the age-specific risk of disease for individuals who carry a causal mutation. However, a family member's genotype data may not be collected because of the high cost of in-person interviews to obtain blood samples or the death of a relative. Previously, efficient nonparametric genotype-specific risk estimation in censored mixture data was proposed without considering covariates. With multiple predictive risk factors available, risk estimation requires a multivariate model to account for additional covariates that may simultaneously affect disease risk. It is therefore important to consider the role of covariates in genotype-specific distribution estimation using family history data. We propose an estimation method that permits more precise risk prediction by controlling for individual characteristics and incorporating interaction effects with missing genotypes in relatives; thus, gene-gene and gene-environment interactions can be handled within the framework of a single model. We examine the performance of the proposed methods by simulations and apply them to estimate the age-specific cumulative risk of Parkinson's disease (PD) in carriers of the LRRK2 G2019S mutation using first-degree relatives who are at genetic risk for PD. The utility of the estimated carrier risk is demonstrated by designing a future clinical trial under various assumptions. Such sample size estimation is seen in the Huntington's disease literature, using the length of abnormal expansion of a CAG repeat in the HTT gene, but is less common in the PD literature. Copyright © 2017 John Wiley & Sons, Ltd.
Collapse
Affiliation(s)
- Annie J Lee
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, U.S.A
| | - Karen Marder
- Department of Neurology, College of Physicians and Surgeons, Columbia University, New York, NY, U.S.A.; Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University, New York, NY, U.S.A
| | - Roy N Alcalay
- Department of Neurology, College of Physicians and Surgeons, Columbia University, New York, NY, U.S.A.; Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University, New York, NY, U.S.A
| | - Helen Mejia-Santana
- Department of Neurology, College of Physicians and Surgeons, Columbia University, New York, NY, U.S.A
| | - Avi Orr-Urtreger
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; Genetic Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
| | - Nir Giladi
- Sagol School for Neurosciences, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; Neurological Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
| | - Susan Bressman
- Department of Neurology, Mount Sinai Beth Israel Medical Center, New York, NY, U.S.A
| | - Yuanjia Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, U.S.A
| |
Collapse
|
50
|
Procházka B, Kynčl J. Estimating the Baseline Incidence of a Seasonal Disease Independently of Epidemic Outbreaks. Cent Eur J Public Health 2017; 24:199-205. [PMID: 27760285 DOI: 10.21101/cejph.a4800] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2015] [Accepted: 09/23/2016] [Indexed: 11/15/2022]
Abstract
In epidemiology, it is very important to estimate the baseline incidence of infectious diseases, but the available data are often subject to outliers due to epidemic outbreaks. Consequently, the estimate of the baseline incidence is biased, and so is the predicted epidemic threshold, a crucial reference indicator used to suspect and detect an epidemic outbreak. Another problem is that the "usual" incidence varies in a season-dependent manner: it may not be constant throughout the year, is often periodic, and may also show a trend between years. To take these factors into account, more complicated models adjusted for outliers are used; without such adjustment, the baseline incidence estimate is biased, the epidemic threshold can be overestimated, and the detection of an epidemic outbreak becomes more difficult. The classical Serfling model is based on a sine function with a phase shift and amplitude, and multiple approaches have been applied to model the long-term and seasonal trends; nevertheless, none of them controls for the effect of epidemic outbreaks. The present article deals with adjusting data biased by epidemic outbreaks, and several models adjusted for outliers, i.e. for the effect of epidemic outbreaks, are presented. One option is to remove the epidemic weeks from the analysis, but then, in some calendar weeks, data will be available for only a small number of years. Furthermore, the detection of an epidemic outbreak by experts (epidemiologists and microbiologists) is compared with that obtained from the various models.
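A Serfling-type baseline can be fit by least squares on sine and cosine terms after excluding the epidemic weeks. The sketch below is one simple version of this adjustment (the function name, a single annual harmonic, and the linear trend are illustrative assumptions; the article compares several richer models):

```python
import numpy as np

def serfling_baseline(week, incidence, epidemic_mask, period=52.0):
    """Fit b(t) = a0 + a1*t + a2*sin(2*pi*t/period) + a3*cos(2*pi*t/period)
    by least squares on the non-epidemic weeks only (epidemic_mask False),
    then predict the baseline for all weeks.
    """
    t = np.asarray(week, dtype=float)
    y = np.asarray(incidence, dtype=float)
    X = np.column_stack([np.ones_like(t), t,
                         np.sin(2 * np.pi * t / period),
                         np.cos(2 * np.pi * t / period)])
    keep = ~np.asarray(epidemic_mask, dtype=bool)  # drop outbreak weeks
    coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return X @ coef
```

Weeks whose observed incidence exceeds this baseline by a chosen margin would then be flagged against the epidemic threshold.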
Collapse
Affiliation(s)
- Bohumír Procházka
- Unit for Biostatistics, National Institute of Public Health, Prague, Czech Republic; Department of Child and Youth Health, 3rd Faculty of Medicine, Charles University in Prague, Prague, Czech Republic
| | - Jan Kynčl
- Unit for Infectious Diseases Epidemiology, National Institute of Public Health, Prague, Czech Republic; Department of Epidemiology, 3rd Faculty of Medicine, Charles University in Prague, Prague, Czech Republic
| |
Collapse
|