1
Yu A, Zhong Y, Feng X, Wei Y. Quantile regression for nonignorable missing data with its application of analyzing electronic medical records. Biometrics 2023; 79:2036-2049. [PMID: 35861675 DOI: 10.1111/biom.13723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 07/15/2022] [Indexed: 11/27/2022]
Abstract
Over the past decade, there has been growing enthusiasm for using electronic medical records (EMRs) for biomedical research. Quantile regression estimates distributional associations, providing unique insights into the intricacies and heterogeneity of the EMR data. However, the widespread nonignorable missing observations in EMR often obscure the true associations and challenge its potential for robust biomedical discoveries. We propose a novel method to estimate the covariate effects in the presence of nonignorable missing responses under quantile regression. This method imposes no parametric specifications on response distributions, which subtly uses implicit distributions induced by the corresponding quantile regression models. We show that the proposed estimator is consistent and asymptotically normal. We also provide an efficient algorithm to obtain the proposed estimate and a randomly weighted bootstrap approach for statistical inferences. Numerical studies, including an empirical analysis of real-world EMR data, are used to assess the proposed method's finite-sample performance compared to existing literature.
Affiliation(s)
- Aiai Yu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Yujie Zhong
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Xingdong Feng
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Ying Wei
- Department of Biostatistics, Columbia University, New York, New York, USA
2
Garcia-Vicuña D, López-Cheda A, Jácome MA, Mallor F. Estimation of patient flow in hospitals using up-to-date data. Application to bed demand prediction during pandemic waves. PLoS One 2023; 18:e0282331. [PMID: 36848360 PMCID: PMC9970104 DOI: 10.1371/journal.pone.0282331] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/02/2022] [Accepted: 02/13/2023] [Indexed: 03/01/2023] Open
Abstract
Hospital bed demand forecasting is a first-order concern for public health action to prevent healthcare systems from being overwhelmed. Predictions are usually performed by estimating patient flow, that is, lengths of stay and branching probabilities. In most approaches in the literature, estimations rely on outdated published information or historical data. This may lead to unreliable estimates and biased forecasts during new or non-stationary situations. In this paper, we introduce a flexible adaptive procedure using only near-real-time information. Such a method requires handling censored information from patients still in hospital. This approach allows the efficient estimation of the distributions of lengths of stay and probabilities used to represent the patient pathways. This is very relevant at the first stages of a pandemic, when there is much uncertainty and too few patients have completely observed pathways. Furthermore, the performance of the proposed method is assessed in an extensive simulation study in which the patient flow in a hospital during a pandemic wave is modelled. We further discuss the advantages and limitations of the method, as well as potential extensions.
Affiliation(s)
- Ana López-Cheda
- Departamento de Matemáticas, Research Group MODES, CITIC, Universidade da Coruña, A Coruña, Spain
- María Amalia Jácome
- Departamento de Matemáticas, Research Group MODES, CITIC, Universidade da Coruña, A Coruña, Spain
- Fermin Mallor
- Institute of Smart Cities, Public University of Navarre, Pamplona, Spain
3
Statistical Inference for Partially Linear Varying Coefficient Quantile Models with Missing Responses. Symmetry (Basel) 2022. [DOI: 10.3390/sym14112258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
The construction of confidence intervals is investigated for the partially linear varying coefficient quantile model with missing random responses. Combined with quantile regression, an imputation-based empirical likelihood method is proposed to construct confidence intervals for parametric and varying coefficient components. Then, it is proved that the proposed empirical log-likelihood ratios are asymptotically Chi-square in theory. Finally, the symmetry confidence intervals of the parametric components and the point-by-point confidence intervals of the varying coefficient components are constructed in the simulation studies to demonstrate further that the proposed method yields smaller confidence intervals and higher coverage probabilities.
4
Safari WC, López-de-Ullibarri I, Jácome MA. Nonparametric kernel estimation of the probability of cure in a mixture cure model when the cure status is partially observed. Stat Methods Med Res 2022; 31:2164-2188. [PMID: 35912505 DOI: 10.1177/09622802221115880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Cure models are a class of time-to-event models where a proportion of individuals will never experience the event of interest. The lifetimes of these so-called cured individuals are always censored. It is usually assumed that one never knows which censored observation is cured and which is uncured, so the cure status is unknown for censored times. In this paper, we develop a method to estimate the probability of cure in the mixture cure model when some censored individuals are known to be cured. A cure probability estimator that incorporates the cure status information is introduced. This estimator is shown to be strongly consistent and asymptotically normally distributed. Two alternative estimators are also presented. The first one considers a competing risks approach with two types of competing events, the event of interest and the cure. The second alternative estimator is based on the fact that the probability of cure can be written as the conditional mean of the cure status. Hence, nonparametric regression methods can be applied to estimate this conditional mean. However, the cure status remains unknown for some censored individuals. Consequently, the application of regression methods in this context requires handling missing data in the response variable (cure status). Simulations are performed to evaluate the finite sample performance of the estimators, and we apply them to the analysis of two datasets related to survival of breast cancer patients and length of hospital stay of COVID-19 patients requiring intensive care.
Affiliation(s)
- Wende Clarence Safari
- Department of Mathematics, Faculty of Computer Science, CITIC, University of A Coruña, A Coruña, Spain
- Ignacio López-de-Ullibarri
- Department of Mathematics, Escuela Politécnica de Ingeniería de Ferrol, University of A Coruña, A Coruña, Spain
- María Amalia Jácome
- Department of Mathematics, Faculty of Science, CITIC, University of A Coruña, A Coruña, Spain
5
Laqueur HS, Shev AB, Kagawa RMC. SuperMICE: An Ensemble Machine Learning Approach to Multiple Imputation by Chained Equations. Am J Epidemiol 2022; 191:516-525. [PMID: 34788362 DOI: 10.1093/aje/kwab271] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/07/2020] [Revised: 09/17/2021] [Accepted: 11/08/2021] [Indexed: 11/13/2022] Open
Abstract
Researchers often face the problem of how to address missing data. Multiple imputation is a popular approach, with multiple imputation by chained equations (MICE) being among the most common and flexible methods for execution. MICE iteratively fits a predictive model for each variable with missing values, conditional on other variables in the data. In theory, any imputation model can be used to predict the missing values. However, if the predictive models are incorrectly specified, they may produce biased estimates of the imputed data, yielding inconsistent parameter estimates and invalid inference. Given the set of modeling choices that must be made in conducting multiple imputation, in this paper we propose a data-adaptive approach to model selection. Specifically, we adapt MICE to incorporate an ensemble algorithm, Super Learner, to predict the conditional mean for each missing value, and we also incorporate a local kernel-based estimate of variance. We present a set of simulations indicating that this approach produces final parameter estimates with lower bias and better coverage than other commonly used imputation methods. These results suggest that using a flexible machine learning imputation approach can be useful in settings where data are missing at random, especially when the relationships among the variables are complex.
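The chained-equations loop described in this abstract can be illustrated with a minimal sketch. It is not the SuperMICE implementation: a simple least-squares line stands in for the Super Learner ensemble, the residual standard deviation stands in for the local kernel-based variance estimate, only two variables are handled, and the names `fit_line` and `chained_imputation` are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for ys ~ xs.

    Assumes xs is not constant (otherwise the slope is undefined).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(rss / max(n - 2, 1))


def chained_imputation(a, b, n_iter=10, seed=0):
    """Return one completed copy of two variables; None marks a missing value."""
    rng = random.Random(seed)
    cols = [list(a), list(b)]
    missing = [[v is None for v in col] for col in cols]

    # Start from a crude fill: the observed mean of each variable.
    for col, miss in zip(cols, missing):
        observed = [v for v, m in zip(col, miss) if not m]
        mean = sum(observed) / len(observed)
        for i, m in enumerate(miss):
            if m:
                col[i] = mean

    # Cycle through the variables, refitting each on the other.
    for _ in range(n_iter):
        for target in (0, 1):
            pred = 1 - target
            rows = [i for i, m in enumerate(missing[target]) if not m]
            b0, b1, sd = fit_line(
                [cols[pred][i] for i in rows],
                [cols[target][i] for i in rows],
            )
            for i, m in enumerate(missing[target]):
                if m:
                    # Prediction plus noise, so imputations keep some spread.
                    cols[target][i] = b0 + b1 * cols[pred][i] + rng.gauss(0.0, sd)
    return cols[0], cols[1]
```

Calling `chained_imputation` with several different seeds would give several completed datasets, whose analyses are then pooled in the usual multiple-imputation fashion. A fuller implementation would also draw the regression parameters rather than fixing them at their estimates.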
6
Joint sufficient dimension reduction for estimating continuous treatment effect functions. J MULTIVARIATE ANAL 2019; 168:48-62. [PMID: 30872872 DOI: 10.1016/j.jmva.2018.07.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
The estimation of continuous treatment effect functions using observational data often requires parametric specification of the effect curves, the conditional distributions of outcomes and treatment assignments given multi-dimensional covariates. While nonparametric extensions are possible, they typically suffer from the curse of dimensionality. Dimension reduction is often inevitable and we propose a sufficient dimension reduction framework to balance parsimony and flexibility. The joint central subspace can be estimated at an n^{1/2}-rate without fixing its dimension in advance, and the treatment effect function is estimated by averaging local estimates of a reduced dimension. Asymptotic properties are studied. Unlike binary treatments, continuous treatments require multiple smoothing parameters of different asymptotic orders to borrow different facets of information, and their joint estimation is proposed by a non-standard version of the infinitesimal jackknife.
7
Plug-in marginal estimation under a general regression model with missing responses and covariates. TEST-SPAIN 2019. [DOI: 10.1007/s11749-018-0591-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
8
Tomita H, Fujisawa H, Henmi M. A bias-corrected estimator in multiple imputation for missing data. Stat Med 2018; 37:3373-3386. [DOI: 10.1002/sim.7833] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/29/2017] [Revised: 05/01/2018] [Accepted: 05/03/2018] [Indexed: 01/15/2023]
Affiliation(s)
- Hiroaki Tomita
- Department of Statistical Science, School of Multidisciplinary Sciences; SOKENDAI (The Graduate University for Advanced Studies); Tokyo Japan
- Hironori Fujisawa
- Department of Statistical Science, School of Multidisciplinary Sciences; SOKENDAI (The Graduate University for Advanced Studies); Tokyo Japan
- The Institute of Statistical Mathematics; Tokyo Japan
- Department of Mathematical Statistics; Nagoya University Graduate School of Medicine; Nagoya Japan
- Masayuki Henmi
- Department of Statistical Science, School of Multidisciplinary Sciences; SOKENDAI (The Graduate University for Advanced Studies); Tokyo Japan
- The Institute of Statistical Mathematics; Tokyo Japan
9
Qiu Z. Statistical inference under imputation for proportional hazard model with missing covariates. COMMUN STAT-THEOR M 2017. [DOI: 10.1080/03610926.2016.1275696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Zhiping Qiu
- School of Mathematical Sciences, Huaqiao University, Quanzhou, China
- Research Center for Applied Statistics and Big Data, Huaqiao University, Xiamen, China
10
Boente G, Martínez AM. Estimating additive models with missing responses. COMMUN STAT-THEOR M 2016. [DOI: 10.1080/03610926.2013.815780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
11
Hsu CH, He Y, Li Y, Long Q, Friese R. Doubly robust multiple imputation using kernel-based techniques. Biom J 2015; 58:588-606. [PMID: 26647734 DOI: 10.1002/bimj.201400256] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/10/2014] [Revised: 08/19/2015] [Accepted: 10/07/2015] [Indexed: 11/08/2022]
Abstract
We consider the problem of estimating the marginal mean of an incompletely observed variable and develop a multiple imputation approach. Using fully observed predictors, we first establish two working models: one predicts the missing outcome variable, and the other predicts the probability of missingness. The predictive scores from the two models are used to measure the similarity between the incomplete and observed cases. Based on the predictive scores, we construct a set of kernel weights for the observed cases, with higher weights indicating more similarity. Missing data are imputed by sampling from the observed cases with probability proportional to their kernel weights. The proposed approach can produce reasonable estimates for the marginal mean and has a double robustness property, provided that one of the two working models is correctly specified. It also shows some robustness against misspecification of both models. We demonstrate these patterns in a simulation study. In a real-data example, we analyze the total helicopter response time from injury in the Arizona emergency medical service data.
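The imputation step described in this abstract (kernel weights built from two predictive scores, then sampling donors in proportion to those weights) might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the two working models are assumed to have been fitted already, so each case arrives with a hypothetical `(outcome_score, missingness_score)` pair; a Gaussian kernel with a single fixed bandwidth is an assumed choice; and the function names are invented for this example.

```python
import math
import random


def kernel_weights(target, donors, bandwidth):
    """Gaussian kernel weight of each donor score pair relative to the target pair."""
    weights = []
    for donor in donors:
        dist2 = sum((t - d) ** 2 for t, d in zip(target, donor))
        weights.append(math.exp(-0.5 * dist2 / bandwidth ** 2))
    return weights


def impute_marginal_mean(y, scores, n_imputations=20, bandwidth=0.5, seed=0):
    """Estimate the marginal mean of y, where None marks a missing outcome.

    scores[i] is the pair of predictive scores for case i (assumed given).
    Note: random.choices raises ValueError if every donor weight underflows to 0.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    missing = [i for i, v in enumerate(y) if v is None]
    donor_scores = [scores[i] for i in observed]
    donor_values = [y[i] for i in observed]

    means = []
    for _ in range(n_imputations):
        completed = list(y)
        for i in missing:
            # Donors with scores closer to case i are more likely to be drawn.
            w = kernel_weights(scores[i], donor_scores, bandwidth)
            completed[i] = rng.choices(donor_values, weights=w, k=1)[0]
        means.append(sum(completed) / len(completed))

    # Pool the point estimates by averaging over the imputed datasets.
    return sum(means) / len(means)
```

Because similarity is measured on both scores at once, a donor only receives a large weight when it resembles the incomplete case under the outcome model and under the missingness model, which is the intuition behind the double robustness the abstract refers to.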
Affiliation(s)
- Chiu-Hsieh Hsu
- Division of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, 1295 N. Martin A232 Campus, PO Box 245211, Tucson, AZ, 85724, USA; Arizona Cancer Center, College of Medicine, University of Arizona, Tucson, AZ, 85724, USA
- Yulei He
- National Center for Health Statistics, CDC, Hyattsville, MD, 20782, USA
- Yisheng Li
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Qi Long
- Department of Biostatistics and Bioinformatics, School of Public Health, Emory University, Atlanta, GA, 48109, USA
- Randall Friese
- Department of Surgery, College of Medicine, University of Arizona, Tucson, AZ, 85724, USA
12
Han P. Combining Inverse Probability Weighting and Multiple Imputation to Improve Robustness of Estimation. Scand Stat Theory Appl 2015. [DOI: 10.1111/sjos.12177] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 11/30/2022]
Affiliation(s)
- Peisong Han
- Department of Statistics and Actuarial Science; University of Waterloo
14
A kernel-assisted imputation estimating method for the additive hazards model with missing censoring indicator. Stat Probab Lett 2015. [DOI: 10.1016/j.spl.2014.12.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]
15
Xue Y, Lazar NA. Empirical likelihood-based hot deck imputation methods. J Nonparametr Stat 2012. [DOI: 10.1080/10485252.2012.690879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
16
Qin J, Shao J, Zhang B. Efficient and Doubly Robust Imputation for Covariate-Dependent Missing Responses. J Am Stat Assoc 2012. [DOI: 10.1198/016214508000000238] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 11/21/2022]
Affiliation(s)
- Jing Qin
- Jing Qin is Mathematical Statistician, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892. Jun Shao is Professor, Department of Statistics, University of Wisconsin, Madison, WI 53706. Biao Zhang is Professor, Department of Mathematics, University of Toledo, Toledo, OH 43606. The research of Shao was supported in part by National Science Foundation (NSF) grants DMS0404535 and SES-0705033. The research of Zhang was
- Jun Shao
- Biao Zhang
17
Hu Z, Follmann DA, Qin J. Semiparametric dimension reduction estimation for mean response with missing data. Biometrika 2010; 97:305-319. [PMID: 23049121 DOI: 10.1093/biomet/asq005] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022] Open
Abstract
Model misspecification can be a concern for high-dimensional data. Nonparametric regression obviates model specification but is impeded by the curse of dimensionality. This paper focuses on the estimation of the marginal mean response when there is missingness in the response and multiple covariates are available. We propose estimating the mean response through nonparametric functional estimation, where the dimension is reduced by a parametric working index. The proposed semiparametric estimator is robust to model misspecification: it is consistent for any working index if the missing mechanism of the response is known or correctly specified up to unknown parameters; even with misspecification in the missing mechanism, it is consistent so long as the working index can recover E(Y | X), the conditional mean response given the covariates. In addition, when the missing mechanism is correctly specified, the semiparametric estimator attains the optimal efficiency if E(Y | X) is recoverable through the working index. Robustness and efficiency of the proposed estimator is further investigated by simulations. We apply the proposed method to a clinical trial for HIV.
Affiliation(s)
- Zonghui Hu
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Maryland 20892-7609, U.S.A.
18
Bianco A, Boente G, González-Manteiga W, Pérez-González A. Estimation of the marginal location under a partially linear model with missing responses. Comput Stat Data Anal 2010. [DOI: 10.1016/j.csda.2009.09.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
20
Pérez-González A, Vilar-Fernández JM, González-Manteiga W. Asymptotic properties of local polynomial regression with missing data and correlated errors. ANN I STAT MATH 2007. [DOI: 10.1007/s10463-007-0136-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
21
González-Manteiga W, Pérez-González A. Nonparametric Mean Estimation with Missing Data. COMMUN STAT-THEOR M 2004. [DOI: 10.1081/sta-120028374] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/03/2022]