1
Tan ALM, Getzen EJ, Hutch MR, Strasser ZH, Gutiérrez-Sacristán A, Le TT, Dagliati A, Morris M, Hanauer DA, Moal B, Bonzel CL, Yuan W, Chiudinelli L, Das P, Zhang HG, Aronow BJ, Avillach P, Brat GA, Cai T, Hong C, La Cava WG, Hooi Will Loh H, Luo Y, Murphy SN, Yuan Hgiam K, Omenn GS, Patel LP, Jebathilagam Samayamuthu M, Shriver ER, Shakeri Hossein Abad Z, Tan BWL, Visweswaran S, Wang X, Weber GM, Xia Z, Verdy B, Long Q, Mowery DL, Holmes JH. Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record? J Biomed Inform 2023; 139:104306. [PMID: 36738870 PMCID: PMC10849195 DOI: 10.1016/j.jbi.2023.104306] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/17/2022] [Revised: 01/21/2023] [Accepted: 01/29/2023] [Indexed: 02/05/2023]
Abstract
BACKGROUND: In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinicians' concerns about possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients. METHODS: We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratification by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern. RESULTS: With these analyses, we identified mapping issues faced in seven out of 15 sites. We also identified nuances in data collection and variable definition for the various sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors. CONCLUSION: In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of looking at COVID-19 over time and at multiple sites, where there might be different phases, policies, etc. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to consider missing data as informative to analyses and helps researchers identify which sites are better poised to study particular questions.
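The "correlations between labs based on missingness indicators" step can be illustrated with a minimal sketch. It is not the authors' pipeline: the lab names, the toy records, and the choice of Pearson correlation on 0/1 indicators are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: one dict per patient-day; None marks a lab with no result.
RECORDS = [
    {"crp": 5.0, "d_dimer": 1.1, "alt": None},
    {"crp": None, "d_dimer": None, "alt": 30.0},
    {"crp": 7.2, "d_dimer": 2.4, "alt": None},
    {"crp": None, "d_dimer": None, "alt": 25.0},
]


def missingness_indicators(records, labs):
    """Map each lab to a 0/1 list: 1 where the result is missing, 0 where observed."""
    return {lab: [1 if rec.get(lab) is None else 0 for rec in records] for lab in labs}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def missingness_correlations(records, labs):
    """Pairwise correlation between the missingness indicators of each pair of labs."""
    ind = missingness_indicators(records, labs)
    return {
        (a, b): pearson(ind[a], ind[b])
        for i, a in enumerate(labs)
        for b in labs[i + 1:]
    }
```

In this toy example, labs that tend to be ordered (or skipped) together get a positive correlation, while labs that substitute for one another get a negative one.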
Affiliation(s)
- Emily J Getzen
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Trang T Le
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Priam Das
- Harvard Medical School, Cambridge, MA, USA
- Bruce J Aronow
- Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Tianxi Cai
- Harvard Medical School, Cambridge, MA, USA
- Chuan Hong
- Harvard Medical School, Cambridge, MA, USA; Duke University, Durham, NC, USA
- William G La Cava
- Harvard Medical School, Cambridge, MA, USA; Boston Children's Hospital, Boston, MA, USA
- Yuan Luo
- Northwestern University, Chicago, IL, USA
- Lav P Patel
- University of Kansas Medical Center, United States
- Emily R Shriver
- University of Pennsylvania Health System, Philadelphia, PA, USA
- Xuan Wang
- Harvard Medical School, Cambridge, MA, USA
- Zongqi Xia
- University of Pittsburgh, Pittsburgh, PA, USA
- Qi Long
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Danielle L Mowery
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- John H Holmes
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
2
Spineli LM, Papadimitropoulou K, Kalyvas C. Pattern-mixture model in network meta-analysis of binary missing outcome data: one-stage or two-stage approach? BMC Med Res Methodol 2021; 21:12. [PMID: 33413138 PMCID: PMC7792003 DOI: 10.1186/s12874-020-01205-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/23/2020] [Accepted: 12/23/2020] [Indexed: 02/01/2023] [Open Access]
Abstract
Background: Trials with binary outcomes can be synthesised using within-trial exact likelihood or approximate normal likelihood in one-stage or two-stage approaches, respectively. The performance of the one-stage and the two-stage approaches has been documented extensively in the literature. However, little is known about how these approaches behave in the presence of missing outcome data (MOD), which are ubiquitous in clinical trials. In this work, we compare the one-stage versus two-stage approach via a pattern-mixture model in the network meta-analysis using Bayesian methods to handle MOD appropriately. Methods: We used 29 published networks to empirically compare the two approaches concerning the relative treatment effects of several competing interventions and the between-trial variance (τ²), while considering the extent and level of balance of MOD in the included trials. We additionally conducted a simulation study to compare the competing approaches regarding the bias and width of the 95% credible interval of the (summary) log odds ratios (OR) and τ² in the presence of moderate and large MOD. Results: The empirical study did not reveal any systematic bias between the compared approaches regarding the log OR, but showed systematically larger uncertainty around the log OR under the one-stage approach for networks with at least one small trial or low event risk and moderate MOD. For these networks, the simulation study revealed that the bias in log OR for comparisons with the reference intervention in the network was relatively higher in the two-stage approach. Conversely, the bias in log OR for the remaining comparisons was relatively higher in the one-stage approach. Overall, bias increased for large MOD. For these networks, the empirical results revealed slightly higher τ² estimates under the one-stage approach irrespective of the extent of MOD. The one-stage approach also led to less precise log OR and τ² when compared with the two-stage approach for large MOD. Conclusions: Due to considerable bias in the log ORs overall, especially for large MOD, none of the competing approaches was superior. Until a more competent model is developed, researchers may prefer the one-stage approach to handle MOD, while acknowledging its limitations. Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-020-01205-6.
Affiliation(s)
- Loukia M Spineli
- Midwifery Research and Education Unit (OE 6410), Hannover Medical School, Carl-Neuberg-Straße 1, 30625, Hannover, Germany.
- Katerina Papadimitropoulou
- Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Data Science and Biometrics, Danone Nutricia Research, Utrecht, The Netherlands
3
Spineli LM, Kalyvas C, Papadimitropoulou K. Continuous(ly) missing outcome data in network meta-analysis: A one-stage pattern-mixture model approach. Stat Methods Med Res 2021; 30:958-975. [PMID: 33406990 PMCID: PMC8209314 DOI: 10.1177/0962280220983544] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Appropriate handling of aggregate missing outcome data is necessary to minimise bias in
the conclusions of systematic reviews. The two-stage pattern-mixture model has already been
proposed to address aggregate missing continuous outcome data. While this approach
is more appropriate than the exclusion of missing continuous outcome data and simple
imputation methods, it does not offer flexible modelling of missing continuous outcome
data to investigate their implications on the conclusions thoroughly. Therefore, we
propose a one-stage pattern-mixture model approach under the Bayesian framework to address
missing continuous outcome data in a network of interventions and gain knowledge about the
missingness process in different trials and interventions. We extend the hierarchical
network meta-analysis model for one aggregate continuous outcome to incorporate a
missingness parameter that measures the departure from the missing at random assumption.
We consider various effect size estimates for continuous data, and two informative
missingness parameters, the informative missingness difference of means and the
informative missingness ratio of means. We incorporate our prior belief about the
missingness parameters while allowing for several possibilities of prior structures to
account for the fact that the missingness process may differ in the network. The method is
exemplified in two networks from published reviews comprising different amounts of
missing continuous outcome data.
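As a rough illustration of how the two missingness parameters act in a pattern-mixture model, the sketch below adjusts a single arm's observed mean. It assumes the commonly used definitions (difference of means = mean in missing participants minus mean in observed participants; ratio of means = their quotient), which the abstract does not spell out, and the function names are hypothetical. It ignores the uncertainty that the Bayesian model would propagate.

```python
def adjusted_mean_imdom(mean_obs, n_obs, n_miss, imdom):
    """Arm-level mean over all randomised participants, given an assumed
    informative missingness difference of means (imdom = mean_miss - mean_obs)."""
    q = n_miss / (n_obs + n_miss)      # proportion of participants with missing outcome
    mean_miss = mean_obs + imdom       # assumed mean among missing participants
    return (1 - q) * mean_obs + q * mean_miss


def adjusted_mean_imrom(mean_obs, n_obs, n_miss, imrom):
    """Same adjustment, given an assumed informative missingness ratio of means
    (imrom = mean_miss / mean_obs)."""
    q = n_miss / (n_obs + n_miss)
    mean_miss = mean_obs * imrom
    return (1 - q) * mean_obs + q * mean_miss
```

Setting `imdom=0` or `imrom=1` reproduces the observed mean, which corresponds to the missing at random assumption; other values describe a departure from it.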
Affiliation(s)
- Loukia M Spineli
- Midwifery Research and Education Unit, Hannover Medical School, Hannover, Germany
- Chrysostomos Kalyvas
- Biostatistics and Research Decision Sciences, MSD Europe Inc., Brussels, Belgium
- Katerina Papadimitropoulou
- Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Data Science and Biometrics, Danone Nutricia Research, Utrecht, The Netherlands
5
Spineli LM, Kalyvas C. Comparison of exclusion, imputation and modelling of missing binary outcome data in frequentist network meta-analysis. BMC Med Res Methodol 2020; 20:48. [PMID: 32111167 PMCID: PMC7049189 DOI: 10.1186/s12874-020-00929-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/28/2019] [Accepted: 02/17/2020] [Indexed: 01/01/2023] [Open Access]
Abstract
Background: Missing participant outcome data (MOD) are ubiquitous in systematic reviews with network meta-analysis (NMA), as they arise from the inclusion of clinical trials with reported participant losses. Strategies are available to address aggregate MOD, and in particular binary MOD, while considering the missing at random (MAR) assumption as a starting point. However, little is known about their performance regarding the meta-analytic parameters of a random-effects model for aggregate binary outcome data as obtained from trial reports (i.e. the number of events and number of MOD out of the total randomised per arm). Methods: We used four strategies to handle binary MOD under MAR and we classified these strategies into those modelling versus excluding/imputing MOD and into those accounting for versus ignoring uncertainty about MAR. We investigated the performance of these strategies in terms of core NMA estimates by performing both an empirical and a simulation study using random-effects NMA based on electrical network theory. We used Bland-Altman plots to illustrate the agreement between the compared strategies, and we considered the mean bias, coverage probability and width of the confidence interval to be the frequentist measures of performance. Results: Modelling MOD under MAR agreed with exclusion and imputation under MAR in terms of estimated log odds ratios and inconsistency factor, whereas accounting or not for the uncertainty regarding MOD affected intervention hierarchy and precision around the NMA estimates: strategies that ignore uncertainty about MOD led to more precise NMA estimates and increased between-trial variance. All strategies showed good performance for low MOD (<5%), consistent evidence and low between-trial variance, whereas performance was compromised for large informative MOD (>20%), inconsistent evidence and substantial between-trial variance, especially for strategies that ignore uncertainty due to MOD. Conclusions: Analysts should avoid applying strategies that manipulate MOD before analysis (i.e. exclusion and imputation), as they affect the inferences negatively. Modelling MOD via a pattern-mixture model to propagate the uncertainty about the MAR assumption, on the other hand, constitutes a conceptually and statistically proper strategy to address MOD in a systematic review.
Affiliation(s)
- Loukia M Spineli
- Midwifery Research and Education Unit (OE 6410), Hannover Medical School, Carl-Neuberg-Straße 1, 30625, Hannover, Germany.
- Chrysostomos Kalyvas
- Department of Biostatistics and Research Decision Sciences, MSD Europe Inc, Clos du Lynx 5, 1200, Brussels, Belgium
6
Spineli LM, Kalyvas C, Pateras K. Participants' outcomes gone missing within a network of interventions: Bayesian modeling strategies. Stat Med 2019; 38:3861-3879. [PMID: 31134664 PMCID: PMC7754380 DOI: 10.1002/sim.8207] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/02/2018] [Revised: 04/19/2019] [Accepted: 05/03/2019] [Indexed: 12/18/2022]
Abstract
Objectives: To investigate the implications of addressing informative missing binary outcome data (MOD) on network meta‐analysis (NMA) estimates while applying the missing at random (MAR) assumption under different prior structures of the missingness parameter. Methods: In three motivating examples, we compared six different prior structures of the informative missingness odds ratio (IMOR) parameter in logarithmic scale under pattern‐mixture and selection models. Then, we simulated 1000 triangle networks of two‐arm trials assuming informative MOD related to interventions. We extended the Bayesian random‐effects NMA model for binary outcomes and node‐splitting approach to incorporate these 12 models in total. With interval plots, we illustrated the posterior distribution of log OR, common between‐trial variance (τ²), inconsistency factor and probability of being best per intervention under each model. Results: All models gave similar point estimates for all NMA estimates regardless of simulation scenario. For moderate and large MOD, intervention‐specific prior structure of log IMOR led to larger posterior standard deviation of log ORs compared to trial‐specific and common‐within‐network prior structures. Hierarchical prior structure led to slightly more precise τ² compared to identical prior structure, particularly for moderate inconsistency and large MOD. Pattern‐mixture and selection models agreed for all NMA estimates. Conclusions: Analyzing informative MOD assuming MAR with different prior structures of log IMOR affected mainly the precision of NMA estimates. Reviewers should decide in advance on the prior structure of log IMOR that best aligns with the condition and interventions investigated.
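To make the role of the log IMOR concrete, here is a minimal sketch of the pattern-mixture adjustment for one trial arm. It uses the standard definition of the IMOR (odds of an event among missing participants divided by the odds among observed participants); the function names and the plug-in, fixed-value treatment of log IMOR are illustrative assumptions, whereas the paper places prior distributions on this parameter.

```python
from math import exp


def risk_among_missing(p_obs, log_imor):
    """Event risk among missing participants implied by an assumed log IMOR.
    Requires 0 < p_obs < 1 so that the odds are finite."""
    odds_obs = p_obs / (1 - p_obs)
    odds_miss = exp(log_imor) * odds_obs   # IMOR = odds_miss / odds_obs
    return odds_miss / (1 + odds_miss)


def adjusted_risk(events, n_obs, n_miss, log_imor):
    """Arm-level event risk over all randomised participants under a
    pattern-mixture model: a weighted average of the observed and missing patterns."""
    p_obs = events / n_obs
    q = n_miss / (n_obs + n_miss)          # proportion of missing outcome data
    return (1 - q) * p_obs + q * risk_among_missing(p_obs, log_imor)
```

With `log_imor=0` the adjusted risk equals the observed risk, which is the MAR case; a prior centred at zero with non-zero variance therefore expresses MAR on average while allowing for uncertainty about it.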
Affiliation(s)
- Loukia M Spineli
- Midwifery Research and Education Unit, Hannover Medical School, Hannover, Germany
- Chrysostomos Kalyvas
- Department of Biostatistics and Research Decision Sciences, MSD Europe Inc, Brussels, Belgium
- Konstantinos Pateras
- Department of Biostatistics and Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands